How to update a multichracter delimited field in a file in bash?-CodePudding

I'm trying to match a certain field and update its data from a file delimited with multiple characters. I'm using this to create an imitation of SQL's UPDATE. This is part of a bigger project to create a mini DBMS with bash.

What I tried:

sed "s/\^\_\^/:/g" $file_path | cut -d: -f1 | grep -nw SAR | sed "s/\^\_\^/:/g" | cut -d: -f2 | sed -i "s/$match/$update/g"

My issue is I am unable to use sed -i to update only these specific columns found as you can't pipe into it.

The delimiter being used is : ^_^

Example of data file:

'EGP'^_^'Egypt'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'SAR'^_^'Europe'
'Europe'^_^'SAR'
'SAR'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

my $match can be SAR for example and $update would be USD

Expected change to data file

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

If there is a different approach that is better, that is also welcome as I am fairly new to bash scripting.

CodePudding user response：

Better to use awk here:

awk -v s="'SAR'" -v q="'USD'" -F'\\^_\\^' -v OFS='^_^' '$1==s {$1=q} 1' file

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

Explanation:

-v s="'SAR'" sets command line variable s='SAR'
-v q="'USD'" sets command line variable s='USD'
-F '\\^_\\^' sets input field separator as ^_^. We need to escape ^ as that is a special regex meta character and we need to double escape it because we're using it in a string that is later converted to a regexp (field separator).
-v OFS='^_^' sets output field separator to ^_^
$1 == s compares first field to 'SAR'
$1 = q sets $1 to variable 'USD'

CodePudding user response：

Using sed you could use $match at the start of the string ^, and change it with $update:

match="'SAR'"
update="'USD'"
sed "s/^$match\^_\^/$update^_^/" file

Output

'EGP'^_^'Egypt'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'USD'^_^'Europe'
'Europe'^_^'SAR'
'USD'^_^'Europe'
'MYR'^_^'Malaysia'
'MYR'^_^'Malasia'

See a bash demo.

CodePudding user response：

With your shown samples, please try following awk program.

awk -v matchvalue="'SAR'" -v updatedvalue="'USD'" '
match($0,"^"matchvalue"\\^_\\^"){
  print updatedvalue"^_^" substr($0,RSTART RLENGTH)
  next
}
1
'  Input_file

Explanation: Creating 2 variables named matchvalue and updatedvalue with 'SAR' and 'USD' values shown by OP in samples. Then in main program using match function to match if a line starts with 'SAR'^_^ then print new value in its place and then print rest of the line, if line doesn't start from 'SAR'^_^ then simply print it.

CodePudding user response：

Here's what I ended up doing:

escaped_delm=$(echo $curr_delim | sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g')
awk -F"$escaped_delm" -v a_col_update=$update_field -v pick=$p_tmp_field -v a_del="$match" -v a_ins="$insert" -v OFS="$curr_delim" '$pick==a_del {$a_col_update=a_ins} 1' $t_path > tmp && mv tmp $t_path

The biggest issue I faced is that I am passing my delimiter ^_^ as a variable to awk. My script assumes that I don't know what the variable is and that it is passed to it at the beginning of the script. This causes an issue, just like the first answer suggested the -F option for awk thinks that ^ is a regex argument.

This is where the sed command I used comes in super handy! It sanitizes the delimiter by escaping any regex meta characters. Would like to thank Ed Morton for the great answer to this question where I got the sed command from.

Hope someone else finds this useful!

My awk variables are:

curr_delim="^_^"
a_col_update=<column number to be updated>
pick=<column used to pick the record/row>
a_del=<field to be updated from picked record/row>
a_ins=<new value for the picked field>
OFS="$curr_delim"