I would like to transform the header of many csv files automatically using awk and bash scripts.
Currently, I am using the following code-block, which is working fine:
for FILE in *.csv;
do
awk 'FNR>1{print $0}' $FILE | awk 'NR == 1{print "aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,...,zzz"}1' > OUT_$FILE
done
These commands first remove the old header from $FILE, then prepend a new (very long) comma-separated header aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,...,zzz, and finally save the result to OUT_$FILE.
Currently, I copy the part aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,...,zzz manually from another csv file and paste it into the command to replace the header of $FILE. While this works, it is tedious, repetitive and time-consuming for many csv files.
Instead of copying the header manually, I am trying to extract the header from another csv file new_headers.csv and save to a new variable $NEWHEAD.
NEWHEAD=$(awk 'NR==1{print $0}' new_headers.csv)
While I can view the extracted header $NEWHEAD, I am not sure how to merge this command into the previous workflow so that it replaces the header of $FILE.
I will certainly appreciate any suggestions to resolve this problem. Thank you :)
CodePudding user response:
With GNU awk for "inplace" editing:
awk -i inplace 'NR==1{hdr=$0} {print (FNR>1 ? $0 : hdr)}' new_headers.csv *.csv
CodePudding user response:
You can read the header inside the awk script, like this:
awk '
BEGIN{
do {
h = (h) ? (h "\n" line) : line
} while ((getline line <"new_headers.csv") > 0)
}
...
'
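After the BEGIN block, h contains the new header. Note that the loop reads every line of the header file and joins them with newlines, so the file should contain only the header line.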
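As a rough sketch of how the elided part might be filled in, the loop below reads only the first line of the header file with getline, prints it, and then copies everything after the old header of each input file. The scratch directory and the sample files (data.csv and its contents) are made up for illustration, and the output naming OUT_<file> follows the question; treat it as one possible arrangement rather than the answerer's intended script.

```shell
# Hypothetical demo data in a scratch directory (not from the original post).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'aaa,bbb,ccc\n' > new_headers.csv
printf 'x,y,z\n1,2,3\n4,5,6\n' > data.csv

for file in *.csv; do
    # Skip the header file itself and outputs from an earlier run.
    case $file in new_headers.csv|OUT_*) continue ;; esac
    awk '
        BEGIN {
            # Read only the first line of the header file.
            if ((getline hdr < "new_headers.csv") <= 0) {
                print "cannot read new_headers.csv" > "/dev/stderr"
                exit 1
            }
            close("new_headers.csv")
            print hdr
        }
        FNR > 1   # copy every line after the old header
    ' "$file" > "OUT_$file"
done

cat OUT_data.csv
```

Reading a single line instead of the whole file means extra lines in new_headers.csv are simply ignored.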
CodePudding user response:
newheader=$(head -n 1 new_headers.csv)
for file in *.csv
do
{
printf '%s\n' "$newheader"
tail -n +2 "$file"
} > OUT_"$file"
done
notes:
head -n 1 outputs the first line of a file
tail -n +2 outputs all the lines but the first
{ } is to group commands, so that you redirect their output as a whole
CodePudding user response:
$ awk 'NR==FNR {if (FNR==1) header=$0; next}
{print (FNR==1?header:$0) > (FILENAME".updated")}' new_headers.csv other files...
This captures the first record of the header file and uses it to replace the first line of each of the remaining files; the updated files get the suffix ".updated".
Caveat emptor: not tested.
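With many input files, a single awk process that writes to one output file per input can run into the per-process limit on open files in some awk implementations. A possible variant, sketched here under the assumption that the header is the first line of new_headers.csv, closes each output file before starting the next. The sample files a.csv and b.csv and the scratch directory are hypothetical.

```shell
# Hypothetical demo data in a scratch directory (not from the original post).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'aaa,bbb,ccc\nignored,second,line\n' > new_headers.csv
printf 'x,y,z\n1,2,3\n' > a.csv
printf 'p,q,r\n4,5,6\n7,8,9\n' > b.csv

awk '
    NR == 1   { header = $0 }      # first line of the header file
    NR == FNR { next }             # skip the rest of the header file
    FNR == 1  {                    # a new data file starts here
        if (out != "") close(out)  # release the previous output file
        out = FILENAME ".updated"
        print header > out         # new header replaces the old first line
        next
    }
    { print > out }                # copy the remaining lines unchanged
' new_headers.csv a.csv b.csv

cat a.csv.updated b.csv.updated
```

This sketch does not handle an empty header file; in that case NR==FNR would also be true for the first data file.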
