I have a bunch of text files, all with the same structure, and I need to extract a specific piece of a specific line.
I can easily extract the line with awk:
awk 'NR==23' blast_out.txt
CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787 0.0
But I don't want the whole line, only the part between the first space on the left (after CP046310.1) and the double space on the right (before 787). The final output should be:
Lactobacillus jensenii strain FDAARGOS_749 chromosome,...
I tried several combinations of awk and grep but cannot find the right one to extract this specific pattern.
Answer:
1st solution: With your shown samples, please try the following awk code. In short: it empties the 1st, second-to-last and last fields, globally strips the leading and trailing spaces this leaves behind, and then prints the line (the final 1 is an always-true condition whose default action is to print).
awk '{$1=$NF=$(NF-1)="";gsub(/^ +| +$/,"")} 1' Input_file
Or, to run it only on the 23rd line, change it to:
awk 'FNR==23{$1=$NF=$(NF-1)="";gsub(/^ +| +$/,"");print;exit}' Input_file
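For readers less familiar with awk, here is the first solution spelled out with comments. It is a sketch that feeds the question's sample line through printf instead of reading Input_file; the two spaces before 787 are assumed from the question's description, and the variable names line and result are illustrative only.

```shell
#!/bin/sh
# Sample line from the question (the double space before 787 is assumed).
line='CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787 0.0'

result=$(printf '%s\n' "$line" | awk '
{
    # Empty the accession ($1) and the two trailing numbers ($(NF-1), $NF).
    $1 = $NF = $(NF-1) = ""
    # Changing a field rebuilds $0 joined by OFS (one space), which leaves
    # one leading space and two trailing spaces; strip them all.
    gsub(/^ +| +$/, "")
}
1   # always-true pattern with no action: print the modified record
')

printf '%s\n' "$result"
```

The `+` in the gsub regex matters: with a single-space pattern, one of the two trailing spaces would survive.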
2nd solution: loop through the fields and print only the ones you need, from the 2nd up to the third-to-last.
awk '{for(i=2;i<(NF-1);i++){printf("%s%s",$i,i==(NF-2)?ORS:OFS)}}' Input_file
Or, on the 23rd line only:
awk 'FNR==23{for(i=2;i<(NF-1);i++){printf("%s%s",$i,i==(NF-2)?ORS:OFS)};exit}' Input_file
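One detail worth checking, since the question mentions a bunch of files: exit makes awk stop reading input altogether, so the FNR==23 variants only print from the first file. A minimal sketch, assuming hypothetical files a.txt and b.txt created in a temporary directory, that drops exit so that FNR == 23 (FNR restarts at 1 for each input file) matches line 23 of every file:

```shell
#!/bin/sh
# Hypothetical sample files, created only for this demonstration.
dir=$(mktemp -d)
for name in a b; do
    {
        # 22 filler lines, the target line as line 23, then one more line.
        i=1
        while [ "$i" -le 22 ]; do echo "filler $i"; i=$((i + 1)); done
        echo "ACC_$name Description of sample $name  787 0.0"
        echo "trailing line"
    } > "$dir/$name.txt"
done

# No exit here, so awk moves on to the next file after each match.
result=$(awk 'FNR == 23 {
    # Print fields 2 .. NF-2 separated by OFS, ending with ORS.
    for (i = 2; i < NF - 1; i++)
        printf("%s%s", $i, (i == NF - 2) ? ORS : OFS)
}' "$dir/a.txt" "$dir/b.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

The trade-off is that awk now reads every file to the end instead of stopping at line 23.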
Answer:
Using sed, you can do this:
sed -En '23s/^[^ ]+ |  .*$//gp' file
Lactobacillus jensenii strain FDAARGOS_749 chromosome,...
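The sed regex has two alternatives that are both replaced with nothing. Below is a sketch that runs the same substitution on the sample line, with the pieces explained in shell comments; it drops the 23 address because only one line is fed in, and it assumes the two spaces before 787 as described in the question.

```shell
#!/bin/sh
line='CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787 0.0'

# -E: extended regular expressions; -n: no automatic printing.
# Alternative 1, ^[^ ]+ plus one space: the accession and the space after it.
# Alternative 2, two spaces then .*$: the first double space and the rest.
# g applies the substitution to both matches; p prints the edited line.
result=$(printf '%s\n' "$line" | sed -En 's/^[^ ]+ |  .*$//gp')

printf '%s\n' "$result"
```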
Or using awk:
awk 'NR == 23 {gsub(/^[^ ]+ |  .*$/, ""); print}' file
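Both regex versions rely on the double space the question describes: the second alternative only matches if the line really contains two consecutive spaces. A quick sketch to see the difference, using the sample line plus a made-up second line (hypothetical accession XX000001.1) that uses single spaces only:

```shell
#!/bin/sh
# Line 1: the question's sample, with the double space before 787 (assumed).
# Line 2: hypothetical input with single spaces throughout.
result=$(printf '%s\n' \
    'CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787 0.0' \
    'XX000001.1 Example organism chromosome 787 0.0' |
    awk '{ gsub(/^[^ ]+ |  .*$/, ""); print }')

printf '%s\n' "$result"
```

On the second line only the accession is removed and "787 0.0" stays, so if some files separate the score with a single space, a field-based solution is the safer choice.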
Answer:
If I understand correctly, you want to extract the fields from the second (included) to the second-to-last (excluded). I would go with:
awk 'FNR==23 {for (i = 2; i < NF - 2; i++) { printf("%s ", $i) }; printf("%s\n", $i); exit }' file_path
The loop prints fields 2 through NF-3, each followed by a space; when it ends, i equals NF-2, so the final printf prints that last wanted field followed by a newline.
An example with the line you posted:
$ echo "CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,... 787 0.0" | awk '{for (i = 2; i < NF - 2; i++) { printf("%s ", $i) }; printf("%s\n", $i); exit }'
Lactobacillus jensenii strain FDAARGOS_749 chromosome,...
I assume that chromosome,... does not contain spaces and that only single spaces separate the fields you want to extract. If the second condition does not hold, the extra spaces are collapsed into single spaces in the output.
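If the extra spaces matter, one alternative (a different technique from the field loop in this answer) is to cut the first field and the last two fields with anchored regular expressions instead of splitting the line into fields, which leaves the description's internal spacing untouched. A sketch, assuming fields are separated by spaces only (no tabs) and that every line ends with exactly two numeric columns; the second input line is made up for illustration:

```shell
#!/bin/sh
# Line 1: the question's sample. Line 2: hypothetical, with runs of spaces.
result=$(printf '%s\n' \
    'CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787 0.0' \
    'ID0001.1 some  spaced   description 787 0.0' |
    awk '{
        sub(/^[^ ]+ +/, "")            # drop the first field and the spaces after it
        sub(/ +[^ ]+ +[^ ]+ *$/, "")   # drop the last two fields and trailing spaces
        print
    }')

printf '%s\n' "$result"
```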
