A folder contains a README.txt and several DICOM files named emr_000x.sx (where x stands for digits). README.txt contains several lines, one of which contains the characters "xyz" together with the corresponding emr_000x.sx file name.
I would like to read the .txt file, identify which line contains "xyz", and extract the emr_000x.sx from that line only. For reference, the line in the .txt is formatted in this way:
A:emr_000x.sx, B:00001, C:number, D(characters)string_string_number_**xyz**_number_number
I think grep might be helpful, but I am not familiar enough with bash to write this myself. Does anyone know how to solve this? Many thanks!
CodePudding user response:
You can use awk to split each line on commas and match on the fourth field:
awk -F, '$4 ~ /xyz/ {sub(/^A:/, "", $1); print $1}' README.txt
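To see what the awk approach does, here is a minimal sketch run against a made-up README.txt. The two sample lines and the temporary directory are assumptions for illustration only; they just follow the format described in the question.

```shell
# Hypothetical sample data; the real README.txt will differ.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'A:emr_0001.sx, B:00001, C:12, D(abc)foo_bar_1_abc_2_3' \
  'A:emr_0002.sx, B:00002, C:34, D(abc)foo_bar_1_xyz_2_3' \
  > "$tmpdir/README.txt"

# Split on commas; when field 4 contains "xyz", strip the
# leading "A:" from field 1 and print what is left.
result=$(awk -F, '$4 ~ /xyz/ {sub(/^A:/, "", $1); print $1}' "$tmpdir/README.txt")
echo "$result"

rm -rf "$tmpdir"
```

With this sample input, only the second line has "xyz" in its fourth field, so the command should print emr_0002.sx.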
CodePudding user response:
I like sed for this sort of thing.
sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' README.txt
This says, "On lines where you see xyz, replace the whole line with the non-commas between A: and a comma, then print the line."
-n means no printing unless I say so. (p means print.)
-E just means to use extended regexes.
/xyz/{...} means "on lines where you see xyz, do the stuff between the curlies."
s/^.*A:([^,]+),.*/\1/ will substitute the matched part (which should be the whole line) with just the part between the parens, i.e. the one or more non-comma characters that follow A:.
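As a quick check of the sed version, the sketch below runs it on a made-up README.txt and keeps the result in a variable. The sample lines and the variable name fname are assumptions for illustration, not part of the original folder.

```shell
# Hypothetical sample data; the real README.txt will differ.
workdir=$(mktemp -d)
printf '%s\n' \
  'A:emr_0001.sx, B:00001, C:12, D(abc)foo_bar_1_abc_2_3' \
  'A:emr_0002.sx, B:00002, C:34, D(abc)foo_bar_1_xyz_2_3' \
  > "$workdir/README.txt"

# On lines containing "xyz", keep only the text between "A:" and
# the first comma after it, then print that line.
fname=$(sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' "$workdir/README.txt")
echo "$fname"

rm -rf "$workdir"
```

Here the output should be emr_0002.sx. Note that /xyz/ matches "xyz" anywhere on the line, not only in the D field, so this assumes the string does not show up elsewhere in the file.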
