I have a folder containing many logs. Each log has a similar format.
This is log 1:
Finding intermodel H-bonds
Finding intramodel H-bonds
Constraints relaxed by 0.55 angstroms and 40 degrees
Models used:
1.1 SarsCov2_Y6A_nsp5holo_rep1.pdb
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? SER 144 OG /d UNL 1 S /? SER 144 HG 3.940 3.529
/? HIS 163 NE2 /d UNL 1 S no hydrogen 3.821 N/A
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.178 2.453
/d UNL 1 N /? THR 25 OG1 /d UNL 1 HN 2.755 2.270
/d UNL 1 N /? CYS 44 O /d UNL 1 HN 3.277 2.501
/d UNL 1 N /? ARG 188 O /d UNL 1 HN 3.056 2.055
This is log 2:
Finding intermodel H-bonds
Finding intramodel H-bonds
Constraints relaxed by 0.55 angstroms and 40 degrees
Models used:
1.1 SarsCov2_06I_nsp5holo_rep1.pdb
4 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? THR 26 N /d UNL 1 O /? THR 26 H 3.579 2.754
/? ASN 142 ND2 /d UNL 1 O /? ASN 142 1HD2 3.250 2.324
/d UNL 1 N /? THR 26 O /d UNL 1 H 3.458 2.630
/d UNL 1 N /? HIS 163 NE2 /d UNL 1 HN 3.222 2.456
This is log 3:
Finding intermodel H-bonds
Finding intramodel H-bonds
Constraints relaxed by 0.55 angstroms and 40 degrees
Models used:
1.1 SarsCov2_X7V_nsp5holo_rep1.pdb
2 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.185 2.258
/d UNL 1 N /? LEU 141 O /d UNL 1 HN 2.868 1.958
I need to fuse all the logs together, taking only the lines from the "# H-bonds" line onward and adding the name of the original file on that same line.
This is the fused log produced by combining logs 1-3:
log 1: 6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? SER 144 OG /d UNL 1 S /? SER 144 HG 3.940 3.529
/? HIS 163 NE2 /d UNL 1 S no hydrogen 3.821 N/A
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.178 2.453
/d UNL 1 N /? THR 25 OG1 /d UNL 1 HN 2.755 2.270
/d UNL 1 N /? CYS 44 O /d UNL 1 HN 3.277 2.501
/d UNL 1 N /? ARG 188 O /d UNL 1 HN 3.056 2.055
log 2: 4 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? THR 26 N /d UNL 1 O /? THR 26 H 3.579 2.754
/? ASN 142 ND2 /d UNL 1 O /? ASN 142 1HD2 3.250 2.324
/d UNL 1 N /? THR 26 O /d UNL 1 H 3.458 2.630
/d UNL 1 N /? HIS 163 NE2 /d UNL 1 HN 3.222 2.456
log 3: 2 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.185 2.258
/d UNL 1 N /? LEU 141 O /d UNL 1 HN 2.868 1.958
I've tried a trivial solution with cat, but it does not work correctly: each log has a different number of lines, so tail with a fixed line count cannot pick out the right section:
for log in ${results}/*_rep"${i}".log; do
log_name=$(basename "$log" .log)
echo "$log_name" >> ${results}/combined.log
cat $log | tail -n 10 >> ${results}/combined.log
done
Can I use cat with some expression to recognize the lines, or do I have to use sed before cat to delete the unused lines from each log?
CodePudding user response:
This awk does the job:
awk '
FNR==1 {p=0}
/^[0-9]+[[:space:]]+H-bonds$/ && FNR!=NR {printf "\n"}
/^[0-9]+[[:space:]]+H-bonds$/ {printf "log %d: ", ++c; p=1}
p==1' *.log
For each given file:
- Stop printing when each new file starts.
- If a line in the file matches a pattern for `4 H-bonds` etc, print an empty line break, if it's not the first file. Then print the log number, and set the flag `p` to begin printing that log file.
- Note that instead of testing for the same regex twice, you could omit the first one, and put `if (FNR!=NR) {printf "\n"}` inside the block of the second one. That's mainly about readability.
- I'm unsure which pattern expression you need for the file names, `*.log` is an example. Maybe `"${results}"/*_rep*.log`?
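The single-block variant from the note above might look like the sketch below. It is wrapped in a shell function so the file list can be passed in; the name `fuse_hbond_logs` is hypothetical, and the sketch assumes the count line contains nothing but the number and `H-bonds` (no trailing whitespace).

```shell
# Hypothetical helper: print the H-bond section of each given log,
# labelled "log 1:", "log 2:", ... in argument order.
fuse_hbond_logs() {
    awk '
        FNR == 1 { p = 0 }                  # new file: stop printing
        /^[0-9]+[[:space:]]+H-bonds$/ {
            if (FNR != NR) printf "\n"      # blank line, except within the first file
            printf "log %d: ", ++c          # label goes on the same line as the count
            p = 1                           # print from this line to the end of the file
        }
        p == 1                              # print the current line while the flag is set
    ' "$@"
}
```

It could then be called as `fuse_hbond_logs "${results}"/*_rep*.log > "${results}/combined.log"`, assuming that glob matches the logs from the question.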
CodePudding user response:
Using sed
$ for file in log{1..3}; do echo "${file##*/}: $(sed -n '/[0-9] H-bonds/,$p' "$file")"; echo ""; done
log1: 6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? SER 144 OG /d UNL 1 S /? SER 144 HG 3.940 3.529
/? HIS 163 NE2 /d UNL 1 S no hydrogen 3.821 N/A
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.178 2.453
/d UNL 1 N /? THR 25 OG1 /d UNL 1 HN 2.755 2.270
/d UNL 1 N /? CYS 44 O /d UNL 1 HN 3.277 2.501
/d UNL 1 N /? ARG 188 O /d UNL 1 HN 3.056 2.055
log2: 4 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? THR 26 N /d UNL 1 O /? THR 26 H 3.579 2.754
/? ASN 142 ND2 /d UNL 1 O /? ASN 142 1HD2 3.250 2.324
/d UNL 1 N /? THR 26 O /d UNL 1 H 3.458 2.630
/d UNL 1 N /? HIS 163 NE2 /d UNL 1 HN 3.222 2.456
log3: 2 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.185 2.258
/d UNL 1 N /? LEU 141 O /d UNL 1 HN 2.868 1.958
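Applied to the loop from the question, the same sed range can stand in for `tail -n 10`. The following is only a sketch: the function name `combine_logs` is hypothetical, the `*_rep*.log` glob and the `combined.log` output name are taken from the question, and the pattern is anchored and widened to `[0-9][0-9]*` on the assumption that counts of 10 or more can occur.

```shell
# Hypothetical wrapper around the loop from the question.
# $1: directory holding the logs; the result is written to $1/combined.log
combine_logs() {
    : > "$1/combined.log"                     # start from an empty output file
    for log in "$1"/*_rep*.log; do
        [ -e "$log" ] || continue             # the glob matched nothing
        log_name=$(basename "$log" .log)
        # Print from the "<N> H-bonds" line to the end of the file,
        # with the log name prefixed on that first line.
        printf '%s: %s\n\n' "$log_name" \
            "$(sed -n '/^[0-9][0-9]* H-bonds/,$p' "$log")" >> "$1/combined.log"
    done
}
```

Because the section boundary is found by pattern rather than by line count, logs with different numbers of H-bond lines are handled the same way.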
CodePudding user response:
If I'm not misunderstanding the question, a simple grep seems to be what you are after:
grep -EH '^([0-9]+\s+)?H-bonds($| \()|^/' log*
Or if you need that exact format:
for log in log*; do
sed -n "s/^[0-9]\+\s\+H-bonds$/$log: &/; /^$log: /,\${\$s/\$/\n/;p};" "$log"
done | sed '$d'
But I'm gonna guess the extra row break isn't necessary, and it then becomes simply:
for log in log*; do
sed -n "s/^[0-9]\+\s\+H-bonds$/$log: &/; /^$log: /,\$p" "$log"
done
I'll happily return to edit this reply if extra explanation is wanted.
CodePudding user response:
$ awk 'FNR==1{f=0} /^[0-9]+ H-bonds/{$0=sep FILENAME": " $0; sep=ORS; f=1} f' log?
log1: 6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? SER 144 OG /d UNL 1 S /? SER 144 HG 3.940 3.529
/? HIS 163 NE2 /d UNL 1 S no hydrogen 3.821 N/A
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.178 2.453
/d UNL 1 N /? THR 25 OG1 /d UNL 1 HN 2.755 2.270
/d UNL 1 N /? CYS 44 O /d UNL 1 HN 3.277 2.501
/d UNL 1 N /? ARG 188 O /d UNL 1 HN 3.056 2.055
log2: 4 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? THR 26 N /d UNL 1 O /? THR 26 H 3.579 2.754
/? ASN 142 ND2 /d UNL 1 O /? ASN 142 1HD2 3.250 2.324
/d UNL 1 N /? THR 26 O /d UNL 1 H 3.458 2.630
/d UNL 1 N /? HIS 163 NE2 /d UNL 1 HN 3.222 2.456
log3: 2 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/? GLN 189 NE2 /d UNL 1 O /? GLN 189 1HE2 3.185 2.258
/d UNL 1 N /? LEU 141 O /d UNL 1 HN 2.868 1.958
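When the logs are passed with a directory prefix, as with `${results}/...` in the question, `FILENAME` holds the whole path. A possible variation, sketched here under the assumption that only the base name without `.log` is wanted as the label, strips both before printing; the function name `fuse_with_names` is hypothetical.

```shell
# Hypothetical helper: like the one-liner above, but labels each section
# with the file's base name instead of its full path.
fuse_with_names() {
    awk '
        FNR == 1 { f = 0 }               # new file: stop printing
        /^[0-9]+ H-bonds/ {
            name = FILENAME
            sub(/.*\//, "", name)        # drop the directory part
            sub(/\.log$/, "", name)      # drop the .log extension
            $0 = sep name ": " $0        # prefix the label (and a blank line after the first section)
            sep = ORS
            f = 1                        # print from here to the end of the file
        }
        f
    ' "$@"
}
```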
