Extract text between 2 similar or different strings separately in shell script

Time:01-12

I want to extract the text between each ### heading separately so that I can compare it with a different file. Specifically, I need all the CVE numbers for each docker image, to compare them against a previous report. The file looks like the snippet below; the real file has more than 100 such lines. This needs to be done in a shell script. Kindly help.

### Vulnerabilities found in docker image alarm-integrator:22.0.0-150
| CVE  | X-ray Severity | Anchore Severity | Trivy Severity | TR   |
| :--- | :------------: | :--------------: | :------------: | :--- |
|[CVE-2020-29361](#221fbde4e2e4f3dd920622768262ee64c52d1e1384da790c4ba997ce4383925e)|||Important|
|[CVE-2021-35515](#898e82a9a616cf44385ca288fc73518c0a6a20c5e0aae74ed8cf4db9e36f25ce)|||High|

### Vulnerabilities found in docker image br-agent:22.0.0-154
| CVE  | X-ray Severity | Anchore Severity | Trivy Severity | TR   |
| :--- | :------------: | :--------------: | :------------: | :--- |
|[CVE-2020-29361](#221fbde4e2e4f3dd920622768262ee64c52d1e1384da790c4ba997ce4383925e)|||Important|
|[CVE-2021-23214](#75eaa96ec256afa7bc6bc3445bab2e7c5a5750678b7cda792e3c690667eacd98)|||Important|

I've tried something like grep -oP '(?<=\"##\").*?(?=\"##\")', but it doesn't work.

Expected Output:

For alarm-integrator
CVE-2020-29361
CVE-2021-35515

For br-agent
CVE-2020-29361
CVE-2021-23214

CodePudding user response:

With the samples you have shown, please try the following awk code.

awk '
/^##/ && match($0,/docker image[[:space:]]+[^:]*/){
  split(substr($0,RSTART,RLENGTH),arr1)
  print "For "arr1[3]
  next
}
match($0,/^\|\[[^]]*/){
  print substr($0,RSTART+2,RLENGTH-2)
}
'  Input_file

Explanation: the same awk code with a detailed comment on each line.

awk '                                   ##Start of the awk program.
/^##/ && match($0,/docker image[[:space:]]+[^:]*/){  ##If the line starts with ## and match() finds "docker image", whitespace, and the text up to the first colon.
  split(substr($0,RSTART,RLENGTH),arr1) ##Split the matched text on whitespace (the default separator) into the array arr1.
  print "For "arr1[3]                   ##Print "For " followed by the 3rd element, which is the image name.
  next                                  ##Skip the remaining rules for this line.
}
match($0,/^\|\[[^]]*/){                 ##Match from the leading |[ up to, but not including, the first ].
  print substr($0,RSTART+2,RLENGTH-2)   ##Print the match without its leading |[, which leaves the CVE number.
}
'  Input_file                           ##Name of the input file.
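
Since the stated goal is a comparison with a previous report, a flat listing with one "image CVE" pair per line may be easier to sort and diff than the grouped output. The following is a minimal sketch of that variation, not part of the original answer. The function name list_cves is hypothetical, and the header regex uses a plain space instead of [[:space:]] on the assumption that the report separates words with single spaces.

```shell
#!/bin/sh
# list_cves is a hypothetical helper name used only for illustration.
# It prints one "image CVE" pair per line for the report file given as $1.
list_cves() {
  awk '
    # Heading line: remember the image name, i.e. the word after
    # "docker image" and before the colon that starts the version tag.
    /^##/ && match($0, /docker image +[^:]*/) {
      split(substr($0, RSTART, RLENGTH), parts)
      image = parts[3]
      next
    }
    # Table row starting with "|[": print the current image together
    # with the text between "[" and the first "]".
    match($0, /^\|\[[^]]*/) {
      print image, substr($0, RSTART + 2, RLENGTH - 2)
    }
  ' "$1"
}
```

It could be called as list_cves Input_file | sort > current.txt, where current.txt is just an example output name.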

CodePudding user response:

With awk you can do:

awk -v FS=' |[[]|[]]' '/^[#]+ /{sub(/:.*$/,"");print "For " $NF} /^\|\[/{print $2} /^$/ {print ""}' file

Output:

For alarm-integrator
CVE-2020-29361
CVE-2021-35515

For br-agent
CVE-2020-29361
CVE-2021-23214
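  • We set the field separator FS to ' |[[]|[]]': a space, a [ character, or a ] character.
  • The first pattern-action pair handles the ### heading lines: it strips everything from the colon onward and prints For followed by the last field, giving For alarm-integrator and For br-agent.
  • The second pattern-action pair prints the CVE number, which is the second field of every table row that starts with |[.
  • The last pattern-action pair reproduces each blank line, so the groups stay separated.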
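
To get from this extraction to the comparison with the previous report that the question asks for, one option is to flatten both reports into sorted "image CVE" pairs and let comm show what is new. The following is only a sketch under assumptions: cve_pairs and new_cves are hypothetical function names, both reports are assumed to have the layout shown in the question, and the version tag after the colon is deliberately dropped so that a version bump alone does not count as a change.

```shell
#!/bin/sh
# cve_pairs and new_cves are hypothetical helper names used for illustration.

# Print sorted, de-duplicated "image CVE" pairs from the report file $1.
cve_pairs() {
  awk -v FS=' |[[]|[]]' '
    # Heading line: drop ":version", then keep the last field as the image.
    /^[#]+ / { sub(/:.*$/, ""); image = $NF; next }
    # Table row starting with "|[": the CVE number is the second field.
    /^\|\[/  { print image, $2 }
  ' "$1" | LC_ALL=C sort -u
}

# Print the pairs found in the current report ($2) but not in the previous one ($1).
new_cves() {
  prev=$(mktemp) && cur=$(mktemp) || return 1
  cve_pairs "$1" > "$prev"
  cve_pairs "$2" > "$cur"
  # comm needs sorted input; -13 hides lines unique to $prev and lines common to both.
  LC_ALL=C comm -13 "$prev" "$cur"
  rm -f "$prev" "$cur"
}
```

A call such as new_cves previous_report current_report (both file names are placeholders) would then list only the newly reported pairs; using comm -23 instead would list the pairs that disappeared.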