Filter lines based on a certain string and then print only some attributes

I have a big text file with millions of log lines.

I would like to filter all the lines which satisfy the following criteria:

url should be /v2/testB
totaltime value should be greater than 500

INFO|id=1|totaltime=5000|httpmethod=POST|url=/v1/testA
INFO|id=2|totaltime=200|httpmethod=POST|url=/v2/testB
INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB
INFO|id=4|totaltime=501|httpmethod=POST|url=/v2/testB

Expected result:

id=3,totaltime=1000
id=4,totaltime=501

I have tried chaining multiple awk commands and then adding an if block. Can this be done more quickly? Thanks!

while IFS= read -r line; do
value=`echo $line|grep "url=/v2/testB" | awk -F"totaltime=" '{ print $2}'| awk -F"|" '{ print $1}'`
if (( $value > 500 )); then
    echo $line
fi
done < file.log

CodePudding user response：

You may use this awk:

awk -F '|' -v OFS=, '$NF == "url=/v2/testB" {v=$3; sub(/^totaltime=/, "", v); if (v+0 > 500) print $2, $3}' file

id=3,totaltime=1000
id=4,totaltime=501

To make it more readable:

awk -F '|' -v OFS=, '
$NF == "url=/v2/testB" {
   v = $3
   sub(/^totaltime=/, "", v)
   if (v+0 > 500)
      print $2, $3
}' file

If you have gnu-awk then it can be reduced to:

awk -F '|' -v OFS=, '$NF == "url=/v2/testB" &&
gensub(/^totaltime=/, "", "1", $3)+0 > 500 {print $2, $3}' file

v+0 is shorthand in awk to convert a string value to a number, so that the comparison with 500 is numeric rather than string-based.
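To see why the numeric conversion matters, here is a minimal sketch. The function name demo_compare is made up for this illustration. It strips the totaltime= prefix the same way as above and then compares the result with 500 both as a string and as a number.

```shell
# demo_compare is a hypothetical helper used only for this illustration.
demo_compare() {
    printf 'totaltime=1000\n' | awk '{
        v = $0
        sub(/^totaltime=/, "", v)   # v is now the string "1000"
        as_string = (v > 500)       # string comparison: "1000" sorts before "500"
        as_number = (v+0 > 500)     # numeric comparison: 1000 is greater than 500
        print "string:", as_string, "numeric:", as_number
    }'
}

demo_compare
```

The string comparison reports 0 (false) because "1" sorts before "5", while the numeric comparison reports 1 (true).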

CodePudding user response：

$ awk -F'|' -v OFS=',' '{split($3,t,/=/)} $5=="url=/v2/testB" && t[2]>500{print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
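
If the URL and the threshold need to change between runs, the same split-based idea can be wrapped in a small shell function. This is only a sketch: the function name filter_slow, its argument order, and the temporary sample file are assumptions made for illustration, not part of the answer above.

```shell
# filter_slow URL MIN FILE  (hypothetical helper, not from the answers above)
filter_slow() {
    awk -F'|' -v OFS=',' -v want="url=$1" -v min="$2" '
        $NF == want {
            split($3, t, "=")          # t[2] holds the totaltime value
            if (t[2] + 0 > min + 0)    # +0 forces a numeric comparison
                print $2, $3
        }' "$3"
}

# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
INFO|id=1|totaltime=5000|httpmethod=POST|url=/v1/testA
INFO|id=2|totaltime=200|httpmethod=POST|url=/v2/testB
INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB
INFO|id=4|totaltime=501|httpmethod=POST|url=/v2/testB
EOF

filter_slow /v2/testB 500 "$sample"
```

Passing the values with -v keeps the awk program itself fixed, so nothing has to be spliced into the quoted script.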

CodePudding user response：

With your shown samples, please try following awk program.

awk -F'\\||totaltime=' '$NF=="url=/v2/testB" && $4>500{print $2",totaltime="$4}' Input_file

Explanation of the code above:

The -F option sets the field separator to either | or the string totaltime= for every line of Input_file, so the totaltime value lands in the 4th field (the 3rd field, between | and totaltime=, is empty).
The main program checks two conditions: (a) $NF, the last field, equals url=/v2/testB, and (b) the 4th field is greater than 500.
If both hold, it prints the 2nd field followed by the string ,totaltime= and the 4th field, matching the output the OP asked for.
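
To check how this two-part separator splits a line, the following sketch prints every field of one sample line from the question. The function name show_fields is made up for this illustration.

```shell
# show_fields is a hypothetical helper used only for this illustration.
show_fields() {
    printf '%s\n' 'INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB' |
    awk -F'\\||totaltime=' '{
        # Print each field with its index so the split is visible.
        for (i = 1; i <= NF; i++)
            printf "$%d=[%s]\n", i, $i
    }'
}

show_fields
```

The line splits into six fields: the third is empty because the | separator is immediately followed by the totaltime= separator, which is why the numeric value ends up in $4.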

CodePudding user response：

You seem to be in luck:

awk -F'|' 'BEGIN{FS="|"; OFS=","}
           { url = substr($NF,index($NF,"=")+1)
             totaltime = substr($3,index($3,"=")+1)
           }
           (url == "/v2/testB") && (totaltime+0 > 500) { print $2,$3 }
          ' file

CodePudding user response：

All the awk solutions are great; if awk is an option for you, use one of them.

If you wanted to fix your Bash effort, you can do:

while IFS='|' read -r id ti; do
    [[ "${ti#*=}" -gt 500 ]] && printf "%s,%s\n" "$id" "$ti"
done < <(grep 'url=/v2/testB$' file | cut -d '|' -f 2,3)

Alternatively, you can eliminate cut and keep all five fields:

while IFS='|' read -r c1 c2 c3 c4 c5; do
    [[ "${c3#*=}" -gt 500 ]] && printf "%s,%s\n" "$c2" "$c3"
done < <(grep 'url=/v2/testB$' file)

Either prints:

id=3,totaltime=1000
id=4,totaltime=501