I search for log files containing errors using egrep, and it outputs a list of file paths like the ones below. What I want to do is transform those strings and present them in a different format.
/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0
/abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0
The output should look like:
ABP99507,UNET
ABP3506,OXF
I tried awk and sed and couldn't figure out a way to do this. I want to be able to make it dynamic and do it via regular expressions.
What I have tried so far is:
egrep -li "^error" /abcd/efgh/ijkl/logs/*202207* | awk '/unet|cirrus|oxf|csp|cmcd|cmcr|nice/ {print}'
egrep -li "^error" /abcd/efgh/ijkl/logs/*202207* | sed -n "s/.*\(cirrus|unet|cmcr|csp|cmcd|oxf|nice\)\(abp[0-9]*[A-ZA-Za-za-z]*\).*/\1,\2/p"
The sed command doesn't work: the "|" operator is not treated as alternation, presumably because I am not using the GNU versions of these tools. Even escaping it doesn't help. Also I can't seem to make use of capture groups.
CodePudding user response:
1st solution: The simplest option is to use awk's field separator option. With your shown samples, please try the following awk code.
awk -F'/|\\.|_' '{print toupper($8","$7)}' Input_file
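To see how the field splitting plays out, here is a minimal sketch that feeds the two sample paths through the same idea. The function name format_by_fields is hypothetical, and the bracket expression [/._] is swapped in as an equivalent way to write the separator without alternation. It assumes every path has the same directory depth as the samples.

```shell
# Hypothetical helper: reads log paths on stdin, prints ID,SOURCE in upper case.
# Assumes paths shaped like /abcd/efgh/ijkl/logs/fac_<source>_<id>.log.<suffix>
format_by_fields() {
  # Split each line on "/", "." or "_"; with the sample depth,
  # field 7 is the source (unet) and field 8 is the id (abp99507).
  awk -F'[/._]' '{ print toupper($8 "," $7) }'
}

printf '%s\n' \
  /abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0 \
  /abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0 |
  format_by_fields
```

The fixed field numbers only hold for this directory depth. If the depth can vary but the suffix after .log always has three dot-separated parts, counting from the end with $(NF-4) and $(NF-5) should select the same two fields.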
2nd solution: If you want to use a regular expression in awk, try the following. It is written and tested in GNU awk, because the three-argument form of match() that fills an array is a gawk extension.
awk 'match($0,/logs\/[^_]*_([^_]*)_([^.]*)\.log/,arr){print toupper(arr[2]","arr[1])}' Input_file
3rd solution: With GNU sed, enable ERE with the -E option and try the following code. The \U uppercase escape used in the replacement is a GNU extension.
sed -E 's/.*logs\/[^_]*_([^_]*)_([^.]*)\.log\..*/\U\2,\U\1/' Input_file
4th solution: Adding a non-GNU awk solution using the match function together with RSTART and RLENGTH.
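awk '
# The two-argument match() sets RSTART and RLENGTH when the regex matches.
match($0,/logs\/[^_]*_([^_]*)_([^.]*)\.log/){
  # Skip the 5 characters of "logs/" and keep the rest of the match.
  val=substr($0,RSTART+5,RLENGTH-5)
  sub(/\.log/,"",val)        # drop the trailing ".log"
  split(val,arr,"_")         # arr[1]="fac", arr[2]=source, arr[3]=id
  print toupper(arr[3]","arr[2])
}
' Input_file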
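Since the question mentions a non-GNU environment, a sed variant that avoids both -E and \U may also be worth trying. This is only a sketch: it rewrites the 3rd solution's pattern as a basic regular expression and hands the uppercasing to tr. The function name format_portable is hypothetical, and the pattern assumes the file sits in a directory named logs, as in the samples.

```shell
# Hypothetical helper: reads log paths on stdin, prints ID,SOURCE in upper case.
format_portable() {
  # BRE groups use \( \); -n plus the p flag prints only lines that matched.
  sed -n 's/.*logs\/[^_]*_\([^_]*\)_\([^.]*\)\.log\..*/\2,\1/p' |
    tr '[:lower:]' '[:upper:]'   # uppercase without relying on GNU \U
}

printf '%s\n' \
  /abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0 \
  /abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0 |
  format_portable
```

Because no alternation is involved, the list of source names does not need to be maintained; any fac_<source>_<id>.log. name under logs/ is picked up.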
CodePudding user response:
Also I can't seem to make use of capture groups.
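You did not escape |, so sed treats it as a literal | character. In a basic regular expression you need to write \| for it to mean alternation, just as \( and \) delimit a group while plain ( and ) are literal. GNU sed supports \| in basic regular expressions; other sed implementations may not. After making that change and repairing a few minor issues I got it working. Let the content of file.txt be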
/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0
/abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0
then
sed -e 's/.*\(cirrus\|unet\|cmcr\|csp\|cmcd\|oxf\|nice\)_\(abp[0-9]*[A-ZA-Za-za-z]*\).*/\2,\1/' -e 's/[a-z]/\U&/g' file.txt
gives output
ABP99507,UNET
ABP3506,OXF
Explanation: I introduced the following changes: escaped |, added _ between the groups, changed the order in the replacement (the 2nd group comes first), and dropped -n together with the p flag, since p without -n printed each line twice. After that I added a second action that uppercases the result with \U, the GNU sed way of doing so. As there are now two actions, I use -e to register each of them.
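If escaping | is not accepted by the sed at hand, one alternative is to switch to extended regular expressions, where | and ( ) need no backslashes. The sketch below assumes a sed that understands -E (GNU and BSD sed do; an older system sed might not), keeps the question's list of source names, and uses tr for the uppercasing. The function name format_known_sources is hypothetical.

```shell
# Hypothetical helper: reads log paths on stdin, prints ID,SOURCE in upper case
# for the known source names only.
format_known_sources() {
  # In ERE, | is alternation and ( ) are groups without backslashes.
  # -n plus the p flag suppresses lines whose source is not in the list.
  sed -E -n 's/.*_(cirrus|unet|cmcr|csp|cmcd|oxf|nice)_(abp[0-9]*[A-Za-z]*)\..*/\2,\1/p' |
    tr '[:lower:]' '[:upper:]'
}

printf '%s\n' \
  /abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0 \
  /abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0 |
  format_known_sources
```

Keeping -n with p here means a path whose source is not in the list is dropped rather than passed through unchanged.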
(tested in GNU sed 4.2.2)
