I'm trying to find out (with grep) which of the patterns in a file don't appear in a log file.
I have a file input.txt which contains:
00123
00124
00125
00126
and a log file 20210716.log:
00123
a
b
c
d
00125
00126
xy
z
...
(tons of text)
...
00127
When I run grep -f input.txt 20210716.log, the output is:
00123
00125
00126
How can I output the patterns from input.txt that don't appear in the log file? I would like to get:
00124
CodePudding user response:
You may try this grep:
grep -vFf file.log input.txt
00124
Or else you can use awk like this:
awk 'NR == FNR {seen[$1]; next} !($0 in seen)' file.log input.txt
00124
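One caveat with the grep variant: -F matches substrings, so a log line that happens to be contained in an input line would hide that input line. A minimal sketch, assuming each entry should equal a whole log line, adds -x for whole-line comparison. The scratch directory, the trimmed log, and the extra log line 0012 are hypothetical and only there to show the difference:

```shell
# Recreate the sample data in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 00126 > "$tmp/input.txt"
# The line "0012" is a made-up addition that is a substring of every input line.
printf '%s\n' 00123 a b 0012 00125 00126 xy 00127 > "$tmp/20210716.log"

# Without -x, "0012" matches every input line, so nothing is reported.
substring_result=$(grep -vFf "$tmp/20210716.log" "$tmp/input.txt")

# With -x, a pattern must equal the whole line, so only the absent entry remains.
missing=$(grep -vxFf "$tmp/20210716.log" "$tmp/input.txt")
printf '%s\n' "$missing"

rm -rf "$tmp"
```

If the log lines carry more text than just the ID, -x is too strict and the word-based or awk approaches are a better fit.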
CodePudding user response:
It depends a bit on what you really want. You talk about patterns, and matching patterns is tough. For example, if your input file contains words that should be matched, you can use the following:
$ grep -woFf input.txt file.log | grep -vwFf - input.txt
This reads input.txt as a list of patterns to search for (-f), treating them as fixed strings rather than regular expressions (-F). It matches only full words (-w) and outputs only what is matched (-o). That output is piped into a second grep, which does an inverse (-v) match of all found words as fixed strings (-vwFf -) against input.txt. The second grep has no -o, because -v selects non-matching lines, so there is no matched text to print.
The problem here is that if input.txt contains actual regular expressions, the reverse grep does not work: you cannot search for the matched text foo and expect it to match the regex fo*, which could appear in input.txt.
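To make the two-step word match concrete, here is a small sketch on made-up data. The log contents are hypothetical: 00124 only occurs inside the longer token x00124y, which -w does not count as a word match, so it is reported as missing along with 00126.

```shell
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 00126 > "$tmp/input.txt"
# Hypothetical log: 00124 only appears embedded in a longer token.
printf '%s\n' 'id=00123 ok' 'x00124y' '00125' > "$tmp/file.log"

# Step 1: the words from input.txt that occur in the log as whole words.
found=$(grep -woFf "$tmp/input.txt" "$tmp/file.log")

# Step 2: the entries of input.txt that are not among those words.
# -o is left out here: with -v there is no matched text to print.
missing=$(printf '%s\n' "$found" | grep -vwFf - "$tmp/input.txt")
printf '%s\n' "$missing"

rm -rf "$tmp"
```

Here found holds 00123 and 00125, so the second grep prints 00124 and 00126.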
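A more robust approach, which also works when input.txt contains regular expressions, is to use awk:
awk '(NR==FNR){a[$1];next}            # first file: store each pattern
{for(r in a) if ($0 ~ r) a[r]=1}      # log file: flag every pattern matching this line
END{for(r in a) if (!a[r]) print r}   # print the patterns that never matched
' input.txt file.log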
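As an illustration of the regex case, the sketch below runs the same idea on a hypothetical pattern list (patterns.txt) that contains real regular expressions. The file names and contents are assumptions made up for the example, and the array is called seen here purely for readability:

```shell
tmp=$(mktemp -d)
# Hypothetical pattern list containing real regular expressions.
printf '%s\n' '^00123$' '^0012[49]$' 'fo+d' > "$tmp/patterns.txt"
printf '%s\n' 00123 food 00125 > "$tmp/file.log"

unmatched=$(awk '
  NR == FNR { seen[$1] = 0; next }                 # first file: register each regex
  { for (r in seen) if ($0 ~ r) seen[r] = 1 }      # log line: flag every regex it matches
  END { for (r in seen) if (!seen[r]) print r }    # report regexes that never matched
' "$tmp/patterns.txt" "$tmp/file.log")
printf '%s\n' "$unmatched"

rm -rf "$tmp"
```

Only ^0012[49]$ is printed, since 00123 and food satisfy the other two patterns. Note that every log line is tested against every pattern, so this can be slow for large pattern lists, and the order in which for (r in seen) visits the patterns is unspecified.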
CodePudding user response:
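You could also use join for this. The -v1 option prints only the lines of the first file (input.txt) that have no match in the second file.
join requires both inputs to be sorted on the join field, hence the sort calls: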
join -v1 <(sort input.txt) <(sort 20210716.log)
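The <( ... ) process substitution is a bash feature. A rough equivalent for a plain POSIX shell, assuming temporary files are acceptable, could look like this (the scratch directory and the trimmed log are illustrative only):

```shell
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 00126 > "$tmp/input.txt"
printf '%s\n' 00123 a b 00125 00126 xy 00127 > "$tmp/20210716.log"

# Sort both files under the same locale that join will use.
LC_ALL=C sort "$tmp/input.txt" > "$tmp/input.sorted"
LC_ALL=C sort "$tmp/20210716.log" > "$tmp/log.sorted"

# -v 1 prints the lines of the first file whose join field has no partner in the second.
missing=$(LC_ALL=C join -v 1 "$tmp/input.sorted" "$tmp/log.sorted")
printf '%s\n' "$missing"

rm -rf "$tmp"
```

join compares only the first whitespace-separated field of each line by default, so this suits logs where the ID is the first field of the line.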
