I need to get a list of unique client computer names/IP addresses that are accessing a server, taken from the server's access logs.
The target log line looks like this:
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
In this example, the string (QWER-L1212-W6) [11.22.333.44] is one unique instance of a client computer/IP address.
So the result would be something like this:
(QWER-L1212-W6) [11.22.333.44]
(QWER-L1234-W7) [11.22.333.55]
etc...
I tried this without success:
grep --only-matching '\(.+\) \[.+\]' | sort --unique Access.log
The matching fails and the entire log line is returned.
CodePudding user response:
Note that you are using the POSIX BRE regex flavor, since you pass neither -E nor -P to change it from the default. In POSIX BRE, \(...\) defines a capturing group. There are more issues here though.
You need to use
grep -o '([^()]*) \[[^][]*]' Access.log | sort -u
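Note the location of the input file argument: it is passed to grep, not to sort. In your command, sort read Access.log directly and ignored grep's output, which is why entire log lines were returned.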
The ([^()]*) \[[^][]*] here is a POSIX BRE pattern that matches:
  (       - a literal ( char (a \( is the start of a capturing group)
  [^()]*  - zero or more chars other than ( and )
  )       - a literal ) char (a \) is the end of a capturing group)
          - a space
  \[      - a [ char
  [^][]*  - zero or more chars other than [ and ]
  ]       - a ] char.
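To make the BRE/ERE difference concrete, here is a small sketch on a made-up string (the value of s is hypothetical, not taken from a real log). It assumes a grep that supports -o, such as GNU grep.

```shell
#!/bin/sh
# Hypothetical sample text, used only to illustrate the escaping rules.
s='x (abc) [1.2.3.4] y'

# BRE (grep's default): unescaped ( and ) are literal characters.
bre=$(printf '%s\n' "$s" | grep -o '([^()]*) \[[^][]*]')

# ERE (-E): parentheses must be escaped to be literal.
ere=$(printf '%s\n' "$s" | grep -oE '\([^()]*\) \[[^][]*]')

# BRE with \( \): a capturing group, so no literal parentheses are required.
group=$(printf '%s\n' "$s" | grep -o '\(abc\)')

printf '%s\n' "$bre" "$ere" "$group"
```

The first two commands print the same match including the parentheses, while the third prints only abc, because \( and \) group rather than match in BRE.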
See the online demo:
#!/bin/bash
s='2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".'
grep -o '([^()]*) \[[^][]*]' <<< "$s" | sort -u
# => (QWER-L1212-W6) [11.22.333.44]
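The effect of the file argument's position can be reproduced with a throwaway file. This is only a sketch: the three log lines are invented, and grep is fed /dev/null in the first pipeline so it does not sit waiting on standard input.

```shell
#!/bin/sh
# Hypothetical three-line log written to a temporary file.
log=$(mktemp)
printf '%s\n' \
  'b Client "u2 (PC-2) [10.0.0.2]" opening' \
  'a Client "u1 (PC-1) [10.0.0.1]" opening' \
  'b Client "u2 (PC-2) [10.0.0.2]" opening' > "$log"

# File given to sort: sort ignores the pipe and prints whole (unique) log lines.
wrong=$(grep -o '([^()]*) \[[^][]*]' < /dev/null | sort -u "$log")

# File given to grep: sort only de-duplicates the extracted matches.
right=$(grep -o '([^()]*) \[[^][]*]' "$log" | sort -u)

printf 'file passed to sort:\n%s\n\nfile passed to grep:\n%s\n' "$wrong" "$right"
rm -f "$log"
```

The first result still contains full log lines, whatever pattern grep is given; the second contains only the two unique (name) [address] pairs.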
CodePudding user response:
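grep --only-matching '\(.+\) \[.+\]' file.log
This fails because grep defaults to BRE rather than ERE (extended regex, -E): in BRE, \( and \) delimit a capturing group instead of matching literal parentheses, and an unescaped + is a literal plus sign rather than a quantifier. So for your case the following may work:
grep -E --only-matching '\(.+\) \[.+\]' file.log
However, this regex is problematic because .+ is greedy: it matches 1+ of any character and runs as far as it can before the closing ) and closing ]. Suppose a line in your log contains more than one (...) [...] substring, like the second line here: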
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [21.22.333.33]" opening database "databasename" as "username" (QWER-L1234-W7) [11.22.333.55]
Then you will get incorrect results. The pattern '([^()]*) \[[^][]*]' also gives incorrect results on such a line, because it matches every (...) [...] pair rather than only the one inside the quoted client field.
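A short sketch of both failure modes, using an invented log line with a second (...) [...] after the quoted client field, and a greedy .+ pattern under -E:

```shell
#!/bin/sh
# Hypothetical line: the trailing "(PC-9) [10.0.0.9]" is not the client field.
line='Client "u1 (PC-1) [10.0.0.1]" opening database "db" as "u1" (PC-9) [10.0.0.9]'

# Greedy .+ : a single match running from the first "(" to the last "]".
greedy=$(printf '%s\n' "$line" | grep -oE '\(.+\) \[.+\]')

# Negated classes: two separate matches, the second one unwanted.
negated=$(printf '%s\n' "$line" | grep -o '([^()]*) \[[^][]*]')

printf 'greedy:\n%s\n\nnegated:\n%s\n' "$greedy" "$negated"
```

Neither result is the single (PC-1) [10.0.0.1] that is actually wanted from this line.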
Since you are using access.log, where the format and positions of the fields are fixed, it is safer and more efficient to use awk for this extraction, like this:
awk -F '"' '{sub(/^[^ ]* /, "", $2); print $2}' file.log
(QWER-L1212-W6) [11.22.333.44]
(QWER-L1212-W6) [21.22.333.33]
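Since the question asks for unique entries, the awk command can also drop repeats itself. The following is a sketch under assumptions: the sample lines and the temporary file are made up, and like the command above it assumes the user name inside the quotes contains no spaces.

```shell
#!/bin/sh
# Hypothetical sample standing in for file.log; the first client appears twice.
log=$(mktemp)
printf '%s\n' \
  '2020-11-17 15:34:04.208 -0500 Information 94 SRV Client "alice (PC-1) [10.0.0.1]" opening database "db" as "alice".' \
  '2020-11-17 15:35:10.001 -0500 Information 94 SRV Client "bob (PC-2) [10.0.0.2]" opening database "db" as "bob".' \
  '2020-11-17 15:36:42.730 -0500 Information 94 SRV Client "alice (PC-1) [10.0.0.1]" opening database "db" as "alice".' > "$log"

# Split on double quotes, strip the leading user name from field 2,
# and print each resulting value only the first time it is seen.
unique=$(awk -F '"' '{sub(/^[^ ]* /, "", $2); if (!seen[$2]++) print $2}' "$log")

printf '%s\n' "$unique"
rm -f "$log"
```

The seen array keeps first-appearance order; piping the plain awk output through sort -u instead would give the same set in sorted order.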
