I have 4 differently named log files, all with .txt extensions. I need to write a bash script that extracts JavaScript file names from any of these log files, regardless of their names. The output of the script should not include the path, and it has to be unique and sorted.
After some research I came up with this:
cat logfile1.txt | grep '[^.(]*\.js' | awk -F " " '{print $7}' | sort | uniq -c| sort -nr
This code does only half the job:
- PRO: It does extract any JS, sorts it, and gives unique results.
- CON: I need this in a file.sh, not on the command line as it is now. Also, I'm getting the entire path to the JS file. I only need the file name, for example:
jquery.js
I tried adding grep -v "*/name-of-path-before-JS" to stop the result from giving me the full path, but that isn't working.
I found someone who did something similar in Python (source):
filenames = set()
with open(r"/home/filelog.txt") as f:
    for line in f:
        end = line.rfind(".js") + 3  # 3 = len(".js")
        start = line.rfind("/", 0, end) + 1  # 1 = len("/")
        filename = line[start:end]
        if filename.endswith(".js"):
            filenames.add(filename)
for filename in sorted(filenames, key=str.lower):
    print(filename)
Although it is missing the sort and uniq options when giving the output, it does print only filename.js and not the whole path, unlike the command line I made. Also, I want to pass the path to the log.txt file as an argument when running the script, rather than hard-coding it as in the Python script above.
Example:
$ ./LogReaderScript.sh File-log.txt
CodePudding user response:
Would you please try the shell script LogReaderScript.sh:
#!/bin/bash
if [[ $# -eq 0 ]]; then                 # if no filenames are given
    echo "usage: $0 logfile .."         # then show the usage and abort
    exit 1
fi
grep -hoE "[^/]+\.js" "$@" | sort | uniq -c | sort -nr
After making the file executable with chmod +x LogReaderScript.sh,
you can invoke:
./LogReaderScript.sh File-log.txt
If you want to process multiple files at a time, you can also say something like:
./LogReaderScript.sh *.txt
- The -o option tells grep to print only the matched substrings, instead of printing the whole matched line.
- The -E option specifies an extended regex as the pattern.
- The -h option suppresses the filename prefix on the output if multiple files are given.
- The pattern (regex) [^/]+\.js matches a sequence of one or more characters other than a slash, followed by the extension .js. It will match the target filenames.
- "$@" is expanded to the filename(s) passed as arguments to the script.
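If only the distinct names are wanted, without the occurrence counts, the counting stage can be swapped for sort -u. The following is a minimal sketch under a few assumptions: the function name extract_js_names is made up for illustration, and the bracket expression is tightened to also exclude whitespace and double quotes so that a match stays inside one path component. It would still pick up the leading part of a name such as data.json, so treat it as a starting point rather than a finished tool.

```shell
#!/bin/bash
# extract_js_names is a hypothetical helper name used for illustration only.
# It prints each distinct *.js file name found in the given log files,
# without the leading path, in sorted order.
extract_js_names() {
    # -h: no file-name prefix, -o: print only the match, -E: extended regex.
    # [^/[:space:]"]+ keeps the match within a single path component.
    grep -hoE '[^/[:space:]"]+\.js' "$@" | sort -u
}
```

Inside a script, it could be called as extract_js_names "$@" after the usage check, so that ./LogReaderScript.sh *.txt still works.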
CodePudding user response:
There's really no need for a script, as a one-liner can do the job. Since you've mentioned you have multiple log files to parse, I'm assuming this is a task you're doing on a regular basis.
In this case, just define a shell function in your .bashrc file around this one-liner (a plain alias cannot use $1, so a function is needed):
parser() { cat "$1" | awk '{print $7}' | grep '.js' | awk -F\/ '{print $NF}' | sort | uniq; }
Let's say you've named the function parser; then you'd just have to invoke parser /path/to/logfile.log
With the example logfile you've provided above, the output is:
➜ ~ cat logfile.txt | awk '{print $7}' | grep '.js' | awk -F\/ '{print $NF}' | sort | uniq
jquery.js
jquery.jshowoff.min.js
jshowoff.css
Explanation:
- cat reads the file and pipes its content into the next command.
- awk extracts the 7th space-separated field from each line. Since those are Apache access logs and you're searching for the requested file, the seventh field is what you need.
- grep keeps only the lines that look like JavaScript files. Note that the pattern '.js' is unanchored and its dot matches any character, so a path such as /jshowoff.css also matches, which is why it appears in the output above. Using grep '\.js$' would keep only names ending with the .js extension.
- awk is used again to print only the file name: we define a custom field separator with the -F flag and print $NF, which tells awk to print only the last field.
- sort and uniq are self-explanatory: we sort the output, then print only the first occurrence of each repeated value.
jquery.jshowoff.min.js looked bogus to me, and I suspected I had done something wrong with my commands, but it's an actual line (280) in your logfile:
75.75.112.64 - - [21/Apr/2013:17:32:23 -0700] "GET /include/jquery.jshowoff.min.js HTTP/1.1" 200 2553 "http://random-site.com/" "Mozilla/5.0 (iPod; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25" "random-site.com"
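For reference, the whole pipeline can probably be folded into a single awk call plus sort -u, with the dot escaped and the match anchored to the end of the request path so that entries such as jshowoff.css are left out. This is only a rough sketch: it assumes the request path is the seventh whitespace-separated field, as in the log line above, and that the path carries no query string. The function name js_names is made up for illustration.

```shell
# js_names is a hypothetical name; it could live in ~/.bashrc.
js_names() {
    # $7 is assumed to be the request path in this access-log layout.
    # When the path ends in ".js", split it on "/" and print the last part.
    awk '$7 ~ /\.js$/ { n = split($7, parts, "/"); print parts[n] }' "$@" | sort -u
}
```

It would be invoked as js_names /path/to/logfile.log, and it accepts several files at once.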
