I have this find script:
find $DATA/ -mindepth 1 -maxdepth 1 |\
perl -ane ' s:.*/((.+)\-[0-9]{8,10}[a-z]*([_\-].*)?):$2: && print; ' | \
sort -u > $loctmp/speakers_all.txt
Can any shell script god decode this for me? What is the perl -ane command doing?
CodePudding user response:
perl -n says "Iterate over the input lines, but don't print them."
perl -a means to break apart the input lines like in awk, but it doesn't look like it's necessary here.
perl -e says "This argument is the program to run".
Run perldoc perlrun to read more about Perl's command-line usage.
CodePudding user response:
perl -ane ' s:.*/((.+)\-[0-9]{8,10}[a-z]*([_\-].*)?):$2: && print; '
The command switches for Perl, as found in perl -h:
-a autosplit mode with -n or -p (splits $_ into @F)
-n assume "while (<>) { ... }" loop around program
Autosplit is not used, and can be safely removed.
-e just marks the argument that follows as the code to run. Alternatively, the code can live in a file that you pass to perl instead, e.g. perl foo.pl.
The code itself is just a regex substitution. If expanded, the code looks like this:
while (<>) {
    s:.*/((.+)\-[0-9]{8,10}[a-z]*([_\-].*)?):$2: && print;
}
- while (<>) loops over the input, putting each line into $_, the default variable.
- s:...:...: is the substitution operator, but the default delimiters / have been replaced with colons :. Typically this is done to avoid having to escape delimiters inside the regex.
- The regex itself matches any character ., 0 or more times *, followed by a slash / (presumably the reason for the changed delimiter). Then a string made up of any character 1 or more times .+, which is captured (). Then a dash \- followed by 8 to 10 digits 0-9, followed by characters a-z 0 or more times. Then it captures a string made up of either _ or -, followed by any character . 0 or more times. This capture is also made optional by ?, meaning it can match 0 or 1 time. If there is a match, it will be replaced by whatever is captured in $2. The capture $2 is the first part (.+), after the slash and before the dash, as near as I can tell.
- && means only execute the RHS if the LHS is true. I.e. only print if the substitution succeeds.
- print is the same as print $_.
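In short, the code extracts the part between the last slash and the dash that precedes the 8 to 10 digits, and replaces the whole match with it. Everything up to that slash is discarded, as are the digits, any lowercase letters after them, and the optional suffix that begins with an underscore or dash. Text after the match that does not fit the pattern (for example, a suffix starting with some other character) is not part of the match and is kept. Then the line is printed.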
