What does this perl -ane mean in the find command of a shell script?

Time:01-21

I have this find script

find $DATA/ -mindepth 1 -maxdepth 1 |\
perl -ane ' s:.*/((.+)\-[0-9]{8,10}[a-z]*([_\-].*)?):$2: && print; ' | \
sort -u > $loctmp/speakers_all.txt

Can any shell script god decode this for me? What is the perl -ane command doing?

CodePudding user response:

perl -n says "Iterate over the input lines, but don't print them."

perl -a means to split each input line into fields (stored in the array @F), like awk does, but it doesn't look like it's necessary here.

perl -e says "This argument is the program to run".

Run perldoc perlrun to read more about Perl's command-line usage.

CodePudding user response:

perl -ane ' s:.*/((.+)\-[0-9]{8,10}[a-z]*([_\-].*)?):$2: && print; '

The command switches for Perl, as found in perl -h:

-a                autosplit mode with -n or -p (splits $_ into @F)
-n                assume "while (<>) { ... }" loop around program

Autosplit is not used, and can be safely removed.

-e marks the argument that contains the code to run. The code can instead be put in a file, e.g. perl -n foo.pl (the -n is still needed unless the file contains its own while loop).

The code itself is just a regex substitution. If expanded, the code looks like this:

while (<>) {
    s:.*/((.+)\-[0-9]{8,10}[a-z]*([_\-].*)?):$2: && print;
}
  • while (<>) loops over the input, putting each line into $_, the default variable.
  • s:... is the substitution operator, but the default delimiters / have been replaced with colons :. Typically this is done to avoid having to escape delimiters inside the regex.
  • The regex itself matches any character ., 0 or more times *, followed by a slash / (presumably the reason for the changed delimiter). Because .* is greedy, this normally consumes everything up to the last slash. Then comes a string of any character 1 or more times .+, which is captured (); it is the second opening parenthesis, so it becomes $2. Then a dash \- followed by 8 to 10 digits [0-9]{8,10}, followed by characters a-z 0 or more times. Then it captures, as $3, a string made up of either _ or -, followed by any character . 0 or more times. This capture is made optional by ?, meaning it can match 0 or 1 time. The outer parentheses capture everything after the slash as $1, which is not used. If there is a match, the whole matched text is replaced by $2, the part between the slash and the dash.
  • && means only execute the RHS if the LHS is true. I.e. only print if the regex matches.
  • print is the same as print $_.

In short, the code extracts the part between the last slash and the dash that precedes the 8 to 10 digits, and discards the rest of what matched, including the optional suffix that starts with an underscore or dash. Anything after the match that the regex does not consume (for example a suffix starting with a dot) stays in the line. Lines that match are then printed; lines that do not match are skipped.
