I'm trying to write a Perl one-liner that outputs a substring from a string piped to it. The following works perfectly:
$ echo rubbishdatarubbish | perl -ne 'print $_ =~ /rubbish(.*)rubbish/'
data
However, it breaks when there are more occurrences of the ending string:
$ echo rubbishdatarubbishrubbish | perl -ne 'print $_ =~ /rubbish(.*)rubbish/'
datarubbish
I tried adding the non-greedy '?' modifier (both before and after the ending string), but that made no difference. I appear to be using a recent version of Perl, so I guess that can't be the cause:
$ perl -v
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-msys-thread-multi
What am I missing? Something obvious I'm sure...
CodePudding user response:
To get the string between the first 'rubbish' and the second 'rubbish', you can use the non-greedy '?' twice.
$ echo rubbishdatarubbishrubbish | perl -ne 'print $_ =~ /^.*?rubbish(.*?)rubbish/'
returns 'data'
CodePudding user response:
If I understand correctly, your data is surrounded by a fixed string that repeats multiple times, and you want to use a regex to extract the data. While this is certainly possible with enough regex skill, it may not be the best method. Consider, for example:
$ echo rubbishdatarubbishrubbish | perl -ple's/rubbish//g'
data
Or
$ echo rubbishdatarubbishdatarubbishdata | perl -F/rubbish/ -lanwe'print for @F'
data
data
data
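Both of these just remove rubbish from the string. In the latter I used split, which makes it easier to separate the individual pieces of data. Note that when the input starts with the delimiter, as it does here, split also returns an empty leading field, which prints as a blank line before the first 'data'. What you see above is autosplit mode: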
-a autosplit mode with -n or -p (splits $_ into @F)
-F/pattern/ split() pattern for -a switch (//'s are optional)
-l[octal] enable line ending processing, specifies line terminator
Basically it does
perl -ne'chomp; @F = split /rubbish/, $_; print $_, $/ for @F;'
$/ is the input record separator, normally a newline.
The benefit of these methods is that you do not need balanced pairs of rubbish to encapsulate your data.
