I'm reading a book about bash, and it introduces regular expressions (I'm pretty new to them) with this example:
rename -n 's/(.*)(.*)/new$1$2/' *
'file1' would be renamed to 'newfile1'
'file2' would be renamed to 'newfile2'
'file3' would be renamed to 'newfile3'
There wasn't really a breakdown provided with this example, unfortunately. I kind of get what capture groups are and that .* is greedy and will match all characters but I'm uncertain as to why two capture groups are needed. Also, I get that $ represents the end of the line but am unsure of what $1$2 is actually doing here. Appreciate any insight provided.
Attempted to research capture groups and the $ for some similar examples with explanations but came up short.
CodePudding user response:
You are correct. (.*)(.*) makes no sense. The second .* will always match the empty string.
For example, matching against file,
- the first .* will match the 4 character string starting at position 0 (file), and
- the second .* will match the 0 character string starting at position 4 (empty string).
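If you want to see the two groups for yourself, here is a minimal sketch using sed rather than rename (an assumption on my part, chosen because sed is available almost everywhere). Note that sed writes the group references as \1 and \2 where the Perl-based rename writes $1 and $2. show_groups is a hypothetical helper name, not something from the book.

```shell
# show_groups (hypothetical helper): wrap each capture group in brackets
# so that an empty group is visible in the output.
# sed -E enables extended regular expressions; \1 and \2 are sed's
# spelling of what the Perl-based rename calls $1 and $2.
show_groups() {
  printf '%s\n' "$1" | sed -E 's/(.*)(.*)/[\1][\2]/'
}

show_groups file1   # prints [file1][]
```

The empty second pair of brackets is the second group: the first .* has already consumed the whole name.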
You could simplify the pattern to any of the following:
rename -n 's/(.*)/new$1/' *
rename -n 's/.*/new$&/' *
rename -n 's/^/new/' *
rename -n '$_ = "new$_"' *
rename -n '$_ = "new" . $_' *
CodePudding user response:
I don't know that rename command. The regular expression looks like sed syntax. If that is the case (as in many other regex forms), it has 3 parts:
- s for substitute
- everything between the first two slashes, (.*)(.*), to specify what to match
- everything between the 2nd and 3rd slash, new$1$2, is the replacement
$ only means end of the line in the first part of the expression (the pattern). In the second part (the replacement), $ followed by a number refers to a capture group: $1 is the first group, $2 the second, and so on, with $0 often being the whole matched text.
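To make the two roles concrete, here is a small sketch in sed itself (where the group references are spelled \1 and \2 instead of $1 and $2). The function names are hypothetical, made up for illustration.

```shell
# append_suffix (hypothetical): here `$` is in the pattern, so it is an
# anchor that matches the empty position at the end of the line.
append_suffix() {
  printf '%s\n' "$1" | sed -E 's/$/.bak/'
}

# swap_parts (hypothetical): \1 and \2 are in the replacement, so they
# insert whatever the first and second capture groups matched.
swap_parts() {
  printf '%s\n' "$1" | sed -E 's/(.*)-(.*)/\2-\1/'
}

append_suffix file1   # prints file1.bak
swap_parts file-1     # prints 1-file
```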
You are right that .* is greedy and it's pointless to have that repeated. Maybe there was a \. in between and that was an attempt to capture file name and extension. There are better ways to parse file names, like basename. So you could simplify the command to rename -n 's/(.*)/new$1/' *
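As a rough sketch of that guess (I'm assuming the book meant something like (.*)\.(.*), which the question does not confirm), this is how the two groups would split a name once there is a literal dot between them. It uses sed, so the groups are written \1 and \2; split_ext and strip_txt are hypothetical helper names.

```shell
# split_ext (hypothetical): show what each group captures when a literal
# dot separates them. The first .* is greedy, so it runs up to the LAST dot.
split_ext() {
  printf '%s\n' "$1" | sed -E 's/(.*)\.(.*)/[\1][\2]/'
}

# strip_txt (hypothetical): basename removes a known suffix without a regex.
strip_txt() {
  basename "$1" .txt
}

split_ext report.tar.gz   # prints [report.tar][gz]
split_ext file1           # prints file1 (no dot, so no substitution)
strip_txt notes.txt       # prints notes
```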
