My strings are:
- "TESTING_ABC_1-JAN-2022.BCK-gz;1"
- "TESTING_ABC_30-JAN-2022.BCK-gz;1"
In bash, when I run:
echo "TESTING_ABC_1-JAN-2022.BCK-gz;1" | sed 's/.*\([0-9]\{1,2\}-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9]\).*/\1/'
it returns 1-JAN-2022, which is what I want.
But when I run:
echo "TESTING_ABC_30-JAN-2022.BCK-gz;1" | sed 's/.*\([0-9]\{1,2\}-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9]\).*/\1/'
I get 0-JAN-2022, but I want 30-JAN-2022.
How can I extract the date in one line so that it works for both single-digit and double-digit days, e.g. "1-JAN-2022" and "30-JAN-2022"?
CodePudding user response:
Using sed
$ echo "TESTING_ABC_1-JAN-2022.BCK-gz;1
> TESTING_ABC_30-JAN-2022.BCK-gz;1" | sed -E 's/[^0-9]*([^.]*).*/\1/'
1-JAN-2022
30-JAN-2022
CodePudding user response:
It is much easier to use awk and split on the delimiters instead of writing a regex for the date itself:
cat file
TESTING_ABC_1-JAN-2022.BCK-gz;1
TESTING_ABC_30-JAN-2022.BCK-gz;1
awk -F '[_.]' '{print $3}' file
1-JAN-2022
30-JAN-2022
Another option is to use grep -Eo with a regex that matches a date in DD-MON-YYYY format:
grep -Eo '[0-9]{1,2}-[A-Z]{3}-[0-9]{4}' file
1-JAN-2022
30-JAN-2022
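The delimiter-splitting idea can also be sketched with shell parameter expansion alone, without awk or grep. This is only an illustration: the function name date_field is made up here, and it assumes the date sits between the last _ and the first . that follows it (which holds for the two sample names, but is slightly different from awk's "third field").

```shell
# Hypothetical helper: print the text between the last "_" and the next ".".
date_field() {
  rest=${1##*_}                 # strip the longest prefix ending in "_"
  printf '%s\n' "${rest%%.*}"   # strip from the first "." to the end
}

date_field 'TESTING_ABC_1-JAN-2022.BCK-gz;1'    # prints 1-JAN-2022
date_field 'TESTING_ABC_30-JAN-2022.BCK-gz;1'   # prints 30-JAN-2022
```

Both expansions are POSIX, so this should behave the same in sh and bash.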
CodePudding user response:
The problem with your regex is the greedy * quantifier: .* matches as many characters as possible while still allowing the rest of the expression to match. Because [0-9]\{1,2\} is satisfied by a single digit, the leading .* swallows the 3 and leaves only the 0 for the capture group. In many regex implementations you can switch the greediness of * by appending ?, so /.*?a/ matches as few characters as possible before it finds an a.
Unfortunately, sed doesn't support non-greedy quantifiers. Here are two options:
If the date is always preceded by _, you can simply add _ after the .* part:
$ sed -r 's/.*_([0-9]{1,2}-[A-Z]{3}-[0-9]{4}).*/\1/' <<< "TESTING_ABC_30-JAN-2022.BCK-gz;1"
30-JAN-2022
Or just grep the relevant parts:
$ grep -Po '([0-9]{1,2}-[A-Z]{3}-[0-9]{4})' <<< "TESTING_ABC_30-JAN-2022.BCK-gz;1"
30-JAN-2022
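To make the greedy match visible, the sketch below prints what .* consumed next to the captured date. It also shows a variant that does not depend on the underscore: requiring a non-digit immediately before the day forces both digits into the group. The function name extract_date is invented for illustration, and the sketch assumes a sed that accepts -E and a date that is never at the very start of the line.

```shell
s='TESTING_ABC_30-JAN-2022.BCK-gz;1'

# Show the prefix taken by the greedy .* in brackets, then the captured date.
printf '%s\n' "$s" | sed -E 's/(.*)([0-9]{1,2}-[A-Z]{3}-[0-9]{4}).*/[\1] \2/'
# prints: [TESTING_ABC_3] 0-JAN-2022

# Hypothetical helper: [^0-9] must match a non-digit right before the day,
# so .* can no longer take the first digit of a two-digit day.
extract_date() {
  printf '%s\n' "$1" | sed -E 's/.*[^0-9]([0-9]{1,2}-[A-Z]{3}-[0-9]{4}).*/\1/'
}

extract_date "$s"                                 # prints 30-JAN-2022
extract_date 'TESTING_ABC_1-JAN-2022.BCK-gz;1'    # prints 1-JAN-2022
```

If a line contains no date at all, sed prints it unchanged, so the grep -o approach may be preferable when non-matching lines should be dropped.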
