I have been trying to extract part of a string in bash. I'm on a Mac.
Pattern of input string:
- Some random word followed by a /. This is optional.
- Keyword (def, foo, and bar) followed by a hyphen (-) followed by numbers. This can be a 2-6 digit number.
- These numbers are followed by a hyphen again and a few hyphen-separated words.
Sample inputs and outputs:
abc/def-1234-random-words // def-1234
bla/foo-12-random-words // foo-12
bar-12345-random-words // bar-12345
So I tried the following commands to fetch it, but for some reason they return the entire string.
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-[^-]*\).*/\1/g'`
// and
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-\d{2,6}\).*/\1/g'`
I also tried to make it case-insensitive using the I flag, but it threw an error:
: bad flag in substitute command: 'I'
CodePudding user response:
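You can use the -E option to enable extended regular expressions; then you don't have to escape ( and |. In the original attempts, \| is a GNU extension that the sed shipped with macOS generally does not treat as alternation, and sed has no \d shorthand, so the pattern never matched and sed printed the line unchanged.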
echo abc/def-1234-random-words | sed -E -e 's/.*((def|bar|foo)-[^-]*).*/\1/g'
def-1234
CodePudding user response:
This GNU sed command should work with the ignore-case flag:
sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file
def-1234
foo-12
bar-12345
This sed matches:
- (.*/){0,1}: Match a string up to /, optionally, at the start
- (: Start capture group #2
- (def|foo|bar): Match def or foo or bar
- -: Match a -
- [0-9]{2,6}: Match 2 to 6 digits
- ): End capture group #2
- -.*: Match - followed by anything till the end
- Substitution is the value we capture in group #2
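The I flag is a GNU extension, and the question reports that the Mac's sed rejects it. As a rough sketch of one way around that, the case-insensitivity can be spelled out with bracket expressions so that no flag is needed. The function name extract_key_sed is made up for illustration, and the -n option with the p flag is an addition of this sketch: it prints only lines that match, instead of echoing non-matching lines unchanged.

```shell
# Hypothetical helper (name not from the original posts).
# Reads lines on stdin and prints the keyword-number part of each matching line.
extract_key_sed() {
  # [dD][eE][fF] etc. stand in for the GNU-only I flag.
  # -n plus the trailing p prints only lines where the substitution happened.
  sed -E -n 's~^(.*/)?(([dD][eE][fF]|[fF][oO][oO]|[bB][aA][rR])-[0-9]{2,6})-.*~\2~p'
}

# Usage with the sample inputs from the question:
printf '%s\n' 'abc/def-1234-random-words' | extract_key_sed   # def-1234
printf '%s\n' 'bla/FOO-12-random-words'   | extract_key_sed   # FOO-12
printf '%s\n' 'bar-12345-random-words'    | extract_key_sed   # bar-12345
```

Because non-matching input produces no output, a caller can test the captured value for emptiness to tell whether the pattern was found.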
Or you may use this awk:
awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file
def-1234
foo-12
bar-12345
Awk explanation:
- -v IGNORECASE=1: Enable ignore-case matching
- -F /: Use / as the field separator
- match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): Match text using the regex ^(def|foo|bar)-[0-9]{2,6}- in $NF, which is the last field using / as the field separator (to ignore text before /)
- If the match is successful, then use substr to print text from position 1 to RLENGTH-1 (since we are matching until the - after the digits)
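IGNORECASE is specific to gawk; other awk implementations treat it as an ordinary variable, so the command above would be case-sensitive there. A possible portable variant, sketched under that assumption, matches against a lowercased copy of the last field and cuts the result out of the original text. The function name extract_key_awk is hypothetical, and the {2,6} interval is written out as optional digits because some older awk builds do not support interval expressions.

```shell
# Hypothetical helper (name not from the original posts).
# Reads lines on stdin and prints the keyword-number part of each matching line.
extract_key_awk() {
  awk -F/ '
    {
      last = $NF   # text after the final /, or the whole line if there is none
      # Match on a lowercased copy; two required digits plus four optional
      # ones is equivalent to [0-9]{2,6}.
      if (match(tolower(last), /^(def|foo|bar)-[0-9][0-9][0-9]?[0-9]?[0-9]?[0-9]?-/))
        print substr(last, 1, RLENGTH - 1)   # drop the trailing hyphen
    }'
}

# Usage:
printf '%s\n' 'abc/DEF-1234-random-words' | extract_key_awk   # DEF-1234
printf '%s\n' 'bar-12345-random-words'    | extract_key_awk   # bar-12345
```

Since tolower does not change the string length, the RLENGTH computed on the lowercased copy applies to the original field as well, which preserves the input's casing in the output.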
