I'm trying to capture numbers inside a file using AWK. I can capture all of them, but I'm not able to capture only those with a certain number of digits. What am I doing wrong?
echo -e "$teste" | awk '/_OA/ { match($0,/\[\([:digit:]{4,13}\]/);oa = substr($0,RSTART,RLENGTH);print oa}'
File sample:
_OA ............. [6712227000168]
_OA Tasdsd, OA .. [91][355016]
_OA Tasdsd, DA .. [91][5512987000]
Expected:
6712227000168
355016
5512987000
CodePudding user response:
With your shown samples, please try the following awk solution. It sets the field separator to either ] or [ and, in the main block, checks whether the line starts with _OA; if so, it prints the second-to-last field.
awk -F"[][]" '/^_OA /{print $(NF-1)}' Input_file
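The field-separator approach prints the last bracketed value whatever its length. If the 4 to 13 digit limit from the question matters, one possible variation is to loop over the fields and check each one. This is only a sketch: the function name extract_oa_fields is hypothetical, and it assumes that no digits appear between a closing ] and the next opening [.

```shell
# Hypothetical helper: print every bracketed field that is 4 to 13 digits long.
extract_oa_fields() {
  awk -F'[][]' '/^_OA / {
    # Field 1 is the text before the first "[", so start at field 2.
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^[0-9]+$/ && length($i) >= 4 && length($i) <= 13) {
        print $i
      }
    }
  }'
}

printf '%s\n' '_OA ............. [6712227000168]' \
              '_OA Tasdsd, OA .. [91][355016]' \
              '_OA Tasdsd, DA .. [91][5512987000]' | extract_oa_fields
```

The length test replaces the {4,13} interval, so this should also run on awk implementations that lack interval support.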
CodePudding user response:
You could update the pattern and the values for RSTART and RLENGTH to not match the leading and trailing square brackets.
The digits part should be [[:digit:]] and there is a \( in the pattern that matches ( that should not be there.
awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}\]/);oa = substr($0,RSTART+1,RLENGTH-2);print oa}' <<< "$teste"
Output
6712227000168
355016
5512987000
As there are multiple occurrences of digits between square brackets, if you want to match multiple occurrences:
teste='_OA Tasdsd, OA .. [91][355016][123456789][1][9999]'
awk '/_OA/ {
  while(match($0,/\[[[:digit:]]{4,13}]/)){
    start=RSTART+1; len=RLENGTH-2
    s=substr($0,start,len)
    res=res?res","s:s
    $0=substr($0,start+len)
  }
  print res
  res = ""
}' <<< "$teste"
Output
355016,123456789,9999
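A possible variation on the loop, in case the rest of the script still needs $0 and its fields: work on a copy of the line instead of overwriting $0. This is a sketch under assumptions, not a replacement for the answer above. The function name extract_all and the variable rest are hypothetical, and the regex spells out "four or more digits" without an interval and checks the upper bound with RLENGTH, because some awk implementations (older mawk releases, for instance) do not support {n,m}.

```shell
# Hypothetical helper: collect all 4 to 13 digit bracketed numbers on _OA lines.
extract_all() {
  awk '/_OA/ {
    rest = $0; res = ""                      # work on a copy so $0 stays intact
    while (match(rest, /\[[0-9][0-9][0-9][0-9]+\]/)) {
      len = RLENGTH - 2                      # digit count without the brackets
      if (len <= 13) {                       # enforce the upper bound by hand
        s = substr(rest, RSTART + 1, len)
        res = (res == "" ? s : res "," s)
      }
      rest = substr(rest, RSTART + RLENGTH)  # continue after the closing bracket
    }
    print res
  }'
}

printf '%s\n' '_OA Tasdsd, OA .. [91][355016][123456789][1][9999]' | extract_all
```

With the sample line this should print 355016,123456789,9999, the same as the loop above.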
CodePudding user response:
You can use
awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);print substr($0,RSTART+1,RLENGTH-2)}'
See the online demo:
#!/bin/bash
s='_OA ............. [6712227000168]
_OA Tasdsd, OA .. [91][355016]
_OA Tasdsd, DA .. [91][5512987000]'
awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);print substr($0,RSTART+1,RLENGTH-2)}' <<< "$s"
Output:
6712227000168
355016
5512987000
Details:
- \[ - a [ char
- [[:digit:]]{4,13} - four to thirteen digits (note that the [:digit:] POSIX character class must be used within [...], a bracket expression)
- ] - a ] char (it is not special here, so it needs no escaping)
And substr($0,RSTART+1,RLENGTH-2) means that we
- $0 - take a substring of the current line
- RSTART+1 - starting with the second char of the match
- RLENGTH-2 - and then as many characters as the match length minus 2 (thus getting rid of the enclosing [ and ] chars)
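To make the RSTART and RLENGTH arithmetic concrete, here is a small sketch that prints both values for one of the sample lines. The function name show_match is hypothetical. The regex is my own interval-free stand-in for "four or more digits" (it has no upper bound), used so the example also runs on awk implementations without {n,m} support.

```shell
# Hypothetical helper: show what match() sets and how substr() uses it.
show_match() {
  awk '{
    # Four or more digits between brackets, written without an interval.
    if (match($0, /\[[0-9][0-9][0-9][0-9]+\]/)) {
      printf "RSTART=%d RLENGTH=%d\n", RSTART, RLENGTH
      print "with brackets:    " substr($0, RSTART, RLENGTH)
      print "without brackets: " substr($0, RSTART + 1, RLENGTH - 2)
    }
  }'
}

printf '%s\n' '_OA Tasdsd, OA .. [91][355016]' | show_match
# RSTART=23 RLENGTH=8
# with brackets:    [355016]
# without brackets: 355016
```

The match starts at the [ in column 23 and spans 8 characters, so skipping one character and dropping two from the length leaves just the six digits.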
CodePudding user response:
Your regexp \[\([:digit:]{4,13}\] says:
1. \[ = the literal character [
2. \( = the literal character (
3. [:digit:] = a bracket expression containing a character set of the characters :, d, i, g, t
4. {4,13} = a regexp interval that's 4 to 13 repetitions of the preceding bracket expression
5. \] = the literal character ]
The 2 main issues with that which are causing your regexp to be unable to match any of your input are:
- You don't have any (s in your input (from #2 above), and
- To match digits you need a character class [:digit:] inside a bracket expression [[:digit:]], not a character set :digit: inside a bracket expression [:digit:] (from #3 above)
You also don't actually need to escape the ] at the end of the regexp as it's only a regexp metachar (end of bracket expression) if preceded by a matching unescaped [ (start of bracket expression).
So the regexp I think you wanted to write instead would have been:
\[[[:digit:]]{4,13}]
e.g.:
$ awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);oa = substr($0,RSTART,RLENGTH);print oa}' file
[6712227000168]
[355016]
[5512987000]
or to only print the numbers:
$ awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);oa = substr($0,RSTART+1,RLENGTH-2);print oa}' file
6712227000168
355016
5512987000
