How to get the first integer from all lines that match a pattern?

I have a file and I only want to find lines that contain "here". Each of these lines has multiple string and integer values (see the example below). I only want the first integer of each line that matches the pattern.

I have created a solution that uses a bash script, but is there a simpler method I am missing? I was hoping something like grep -w here -Eo [0-9]+ file would work. However, when I try that, it expects everything that comes after "here" to be a file name.

STEP 1 STAGE 1 here other info
foo
bar
STEP 2 STAGE 1 here other info
more
foo
bar
STEP 3 STAGE 1 here other info

For this file, the desired output would be:

1
2
3
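grep accepts only one pattern operand: in the attempted command, `here` becomes the pattern and every following operand is taken as a file name. A minimal sketch that stays with grep chains three calls instead: one to select the lines, one to cut each line after its first integer, and one to keep just the digits. The names `sample` and `first_ints` are hypothetical, and relying on `^` together with `-o` assumes a current GNU grep, where the anchor only matches at the real start of each line.

```shell
# Hypothetical sample file reproducing the question's input.
sample=$(mktemp)
printf '%s\n' \
  'STEP 1 STAGE 1 here other info' 'foo' 'bar' \
  'STEP 2 STAGE 1 here other info' 'more' 'foo' 'bar' \
  'STEP 3 STAGE 1 here other info' > "$sample"

# Print the first integer of every line containing the word "here".
first_ints() {
  grep -w 'here' "$1" |          # keep lines with the whole word "here"
    grep -Eo '^[^0-9]*[0-9]+' |  # cut each line right after its first integer
    grep -Eo '[0-9]+$'           # keep only that trailing integer
}

first_ints "$sample"   # prints 1, 2 and 3 on separate lines
```

Each extra grep only narrows the previous output; the awk and sed answers do the same work in a single process.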

Answer:

Another variant uses GNU grep with -P (Perl-compatible regular expressions), if your grep supports it:

grep -oP '^\D*\K\d+(?=.*\bhere\b)' file

The pattern matches:

^ Start of the line
\D* Match zero or more non-digits
\K Forget what has been matched so far
\d+ Match one or more digits
(?=.*\bhere\b) Positive lookahead: assert that the word here appears further to the right

Output

1
2
3
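The lookahead in that pattern sits after the digits, so it only accepts lines where here appears to the right of the first integer, which holds for the sample input. If here might also come before the number (an assumption, the question does not say), a sketch can move the lookahead to the start of the line. It still needs a grep built with PCRE support for -P; `sample` and `first_ints_p` are hypothetical names.

```shell
# Hypothetical input: the last line has "here" before its first integer.
sample=$(mktemp)
printf '%s\n' \
  'STEP 1 STAGE 1 here other info' 'foo' \
  'STEP 2 STAGE 1 here other info' 'bar' \
  'here comes STEP 7' > "$sample"

first_ints_p() {
  # (?=.*\bhere\b) at ^ checks the whole line for the word "here" first,
  # then \D* skips the leading non-digits and \K drops them from the output.
  grep -oP '^(?=.*\bhere\b)\D*\K\d+' "$1"
}

first_ints_p "$sample"   # prints 1, 2 and 7
```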

Answer:

This simpler awk should work for you:

awk '/ here / {sub(/^[^0-9]+/, ""); print $1+0}' file

1
2
3
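The / here / regex needs a space on each side of the word, so it would miss a line that starts or ends with here. A sketch that stays within POSIX awk (no gawk extensions) compares whole fields instead and takes the first integer with match(), which sets RSTART and RLENGTH. It assumes fields are whitespace-separated as in the sample; `sample` and `first_ints_awk` are hypothetical names.

```shell
# Hypothetical input: the last matching line ends with "here".
sample=$(mktemp)
printf '%s\n' \
  'STEP 1 STAGE 1 here other info' 'foo' 'bar' \
  'STEP 2 STAGE 1 here other info' 'more' \
  'STEP 3 STAGE 1 here' > "$sample"

first_ints_awk() {
  awk '{
    # Keep only lines where some whitespace-separated field is exactly "here".
    found = 0
    for (i = 1; i <= NF; i++)
      if ($i == "here") { found = 1; break }
    if (!found) next
    # match() sets RSTART/RLENGTH for the leftmost run of digits.
    if (match($0, /[0-9]+/)) print substr($0, RSTART, RLENGTH)
  }' "$1"
}

first_ints_awk "$sample"   # prints 1, 2 and 3
```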

Answer:

With GNU awk, you could try the following awk code. It was written and tested with your shown samples.

awk '
match($0,/(^|[[:space:]]+)([0-9]+)[[:space:]]+.*here /,arr){
  print arr[2]
}
' Input_file

Explanation: This uses GNU awk's match function with the regex (^|[[:space:]]+)([0-9]+)[[:space:]]+.*here, which has 2 capturing groups. With a third argument, match stores the groups' values in the array arr at indexes 1 and 2 respectively. If the match succeeds (that is, the line contains here after a number), the 2nd element of that array is printed, which is the required value (the first integer of the line).

Answer:

grep is not the right command for this. I'd use sed:

sed -n '/ here /s/[^0-9]*\([0-9][0-9]*\).*/\1/p' file
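The same substitution can be written with extended regular expressions (-E, accepted by GNU and BSD sed), which avoids the escaped group. This sketch also accepts here at the start or end of a line, and because ([0-9]+) needs at least one digit, a matching line without any integer is not printed (the p flag only prints lines where the substitution happened). `sample` and `first_ints_sed` are hypothetical names.

```shell
# Hypothetical input, including a "here" line with no number at all.
sample=$(mktemp)
printf '%s\n' \
  'STEP 1 STAGE 1 here other info' 'foo' \
  'STEP 2 STAGE 1 here other info' 'bar' \
  'here but no number' \
  'STEP 3 STAGE 1 here' > "$sample"

first_ints_sed() {
  # Address: lines containing "here" delimited by whitespace or line ends.
  # Substitution: keep only the first run of digits; p prints changed lines.
  sed -nE '/(^|[[:space:]])here([[:space:]]|$)/s/^[^0-9]*([0-9]+).*/\1/p' "$1"
}

first_ints_sed "$sample"   # prints 1, 2 and 3
```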