I am trying to print lines that start with 4 whitespace characters. When I apply my regex with egrep, everything works as expected, but when I use awk, the results differ.
Can you tell me what I am doing wrong?
Example:
echo "    testtest" | egrep '^[[:space:]]{4}'
=> prints:     testtest
echo "    testtest" | awk '/^[[:space:]]{4}/ {print}'
=> prints nothing
CodePudding user response:
Regarding your comment that echo "(whitespace x 4) testtest" | awk '/^[ \t]{4}/ {print}' also prints nothing, as well as the issue in your question: with mawk 1.3.4 you're running a pre-POSIX version of mawk 1, a minimal-featured (for execution speed) variant of awk. You shouldn't expect it to understand relatively modern POSIX concepts such as character classes ([[:space:]]) or RE intervals ({4}), nor non-POSIX extensions such as \s and various other things. mawk 2 is now available and should have better support for POSIX features, but get GNU awk (gawk) for the fullest functionality and excellent speed.
By the way, egrep is deprecated, use grep -E instead.
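Whether a particular awk honours RE intervals can be checked directly rather than inferred from its version. The following is a minimal sketch, not a definitive test: awk_has_intervals is a hypothetical helper name, and it assumes that an awk without interval support either reads a{4} as literal text or rejects the regex outright.

```shell
# Hypothetical helper: report whether an awk binary honours {n} intervals.
# Usage: awk_has_intervals [awk-binary]   (defaults to "awk")
awk_has_intervals() {
    # Assumption: without interval support, "a{4}" is typically read as
    # literal text (or rejected), so the string "aaaa" does not match.
    if "${1:-awk}" 'BEGIN { exit ("aaaa" ~ /^a{4}$/) ? 0 : 1 }' 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

awk_has_intervals    # probes whichever awk is first in PATH
```

Passing a binary name, for example awk_has_intervals mawk, probes that implementation instead of the default one.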
CodePudding user response:
A few inaccuracies I need to point out:
mawk 'BEGIN { __="[[:space:]]"; for(_=_<_; (_+_) < 4^4; _++) { if(sprintf("%c",_)~__) { printf("U %6.4X\n",_) } } }'
U 0009 # horizontal tab \t
U 000A # line feed \n
U 000B # vertical tab \v
U 000C # form feed \f
U 000D # carriage return \r
U 0020 # space "[ ]"
mawk-1 recognizes POSIX spaces properly on the ASCII side of things.
mawk-2, at its current beta stage, doesn't yet solve the {n,m} interval problem that mawk-1 faces.
As for matching 4 leading whitespace characters, something like this works:
echo "    testtest" |
mawk 'BEGIN { _="[ \t]"; gsub(".",_,_); _^=FS=("^")_ } _<NF'
or, if you want to be POSIXly pedantic about it:
echo "    testtest" |
mawk 'BEGIN { _^=FS="^"(_=(_="[[:space:]]")_)_ } _<NF'
    testtest
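The one-liners above work by repeating the bracket expression four times instead of writing {4}. A more readable sketch of the same idea follows. It swaps the FS/NF trick for a plain dynamic-regex match, and leading_ws is a hypothetical helper name, not something from the answers above.

```shell
# Hypothetical helper: keep lines that start with at least N spaces or tabs.
# Usage: leading_ws N   (reads standard input)
leading_ws() {
    awk -v n="$1" '
        BEGIN {
            re = "^"
            # Append one bracket expression per required character,
            # which avoids the {n} interval syntax entirely.
            for (i = 0; i < n; i++) re = re "[ \t]"
        }
        $0 ~ re    # no action given, so matching lines are printed
    '
}

printf "    testtest\n  short\n" | leading_ws 4
```

Only the first of the two input lines is printed, since the second has just two leading spaces. Because the pattern is assembled as a string at run time, the count can be changed without touching the awk program.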
