Count trailing newlines with POSIX utilities or GNU coreutils or Perl

I'm looking for ways to count the number of trailing newlines in possibly binary data, using POSIX utilities, GNU coreutils, or maybe Perl. The data is either:

read from standard input, or
already in a shell variable (in which case "binary" of course excludes at least the NUL byte, 0x00).

This should work without temporary files or FIFOs.

When the input is in a shell variable, I already have the following (possibly ugly but) working solution:

original_string=$'abc\n\n\ndef\n\n\n'
string_without_trailing_newlines="$( printf '%s' "${original_string}" )"
printf '%s' $(( ${#original_string}-${#string_without_trailing_newlines} ))

which gives 3 in the above example.

The idea is simply to rely on the "feature" of command substitution that it strips all trailing newlines, and then subtract the two string lengths.
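A minimal sketch that wraps the length-subtraction trick above in a reusable Bash function. The name count_trailing_newlines is hypothetical. Note that ${#var} counts characters in the current locale rather than bytes; this should be harmless here because both strings share the same prefix and a newline is always a single character.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the number of trailing newlines in $1.
count_trailing_newlines() {
  local s=$1 stripped
  # Command substitution removes all trailing newlines from its output.
  stripped=$(printf '%s' "$s")
  # The length difference is exactly the number of removed newlines.
  printf '%d\n' "$(( ${#s} - ${#stripped} ))"
}

count_trailing_newlines $'abc\n\n\ndef\n\n\n'  # prints 3
count_trailing_newlines 'abc'                 # prints 0
count_trailing_newlines ''                    # prints 0
```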

Answer:

With GNU sed, we can use the -z option plus the e modifier of the substitute command, and pack all of this into a single sed script:

$ printf 'abc\n\n\ndef\n\n\n' | sed -Ezn '${s/.*[^\n]//;s/.*/wc -l <<!\n&!/ep}'
3

Or, if the string is in a variable:

$ printf '%s' "$original_string" | sed -Ezn '${s/.*[^\n]//;s/.*/wc -l <<!\n&!/ep}'
3

Explanations:

The -z option tells sed that input lines are terminated by the NUL character instead of newline.
The -n option disables the automatic printing.
The two substitute commands are applied to the last line only (the $ address), that is, to everything after the last NUL character or, if there is no NUL character, to the complete input string.
The first substitute command deletes everything except the trailing newlines.
The second substitute command replaces these trailing newlines with:
```
wc -l <<!



!
```
with as many empty lines in the here-document as there are trailing newlines in the input. Because of the e modifier, this new pattern space is executed as a command and replaced by the command's output, which is then printed (thanks to the p modifier).
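To see the command that the e modifier runs, one can drop e and keep only p, so that sed prints the generated here-document instead of executing it. This is just a debugging sketch and still assumes GNU sed (for -z and the \n escapes):

```bash
#!/usr/bin/env bash
# Same script as above minus the e modifier: the generated command is
# printed rather than executed (requires GNU sed).
printf 'abc\n\n\ndef\n\n\n' | sed -Ezn '${s/.*[^\n]//;s/.*/wc -l <<!\n&!/p}'
```

It should print the five lines of the here-document shown above: the wc -l line, three empty lines, and the closing !.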

Edit

As the OP noticed, this produces no output at all when the input is the empty string, instead of the expected 0. A simpler version that also works with the empty string could be:

$ printf '%s' "$original_string" | sed -Ezn '${s/.*[^\n]//;p}' | wc -l
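A quick check of the simpler pipeline on a few edge cases, including the empty string that motivated the edit. This is only a test loop under the same GNU sed assumption; it should print 0, 0, 1, 2 and 3 (BSD wc may pad the numbers with spaces).

```bash
#!/usr/bin/env bash
# Feed several inputs through the simpler sed | wc -l pipeline (GNU sed).
for s in '' 'abc' $'abc\n' $'\n\n' $'abc\n\n\ndef\n\n\n'; do
  printf '%s' "$s" | sed -Ezn '${s/.*[^\n]//;p}' | wc -l
done
```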

Answer:

Some Perl-based solutions:

#!/usr/bin/env bash

original_string=$'abc\n\n\ndef\n\n\n'

# From a shell variable. Look ma, no pipes!
input="$original_string" perl -E '$ENV{input} =~ /(\n*)\z/; say length $1'

# From standard input (Note: The herestring adds an extra newline)
perl -0777 -nE '/(\n*)\z/; say length($1) - 1' <<<"$original_string"

# Or in a shell without herestrings (but then you also don't get the
# above $'' quoting syntax)
printf "%s" "$original_string" | perl -0777 -nE '/(\n*)\z/; say length $1'

And a more verbose way that doesn't read the whole input as a single chunk the way -0777 does (unless there are no newlines at all), which makes it suitable for large amounts of data:

printf "abc\n\ndef\n\n\n" | perl -nE '
  if (/^\n\z/) { # Nothing but a newline
    $blank++
  } elsif (/\n\z/) { # Data that ends in a newline; reset counter to 1
    $blank = 1
  } else { # No newline (Last line is missing one?); reset counter to 0
    $blank = 0
  }
  END { $blank //= 0; say $blank }'

Answer:

Using GNU awk for RT and without reading all of the input into memory at once:

$ printf 'abc\n\n\ndef\n\n\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
3

$ printf 'a\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
1

$ printf 'a' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
0

$ printf '' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
0

$ printf '\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
1

$ printf '\n\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
2

Answer:

How about another Perl solution:

echo -ne 'abc\n\n\ndef\n\n\n' | perl -0777 -ne '/\n*$/; print length($&), "\n";'
=> 3
echo -ne '\n' | perl -0777 -ne '/\n*$/; print length($&), "\n";'
=> 1
echo -ne '\n\n' | perl -0777 -ne '/\n*$/; print length($&), "\n";'
=> 2
echo -ne 'a\n\n' | perl -0777 -ne '/\n*$/; print length($&), "\n";'
=> 2
echo -ne 'a' | perl -0777 -ne '/\n*$/; print length($&), "\n";'
=> 0

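The -0777 option tells perl to slurp the whole input at once instead of reading it line by line.
The -n and -e options are similar to sed's: -n wraps the code in an input loop without automatic printing, and -e supplies the code on the command line.
The regex \n*$ matches the run of newlines at the end of the input string.
The Perl variable $& holds the substring matched by the last successful match.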