Remove first two lines, last two lines and space from file and add quotes on each line and replace n

Time:02-01

I have an input.txt file that needs to be formatted by a shell script according to the following conditions:

  1. Remove the first two lines and the last two lines.
  2. Remove all spaces in each line (each line has two spaces at the beginning and one space at the end).
  3. Wrap each line in single quotes (' ').
  4. Finally, replace the newlines ($) with commas.

(original) input.txt

 sql
--------
  Abce
  Bca
  Efr
-------
Row (3)

Desired output file

output.txt

'Abce','Bca','Efr'

I have tried using the following commands:

sed -i 1,2d input.txt > input.txt
sed "$(( $(wc -l <input.txt) -2 +1)), $ d" input.txt > input.txt
sed ':a;N;$!ba;s/\n/, /g' input.txt > output.txt

But I get a blank output.txt.

CodePudding user response:

Would you please try the following:

mapfile -t ary < <(tail -n +3 input.txt | head -n -2 | sed -E "s/^[[:blank:]]*/'/; s/[[:blank:]]*$/'/")
(IFS=,; echo "${ary[*]}")
  • tail -n +3 outputs the lines from the 3rd line onward.
  • head -n -2 outputs all lines except the last 2.
  • sed -E "s/^[[:blank:]]*/'/" removes leading whitespace and prepends a single quote.
  • Similarly, the sed command "s/[[:blank:]]*$/'/" removes trailing whitespace and appends a single quote.
  • The syntax <(command ..) is a process substitution; the output of the commands within the parentheses is fed to mapfile via the redirect.
  • mapfile -t ary reads lines from standard input into the array variable named ary.
  • echo "${ary[*]}" expands to a single string with the elements of the array ary separated by the first character of IFS, which has just been set to a comma.
  • The assignment of IFS and the array expansion are enclosed in parentheses so that they run in a subshell. This prevents IFS from being modified in the current shell.
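
The IFS join described in the last two bullets is not specific to bash arrays: "$*" follows the same rule in any POSIX shell. A minimal sketch, assuming the three quoted values are already at hand (the variable name joined is only for illustration):

```shell
# Stand-in values for the lines that mapfile would have read.
set -- "'Abce'" "'Bca'" "'Efr'"

# "$*" joins the positional parameters with the first character of IFS,
# just as "${ary[*]}" does for a bash array. The command substitution
# runs in a subshell, so the change to IFS does not leak out of it.
joined=$(IFS=,; echo "$*")
echo "$joined"    # prints 'Abce','Bca','Efr'
```

After this runs, IFS in the calling shell still has whatever value it had before.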

CodePudding user response:

With your shown samples, please try the following awk program. It was written and tested in GNU awk, but should work with any version.

awk -v s1="'" -v lines=$(wc -l < Input_file) '
BEGIN{ OFS="," }
FNR==(lines-1) {
  print val
  exit
}
FNR>2{
  sub(/^[[:space:]]+/,"")
  sub(/[[:space:]]+$/,"")
  val=(val?val OFS:"") (s1 $0 s1)
}
' Input_file

Explanation: a detailed, line-by-line explanation of the above code; the comments are for explanation purposes only.

awk -v s1="'" -v lines=$(wc -l < Input_file) '  ##Start the awk program; set s1 to a single quote and lines to the total number of lines in Input_file (from wc -l).
BEGIN{ OFS="," }                                ##Set OFS to a comma in the BEGIN section of the program.
FNR==(lines-1) {                                ##Check whether this is the second-to-last line of Input_file.
  print val                                     ##If so, print val.
  exit                                          ##Exit the program here.
}
FNR>2{                                          ##If FNR is greater than 2, do the following.
  sub(/^[[:space:]]+/,"")                       ##Remove the leading whitespace.
  sub(/[[:space:]]+$/,"")                       ##Remove the trailing whitespace.
  val=(val?val OFS:"") (s1 $0 s1)               ##Wrap the current line in single quotes and append it to val, comma-separated.
}
' Input_file                                    ##Name of the input file.

CodePudding user response:

The first command, sed -i 1,2d input.txt > input.txt, leaves input.txt empty: the shell truncates the file for the redirection before sed gets to read it. You can't write output back to the file you are reading, and sed -i does not produce any output anyway.
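
The truncation is easy to reproduce. A small sketch, using a throwaway file from mktemp rather than the real input.txt (the .new suffix is just a name picked for this example):

```shell
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' >"$tmp"

# The shell opens "$tmp" for writing, and thereby empties it, before sed
# starts, so sed finds nothing left to read.
sed '1,2d' "$tmp" >"$tmp"
wc -c <"$tmp"     # the size is now 0

# Writing to a second file and renaming it afterwards avoids the problem.
printf 'one\ntwo\nthree\n' >"$tmp"
sed '1,2d' "$tmp" >"$tmp.new" && mv "$tmp.new" "$tmp"
cat "$tmp"        # prints: three
```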

The minimal fix is to take out the -i and string together the commands into a pipeline; but of course, sed allows you to combine the commands into a single script.

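len=$(wc -l <input.txt)
# Collect the formatted lines in the hold space; join and print them on the last line.
sed -n -e '1,2d' \
    -e "$((len - 1))"',$!{' \
    -e "s/^ *\(.*[^ ]\) *\$/'\\1'/" -e H -e '}' \
    -e '${' -e g -e 's/^\n//' -e 's/\n/,/g' -e p -e '}' input.txt >output.txt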

(Untested; if your sed does not allow multiple -e options, needs refactoring to use a single string with semicolons or newlines between the commands.)

This is hard to write and debug and brittle because of the ways you have to combine the quoting features of the shell with the requirements of sed and this particular script, but also more inherently because sed is a terse and obscure language.

A much more legible and maintainable solution is to switch to Awk, which allows you to express the logic in more human terms, and avoid having to pull in support from the shell for simple tasks like arithmetic and string formatting.

awk 'FNR > 2 { sub(/^  /, ""); sub(/ $/, "");
    a[++i] = sprintf("\047%s\047,", $0); }
    END { for(j=1; j < i-1; ++j) printf "%s", a[j] }' input.txt >output.txt

This literally replaces all newlines with commas; perhaps you would in fact like to print a newline instead of the comma on the last line?

awk 'FNR > 2 { sub(/^  /, ""); sub(/ $/, "");
    a[++i] = sprintf("%s\047%s\047", sep, $0); sep="," }
    END { for(j=1; j < i-1; ++j) printf "%s", a[j]; printf "\n" }' input.txt >output.txt

If the input file is really large, you might want to refactor this to not keep all the lines in memory. The array a collects the formatted output and we print all its elements except the last two in the END block.
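
One way to avoid holding the whole file in memory is to delay the output by two lines, so that the last two lines are still pending when the input ends and simply never get printed. A minimal sketch of that idea, assuming the same layout of two leading spaces and one trailing space; the function name format_stream and the variables prev1, prev2 and n are made up for this example:

```shell
# format_stream FILE: hypothetical helper; keeps at most two pending lines.
format_stream() {
    awk 'FNR > 2 {
        sub(/^ +/, ""); sub(/ +$/, "")
        # Print the line read two lines ago; the final two are never printed.
        if (n >= 2) { printf "%s\047%s\047", sep, prev2; sep = "," }
        prev2 = prev1; prev1 = $0; n++
    }
    END { printf "\n" }' "$1"
}

# Usage: format_stream input.txt >output.txt
```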

CodePudding user response:

sed -E '
/^-+$/,/^-+$/!d
//d
s/^[[:space:]]*|[[:space:]]*$/'\''/g
' input.txt |
paste -sd ,
  • This uses a trick that doesn't work on all sed implementations to print the lines between two patterns (the dash lines in this case), excluding the pattern lines themselves.
  • On the plus side, it still works if the ---- lines are at different line numbers. The downside is that it breaks if that pattern (a line containing only dashes) occurs an odd number of times (i.e. not in pairs that wrap the lines you want).
  • Then replace the start and end of each line (including whitespace) with single quotes.
  • Finally, pipe to paste to replace the newlines with commas, without a trailing comma.

CodePudding user response:

Using sed

$ sed "1,2d; /-/,$ d; s/\s\+//;s/.*/'&'/" input_file | sed -z 's/\n/,/g;s/,$/\n/'
'Abce','Bca','Efr'

CodePudding user response:

I'll post a sed solution which is rather light.

sed '$d' input.txt | sed "\$d; 1,2d; s/^\s*\|\s*$/'/g" | paste -sd ',' > output.txt
  • $d in the first sed removes the last line.
  • \$d in the second sed removes what is now the last line (originally the second-to-last). The $ is escaped with a backslash because we are within double quotes.
  • 1,2d removes the first two lines.
  • s/^\s*\|\s*$/'/g replaces all leading and trailing whitespace with single quotes.
  • paste concatenates the lines into a single comma-delimited string.

If we know that the relevant lines always start with two spaces, then it can even be simplified further.

sed -n "s/\s*$/'/; s/^  /'/p" input.txt | paste -sd ',' > output.txt
  • -n suppresses printing of lines unless told to
  • s/\s*$/'/ replaces trailing whitespace with a single quote
  • s/^  /'/p replaces two leading spaces with a single quote and prints the lines that match
  • paste joins the lines with commas

Then an awk solution:

awk -v i=1 -v q=\' 'FNR>2 {
    gsub(/^[[:space:]]*|[[:space:]]*$/, q)
    a[i++]=$0
} END {
    for(i=1; i<=length(a)-3; i++)
        printf "%s,", a[i]
    print a[i++]
}' input.txt > output.txt
  • -v i=1 creates an awk variable starting at one
  • -v q=\' creates an awk variable holding the single quote character
  • FNR>2 { ... tells it to only process lines from line 3 onward
  • gsub(/^[[:space:]]*|[[:space:]]*$/, q) substitutes leading and trailing whitespace with single quotes
  • a[i++]=$0 adds the line to the array
  • END { ... processes the rest after reaching end of file
  • for(i=1; i<=length(a)-3; i++) loops up to the length of the array minus three, so it stops before the last three entries
  • printf "%s,", a[i] prints all but the last three entries, each followed by a comma
  • print a[i++] prints the next entry and completes the script (skipping the last two entries)

CodePudding user response:

An overpiped solution, and with UUOC to boot, but easy to understand:

cat file.txt | tail -n +3 | head -n -2 | sed -E -e "s/^ */'/;s/ *$/'/" | paste -sd ',' -
'Abce','Bca','Efr'
  • tail -n +3 all but the first two lines
  • head -n -2 all but the last two lines
  • sed -E -e "s/^ */'/;s/ *$/'/" replace all leading and trailing space characters with a '
  • paste -sd ',' - join the lines with commas
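
A negative line count for head is a GNU extension and is not accepted by every head. Where it is missing, the last two lines can be dropped by running sed '$d' twice instead. A sketch of the same pipeline along those lines, without the cat; the function name join_quoted is made up for this example:

```shell
# join_quoted FILE: hypothetical helper that prints the joined line.
join_quoted() {
    sed '1,2d' "$1" |                  # all but the first two lines
    sed '$d' | sed '$d' |              # drop the last line, twice
    sed "s/^ */'/; s/ *\$/'/" |        # swap the outer spaces for quotes
    paste -sd, -                       # join the lines with commas
}

# Usage: join_quoted file.txt
```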

An awk solution that relies on the --- pattern instead of the line numbers:

awk '/^-+$/{if(f){exit}f=1;next}f{gsub(/^ *| *$/,"'\''");r=(r?r",":"")$0}END{print r}' file.txt
  • /^-+$/ { if(flag){ exit }; flag = 1; next; }
    when a line is composed only of dashes, set the flag (or go to END if the flag was already set).
  • flag { gsub(/^ *| *$/,"'"); result = (result ? result "," : "") $0; }
    for each line, while the flag is set, replace the leading and trailing space characters of the line with a ' and append the line to the result string.
  • END { print result; }
    at the end of the process, print the result string.