Home > database >  The flow of stdout from combined commands
The flow of stdout from combined commands

Time:01-28

I need to edit a bash script that sorts .vcf files. vcf files are roughly structured as shown below:

## header line
## header line
…
Data line
Data line
…

The script is called vcfsort and is part of a library for manipulating vcf files. It looks like this:

head -1000 $1 | grep "^#"; cat $@ | grep -v "^#" | sort -k1,1d -k2,2n

And it is run by writing vcfsort input.vcf > output.vcf. I understand roughly what it does: since sorting should only be done on the data lines, it gets the header lines:

head -1000 $1 | grep "^#";

And combines it with sorted data lines:

cat $@ | grep -v "^#" | sort -k1,1d -k2,2n

I need the head command to read more lines. Instead of calling vcfsort like above, I thought I could just edit the script myself and write it out directly as a command like this:

head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf

This does not work as expected. My attempt above writes the correct output to stdout, if I leave out > output.vcf. However, if I include it, only the data lines are written to file and the header lines are written to stdout. So, I have a couple of questions:

  1. In this stack overflow answer, it is said that to combine semicolon-separated commands, they should be enclosed in parentheses. Why is that not the case in the vcfsort script?
  2. Why is $@ used in the cat command instead of $1? $@ should refer to all of a shell scripts arguments, but since only one is given (the input file), why not just use $1? If there is a reason for this, how can I transfer that to my command line expression?
  3. Why do I only get part of the stdout when I send it to a file?
  4. Could you show me the edits I need to make to get my command to work as intended?

CodePudding user response:

So the script gets first 1000 lines of first file! Separates header, and basically just copy all comments in those first 1000 lines to output.

Next, it filters all comments lines (leaving only data lines) for all files, and does sorting.

so if you use

vcfsort file1 file2 file3

$1 = "file1" and header from file1 only will be presented in output.

while $@ referring to all files: "file1 file2 file3"

if you need to get headers from all files and merge it - I would recommend to use loop.

for file in $@; do
    head -1000 $file | grep "^#"; 
done
cat $@ | grep -v "^#" | sort -k1,1d -k2,2n

Why do I only get part of the stdout when I send it to a file?

head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf

Each command executing separatelly (divided by semicolon ";"). So in example above you just redirecting data lines output after sorting. It doesn't redirect to file header part. I would recommend to delete redirecting to file and just use:

vcfsort input.vcf > output.vcf

This does not work as expected

May I know what was expected?

CodePudding user response:

There are two command lists, separated by a ;, inside vcfsort:

  • head -1000 $1 | grep "^#"
  • cat $@ | grep -v "^#" | sort -k1,1d -k2,2n

Each list is a single pipeline. The final two commands in each pipeline inherit their standard output from vcfsort, so that when you run

vcfsort input.vcf > output.vcf

both grep and sort write to output.vcf.

The equivalent using braces would be (replacing ; with a newline for readability)

# Quoting the parameter expansions is important, to protect
# against word-splitting and pathname expansion of the original arguments.
{ head -1000 "$1" | grep "^#"
  cat "$@" | grep -v "^#" | sort -k1,1d -k2,2n
} > output.vcf

Output redirections apply only to a single command, not a command list. Here, a command group serves as that single command: the standard output of the command group is output.vcf, and the two lists in the group inherit that just as before.


Your attempt

head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf

only opened output.vcf to use as the standard output for sort; the standard output of grep remains whatever standard output it inherits from its parent, namely your terminal.

  •  Tags:  
  • Related