awk prints unnecessary new line in output

I am trying to create a CSV file from the data below (Elasticsearch data):

health status index                                                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   78_data_store-2021.12.12                              cYZDB7NGQbyowP-WyF99Zw   1   1    5438232            0      8.3gb          4.1gb
green  open   78_data_store-2021.12.24                             RWrhN4QKT2OlbP4MB7CYmw   1   1     663431            0    745.3mb        372.6mb
green  open   78_data_store-2021.11.26                              CivwBCtAROCejmpZ6RaXOA   1   1     989983            0    956.5mb        478.2mb

I want the output below (expected output):

78,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb
78,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
78,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb

But when I use the command below, I get an unnecessary newline in the output:

cat filename | grep " 78_" | sort -k 3 | awk '{split($3,a,"_");print a[1];split($3,a,"-"); print ","  $3 "," a[2] "," $5 "," $6 "," $10}'

Output of this command:

78
,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb
78
,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
78
,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb

What is missing in the command?

CodePudding user response：

print adds a newline by default. You can use printf("%s", a[1]); instead, or move the printing of a[1] to where all the other fields are printed. I've renamed the first use of a to b so that its value is kept until later:

grep " 78_" filename | sort -k 3 | \
awk '{split($3,b,"_");split($3,a,"-"); print b[1] ","  $3 "," a[2] "," $5 "," $6 "," $10}'

Output:

78,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb
78,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
78,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb
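
For completeness, here is a minimal sketch of the other option mentioned above: keeping the OP's two split calls and only swapping the first print for printf. The sample variable and the printf_variant function name are made up for this illustration; the three rows are copied from the question with the column spacing condensed.

```shell
# Sample rows copied from the question (header line omitted, spacing condensed).
sample='green  open   78_data_store-2021.12.12   cYZDB7NGQbyowP-WyF99Zw   1   1   5438232   0   8.3gb     4.1gb
green  open   78_data_store-2021.12.24   RWrhN4QKT2OlbP4MB7CYmw   1   1   663431    0   745.3mb   372.6mb
green  open   78_data_store-2021.11.26   CivwBCtAROCejmpZ6RaXOA   1   1   989983    0   956.5mb   478.2mb'

# printf writes a[1] without a trailing newline, so the print that
# follows continues on the same output line.
printf_variant() {
  printf '%s\n' "$sample" | grep " 78_" | sort -k 3 |
    awk '{split($3,a,"_"); printf("%s", a[1]); split($3,a,"-"); print "," $3 "," a[2] "," $5 "," $6 "," $10}'
}

printf_variant
```

With these three rows it should print the same three CSV lines as the version above. The only behavioural difference from the OP's command is that the newline after a[1] is gone.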

CodePudding user response：

Based on your shown samples, please try the following awk code. It uses a Schwartzian transform (decorate, sort, undecorate), built here as an awk, sort, awk pipeline.

awk '
BEGIN{ OFS="," }
FNR>1 && /78_/{
  split($3,arr,"[_-]")
  print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF
}
' Input_file | 
sort -t'@' -k1 | 
awk '{sub(/^[^@]*@/,"")} 1'

Explanation of the above code:

Pass Input_file (the OP's file) to the awk program.
Set OFS to a comma so that print joins its arguments with commas on every line.
Proceed only if the line comes after the 1st line (which skips the header) and contains 78_.
Use the split function to break the 3rd field into an array named arr, with _ and - as the delimiters.
Print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF, which is the required output with arr[4]@ added at the front so that it can be sorted easily (the prefix is removed later in the pipeline).
Pass the awk program's output to the sort command, setting the field separator to @ and sorting on a key that starts at the 1st field (e.g. 2021.12.12 in the shown samples); -k1,1 would restrict the key to that field alone.
Pass the sorted data to another awk program, which removes everything from the start of the line up to the 1st occurrence of @ (the prefix added in the earlier step).
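
To make the intermediate step visible, here is a small sketch that runs only the first awk stage and then the full pipeline. The sample variable and the decorate function name are invented for this illustration; the data is the OP's listing with the column spacing condensed.

```shell
# Header plus the three rows from the question (spacing condensed).
sample='health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green  open   78_data_store-2021.12.12   cYZDB7NGQbyowP-WyF99Zw   1   1   5438232   0   8.3gb     4.1gb
green  open   78_data_store-2021.12.24   RWrhN4QKT2OlbP4MB7CYmw   1   1   663431    0   745.3mb   372.6mb
green  open   78_data_store-2021.11.26   CivwBCtAROCejmpZ6RaXOA   1   1   989983    0   956.5mb   478.2mb'

# Decorate: prefix each output line with the date and an @ marker.
decorate() {
  printf '%s\n' "$sample" | awk '
  BEGIN{ OFS="," }
  FNR>1 && /78_/{
    split($3,arr,"[_-]")
    print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF
  }'
}

# Stage 1 only: lines are still in input order and carry the sort prefix.
decorate
# 2021.12.12@78,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
# 2021.12.24@78,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb
# 2021.11.26@78,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb

# Sort on the prefix, then undecorate by stripping everything up to the first @.
decorate | sort -t'@' -k1 | awk '{sub(/^[^@]*@/,"")} 1'
```

The second command should print the three expected CSV lines in date order. The dates sort correctly as plain text only because they are zero-padded in year.month.day form.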

Improvements on the OP's attempt:

There is no need for grep when using awk: awk can match the string itself, so grep has been removed from the answer.
There is no need to call split twice: the two calls can be merged into a single split by specifying multiple separators.
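
As a quick check of the merged split, the sketch below prints what a single call with the separator "[_-]" yields for one index name from the question. The show_split function name is made up for this illustration.

```shell
# One split call with a bracket expression as the separator:
# the string is cut at every _ and every -.
show_split() {
  awk 'BEGIN {
    n = split("78_data_store-2021.12.12", arr, "[_-]")
    for (i = 1; i <= n; i++) print i, arr[i]
  }'
}

show_split
# 1 78
# 2 data
# 3 store
# 4 2021.12.12
```

This is why the answer reads the prefix from arr[1] and the date from arr[4]. If an index name had a different number of _-separated parts, the date would land at a different position; taking the last element (arr[n], with n being split's return value) would likely be a safer choice in that case.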