I am trying to create a CSV file from the data below (Elasticsearch data):
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open 78_data_store-2021.12.12 cYZDB7NGQbyowP-WyF99Zw 1 1 5438232 0 8.3gb 4.1gb
green open 78_data_store-2021.12.24 RWrhN4QKT2OlbP4MB7CYmw 1 1 663431 0 745.3mb 372.6mb
green open 78_data_store-2021.11.26 CivwBCtAROCejmpZ6RaXOA 1 1 989983 0 956.5mb 478.2mb
I want the output below (expected output):
78,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb
78,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
78,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb
But when I use the command below, I get an unnecessary newline in the output:
cat filename | grep " 78_" | sort -k 3 | awk '{split($3,a,"_");print a[1];split($3,a,"-"); print "," $3 "," a[2] "," $5 "," $6 "," $10}'
Output of this command:
78
,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb
78
,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
78
,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb
What is missing in the command?
CodePudding user response:
print adds a newline by default. You can use printf("%s", a[1]); instead, or move the printing of a[1] to where all the other fields are printed. I've renamed the first use of a to b to be able to keep the value until later:
grep " 78_" filename | sort -k 3 | \
awk '{split($3,b,"_");split($3,a,"-"); print b[1] "," $3 "," a[2] "," $5 "," $6 "," $10}'
Output:
78,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb
78,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
78,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb
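For comparison, here is a minimal sketch of the other option mentioned above: keeping the original two-step printing but swapping the first print for printf, which does not append a newline. The scratch file and the helper name csv_with_printf are assumptions made for this illustration; the data is the sample from the question.

```shell
# Scratch copy of the sample data from the question (temp file is only for this demo).
sample=$(mktemp)
cat > "$sample" <<'EOF'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open 78_data_store-2021.12.12 cYZDB7NGQbyowP-WyF99Zw 1 1 5438232 0 8.3gb 4.1gb
green open 78_data_store-2021.12.24 RWrhN4QKT2OlbP4MB7CYmw 1 1 663431 0 745.3mb 372.6mb
green open 78_data_store-2021.11.26 CivwBCtAROCejmpZ6RaXOA 1 1 989983 0 956.5mb 478.2mb
EOF

# Hypothetical helper name, for illustration only.
csv_with_printf() {
  grep " 78_" "$1" | sort -k 3 |
  awk '{
    split($3, a, "_")
    printf("%s", a[1])     # printf emits the prefix without a trailing newline
    split($3, a, "-")      # reusing a is safe here: a[1] was already printed
    print "," $3 "," a[2] "," $5 "," $6 "," $10
  }'
}

csv_with_printf "$sample"
```

This should print the same three lines as the output above, since the only change is that the prefix and the rest of the row now end up on one line.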
CodePudding user response:
Based on your shown samples, please try the following awk code. It uses a Schwartzian transform (decorate, sort, undecorate), implemented here as an awk, sort, awk pipeline.
awk '
BEGIN{ OFS="," }
FNR>1 && /78_/{
split($3,arr,"[_-]")
print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF
}
' Input_file |
sort -t'@' -k1 |
awk '{sub(/^[^@]*@/,"")} 1'
Explanation for the above code:
- Pass Input_file (OP's file) to the awk program.
- Set OFS to a comma for all lines.
- Check that the line is past the 1st line (FNR>1) and contains 78_; only then proceed.
- Use the split function to split the 3rd field into an array named arr, where the delimiters are _ and -.
- Print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF, which is the needed output, except that arr[4]@ is added at the front so that the output can be sorted easily (it is removed later in the pipeline).
- Pass the awk program's output to the sort command, setting the field separator to @ and sorting on the 1st field (e.g. 2021.12.12 in the shown samples).
- Pass the sorted data to another awk program, which removes everything from the start of the line up to the 1st occurrence of @ (added in the earlier step).
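To make the decorate step concrete, here is a minimal sketch that first runs only the first awk stage on the sample data from the question, so the temporary @-prefixed sort key is visible, and then runs the full pipeline. The scratch file and the helper name decorate are assumptions made for this illustration.

```shell
# Scratch copy of the sample data from the question (temp file is only for this demo).
sample=$(mktemp)
cat > "$sample" <<'EOF'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open 78_data_store-2021.12.12 cYZDB7NGQbyowP-WyF99Zw 1 1 5438232 0 8.3gb 4.1gb
green open 78_data_store-2021.12.24 RWrhN4QKT2OlbP4MB7CYmw 1 1 663431 0 745.3mb 372.6mb
green open 78_data_store-2021.11.26 CivwBCtAROCejmpZ6RaXOA 1 1 989983 0 956.5mb 478.2mb
EOF

# Hypothetical helper: stage 1 only (prefix each row with its sort key and @).
decorate() {
  awk '
  BEGIN{ OFS="," }
  FNR>1 && /78_/{
    split($3,arr,"[_-]")
    print arr[4]"@"arr[1],$3,arr[4],$5,$6,$NF
  }
  ' "$1"
}

# Decorated rows, still in input order:
decorate "$sample"
# 2021.12.12@78,78_data_store-2021.12.12,2021.12.12,1,1,4.1gb
# 2021.12.24@78,78_data_store-2021.12.24,2021.12.24,1,1,372.6mb
# 2021.11.26@78,78_data_store-2021.11.26,2021.11.26,1,1,478.2mb

# Stages 2 and 3: sort on the key, then strip it again.
decorate "$sample" | sort -t'@' -k1 | awk '{sub(/^[^@]*@/,"")} 1'
```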
Improvements in OP's attempts:
- There is no need to use grep when using awk; awk can search for the string itself, so grep is removed from the answer.
- There is no need to call split twice; the two calls can be merged into a single split by giving it multiple separators.
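As a rough sketch of both improvements together, the variant below uses one awk with a single split and no grep. It also goes one step beyond the answer above, which is an assumption of this sketch and not part of the original: because the date is already the 3rd comma-separated column of the output, sort can use that column directly, without a temporary @ key. The scratch file and the helper name to_csv are hypothetical.

```shell
# Scratch copy of the sample data from the question (temp file is only for this demo).
sample=$(mktemp)
cat > "$sample" <<'EOF'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open 78_data_store-2021.12.12 cYZDB7NGQbyowP-WyF99Zw 1 1 5438232 0 8.3gb 4.1gb
green open 78_data_store-2021.12.24 RWrhN4QKT2OlbP4MB7CYmw 1 1 663431 0 745.3mb 372.6mb
green open 78_data_store-2021.11.26 CivwBCtAROCejmpZ6RaXOA 1 1 989983 0 956.5mb 478.2mb
EOF

# Hypothetical helper name, for illustration only.
to_csv() {
  awk '
  BEGIN{ OFS="," }
  FNR>1 && /78_/{
    split($3,arr,"[_-]")               # one split on both _ and -
    print arr[1],$3,arr[4],$5,$6,$NF   # prefix, index, date, pri, rep, pri.store.size
  }
  ' "$1" | sort -t, -k3,3              # sort on the date column of the CSV
}

to_csv "$sample"
```

On the sample data this should give the same three lines as the expected output in the question. The decorated version remains the more general choice when the sort key is not one of the output columns.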
