file1.csv
col1,col2,col3
a,b,c
b,v,n
x,u,v
t,m,m
file2.csv
col1,col2,col3
p,m,n
a,z,i
col1 acts as a primary key in both files. If any value of col1 of file2.csv occurs in file1.csv, then this row will be discarded from file1.csv.
output file:
col1,col2,col3
b,v,n
x,u,v
t,m,m
Note: I'm interested in Unix based solution. please provide a solution sort or uniq cmd or anything else.
CodePudding user response:
There's a join command on Unix that does pretty much exactly what you want:
join -v1 -t, \
<(tail 2 file1.txt | sort -k1 -t,) \
<(tail 2 file2.txt | sort -k1 -t,)
For the sample files you've given, this is its output:
b,v,n
t,m,m
x,u,v
Command Breakdown
join -v1 -t,-v1: display rows from the 1st file that are not pairable with lines in the 2nd file via the join column (default column used for joining is 1, but it's overridable via the-1and-2options)-t,: use comma as a field/column separator
<(tail 2 file1.txt | sort -k1 -t,)<( … ): thejoincommand expects file names as arguments, so we use process substitution to create such temporary files from the output of the nested commandstail 2 file1.txt: skip the header linesort -k1 -t,: thejoincommand expects sorted files-t,: use comma as a field/column separator-k1: sort by the 1st field
