I am trying to join two files that have identical column 1 and different column 2:
File1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
File2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
When I try to join them with
join File1 File2 > File3
I get
aaa 1 2
bbb 3 2
bbb 3 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 1 0
ccc 1 1
ccc 1 1
ccc 1 0
ccc 0 1
ccc 0 1
ccc 0 0
join is trying to expand the duplicates when all I want it to do is go line-by line so the output should be
aaa 1 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 0 0
How do I tell join to ignore duplicates and just combine the files line-by-line?
EDIT: This is being done in a loop with multiple files that all have the same column 1 but different column 2. I am joining the first two files into a temporary file and then looping through the other files joining with that temporary file.
CodePudding user response:
Assumptions:
- all files have the same number of rows
- all files have the same values in the first column for the same numbered row
- the final result set can fit into memory
Sample input:
$ for f in f{1..4}
do
echo "############ $f"
cat $f
done
############ f1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
############ f2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
############ f3
aaa 12
bbb 12
bbb 12
ccc 11
ccc 11
ccc 10
############ f4
aaa 202
bbb 202
bbb 202
ccc 201
ccc 201
ccc 200
One awk idea:
awk '
FNR==NR { a[FNR]=$0; next }
{ a[FNR]=a[FNR] OFS $2 }
END { for (i=1;i<=FNR;i )
print a[i]
}
' f1 f2 f3 f4
This generates:
aaa 1 2 12 202
bbb 3 2 12 202
bbb 3 2 12 202
ccc 1 1 11 201
ccc 1 1 11 201
ccc 0 0 10 200
CodePudding user response:
Based on a suggestion from @Andre Wildberg, this worked best:
paste File1 <(cut -d " " -f 2 File2)
This allowed be to loop through a list of files:
cat File1 > tmp
for file in $files
do
paste tmp <(cut -d " " -f 2 $file) > tmpf
mv tmpf tmp
done
mv tmp FinalFile
