Joining two files that both have duplicate rows

I am trying to join two files that have identical column 1 and different column 2:

File1

    aaa 1
    bbb 3
    bbb 3
    ccc 1
    ccc 1
    ccc 0

File2

    aaa 2
    bbb 2
    bbb 2
    ccc 1
    ccc 1
    ccc 0

When I try to join them with

    join File1 File2 > File3

I get

    aaa 1 2
    bbb 3 2
    bbb 3 2
    bbb 3 2
    bbb 3 2
    ccc 1 1
    ccc 1 1
    ccc 1 0
    ccc 1 1
    ccc 1 1
    ccc 1 0
    ccc 0 1
    ccc 0 1
    ccc 0 0

join expands the duplicates, pairing every line in File1 with every line in File2 that has the same key. All I want is for it to go line by line, so the output should be

    aaa 1 2
    bbb 3 2
    bbb 3 2
    ccc 1 1
    ccc 1 1
    ccc 0 0

How do I tell join to ignore duplicates and just combine the files line-by-line?

EDIT: This is being done in a loop with multiple files that all have the same column 1 but different column 2. I am joining the first two files into a temporary file and then looping through the other files joining with that temporary file.

CodePudding user response:

Assumptions:

- all files have the same number of rows
- all files have the same value in the first column for the same numbered row
- the final result set can fit into memory
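
The first two assumptions can be checked before merging. A minimal sketch, assuming single-space-separated files; `check_keys` is a hypothetical helper name, not part of the original answer:

```shell
# check_keys (hypothetical helper): succeeds only if the first column of every
# file matches the first file's, row for row.
check_keys() {
    ref=$1
    shift
    keys=$(mktemp) || return 1
    cut -d ' ' -f 1 "$ref" > "$keys"
    status=0
    for f in "$@"; do
        # cmp -s prints nothing and fails on any difference, including a
        # different number of rows.
        if ! cut -d ' ' -f 1 "$f" | cmp -s - "$keys"; then
            echo "first column of $f differs from $ref" >&2
            status=1
        fi
    done
    rm -f "$keys"
    return "$status"
}
```

It could be called as `check_keys f1 f2 f3 f4 && echo "keys match"` before running the merge.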

Sample input:

    $ for f in f{1..4}
    do
        echo "############ $f"
        cat "$f"
    done
    ############ f1
    aaa 1
    bbb 3
    bbb 3
    ccc 1
    ccc 1
    ccc 0
    ############ f2
    aaa 2
    bbb 2
    bbb 2
    ccc 1
    ccc 1
    ccc 0
    ############ f3
    aaa 12
    bbb 12
    bbb 12
    ccc 11
    ccc 11
    ccc 10
    ############ f4
    aaa 202
    bbb 202
    bbb 202
    ccc 201
    ccc 201
    ccc 200

One awk idea:

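    awk '
    # first file: store each whole line, indexed by row number
    FNR==NR { a[FNR]=$0; next }
    # later files: append column 2 to the stored row
            { a[FNR]=a[FNR] OFS $2 }
    END     { for (i=1;i<=FNR;i++)
                  print a[i]
            }
    ' f1 f2 f3 f4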

This generates:

    aaa 1 2 12 202
    bbb 3 2 12 202
    bbb 3 2 12 202
    ccc 1 1 11 201
    ccc 1 1 11 201
    ccc 0 0 10 200

CodePudding user response:

Based on a suggestion from @Andre Wildberg, this worked best:

    paste File1 <(cut -d " " -f 2 File2)
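
One caveat: paste separates its inputs with a tab by default, so the command above produces `aaa 1<TAB>2` rather than the all-space layout shown in the question. A minimal sketch, assuming single-space-separated input, that passes `-d ' '` to get spaces throughout; the scratch directory and the `merged` variable exist only for illustration, and the second input is read from stdin so the sketch does not depend on process substitution:

```shell
# Scratch copies of the question's sample files (illustrative only).
dir=$(mktemp -d)
printf '%s\n' 'aaa 1' 'bbb 3' 'bbb 3' 'ccc 1' 'ccc 1' 'ccc 0' > "$dir/File1"
printf '%s\n' 'aaa 2' 'bbb 2' 'bbb 2' 'ccc 1' 'ccc 1' 'ccc 0' > "$dir/File2"

# -d ' ' makes paste join columns with a space instead of the default tab;
# "-" tells paste to read the second input from stdin.
merged=$(cut -d ' ' -f 2 "$dir/File2" | paste -d ' ' "$dir/File1" -)
printf '%s\n' "$merged"
rm -rf "$dir"
```

With the sample data this prints the six space-separated lines the question asks for.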

This allowed me to loop through a list of files:

    cat File1 > tmp

    for file in $files
    do
        paste tmp <(cut -d " " -f 2 "$file") > tmpf
        mv tmpf tmp
    done

    mv tmp FinalFile
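
If the temporary files become a nuisance, the loop can probably be avoided by pasting all the files at once and keeping only the first key plus every value column. A sketch, assuming every file has exactly two space-separated fields and the same row order; `merge_cols` is a hypothetical name:

```shell
# merge_cols (hypothetical helper): print column 1 once, followed by column 2
# of every file given, row by row.
merge_cols() {
    # paste lays the files side by side: key1 val1 key2 val2 ...
    # awk then keeps field 1 and every even-numbered field.
    paste -d ' ' "$@" | awk '{
        out = $1
        for (i = 2; i <= NF; i += 2) out = out " " $i
        print out
    }'
}
```

Usage would be along the lines of `merge_cols $files > FinalFile`. The trade-off is that paste has to open every file at once, so very long file lists may run into the system's open-file limit, which the loop above does not.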