Pandas: aggregate and join if different string

Time:01-21

I have the following table:

import pandas as pd

data = [['abc', 'bin_1', 'bin_2'], ['abc', 'bin_1', 'bin_1']]
data = pd.DataFrame(data, columns=['name', 'bin1', 'bin2'])

I want to merge the columns bin1 and bin2 into a single column. As you can see, both columns can hold the same value in a given row. If the two values differ, I want to join them with " | "; if they are the same, I want just the single unique value. I tried:

data["bin"] = data[['bin1', 'bin2']].agg(' | '.join, axis=1)

Unfortunately, this joins the values even when they are identical:

name    bin1    bin2    bin
abc bin_1   bin_2   bin_1 | bin_2
abc bin_1   bin_1   bin_1 | bin_1

What I want is:

name    bin1    bin2    bin
abc bin_1   bin_2   bin_1 | bin_2
abc bin_1   bin_1   bin_1

CodePudding user response:

Use sets if order is not important:

data["bin"] = data[['bin1', 'bin2']].agg(lambda x: ' | '.join(set(x)), axis=1)
print(data)
  name   bin1   bin2            bin
0  abc  bin_1  bin_2  bin_1 | bin_2
1  abc  bin_1  bin_1          bin_1
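One caveat with the set version: Python does not guarantee the iteration order of a set of strings, so the first row could also come out as "bin_2 | bin_1" on another run. If alphabetical order is acceptable (an assumption, the question does not ask for it), wrapping the set in sorted() should make the result reproducible. A minimal sketch:

```python
import pandas as pd

data = pd.DataFrame(
    [["abc", "bin_1", "bin_2"], ["abc", "bin_1", "bin_1"]],
    columns=["name", "bin1", "bin2"],
)

# set() drops duplicate values within the row; sorted() then fixes the
# order alphabetically so it does not depend on set iteration order.
data["bin"] = data[["bin1", "bin2"]].agg(
    lambda x: " | ".join(sorted(set(x))), axis=1
)
print(data)
```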

Or dict.fromkeys if ordering is important:

data["bin"] = data[['bin1', 'bin2']].agg(lambda x: ' | '.join(dict.fromkeys(x)), axis=1)
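To see what "ordering is important" means here, the sketch below runs the dict.fromkeys version end to end. The helper name join_unique and the third row (bin_2 before bin_1) are not from the question; they are added only to illustrate that the first occurrence of each value is kept in column order.

```python
import pandas as pd


def join_unique(row, sep=" | "):
    # Hypothetical helper: dict.fromkeys keeps only the first occurrence
    # of each value and preserves the order in which values appear.
    return sep.join(dict.fromkeys(row))


data = pd.DataFrame(
    [
        ["abc", "bin_1", "bin_2"],
        ["abc", "bin_1", "bin_1"],
        ["abc", "bin_2", "bin_1"],  # extra row, added for illustration
    ],
    columns=["name", "bin1", "bin2"],
)

# Apply the helper row by row (axis=1) to the two bin columns.
data["bin"] = data[["bin1", "bin2"]].agg(join_unique, axis=1)
print(data)
```

With this version the third row should read "bin_2 | bin_1", following the column order, whereas a set-based join gives no such guarantee.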