I have the following table:
import pandas as pd
data = [['abc', 'bin_1', 'bin_2'], ['abc', 'bin_1', 'bin_1']]
data = pd.DataFrame(data, columns=['name', 'bin1', 'bin2'])
I want to merge the columns bin1 and bin2.
As you can see, the two columns can hold the same value in a given row.
I want to join the two values with | if they differ, and otherwise keep just the single unique value.
data["bin"] = data[['bin1', 'bin2']].agg(' | '.join, axis=1)
Unfortunately, this joins the values even when they are identical:
name bin1 bin2 bin
abc bin_1 bin_2 bin_1 | bin_2
abc bin_1 bin_1 bin_1 | bin_1
whereas I want:
name bin1 bin2 bin
abc bin_1 bin_2 bin_1 | bin_2
abc bin_1 bin_1 bin_1
CodePudding user response:
Use sets if order is not important:
data["bin"] = data[['bin1', 'bin2']].agg(lambda x: ' | '.join(set(x)), axis=1)
print(data)
name bin1 bin2 bin
0 abc bin_1 bin_2 bin_1 | bin_2
1 abc bin_1 bin_1 bin_1
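Note that a set has no guaranteed iteration order, so the first row may also come out as bin_2 | bin_1. If alphabetical order is acceptable (an assumption, the question does not say), a minimal sketch is to sort the unique values before joining:

```python
import pandas as pd

data = pd.DataFrame(
    [["abc", "bin_1", "bin_2"], ["abc", "bin_1", "bin_1"]],
    columns=["name", "bin1", "bin2"],
)

# set() drops duplicates within the row; sorted() makes the order deterministic
data["bin"] = data[["bin1", "bin2"]].agg(
    lambda x: " | ".join(sorted(set(x))), axis=1
)
print(data)
```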
Or dict.fromkeys if ordering is important:
data["bin"] = data[['bin1', 'bin2']].agg(lambda x: ' | '.join(dict.fromkeys(x)), axis=1)
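For reference, here is a self-contained sketch of the dict.fromkeys variant. The helper name join_unique and the third row (bin_2 before bin_1) are made up for illustration; the extra row is only there to show that the left-to-right column order is kept:

```python
import pandas as pd

data = pd.DataFrame(
    [
        ["abc", "bin_1", "bin_2"],
        ["abc", "bin_1", "bin_1"],
        ["abc", "bin_2", "bin_1"],  # hypothetical extra row, not in the question
    ],
    columns=["name", "bin1", "bin2"],
)


def join_unique(row, sep=" | "):
    # dict.fromkeys keeps the first occurrence of each value, in order
    return sep.join(dict.fromkeys(row))


data["bin"] = data[["bin1", "bin2"]].agg(join_unique, axis=1)
print(data)
```

This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on. It also assumes every cell is a string; missing values (NaN) would need to be dropped or converted first.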
