CSV Python: How to combine duplicates in column A and put A-attached values together in Column B?-CodePudding

In a CSV file, I have duplicate values in column A, but they have different values in column B. I would like the value (IP) in column A to be reduced to one line and the individual associated values to be listed next to it in column B, separated by a comma. I've tried different code examples but somehow nothing worked. I would be very happy if you could help me :)

I'm able to use every module working with Python 3.x

How it actually looks:

IP;Ports
192.168.3.1;53
192.168.3.2;22
192.168.3.2;80
192.168.3.2;2000
192.168.3.2;3000

How it should look:

IP;Ports
192.168.3.1;53
192.168.3.2;22, 80 ,2000, 3000

CodePudding user response：

You should specify the str datatype at load time:

df = pd.read_csv('input.csv', sep=';', dtype=str)
df.groupby('IP').agg(', '.join).reset_index().to_csv('output.csv', sep=';',
                                                     index=False)

It gives as expected:

IP;Ports
192.168.3.1;53
192.168.3.2;22, 80, 2000, 3000

CodePudding user response：

You can always use groupby with unique function:

df = pd.DataFrame(data={
    'IP': ['192.168.3.1', '192.168.3.2', '192.168.3.2', '192.168.3.2', '192.168.3.2'],
    'Port': [53, 22, 80, 2000, 3000]
})

grouped_df = df.groupby(['IP'])['Port'].unique().reset_index()

which will result as:

    IP          Port
0   192.168.3.1 [53]
1   192.168.3.2 [22, 80, 2000, 3000]