In a CSV file, I have duplicate values in column A, but they have different values in column B. I would like the value (IP) in column A to be reduced to one line and the individual associated values to be listed next to it in column B, separated by a comma. I've tried different code examples but somehow nothing worked. I would be very happy if you could help me :)
I'm able to use every module working with Python 3.x
How it actually looks:
IP;Ports
192.168.3.1;53
192.168.3.2;22
192.168.3.2;80
192.168.3.2;2000
192.168.3.2;3000
How it should look:
IP;Ports
192.168.3.1;53
192.168.3.2;22, 80 ,2000, 3000
CodePudding user response:
You should specify the str datatype at load time:
df = pd.read_csv('input.csv', sep=';', dtype=str)
df.groupby('IP').agg(', '.join).reset_index().to_csv('output.csv', sep=';',
index=False)
It gives as expected:
IP;Ports
192.168.3.1;53
192.168.3.2;22, 80, 2000, 3000
CodePudding user response:
You can always use groupby with unique function:
df = pd.DataFrame(data={
'IP': ['192.168.3.1', '192.168.3.2', '192.168.3.2', '192.168.3.2', '192.168.3.2'],
'Port': [53, 22, 80, 2000, 3000]
})
grouped_df = df.groupby(['IP'])['Port'].unique().reset_index()
which will result as:
IP Port
0 192.168.3.1 [53]
1 192.168.3.2 [22, 80, 2000, 3000]
