How can I combine several elements in columns with all other columns stay the same?-CodePudding

I need help with combining language columns into one row, and then drop duplicate columns, just combine two different language of the same Movie, year, and Id.

There are more similar columns in the CSV, so please help me figure out a way to combine those.Here is the existing csv:

f = pd.DataFrame({'Movie': ['name1','name1','name2','name3','name4','name4'],
                  'Year': ['1905', '1905','1906','1907','1910','1910'],
                  'Id': ['tt0283985', 'tt0283985','tt0284043','tt3402904','tt3458360','tt3458360'],
                  'language':['Mandarian','Cantonese','Mandarian','unknown','Cantonese','Mandarian']})

Where f now looks like:

   Movie  Year         Id   language
0  name1  1905  tt0283985  Mandarian
1  name1  1905  tt0283985  Cantonese
2  name2  1906  tt0284043  Mandarian
3  name3  1907  tt3402904    unknown
4  name4  1910  tt3458360  Cantonese
5  name4  1910  tt3458360  Mandarian

And the result should be like this:

   Movie  Year         Id             language
0  name1  1905  tt0283985  Mandarian,Cantonese
1  name2  1906  tt0284043            Mandarian
2  name3  1907  tt3402904              unknown
3  name4  1910  tt3458360  Cantonese,Mandarian

So row 1 and 2 are identical except language, we just need to combine them into one row, so does row 5 and 6. Here's my try:

ff = new_f.groupby(by = ['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv", index=False)

But the output is weird: All other columns disappeared and language is the only column left:

language
Mandarian,Cantonese
Mandarian
unknown
Cantonese,Mandarian

...

CodePudding user response：

By default, groupby sets the grouping keys as index, and you explicitly asked to_csv not to export the index,

Use as_index=False in groupby:

ff = f.groupby(by=['Movie','Year','Id'], as_index=False).agg(','.join)
ff.to_csv("File.csv", index=False)

Or, export the index in to_csv:

ff = f.groupby(by=['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv")

NB. if the intermediate is not useful to you, you do not need to set the ff variable, you can directly chain f.groupby(...).agg(...).to_csv(...)

CodePudding user response：

Use the following command: ff = f.groupby(['Movie','Year','Id']).agg(','.join).reset_index()

It should work.