I have a dataframe of album ratings, and I'm trying to add a column that states the number of albums each artist produced (that is, how many times each 'artist' value appears in the df).
I created a separate df with two columns, 'artist' and 'number_of_albums', and thought I could then add the 'number_of_albums' value to the original df according to the artist in each row.
dups_artists = df.pivot_table(columns=['artist'], aggfunc='size')
artists_df = pd.DataFrame({'artist':dups_artists.index, 'number_of_albums':dups_artists.values})
However, I'm not sure how to do that. It also seems like there must be a simpler way to achieve the result...
CodePudding user response:
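# Index both frames by 'artist' so that join can align them on it
left = df.set_index('artist')
right = artists_df.set_index('artist')
# join returns a new frame; reset_index turns 'artist' back into a column
result = left.join(right).reset_index()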
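Alternatively, you can pass on='artist' when joining, so that only the right-hand frame needs 'artist' as its index: df.join(right, on='artist').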
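As for the simpler way the question asks about, here is a minimal sketch of two options. The sample data and the names counts and joined are made up for illustration; only the 'artist' and 'number_of_albums' column names come from the question. The first option is the on='artist' join, and the second skips the lookup table and broadcasts each group's size back onto its rows with groupby and transform.

```python
import pandas as pd

# Hypothetical sample data; only the 'artist' column name comes from the question.
df = pd.DataFrame({
    "artist": ["A", "B", "A", "C", "A"],
    "album": ["a1", "b1", "a2", "c1", "a3"],
    "rating": [7.5, 8.0, 6.0, 9.1, 8.3],
})

# Lookup table: a Series indexed by artist holding the number of rows per artist.
counts = df.groupby("artist").size().rename("number_of_albums")

# Option 1: join on the 'artist' column, keeping df's original index.
joined = df.join(counts, on="artist")

# Option 2: no lookup table; transform returns one value per original row.
df["number_of_albums"] = df.groupby("artist")["artist"].transform("size")

print(df)
```

Both options count rows per artist, so they only give the number of albums if each album appears once in the df.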
