I want to move some rows from df1 to df2 when calories in df1 and df2 are the same. The two dfs have the same columns.
import numpy as np
import pandas as pd
np.random.seed(0)
df1 = pd.DataFrame(data = {
"calories": [420, 80, 90, 10],
"duration": [50, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
"calories": [420, 380, 390],
"duration": [60, 40, 45]
})
print(df1)
print(df2)
calories duration
0 420 50
1 80 4
2 90 5
3 10 2
calories duration
0 420 60
1 380 40
2 390 45
rows = df1.loc[df1.calories == df2.calories, :]
df2 = df2.append(rows, ignore_index=True)
df1.drop(rows.index, inplace=True)
print('df1:')
print(df1)
print('df2:')
print(df2)
Then it reports this error:
raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
EDIT: Solution
def move_rows(df1, df2):
for index, row in df1.iterrows():
if row['calories'] in df2.values:
df2 = df2.append(row, ignore_index=True)
df1.drop(index, inplace=True)
return df1, df2
CodePudding user response:
Since your dataframes are not the same length, you need to use merge to find rows with common calories values. You need to merge on the index and calories values; that can most easily be achieved by using reset_index to temporarily add an index column to merge on:
dftemp = df1.reset_index().merge(df2.reset_index(), on=['index', 'calories'], suffixes=['', '_y'])
Output:
index calories duration duration_y
0 0 420 50 60
You can now concat the calories and duration values from dftemp to df2 (using reset_index again to reset the index):
df2 = pd.concat([df2, dftemp[['calories', 'duration']]]).reset_index(drop=True)
Output (for your sample data):
calories duration
0 420 60
1 380 40
2 390 45
3 420 50
To remove the rows that were copied to df2 from df1, we merge just on index, then filter out rows where the two calories values are different:
dftemp = df1.merge(df2, left_index=True, right_index=True, suffixes=['', '_y']).query('calories != calories_y')
df1 = dftemp[['calories', 'duration']].reset_index(drop=True)
Output (for your sample data):
calories duration
0 80 4
1 90 5
2 10 3
