Copy contents from one Dataframe to another based on column values in Pandas-CodePudding

Two seperate similar DataFrames with different lengths

df2=

Index	First Name	Age	Gender	Weight
0	James	25	Male	155
1	John	27	Male	175
2	Patricia	23	Female	135
3	Mary	22	Female	125
4	Martin	30	Male	185
5	Margaret	29	Female	141
6	Kevin	22	Male	198

df1=

Index	First Name	Age	Gender	Weight	Height
0	James	25	Male	165	5'10
1	John	27	Male	175	5'9
2	Matthew	29	Male	183	6'0
3	Patricia	23	Female	135	5'3
4	Mary	22	Female	125	5'4
5	Rachel	29	Male	123	5'3
6	Jose	20	Male	175	5'11
7	Kevin	22	Male	192	6'2

df2 has some rows which are not in df1 and df1 has some values which are not in df2.

I am comparing df1 against df2. I have calculated the newentries with the following code

newentries = df2.loc[~df2['First Name'].isin(df1['First Name'])]
deletedentries = df1.loc[~df1['First Name'].isin(df2['First Name'])]

where newentries denote the rows/entries that are there in df2 but not in df1; deletedentries denote the rows/entries that are there in df1 but not in df2. The above code works perfectly fine.

I need to copy the height from df1 to df2 when the first names are equal.

df2.loc[df2['First Name'].isin(df1['First Name']),"Height"] = df1.loc[dfxh2['First Name'].isin(df2['First Name']),"Height"]

The above code copies the values however indexing is causing an issue and the values are not copied to the corresponding rows, I tried to promote First Name as the Index but that doesn't solve the issue. Please help me with a solution

Also, I need to calculate the modified values, if the First Name is same, I need to check for modified values; for example in df1, the weight of James is 155 however in df2 the weight is 165, so I need to store the modified weight of James(165) and index(0) in a new dataframe without iteration; the iteration takes a long time because this is a sample of a big dataframe with a lot more rows and columns.

Desired output:

df2=

Index	First Name	Age	Gender	Weight	Height
0	James	25	Male	155	5'10
1	John	27	Male	175	5'9
2	Patricia	23	Female	135	5'3
3	Mary	22	Female	125	5'4
4	Martin	30	Male	185
5	Margaret	29	Female	141
6	Kevin	22	Male	198	6'2

Martin's and Margaret's heights are not there in df1, so their heights are not updated in df2

newentries=

Index	First Name	Age	Gender	Weight	Height
4	Martin	30	Male	185
5	Margaret	29	Female	141

deletedentries=

Index	First Name	Age	Gender	Weight	Height
2	Matthew	29	Male	183	6'0
5	Rachel	29	Male	123	5'3
6	Jose	20	Male	175	5'11

modval=

Index	First Name	Age	Gender	Weight	Height
0	James			165
7	Kevin			192

CodePudding user response：

for your desired output for df2 you can try this:

desired_df2 = df2.merge(df1[['First Name','Height']], on='First Name', how='left')
#if you want to change the "NaN" values just add ".fillna(fill_value=0)" for e.g 0 after the merge
print(desired_df2)


  First Name  Age   Gender  Weight Height
0     James    25    Male      155   5'10
1      John    27    Male      175    5'9
2  Patricia    23  Female      135    5'3
3      Mary    22  Female      125    5'4
4    Martin    30    Male      185    NaN
5  Margaret    29  Female      141    NaN
6     Kevin    22    Male      198    6'2

new and deleted entries is already right. for the moment I'm a bit stuck how to get the modval dataframe. I'll update my answer if I get a solution.

CodePudding user response：

Building off of Rabinzel's answer:

output = df2.merge(df1, how='left', on='First Name', suffixes=[None, '_old'])
df3 = output[['First Name', 'Age', 'Gender', 'Weight', 'Height']]

cols = df1.columns[1:-1]
modval = pd.DataFrame()
for col in cols:
    modval = pd.concat([modval, output[['First Name', col   '_old']][output[col] != output[col   '_old']].dropna()])
    modval.rename(columns={col  '_old':col}, inplace=True)

newentries = df2[~df2['First Name'].isin(df1['First Name'])]
deletedentries = df1[~df1['First Name'].isin(df2['First Name'])]

print(df3, newentries, deletedentries, modval, sep='\n\n')

Output:

  First Name  Age  Gender  Weight Height
0      James   25    Male     155   5'10
1       John   27    Male     175    5'9
2   Patricia   23  Female     135    5'3
3       Mary   22  Female     125    5'4
4     Martin   30    Male     185    NaN
5   Margaret   29  Female     141    NaN
6      Kevin   22    Male     198    6'2

  First Name  Age  Gender  Weight
4     Martin   30    Male     185
5   Margaret   29  Female     141

  First Name  Age Gender  Weight Height
2    Matthew   29   Male     183    6'0
5     Rachel   29   Male     123    5'3
6       Jose   20   Male     175   5'11

  First Name  Age Gender  Weight
0      James  NaN    NaN   165.0
6      Kevin  NaN    NaN   192.0