I am trying to overwrite specific rows and columns from one dataframe with a second dataframe rows and columns. I can't give the actual data but I will use a proxy here.
Here is an example and what I have tried:
df1
UID B C D
0 X14 cat red One
1 X26 cat blue Two
2 X99 cat pink One
3 X54 cat pink One
df2
UID B C EX2
0 X14 dog blue coat
1 X88 rat green jacket
2 X99 bat red glasses
3 X29 bat red shoes
What I want to do here is overwrite column B and C in df1 with the values in df2 based upon UID. Therefore in this example X88 and X29 from df2 would not appear in df2. Also column D would not be affected and EX2 not
The outcome would looks as such:
df1
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One
I looked at this solution : Pandas merge two dataframe and overwrite rows However this appears to only update null values whereas I want an overwrite.
My attempt looked this like:
df = df1.merge(df2.filter(['B', 'C']), on=['B', 'C'], how='left')
For my data this actually doesn't seem to overwrite anything. Please could someone explain why this would not work?
Thanks
CodePudding user response:
One approach could be as follows:
- First, use
df.set_indexto make columnUIDyour index (inplace). - Next, use
df.updatewith parameteroverwriteset toTrue(also useset_indexhere for the "other" df:df2). This will overwrite all the columns that the two dfs have in common (i.e.BandC) based on index matches (i.e. nowUID). - Finally, restore the standard index using
df.reset_index.
df1.set_index('UID', inplace=True)
df1.update(df2.set_index('UID'), overwrite=True)
df1.reset_index(inplace=True)
print(df1)
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One
CodePudding user response:
You can approach this by using reindex_like and combine_first.
Try this :
out = (
df2.set_index("UID")
.reindex_like(df1.set_index("UID"))
.combine_first(df1.set_index("UID"))
.reset_index()
)
# Output :
print(out)
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One
CodePudding user response:
Using Update function
df1.set_index('UID', inplace=True)
df2.set_index('UID', inplace=True)
df1.update(df2)
df1.reset_index(inplace=True)
print(df1)
Output
UID B C D
0 X14 dog blue One
1 X26 cat blue Two
2 X99 bat red One
3 X54 cat pink One
