I have two dataframes:
import pandas as pd
df1 = pd.DataFrame({
"id": [1,2,3,4,5,6], "c1": [1,2,3,4,5,6], "c2": [1,2,3,4,5,6], "c3": [1,2,3,4,5,6]
})
and
df2 = pd.DataFrame({
"id": [1,2,3,4,5,6], "column": ["c1", "c2", "c3", "c1", "c2", "c3"], "new-value": [10,20,30,40,50,60]
})
I would like to update df1 based on information from df2 so that the result is:
df3 = pd.DataFrame({
"id": [1,2,3,4,5,6], "c1": [10,2,3,40,5,6], "c2": [1,20,3,4,50,6], "c3": [1,2,30,4,5,60]
})
- Is it possible to do this using pandas?
- Are update/merge viable options for this?
CodePudding user response:
We can reshape df2 using pivot, then use it to substitute values in df1
df1.replace(df2.pivot(*df2.columns)).fillna(df1)
id c1 c2 c3
0 1 10.0 1.0 1.0
1 2 2.0 20.0 2.0
2 3 3.0 3.0 30.0
3 4 40.0 4.0 4.0
4 5 5.0 50.0 5.0
5 6 6.0 6.0 60.0
CodePudding user response:
It is doable, but I am not sure if it is the best way to do such operation.
- The hard way:
df1 = df1.set_index('id')
for idx, row in df2.iterrows():
df1.loc[row['id'], row['column']] = row['new-value']
df1.reset_index()
- The more pythonic way:
df2 = df2.pivot(index='id', columns='column', values='new-value').reset_index()
df1.update(df2)
CodePudding user response:
It is possible with pandas with the following code:
ids = df2["id"].to_list()
columns = df2.columns.to_list()
new_values = df2["new-value"].to_list()
for id in ids:
for col in columns:
for value in new_values:
df3.loc[id, col] = value
Probably not optimal if your datasets are very large.
