I have 2 dataframes looking like this :
In each dataframe there is pattern of 1-2 in the values column. (the values are not significant to my problem, just to demonstrate the pattern)
df1 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007,23]}
df2 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]}
I need to swap rows between the two dataframes so that the outcome would be :
df_expected1 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 21, 22, 21, 22, 23, 22, 20, 21,23]}
df_expected2 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]}
CodePudding user response:
Knowing that swap-needed indices are from 3n-2, you can make mask then use numpy.where:
m = df1["idx"].add(2).mod(3).eq(0)
s1 = np.where(m, df1["values"], df2["values"])
s2 = np.where(~m, df1["values"], df2["values"])
df1["values"] = s1
df2["values"] = s2
Output:
idx values
0 1 20
1 2 21
2 3 22
3 4 21
4 5 22
5 6 23
6 7 22
7 8 20
8 9 21
9 10 23
idx values
0 1 1000
1 2 1000
2 3 10001
3 4 1000
4 5 1000
5 6 1002
6 7 1000
7 8 1003
8 9 1007
9 10 1000
CodePudding user response:
This should do it, knowing that the indices are the same between two dataframes :
df1[df1['idx']%3 == 1], df2[df1['idx']%3 == 1] = df2[df1['idx']%3 == 1], df1[df1['idx']%3 == 1]
Output :
idx values
0 1 1000
1 2 1000
2 3 10001
3 4 1000
4 5 1000
5 6 1002
6 7 1000
7 8 1003
8 9 1007
9 10 1000
idx values
0 1 20
1 2 21
2 3 22
3 4 21
4 5 22
5 6 23
6 7 22
7 8 20
8 9 21
9 10 23
CodePudding user response:
In the code snippet below, I assumed that indexes in both df1 and df2 were equal and that values from df1 are always supposed to be greater than df2.
import pandas as pd
import numpy as np
from pprint import pprint
df1 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007,23]}
df2 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]}
a = pd.DataFrame(df1).set_index('idx')
b = pd.DataFrame(df2).set_index('idx')
col_name = 'values'
a_series = a[col_name]
b_series = b[col_name]
for i in a_series.index:
if a_series.loc[i] > b_series.loc[i]:
a_series.loc[i], b_series.loc[i] = b_series.loc[i], a_series.loc[i]
df_expected1 = {'idx': a_series.index.tolist(), 'values': a_series.values.tolist()}
df_expected2 = {'idx': b_series.index.tolist(), 'values': b_series.values.tolist()}
pprint(df_expected1)
pprint(df_expected2)
Output:
{'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]}
{'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 21, 22, 21, 22, 23, 22, 20, 21, 23]}
