I have two dataframes that I am trying to concatenate together. For the sake of clarity, let's call these dataframes A and B. What I want to do is to concatenate alternating columns from each dataframe in a repeating A-B-A, A-B-A pattern. What I mean by that is this: I want the first column to come from dataframe A, the next from dataframe B, then dataframe A again; then the pattern should repeat A-B-A.
Below are sample of dataframes (A and B) and the expected output.
data1 = {'ipt_v1': {'V7': 24.588074000000002, 'V4': 22.116738},
'ipt_v2': {'V7': 4.958045, 'V4': 4.726775}}
data2 = {('ipt_v1', 'l'): {'V7': 24.251368038901415, 'V4': 21.806257204224853},
('ipt_v1', 'u'): {'V7': 24.92477996109859, 'V4': 22.427218795775136},
('ipt_v2', 'l'): {'V7': 4.901618779535504, 'V4': 4.665580151848285},
('ipt_v2', 'u'): {'V7': 5.014471220464495, 'V4': 4.787969848151715}}
# expected out
output = {('ipt_v1', 'l'): {'V7': 24.251368038901415, 'V4': 21.806257204224853},
('ipt_v1', 'm'): {'V7': 24.588074000000002, 'V4': 22.116738},
('ipt_v1', 'u'): {'V7': 24.92477996109859, 'V4': 22.427218795775136},
('ipt_v2', 'l'): {'V7': 4.901618779535504, 'V4': 4.665580151848285},
('ipt_v2', 'm'): {'V7': 4.958045, 'V4': 4.726775},
('ipt_v2', 'u'): {'V7': 5.014471220464495, 'V4': 4.787969848151715}}
Additionally, I have included a function that is intended to give the expected output. The part I need some help with in regard to the function relates to # alternating the columns
def combo(data1, data2):
rws = data1
ciws = data2
rws.columns = pd.MultiIndex.from_product([['m'],rws.columns])
rws = rws.swaplevel(0,1, axis=1)
output = pd.concat([ciws, rws], axis=1)
# alternating the columns
output = (output[list(sum(zip(ciws.columns, rws.columns), ()))])
return output
# call the function
output = combo(data1, data2)
print(output)
CodePudding user response:
You can use insert, which takes position (integer), name and values as argument. In your case you want to insert columns at every third position from the first dataframe starting with column 1:
import pandas as pd
import numpy as np
data1 = {('ipt_v1', 'm'): {'V7': 24.588074000000002, 'V4': 22.116738},
('ipt_v2', 'm'): {'V7': 4.958045, 'V4': 4.726775}}
data2 = {('ipt_v1', 'l'): {'V7': 24.251368038901415, 'V4': 21.806257204224853},
('ipt_v1', 'u'): {'V7': 24.92477996109859, 'V4': 22.427218795775136},
('ipt_v2', 'l'): {'V7': 4.901618779535504, 'V4': 4.665580151848285},
('ipt_v2', 'u'): {'V7': 5.014471220464495, 'V4': 4.787969848151715}}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df = df2.copy()
i = 1
for c in df1.columns:
df.insert(i, c, df1[c])
i = 3
In your case you could also achive the same result using a simple sort:
df = pd.concat([df1, df2], axis=1)
sorted_columns = sorted(df.columns)
df = df[sorted_columns]
CodePudding user response:
A simple way is to consider the pattern you expect.
You want A-B-A-A-B-A. So if you concat df2 and df1, and assume columns indices [0, 1, 2, 3, *4, *5] (* denotes df1 items), you want to reorder [0, *4, 1, 2, *5, 3].
For this, you can do:
import numpy as np
n = len(df2) # 4
m = len(df1) # 2
a = np.arange(n m)
# array([0, 1, 2, 3, 4, 5])
a = np.r_[np.delete(a, a[1::3]), a[1::3]]
# order in which to take values
# move 1, 4 to the end or the array
# array([0, 2, 3, 5, 1, 4])
np.argsort(a)
# sorted order
# array([0, 4, 1, 2, 5, 3])
Now, you just perform:
output = pd.concat([ciws, rws], axis=1)
output.iloc[:, np.argsort(a)]
and here you are:
ipt_v1 ipt_v2
l m u l m u
V7 24.251368 24.588074 24.924780 4.901619 4.958045 5.014471
V4 21.806257 22.116738 22.427219 4.665580 4.726775 4.787970
simpler example:
df1 = pd.DataFrame(list('BB'), range(4,6)).T
# 4 5
# 0 B B
df2 = pd.DataFrame(list('AAAA'), range(4)).T
# 0 1 2 3
# 0 A A A A
df3 = pd.concat([df2, df1], axis=1).iloc[:, np.argsort(a)]
# 0 4 1 2 5 3
# 0 A B A A B A
CodePudding user response:
Take the columns from two dfs and append them alternately using itertools then reorder the coulmns in output df
import itertools
def combo(data1, data2):
rws = data1
ciws = data2
rws.columns = pd.MultiIndex.from_product([['m'],rws.columns])
rws = rws.swaplevel(0,1, axis=1)
output = pd.concat([ciws, rws], axis=1)
# alternating the columns
rws_cols = list(rws.columns)
ciws_cols = list(ciws.columns)
cols = [x for x in itertools.chain.from_iterable(itertools.zip_longest(ciws_cols,rws_cols)) if x]
output = output[cols]
output = (output[list(sum(zip(ciws.columns, rws.columns), ()))])
return output
# call the function
output = combo(data1, data2)
print(output)
