Concatenate Two pandas Dataframes (A & B) in an Alternating Pattern of Columns (A,-B-A, A-B-A)-CodePudding

I have two dataframes that I am trying to concatenate together. For the sake of clarity, let's call these dataframes A and B. What I want to do is to concatenate alternating columns from each dataframe in a repeating A-B-A, A-B-A pattern. What I mean by that is this: I want the first column to come from dataframe A, the next from dataframe B, then dataframe A again; then the pattern should repeat A-B-A.

Below are sample of dataframes (A and B) and the expected output.

data1 = {'ipt_v1': {'V7': 24.588074000000002, 'V4': 22.116738}, 
         'ipt_v2': {'V7': 4.958045, 'V4': 4.726775}}


data2 = {('ipt_v1', 'l'): {'V7': 24.251368038901415, 'V4': 21.806257204224853},
         ('ipt_v1', 'u'): {'V7': 24.92477996109859, 'V4': 22.427218795775136},
         ('ipt_v2', 'l'): {'V7': 4.901618779535504, 'V4': 4.665580151848285},
         ('ipt_v2', 'u'): {'V7': 5.014471220464495, 'V4': 4.787969848151715}}


# expected out
output = {('ipt_v1', 'l'): {'V7': 24.251368038901415, 'V4': 21.806257204224853},
          ('ipt_v1', 'm'): {'V7': 24.588074000000002, 'V4': 22.116738},
          ('ipt_v1', 'u'): {'V7': 24.92477996109859, 'V4': 22.427218795775136},
          ('ipt_v2', 'l'): {'V7': 4.901618779535504, 'V4': 4.665580151848285},
          ('ipt_v2', 'm'): {'V7': 4.958045, 'V4': 4.726775},
          ('ipt_v2', 'u'): {'V7': 5.014471220464495, 'V4': 4.787969848151715}}

Additionally, I have included a function that is intended to give the expected output. The part I need some help with in regard to the function relates to # alternating the columns

def combo(data1, data2):
    rws = data1
    ciws = data2
    
    rws.columns = pd.MultiIndex.from_product([['m'],rws.columns])
    rws = rws.swaplevel(0,1, axis=1)
    
    output = pd.concat([ciws, rws], axis=1)
    
    # alternating the columns
    output = (output[list(sum(zip(ciws.columns, rws.columns), ()))])
    return output

# call the function
output = combo(data1, data2)
print(output)

CodePudding user response：

You can use insert, which takes position (integer), name and values as argument. In your case you want to insert columns at every third position from the first dataframe starting with column 1:

import pandas as pd
import numpy as np
data1 = {('ipt_v1', 'm'): {'V7': 24.588074000000002, 'V4': 22.116738}, 
         ('ipt_v2', 'm'): {'V7': 4.958045, 'V4': 4.726775}}


data2 = {('ipt_v1', 'l'): {'V7': 24.251368038901415, 'V4': 21.806257204224853},
         ('ipt_v1', 'u'): {'V7': 24.92477996109859, 'V4': 22.427218795775136},
         ('ipt_v2', 'l'): {'V7': 4.901618779535504, 'V4': 4.665580151848285},
         ('ipt_v2', 'u'): {'V7': 5.014471220464495, 'V4': 4.787969848151715}}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df = df2.copy()
i = 1
for c in df1.columns:
    df.insert(i, c, df1[c])
    i  = 3

In your case you could also achive the same result using a simple sort:

df = pd.concat([df1, df2], axis=1)
sorted_columns = sorted(df.columns)
df = df[sorted_columns]

CodePudding user response：

A simple way is to consider the pattern you expect.

You want A-B-A-A-B-A. So if you concat df2 and df1, and assume columns indices [0, 1, 2, 3, *4, *5] (* denotes df1 items), you want to reorder [0, *4, 1, 2, *5, 3].

For this, you can do:

import numpy as np

n = len(df2) # 4
m = len(df1) # 2

a = np.arange(n m)
# array([0, 1, 2, 3, 4, 5])

a = np.r_[np.delete(a, a[1::3]), a[1::3]]
# order in which to take values
# move 1, 4 to the end or the array
# array([0, 2, 3, 5, 1, 4])

np.argsort(a)
# sorted order
# array([0, 4, 1, 2, 5, 3])

Now, you just perform:

output = pd.concat([ciws, rws], axis=1)
output.iloc[:, np.argsort(a)]

and here you are:

       ipt_v1                          ipt_v2                    
            l          m          u         l         m         u
V7  24.251368  24.588074  24.924780  4.901619  4.958045  5.014471
V4  21.806257  22.116738  22.427219  4.665580  4.726775  4.787970

simpler example:

df1 = pd.DataFrame(list('BB'), range(4,6)).T
#    4  5
# 0  B  B

df2 = pd.DataFrame(list('AAAA'), range(4)).T
#    0  1  2  3
# 0  A  A  A  A

df3 = pd.concat([df2, df1], axis=1).iloc[:, np.argsort(a)]
#    0  4  1  2  5  3
# 0  A  B  A  A  B  A

CodePudding user response：

Take the columns from two dfs and append them alternately using itertools then reorder the coulmns in output df

import itertools
def combo(data1, data2):
    rws = data1
    ciws = data2
    
    rws.columns = pd.MultiIndex.from_product([['m'],rws.columns])
    rws = rws.swaplevel(0,1, axis=1)
    
    output = pd.concat([ciws, rws], axis=1)
    
    # alternating the columns
    rws_cols = list(rws.columns)
    ciws_cols = list(ciws.columns)
    cols =  [x for x in itertools.chain.from_iterable(itertools.zip_longest(ciws_cols,rws_cols)) if x]
    output = output[cols]

    output = (output[list(sum(zip(ciws.columns, rws.columns), ()))])
    return output

# call the function
output = combo(data1, data2)
print(output)