Is there a way to split rows into a specified number of rows in pandas?-CodePudding

I have a DataFrame like this. Each unique deal id has a number of bank subsidiaries, and each bank subsidiary has a row containing their parents where the number of bank parents is less than the number of bank subsidiaries.

[In]: 
d = {'Bank Parent': [["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"], 
                     ["China Merchants Bank Co Ltd", "UniCredit"], ["China Merchants Bank Co Ltd", "UniCredit"],
                    ["China Merchants Bank Co Ltd", "UniCredit"]], 
     'Bank Subsidiary': ["Barclays plc", "UBS AG", "BofA Securities Inc", "CMB International Capital Ltd", "UniCredit Bank AG",
                        "Goldman Sachs (Asia) LLC"],
"Deal id": [1, 1, 1, 2, 2, 2]}

df = pd.DataFrame(data = d)
df

[Out]: 
Bank Parent Bank Subsidiary Deal id
0   [Barclays, UBS, BofA]   Barclays plc    1
1   [Barclays, UBS, BofA]   UBS AG  1
2   [Barclays, UBS, BofA]   BofA Securities Inc 1
3   [China Merchants Bank Co Ltd, UniCredit]    CMB International Capital Ltd   2
4   [China Merchants Bank Co Ltd, UniCredit]    UniCredit Bank AG   2
5   [China Merchants Bank Co Ltd, UniCredit]    Goldman Sachs (Asia) LLC    2

I would like to explode the "Bank Parent" column and have the resulting DataFrame where the bank subsidiary that has no bank parent is "NA".

[Out]: 
Bank Parent                     Bank Subsidiary                 Deal id
0   Barclays                    Barclays plc                      1
1   UBS                         UBS AG                            1
2   BofA                        BofA Securities Inc               1
3   China Merchants Bank Co Ltd CMB International Capital Ltd     2
4   UniCredit                   UniCredit Bank AG                 2
5   NA                          Goldman Sachs (Asia) LLC          2

So far, I have tried pandas.DataFrame.explode() method, but I've not got the desired result.

CodePudding user response：

It looks like you want to pick the Parent "diagonally" from the same group. If that's the case, we first create a column idx that tells up the position of the Parent in the group:

df['idx'] = df.groupby(df['Bank Parent'].apply('_'.join), group_keys = False).cumcount()

then we go row by row and pick the relevant element from the list in 'Bank Parent', and stick into a gew column 'Parent':

df['Parent'] = df.apply(lambda r: r['Bank Parent'][r['idx']], axis=1)

df now looks like this:

    Bank Parent                                   Bank Subsidiary                  idx  Parent
--  --------------------------------------------  -----------------------------  -----  ---------------------------
 0  ['Barclays', 'UBS', 'BofA']                   Barclays plc                       0  Barclays
 1  ['Barclays', 'UBS', 'BofA']                   UBS AG                             1  UBS
 2  ['Barclays', 'UBS', 'BofA']                   BofA Securities Inc                2  BofA
 3  ['China Merchants Bank Co Ltd', 'UniCredit']  CMB International Capital Ltd      0  China Merchants Bank Co Ltd
 4  ['China Merchants Bank Co Ltd', 'UniCredit']  UniCredit Bank AG                  1  UniCredit

you can further df.drop(...) the columns you no longer need

Edit

If, as in the revised question, we have a list of Parent Banks that is too short, we can replace the second line of the solution with the following, where we essentially pad the list with NaNs:

df['Parent'] = df.apply(lambda r: (r['Bank Parent']   [None]*r['idx'])[r['idx']], axis=1)

output with the revised df:

    Bank Parent                                   Bank Subsidiary                  Deal id    idx  Parent
--  --------------------------------------------  -----------------------------  ---------  -----  ---------------------------
 0  ['Barclays', 'UBS', 'BofA']                   Barclays plc                           1      0  Barclays
 1  ['Barclays', 'UBS', 'BofA']                   UBS AG                                 1      1  UBS
 2  ['Barclays', 'UBS', 'BofA']                   BofA Securities Inc                    1      2  BofA
 3  ['China Merchants Bank Co Ltd', 'UniCredit']  CMB International Capital Ltd          2      0  China Merchants Bank Co Ltd
 4  ['China Merchants Bank Co Ltd', 'UniCredit']  UniCredit Bank AG                      2      1  UniCredit
 5  ['China Merchants Bank Co Ltd', 'UniCredit']  Goldman Sachs (Asia) LLC               2      2