Home > Net >  Is there a way to split rows into a specified number of rows in pandas?
Is there a way to split rows into a specified number of rows in pandas?

Time:02-03

I have a DataFrame like this. Each unique deal id has a number of bank subsidiaries, and each bank subsidiary has a row containing their parents where the number of bank parents is less than the number of bank subsidiaries.

[In]: 
d = {'Bank Parent': [["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"], 
                     ["China Merchants Bank Co Ltd", "UniCredit"], ["China Merchants Bank Co Ltd", "UniCredit"],
                    ["China Merchants Bank Co Ltd", "UniCredit"]], 
     'Bank Subsidiary': ["Barclays plc", "UBS AG", "BofA Securities Inc", "CMB International Capital Ltd", "UniCredit Bank AG",
                        "Goldman Sachs (Asia) LLC"],
"Deal id": [1, 1, 1, 2, 2, 2]}

df = pd.DataFrame(data = d)
df

[Out]: 
Bank Parent Bank Subsidiary Deal id
0   [Barclays, UBS, BofA]   Barclays plc    1
1   [Barclays, UBS, BofA]   UBS AG  1
2   [Barclays, UBS, BofA]   BofA Securities Inc 1
3   [China Merchants Bank Co Ltd, UniCredit]    CMB International Capital Ltd   2
4   [China Merchants Bank Co Ltd, UniCredit]    UniCredit Bank AG   2
5   [China Merchants Bank Co Ltd, UniCredit]    Goldman Sachs (Asia) LLC    2

I would like to explode the "Bank Parent" column and have the resulting DataFrame where the bank subsidiary that has no bank parent is "NA".

[Out]: 
Bank Parent                     Bank Subsidiary                 Deal id
0   Barclays                    Barclays plc                      1
1   UBS                         UBS AG                            1
2   BofA                        BofA Securities Inc               1
3   China Merchants Bank Co Ltd CMB International Capital Ltd     2
4   UniCredit                   UniCredit Bank AG                 2
5   NA                          Goldman Sachs (Asia) LLC          2

So far, I have tried pandas.DataFrame.explode() method, but I've not got the desired result.

CodePudding user response:

It looks like you want to pick the Parent "diagonally" from the same group. If that's the case, we first create a column idx that tells up the position of the Parent in the group:

df['idx'] = df.groupby(df['Bank Parent'].apply('_'.join), group_keys = False).cumcount()

then we go row by row and pick the relevant element from the list in 'Bank Parent', and stick into a gew column 'Parent':

df['Parent'] = df.apply(lambda r: r['Bank Parent'][r['idx']], axis=1)

df now looks like this:

    Bank Parent                                   Bank Subsidiary                  idx  Parent
--  --------------------------------------------  -----------------------------  -----  ---------------------------
 0  ['Barclays', 'UBS', 'BofA']                   Barclays plc                       0  Barclays
 1  ['Barclays', 'UBS', 'BofA']                   UBS AG                             1  UBS
 2  ['Barclays', 'UBS', 'BofA']                   BofA Securities Inc                2  BofA
 3  ['China Merchants Bank Co Ltd', 'UniCredit']  CMB International Capital Ltd      0  China Merchants Bank Co Ltd
 4  ['China Merchants Bank Co Ltd', 'UniCredit']  UniCredit Bank AG                  1  UniCredit

you can further df.drop(...) the columns you no longer need

Edit

If, as in the revised question, we have a list of Parent Banks that is too short, we can replace the second line of the solution with the following, where we essentially pad the list with NaNs:

df['Parent'] = df.apply(lambda r: (r['Bank Parent']   [None]*r['idx'])[r['idx']], axis=1)

output with the revised df:

    Bank Parent                                   Bank Subsidiary                  Deal id    idx  Parent
--  --------------------------------------------  -----------------------------  ---------  -----  ---------------------------
 0  ['Barclays', 'UBS', 'BofA']                   Barclays plc                           1      0  Barclays
 1  ['Barclays', 'UBS', 'BofA']                   UBS AG                                 1      1  UBS
 2  ['Barclays', 'UBS', 'BofA']                   BofA Securities Inc                    1      2  BofA
 3  ['China Merchants Bank Co Ltd', 'UniCredit']  CMB International Capital Ltd          2      0  China Merchants Bank Co Ltd
 4  ['China Merchants Bank Co Ltd', 'UniCredit']  UniCredit Bank AG                      2      1  UniCredit
 5  ['China Merchants Bank Co Ltd', 'UniCredit']  Goldman Sachs (Asia) LLC               2      2
  •  Tags:  
  • Related