Concatenate dataframes for Seaborn hue (adding key)

Time:01-16

I'd like to make this code shorter and more elegant. I have three data frames that need to be combined so that I can use Seaborn's hue parameter, which means adding a key column for the hue itself.

This is what I have come up with, but I feel there must be a more elegant and efficient way:

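import pandas as pd

df = pd.DataFrame({ 'A' : range(3), 'B' : range(3) })

frames = (df.copy(), df.copy(), df.copy())
f_names = ["one", "two", "three"]

dff = pd.DataFrame()

for e, f in enumerate(frames):
    tmp = f.copy()
    tmp['name'] = f_names[e]
    # DataFrame.append was removed in pandas 2.0; on newer versions use
    # dff = pd.concat([dff, tmp], ignore_index=True)
    dff = dff.append(tmp, ignore_index=True)

print(dff)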
Output:
   A  B   name
0  0  0    one
1  1  1    one
2  2  2    one
3  0  0    two
4  1  1    two
5  2  2    two
6  0  0  three
7  1  1  three
8  2  2  three

thank you!

CodePudding user response:

IIUC, you want to repeat df n times and add n labels (n=3 here).

You have several options.

concat + np.repeat

Concatenate the input n times, and add the repeated labels as a new column.

import numpy as np

f_names = ["one", "two", "three"]
dff = pd.concat([df]*len(f_names), ignore_index=True)
# repeat each label once per row of df (not once per label)
dff['C'] = np.repeat(f_names, len(df))

This option also works if you have different dataframes:

dfs = [df1, df2, df3]
f_names = ["one", "two", "three"]
dff = pd.concat(dfs, ignore_index=True)
dff['C'] = np.repeat(f_names, list(map(len, dfs)))
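To make the unequal-length case concrete, here is a minimal self-contained sketch. The frames df1, df2 and df3 are made up for illustration (1, 2 and 3 rows); they are not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frames of different lengths (illustration only)
df1 = pd.DataFrame({'A': [0], 'B': [0]})
df2 = pd.DataFrame({'A': [0, 1], 'B': [0, 1]})
df3 = pd.DataFrame({'A': [0, 1, 2], 'B': [0, 1, 2]})

dfs = [df1, df2, df3]
f_names = ["one", "two", "three"]

# Stack the frames, then repeat each label once per row of its own frame
dff = pd.concat(dfs, ignore_index=True)
dff['C'] = np.repeat(f_names, [len(d) for d in dfs])

print(dff)
```

Passing a list of counts to np.repeat is what keeps the labels aligned when the frames differ in length; a single scalar count only works when every frame has the same number of rows.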

or using a dictionary as input:

dff = (pd.concat({'one': df1, 'two': df2, 'three': df3}, names=['C'])
         .reset_index(level=0)
       )
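As a rough, self-contained sketch of the dictionary variant, assuming the same small df as in the question is reused for all three keys: the dictionary keys become the outer index level, and reset_index(level=0) turns that level into the label column. The extra reset_index(drop=True) is an addition here to get a clean 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({'A': range(3), 'B': range(3)})

# Keys of the mapping become the outer index level, named 'C'
frames = {'one': df, 'two': df, 'three': df}

dff = (pd.concat(frames, names=['C'])
         .reset_index(level=0)      # move the key level into a column 'C'
         .reset_index(drop=True)    # optional: discard the repeated 0, 1, 2 index
       )

print(dff)
```

Note that with this approach the label column ends up first ('C', 'A', 'B'), which does not matter when the column is only passed by name, for example as hue='C'.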

cross merge

You can perform a cross merge with a crafted series.

s = pd.Series(['one', 'two', 'three'], name='C')
dff = df.merge(s, how='cross')

output:

   A  B      C
0  0  0    one
1  0  0    two
2  0  0  three
3  1  1    one
4  1  1    two
5  1  1  three
6  2  2    one
7  2  2    two
8  2  2  three

If the order of the rows really matters, you could use this alternative with pandas.merge (Series first):

s = pd.Series(['one', 'two', 'three'], name='C')
dff = pd.merge(s, df, how='cross')[list(df.columns) + [s.name]]

output:

   A  B      C
0  0  0    one
1  1  1    one
2  2  2    one
3  0  0    two
4  1  1    two
5  2  2    two
6  0  0  three
7  1  1  three
8  2  2  three
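
The two cross-merge variants contain the same rows and differ only in ordering. The following sketch, which assumes the small df from the question and pandas 1.2 or later (needed for how='cross'), checks this side by side. The names by_row and by_label are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': range(3), 'B': range(3)})
s = pd.Series(['one', 'two', 'three'], name='C')

# DataFrame first: rows are grouped by the original row of df
by_row = df.merge(s, how='cross')

# Series first: rows are grouped by label; reorder columns to A, B, C
by_label = pd.merge(s, df, how='cross')[list(df.columns) + [s.name]]

# Same content once both are sorted the same way
same = (by_row.sort_values(['C', 'A']).reset_index(drop=True)
        .equals(by_label.sort_values(['C', 'A']).reset_index(drop=True)))
print(same)
```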