I have an example dataframe that maps a fruit to a colour:
Fruit Colour
Olive Purple
Apple Green
Berry red
I then have two dataframes of numbers
Olive Apple Berry
4 3 0
9 3 8
1 6 5
Purple Red Green
56 45 23
19 48 35
20 19 26
How can I find the spearman's correlation between the columns based on the mapping? i.e. the correlation between olive and purple, apple and green, berry and red?
I know that to find the correlation between two columns I can use the following:
df['Some_Column'].corr(df['Some_Other_Column'])
Reproducible code for dataframes:
mapping_dataframe = {'Fruit': ['Olive', 'Apple', 'Berry'], 'Colour': ['Purple', 'Green', 'Red']}
fruit_dataframe={'Olive': [4,9,1], 'Apple': [3,3,6], 'Berry':[0,8,5]}
colour_dataframe={'Purple':[56,19,20], 'Red':[45,48,19], 'Green':[23,35,26]}
CodePudding user response:
You can use [scipy][1]'s implementation of the 
CodePudding user response:
concat the two DataFrames together and calculate the correlation of all combinations. Then slice the upper quarter, to get the correlation of all fruits with all colours. Finally, use the mapping DataFrame to index that stacked correlation matrix.
import pandas as pd
df = pd.concat([pd.DataFrame(fruit_dataframe), pd.DataFrame(colour_dataframe)],
axis=1, keys=['fruit', 'colour'])
# fruit colour
# Olive Apple Berry Purple Red Green
#0 4 3 0 56 45 23
#1 9 3 8 19 48 35
#2 1 6 5 20 19 26
s = df.corr().xs('fruit', axis=0).xs('colour', axis=1).stack()
#Olive Purple -0.166294
# Red 0.840414
# Green 0.812240
#Apple Purple -0.479317
# Red -0.995567
# Green -0.277350
#Berry Purple -0.937114
# Red -0.049132
# Green 0.911293
#dtype: float64
dfm = pd.DataFrame(mapping_dataframe)
s.loc[dfm.set_index(['Fruit', 'Colour']).index]
Fruit Colour
Olive Purple -0.166294
Apple Green -0.277350
Berry Red -0.049132
dtype: float64
