Home > Software design >  How can I create a mapping between two dataframes to calculate spearman's correlation?
How can I create a mapping between two dataframes to calculate spearman's correlation?

Time:01-14

I have an example dataframe that maps a fruit to a colour:

Fruit  Colour
Olive  Purple
Apple  Green
Berry  red

I then have two dataframes of numbers

Olive  Apple  Berry
4      3      0
9      3      8
1      6      5

Purple Red Green
56     45  23
19     48  35
20     19  26

How can I find the spearman's correlation between the columns based on the mapping? i.e. the correlation between olive and purple, apple and green, berry and red?

I know that to find the correlation between two columns I can use the following:

df['Some_Column'].corr(df['Some_Other_Column'])

Reproducible code for dataframes:

mapping_dataframe = {'Fruit': ['Olive', 'Apple', 'Berry'], 'Colour': ['Purple', 'Green', 'Red']}
fruit_dataframe={'Olive': [4,9,1], 'Apple': [3,3,6], 'Berry':[0,8,5]}
colour_dataframe={'Purple':[56,19,20], 'Red':[45,48,19], 'Green':[23,35,26]}

CodePudding user response:

You can use [scipy][1]'s implementation of the enter image description here

CodePudding user response:

concat the two DataFrames together and calculate the correlation of all combinations. Then slice the upper quarter, to get the correlation of all fruits with all colours. Finally, use the mapping DataFrame to index that stacked correlation matrix.

import pandas as pd

df = pd.concat([pd.DataFrame(fruit_dataframe), pd.DataFrame(colour_dataframe)],
               axis=1, keys=['fruit', 'colour'])   
#  fruit             colour          
#  Olive Apple Berry Purple Red Green
#0     4     3     0     56  45    23
#1     9     3     8     19  48    35
#2     1     6     5     20  19    26  


s = df.corr().xs('fruit', axis=0).xs('colour', axis=1).stack()
#Olive  Purple   -0.166294
#       Red       0.840414
#       Green     0.812240
#Apple  Purple   -0.479317
#       Red      -0.995567
#       Green    -0.277350
#Berry  Purple   -0.937114
#       Red      -0.049132
#       Green     0.911293
#dtype: float64


dfm = pd.DataFrame(mapping_dataframe)
s.loc[dfm.set_index(['Fruit', 'Colour']).index]

Fruit  Colour
Olive  Purple   -0.166294
Apple  Green    -0.277350
Berry  Red      -0.049132
dtype: float64
  •  Tags:  
  • Related