I have 2 pandas dfs (df1 & df2) as seen here:
| df1 | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| row1 | Dog | Cat | Bird | Tree | Lion |
| row2 | Cat | Dragon | Bird | Dog | Tree |
| row3 | Cat | Dog | Bird | Tree | Hippo |
| row4 | Cat | Tree | Bird | Ant | Fish |
| row5 | Cat | Tree | Monkey | Dragon | Ant |
| df2 | col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|---|
| row1 | 3.219843 | 1.3631996 | 1.0051135 | 0.89303696 | 0.4313375 |
| row2 | 2.8661892 | 1.4396228 | 0.7863044 | 0.539315 | 0.48167187 |
| row3 | 2.5679462 | 1.3657334 | 0.9470184 | 0.79186934 | 0.48637152 |
| row4 | 3.631389 | 0.94815284 | 0.7561722 | 0.6743943 | 0.5441728 |
| row5 | 2.4727197 | 1.5941181 | 1.4069512 | 1.064051 | 0.48297918 |
The string elements of the df1 correspond to the values of df2. For both dataframes the condition exists that an element (or a value) does not repeat on the same row. But can be repeated on different rows.
For example Dog of row1 = 3.219843, bird of row3 = 0.9470184, bird of row4 = 0.7561722 etc.
I would like to extract the values for all unique elements of the 1st df into different arrays. Like:
Dog = [3.219843, 0.539315, 1.3657334]
Cat = [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197]
etc...
Any ideas?
Many thanks!
CodePudding user response:
Assuming that your first columns df1 and df2 are the index of their respective df, we can extract the values for each unique animal in df1 by using the first df as a mask to extract all wanted values from the second one (the result is a new df with NaN in irrelevant cells, which can be turned into a 1-dimensional array with .stack().values).
Construct the dataframes
First of, create some test data. Please provide it in a form like this in future posts. That's what @mozway was talking about in the comments. It is greatly appreciated.
(It's not always the case that somebody is willing to do all the copy-and-pasting necessary to get dataframes up and running for testing.)
import pandas as pd
import numpy as np
index = ['row1', 'row2', 'row3', 'row4', 'row5']
data1 = {'col1': ['Dog', 'Cat', 'Cat', 'Cat', 'Cat'],
'col2': ['Cat', 'Dragon', 'Dog', 'Tree', 'Tree'],
'col3': ['Bird', 'Bird', 'Bird', 'Bird', 'Monkey'],
'col4': ['Tree', 'Dog', 'Tree', 'Ant', 'Dragon'],
'col5': ['Lion', 'Tree', 'Hippo', 'Fish', 'Ant']}
data2 = {'col1': [3.219843, 2.8661892, 2.5679462, 3.631389, 2.4727197],
'col2': [1.3631996, 1.4396228, 1.3657334, 0.94815284, 1.5941181],
'col3': [1.0051135, 0.7863044, 0.9470184, 0.7561722, 1.4069512],
'col4': [0.89303696, 0.539315, 0.79186934, 0.6743943, 1.064051],
'col5': [0.4313375, 0.48167187, 0.48637152, 0.5441728, 0.48297918]}
df1 = pd.DataFrame(data1, index=index)
df2 = pd.DataFrame(data2, index=index)
Extract the data
Since you didn't specify what data structure you need, this is the strategy outlined above in a dict comprehension:
{animal: df2[df1.eq(animal)].stack().values for animal in np.unique(df1)}
The result looks like this:
{'Ant': array([0.6743943 , 0.48297918]),
'Bird': array([1.0051135, 0.7863044, 0.9470184, 0.7561722]),
'Cat': array([1.3631996, 2.8661892, 2.5679462, 3.631389 , 2.4727197]),
'Dog': array([3.219843 , 0.539315 , 1.3657334]),
'Dragon': array([1.4396228, 1.064051 ]),
'Fish': array([0.5441728]),
'Hippo': array([0.48637152]),
'Lion': array([0.4313375]),
'Monkey': array([1.4069512]),
'Tree': array([0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181 ])}
CodePudding user response:
Assuming the input kindly provided by @fsimonjetz, you can stack both dataframes, then GroupBy.agg as list:
df2.stack().groupby(df1.stack()).agg(list).to_dict()
or, using an intermediate DataFrame:
(pd
.concat([df1.stack(),df2.stack()], axis=1)
.groupby(0)[1].agg(list)
.to_dict()
)
output:
{'Ant': [0.6743943, 0.48297918],
'Bird': [1.0051135, 0.7863044, 0.9470184, 0.7561722],
'Cat': [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197],
'Dog': [3.219843, 0.539315, 1.3657334],
'Dragon': [1.4396228, 1.064051],
'Fish': [0.5441728],
'Hippo': [0.48637152],
'Lion': [0.4313375],
'Monkey': [1.4069512],
'Tree': [0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181]}
