I have a matrix that stores values, like the table below:
| | play_tv | play_series | Null | purchase | Conversion |
|---|---|---|---|---|---|
| Start | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 |
| play_series | 0.07 | 0.08 | 0.09 | 0.10 | 0.11 |
| play_tv | 0.12 | 0.13 | 0.14 | 0.15 | 0.16 |
| Null | 0.17 | 0.18 | 0.19 | 0.20 | 0.21 |
| purchase | 0.22 | 0.23 | 0.24 | 0.25 | 0.26 |
| Conversion | 0.27 | 0.28 | 0.29 | 0.30 | 0.31 |
and I have a dataframe like this:
| session_id | path | path_pair |
|---|---|---|
| T01 | [Start, play_series, Null] | [(Start, play_series), (play_series, Null)] |
| T02 | [Start, play_tv, purchase, Conversion] | [(Start, play_tv), (play_tv, purchase), (purchase, Conversion)] |
I want to look up values from the matrix and either replace the path_pair column or create a new column in my current dataframe. Each entry should be a list of values. How can I do that?
[(Start, play_series), (play_series, Null)] -> [0.03, 0.09]
[(Start, play_tv), (play_tv, purchase), (purchase, Conversion)] -> [0.02, 0.15, 0.26]
Result I want:
| session_id | path | path_pair |
|---|---|---|
| T01 | [Start, play_series, Null] | [0.03, 0.09] |
| T02 | [Start, play_tv, purchase, Conversion] | [0.02, 0.15, 0.26] |
Script I tried, to get a single value from the matrix:
trans_matrix[trans_matrix.index=="Start"]["play_series"].values[0]
CodePudding user response:
Given your input:
import pandas as pd
# Transition matrix: row labels are the "from" step, column labels the "to" step
df1 = pd.DataFrame({'play_tv': [0.02, 0.07, 0.12, 0.17, 0.22, 0.27],
                    'play_series': [0.03, 0.08, 0.13, 0.18, 0.23, 0.28],
                    'Null': [0.04, 0.09, 0.14, 0.19, 0.24, 0.29],
                    'purchase': [0.05, 0.1, 0.15, 0.2, 0.25, 0.3],
                    'Conversion': [0.06, 0.11, 0.16, 0.21, 0.26, 0.31]},
                   index=['Start', 'play_series', 'play_tv', 'Null', 'purchase', 'Conversion'])
# Sessions, each with its path and the list of consecutive (from, to) pairs
df2 = pd.DataFrame({'session_id': ['T01', 'T02'],
                    'path': [['Start', 'play_series', 'Null'],
                             ['Start', 'play_tv', 'purchase', 'Conversion']],
                    'path_pair': [[('Start', 'play_series'), ('play_series', 'Null')],
                                  [('Start', 'play_tv'), ('play_tv', 'purchase'), ('purchase', 'Conversion')]]})
You can update df2 by applying a function to column 'path_pair' that looks up values in df1:
df2['path_pair'] = df2['path_pair'].apply(lambda lst: [df1.loc[x,y] for (x,y) in lst])
Output:
session_id path path_pair
0 T01 [Start, play_series, Null] [0.03, 0.09]
1 T02 [Start, play_tv, purchase, Conversion] [0.02, 0.15, 0.26]
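If you would rather keep `path_pair` and add a new column (the question allows either), the same lookup can be written to a separate column. The sketch below is one possible variant, not the only way to do it: it builds the consecutive pairs directly from `path` with `zip`, so it does not depend on `path_pair` at all. The names `trans_matrix` (taken from the question), `sessions`, `path_to_values`, and the new column `path_value` are illustrative assumptions.

```python
import pandas as pd

# Transition matrix from the question: rows are the "from" step, columns the "to" step.
trans_matrix = pd.DataFrame(
    {
        "play_tv": [0.02, 0.07, 0.12, 0.17, 0.22, 0.27],
        "play_series": [0.03, 0.08, 0.13, 0.18, 0.23, 0.28],
        "Null": [0.04, 0.09, 0.14, 0.19, 0.24, 0.29],
        "purchase": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
        "Conversion": [0.06, 0.11, 0.16, 0.21, 0.26, 0.31],
    },
    index=["Start", "play_series", "play_tv", "Null", "purchase", "Conversion"],
)

# Session data; only the `path` column is needed for this variant.
sessions = pd.DataFrame(
    {
        "session_id": ["T01", "T02"],
        "path": [
            ["Start", "play_series", "Null"],
            ["Start", "play_tv", "purchase", "Conversion"],
        ],
    }
)


def path_to_values(path):
    """Return the matrix value for each consecutive (from, to) step in `path`."""
    # zip(path, path[1:]) yields the consecutive pairs, e.g. (Start, play_series).
    # .at is pandas' scalar accessor for a single row/column label pair.
    return [float(trans_matrix.at[src, dst]) for src, dst in zip(path, path[1:])]


# `path_value` is a hypothetical column name; existing columns are left untouched.
sessions["path_value"] = sessions["path"].apply(path_to_values)
print(sessions)
```

Note that a pair whose labels are missing from the matrix raises a `KeyError` here, as it does with `.loc`, so any unexpected step names need to be handled before the lookup.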
