Here is my dataframe
| ID | Sum total | Sum partial |
|---|---|---|
| A1 | 40 | 25 |
| A2 | 70 | 50 |
| A3 | 100 | 40 |
I would like to build a third column based on the following pivot table given grid let's say:
| Sum total interval\sum partial interval | 0-30 | 30-55 | 55-70 |
|---|---|---|---|
| 0-50 | 0.10 | 0.17 | 0.22 |
| 50-75 | 0.14 | 0.18 | 0.25 |
| 75-100 | 0.20 | 0.27 | 0.38 |
Which would give this expected result:
| ID | Sum total | Sum partial | Ratio given by grid |
|---|---|---|---|
| A1 | 40 | 25 | 0.10 |
| A2 | 70 | 50 | 0.18 |
| A3 | 100 | 40 | 0.27 |
I would like to know what is the most convenient way to do this?
Thank you,
CodePudding user response:
Try this using pd.IntervalIndex and loc:
import pandas as pd
df = pd.read_clipboard() # Copy input from question above
df1 = pd.read_clipboard() # Copy input from question above
df1 = df1.set_index(df1.columns[0]) #Create index with first column
#Create IntervalIndex from ranges in index and column headers
df1.columns = pd.IntervalIndex.from_arrays([0,30,55],[30,55,70])
df1.index = pd.IntervalIndex.from_arrays([0,50,75],[50,75,100])
#Test getting value by stacking columns and index to create multiIndex
df1.stack().loc[(40,25)] #returns .1
# Create list of tuples for "lookup" in df1 with zip
df['Ratio Give by grid'] = df1.stack().loc[zip(df['Sum total'], df['Sum partial'])].to_numpy()
df
Output:
ID Sum total Sum partial Ratio Give by grid
0 A1 40 25 0.10
1 A2 70 50 0.18
2 A3 100 40 0.27
