Factor Dataframe:
| Code | Min | Max | Factor |
|---|---|---|---|
| A | 0 | 100 | 1 |
| A | 101 | 300 | 2 |
| A | 301 | 800 | 3 |
| A | 800 | NaN | 4 |
| B | 0 | 100 | 2 |
| B | 101 | 300 | 4 |
| B | 301 | 800 | 6 |
| B | 800 | NaN | 8 |
The table above is used to return a factor based on two values, the 'Code' and a specific number. A specific value is provided, such as 5, 302, 253, 8000 and if that value falls between the min and max, the factor is returned.
For example:
- A, 355 = 3
- B, 9000 = 8
I have a separate table with several thousand lines with values which need to have the factor assigned.
Dataframe:
| Item | Code | Value |
|---|---|---|
| 001 | A | 1 |
| 002 | A | 322 |
| 003 | B | 21 |
| 004 | A | 342 |
| 005 | A | 32 |
| 006 | B | 7666 |
| 007 | B | 10000 |
| 008 | B | 86 |
How do you use the pandas tool set to return the desired factor and append the relevant factor as a new column?
CodePudding user response:
Use left join in DataFrame.merge with replace missing value to np.inf and then filter in Series.between and possible missing values in Min, if no match in boolean indexing:
print (df1)
Code Val
0 A 355
1 B 9000
2 C 5
print (df2)
Code Min Max Factor
0 A 0 100.0 1
1 A 101 300.0 2
2 A 301 800.0 3
3 A 800 NaN 4
4 B 0 100.0 2
5 B 101 300.0 4
6 B 301 800.0 6
7 B 800 NaN 8
df = df1.merge(df2.fillna({'Max':np.inf}), how='left', on='Code')
df3 = (df[df['Val'].between(df['Min'], df['Max']) | df['Min'].isna()]
.drop(['Min','Max'], axis=1))
print (df3)
Code Val Factor
2 A 355 3.0
7 B 9000 8.0
8 C 5 NaN
