Given df1 and df2:
df1:
| id | item | sub_item |
|---|---|---|
| 1 | 52 | AAA |
| 2 | 52 | AAA |
| 3 | 55 | BAA |
| 4 | 77 | CAA |
| 5 | 77 | CAA |
| 6 | 77 | CAA |
| 7 | 77 | CAB |
| 8 | 77 | CAC |
df2:
| id | color |
|---|---|
| 1 | #11 |
| 2 | #12 |
| 3 | #13 |
| 4 | #14 |
| 5 | #17 |
| 6 | #18 |
| 7 | #19 |
| 8 | #20 |
| 9 | #21 |
| 10 | #25 |
produce df_result:
| id | item | sub_item | color |
|---|---|---|---|
| 1 | 52 | AAA | #11 |
| 2 | 52 | AAA | #11 |
| 3 | 55 | BAA | #11 |
| 4 | 77 | CAA | #11 |
| 5 | 77 | CAA | #11 |
| 6 | 77 | CAA | #11 |
| 7 | 77 | CAB | #12 |
| 8 | 77 | CAC | #13 |
id is unimportant. Each new sub-item, within an item group, gets new serial color number. Colors start re-assignment back to #11 for each new item. df2 will always be long enough to cover unique sub-items within an item group.
My intuition says to use groupby on df1, then merge each group with df2, but I need help with the code to make this happen.
CodePudding user response:
I'm not sure, but I think that's what you want
merged_df = pd.merge(
df1,df2, how="left", on="Id"
)
CodePudding user response:
Try:
df1["color"] = (df2.merge(df1["sub-item"].ne(df1.groupby("item")["sub-item"].transform("first"))
.cumsum()
.add(1)
.rename("id"),
how="right")["color"]
)
>>> df1
id item sub-item color
0 1 52 AAA #11
1 2 52 AAA #11
2 3 55 BAA #11
3 4 77 CAA #11
4 5 77 CAA #11
5 6 77 CAA #11
6 7 77 CAB #12
7 8 77 CAC #13
