Example: I have the first three columns and need to create the new order column based on the previous ones. The new order should say no if the order number is repeated. However, it should say yes if the order number is unique or if the item starts with a "WC". The problem is that the two science courses have duplicated order number but since it starts with a WC, the value in new order column should be 'Yes'. I know i should use is_unique and .startswith to do this but i am not sure how to combine these two to get the result.
| Item | Name | Order Number | New Order |
|---|---|---|---|
| Calc | Math | 100 | No |
| Algebra | Math | 100 | No |
| Geometry | Math | 300 | Yes |
| WCChem | Science | 200 | Yes |
| WCBio | Science | 200 | Yes |
Dictionary to construct the DataFrame:
dic = {'Item': ['Calc', 'Algebra', 'Geometry', 'WCChem', 'WCBio'],
'Name': ['Math', 'Math', 'Math', 'Science', 'Science'],
'Order Number': [100, 100, 300, 200, 200],
'New Order': ['No', 'No', 'Yes', 'Yes', 'Yes']}
CodePudding user response:
You can use ~duplicated (with keep=False to select all duplicates) to identify non-duplicate values and use str.startswith to identify the "Item" values that start with "WC". Now since you want a "Yes" if either of these are True, you use | operator and wrap it in a np.where to assign "Yes", "No" values:
df['New_Order'] = np.where(~df['Order_Number'].duplicated(keep=False) | df['Item'].str.startswith('WC'), 'Yes', 'No')
Output:
Item Name Order Number New Order
0 Calc Math 100 No
1 Algebra Math 100 No
2 Geometry Math 300 Yes
3 WCChem Science 200 Yes
4 WCBio Science 200 Yes
