Working with .startswith and is_unique in python to create new column-CodePudding

Example: I have the first three columns and need to create the new order column based on the previous ones. The new order should say no if the order number is repeated. However, it should say yes if the order number is unique or if the item starts with a "WC". The problem is that the two science courses have duplicated order number but since it starts with a WC, the value in new order column should be 'Yes'. I know i should use is_unique and .startswith to do this but i am not sure how to combine these two to get the result.

Item	Name	Order Number	New Order
Calc	Math	100	No
Algebra	Math	100	No
Geometry	Math	300	Yes
WCChem	Science	200	Yes
WCBio	Science	200	Yes

Dictionary to construct the DataFrame:

dic = {'Item': ['Calc', 'Algebra', 'Geometry', 'WCChem', 'WCBio'],
       'Name': ['Math', 'Math', 'Math', 'Science', 'Science'],
       'Order Number': [100, 100, 300, 200, 200],
       'New Order': ['No', 'No', 'Yes', 'Yes', 'Yes']}

CodePudding user response：

You can use ~duplicated (with keep=False to select all duplicates) to identify non-duplicate values and use str.startswith to identify the "Item" values that start with "WC". Now since you want a "Yes" if either of these are True, you use | operator and wrap it in a np.where to assign "Yes", "No" values:

df['New_Order'] = np.where(~df['Order_Number'].duplicated(keep=False) | df['Item'].str.startswith('WC'), 'Yes', 'No')

Output:

       Item     Name  Order Number New Order
0      Calc     Math           100        No
1   Algebra     Math           100        No
2  Geometry     Math           300       Yes
3    WCChem  Science           200       Yes
4     WCBio  Science           200       Yes