iterrows to compare values and add a new column

I have a dataframe in which one of the columns (column B) contains a list of values. For each row, I want to compare the values in this list with a reference list. If B contains the same values as the reference list, I want to put "Yes" in a third column (column Results); if B contains different values, I want "No". Note that the items in each list are not in any particular order.

The input df is below; the dtype of B is object.

B 
1st, 2nd, 3rd
2nd, 4th, 5th
2nd, 1st, 3rd
4th, 5th, 6th
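To reproduce this input, here is a minimal sketch. It assumes each cell in B is a Python list of strings (the asker's own `set(row['B'])` suggests lists rather than comma-separated strings); pandas stores such a column with dtype object.

```python
import pandas as pd

# Assumption: each cell in "B" is a Python list of strings, not a single string.
df = pd.DataFrame({
    "B": [
        ["1st", "2nd", "3rd"],
        ["2nd", "4th", "5th"],
        ["2nd", "1st", "3rd"],
        ["4th", "5th", "6th"],
    ]
})

# Columns holding Python lists are stored with dtype object.
print(df.dtypes)
```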

Comparing with the reference list ['1st', '2nd', '3rd'], the expected output is:

B                      Results
1st, 2nd, 3rd          Yes
2nd, 4th, 5th          No
2nd, 1st, 3rd          Yes
4th, 5th, 6th          No

I tried the code below, but it didn't add a new column:

for index, row in df[['B']].iterrows():
    if set(row['B']) == set(['1st', '2nd', '3rd']):    
        row['Results'] = "Yes"
    else:
        row['Results'] = "No"

Or is there an easier way to achieve this, e.g. without looping over rows?
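The loop in the question cannot add a column because iterrows() yields each row as a separate Series: assigning row['Results'] only adds a label to that temporary Series, never to df (and df[['B']] is itself a new DataFrame). A minimal sketch that keeps a row loop but does write back, assuming each cell in B is a list, collects the labels first and assigns the column once:

```python
import pandas as pd

# Assumption: "B" holds Python lists, as in the question's loop.
df = pd.DataFrame({"B": [["1st", "2nd", "3rd"], ["2nd", "4th", "5th"],
                         ["2nd", "1st", "3rd"], ["4th", "5th", "6th"]]})
ref = {"1st", "2nd", "3rd"}

# Changing `row` inside iterrows() only affects a temporary Series,
# so gather the labels in a plain list and assign the column afterwards.
results = []
for _, row in df.iterrows():
    results.append("Yes" if set(row["B"]) == ref else "No")
df["Results"] = results

print(df)
```

Assigning the whole list in one step also avoids growing the DataFrame one cell at a time.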

Answer:

IMO, since each value in "B" is a list but you want an order-insensitive comparison (i.e. a set comparison), you'll inevitably have to iterate over the rows, at least to convert each list to a set.

One option is to convert each list in "B" to a set and compare the result with ref_set directly, which gives a boolean Series. Then use np.where to turn True/False into "yes"/"no".

import numpy as np

ref_set = {'1st', '2nd', '3rd'}
df['results'] = np.where(df['B'].apply(set) == ref_set, 'yes', 'no')
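To make the intermediate step visible, here is a sketch that builds the boolean mask explicitly and then, as a swapped-in alternative to np.where (not what the answer uses), maps it to labels with Series.map and a dict. The sample df is assumed to hold lists, as in the question.

```python
import pandas as pd

# Assumption: "B" holds Python lists of strings.
df = pd.DataFrame({"B": [["1st", "2nd", "3rd"], ["2nd", "4th", "5th"],
                         ["2nd", "1st", "3rd"], ["4th", "5th", "6th"]]})
ref_set = {"1st", "2nd", "3rd"}

# Step 1: boolean Series, True where the row's items equal ref_set (order ignored).
mask = df["B"].map(lambda items: set(items) == ref_set)

# Step 2: turn booleans into labels; a dict lookup via Series.map
# is one NumPy-free alternative to np.where.
df["results"] = mask.map({True: "yes", False: "no"})

print(df)
```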

Another option is to use a list comprehension:

df['results'] = ['yes' if set(x) == ref_set else 'no' for x in df['B']]

Output:

                 B results
0  [1st, 2nd, 3rd]     yes
1  [2nd, 4th, 5th]      no
2  [2nd, 1st, 3rd]     yes
3  [4th, 5th, 6th]      no

Answer:

This should do what you want.

base = {'1st', '2nd', '3rd'}
# Apply the check to the whole column of df, not to a single row
df['Results'] = df['B'].apply(lambda k: 'Yes' if set(k) == base else 'No')

Answer:

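This assumes "B" holds comma-separated strings rather than lists. Note that it is a subset check: a row passes if every one of its items appears in lst, so a shorter row such as "1st, 2nd" would also match.

lst = ['1st', '2nd', '3rd']
# Split on commas (and any following spaces) so ' 2nd' becomes '2nd'
df['Results'] = df['B'].str.split(r',\s*').map(set).apply(lambda x: len(x.difference(lst))).eq(0)

Or, if you need Yes/No labels instead of booleans:

import numpy as np

df['Results'] = np.where(df['B'].str.split(r',\s*').map(set).apply(lambda x: len(x.difference(lst))).eq(0), 'Yes', 'No')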
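The `len(x.difference(lst)) == 0` test asks whether a row's items are a subset of the reference, while the question asks for the same values. The sketch below contrasts the two on string input; the extra row "1st, 2nd" is a hypothetical example added only to show where they differ.

```python
import pandas as pd

# Assumption: "B" holds comma-separated strings, which is what .str.split expects.
# The last row is hypothetical, added to show subset vs exact behaviour.
df = pd.DataFrame({"B": ["1st, 2nd, 3rd", "2nd, 4th, 5th",
                         "2nd, 1st, 3rd", "4th, 5th, 6th", "1st, 2nd"]})
ref = {"1st", "2nd", "3rd"}

# Split on commas and strip whitespace so " 2nd" becomes "2nd".
items = df["B"].str.split(",").map(lambda parts: {p.strip() for p in parts})

# Subset test (what the difference() check does): every item is in ref.
df["Subset"] = items.map(lambda s: s <= ref)
# Exact test: the row holds exactly the reference items, in any order.
df["Exact"] = items.map(lambda s: s == ref)

print(df)
```

For the four rows in the question both tests agree; they only diverge when a row holds a strict subset of the reference items.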