I have a dataframe in which one column (column B) contains a list of values. For each row, I want to compare the list in B with a reference list. If the list in B contains the same values as the reference, I want to write Yes to a third column (column Results); if it contains different values, I want to write No. Note that the items in each list are not necessarily in the same order as in the reference.
The input df is below; the dtype of B is object.
B
1st, 2nd, 3rd
2nd, 4th, 5th
2nd, 1st, 3rd
4th, 5th, 6th
The reference list is [1st, 2nd, 3rd]. The expected output is:
B              Results
1st, 2nd, 3rd  Yes
2nd, 4th, 5th  No
2nd, 1st, 3rd  Yes
4th, 5th, 6th  No
I tried the code below, but it didn't add a new column for me:
for index, row in df[['B']].iterrows():
    if set(row['B']) == set(['1st', '2nd', '3rd']):
        row['Results'] = "Yes"
    else:
        row['Results'] = "No"
Or is there any easier way to achieve this? E.g., not looping by rows?
CodePudding user response:
Since each row in "B" is a list but you want to compare it with a set, you'll have to iterate over the rows at least once to convert each list to a set, imo.
One option is to convert each list in "B" to a set and then compare with ref_set directly to get a boolean Series. Finally, use np.where to assign the "yes"/"no" values.
import numpy as np
ref_set = set(['1st', '2nd', '3rd'])
df['results'] = np.where(df['B'].apply(set) == ref_set, 'yes', 'no')
Another option is to use a list comprehension:
df['results'] = ['yes' if set(x)==ref_set else 'no' for x in df['B']]
Output:
                 B results
0  [1st, 2nd, 3rd]     yes
1  [2nd, 4th, 5th]      no
2  [2nd, 1st, 3rd]     yes
3  [4th, 5th, 6th]      no
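For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes each cell of B is a Python list (as in the output above) and rebuilds the sample frame from the question; the name `matches` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, assuming each cell of B is a Python list.
df = pd.DataFrame({
    "B": [
        ["1st", "2nd", "3rd"],
        ["2nd", "4th", "5th"],
        ["2nd", "1st", "3rd"],
        ["4th", "5th", "6th"],
    ]
})

ref_set = {"1st", "2nd", "3rd"}

# Set equality ignores item order, so [2nd, 1st, 3rd] still matches.
matches = [set(items) == ref_set for items in df["B"]]

# Map the booleans to the labels used in this answer.
df["results"] = np.where(matches, "yes", "no")

print(df)
```

Building a plain list of booleans first keeps the comparison explicit and leaves the label mapping to np.where.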
CodePudding user response:
This should do what you want.
base = set(['1st', '2nd', '3rd'])
df['results'] = df['B'].apply(lambda k: 'Yes' if set(k) == base else 'No')
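As for why the loop in the question adds nothing: most likely because iterrows() hands back a copy of each row, and df[['B']] is itself a new DataFrame, so anything assigned to row is thrown away. If you do want to keep a loop, a rough sketch (assuming B holds lists; the `results` list is just an illustrative name) is to collect the labels and assign the column once at the end:

```python
import pandas as pd

# Sample data from the question, assuming each cell of B is a Python list.
df = pd.DataFrame({
    "B": [
        ["1st", "2nd", "3rd"],
        ["2nd", "4th", "5th"],
        ["2nd", "1st", "3rd"],
        ["4th", "5th", "6th"],
    ]
})

base = {"1st", "2nd", "3rd"}

# Collect one label per row instead of writing to `row`,
# which is only a temporary copy of the data.
results = []
for index, row in df.iterrows():
    results.append("Yes" if set(row["B"]) == base else "No")

# Assign the whole column in one step, after the loop has finished.
df["Results"] = results

print(df)
```

The apply version above does the same thing in one line, so the loop is mainly useful for seeing where the original attempt went wrong.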
CodePudding user response:
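If B holds comma-separated strings rather than lists, split each string into a set first. This gives a boolean column that is True when a row has no items outside the reference list lst:
lst = ['1st', '2nd', '3rd']
df['Results'] = df['B'].str.split(', ').map(set).apply(lambda x: len(x.difference(lst))).eq(0)
Or, if you must denote the result by Yes/No (this needs import numpy as np):
df['Results'] = np.where(df['B'].str.split(', ').map(set).apply(lambda x: len(x.difference(lst))).eq(0), 'Yes', 'No')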
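One caveat worth checking: difference(...) being empty only means the row has no items outside the reference, so a row holding just part of the reference list would also pass. The sketch below assumes B holds strings separated by ", " and adds one hypothetical extra row ("1st, 2nd", not in the question's data) to show how that check differs from strict set equality. The column names NoExtra and Same are illustrative only.

```python
import numpy as np
import pandas as pd

# Question data as strings, plus one HYPOTHETICAL extra row ("1st, 2nd")
# added only to show where the two checks disagree.
df = pd.DataFrame({
    "B": [
        "1st, 2nd, 3rd",
        "2nd, 4th, 5th",
        "2nd, 1st, 3rd",
        "4th, 5th, 6th",
        "1st, 2nd",
    ]
})

lst = ["1st", "2nd", "3rd"]
ref_set = set(lst)

# Split on ", " so items carry no leading spaces, then build a set per row.
item_sets = df["B"].str.split(", ").map(set)

# Check 1: the row has no items outside the reference list.
no_extra = item_sets.apply(lambda s: len(s.difference(lst)) == 0)

# Check 2: the row has exactly the same items as the reference list.
same = item_sets.apply(lambda s: s == ref_set)

df["NoExtra"] = np.where(no_extra, "Yes", "No")
df["Same"] = np.where(same, "Yes", "No")

print(df)
```

On the four rows from the question both checks agree; they differ only for rows that contain a strict subset of the reference.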
