I've been searching around for a while now, but I can't seem to find the answer to this small problem.
I am trying to write code that matches the values in df against df_mall. If the value in the sample2 (or note) column appears in df_mall, a new result column should show the label 1, otherwise 0.
import pandas as pd

mall_list = ['AMBON',
             'BANDUNG', 'BEKASI', 'BOGOR',
             'CIREBON',
             'DENPASAR',
             'GARUT',
             'JAKARTA',
             'KARAWANG', 'KUDUS',
             'MATARAM',
             'PALEMBANG',
             'SAMARINDA', 'SURABAYA']
df_mall = pd.DataFrame(mall_list)
df = {'Name': ['al', 'el', 'naila', 'dori', 'jlo'],
      'living': ['Alvando', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'],
      'sample2': ['BOGOR', 'GARUT', 'AMBON', 'WONOSOBO', 'SRAGEN'],
      'note': ['KOTA', 'KAB', 'KOTA', 'WILAYAH', 'DAERAH']
      }
df = pd.DataFrame(df)
This is my attempt, but it doesn't work:
df['MALL_RESULT'] = 0
df = df.reset_index()
df.drop(['index'], axis=1, inplace=True)
for keys, i in enumerate(df.sample2):
    index = keys
    if i in (df_mall):
        df.loc[df.index == index, 'MALL'] = 1
df.loc[df.note == 'DAERAH', 'MALL'] = 1
df = df.reset_index()
What I actually expect is the output below, ideally from simpler code:
   index   Name      living   sample2     note  MALL_RESULT
0      0     al     Alvando     BOGOR     KOTA            1
1      1     el  Georgia GG     GARUT      KAB            1
2      2  naila  Newyork NY     AMBON     KOTA            1
3      3   dori  Indiana IN  WONOSOBO  WILAYAH            0
4      4    jlo  Florida FL    SRAGEN   DAERAH            1
CodePudding user response:
Unless there are constraints not given in the question, you don't need to turn mall_list into a DataFrame. Using a list comprehension, you can create the column you're after:
df['MALL_RESULT'] = [1 if sample in mall_list or item in mall_list else 0 for sample, item in zip(df['sample2'], df['note'])]
If you do need mall_list to be a DataFrame, you can use the same logic:
df['MALL_RESULT'] = [1 if sample in list(df_mall[0]) or item in list(df_mall[0]) else 0 for sample, item in zip(df['sample2'], df['note'])]
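One thing to watch with the sample data: 'DAERAH' is not in mall_list, so a pure membership test labels the last row 0, while the expected output shows 1. The sketch below is one possible vectorized variant, assuming the intended rule is "sample2 is in mall_list, or note equals 'DAERAH'" (taken from the attempt in the question, not confirmed by the asker). The names in_mall and is_daerah are only illustrative.

```python
import pandas as pd

mall_list = ['AMBON', 'BANDUNG', 'BEKASI', 'BOGOR', 'CIREBON', 'DENPASAR',
             'GARUT', 'JAKARTA', 'KARAWANG', 'KUDUS', 'MATARAM', 'PALEMBANG',
             'SAMARINDA', 'SURABAYA']

df = pd.DataFrame({
    'sample2': ['BOGOR', 'GARUT', 'AMBON', 'WONOSOBO', 'SRAGEN'],
    'note': ['KOTA', 'KAB', 'KOTA', 'WILAYAH', 'DAERAH'],
})

# Boolean mask: is the sample2 value one of the mall names?
in_mall = df['sample2'].isin(mall_list)
# Assumed extra rule from the question's attempt: note == 'DAERAH' also counts.
is_daerah = df['note'].eq('DAERAH')

# Combine the two masks and convert True/False to 1/0.
df['MALL_RESULT'] = (in_mall | is_daerah).astype(int)
print(df)
```

This avoids rebuilding list(df_mall[0]) for every row, which may matter on larger frames.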
CodePudding user response:
Use:
res = []
for i, row in df.iterrows():
    if row['sample2'] in list(df_mall[0]) or row['note'] in list(df_mall[0]):
        res.append(1)
    else:
        res.append(0)
df['result'] = res
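For context on why the loop in the question never matched: the in operator on a DataFrame tests the column labels, not the cell values, so i in (df_mall) is False for every city name. That attempt also writes to a column named 'MALL' rather than 'MALL_RESULT'. A minimal sketch of the membership behaviour, using a shortened list purely for illustration:

```python
import pandas as pd

# Shortened mall list, for illustration only.
df_mall = pd.DataFrame(['AMBON', 'BOGOR', 'GARUT'])

# `in` on a DataFrame looks at column labels; the only label here is 0.
label_hit = 0 in df_mall          # True
value_hit = 'BOGOR' in df_mall    # False: 'BOGOR' is a cell value, not a label

# To test against the values, check the column's contents instead.
real_hit = 'BOGOR' in set(df_mall[0])   # True

print(label_hit, value_hit, real_hit)
```

This is why the answers here test against list(df_mall[0]) or mall_list rather than df_mall itself.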
CodePudding user response:
Here's one way to get the desired outcome:
Add "DAERAH" to mall_list and convert the resulting list into a set, mall_set.
Then stack the "sample2" and "note" columns and apply a lambda that checks whether each word is in mall_set. Group by the index and use any to check whether any word in each row exists in mall_set. Finally, convert the boolean Series to a Series of 1/0 integers.
mall_set = set(mall_list + ['DAERAH'])
df['MALL_RESULT'] = df[['sample2', 'note']].stack().apply(lambda x: x in mall_set).groupby(level=0).any().astype(int)
Output:
    Name      living   sample2     note  MALL_RESULT
0     al     Alvando     BOGOR     KOTA            1
1     el  Georgia GG     GARUT      KAB            1
2  naila  Newyork NY     AMBON     KOTA            1
3   dori  Indiana IN  WONOSOBO  WILAYAH            0
4    jlo  Florida FL    SRAGEN   DAERAH            1
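As a possible alternative to the stack/groupby pipeline, the same check can be sketched with DataFrame.isin followed by any(axis=1). This is only a sketch under the same assumption as this answer, namely that "DAERAH" is treated as a valid match; the name allowed is illustrative.

```python
import pandas as pd

mall_list = ['AMBON', 'BANDUNG', 'BEKASI', 'BOGOR', 'CIREBON', 'DENPASAR',
             'GARUT', 'JAKARTA', 'KARAWANG', 'KUDUS', 'MATARAM', 'PALEMBANG',
             'SAMARINDA', 'SURABAYA']

df = pd.DataFrame({
    'Name': ['al', 'el', 'naila', 'dori', 'jlo'],
    'sample2': ['BOGOR', 'GARUT', 'AMBON', 'WONOSOBO', 'SRAGEN'],
    'note': ['KOTA', 'KAB', 'KOTA', 'WILAYAH', 'DAERAH'],
})

# Assumption carried over from this answer: 'DAERAH' counts as a match.
allowed = mall_list + ['DAERAH']

# isin gives a boolean frame; any(axis=1) is True when either column matches.
df['MALL_RESULT'] = df[['sample2', 'note']].isin(allowed).any(axis=1).astype(int)
print(df)
```

Both versions should produce the same column; this one simply skips the reshape.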
