Python: how to map a label into a new column based on another column

I've been searching around for a while now, but I can't seem to find the answer to this small problem.

I have code that matches the values in column sample2 of df against df_mall and writes the result to a new column: if the value in the sample2 or note column appears in df_mall, the result column should show label 1, otherwise 0.

import pandas as pd

mall_list = ['AMBON',
             'BANDUNG', 'BEKASI', 'BOGOR',
             'CIREBON',
             'DENPASAR',
             'GARUT',
             'JAKARTA',
             'KARAWANG', 'KUDUS',
             'MATARAM',
             'PALEMBANG',
             'SAMARINDA', 'SURABAYA']
df_mall = pd.DataFrame(mall_list)

df = {'Name':['al', 'el', 'naila', 'dori','jlo'],
    'living':['Alvando','Georgia GG','Newyork NY','Indiana IN','Florida FL'],
    'sample2':['BOGOR','GARUT','AMBON','WONOSOBO','SRAGEN'],
    'note':['KOTA','KAB','KOTA','WILAYAH','DAERAH']
}

df = pd.DataFrame(df)

This is my attempt at the matching step, but it doesn't work:

df['MALL_RESULT'] = 0
df = df.reset_index()
df.drop(['index'], axis=1, inplace=True)

for keys, i in enumerate(df.sample2):
    index = keys
    if i in (df_mall):
        df.loc[df.index == index, 'MALL'] = 1

df.loc[df.note == 'DAERAH', 'MALL'] = 1
df = df.reset_index()

This is the output I'm expecting, ideally from simple code:

  index Name     living     sample2    note    MALL_RESULT
0   0   al      Alvando      BOGOR     KOTA         1
1   1   el      Georgia GG   GARUT     KAB          1
2   2   naila   Newyork NY   AMBON     KOTA         1
3   3   dori    Indiana IN   WONOSOBO  WILAYAH      0
4   4   jlo     Florida FL   SRAGEN    DAERAH       1

CodePudding user response：

Unless there are constraints not given in the question, you don't need to turn mall_list into a DataFrame. Using a list comprehension, you can create the column you're after:

df['MALL_RESULT'] = [1 if sample in mall_list or item in mall_list else 0 for sample, item in zip(df['sample2'], df['note'])]

If you do need mall_list to be a DataFrame, you can use the same logic:

df['MALL_RESULT'] = [1 if sample in list(df_mall[0]) or item in list(df_mall[0]) else 0 for sample, item in zip(df['sample2'], df['note'])]
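A vectorized alternative can be sketched with `Series.isin`, which avoids the Python-level loop. In the sample data no value in the note column appears in mall_list, so this sketch assumes, based only on the expected output in the question, that a note of 'DAERAH' should also produce label 1. That rule is an inference, not something the question states. The data is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

mall_list = ['AMBON', 'BANDUNG', 'BEKASI', 'BOGOR', 'CIREBON', 'DENPASAR',
             'GARUT', 'JAKARTA', 'KARAWANG', 'KUDUS', 'MATARAM',
             'PALEMBANG', 'SAMARINDA', 'SURABAYA']

df = pd.DataFrame({
    'Name': ['al', 'el', 'naila', 'dori', 'jlo'],
    'living': ['Alvando', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'],
    'sample2': ['BOGOR', 'GARUT', 'AMBON', 'WONOSOBO', 'SRAGEN'],
    'note': ['KOTA', 'KAB', 'KOTA', 'WILAYAH', 'DAERAH'],
})

# True where sample2 is one of the listed cities
in_mall = df['sample2'].isin(mall_list)

# Assumption (inferred from the expected output): 'DAERAH' also counts as a match
is_daerah = df['note'].eq('DAERAH')

# Combine the two boolean masks and convert True/False to 1/0
df['MALL_RESULT'] = (in_mall | is_daerah).astype(int)
print(df)
```

If the 'DAERAH' rule does not apply to your data, dropping `is_daerah` leaves a plain membership test on sample2.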

CodePudding user response：

Use:

res = []
for i, row in df.iterrows():
    if row['sample2'] in list(df_mall[0]) or row['note'] in list(df_mall[0]):
        res.append(1)
    else:
        res.append(0)
df['result'] = res
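The membership tests above go through `list(df_mall[0])` rather than `df_mall` itself. A small sketch of why that matters: the `in` operator on a DataFrame looks at column labels, and on a Series it looks at the index, so neither checks the stored values. The two-city frame here is a shortened stand-in for the one in the question.

```python
import pandas as pd

# Shortened stand-in for df_mall from the question
df_mall = pd.DataFrame(['BOGOR', 'GARUT'])

# `in` on a DataFrame checks column labels (here the single label 0)
on_frame = 'BOGOR' in df_mall
label_on_frame = 0 in df_mall

# `in` on a Series checks the index (0 and 1), not the values
on_series = 'BOGOR' in df_mall[0]

# Converting to a list or set tests the actual values
on_values = 'BOGOR' in set(df_mall[0])

print(on_frame, label_on_frame, on_series, on_values)
```

Building the set once before the loop also avoids recreating the list on every row.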

CodePudding user response：

Here's one way to get the desired outcome:

Add "DAERAH" to mall_list and convert the resulting list into a set, mall_set.

Then stack the columns "sample2" and "note" and apply a lambda that checks whether each word is in mall_set. Next, group by the original row index and use any to check whether any word in each row exists in mall_set. Finally, convert the boolean Series to a 1/0 integer Series.

mall_set = set(mall_list + ['DAERAH'])
df['MALL_RESULT'] = df[['sample2', 'note']].stack().apply(lambda x: x in mall_set).groupby(level=0).any().astype(int)

Output:

    Name      living   sample2     note  MALL_RESULT
0     al     Alvando     BOGOR     KOTA            1
1     el  Georgia GG     GARUT      KAB            1
2  naila  Newyork NY     AMBON     KOTA            1
3   dori  Indiana IN  WONOSOBO  WILAYAH            0
4    jlo  Florida FL    SRAGEN   DAERAH            1
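To see what each step of the chain contributes, here is a sketch that splits it into named intermediates on a three-row frame. The reduced mall_set and the variable names stacked, hits and per_row are illustrative only, and `isin` is swapped in for the lambda since it performs the same membership test.

```python
import pandas as pd

df = pd.DataFrame({
    'sample2': ['BOGOR', 'WONOSOBO', 'SRAGEN'],
    'note': ['KOTA', 'WILAYAH', 'DAERAH'],
})
mall_set = {'BOGOR', 'DAERAH'}  # reduced set, for illustration only

# Step 1: stack both columns into one Series indexed by (row, column name)
stacked = df[['sample2', 'note']].stack()

# Step 2: test every word for membership in the set
hits = stacked.isin(mall_set)

# Step 3: collapse back to one value per original row; True if any word matched
per_row = hits.groupby(level=0).any()

# Step 4: convert booleans to 1/0 and assign by index alignment
df['MALL_RESULT'] = per_row.astype(int)
print(df)
```

Grouping on level 0 works because stack keeps the original row label as the outer index level.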