How to apply startswith() for some conditions

Time:01-30

I've been searching around for a while now, but I can't seem to find the answer to this small problem.

I want to find a string pattern in some columns and map the result into a new column. Here is my data:

import pandas as pd
import numpy as np

df1 = {'Name':['Tom', 'nick', 'krish', 'jack','elo'],
    'sample1':['KOTA Arizona AZ','Georgia GG','Newyork NY','KOTA Indiana IN','Florida FL'],
    'sample2':['malang','kaltim','KEC','jepara','sragen'],
}

df1 = pd.DataFrame(df1)

and this is the loop I currently use to build the output column:

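df1['output'] = 'KAB'
for i in df1.index:
    if str(df1['sample1'][i]).startswith('KOTA') or df1['sample2'][i] == 'KEC':
        # Set the cell with .loc; chained assignment like df1['output'][i] = ...
        # triggers SettingWithCopyWarning and may not update df1 at all
        df1.loc[i, 'output'] = 'KOTA'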

This is the output I expect, ideally produced by simpler code:

    Name      sample1         sample2          output
0   Tom    KOTA Arizona AZ    malang            KOTA
1   nick   Georgia GG         kaltim            KAB
2   krish  Newyork NY         KEC jakarta       KOTA
3   jack   KOTA Indiana IN    jepara            KOTA
4   elo    Florida FL         sragen            KAB

Is there a simpler way to do this without a loop, maybe with apply/lambda? The loop makes the computation slow.

Answer:

Let's try np.where:

cond1 = df1['sample1'].str.startswith('KOTA')
cond2 = df1['sample2'] == 'KEC'

df1['output'] = np.where(cond1 | cond2, 'KOTA', 'KAB')
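A self-contained version of the np.where approach, for anyone who wants to run it as-is. The na=False argument to str.startswith is an addition not in the answer above: it assumes that a missing value in sample1 should simply count as "does not start with KOTA" rather than leaving NaN in the mask.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Tom', 'nick', 'krish', 'jack', 'elo'],
    'sample1': ['KOTA Arizona AZ', 'Georgia GG', 'Newyork NY', 'KOTA Indiana IN', 'Florida FL'],
    'sample2': ['malang', 'kaltim', 'KEC', 'jepara', 'sragen'],
})

# Assumption: a missing sample1 value should be treated as "no KOTA prefix".
cond1 = df1['sample1'].str.startswith('KOTA', na=False)
# Exact match on sample2, as in the answer above.
cond2 = df1['sample2'] == 'KEC'

# Vectorized choice: 'KOTA' where either condition holds, 'KAB' elsewhere.
df1['output'] = np.where(cond1 | cond2, 'KOTA', 'KAB')
print(df1)
```

The whole column is computed in one pass over the data, which is what removes the Python-level loop from the question.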

Answer:

import pandas as pd

df1 = {'Name':['Tom', 'nick', 'krish', 'jack','elo'],
    'sample1':['KOTA Arizona AZ','Georgia GG','Newyork NY','KOTA Indiana IN','Florida FL'],
    'sample2':['malang','kaltim','KEC','jepara','sragen'],
}

df1 = pd.DataFrame(df1)
df1['output'] = df1.apply(lambda x: 'KOTA' if (str(x['sample1']).startswith('KOTA')) or (x['sample2'] == 'KEC') else 'KAB', axis=1)

print(df1)

output:

      Name          sample1 sample2 output
0    Tom  KOTA Arizona AZ  malang   KOTA
1   nick       Georgia GG  kaltim    KAB
2  krish       Newyork NY     KEC   KOTA
3   jack  KOTA Indiana IN  jepara   KOTA
4    elo       Florida FL  sragen    KAB
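Note that apply with axis=1 still calls the lambda once per row in Python, so it is tidier than the explicit loop but usually not much faster. A rough sketch to check this on your own machine, assuming a hypothetical larger frame built by repeating the five sample rows; no timing numbers are claimed here since they depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'sample1': ['KOTA Arizona AZ', 'Georgia GG', 'Newyork NY', 'KOTA Indiana IN', 'Florida FL'],
    'sample2': ['malang', 'kaltim', 'KEC', 'jepara', 'sragen'],
})
# Hypothetical larger input: the 5 sample rows repeated 2,000 times (10,000 rows).
big = pd.concat([base] * 2_000, ignore_index=True)


def with_apply(df):
    # Row-wise: the lambda runs once for every row.
    return df.apply(
        lambda x: 'KOTA' if str(x['sample1']).startswith('KOTA') or x['sample2'] == 'KEC' else 'KAB',
        axis=1,
    )


def with_where(df):
    # Vectorized: whole-column string check and comparison, then one np.where.
    mask = df['sample1'].str.startswith('KOTA', na=False) | (df['sample2'] == 'KEC')
    return pd.Series(np.where(mask, 'KOTA', 'KAB'), index=df.index)


# Both approaches should label every row the same way.
assert with_apply(big).tolist() == with_where(big).tolist()

print('apply:', timeit.timeit(lambda: with_apply(big), number=3))
print('where:', timeit.timeit(lambda: with_where(big), number=3))
```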

Answer:

You don't need a loop here:

df1['output'] = (
    (df1['sample1'].str.startswith('KOTA') | df1['sample2'].str.startswith('KEC'))
        .replace({True: 'KOTA', False: 'KAB'})
)
print(df1)

# Output
    Name          sample1 sample2 output
0    Tom  KOTA Arizona AZ  malang   KOTA
1   nick       Georgia GG  kaltim    KAB
2  krish       Newyork NY     KEC   KOTA
3   jack  KOTA Indiana IN  jepara   KOTA
4    elo       Florida FL  sragen    KAB
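Unlike the other answers, this one tests sample2 with str.startswith('KEC') instead of == 'KEC'. That difference matters if a cell holds extra text after KEC, like the 'KEC jakarta' entry in the question's expected output. A small sketch, assuming a hypothetical sample2 column with that value:

```python
import pandas as pd

# Hypothetical sample2 where the KEC cell has trailing text, as in 'KEC jakarta'.
sample2 = pd.Series(['malang', 'kaltim', 'KEC jakarta', 'jepara', 'sragen'])

exact = sample2 == 'KEC'                # True only for the exact string 'KEC'
prefix = sample2.str.startswith('KEC')  # True for any value beginning with 'KEC'

print(exact.tolist())   # [False, False, False, False, False]
print(prefix.tolist())  # [False, False, True, False, False]
```

Pick the comparison that matches how your real data is stored.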