Multiple options using np.where and not just binary

Assuming I have grouped US states into lists:

list1 = ['TX','FL','OR']
list2 = ['AL','MA','NJ']
list3 = ['WA','PA','GA']

I have over 15 lists covering all 50 states, but for the sake of brevity, I will keep it down to 3 lists.

I have the following dataframe df:

State    abbrev     produce        goods    service   

Alabama      AL            5          10        15   
Texas        TX            7          17        27   
Florida      FL            2          73        13   
Georgia      GA            12         65        20    
New Jersey   NJ            65         26        11

I want to create a column called final_delivery, which is equal to produce if the state is in list1, goods if the state is in list2, and service if the state is in list3.

State    abbrev     produce        goods    service   final_delivery  

Alabama      AL            5          10        15               10  
Texas        TX            7          17        27                7   
Florida      FL            2          73        13                2  
Georgia      GA            12         65        20               20    
New Jersey   NJ            65         26        11               26

I have tried the following:

df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['produce'], "")

but when I move on to filling the column based on list2 and list3, the values that were already set are overwritten with missing values.

CodePudding user response：

You could build an inverted index that points to the function that you want to execute. This easily scales to 15 lists.

Inverted Index

si = pd.Series([0,0,0,1,1,1,2,2,2], index=['TX','FL','OR','AL','MA','NJ','WA','PA','GA']).sort_index()
print(si)

AL    1
FL    0
GA    2
MA    1
NJ    1
OR    0
PA    2
TX    0
WA    2

Function Map

fmap = {
    0: lambda x: x['produce'],
    1: lambda x: x['goods'],
    2: lambda x: x['service']
}

Apply it

df['final_delivery'] = df.apply(lambda x: fmap[si.at[x['abbrev']]](x), axis=1)
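
With 15 lists, typing the inverted index by hand gets tedious. As a rough sketch, the index could instead be derived from the lists themselves. The `groups` variable and the rebuilt sample frame are assumptions made for illustration; `si` and `fmap` follow the names used above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["Alabama", "Texas", "Florida", "Georgia", "New Jersey"],
    "abbrev": ["AL", "TX", "FL", "GA", "NJ"],
    "produce": [5, 7, 2, 12, 65],
    "goods": [10, 17, 73, 65, 26],
    "service": [15, 27, 13, 20, 11],
})

list1 = ["TX", "FL", "OR"]
list2 = ["AL", "MA", "NJ"]
list3 = ["WA", "PA", "GA"]

# Hypothetical name: the lists in the order that matches the keys of fmap.
groups = [list1, list2, list3]

# Inverted index: state abbreviation -> position of the list that contains it.
si = pd.Series({abbrev: i for i, group in enumerate(groups) for abbrev in group})

# Function map: list position -> how to read the value from a row.
fmap = {
    0: lambda row: row["produce"],
    1: lambda row: row["goods"],
    2: lambda row: row["service"],
}

# Look up each row's list position, then call the matching function on the row.
df["final_delivery"] = df.apply(lambda row: fmap[si.at[row["abbrev"]]](row), axis=1)
print(df)
```

Note that `si.at[...]` raises a `KeyError` for an abbreviation that appears in none of the lists, so this assumes the lists really do cover every state in the frame.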

Result

CodePudding user response：

Chain the np.where calls, passing the current column as the fallback so that earlier assignments are kept:

# Start from NaN rather than "" so the column stays numeric
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['produce'], np.nan)
# Each later call falls back to the values already filled in
df["final_delivery"] = np.where(df['abbrev'].isin(list2), df['goods'], df["final_delivery"])
df["final_delivery"] = np.where(df['abbrev'].isin(list3), df['service'], df["final_delivery"])
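
A chain of np.where calls like this can also be collapsed into a single `np.select` call, which takes a list of conditions and a matching list of choices. The sketch below is one possible way to write it, not part of the original answer; the sample frame is rebuilt from the question, and `conditions` and `choices` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["Alabama", "Texas", "Florida", "Georgia", "New Jersey"],
    "abbrev": ["AL", "TX", "FL", "GA", "NJ"],
    "produce": [5, 7, 2, 12, 65],
    "goods": [10, 17, 73, 65, 26],
    "service": [15, 27, 13, 20, 11],
})

list1 = ["TX", "FL", "OR"]
list2 = ["AL", "MA", "NJ"]
list3 = ["WA", "PA", "GA"]

# One boolean mask per list, in the same order as the choices below.
conditions = [
    df["abbrev"].isin(list1),
    df["abbrev"].isin(list2),
    df["abbrev"].isin(list3),
]
choices = [df["produce"], df["goods"], df["service"]]

# Rows that match no list get NaN, which makes the result a float column.
df["final_delivery"] = np.select(conditions, choices, default=np.nan)
print(df)
```

With 15 lists, only `conditions` and `choices` grow; the assignment itself stays a single line.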

CodePudding user response：

We can build the mapping from abbreviation to column name first, then pick each row's value by position:

mapping = {z : x  for x, y in zip(['produce','goods','service'], [list1,list2,list3]) for z in y}
df['final_delivery'] = df.to_numpy()[np.arange(df.shape[0]),
                                     df.columns.get_indexer(df.abbrev.map(mapping))]
df
Out[533]: 
       State abbrev  produce  goods  service final_delivery
0    Alabama     AL        5     10       15             10
1      Texas     TX        7     17       27              7
2    Florida     FL        2     73       13              2
3    Georgia     GA       12     65       20             20
4  NewJersey     NJ       65     26       11             26
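
To make the positional lookup easier to follow, here is the same idea split into named steps. This is only a sketch: `col_names`, `col_pos` and `row_pos` are illustrative names, and the sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["Alabama", "Texas", "Florida", "Georgia", "New Jersey"],
    "abbrev": ["AL", "TX", "FL", "GA", "NJ"],
    "produce": [5, 7, 2, 12, 65],
    "goods": [10, 17, 73, 65, 26],
    "service": [15, 27, 13, 20, 11],
})

list1 = ["TX", "FL", "OR"]
list2 = ["AL", "MA", "NJ"]
list3 = ["WA", "PA", "GA"]

# Abbreviation -> name of the column to read for that state.
mapping = {
    abbrev: col
    for col, group in zip(["produce", "goods", "service"], [list1, list2, list3])
    for abbrev in group
}

col_names = df["abbrev"].map(mapping)         # e.g. AL -> "goods"
col_pos = df.columns.get_indexer(col_names)   # column names -> integer positions
row_pos = np.arange(len(df))                  # 0, 1, 2, ... one entry per row

# Pair each row position with its column position and pull out one cell per row.
df["final_delivery"] = df.to_numpy()[row_pos, col_pos]
print(df)
```

One caveat: `get_indexer` returns -1 for a label it cannot find, and NumPy reads -1 as the last column, so an abbreviation missing from every list would silently pick up the wrong value. Because the frame mixes strings and numbers, `to_numpy()` also yields an object array, so the new column has object dtype.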

CodePudding user response：

First, create the column and fill it with np.nan:

df['final_delivery'] = np.nan

Next, to keep the for loop simple, I will put your lists into a nested list.

new_list = [list1, list2, list3]

You can reorder new_list to match the order of your columns, because the loop relies on the order of df.columns to pair each list with its column.

for abb, col in zip(new_list, df.columns[2:-1]):
    mask = df['abbrev'].isin(abb)  # rows whose abbreviation is in this list
    # a single .loc call avoids chained assignment
    df.loc[mask, 'final_delivery'] = df.loc[mask, col]

df
        State abbrev  produce  goods  service  final_delivery
0     Alabama     AL        5     10       15            10.0
1       Texas     TX        7     17       27             7.0
2     Florida     FL        2     73       13             2.0
3     Georgia     GA       12     65       20            20.0
4  New Jersey     NJ       65     26       11            26.0
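
If relying on column order feels fragile, one option is to pair each column name with its list explicitly. The following is a minimal sketch under that assumption; the `pairs` dictionary is a made-up name, and the sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["Alabama", "Texas", "Florida", "Georgia", "New Jersey"],
    "abbrev": ["AL", "TX", "FL", "GA", "NJ"],
    "produce": [5, 7, 2, 12, 65],
    "goods": [10, 17, 73, 65, 26],
    "service": [15, 27, 13, 20, 11],
})

list1 = ["TX", "FL", "OR"]
list2 = ["AL", "MA", "NJ"]
list3 = ["WA", "PA", "GA"]

# Hypothetical pairing: source column -> states that should read from it.
pairs = {"produce": list1, "goods": list2, "service": list3}

df["final_delivery"] = np.nan
for col, abbrevs in pairs.items():
    mask = df["abbrev"].isin(abbrevs)
    # Copy the source column into final_delivery only for the matching rows.
    df.loc[mask, "final_delivery"] = df.loc[mask, col]

print(df)
```

This way the result no longer changes if columns are added to or reordered in the frame.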