Assuming I have grouped US states into lists:
list1 = ['TX','FL','OR']
list2 = ['AL','MA','NJ']
list3 = ['WA','PA','GA']
I have over 15 lists covering all 50 states, but for the sake of brevity, I will keep it down to 3 lists.
I have the following dataframe df:
State abbrev produce goods service
Alabama AL 5 10 15
Texas TX 7 17 27
Florida FL 2 73 13
Georgia GA 12 65 20
New Jersey NJ 65 26 11
I want to create a column called final_delivery, which is equal to produce if the state is in list1, goods if the state is in list2, and service if the state is in list3
State abbrev produce goods service final_delivery
Alabama AL 5 10 15 10
Texas TX 7 17 27 7
Florida FL 2 73 13 2
Georgia GA 12 65 20 20
New Jersey NJ 65 26 11 26
I have tried the following:
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['produce'], "")
but it replaces the values to missing when I try to move on to filling the column based on list2 and list3.
CodePudding user response:
You could build an inverted index that points to the function that you want to execute. This easily scales to 15 lists.
Inverted Index
si = pd.Series([0,0,0,1,1,1,2,2,2], index=['TX','FL','OR','AL','MA','NJ','WA','PA','GA']).sort_index()
print(si)
AL 1
FL 0
GA 2
MA 1
NJ 1
OR 0
PA 2
TX 0
WA 2
Function Map
fmap = {
0: lambda x: x['produce'],
1: lambda x: x['goods'],
2: lambda x: x['service']
}
Apply it
df.apply(lambda x: fmap[si.at[x['abbrev']]](x), axis=1)
Result
0 10
1 7
2 2
3 20
4 26
CodePudding user response:
The following helps maintain the integrity of the data frame column:
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['produce'], "")
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['goods'], df["final_delivery"])
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['service'], df["final_delivery"])
CodePudding user response:
We can create the mapping first then doing value position slice
mapping = {z : x for x, y in zip(['produce','goods','service'], [list1,list2,list3]) for z in y}
df['final_delivery'] = df.to_numpy()[np.arange(df.shape[0]),
df.columns.get_indexer(df.abbrev.map(mapping))]
df
Out[533]:
State abbrev produce goods service final_delivery
0 Alabama AL 5 10 15 10
1 Texas TX 7 17 27 7
2 Florida FL 2 73 13 2
3 Georgia GA 12 65 20 20
4 NewJersey NJ 65 26 11 26
CodePudding user response:
Firstly, you can create a column with np.nan
df['final_delivery'] = np.nan
Next, for the sake of simplicity and for loop, I will put your lists into a nested list.
new_list = [list1, list2, list3]
You can adjust the new_list based on the order of your column, because I was thinking of using the order of df.columns to facilitate the for loop.
for abb, col in zip(newlist,df.columns[2:-1]):
df['final_delivery'].loc[df['abbrev'].isin(abb)] = df[col].loc[df['abbrev'].isin(abb)]
df
State abbrev produce goods service final_delivery
0 Alabama AL 5 10 15 10.0
1 Texas TX 7 17 27 7.0
2 Florida FL 2 73 13 2.0
3 Georgia GA 12 65 20 20.0
4 New Jersey NJ 65 26 11 26.0
