I have the following data frame, there are other categories besides cat1 but I only want to make a change to the part of the dataframe where the category is cat1.
name short code category
thyrax thx thxar.po cat1
gha gh gh.cd cat1
play pl pl.v cat1
xxdx xd xda.v cat1
......
For cases where everything after the . in the code column is anything besides .cd, I want the short column to take what's in the short column what is after the . in the code column and become something like thx.po, but in cases where there is a cd, I want it to become .cn. I want the output to look like this,
name short code category
thyrax thx.po thxar.po cat1
gha gh.cn gh.cd cat1
play pl.v pl.v cat1
xxdx xd.v xda.v cat1
......
I don't know how to add a condition that IF the category is cat1 and the code after the . is .cd, make it into short what's after the . cn.
I want the same condition for everything unless what is after the . to be copied, but if what is after the . is a .cn, I want it to be a .cd. What is the best way to do it?
I have gotten this code so far,
df['short'] = (df['short'].add("." df['code'].str.split(".").str[-1]).where(df['category'].eq("cat1"),df['short']))
But I can't figure out how to add the condition where if in the code column, what appears after the . is .cd and to have something different happen.
So basically my conditions are this,
firstly, category must be cat1,
then take what is in the short column and merge it with what is after the . in the code column.
If what is after the . in the code column is cd, make it into cn.
CodePudding user response:
Try:
mask = df['category'] == 'cat1'
df.loc[mask, 'short'] = '.' df.loc[mask, 'code'] \
.str.split('.').str[1].replace({'cd': 'cn'})
print(df)
# Output:
name short code category
0 thyrax thx.po thxar.po cat1
1 gha gh.cd gh.cd cat1
2 play pl.v pl.v cat1
3 xxdx xd.v xda.v cat1
CodePudding user response:
Try this:
df['short'] = df['short'].astype(str) np.where(df['category'].eq('cat1'), df['code'].astype(str).str.extract('(\.. )')[0].replace('.cd', '.cn'), '')
Output:
>>> df
name short code category
0 thyrax thx.po thxar.po cat1
1 gha gh gh.cd cat2
2 gha gh.cn gh.cd cat1
3 play pl.v pl.v cat1
4 xxdx xd xda.v cat2
5 xxdx xd xda.v cat2
6 xxdx xd.v xda.v cat1
(I added some dummy rows in the above data to demonstrate that it works with the right category, cat1.)
CodePudding user response:
def custom_apply_function(row):
if row['category'] != 'cat1':
return row.short
code_after_dot = row.code.split('.')[1]
if code_after_dot == 'cd':
code_after_dot = 'cn'
new_short = row.short '.' code_after_dot
return new_short
df.apply(axis=1, func=custom_apply_function)
returns
0 thx.po
1 gh.cn
2 pl.v
3 xd.v
Other answers in this thread are certainly faster due to vectorisation. In case you'd like to complicate the conditions further though, I would go with the more explicit version.
