Home > Software design >  Create new column by mapping to previous values in dataset using Python
Create new column by mapping to previous values in dataset using Python

Time:02-01

I have a dataset, df, where I would like to form a new column by taking first letter of column values and mapping values from additional columns within the dataset.

Data

id  Date    Model   type    status   
aa  Q1 2022 orion   hi01    o_hi    
aa  Q1 2022 orion   hi02    o_hi    
aa  Q1 2022 orion   hello01 o_hello 
aa  Q1 2022 orion   go01    o_go    
bb  Q1 2022 orion   hi01    o_hi    
bb  Q1 2022 orion   hi02    o_hi    
bb  Q1 2022 orion   hi03    o_hi    

                    

Desired

id  Date    Model   type    status  NEW
aa  Q1 2022 orion   hi01    o_hi    aa o_hi01 Q1 2022
aa  Q1 2022 orion   hi02    o_hi    aa o_hi02 Q1 2022
aa  Q1 2022 orion   hello01 o_hello aa o_hello01 Q1 2022
aa  Q1 2022 orion   go01    o_go    aa o_go01 Q1 2022
bb  Q1 2022 orion   hi01    o_hi    bb o_hi01 Q1 2022
bb  Q1 2022 vorion  hi02    v_hi    bb v_hi02 Q1 2022
bb  Q1 2022 vorion  hi03    v_hi    bb v_hi03 Q1 2022

Doing

df['NEW'] = df['id'].map(str)   ' '   df['Model'].astype(str).str[0]   df['type'].map(str)   ' '   df['Date'].map(str)

However, I am troubleshooting on how to incorporate the first letter of column values within the newly formed column.

Believe I can incorporate .loc

 df['Model'].astype(str).str[0]

Any suggestion is appreciated.

CodePudding user response:

You don't need to use .map(str) because all of your fields are already strings. If needed use df[column].astype(str) instead of df[column].map(str).

df['New'] = df['id']   ' '   df['Model'].str[0]   '_'   df['type']   ' '   df['Date']
print(df)

# Output
   id     Date   Model     type   status                   New
0  aa  Q1 2022   orion     hi01     o_hi     aa o_hi01 Q1 2022
1  aa  Q1 2022   orion     hi02     o_hi     aa o_hi02 Q1 2022
2  aa  Q1 2022   orion  hello01  o_hello  aa o_hello01 Q1 2022
3  aa  Q1 2022   orion     go01     o_go     aa o_go01 Q1 2022
4  bb  Q1 2022   orion     hi01     o_hi     bb o_hi01 Q1 2022
5  bb  Q1 2022  vorion     hi02     o_hi     bb v_hi02 Q1 2022
6  bb  Q1 2022  vorion     hi03     o_hi     bb v_hi03 Q1 2022

CodePudding user response:

Try this:

df['NEW'] = df['id'].map(str)   ' '   df['status'].astype(str)   df.groupby(['id', 'status']).cumcount().add(1).astype(str).str.zfill(2)   ' '   df['Date']

Output:

>>> df
   id     Date     type   status                   NEW
0  aa  Q1 2022     hi01     o_hi     aa o_hi01 Q1 2022
1  aa  Q1 2022     hi02     o_hi     aa o_hi02 Q1 2022
2  aa  Q1 2022  hello01  o_hello  aa o_hello01 Q1 2022
3  aa  Q1 2022     go01     o_go     aa o_go01 Q1 2022
4  bb  Q1 2022     hi01     o_hi     bb o_hi01 Q1 2022
5  bb  Q1 2022     hi02     o_hi     bb o_hi02 Q1 2022
6  bb  Q1 2022     hi03     o_hi     bb o_hi03 Q1 2022

CodePudding user response:

df['NEW'] = df['id'].map(str)   ' '   df['Model'].astype(str).str[0]   '_'   df['type'].map(str)   ' '   df['Date'].map(str)
  •  Tags:  
  • Related