So i have a dataset with a few columns. However, one column is made out of locations, in this case street names. I would like to rename those to just one letter. So for example:
| Location | Location new |
|---|---|
| Beukenlaan | A |
| Deventerlaan | B |
| Crixstraat | C |
| Deventerlaan | B |
I have to do this for multiple datasheets so the function below would take a lot of time to adjust manually.
df.replace('Eindhoven Genovevalaan', 'A', regex=True, inplace=True)
Any ideas how to do this automatically?
CodePudding user response:
You could use factorize and string.ascii_uppercase:
from string import ascii_uppercase
import numpy as np
d = dict(enumerate(ascii_uppercase))
df['Location new'] = pd.Series(df['Location'].factorize()[0]).map(d)
NB. there are only 26 values in ascii_uppercase ('ABCDEFGHIJKLMNOPQRSTUVWXYZ') so you need to specify what should happen if you have more than 26 rows.
output:
Location Location new
0 Beukenlaan A
1 Deventerlaan B
2 Crixstraat C
3 Deventerlaan B
