I have a column filled with a string value:
| col_1 |
|---|
| 10500 |
| 25020 |
| 35640 |
| 45440 |
| 50454 |
| 62150 |
| 75410 |
I want to be able to create two other columns with strings values that have been splitted from the first. Also I want an efficient way to do that.
Supposed result :
| col_1 | col_2 | col_3 |
|---|---|---|
| 10500 | 10 | 500 |
| 25020 | 25 | 020 |
| 35640 | 35 | 640 |
| 45440 | 45 | 440 |
| 50454 | 50 | 454 |
| 62150 | 62 | 150 |
| 75410 | 75 | 410 |
So far I was trying to go with vectorization, but hasn't been able to implement it yet.
For the split part, I parse the row (with iterows, and I know that iterows has to be avoid as much as possible.) and create a list that can be used to populate the new tabs, but in my opinion this way is a too archaic.
Also, how can I efficiently, modify each cell ? Like adding a comma, or operating on them ?
Thank you.
CodePudding user response:
Use str accessor:
df = df.join(df['col_1'].astype(str).str.extract('(?P<col_2>\d{2})(?P<col_3>\d{3})'))
print(df)
# Output:
col_1 col_2 col_3
0 10500 10 500
1 25020 25 020
2 35640 35 640
3 45440 45 440
4 50454 50 454
5 62150 62 150
6 75410 75 410
Or simple in few steps:
df['col_1'] = df['col_1'].astype(str)
df['col_2'] = df['col_1'].str[:2]
df['col_3'] = df['col_1'].str[2:]
print(df)
# Output
col_1 col_2 col_3
0 10500 10 500
1 25020 25 020
2 35640 35 640
3 45440 45 440
4 50454 50 454
5 62150 62 150
6 75410 75 410
Another example:
df['col_1'] = df['col_1'].astype(str)
df['col_4'] = df['col_1'].str[:2] '-' df['col_1'].str[2:]
print(df)
# Output
col_1 col_4
0 10500 10-500
1 25020 25-020
2 35640 35-640
3 45440 45-440
4 50454 50-454
5 62150 62-150
6 75410 75-410
