I am looking for a way to write this code concisely. It replaces certain characters in a Pandas DataFrame column.
df['age'] = ['[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)']
df['age'] = df['age'].str.replace('[', '')
df['age'] = df['age'].str.replace(')', '')
df['age'] = df['age'].str.replace('50-60', '50-59')
df['age'] = df['age'].str.replace('60-70', '60-69')
df['age'] = df['age'].str.replace('70-80', '70-79')
df['age'] = df['age'].str.replace('80-90', '80-89')
df['age'] = df['age'].str.replace('90-100', '90-99')
I tried this, but it didn't work; the strings in df['age'] were not replaced:
chars_to_replace = {
    '[' : '',
    ')' : '',
    '50-60' : '50-59',
    '60-70' : '60-69',
    '70-80' : '70-79',
    '80-90' : '80-89',
    '90-100': '90-99'
}
for key in chars_to_replace.keys():
    df['age'] = df['age'].replace(key, chars_to_replace[key])
CodePudding user response:
Use two passes of regex substitution.
In the first pass, match each pair of numbers separated by -, and decrement the second number.
In the second pass, remove any occurrences of [ and ).
By the way, did you mean to have spaces between each pair of numbers? Because as it is now, implicit string concatenation puts them together without spaces.
import re
string = '[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)'
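def repl(m: re.Match):
    age1 = m.group(1)
    # Decrement the upper bound, e.g. 80 -> 79
    age2 = int(m.group(2)) - 1
    return f"{age1}-{age2}"
# First pass: rewrite each "low-high" pair
string = re.sub(r'(\d+)-(\d+)', repl, string)
# Second pass: drop the [ and ) delimiters
string = re.sub(r'\[|\)', '', string)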
print(string) # 70-7950-5960-6940-4980-8990-99
The repl function above can be condensed into a lambda:
repl = lambda m: f"{m.group(1)}-{int(m.group(2))-1}"
Update: Actually, this can be done in one pass.
import re
string = '[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)'
repl = lambda m: f"{m.group(1)}-{int(m.group(2))-1}"
string = re.sub(r'\[(\d+)-(\d+)\)', repl, string)
print(string) # 70-7950-5960-6940-4980-8990-99
CodePudding user response:
In addition to the previous response: if you want to apply the regex substitution to your DataFrame, you can use the apply method from pandas. To do so, put the regex substitution into a function, then pass that function to apply:
import re
# repl is the function defined in the previous answer
def replace_chars(chars):
    string = re.sub(r'(\d+)-(\d+)', repl, chars)
    # Replace the delimiters with spaces instead of removing them
    string = re.sub(r'\[|\)', ' ', string)
    return string
df['age'] = df['age'].apply(replace_chars)
print(df)
which gives the following output:
age
0 70-79 50-59 60-69 40-49 80-89 90-99
By the way, here I put spaces between the age intervals. Hope this helps.
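As a possible alternative to apply, Series.str.replace also accepts a callable as the replacement when regex=True, so the one-pass pattern from the previous answer can run directly on the column. The sketch below assumes one interval per row; the sample DataFrame and the helper name decrement_upper are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical sample data: one interval per row
df = pd.DataFrame(
    {"age": ['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']}
)

def decrement_upper(m: re.Match) -> str:
    # Keep the lower bound, subtract 1 from the upper bound
    return f"{m.group(1)}-{int(m.group(2)) - 1}"

# The pattern consumes '[' and ')' as well, so they are dropped in the same pass
df["age"] = df["age"].str.replace(r"\[(\d+)-(\d+)\)", decrement_upper, regex=True)
print(df["age"].tolist())
```

Because the upper bound is computed rather than looked up, intervals such as 40-50 are handled without listing them anywhere.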
CodePudding user response:
Assuming these brackets are on all of the entries, you can slice them off and then replace the range strings. According to the docs for pandas.Series.replace, pandas will replace the values from the dict without the need for you to loop.
import pandas as pd
df = pd.DataFrame({
    "age": ['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']})
ranges_to_replace = {
    '50-60' : '50-59',
    '60-70' : '60-69',
    '70-80' : '70-79',
    '80-90' : '80-89',
    '90-100': '90-99'}
df['age'] = df['age'].str.slice(1,-1).replace(ranges_to_replace)
print(df)
Output
age
0 70-79
1 50-59
2 60-69
3 40-50
4 80-89
5 90-99
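For what it's worth, this also seems to be why the loop in the question changed nothing: Series.replace compares each key against the entire cell value, whereas Series.str.replace looks for substrings. Here is a minimal sketch of the difference, using the question's dictionary on a made-up sample column; regex=False is passed so that '[' and ')' are treated as literal characters:

```python
import pandas as pd

# Hypothetical sample data: one interval per row
df = pd.DataFrame(
    {"age": ['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']}
)

chars_to_replace = {
    '[': '',
    ')': '',
    '50-60': '50-59',
    '60-70': '60-69',
    '70-80': '70-79',
    '80-90': '80-89',
    '90-100': '90-99',
}

# Series.replace matches whole values; no cell is exactly '70-80', so nothing changes
unchanged = df["age"].replace("70-80", "70-79")

# Series.str.replace matches substrings, so each key is applied inside the cell text
fixed = df["age"]
for old, new in chars_to_replace.items():
    fixed = fixed.str.replace(old, new, regex=False)

print(unchanged.tolist())
print(fixed.tolist())
```

As in the output above, 40-50 stays unchanged here because the dictionary has no entry for it.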
CodePudding user response:
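Change the last part to this:
for i in range(len(df['age'])):
    for x in chars_to_replace:
        # Assign through df.loc rather than df['age'].iloc[i] to avoid chained assignment
        df.loc[df.index[i], 'age'] = df.loc[df.index[i], 'age'].replace(x, chars_to_replace[x])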
