Can I use a dictionary in Python to replace multiple characters?

I am looking for a way to write this code concisely. It replaces certain characters in a Pandas DataFrame column.

df['age'] = ['[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)']

df['age'] = df['age'].str.replace('[', '')
df['age'] = df['age'].str.replace(')', '')
df['age'] = df['age'].str.replace('50-60', '50-59')
df['age'] = df['age'].str.replace('60-70', '60-69')
df['age'] = df['age'].str.replace('70-80', '70-79')
df['age'] = df['age'].str.replace('80-90', '80-89')
df['age'] = df['age'].str.replace('90-100', '90-99')

I tried this, but it didn't work; the strings in df['age'] were not replaced:

chars_to_replace = {
    '[' : '',
    ')' : '',
    '50-60' : '50-59',
    '60-70' : '60-69',
    '70-80' : '70-79',
    '80-90' : '80-89',
    '90-100': '90-99'
                  }

for key in chars_to_replace.keys():
    df['age'] = df['age'].replace(key, chars_to_replace[key])

CodePudding user response：

Use two passes of regex substitution.

In the first pass, match each pair of numbers separated by -, and decrement the second number.

In the second pass, remove any occurrences of [ and ).

By the way, did you mean to have spaces between each pair of numbers? Because as it is now, implicit string concatenation puts them together without spaces.

import re

string = '[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)'

def repl(m: re.Match):
    age1 = m.group(1)
    age2 = int(m.group(2)) - 1
    return f"{age1}-{age2}"

string = re.sub(r'(\d+)-(\d+)', repl, string)
string = re.sub(r'\[|\)', '', string)

print(string)  # 70-7950-5960-6940-4980-8990-99

The repl function above can be condensed into a lambda:

repl = lambda m: f"{m.group(1)}-{int(m.group(2))-1}"

Update: Actually, this can be done in one pass.

import re

string = '[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)'

repl = lambda m: f"{m.group(1)}-{int(m.group(2))-1}"

string = re.sub(r'\[(\d+)-(\d+)\)', repl, string)

print(string)  # 70-7950-5960-6940-4980-8990-99
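
The dictionary from the question can also drive a single regex pass, which may be closer to what the title asks for. The following is only a sketch: the helper name replace_all is hypothetical, the keys are escaped so that [ and ) are matched literally, and longer keys are tried first as a precaution against one key being a prefix of another.

```python
import re

# Mapping copied from the question; keys are literal substrings, not regex patterns
chars_to_replace = {
    '[': '',
    ')': '',
    '50-60': '50-59',
    '60-70': '60-69',
    '70-80': '70-79',
    '80-90': '80-89',
    '90-100': '90-99',
}

# Build one alternation such as 90\-100|50\-60|...|\[|\) from the escaped keys
pattern = re.compile('|'.join(
    re.escape(key) for key in sorted(chars_to_replace, key=len, reverse=True)
))

def replace_all(text):  # hypothetical helper name
    # Look up each matched substring in the dictionary
    return pattern.sub(lambda m: chars_to_replace[m.group(0)], text)

ages = ['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']
result = [replace_all(age) for age in ages]
print(result)
```

Note that '[40-50)' becomes '40-50' here, because the question's dictionary has no entry for that range.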

CodePudding user response：

In addition to the previous response, if you want to apply the regex substitution to your DataFrame, you can use the apply method from pandas. To do so, put the regex substitution into a function, then pass that function to apply:

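import re

# Reuses the repl function defined in the previous response
def replace_chars(chars):
    string = re.sub(r'(\d+)-(\d+)', repl, chars)
    string = re.sub(r'\[|\)', ' ', string)  # replace the brackets with spaces
    return string
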
df['age'] = df['age'].apply(replace_chars)

print(df)

which gives the following output:

                                          age
0   70-79  50-59  60-69  40-49  80-89  90-99

By the way, here I put spaces between the age intervals. Hope this helps.
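
As an alternative to apply, Series.str.replace accepts a callable replacement when regex=True, so the one-pass pattern can run on the column directly. This is a sketch that assumes the column holds one interval per row (with commas between the list items), which differs from the single concatenated string above.

```python
import pandas as pd

# Assumption: one interval per row
df = pd.DataFrame({
    'age': ['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']
})

# The callable receives a re.Match object, like the repl function passed to re.sub
df['age'] = df['age'].str.replace(
    r'\[(\d+)-(\d+)\)',
    lambda m: f"{m.group(1)}-{int(m.group(2)) - 1}",
    regex=True,
)

print(df['age'].tolist())
```

Because the upper bound is computed rather than looked up, '[40-50)' also becomes '40-49' without needing its own dictionary entry.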

CodePudding user response：

Assuming the brackets are present on every entry, you can slice them off and then replace the range strings. According to the pandas.Series.replace documentation, passing a dict replaces each matching value, so you don't need to loop.

import pandas as pd

df = pd.DataFrame({
    "age":['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']})

ranges_to_replace = {
    '50-60' : '50-59',
    '60-70' : '60-69',
    '70-80' : '70-79',
    '80-90' : '80-89',
    '90-100': '90-99'}

df['age'] = df['age'].str.slice(1,-1).replace(ranges_to_replace)
print(df)
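
The slice matters because Series.replace, without regex=True, compares against whole cell values rather than substrings. That is also the likely reason the loop in the question left the strings unchanged. A small sketch with made-up sample values:

```python
import pandas as pd

# Two sample values: one still wrapped in brackets, one already sliced
s = pd.Series(['[50-60)', '50-60'])

# Only the cell that equals the key exactly is replaced
out = s.replace({'50-60': '50-59'})
print(out.tolist())  # ['[50-60)', '50-59']
```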

CodePudding user response：

Change the last part to this:

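col = df.columns.get_loc('age')  # positional index of the 'age' column
for i in range(len(df)):
    for x in chars_to_replace:
        # assign through df.iloc directly to avoid chained assignment
        df.iloc[i, col] = df.iloc[i, col].replace(x, chars_to_replace[x])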
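
The same dictionary loop can also be written without iterating over rows. This sketch assumes one interval per row and switches from Series.replace to Series.str.replace, which works on substrings; regex=False keeps [ and ) from being read as regex syntax.

```python
import pandas as pd

# Assumption: one interval per row
df = pd.DataFrame({
    'age': ['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']
})

# Dictionary copied from the question
chars_to_replace = {
    '[': '',
    ')': '',
    '50-60': '50-59',
    '60-70': '60-69',
    '70-80': '70-79',
    '80-90': '80-89',
    '90-100': '90-99',
}

for old, new in chars_to_replace.items():
    # Each pass replaces one literal substring across the whole column
    df['age'] = df['age'].str.replace(old, new, regex=False)

print(df['age'].tolist())
```

The replacements run in dictionary order, so this only behaves as expected when no replacement value accidentally creates a later key; that holds for the mapping above.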