Home > Net >  Data Manipulation in multiple columns in pandas dataframe
Data Manipulation in multiple columns in pandas dataframe

Time:01-31

I need to make a function, which takes input as dataframe, and dictionary{"Col_2":% change,"Col_3":% change} and it should make the changes to the dataframe.

I Have data frame like this

Date col_1 col_2 col_3
01/01/2022 90 100 300
01/02/2022 80 110 310
01/03/2022 92 120 400
01/04/2022 96 130 300
01/05/2022 99 150 500
01/06/2022 105 155 600

Now I pass the dictionary say,

{"col_2":-10,"col_3":10}

Then my desired out would look like this

Date col_1 col_2 col_3
01/01/2022 90 90 330
01/02/2022 80 99 341
01/03/2022 92 108 440
01/04/2022 96 113 330
01/05/2022 99 135 550
01/06/2022 105 139.5 660

I followed this code(but didn't get my desired result)

def per_change(df,col,per):
    m = 1 per/100
    df['col_2'] *= m
    return df 

CodePudding user response:

Use:

df1 = df.set_index('Date')
d =  {"col_2":-10,"col_3":10}
df1 = df1.add(df1.div(100).mul(pd.Series(d)), fill_value=0)
print (df1)
            col_1  col_2  col_3
Date                           
01/01/2022   90.0   90.0  330.0
01/02/2022   80.0   99.0  341.0
01/03/2022   92.0  108.0  440.0
01/04/2022   96.0  117.0  330.0
01/05/2022   99.0  135.0  550.0
01/06/2022  105.0  139.5  660.0

Or:

d =  {"col_2":-10,"col_3":10}
df.iloc[:, 1:] = df.iloc[:, 1:].add(df.iloc[:, 1:].div(100).mul(pd.Series(d)), fill_value=0)
print (df)
         Date  col_1  col_2  col_3
0  01/01/2022   90.0   90.0  330.0
1  01/02/2022   80.0   99.0  341.0
2  01/03/2022   92.0  108.0  440.0
3  01/04/2022   96.0  117.0  330.0
4  01/05/2022   99.0  135.0  550.0
5  01/06/2022  105.0  139.5  660.0

CodePudding user response:

You can use a Series of the dictionary with mul. By default the missing columns will be NaN, and you fill in the original data with combine_first.

The advantage of this solution is that you do not need to care about the existing columns, if will only multiply the ones present in the dictionary.

Numeric/string columns only:

(df df.mul(pd.Series(d)/100)).combine_first(df)

Or to handle dataframes with non-numeric types:

(df df.select_dtypes(include='number').mul(pd.Series(d)/100)).combine_first(df)

output:

         Date  col_1  col_2  col_3
0  01/01/2022   90.0   90.0  330.0
1  01/02/2022   80.0   99.0  341.0
2  01/03/2022   92.0  108.0  440.0
3  01/04/2022   96.0  117.0  330.0
4  01/05/2022   99.0  135.0  550.0
5  01/06/2022  105.0  139.5  660.0

input:

df = pd.DataFrame({'Date': ['01/01/2022', '01/02/2022', '01/03/2022', '01/04/2022', '01/05/2022', '01/06/2022'], 'col_1': [90, 80, 92, 96, 99, 105],
                   'col_2': [100, 110, 120, 130, 150, 155],
                   'col_3': [300, 310, 400, 300, 500, 600]})

d = {"col_2":-10,"col_3":10}
  •  Tags:  
  • Related