Data Manipulation in multiple columns in pandas dataframe-CodePudding

I need to make a function, which takes input as dataframe, and dictionary{"Col_2":% change,"Col_3":% change} and it should make the changes to the dataframe.

I Have data frame like this

Date	col_1	col_2	col_3
01/01/2022	90	100	300
01/02/2022	80	110	310
01/03/2022	92	120	400
01/04/2022	96	130	300
01/05/2022	99	150	500
01/06/2022	105	155	600

Now I pass the dictionary say,

{"col_2":-10,"col_3":10}

Then my desired out would look like this

Date	col_1	col_2	col_3
01/01/2022	90	90	330
01/02/2022	80	99	341
01/03/2022	92	108	440
01/04/2022	96	113	330
01/05/2022	99	135	550
01/06/2022	105	139.5	660

I followed this code(but didn't get my desired result)

def per_change(df,col,per):
    m = 1 per/100
    df['col_2'] *= m
    return df

CodePudding user response：

Use:

df1 = df.set_index('Date')
d =  {"col_2":-10,"col_3":10}
df1 = df1.add(df1.div(100).mul(pd.Series(d)), fill_value=0)
print (df1)
            col_1  col_2  col_3
Date                           
01/01/2022   90.0   90.0  330.0
01/02/2022   80.0   99.0  341.0
01/03/2022   92.0  108.0  440.0
01/04/2022   96.0  117.0  330.0
01/05/2022   99.0  135.0  550.0
01/06/2022  105.0  139.5  660.0

Or:

d =  {"col_2":-10,"col_3":10}
df.iloc[:, 1:] = df.iloc[:, 1:].add(df.iloc[:, 1:].div(100).mul(pd.Series(d)), fill_value=0)
print (df)
         Date  col_1  col_2  col_3
0  01/01/2022   90.0   90.0  330.0
1  01/02/2022   80.0   99.0  341.0
2  01/03/2022   92.0  108.0  440.0
3  01/04/2022   96.0  117.0  330.0
4  01/05/2022   99.0  135.0  550.0
5  01/06/2022  105.0  139.5  660.0

CodePudding user response：

You can use a Series of the dictionary with mul. By default the missing columns will be NaN, and you fill in the original data with combine_first.

The advantage of this solution is that you do not need to care about the existing columns, if will only multiply the ones present in the dictionary.

Numeric/string columns only:

(df df.mul(pd.Series(d)/100)).combine_first(df)

Or to handle dataframes with non-numeric types:

(df df.select_dtypes(include='number').mul(pd.Series(d)/100)).combine_first(df)

output:

         Date  col_1  col_2  col_3
0  01/01/2022   90.0   90.0  330.0
1  01/02/2022   80.0   99.0  341.0
2  01/03/2022   92.0  108.0  440.0
3  01/04/2022   96.0  117.0  330.0
4  01/05/2022   99.0  135.0  550.0
5  01/06/2022  105.0  139.5  660.0

input:

df = pd.DataFrame({'Date': ['01/01/2022', '01/02/2022', '01/03/2022', '01/04/2022', '01/05/2022', '01/06/2022'], 'col_1': [90, 80, 92, 96, 99, 105],
                   'col_2': [100, 110, 120, 130, 150, 155],
                   'col_3': [300, 310, 400, 300, 500, 600]})

d = {"col_2":-10,"col_3":10}