Data.table in R into datatable in Python translation problem (normalization)

I have this R code that uses data.table, and I want to translate it to Python with datatable. For each column in cols, it creates a new column holding that column's values divided by the column's mean within each col_A group. It is a kind of normalization.

dataset[ , paste0( cols, suffix) := lapply( .SD,  function(x){ x/mean(x, na.rm=TRUE)} ), 
         by= col_A, 
         .SDcols= cols]

CodePudding user response：

from datatable import f,by,update,dt

dataset=dt.Frame({'col_A':[0,0,1,1], 'col_B':[1,2,3,4], 'col_C':[5,6,7,8]})
cols = dataset[:,[int,float]].names
dataset[:, update(**{col + '_norm': f[col]/dt.mean(f[col]) for col in cols if col!='col_A'}), by(f.col_A)]

CodePudding user response：

According to the documentation, the update function and the del operator work in place. If there are many columns, the same call can also be built in a loop.

DT[:, update(y_suffix = f.y/dt.mean(f.y), v_suffix = f.v/dt.mean(f.v)), by("x")]

-output

data

from datatable import dt, f, g, by, update

DT = dt.Frame(x = ["b"]*3 + ["a"]*3 + ["c"]*3,
              y = [1, 3, 6] * 3,
              v = range(1, 10))
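
As a cross-check on what the grouped update should compute for this data, here is a minimal plain-Python sketch of the same arithmetic (each value divided by the mean of its x group). It does not use datatable at all. The helper name normalize_by_group is hypothetical, and it assumes there are no missing values, so it does not reproduce the na.rm=TRUE handling of the R version.

```python
from collections import defaultdict

# Same values as the DT frame above, as plain Python lists.
x = ["b"] * 3 + ["a"] * 3 + ["c"] * 3
y = [1, 3, 6] * 3
v = list(range(1, 10))

def normalize_by_group(keys, values):
    """Divide each value by the mean of its group (hypothetical helper)."""
    # Collect the values belonging to each group key.
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    # Mean per group; assumes every group is non-empty and has no missing values.
    group_mean = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    # Results are returned in the original row order of the inputs.
    return [value / group_mean[key] for key, value in zip(keys, values)]

y_suffix = normalize_by_group(x, y)
v_suffix = normalize_by_group(x, v)

for row in zip(x, y, v, y_suffix, v_suffix):
    print(row)
```

Each y group has mean 10/3, so y_suffix repeats approximately 0.3, 0.9, 1.8 in every group. The v groups have means 2, 5 and 8, so v_suffix is 0.5, 1.0, 1.5 for "b", 0.8, 1.0, 1.2 for "a" and 0.875, 1.0, 1.125 for "c".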