I have a dataframe df whose index is [x[0], ..., x[N]], whose columns are [y[0], ..., y[M]], and whose data is a 2D array of z[i,j]'s.
I have a Python function def f(x, y, z) of 3 float variables, and I would like to calculate the 2D array of f(x[i], y[j], z[i,j])'s in the fastest way using numpy and/or pandas, but I don't see how to do it.
I see the df.transform method, but it doesn't seem to allow for lambdas that depend on the index and column of df, or at least I don't know how to provide such lambdas.
Details on df and f:
How was my df obtained? I created it during a 45-minute computation using an intensive vectorized numerical Python function on a grid with N = 5000 and M = 5000, and I saved it with to_csv. Now, when I want to use it, I load it with read_csv.
My function f is quite an involved numerical C function that I exposed to Python with pybind11 (I added the tag for the sake of completeness). I don't want to rewrite it in a "numpy vectorizable" fashion for now, as it is ultra-optimized and each individual call is very fast. Given x, y, the function f numerically solves (with an iterative root finder) an equation with parameters x, y, z and unknown Z, the root of the equation being f(x,y,z).
CodePudding user response:
You could do a pd.melt:
df.reset_index().rename(columns={'index':'x'}).melt(var_name='y', value_name='z', id_vars='x')
It essentially transforms the dataframe to long format, so that each row has three entries: x, y and z.
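To make the long-format idea concrete, here is a minimal sketch that melts a toy grid, calls f once per row, and pivots the results back to the original layout. The function f below is a hypothetical stand-in for the real pybind11 function, and the toy values are made up for illustration. Note that this still makes one Python-level call per grid cell, so it reorganizes the data rather than speeding up the computation.

```python
import pandas as pd


# Hypothetical stand-in for the real pybind11 function f(x, y, z).
def f(x, y, z):
    return x + 10.0 * y + 100.0 * z


# Toy grid: the index holds the x's, the columns hold the y's, the data holds z[i, j].
df = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=[0.1, 0.2],
    columns=[1.0, 2.0, 3.0],
)

# Long format: one row per (x, y, z) triple.
long_df = (
    df.reset_index()
    .rename(columns={"index": "x"})
    .melt(id_vars="x", var_name="y", value_name="z")
)
# Make sure the y labels are plain floats before passing them to f.
long_df["y"] = long_df["y"].astype(float)

# One call to f per row of the long table.
long_df["f"] = [
    f(x, y, z) for x, y, z in zip(long_df["x"], long_df["y"], long_df["z"])
]

# Back to the wide layout, in the original index/column order.
result = long_df.pivot(index="x", columns="y", values="f").reindex(
    index=df.index, columns=df.columns
)
print(result)
```

The final reindex is only there to guarantee that the rows and columns come back in the same order as in df.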
CodePudding user response:
If you don't want to rewrite the function, then applying it in a for loop seems an easy way. You can do this:
idx = df.index
cols = df.columns
vals = df.to_numpy()
r = [
    [f(x, y, z) for y, z in zip(cols, vals[i])]
    for i, x in enumerate(idx)
]
# if you want to recreate a dataframe
df_root = pd.DataFrame(data=r, index=idx, columns=cols)
There is a list comprehension over the index that contains a list comprehension over both the columns and the values of the row at the same time. vals[i] accesses the values of the row at position i. The result r is a list whose length is the number of rows (N), and each item is a list whose length is the number of columns (M). You don't strictly need this structure, but it is an easy way to build a dataframe with the same index and columns as the original data.
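Here is a self-contained sketch of the same loop on a toy grid, including the to_csv/read_csv round trip mentioned in the question. The function f is a hypothetical stand-in and the values are made up. One thing to watch for: read_csv returns the header labels as strings, so the columns need converting back to floats before being passed to f. The sketch also shows np.vectorize with broadcasting as a more compact way to write the loop; NumPy documents it as a convenience that is essentially a for loop, so it should not be expected to run faster.

```python
import io

import numpy as np
import pandas as pd


# Hypothetical stand-in for the real pybind11 function f(x, y, z).
def f(x, y, z):
    return x * y + z


# Toy grid written to CSV and read back, mimicking the to_csv/read_csv round trip.
original = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=[0.5, 1.0, 1.5],
    columns=[10.0, 20.0],
)
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)
df = pd.read_csv(buffer, index_col=0)

# The header labels come back as strings; convert them to floats for f.
df.columns = df.columns.astype(float)

idx = df.index.to_numpy()
cols = df.columns.to_numpy()
vals = df.to_numpy()

# Option 1: nested list comprehension, one call to f per cell.
r = [[f(x, y, z) for y, z in zip(cols, vals[i])] for i, x in enumerate(idx)]
df_root = pd.DataFrame(r, index=df.index, columns=df.columns)

# Option 2: np.vectorize broadcasts x down the rows and y across the columns.
# It is still a Python-level loop under the hood.
r2 = np.vectorize(f, otypes=[float])(idx[:, None], cols[None, :], vals)

print(df_root)
```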
Note that this will still take a long time: with a 5000 x 5000 grid you have about 25 million calls to f to make, even if f is optimized.
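To get a rough idea of the total run time before launching the full loop, you can time f on a small sample and extrapolate to the 25 million calls. This is only a sketch: f is a hypothetical stand-in, the sample inputs are made up, and the helper name estimate_total_seconds is my own. For an iterative root finder the cost per call may depend on the inputs, so the sample should ideally be drawn from the real grid.

```python
import time


# Hypothetical stand-in for the real pybind11 function f(x, y, z).
def f(x, y, z):
    return x * y + z


def estimate_total_seconds(func, sample_args, n_calls):
    """Time func on a sample of (x, y, z) triples and extrapolate to n_calls calls."""
    start = time.perf_counter()
    for x, y, z in sample_args:
        func(x, y, z)
    elapsed = time.perf_counter() - start
    per_call = elapsed / len(sample_args)
    return per_call * n_calls


# Made-up sample inputs; in practice, pick (x, y, z) triples from the real grid.
sample = [(0.1 * i, 0.2 * i, 0.3 * i) for i in range(1, 1001)]
n_calls = 5000 * 5000  # grid size from the question

estimate = estimate_total_seconds(f, sample, n_calls)
print(f"Estimated time for {n_calls} calls: {estimate:.1f} s")
```

The estimate also includes the overhead of the Python loop itself, which is part of the cost in the nested list comprehension as well.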
