Faster way of converting a dataframe of x,y,z values into an image?

I have a simple dataframe structure that looks like this:

print(scene_2d_df.head())

     x       y  z
0  963  1691.0  0
1  911  1881.0  0
2  837   864.0  1
3  785  1054.0  0
4  897    59.0  0

print(scene_2d_df.shape)

(2294591, 3)

Every row represents a white or black dot (1 or 0) in an image. The x and y columns are the pixel positions. The image is approx. 1200 x 1800 in this case. I have code which I believe works, but is running very slowly even on a modern machine. The approach is a bit brute-force.

def construct_image_from_df(df_1):
    xmax = int(df_1.max(axis=0)['x'])
    xmin = int(df_1.min(axis=0)['x'])
    ymax = int(df_1.max(axis=0)['y'])
    ymin = int(df_1.min(axis=0)['y'])
    zmax = int(df_1.max(axis=0)['z'])
    zmin = int(df_1.min(axis=0)['z'])
    
    print("xmin :: " + str(xmin) + " // xmax :: " + str(xmax)) # 1200-something
    print("ymin :: " + str(ymin) + " // ymax :: " + str(ymax)) # 1800-something
    print("zmin :: " + str(zmin) + " // zmax :: " + str(zmax)) # 1, all values 0 or 1
    
    img = np.zeros((xmax, ymax))
    
    length = df_1.shape[0] # number of rows
    for i in range(0, length):
        x, y, z = int(df_1.iloc[i]['x']), int(df_1.iloc[i]['y']), int(df_1.iloc[i]['z'])
        img[x - 1, y - 1] = z

    return img

Basically I am grabbing every row of the dataframe, and manually doing a pixel write into my 2D img array. It is very slow.

Is there a faster (maybe vectorized) way to do this?

CodePudding user response:

You can use the coordinates almost directly in an indexing expression.

First, don't compute min and max multiple times:

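# y is stored as float, so the column-wise max/min come back as floats;
# cast to int so the results can be used as array dimensions
x_max, y_max = df[['x', 'y']].max().astype(int)
x_min, y_min = df[['x', 'y']].min().astype(int)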

Then, place the z values into an image buffer directly:

img = np.zeros((x_max + 1, y_max + 1), dtype=df['z'].dtype)
img[df['x'].to_numpy(dtype=int), df['y'].to_numpy(dtype=int)] = df['z'].to_numpy()

The cast to int in to_numpy is necessary because y appears to contain floats with integer values, and index arrays must be integers. You can also shrink the dtype of img to the minimum required to hold z with np.min_scalar_type:

np.zeros((x_max + 1, y_max + 1), dtype=np.min_scalar_type(df['z'].max()))
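
Both dtype points can be checked on a toy array. This is a small sketch rather than part of the original answer; the names small, xs, and ys_float are made up for illustration.

```python
import numpy as np

small = np.zeros((3, 3), dtype=np.uint8)
xs = np.array([0, 2])
ys_float = np.array([1.0, 2.0])  # whole-number floats, like the 'y' column

# Float index arrays are rejected even when every value is a whole number.
try:
    small[xs, ys_float] = 1
    float_index_rejected = False
except IndexError:
    float_index_rejected = True

# Casting the index array to an integer dtype makes the same write valid.
small[xs, ys_float.astype(int)] = 1

# np.min_scalar_type returns the smallest dtype that can hold a given scalar.
dtype_for_1 = np.min_scalar_type(1)
dtype_for_300 = np.min_scalar_type(300)
print(float_index_rejected, dtype_for_1, dtype_for_300)
```

For z values limited to 0 and 1, an 8-bit image is enough, which should cut memory noticeably compared with the float64 default of np.zeros.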

If you want a boolean mask and you know that z represents True/False values, force the dtype of img to bool.
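
Putting the pieces together, the whole conversion might look like the sketch below. The function name image_from_df, the as_bool flag, and the demo data are hypothetical, and the sketch assumes non-negative, 0-based coordinates with no missing values and no repeated (x, y) pairs.

```python
import numpy as np
import pandas as pd


def image_from_df(df, as_bool=False):
    """Scatter the z values of df into a 2D array indexed by (x, y)."""
    # Cast once: index arrays must be integers, and 'y' may be stored as float.
    x = df['x'].to_numpy(dtype=int)
    y = df['y'].to_numpy(dtype=int)
    z = df['z'].to_numpy()

    # Assumption: coordinates are non-negative and 0-based.
    out_dtype = bool if as_bool else np.min_scalar_type(int(z.max()))
    img = np.zeros((x.max() + 1, y.max() + 1), dtype=out_dtype)
    img[x, y] = z  # one vectorized write instead of a Python-level loop
    return img


# Tiny made-up example: four points on a 3 x 4 grid.
demo_df = pd.DataFrame({
    'x': [0, 1, 2, 2],
    'y': [0.0, 2.0, 1.0, 3.0],
    'z': [1, 0, 1, 1],
})
demo_img = image_from_df(demo_df)
demo_mask = image_from_df(demo_df, as_bool=True)
print(demo_img)
```

If the coordinates are 1-based, as the x - 1 and y - 1 in the question suggest, subtract 1 from x and y before indexing and drop the + 1 from the shape.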