I have NumPy 3D arrays with shape (688, 549, 3).
Each of the three layers along the last axis is an image (a band). All of them have dtype 'float64'; however, the third one contains only integers (stored as floats because of the dtype, e.g. 3.0 instead of 3).
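The integer band ends up as float because NumPy promotes dtypes when it stacks arrays: combining float64 arrays with an integer array gives a float64 result. A minimal check, using small hypothetical stand-in arrays rather than the real bands:

```python
import numpy as np

# Hypothetical stand-ins for the real data: two float bands and one integer band.
band1 = np.zeros((2, 2), dtype=np.float64)
band2 = np.ones((2, 2), dtype=np.float64)
classes = np.array([[1, 2], [3, 4]])  # integer dtype

stacked = np.dstack((band1, band2, classes))
print(stacked.dtype)    # float64: the integer band was promoted to float
print(stacked[..., 2])  # the class values, now stored as floats (1.0, 2.0, ...)
```

The values themselves are unchanged, so casting that column back to int after building the DataFrame is safe.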
I'm trying to transform the array into a DataFrame using this script:
pd.DataFrame(array.reshape([3,-1]).T,columns=['band1','band2','classes'])
Output:
   band1      band2    classes
0    NaN -19.110207 -10.134580
1    NaN -28.449677 -15.704137
2    0.0   2.000000   2.000000
3    NaN -19.117571 -10.166842
4    NaN -28.500092 -15.727423
...
As you can see, the values are mismatched: the "classes" column is supposed to contain only integers between 1 and 4, and the first two columns are supposed to contain negative floats. But row 2 has the "classes" values in the second column, and in general the columns are mixed up.
I have used this method before to create DataFrames from 3D and higher-dimensional arrays, but for a reason I haven't found yet, in this case the DataFrame comes out badly mismatched.
My question is: why does this mismatch happen, and how can I fix it?
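One possible reason the same line worked on earlier arrays (an assumption, since those arrays aren't shown): reshape([3,-1]).T is only correct when the bands sit on the first axis, i.e. shape (3, rows, cols), which is the layout some raster readers return. The sketch below builds the same random bands in both layouts and checks which one the recipe handles; np.moveaxis converts a bands-last array to bands-first. Sizes and seed are arbitrary illustration values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows, cols = 4, 5
band1 = rng.uniform(-20, -0.1, size=(rows, cols))
band2 = rng.uniform(-8, -0.1, size=(rows, cols))
classes = rng.integers(0, 4, size=(rows, cols))

# Reference result: one row per pixel, one column per band.
expected = np.column_stack([band1.ravel(), band2.ravel(), classes.ravel()])

bands_last = np.dstack((band1, band2, classes))  # shape (rows, cols, 3)
bands_first = np.stack((band1, band2, classes))  # shape (3, rows, cols)

# The question's recipe is right for a bands-first array...
print(np.array_equal(bands_first.reshape([3, -1]).T, expected))  # True
# ...but scrambles a bands-last array.
print(np.array_equal(bands_last.reshape([3, -1]).T, expected))   # False
# Moving the band axis to the front makes the original recipe work again.
fixed = np.moveaxis(bands_last, -1, 0).reshape([3, -1]).T
print(np.array_equal(fixed, expected))                           # True

df = pd.DataFrame(fixed, columns=['band1', 'band2', 'classes'])
```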
To generate a similar dataset:
import numpy as np
import pandas as pd

band1 = np.random.uniform(low=-20, high=-0.1, size=(688, 549))
band2 = np.random.uniform(low=-8, high=-0.1, size=(688, 549))
classes = np.random.randint(4, size=(688, 549))  # labels 0-3 here; the real data uses 1-4
array = np.dstack((band1, band2, classes))  # shape (688, 549, 3)
pd.DataFrame(array.reshape([3,-1]).T, columns=['band1','band2','classes'])
My end goal: a DataFrame where each band is a column.
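For that end goal, one alternative that avoids reasoning about reshape order is to take each band by its index on the last axis and flatten it into its own column; it also makes it easy to cast the classes column back to integers. A sketch using the same synthetic data as the question (random, so values will differ from any output shown here):

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question.
band1 = np.random.uniform(low=-20, high=-0.1, size=(688, 549))
band2 = np.random.uniform(low=-8, high=-0.1, size=(688, 549))
classes = np.random.randint(4, size=(688, 549))
array = np.dstack((band1, band2, classes))  # shape (688, 549, 3)

names = ['band1', 'band2', 'classes']
# array[..., i] is band i as a 2D image; ravel() flattens every band in the same
# row-major order, so row k of each column refers to the same pixel.
df = pd.DataFrame({name: array[..., i].ravel() for i, name in enumerate(names)})
# np.dstack promoted the class band to float64; cast it back to int.
df['classes'] = df['classes'].astype(int)

print(df.shape)  # (377712, 3)
print(df.dtypes)
```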
Answer:
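I think this is what you want. NumPy reshapes in row-major (C) order, so the last axis varies fastest: the flattened array runs band1, band2, classes for the first pixel, then for the second pixel, and so on. reshape([3,-1]) cuts that sequence into three consecutive chunks, which mixes the bands, whereas reshape([-1,3]) keeps each pixel's three values together in one row: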
pd.DataFrame(array.reshape([-1,3]),columns=['band1','band2','classes'])
Output:
band1 band2 classes
0 -18.785736 -3.710138 0.0
1 -18.922210 -3.469634 0.0
2 -15.049059 -4.815290 0.0
3 -12.835178 -6.440274 1.0
4 -1.855383 -3.362667 2.0
... ... ... ...
377707 -5.288869 -6.399208 2.0
377708 -10.594781 -6.191891 3.0
377709 -2.223590 -0.230346 3.0
377710 -12.577054 -3.737268 3.0
377711 -15.462419 -2.691705 2.0
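To see the ordering directly, here is a tiny bands-last array of 2 x 2 pixels with made-up values chosen so each one reveals its band and pixel (10 x band number + pixel index); the values are purely illustrative.

```python
import numpy as np

# A 2 x 2 image with 3 bands; value = 10 * band_number + pixel_index,
# so 10-13 belong to band1, 20-23 to band2 and 30-33 to classes.
pixels = np.arange(4).reshape(2, 2)
array = np.dstack((10 + pixels, 20 + pixels, 30 + pixels))  # shape (2, 2, 3)

# Row-major flattening walks the last axis fastest: one pixel's bands at a time.
print(array.ravel())  # [10 20 30 11 21 31 12 22 32 13 23 33]

# reshape([3, -1]) cuts that sequence into three consecutive chunks, so each
# transposed row mixes bands from different pixels: the first row is [10 21 32].
print(array.reshape([3, -1]).T)

# reshape([-1, 3]) keeps each pixel's three values together in one row.
print(array.reshape([-1, 3]))
```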
