I have a data frame like this where the distCum field indicates distance:
oid distCum
1472 0
1473 0.084116923
1565 0.157785132
1469 2.326473679
9567 4.156309659
1500 5.953545907
9544 6.157304401
1561 6.190537806
8823 7.503586809
4037 8.547562197
The data frame has millions of rows, and the distCum column holds the cumulative distance in km, increasing to more than 1000 km. I am trying to create clusters so that every 2 km interval is grouped together. The desired output is as follows:
oid distCum Clust
1472 0 1
1473 0.084116923 1
1565 0.157785132 1
1469 2.326473679 2
9567 4.156309659 3
1500 5.953545907 3
9544 6.157304401 4
1561 6.190537806 4
8823 7.503586809 4
4037 8.547562197 5
To elaborate on the cluster classification, if the distance is
< 2 - cluster = 1
between 2 and 4 - cluster = 2
between 4 and 6 - cluster = 3
I tried using a for loop iterating over the column, but I am failing to apply the incrementing values for the divisor.
CodePudding user response:
Use Series.floordiv to integer-divide by 2, then add 1 and cast to integers:
df['Clust'] = df['distCum'].floordiv(2).add(1).astype(int)
print(df)
oid distCum Clust
0 1472 0.000000 1
1 1473 0.084117 1
2 1565 0.157785 1
3 1469 2.326474 2
4 9567 4.156310 3
5 1500 5.953546 3
6 9544 6.157304 4
7 1561 6.190538 4
8 8823 7.503587 4
9 4037 8.547562 5
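For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same floordiv approach. The name BIN_KM is only an illustrative constant and not part of the original answer. It also assumes distCum has no missing values, since astype(int) raises an error on NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "oid": [1472, 1473, 1565, 1469, 9567, 1500, 9544, 1561, 8823, 4037],
    "distCum": [0, 0.084116923, 0.157785132, 2.326473679, 4.156309659,
                5.953545907, 6.157304401, 6.190537806, 7.503586809,
                8.547562197],
})

# Hypothetical constant for the bin width in km (assumption for illustration).
BIN_KM = 2

# Floor-divide by the bin width to get a 0-based bin index,
# add 1 so clusters start at 1, then cast the float result to int.
df["Clust"] = df["distCum"].floordiv(BIN_KM).add(1).astype(int)

print(df["Clust"].tolist())
# → [1, 1, 1, 2, 3, 3, 4, 4, 4, 5]
```

Note that a value lying exactly on a boundary goes to the upper cluster: a distCum of exactly 2.0 lands in cluster 2, because 2.0 // 2 is 1. Since the operation is vectorized, it should scale to millions of rows without an explicit loop.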
