I have a DataFrame like this:
| idx | Var1 | Var2 | Var3 |
|---|---|---|---|
| 0 | True | False | False |
| 1 | False | True | False |
| 2 | True | False | True |
| 3 | False | False | False |
| 4 | True | False | True |
I'd like to create three new columns holding, for each row, the distance (in rows) to the closest True in the corresponding column, with 0 when the row itself is True. The result should look like this:
| idx | Var1 | Var2 | Var3 | distV1 | distV2 | distV3 |
|---|---|---|---|---|---|---|
| 0 | True | False | False | 0 | 1 | 2 |
| 1 | False | True | False | 1 | 0 | 1 |
| 2 | True | False | True | 0 | 1 | 0 |
| 3 | False | False | False | 1 | 2 | 1 |
| 4 | True | False | True | 0 | 3 | 0 |
I have read all other discussions related to this topic but haven't been able to find an answer for something like this.
CodePudding user response:
Here is one approach with numpy ops:
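import numpy as np
# Assumes a default 0..n-1 index and at least one True in every column.
for c in list(df.columns):  # snapshot, since new columns are added inside the loop
    r = np.where(df[c])[0]  # positions of the True values
    d = abs(df.index.values[:, None] - r)  # |row - True position| for every pair
    df[f'{c}_dist'] = abs(df.index - r[d.argmin(1)])  # distance to the nearest True
print(df)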
Var1 Var2 Var3 Var1_dist Var2_dist Var3_dist
0 True False False 0 1 2
1 False True False 1 0 1
2 True False True 0 1 0
3 False False False 1 2 1
4 True False True 0 3 0
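The same broadcasting idea can be wrapped in a small helper that handles one boolean column at a time. This is only a sketch: the name `nearest_true_distance` is hypothetical, it measures distance in row positions rather than index labels, and returning -1 for a column without any True is an assumption made here, not something the answer specifies.

```python
import numpy as np

def nearest_true_distance(mask):
    """Distance, in row positions, from each row to the closest True in `mask`."""
    mask = np.asarray(mask, dtype=bool)
    true_pos = np.flatnonzero(mask)  # positions of the True values
    if true_pos.size == 0:
        # Assumption: use -1 as a sentinel when there is no True at all.
        return np.full(mask.shape[0], -1)
    rows = np.arange(mask.shape[0])
    # |row - True position| for every pair, then the smallest value per row.
    return np.abs(rows[:, None] - true_pos).min(axis=1)

var2 = [False, True, False, False, False]
print(nearest_true_distance(var2))  # [1 0 1 2 3]
```

On a DataFrame it could be applied per column, for example `df[f'{c}_dist'] = nearest_true_distance(df[c])`. The pairwise matrix grows with rows times the number of True values, so this form suits small to medium columns.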
CodePudding user response:
Code
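import numpy as np
from scipy.spatial import KDTree
array = df.to_numpy()
bmp = array.astype(np.uint8)  # True/False -> 1/0
all_points = np.argwhere(bmp != 2)  # (row, column) of every cell
true_points = np.argwhere(bmp == 1)  # (row, column) of the True cells
tree = KDTree(true_points)  # index the True cells
distance = tree.query(all_points, k=1, p=1)[0]  # Manhattan distance to the nearest True cell
distance = distance.reshape(array.shape)
df[[c + "_dist" for c in df.columns]] = distance.astype(int)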
Output
Var1 Var2 Var3 Var1_dist Var2_dist Var3_dist
idx
0 True False False 0 1 2
1 False True False 1 0 1
2 True False True 0 1 0
3 False False False 1 2 1
4 True False True 0 1 0
Explanation
- Convert the frame to a NumPy array of 0/1 values:
array([[1, 0, 0],
[0, 1, 0],
[1, 0, 1],
[0, 0, 0],
[1, 0, 1]], dtype=uint8)
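- `argwhere` returns the (row, column) coordinates of the cells that satisfy the condition.
- `KDTree` is a classic data structure for nearest-neighbour search.
- The `k` argument is the number of nearest neighbours to return.
- The `p=1` argument selects the "Manhattan" distance. From the SciPy documentation:
  Which Minkowski p-norm to use. 1 is the sum-of-absolute-values distance ("Manhattan" distance). 2 is the usual Euclidean distance.

Because the tree is built from (row, column) coordinates, the query measures distance across the whole 2-D array, so a True in a neighbouring column counts as well. That is why row 4 of Var2 shows 1 here rather than the 3 in the question's expected output.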
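To see what `p` changes in practice, the sketch below recomputes the distances for the 0/1 array with brute-force NumPy. It is a stand-in for the nearest-neighbour query, not how `KDTree` works internally, and the helper name `nearest_distance` is made up for this illustration.

```python
import numpy as np

bmp = np.array([[1, 0, 0],
                [0, 1, 0],
                [1, 0, 1],
                [0, 0, 0],
                [1, 0, 1]], dtype=np.uint8)

def nearest_distance(points, targets, p):
    """Smallest Minkowski p-norm distance from each point to any target."""
    # Absolute coordinate differences for every (point, target) pair.
    diff = np.abs(points[:, None, :] - targets[None, :, :]).astype(float)
    dist = (diff ** p).sum(axis=2) ** (1.0 / p)
    return dist.min(axis=1)

all_points = np.argwhere(bmp != 2)   # (row, column) of every cell
true_points = np.argwhere(bmp == 1)  # (row, column) of the True cells

manhattan = nearest_distance(all_points, true_points, p=1).reshape(bmp.shape)
euclidean = nearest_distance(all_points, true_points, p=2).reshape(bmp.shape)

# Row 4 of Var2: the True cells next to it in the same row are one step away.
print(manhattan[4, 1])
# Row 3 of Var2: 2.0 under p=1, about 1.414 (a diagonal neighbour) under p=2.
print(manhattan[3, 1], euclidean[3, 1])
```

Getting the per-column distances from the question would require running the search on row positions alone, one column at a time.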
