I have a dataset like this:
| # | c1 | c2 | c3 | c4 | c5 |
|---|---|---|---|---|---|
| r1 | 3 | 7 | 4 | 3 | 5 |
| r2 | 4 | 2 | 6 | 5 | 2 |
| r3 | 8 | 4 | 4 | 6 | 2 |
| r4 | 9 | 4 | 5 | 6 | 2 |
| r5 | 3 | 7 | 4 | 5 | 8 |
| r6 | 2 | 6 | 9 | 1 | 10 |
The elements in each row are the distances between locations. For example, the distance between r1 and c2 is 7 km.
Now my question is: how can I set a limit that prevents clustering of elements whose values are greater than 5? In other words, the hierarchical algorithm should not include them in its calculations. Please help me solve this problem. Thanks.
CodePudding user response:
With scikit-learn's agglomerative clustering, you can pass 5 as the distance_threshold parameter (n_clusters must then be set to None), as follows:
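from sklearn.cluster import AgglomerativeClustering
# distance_threshold requires n_clusters=None; the default metric is Euclidean, which Ward linkage requires
cluster = AgglomerativeClustering(n_clusters=None, linkage='ward', distance_threshold=5)
# data_scaled: the (scaled) feature matrix, one row per location
cluster.fit_predict(data_scaled)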
For more information, see this blog post: https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/
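As a rough sketch of how this might look on the table from the question: the example below assumes each row r1..r6 is treated as a feature vector (that is an interpretation, not something the question states), skips scaling so the threshold stays in the original units, and uses the illustrative names `data` and `model`. Note that with Ward linkage the threshold is compared against the linkage distance between clusters, not against individual cell values.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Distance table from the question (rows r1..r6, columns c1..c5).
# Assumption: each row is used as a feature vector for one location.
data = np.array([
    [3, 7, 4, 3, 5],
    [4, 2, 6, 5, 2],
    [8, 4, 4, 6, 2],
    [9, 4, 5, 6, 2],
    [3, 7, 4, 5, 8],
    [2, 6, 9, 1, 10],
], dtype=float)

# n_clusters must be None when distance_threshold is given.
# Merging stops once the linkage distance reaches the threshold.
model = AgglomerativeClustering(
    n_clusters=None,
    linkage="ward",
    distance_threshold=5,
)
labels = model.fit_predict(data)

print("cluster label per row:", labels)
print("number of clusters:", model.n_clusters_)
# distances_ holds the linkage distance of every candidate merge,
# which helps when choosing a sensible threshold.
print("merge distances:", model.distances_)
```

Inspecting `model.distances_` before fixing the threshold is one way to check whether 5 is a reasonable cut-off for the data at hand.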
