How to set a limitation on hierarchical clustering

I have a dataset like this:

#	c1	c2	c3	c4	c5
r1	3	7	4	3	5
r2	4	2	6	5	2
r3	8	4	4	6	2
r4	9	4	5	6	2
r5	3	7	4	5	8
r6	2	6	9	1	10

The elements in each row give the distance between locations. For example, the distance between r1 and c2 is 7 km.

Now my question is: how can I set a limit that prevents elements whose values are greater than 5 from being clustered together? In other words, the hierarchical algorithm should not include them in its calculations. Please help me solve this problem. Thanks.

User response:

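Using sklearn's AgglomerativeClustering, pass 5 as the distance_threshold parameter. When distance_threshold is set, n_clusters must be None, otherwise sklearn raises a ValueError. The affinity argument is dropped here because recent sklearn releases no longer accept it, and Ward linkage always uses Euclidean distance anyway:

from sklearn.cluster import AgglomerativeClustering
# n_clusters=None is required whenever distance_threshold is given
cluster = AgglomerativeClustering(n_clusters=None, linkage='ward', distance_threshold=5)
# data_scaled: your (scaled) feature matrix, one row per item to cluster
labels = cluster.fit_predict(data_scaled)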
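
As a minimal sketch of the call above, the following runs it on the six rows from the question, treating each row as a feature vector. That treatment is an assumption, since the answer never shows how data_scaled is built, and the names data, model and labels are only illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Rows r1..r6 of the table in the question, used as feature vectors.
# This is an assumption: the answer's data_scaled is never defined.
data = np.array([
    [3, 7, 4, 3, 5],
    [4, 2, 6, 5, 2],
    [8, 4, 4, 6, 2],
    [9, 4, 5, 6, 2],
    [3, 7, 4, 5, 8],
    [2, 6, 9, 1, 10],
], dtype=float)

# n_clusters=None is required whenever distance_threshold is given.
model = AgglomerativeClustering(
    n_clusters=None,
    linkage="ward",
    distance_threshold=5,
)
labels = model.fit_predict(data)

# Merges whose linkage distance is at or above the threshold are not performed,
# so the number of clusters is decided by the data, not fixed in advance.
print("number of clusters:", model.n_clusters_)
for name, label in zip(["r1", "r2", "r3", "r4", "r5", "r6"], labels):
    print(name, "-> cluster", label)
```

Ward's merge distance is a variance-based criterion, not a raw pairwise distance. If the goal is that no two items in the same cluster are 5 or more apart, linkage="complete" with the same threshold may be closer to what the question asks, because complete linkage merges on the largest pairwise distance between two clusters.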

For more information, see this blog post: https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/