I deleted this post because I didn't get an answer.
CodePudding user response:
As @Debi Prasad Sen suggested above, the fastest and easiest way to do this is to use sklearn's tried-and-tested implementation of the KMeans algorithm (see here for documentation).
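As a rough illustration of the sklearn route, here is a minimal sketch. The toy data, `n_clusters=2`, `n_init=10`, and `random_state=17` are assumptions made up for the example, not values from the question:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up for illustration): two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

# n_clusters, n_init and random_state are assumptions; choose them for your own data.
model = KMeans(n_clusters=2, n_init=10, random_state=17)
labels = model.fit_predict(X)        # cluster index for each row of X
centroids = model.cluster_centers_   # array of shape (n_clusters, n_features)

print(labels)
print(centroids)
```

`fit_predict` fits the model and returns one label per row, so the first three points should share one label and the last three the other. Which group gets label 0 is arbitrary.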
Alternatively, you could write your own implementation. Here's a simple function that I wrote in Python, per your comment:
import numpy as np
from typing import Tuple
from scipy.spatial.distance import cdist

def kmeans(X: np.ndarray, k: int, reps: int, seed: int = 17) -> Tuple[np.ndarray, np.ndarray]:
    """Run `reps` iterations of k-means on the rows of X; return (labels, centroids)."""
    rng = np.random.default_rng(seed)  # 17 is my favorite number
    labels = np.zeros(X.shape[0], dtype=int)
    # Start from k distinct rows of X, copied as floats so the means can be stored.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(reps):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = np.argmin(cdist(X, centroids), axis=1)
        for i in range(k):
            members = X[labels == i]
            if len(members):  # leave the centroid of an empty cluster unchanged
                centroids[i] = members.mean(axis=0)
    return labels, centroids
CodePudding user response:
Refer to the following linked question to understand how to read a text file using the pandas library in Python.
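For reference, a minimal sketch of reading whitespace-separated numbers with pandas. The file layout, the column names `x` and `y`, and the in-memory stand-in for the file are all assumptions, since the original question's data is not shown:

```python
import io
import pandas as pd

# Stand-in for a text file on disk; in practice pass a path such as "data.txt" (hypothetical name).
text = io.StringIO("1.0 1.0\n1.2 0.8\n8.0 8.0\n8.1 7.9\n")

# sep=r"\s+" splits on any run of whitespace; header=None because the sample has no header row.
df = pd.read_csv(text, sep=r"\s+", header=None, names=["x", "y"])

X = df.to_numpy()  # NumPy array that a k-means routine can take as input
print(X.shape)
```

If your file is comma-separated or has a header row, drop or adjust `sep`, `header`, and `names` accordingly.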
To implement k-means, you can use the scikit-learn library, or you can build it from scratch with NumPy; refer to this article.
