Learnt a new thing today!!!! Fuzzy c-means Clustering
Whenever the topic of Clustering arises, K-means is the first algorithm that text books teach us. To give a new brief intro to K-means, it is a clustering technique, which takes 2 inputs,
- K — no of clusters
- Distance measure like Euclidean, Manhattan distance
Algorithm is as follow,
- Initialize k number of instances as cluster centroids to startwith.
- For remaning n-k instances, compute the distance from the k centroids and assign the instance to the cluster with least distance.
- Update the cluster centroids by taking the mean of instances belonging to a particular cluster.
- Repeat 2, 3 until there is no change in cluster centroids.
Algorithm is quite simple and very straight forward. Then why do we need an updated Algorithm?
K-Means is a Hard clustering algorithm, which means one instance can belong to only one cluster.
Fuzzy c-means is similar to K-means clustering algorithm with a difference that every instance may belong to one or more clusters.
Algorithm works as follows:
- Initialize utility matrix U which has `c` number of columns where c is the number of clusters.
- At every step, compute the cluster centroids with utility matrix U
3. Update utility matrix U as
where m is any real number greater than 1, uij is the degree of membership of xi in the cluster j, xi is the ith of d-dimensional measured data, cj is the d-dimension center of the cluster.
4. repeat 2,3 till we run p number of iterations or any other stopping criteria.