A cheatsheet to Clustering algorithms

Sairam Penjarla
Published in Analytics Vidhya · 3 min read · Jul 15, 2021


Clustering algorithms are among the most popular tools used by machine learning practitioners across the world for unsupervised grouping problems. The best known of them is K-means clustering, but in this story we shall see how it works, what other options are available, when to use which, how they all differ, and much more.

K-means

  • It requires the number of clusters to be specified before the algorithm is even applied, and it assumes that clusters are convex shaped.
  • Many of the pages documenting K-means, including scikit-learn’s official documentation, use made-up (synthetic) data with very few features, so the number of clusters can easily be identified visually with the help of a scatter plot.
  • In the real world, however, it is very unlikely that you will find data that simple, so knowing the number of clusters in advance is a bit difficult.
  • A few techniques, such as asking the stakeholder, the elbow method, and the silhouette coefficient, help in identifying the number of clusters.
  • It uses the Within-Cluster Sum of Squares (WCSS) as its objective function (loss function, in deep learning terms) and improves it at every iteration.
  • A variation of K-means clustering is Mini Batch K-Means clustering. It works on small random samples (mini-batches) of the input data; other than that, everything else is the same. Its accuracy is slightly lower compared to regular K-means clustering, as the sketch below illustrates.
(Figure: comparison of the K-Means and MiniBatchKMeans clustering algorithms)
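To make the comparison concrete, here is a minimal sketch using scikit-learn’s KMeans and MiniBatchKMeans on synthetic blob data; the dataset, the n_clusters value, and the batch size are illustrative assumptions rather than values from a real project.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Made-up blob data with a known number of clusters (assumption for the demo).
X, _ = make_blobs(n_samples=3000, centers=4, n_features=2, random_state=42)

# Regular K-means: every sample is used in each iteration to minimise WCSS (inertia_).
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Mini Batch K-means: centroids are updated from small random batches,
# trading a little accuracy for speed on large datasets.
mbk = MiniBatchKMeans(n_clusters=4, batch_size=256, n_init=10, random_state=42).fit(X)

print("K-means WCSS:", km.inertia_)
print("MiniBatchKMeans WCSS:", mbk.inertia_)
print("Silhouette (K-means):", silhouette_score(X, km.labels_))
```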

Affinity Propagation

  • At every iteration, it sends messages between pairs of samples until convergence.
  • These messages encode the suitability of one sample to be the exemplar of the other, and they are updated in response to the values from other pairs.
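A minimal sketch with scikit-learn’s AffinityPropagation is shown below; the damping value and the synthetic data are assumptions. Note that, unlike K-means, the number of clusters is not specified up front; it emerges from the message passing.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic data; the number of clusters is NOT passed to the algorithm.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# damping=0.9 slows the message updates to help convergence (assumed value).
ap = AffinityPropagation(damping=0.9, random_state=0).fit(X)

# Exemplars are actual samples chosen to represent each cluster.
print("Clusters found:", len(ap.cluster_centers_indices_))
print("Exemplar indices:", ap.cluster_centers_indices_)
```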

Mean shift

  • It works by iteratively updating candidate centroids for each cluster. These candidates are then filtered to get rid of near-duplicates, and the remaining set of centroids is returned.
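A minimal sketch with scikit-learn’s MeanShift follows; the bandwidth is estimated from the data, and the quantile used for that estimate is an assumption.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Bandwidth controls the size of the region used to update each candidate centroid.
bandwidth = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)

# Near-duplicate candidate centroids have already been filtered out at this point.
print("Final centroids:\n", ms.cluster_centers_)
```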

Spectral clustering

  • It performs a low-dimensional embedding of the affinity matrix between samples, followed by clustering of the components of the eigenvectors in that low-dimensional space.
  • It works well for a small number of clusters, but is not advised when many clusters are needed.
  • It is generally used for image segmentation.
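Below is a minimal sketch using scikit-learn’s SpectralClustering on the two-moons toy dataset, where convex-cluster methods like K-means struggle; the affinity choice and n_clusters=2 are assumptions made for this shape.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two interleaving half-circles: a non-convex cluster shape.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# The affinity matrix is built from a nearest-neighbours graph (assumed choice),
# and K-means is run on the eigenvector embedding to assign the final labels.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        assign_labels="kmeans", random_state=0)
labels = sc.fit_predict(X)
print(labels[:20])
```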

Hierarchical clustering

  • It is loosely reminiscent of tree-based models such as random forests in that it builds a tree structure: nested clusters are formed by merging or splitting them successively, and this hierarchy is represented as a binary tree (a dendrogram).
  • Agglomerative clustering is the bottom-up variant of hierarchical clustering: it recursively merges the pair of clusters that minimally increases a given linkage distance.
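A minimal sketch of bottom-up agglomerative clustering with scikit-learn; the Ward linkage and the n_clusters value are assumptions.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Ward linkage merges the pair of clusters whose union minimally increases
# the total within-cluster variance.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)
print(agg.labels_[:20])
```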

DBSCAN

  • While K-means assumes that clusters are convex shaped, DBSCAN doesn’t care much about the shape but rather the density of data points. Hence the clusters formed can be of any shape.
  • It uses an eps value, which is the maximum distance between two samples for one to be considered as in the neighbourhood of the other. This is not a maximum bound on the distances of points within a cluster. It is the most important DBSCAN parameter to choose appropriately for your data set and distance function.
  • OPTICS is a generalised version of DBSCAN that relaxes the eps requirement from a single value to a range of values.
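A minimal sketch of DBSCAN (and OPTICS) with scikit-learn; the eps, min_samples, and max_eps values are assumptions and should be tuned for your data and distance function.

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS
from sklearn.datasets import make_moons

# Non-convex data where density-based clustering shines.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: maximum distance for two samples to be considered neighbours (assumed value).
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

# Label -1 marks noise points that do not belong to any dense region.
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("DBSCAN clusters:", n_clusters, "| noise points:", np.sum(db.labels_ == -1))

# OPTICS relaxes eps to a range; max_eps only bounds the search (assumed value).
opt = OPTICS(min_samples=5, max_eps=0.5).fit(X)
print("OPTICS clusters:", len(set(opt.labels_)) - (1 if -1 in opt.labels_ else 0))
```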

BIRCH

  • BIRCH builds a tree called the Clustering Feature Tree (CFT), which essentially makes it a form of hierarchical clustering.
  • The BIRCH algorithm has two parameters, the threshold and the branching factor. The branching factor limits the number of subclusters in a node and the threshold limits the distance between the entering sample and the existing subclusters.
  • The algorithm can be viewed as an instance or data reduction method, since it reduces the input data to a set of subclusters which are obtained directly from the leaves of the CFT.
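A minimal sketch with scikit-learn’s Birch; the threshold, branching factor, and dataset are assumptions.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=5000, centers=4, random_state=0)

# threshold limits the distance between an entering sample and existing subclusters;
# branching_factor limits the number of subclusters per node of the CFT.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=4).fit(X)

# The subcluster centres read off the CFT leaves act as a reduced version of the data.
print("Input samples:", X.shape[0], "| subclusters kept:", birch.subcluster_centers_.shape[0])
```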

Gaussian mixture models

  • Gaussian mixtures come in four covariance types: spherical, diagonal, tied, and full.
  • It is the fastest algorithm for learning mixture models.
  • They are fitted using the expectation-maximization (EM) algorithm.
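A minimal sketch with scikit-learn’s GaussianMixture, which is fitted with EM; the n_components value and the covariance_type are assumptions (the latter can be 'spherical', 'diag', 'tied', or 'full').

```python
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# EM alternates between soft-assigning samples to components (E-step)
# and re-estimating each Gaussian's parameters (M-step).
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

# Unlike hard clustering, each sample gets a probability per component.
print(gmm.predict_proba(X[:5]).round(3))
print("Hard labels:", gmm.predict(X[:5]))
```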
