A good, broad introduction to clustering?


I would like to brush up on my clustering knowledge and was wondering if anyone could recommend a resource (video, blog post, course, book, …) that covers most clustering algorithms, ideally highlighting their differences, e.g. partitioning vs hierarchical vs density-based methods.

Everything I’ve found so far is either disappointing (very short blog posts) or not “overview enough”. I’ve seen this scikit-learn example, but of course it doesn’t cover any of the theory.
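To make the partitioning vs hierarchical vs density distinction a bit more concrete, here is a minimal, dependency-free Python sketch (not taken from any of the resources mentioned in this thread). It runs one representative of each family on a made-up toy dataset: Lloyd's k-means with a deterministic farthest-point initialisation (a simplification, not k-means++), Ward's agglomerative clustering stopped at k clusters, and DBSCAN. The data, the function names and the eps/min_pts values are illustrative assumptions.

```python
# Toy 2-D data (made up for illustration): two tight blobs plus one far-away outlier.
POINTS = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),   # blob A
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),   # blob B
    (20.0, 0.0),                                       # outlier
]

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(pts):
    """Centroid of a non-empty list of 2-D points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def kmeans(points, k, iters=20):
    """Partitioning: Lloyd's k-means. Every point gets a label; k must be given."""
    # Deterministic farthest-first init (an assumption for reproducibility).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels

def ward(points, k):
    """Hierarchical: bottom-up merging with Ward's criterion, stopped at k clusters.
    Merging clusters a and b raises the within-cluster sum of squares by
    n_a * n_b / (n_a + n_b) * ||mean_a - mean_b||^2; the cheapest pair is merged."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                cost = len(a) * len(b) / (len(a) + len(b)) * dist2(
                    mean([points[t] for t in a]), mean([points[t] for t in b]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for t in members:
            labels[t] = c
    return labels

def dbscan(points, eps, min_pts):
    """Density-based: DBSCAN. No k needed; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)

    def neighbors(i):
        # eps-neighbourhood, including the point itself.
        return [j for j in range(len(points)) if dist2(points[i], points[j]) <= eps * eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

if __name__ == "__main__":
    print("k-means (k=3):", kmeans(POINTS, 3))
    print("Ward    (k=3):", ward(POINTS, 3))
    print("DBSCAN       :", dbscan(POINTS, eps=1.0, min_pts=3))
```

With these settings, k-means and Ward's both return three groups, so the outlier at (20, 0) ends up as its own cluster simply because k=3 was requested. DBSCAN, by contrast, finds the two dense blobs without being told how many to look for and labels the outlier as noise (-1). For real work you would use scikit-learn's `KMeans`, `AgglomerativeClustering` and `DBSCAN` rather than hand-rolled loops like these.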


I recall that Sarkar’s Text Analytics with Python includes a nice chapter on clustering, with a clear explanation of different algorithms.



I see this in the book:

There are also several other newer clustering models, like BIRCH and CLARANS. Entire books and journals have been written just on clustering alone, as it is a really interesting topic with a lot of value. Covering every method would be impossible for us in the current scope, hence, we will cover a total of three clustering algorithms and illustrate them with real-world data for better understanding:
• K-means clustering
• Affinity propagation
• Ward’s Agglomerative Hierarchical clustering

This unfortunately makes me think I might be asking for something that doesn’t exist. Perhaps I should look at Coursera courses or similar.

Yes, or do some snowballing from the references in handbooks like Sarkar’s. I do think others here might be able to point you in the right direction.

“Probabilistic Machine Learning: An Introduction” by Kevin P. Murphy seems to cover most of what I wanted, so if you stumble upon this thread, be sure to check it out. A draft of the whole book (900+ pages) is available.



Great, thanks for sharing!