Comments on Perplexing Permutations: Web Scale Document Clustering: Clustering 733 Million Web Pages

chris, 2015-05-31 13:41 (-07:00)

The Geomblog has an interesting post on estimating the number of clusters in a given dataset at http://blog.geomblog.org/2010/03/this-is-part-of-occasional-series-of.html.

You might also like to look at a previous publication of mine, where I show how an exact optimum can be found for the elbow method when plotting the number of clusters against root mean squared error: http://eprints.qut.edu.au/53371/.

I also think it would be quite feasible to incorporate an approach like X-means into these algorithms: https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf.

Dwayne Campbell, 2015-05-31 11:59 (-07:00)

*Not sure if my previous comment was submitted. But do you have any suggestions for a document clustering algorithm that might be close to the scale of Topsig, but one in which the number of clusters is not specified?

For example, affinity propagation.
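To make the elbow method mentioned in chris's reply a little more concrete, here is a minimal sketch of one common heuristic for locating the elbow on a curve of number of clusters k against RMSE: rescale both axes to [0, 1] and pick the point farthest from the straight line joining the first and last points. This is an illustrative assumption, not the exact-optimum method described in the linked QUT publication; the function name `elbow_index` and the sample RMSE values below are hypothetical.

```python
import math
from typing import Sequence


def elbow_index(ks: Sequence[float], errors: Sequence[float]) -> int:
    """Return the index of the elbow on a (k, error) curve.

    Hypothetical helper using the "farthest point from the chord" heuristic;
    it is not the exact-optimum method from the linked paper.
    """
    if len(ks) != len(errors) or len(ks) < 3:
        raise ValueError("need at least three (k, error) pairs of equal length")
    if any(b <= a for a, b in zip(ks, ks[1:])):
        raise ValueError("ks must be strictly increasing")
    e_min, e_max = min(errors), max(errors)
    if e_max == e_min:
        raise ValueError("error curve is flat; no elbow to find")

    # Rescale both axes to [0, 1] so the units of k and RMSE do not dominate.
    k_min, k_max = ks[0], ks[-1]
    xs = [(k - k_min) / (k_max - k_min) for k in ks]
    ys = [(e - e_min) / (e_max - e_min) for e in errors]

    # Line through the first and last points (the "chord").
    x0, y0 = xs[0], ys[0]
    dx, dy = xs[-1] - x0, ys[-1] - y0
    norm = math.hypot(dx, dy)  # > 0 because dx == 1 after rescaling

    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Perpendicular distance from (x, y) to the chord.
        d = abs(dy * (x - x0) - dx * (y - y0)) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


if __name__ == "__main__":
    # Made-up RMSE values for k = 1..6, purely for illustration.
    ks = [1, 2, 3, 4, 5, 6]
    rmse = [10.0, 5.0, 2.0, 1.5, 1.2, 1.0]
    print("elbow at k =", ks[elbow_index(ks, rmse)])
```

With these made-up numbers the sketch reports k = 3, where the curve stops dropping steeply. In practice each RMSE value would come from a separate clustering run at that k, which is exactly the cost that approaches like X-means or affinity propagation try to avoid by choosing the number of clusters during a single run.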