Sunday, 23 June 2013

Minimal Test Collection (MTC) Evaluation Utility

I have been using mtc-eval from the TREC 2009 Web Track homepage, and I had trouble getting it to run without crashing with segmentation faults. A newer version on the author's web page fixed the problems I was experiencing. Also, the GNU Scientific Library that this software depends on will install without its LAPACK and BLAS dependencies, so remember to install the lapack, lapack-devel, atlas, atlas-devel, blas and blas-devel packages, which are available on most Linux distributions.

Saturday, 15 June 2013

ClusterEval 1.0 Released

Today I have released ClusterEval 1.0. This program compares a clustering to a ground truth set of categories according to several measures. It implements ground-truth-based cluster quality metrics such as Purity, Entropy, Negentropy, F1 and NMI. It also includes a novel approach called 'Divergence from a Random Baseline' that augments existing measures to correct for ineffective clusterings. It has been used to evaluate clustering in the INEX XML Mining track in 2009 and 2010, and will be used in the upcoming Social Event Detection task at MediaEval in 2013.
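To give a feel for what a ground-truth-based measure looks like, here is a minimal Python sketch of Purity. It is not ClusterEval's code: the function name `purity`, the micro-averaged formulation and the toy labels are my own assumptions for illustration.

```python
from collections import Counter, defaultdict


def purity(clusters, categories):
    """Micro-averaged purity (illustrative, not ClusterEval's implementation).

    clusters[i] is the cluster label of document i and categories[i] is its
    ground truth category. The score is the fraction of documents that belong
    to the majority category of their cluster.
    """
    if len(clusters) != len(categories):
        raise ValueError("clusters and categories must have the same length")
    if not clusters:
        raise ValueError("at least one document is required")

    # Count how many documents of each category land in each cluster.
    members = defaultdict(Counter)
    for cluster, category in zip(clusters, categories):
        members[cluster][category] += 1

    # Sum the size of the majority category in every cluster.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(clusters)


# Hypothetical toy data: six documents, two ground truth categories.
categories = ["a", "a", "b", "b", "b", "b"]

two_clusters = [0, 0, 0, 1, 1, 1]
one_cluster_per_document = [0, 1, 2, 3, 4, 5]

print(purity(two_clusters, categories))
print(purity(one_cluster_per_document, categories))
```

The second call puts every document in its own cluster and still scores a perfect purity, which is the kind of ineffective clustering that a plain measure cannot detect on its own.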

Further details describing the use and functionality of this software are available in the manual.

Complete details of the quality measures can be found in the paper 'Document Clustering Evaluation: Divergence from a Random Baseline'.
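As a rough sketch of the idea behind the title, the Python below scores a clustering and then subtracts the score of a random baseline that has the same cluster sizes. This is my simplified reading rather than ClusterEval's implementation: the names `purity` and `divergence_from_random_baseline`, building the baseline by shuffling cluster labels, averaging over several shuffles, and the `trials` and `seed` parameters are all assumptions made for illustration. See the paper for the exact definition.

```python
import random
from collections import Counter, defaultdict


def purity(clusters, categories):
    """Fraction of documents in the majority category of their cluster."""
    members = defaultdict(Counter)
    for cluster, category in zip(clusters, categories):
        members[cluster][category] += 1
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(clusters)


def divergence_from_random_baseline(clusters, categories, measure,
                                    trials=100, seed=0):
    """Score of the clustering minus the mean score of shuffled baselines.

    Shuffling the cluster labels keeps the number and sizes of the clusters
    but destroys any relationship with the categories (assumed baseline).
    """
    rng = random.Random(seed)
    actual = measure(clusters, categories)

    shuffled = list(clusters)
    baseline_total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)  # same cluster sizes, random membership
        baseline_total += measure(shuffled, categories)

    return actual - baseline_total / trials


# Hypothetical toy data: six documents, two ground truth categories.
categories = ["a", "a", "a", "b", "b", "b"]

useful = [0, 0, 0, 1, 1, 1]      # matches the categories
degenerate = [0, 1, 2, 3, 4, 5]  # one cluster per document

print(divergence_from_random_baseline(useful, categories, purity))
print(divergence_from_random_baseline(degenerate, categories, purity))
```

Both clusterings have a purity of 1.0, but the degenerate one scores no better than its own random baseline, so its divergence is zero, while the useful clustering keeps a positive score.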

The Social Event Detection task at MediaEval involves automated detection of social events from real-life social networks. If this sounds of interest to you, head over to the task description page and register.

Tuesday, 2 April 2013

The 2013 Social Event Detection Task

The task description for the Social Event Detection Task at MediaEval 2013 has been released. The task involves supervised clustering of events from real social media networks.

The evaluation uses some of my previous work on clustering evaluation, which came out of my involvement in the INEX XML Mining track in 2009 and 2010 and is described in the paper "Document Clustering Evaluation: Divergence from a Random Baseline".

If this sounds of interest to you, head over to the task description page and register!