Data-dependent Similarity

Data-dependent similarity measures have been proposed to overcome key weaknesses of existing data mining algorithms that rely on distance measures. The motivation comes from psychologists, who have argued that a similarity measure should have the following characteristic: two points in a dense region are less similar to each other than two points with the same inter-point distance in a sparse region. We have shown that these measures can significantly improve the task-specific performance of existing data mining techniques, including clustering, classification and anomaly detection, on a large number of real-world datasets.
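The characteristic described above can be illustrated with a small one-dimensional sketch. This is a simplified toy, not the aNNE or KDD-16 implementation: the function name interval_mass_dissimilarity and the synthetic data are assumptions made for illustration only. It scores a pair of points by the fraction of the dataset lying in the interval between them, so the score depends on the data distribution and not only on the geometric distance.

```python
def interval_mass_dissimilarity(data, x, y):
    """Toy data-dependent dissimilarity (hypothetical helper, 1-D only).

    Returns the fraction of points in `data` that fall in the closed
    interval between x and y. More data between the two points means
    they are treated as more dissimilar.
    """
    lo, hi = min(x, y), max(x, y)
    inside = sum(1 for v in data if lo <= v <= hi)
    return inside / len(data)


# Synthetic data (an assumption for this sketch):
# a dense region of 101 points in [-0.5, 0.5] and
# a sparse region of 11 points at 5, 6, ..., 15.
dense = [i / 100 for i in range(-50, 51)]
sparse = [5.0 + i for i in range(11)]
data = dense + sparse

# Two pairs with the same inter-point distance of 1.0.
d_dense = interval_mass_dissimilarity(data, -0.5, 0.5)
d_sparse = interval_mass_dissimilarity(data, 9.5, 10.5)

print("dense pair :", d_dense)
print("sparse pair:", d_sparse)
```

Both pairs are exactly 1.0 apart, yet the pair in the dense region receives the larger dissimilarity, which is the behaviour the characteristic asks for. A plain distance measure would score the two pairs identically.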

The source code of the latest data-dependent similarity measure, aNNE (AAAI-19), can be obtained from here. The source code of the first generic data-dependent similarity measure, m_e (KDD-16), can be obtained from here.

Dr Ye Zhu
Senior Lecturer of Computer Science, IEEE Senior Member

My research focuses on clustering and anomaly detection.