Data-dependent Similarity

Data-dependent similarity measures have been proposed to overcome key weaknesses of existing data mining algorithms that rely on distance measures. These measures are motivated by the work of psychologists, who argue that a similarity measure should have the following characteristic: two points in a dense region are less similar to each other than two points of equal inter-point distance in a sparse region. We have shown that these new measures can significantly improve the task-specific performance of existing data mining techniques, including clustering, classification and anomaly detection, on a large number of real-world datasets.
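The characteristic above can be illustrated with a minimal one-dimensional sketch. It is not the aNNE or me implementation: the function name `mass_dissimilarity` and the toy data are assumptions made for illustration only. Here the dissimilarity of two points is taken to be the fraction of the dataset lying in the interval between them, so the same distance counts as "further apart" where the data are dense.

```python
def mass_dissimilarity(x, y, data):
    """Fraction of data points in the closed interval spanned by x and y.

    A simplified, hypothetical stand-in for a data-dependent dissimilarity:
    the more data lie between x and y, the less similar they are.
    """
    lo, hi = min(x, y), max(x, y)
    return sum(lo <= p <= hi for p in data) / len(data)


# Toy dataset (assumed for illustration): a dense region around 0
# and a sparse region around 10.
dense = [i / 1000 for i in range(-450, 450)]   # 900 points, spacing 0.001
sparse = [8 + i * 0.04 for i in range(100)]    # 100 points, spacing 0.04
data = dense + sparse

# Two pairs of points with the same inter-point distance (0.5).
d_dense = mass_dissimilarity(-0.25, 0.25, data)
d_sparse = mass_dissimilarity(9.75, 10.25, data)

# The pair in the dense region is more dissimilar (less similar)
# than the pair in the sparse region, despite equal distance.
print("dense pair:", d_dense)
print("sparse pair:", d_sparse)
```

A plain distance measure would treat both pairs identically; the data-dependent value differs because it depends on how many points surround each pair.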

The source code of the latest data-dependent similarity measure, aNNE (AAAI-19), can be obtained from here. The source code of the first generic data-dependent similarity measure, me (KDD-16), can be obtained from here.

Ye Zhu
Lecturer in IT

My research focuses on clustering and anomaly detection.