Data-dependent similarity measures have been proposed to overcome key weaknesses of existing data mining algorithms that rely on distance measures. They are motivated by psychologists, who argue that a similarity measure should have the following characteristic: two points in a dense region are less similar to each other than two points with the same inter-point distance in a sparse region. We have shown that these new measures can significantly improve the performance of existing data mining techniques on tasks including clustering, classification and anomaly detection, across a large number of real-world datasets.
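The characteristic described above can be illustrated with a small one-dimensional sketch. It is a simplified illustration, not the exact definition used in the publications listed here: the function name `mass_dissimilarity` and the toy data are assumptions made for this example. The dissimilarity of two points is taken to be the fraction of the dataset lying in the interval between them, so it depends on the data distribution and not only on the distance between the points.

```python
def mass_dissimilarity(x, y, data):
    """Fraction of `data` falling in the closed interval spanned by x and y.

    Hypothetical, simplified 1-D data-dependent dissimilarity: the more
    data points lie between x and y, the more dissimilar they are.
    """
    lo, hi = min(x, y), max(x, y)
    inside = sum(1 for v in data if lo <= v <= hi)
    return inside / len(data)


# Toy data (assumed for illustration): a dense cluster in [0, 1]
# and a few sparse points further out.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
        5.0, 6.0, 9.0, 10.0]

# Both pairs are exactly 1.0 apart in ordinary distance.
dense_pair = mass_dissimilarity(0.0, 1.0, data)   # 11 of 15 points lie between
sparse_pair = mass_dissimilarity(5.0, 6.0, data)  # 2 of 15 points lie between

print(dense_pair, sparse_pair)
```

Although both pairs have the same inter-point distance, the pair in the dense region receives the larger dissimilarity, which is the behaviour the paragraph above describes.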
- Nearest-Neighbour-Induced Isolation Similarity and Its Impact on Density-Based Clustering
- Lowest probability mass neighbour algorithms: relaxing the metric constraint in distance-based neighbourhood algorithms
- Overcoming key weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure