Data-dependent similarity measures have been proposed to overcome key weaknesses of existing data mining algorithms that rely on distance measures. These measures were motivated by psychologists, who argue that a similarity measure should have the following characteristic: two points in a dense region are less similar to each other than two points the same distance apart in a sparse region. We have shown that these new measures can significantly improve the task-specific performance of existing data mining techniques, including clustering, classification and anomaly detection, on a large number of real-world datasets.
- Towards a Persistence Diagram that is Robust to Noise and Varied Densities
- Kernel-based clustering via Isolation Distributional Kernel
- A new distributional treatment for time series and an anomaly detection investigation
- Streaming Hierarchical Clustering Based on Point-Set Kernel
- Improving the Effectiveness and Efficiency of Stochastic Neighbour Embedding with Isolation Kernel
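To make the density-dependent characteristic described above concrete, here is a minimal Python sketch of a Voronoi-partition (nearest-neighbour) variant of Isolation Kernel, assuming Euclidean distance. The similarity of x and y is the fraction of t random partitions, each induced by psi points sampled from the data, in which x and y fall into the same cell. The function names, the toy data and the values of psi, t and seed are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def nearest(point, centres):
    """Return the index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def isolation_kernel(x, y, data, psi=8, t=200, seed=0):
    """Fraction of t random Voronoi partitions in which x and y share a cell.

    Each partition is induced by psi points sampled without replacement
    from data; a point belongs to the cell of its nearest sampled point.
    """
    rng = random.Random(seed)  # fixed seed: every call sees the same partitions
    same = 0
    for _ in range(t):
        centres = rng.sample(data, psi)
        if nearest(x, centres) == nearest(y, centres):
            same += 1
    return same / t


# Hypothetical toy data: a dense cluster of 100 points in [0, 1.98]
# and a sparse cluster of 10 points in [20, 38].
dense = [(i * 0.02,) for i in range(100)]
sparse = [(20.0 + i * 2.0,) for i in range(10)]
data = dense + sparse

# Two pairs with the same inter-point distance (1.0).
dense_sim = isolation_kernel((0.5,), (1.5,), data)
sparse_sim = isolation_kernel((25.0,), (26.0,), data)
print(dense_sim, sparse_sim)
```

Because sampled points concentrate where the data are dense, the Voronoi cells are small there, so the dense pair is separated in more partitions than the sparse pair, and its similarity comes out lower even though both pairs are 1.0 apart.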