A new distributional treatment for time series and an anomaly detection investigation


Time series are traditionally treated with two main approaches: the time domain approach and the frequency domain approach. Both must rely on a sliding window so that time-shifted versions of a periodic subsequence can be measured as similar. Coupled with the use of a root point-to-point measure, existing methods often have quadratic time complexity. We offer a third approach, the ℝ domain approach. It begins with the insight that subsequences in a periodic time series can be treated as sets of independent and identically distributed (iid) points generated from an unknown distribution in ℝ. This ℝ domain treatment enables two new possibilities: (a) the similarity between two subsequences can be computed using a distributional measure such as Wasserstein distance (WD), kernel mean embedding or Isolation Distributional Kernel (IDK); and (b) these distributional measures become non-sliding-window-based. Together, they offer an alternative that measures similarity more effectively and runs significantly faster than point-to-point and sliding-window-based measures. Our empirical evaluation shows that IDK and WD are effective distributional measures for time series, and that IDK-based detectors have better detection accuracy than existing sliding-window-based detectors while running faster, with linear time complexity.
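The core insight can be illustrated with a minimal sketch, which is not the authors' implementation. It treats each subsequence as an unordered set of points in ℝ and compares two such sets with the one-dimensional Wasserstein-1 distance, which for equal-size samples reduces to the mean absolute difference between sorted values. The sine series, the period of 20, the scaled "anomalous" subsequence and all function names are assumptions made purely for illustration; IDK and kernel mean embedding are not shown.

```python
import math


def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical samples in R.

    For equal-size samples, W1 is the mean absolute difference
    between the sorted values of the two samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length subsequences")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def euclidean(a, b):
    """Point-to-point (time domain) distance, for comparison only."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical periodic series: a sine wave with an assumed period of 20.
PERIOD = 20
series = [math.sin(2 * math.pi * t / PERIOD) for t in range(5 * PERIOD)]

# Two subsequences, each covering one full period, offset by 7 time steps.
normal = series[0:PERIOD]
shifted = series[7:7 + PERIOD]

# A hypothetical anomalous subsequence: same shape, three times the amplitude.
anomalous = [3.0 * v for v in normal]

wd_shifted = wasserstein_1d(normal, shifted)      # compared as sets of points
ed_shifted = euclidean(normal, shifted)           # compared point by point
wd_anomalous = wasserstein_1d(normal, anomalous)

print(f"WD(normal, shifted)   = {wd_shifted:.6f}")
print(f"ED(normal, shifted)   = {ed_shifted:.6f}")
print(f"WD(normal, anomalous) = {wd_anomalous:.6f}")
```

In this toy setting the time-shifted subsequence contains the same values as the reference one, so the distributional distance is close to zero without any sliding-window alignment, whereas the point-to-point distance is large. The amplitude-scaled subsequence has a clearly larger distributional distance.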

The 48th International Conference on Very Large Data Bases (VLDB-22)