Outlier Detection in High Dimensional Data

Kamalov, Firuz
Leung, Ho Hon
World Scientific Publishing Co. Pte Ltd
High-dimensional data poses unique challenges in the outlier detection process. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small datasets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by F1-score. Our method also produces better-than-average execution times compared with the benchmark methods. © 2020 World Scientific Publishing Co.
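The abstract names only the two building blocks, principal component analysis followed by kernel density estimation, so the following is a minimal sketch of that general idea and not the authors' published algorithm. The function name `pca_kde_scores`, the parameters `n_components` and `bandwidth`, the standardization of the projected coordinates, the Scott's-rule bandwidth, the leave-one-out density, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca_kde_scores(X, n_components=2, bandwidth=None):
    """Anomaly scores from a PCA projection followed by a Gaussian KDE.

    Illustrative sketch only (assumed design, not the paper's method).
    Higher scores mean lower estimated density, i.e. more outlier-like.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center the features

    # PCA via SVD: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T  # project onto the leading components

    # Assumption: rescale each projected coordinate to unit variance
    # so that a single bandwidth is reasonable for all of them.
    Z = Z / (Z.std(axis=0) + 1e-12)
    n, d = Z.shape

    if bandwidth is None:
        bandwidth = n ** (-1.0 / (d + 4))  # Scott's rule (assumed choice)

    # Pairwise squared distances in the projected space.
    diff = Z[:, None, :] - Z[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)

    # Gaussian kernel; zero the diagonal for a leave-one-out estimate,
    # so a point does not contribute to its own density.
    K = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    np.fill_diagonal(K, 0.0)
    norm = (n - 1) * (2.0 * np.pi) ** (d / 2.0) * bandwidth ** d
    density = K.sum(axis=1) / norm

    # Negative log-density as the anomaly score (floor avoids log(0)).
    return -np.log(density + 1e-300)


# Synthetic example: 60 inliers and 1 shifted point in 50 dimensions.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(60, 50))
outlier = np.full((1, 50), 10.0)
X = np.vstack([inliers, outlier])

scores = pca_kde_scores(X, n_components=2)
most_anomalous = int(np.argmax(scores))  # index of the highest score
```

In this toy setting the number of features is close to the number of samples, which is the regime the abstract highlights. Estimating the density after the projection avoids running a kernel estimator directly in 50 dimensions.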
This article is not available in the CUD collection. The version of record of this article is published in the Journal of Information & Knowledge Management (2020), available online at: https://doi.org/10.1142/S0219649220400134
High dimensional data, KDE, Outlier detection, PCA, Anomaly detection, Large dataset, Numerical methods, Principal component analysis, Signal detection, Statistics, Data points, Innate structure, Kernel Density Estimation, Numerical experiments, Outlier detection algorithm, Outlier detection in high-dimensional datum, Real life data, Clustering algorithms
Kamalov, F., & Leung, H. H. (2020). Outlier detection in high dimensional data. Journal of Information & Knowledge Management, 19(1), 2040013. https://doi.org/10.1142/S0219649220400134