Isolation Forest Based Anomaly Detection with Learning to Hash
Due to rapid technological advancements, Artificial Intelligence (AI) has become an integral part of our daily lives, impacting industries such as healthcare, finance, and transportation. Within AI, anomaly detection plays a crucial role in preventing significant losses in applications such as cybersecurity intrusion detection, financial risk detection, and human health monitoring. Its task is to identify rare items that deviate from the majority of normal items. A variety of unsupervised anomaly detection methods, ranging from shallow to deep, have been proposed. Notably, the category based on the isolation forest mechanism stands out for its simplicity, effectiveness, and efficiency; for example, iForest is frequently employed as a state-of-the-art detector in real applications. However, existing isolation forest based approaches are data-independent: they build isolation trees without learning from the data instances, which significantly impairs the effectiveness and robustness of anomaly detection. In this dissertation, we aim to address these challenges at both the shallow and the deep level by applying appropriate learning techniques.
First, we analyse the limitations of traditional isolation forest based methods in learning data information to build isolation trees. To solve this problem, we apply learning to hash schemes to the isolation forest and extend order preserving hashing (OPH) to anomaly detection by designing a two-step learning scheme. Our analysis further establishes the critical role that isolation tree structure plays in determining the overall performance of these detection methods. Second, we investigate the optimisation of the isolation tree structure with respect to the branching factor, establish a theory of the optimal isolation forest, and design a practical optimal isolation forest. Third, we observe that deep anomaly detection (DAD) methods relying on deep neural networks (DNNs) are restricted by the intrinsic shortcomings of deep learning, such as an overabundance of parameters and fixed training layers. Inspired by the deep forest, we aim to solve the above learning problems at a deep level without relying on DNNs. We deepen the isolation forest into a cascaded forest to improve detection performance. Finally, we optimise the isolation tree structure using a deep model based on the genetic algorithm. Extensive experiments on synthetic datasets and a series of real-world datasets demonstrate that our approaches achieve better detection accuracy and robustness than state-of-the-art methods.
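
For orientation, the data-independent baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the standard iForest idea (binary trees, uniformly random split attribute and split value, anomaly score derived from the average path length), not the learning to hash, optimal branching factor, or cascaded variants proposed in this dissertation. All function names, parameter defaults, and the toy data are assumptions made for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup
    over n points; used to normalise path lengths in iForest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with purely random (data-independent) splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))   # random attribute
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                          # cannot split on a constant attribute
        return {"size": len(points)}
    split = rng.uniform(lo, hi)           # random split value
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands, plus an adjustment for unsplit leaf points."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of data."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(x, trees, sample_size):
    """Score in (0, 1]; values closer to 1 mean x is isolated more quickly."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))


# Toy data (hypothetical): one Gaussian cluster plus a single distant point.
demo_rng = random.Random(42)
demo_data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
demo_data.append((8.0, 8.0))
demo_trees, demo_n = fit_forest(demo_data)
print("score of distant point:", anomaly_score((8.0, 8.0), demo_trees, demo_n))
print("score of central point:", anomaly_score((0.0, 0.0), demo_trees, demo_n))
```

Because every split in this sketch is drawn without looking at how the data are distributed, it shows where a learned hashing scheme or a different branching factor would plug in: in the choice of split inside `build_tree`.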