Macquarie University

Isolation Forest Based Anomaly Detection with Learning to Hash

thesis
posted on 2024-07-09, 05:29 authored by Haolong Xiang

Driven by rapid technological advances, Artificial Intelligence (AI) has become an integral part of daily life, affecting industries such as healthcare, finance, and transportation. Anomaly detection plays a crucial role in these settings because it helps prevent significant losses in applications such as cybersecurity intrusion detection, financial risk detection, and human health monitoring. Its task is to identify rare items that deviate from the majority of normal items. A variety of unsupervised anomaly detection methods, ranging from shallow to deep, have been proposed. Among them, the family based on the isolation forest mechanism stands out for its simplicity, effectiveness, and efficiency; for example, iForest is frequently employed as a state-of-the-art detector in real applications. However, existing isolation forest based approaches are data-independent: they fail to learn effectively from the information in the data instances, which significantly impairs the effectiveness and robustness of anomaly detection. In this dissertation, we aim to address these limitations at both shallow and deep levels by applying appropriate learning techniques.
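
For readers unfamiliar with the isolation mechanism, the following is a minimal sketch of a standard, data-independent isolation forest in plain Python. It is not code from the thesis: the function names (`build_tree`, `fit_forest`, `anomaly_score`), the default parameters, and the toy data are assumptions made for illustration. Split dimensions and split values are drawn uniformly at random, which is the sense in which the baseline does not learn from the data.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # no spread on this dimension, stop splitting here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)  # chosen at random, not learned from the data
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus an adjustment for unsplit leaf points."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees on random subsamples (needs at least 2 points)."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size


def anomaly_score(x, trees, size):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(size))


# Toy data (an assumption for illustration): one Gaussian cluster plus one far-away point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
trees, size = fit_forest(data)
print("far point :", anomaly_score((8.0, 8.0), trees, size))
print("centre    :", anomaly_score((0.0, 0.0), trees, size))
```

On data like this, the far-away point is typically separated from the rest after only a few random splits, so it should receive a higher score than a point at the centre of the cluster.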

Specifically, we first analyse the limitations of traditional isolation forest based methods in learning data information to build isolation trees. To address them, we bring learning to hash schemes into the isolation forest and extend order preserving hashing (OPH) to anomaly detection by designing a two-step learning scheme. Our analysis further establishes the critical role that the isolation tree structure plays in the overall performance of these detection methods. Secondly, we investigate the optimisation of the isolation tree structure with respect to the branching factor, establish a theory of the optimal isolation forest, and design a practical optimal isolation forest. Thirdly, we observe that deep anomaly detection (DAD) methods relying on deep neural networks (DNNs) are restricted by the intrinsic shortcomings of deep learning, such as an overabundance of parameters and fixed training layers. Inspired by the deep forest, we aim to solve the above learning problems at a deep level without relying on DNNs: we deepen the isolation forest into a cascaded forest to improve detection performance. Finally, the isolation tree structure is optimised using a deep model based on the genetic algorithm. Extensive experiments on synthetic datasets and a series of real-world datasets demonstrate that our approaches achieve better detection accuracy and robustness than state-of-the-art methods.
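
To make the notion of a branching factor concrete, here is a small hedged sketch in Python. It is not the optimal isolation forest developed in the thesis, and it does not reproduce any of its theory. It only simulates how many random k-way splits are needed to isolate a given point, where `k` is the number of children per node (`k = 2` corresponds to the usual binary isolation tree). The helper names and the toy data are assumptions for illustration.

```python
import bisect
import random


def isolation_depth(x, points, k, rng, max_depth=50):
    """Count random k-way splits until x is alone in its cell.

    Assumes x is one of the points and k >= 2.
    """
    cell = list(points)
    depth = 0
    while len(cell) > 1 and depth < max_depth:
        dim = rng.randrange(len(x))
        lo = min(p[dim] for p in cell)
        hi = max(p[dim] for p in cell)
        # k - 1 random cut points divide [lo, hi] into k branches
        cuts = sorted(rng.uniform(lo, hi) for _ in range(k - 1))
        branch = bisect.bisect_right(cuts, x[dim])
        # keep only the points that fall into the same branch as x
        cell = [p for p in cell if bisect.bisect_right(cuts, p[dim]) == branch]
        depth += 1
    return depth


def mean_isolation_depth(x, points, k, trials=200, seed=0):
    """Average isolation depth of x over repeated random trials."""
    rng = random.Random(seed)
    return sum(isolation_depth(x, points, k, rng) for _ in range(trials)) / trials


# Toy data (an assumption): a Gaussian cluster, its centre, and one far-away point.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
points += [(0.0, 0.0), (8.0, 8.0)]

for k in (2, 3, 4):
    print(
        "k =", k,
        "| centre:", mean_isolation_depth((0.0, 0.0), points, k),
        "| far point:", mean_isolation_depth((8.0, 8.0), points, k),
    )
```

A larger branching factor tends to isolate points in fewer levels but uses more cuts per level, which gives some intuition for why the branching factor can be treated as a quantity to optimise.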

Table of Contents

1. Introduction -- 2. Literature Review -- 3. Order Preserving Hashing Based Isolation Forest for Robust and Scalable Anomaly Detection -- 4. Optimal Isolation Forest for Anomaly Detection -- 5. Deep Isolation Forest for Anomaly Detection -- 6. Deep Optimal Isolation Forest with Genetic Algorithm for Anomaly Detection -- 7. Conclusion and Future Work -- References

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

Doctor of Philosophy

Department, Centre or School

School of Computing

Year of Award

2024

Principal Supervisor

Xuyun Zhang

Additional Supervisor 1

Mark Dras

Rights

Copyright: The Author Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

192 pages

Former Identifiers

AMIS ID: 352446
