Macquarie University

iTransact: Isolation Kernel-Based Transaction Classification

thesis
posted on 2025-11-12, 23:28 authored by Eu Jin Foo
<p dir="ltr">Over the past three decades, consumers and businesses have shifted from brick-and-mortar stores to e-commerce platforms. The rise of marketplace platforms such as Amazon, eBay and Alibaba has changed consumer shopping behaviour and expectations. In response to the growth of e-commerce, technology has evolved rapidly, giving rise to agile financial technology (fintech) companies such as PayPal, Square and Stripe that provide innovative, secure and convenient payment solutions.</p><p dir="ltr">The launch of the iPhone in 2007 and the subsequent ubiquity of smartphones have further accelerated the growth of e-commerce and fintech, leading to mobile payment solutions such as Apple Pay and Google Wallet. As a result, non-cash transactions have become the norm, exceeding 1.4bn in 2023 and estimated to reach 2.8bn by 2028⁰. Naturally, the ability to monitor, process and understand transactions at scale has become a critical capability for businesses and regulators.</p><p dir="ltr">Given the evolving complexity, volume and velocity of transactions, both traditional rule-based and machine learning approaches struggle to perform effectively at scale. Bank transaction descriptions tend to be sparse, cryptic abbreviations that are difficult for a human to interpret, let alone an algorithm. Hence, this research aims to develop a novel transaction classification solution that does not require costly manual labelling, is efficient yet effective at scale, and is institution-agnostic.</p><p dir="ltr">This thesis proposes a novel transaction classification solution called iTransact. The methodology employs a three-stage pipeline: BERT generates 768-dimensional semantic embeddings from transaction descriptions, minhashing with Locality-Sensitive Hashing (LSH) reduces the embeddings to lower-dimensional fixed-length signatures, and an Isolation Distributional Kernel (IDK) classifier categorises transactions. 
The advantages of this approach are minimal manual labelling, the ability to treat two points in a sparse region differently from the same two points in a dense region, and an exact finite-dimensional feature map that allows efficient O(n) time complexity[1].</p><p dir="ltr">Evaluation on 197,000 real-world bank transactions shows that iTransact performed comparably to traditional SVM, Random Forest, Logistic Regression and XGBoost models on datasets minhashed to 256 and 128 dimensions. Dimensionality reduction degraded performance across all models, whereas running each classifier on full BERT embeddings achieved strong results.</p><p dir="ltr">Logistic Regression performed best, with a ROC-AUC of 0.850, outperforming SVM (0.753), Random Forest (0.689) and XGBoost (0.804). The isolation kernel approach, while theoretically promising for sparse data, achieved a ROC-AUC of only 0.523, suggesting that multi-class transaction classification benefits more from linear decision boundaries aligned with BERT’s representation space.</p><p dir="ltr">This work presents a novel hybrid architecture that combines transformer-based embeddings with isolation kernel methods, creating a scalable framework that lets users balance accuracy and computational efficiency through configurable dimensionality reduction. The system is designed to work with minimal labelled data and to be deployed as a production-grade solution for transaction classification.</p><p dir="ltr">The findings suggest that while isolation kernels are theoretically sound, practical applications in multi-class classification may require further refinement to achieve optimal performance. Future work could explore alternative kernel methods or hybrid approaches that leverage the strengths of both linear and non-linear classifiers in the context of transaction data.</p>
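To make the final stage of the pipeline more concrete, the following is a minimal sketch of an isolation-kernel feature map with a class-mean (distributional) classifier on top. It is not the thesis implementation: it assumes the common nearest-neighbour (Voronoi cell) variant of the isolation kernel, uses synthetic Gaussian vectors in place of minhashed BERT embeddings, and the names `IsolationKernelMap`, `fit_class_means`, `predict`, `psi` and `t` are hypothetical. Each of the `t` random subsamples of `psi` points defines one partitioning, so every input maps to an exact binary vector of length `t * psi`, and transforming n points takes time linear in n.

```python
import numpy as np


class IsolationKernelMap:
    """Nearest-neighbour isolation kernel feature map (illustrative sketch)."""

    def __init__(self, psi=16, t=100, seed=0):
        self.psi = psi  # points per subsample = cells per partitioning
        self.t = t      # number of independent partitionings
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        n = X.shape[0]
        # Each row of idx picks psi distinct points that act as Voronoi centres.
        idx = np.stack([self.rng.choice(n, size=self.psi, replace=False)
                        for _ in range(self.t)])
        self.centres_ = X[idx]  # shape (t, psi, d)
        return self

    def transform(self, X):
        n = X.shape[0]
        # Squared distance from every point to every centre: shape (t, n, psi).
        d2 = ((X[None, :, None, :] - self.centres_[:, None, :, :]) ** 2).sum(-1)
        cell = d2.argmin(axis=2)  # nearest centre per partitioning, shape (t, n)
        # One-hot encode the cell in each partitioning and concatenate.
        phi = np.zeros((n, self.t * self.psi))
        cols = cell.T + np.arange(self.t) * self.psi  # shape (n, t)
        phi[np.arange(n)[:, None], cols] = 1.0
        return phi


def fit_class_means(phi, y):
    """Kernel mean embedding of each class: the average feature vector."""
    classes = np.unique(y)
    means = np.stack([phi[y == c].mean(axis=0) for c in classes])
    return classes, means


def predict(phi, classes, means, t):
    """Assign each point to the class whose mean embedding is most similar."""
    sim = phi @ means.T / t  # similarity in [0, 1]
    return classes[sim.argmax(axis=1)]


# Synthetic stand-in data (assumption): two Gaussian clusters in 8 dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 8)), rng.normal(4.0, 1.0, (100, 8))])
y = np.repeat([0, 1], 100)
perm = rng.permutation(200)
train, test = perm[:150], perm[150:]

ik = IsolationKernelMap(psi=16, t=100, seed=0).fit(X[train])
classes, means = fit_class_means(ik.transform(X[train]), y[train])
pred = predict(ik.transform(X[test]), classes, means, ik.t)
print("held-out accuracy:", (pred == y[test]).mean())
```

Because cells are larger where the data are sparse, two points a fixed distance apart are more likely to share a cell in a sparse region than in a dense one, which is the data-dependent behaviour described in the abstract. The well-separated toy clusters here say nothing about performance on real transaction data.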

Table of Contents

1. Introduction -- 2. Background and State-of-the-Art -- 3. Method -- 4. Experiments -- 5. Conclusion

Notes

ADDITIONAL SUPERVISOR 3: Yin Liao

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

Master of Research

Department, Centre or School

School of Computing

Year of Award

2025

Principal Supervisor

Xuyun Zhang

Additional Supervisor 1

Amin Beheshti

Additional Supervisor 2

Di Bu

Rights

Copyright: The Author. Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

58 pages

Former Identifiers

AMIS ID: 527445
