Macquarie University

Spot Fake Lies with Real Eyes: Using Machine Learning to Detect Online Fake Reviews

Thesis posted on 2025-11-27, 23:27, authored by Ling Zhang
<p dir="ltr">Online reviews significantly influence consumer purchase decisions and business reputations on digital platforms. However, the widespread presence of fake reviews not only misleads consumers but also undermines the reliability of online reviews. Detecting fake reviews is therefore crucial for various stakeholders, and detection methods have evolved from simple visual inspection to advanced machine learning models. Although machine learning models have proven effective in fake review detection, several key research gaps remain.</p>
<p dir="ltr">First, most predictive models rely on binary classification (i.e., real vs. fake), which assumes that review authenticity is a clear-cut distinction. However, online reviews often lie on a spectrum of deception, where subtle manipulation, exaggeration, and subjective bias blur the boundary between authenticity and fabrication. Second, prior research lacks a clearly structured framework that accounts for behavioural features in predicting deceptive reviews.</p>
<p dir="ltr">To address these gaps, this study develops a probabilistic modelling framework that: 1) estimates the likelihood of deception on a continuous spectrum, allowing more flexible and accurate detection than a rigid true/false label; 2) systematically organises behavioural features into three dimensions based on Social Capital Theory: structural (e.g., number of friends), relational (e.g., number of compliments), and activity-based (e.g., review frequency); and 3) incorporates store-level attributes, such as average rating, number of reviews, and consumer-generated “vibes”.</p>
<p dir="ltr">This study uses a dataset of 32,964 Yelp restaurant reviews, covering both real and fake reviews posted between January 2005 and November 2024. A range of probabilistic machine learning models (Logistic Regression, Decision Tree, Random Forest, XGBoost, and Artificial Neural Networks) are implemented and compared using 5-fold cross-validation. XGBoost is selected as the final model because it delivers the best overall performance across Precision, Recall, F1-Score, Accuracy, and Area Under the ROC Curve (AUC).</p>
<p dir="ltr">Results show that relational and structural behavioural features are among the top predictors. Interestingly, reviewers with high social capital, such as elite reviewers (designated “elite” by Yelp based on their activity, consistency, and perceived credibility), are more likely to generate fake reviews. The average predicted probability that a review is fake is 33.5% among elite reviewers, compared with 12% among non-elite reviewers. At the store level, reviews of highly rated businesses tend to have higher probabilities of being fake. Furthermore, the probability of fakeness follows a U-shaped pattern across star ratings: 1-star and 5-star reviews exhibit higher probabilities of deception, and a secondary peak appears at 4 stars.</p>
<p dir="ltr">This study contributes to the literature by developing a systematic, behaviourally grounded probabilistic model to predict fake online reviews. Practically, estimating the probability that a review is fake enhances review transparency for consumers, protects brand reputation for businesses, and supports fair review policies for regulators.</p>
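The abstract compares five probabilistic classifiers with 5-fold cross-validation on Precision, Recall, F1-Score, Accuracy and AUC. The standard-library Python sketch below shows one way such a comparison could be scored; it is an illustration, not the thesis's code. Assumptions: folds are stratified by label (the record says only "5-fold"), a review is labelled fake when its predicted probability is at least 0.5, and `fit` is a hypothetical hook that would wrap any of the five models and return a function giving P(fake) for one review.

```python
import random


def stratified_k_fold(y, k=5, seed=0):
    """Split row indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the same real/fake ratio (an assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def classification_metrics(y_true, p_fake, threshold=0.5):
    """Precision, Recall, F1, Accuracy at a cut-off, plus threshold-free AUC.
    y_true: 1 = fake, 0 = real; p_fake: predicted probability of fake."""
    y_pred = [1 if p >= threshold else 0 for p in p_fake]
    pairs = list(zip(y_true, y_pred))
    tp, fp = pairs.count((1, 1)), pairs.count((0, 1))
    fn, tn = pairs.count((1, 0)), pairs.count((0, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    # AUC = chance that a random fake review outscores a random real one
    # (ties count half); O(n_fake * n_real), acceptable for a sketch.
    pos = [p for t, p in zip(y_true, p_fake) if t == 1]
    neg = [p for t, p in zip(y_true, p_fake) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "auc": auc}


def cross_validate(fit, X, y, k=5, seed=0):
    """Average each metric over k held-out folds.
    fit(X_train, y_train) must return a function x -> P(fake | x)."""
    folds = stratified_k_fold(y, k, seed)
    per_fold = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predict_proba = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        p = [predict_proba(X[j]) for j in test_idx]
        per_fold.append(classification_metrics([y[j] for j in test_idx], p))
    return {m: sum(s[m] for s in per_fold) / k for m in per_fold[0]}


# Toy usage: a hypothetical one-feature "model" that returns the feature itself.
X = [0.9] * 10 + [0.1] * 10
y = [1] * 10 + [0] * 10
print(cross_validate(lambda X_tr, y_tr: (lambda x: x), X, y))
```

On this perfectly separable toy data every averaged metric is 1.0. Keeping AUC separate from the thresholded metrics mirrors the record's emphasis on probabilities: AUC judges the ranking of predicted probabilities, while Precision, Recall, F1 and Accuracy depend on the chosen cut-off.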

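The group-level findings in the abstract (an average fake probability of 33.5% for elite reviewers versus 12% for non-elite reviewers, and the U-shaped profile across star ratings) are averages of per-review predicted probabilities. A minimal sketch of that aggregation follows, assuming each review already carries a model-predicted probability of being fake. The helper name `mean_probability_by` is hypothetical, and the rows are invented toy values, not the thesis data.

```python
from collections import defaultdict


def mean_probability_by(groups, p_fake):
    """Average predicted fake-probability within each group label."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, p in zip(groups, p_fake):
        totals[g] += p
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical toy rows (is_elite, star_rating, predicted P(fake)); NOT thesis data.
reviews = [
    (True, 5, 0.5), (True, 1, 0.25), (False, 4, 0.25),
    (False, 3, 0.125), (False, 2, 0.0), (False, 5, 0.125),
]
elite, stars, p = zip(*reviews)

by_elite = mean_probability_by(elite, p)
by_star = dict(sorted(mean_probability_by(stars, p).items()))
print(by_elite)  # {True: 0.375, False: 0.125}
print(by_star)   # {1: 0.25, 2: 0.0, 3: 0.125, 4: 0.25, 5: 0.3125}
```

Sorting the star-rating means by rating makes a U-shaped or multi-peaked profile easy to read off; with real predictions the same two calls would produce the elite/non-elite and per-star summaries.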
Table of Contents

Chapter 1: Introduction -- Chapter 2: Literature Review -- Chapter 3: Conceptualisation -- Chapter 4: Methods -- Chapter 5: Empirical Results -- Chapter 6: General Discussion and Implication -- Reference -- Appendix A -- Appendix B

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

Master of Research

Department, Centre or School

Department of Marketing

Year of Award

2025

Principal Supervisor

Jun Yao

Additional Supervisor 1

Darren Sung Uk Kim

Rights

Copyright: The Author Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

88 pages

Former Identifiers

AMIS ID: 512697
