<p dir="ltr">Online reviews significantly influence consumer purchase decisions and business reputations on digital platforms. However, the widespread presence of fake reviews not only misguides consumers but also undermines the reliability of online reviews. Therefore, detecting fake reviews is crucial for various stakeholders and has evolved from simple visual inspection to the use of advanced machine learning models. Despite the proven effectiveness of machine learning models in fake review detection, several key research gaps remain.</p><p dir="ltr">First, most predictive models rely on binary classification (i.e., true vs. fake), which assumes that review authenticity is a clear-cut distinction between real and fake. However, online reviews often lie on a spectrum of deception, where subtle manipulation, exaggeration, and subjective bias blur the boundary between authenticity and fabrication. Second, prior research lacks a clearly structured framework that accounts for behavioural features in predicting deceptive reviews.</p><p dir="ltr">To address these gaps, this study develops a probabilistic modelling framework that: 1) estimates the likelihood of deception on a continuous spectrum, allowing for more flexible and accurate detection beyond a rigid true/false label; 2) systematically organizes behavioural features into three dimensions – structural (e.g., number of friends), relational (e.g., number of compliments) and activity-based (e.g., review frequency), based on Social Capital Theory; 3) incorporates store-level attributes, such as average rating, number of reviews, and consumer-generated “vibes”.</p><p dir="ltr">This study uses a dataset of 32,964 restaurant reviews from Yelp, covering both real and fake reviews from January 2005 to November 2024. A range of probabilistic machine models – including Logistic Regression, Decision Tree, Random Forest, XGBoost, and Artificial Neural Networks – are implemented and compared using 5-fold cross-validation. 
XGBoost is selected as the final model based on its superior overall performance across Precision, Recall, F1-Score, Accuracy, and Area Under the ROC Curve (AUC).</p><p dir="ltr">Results show that relational and structural behavioural features are among the top predictors. Interestingly, reviewers high in social capital – such as elite reviewers (designated by Yelp as “elite” based on their activity, consistency, and perceived credibility) – are more likely to generate fake reviews: the average probability of a fake review is 33.5% among elite reviewers, compared with 12% among non-elite reviewers. At the store level, reviews of highly rated businesses tend to exhibit higher probabilities of being fake. Furthermore, the distribution of fakeness probability across star ratings follows a U-shaped pattern, with 1-star and 5-star reviews exhibiting the highest probabilities of deception and a secondary peak at 4 stars.</p><p dir="ltr">This study contributes to the literature by developing a systematic, behaviourally grounded probabilistic model for predicting fake online reviews. Practically, estimating the probability that a review is fake enhances review transparency for consumers, protects brand reputation for businesses, and supports fair review policies for regulators.</p>
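<p dir="ltr">The model-comparison step described in the abstract (several probabilistic classifiers evaluated with 5-fold cross-validation, the winner chosen on AUC, then used to output a continuous probability of deception) can be sketched as follows. This is an illustrative sketch only: synthetic data stands in for the Yelp feature set, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so the example has no extra dependencies. Nothing here reproduces the thesis's actual pipeline or results.</p>

```python
# Sketch of probabilistic model comparison via 5-fold cross-validation.
# Assumptions: synthetic features replace the real behavioural/store-level
# attributes; GradientBoostingClassifier is a stand-in for XGBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder feature matrix: imagine columns for structural, relational,
# activity-based, and store-level attributes; y = 1 means "fake".
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
    "neural_net": MLPClassifier(max_iter=500, random_state=0),
}

# Mean AUC over 5 folds for each candidate model.
auc = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Select the best model by AUC and refit it on all data.
best_name = max(auc, key=auc.get)
best_model = candidates[best_name].fit(X, y)

# A continuous probability of deception per review, not a binary verdict.
p_fake = best_model.predict_proba(X)[:, 1]
print(best_name, round(auc[best_name], 3))
```

<p dir="ltr">The key design point mirrored here is the output: `predict_proba` yields a deception likelihood on a continuous [0, 1] spectrum, which is what permits the elite/non-elite probability comparisons and the U-shaped distribution across star ratings reported above.</p>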
Table of Contents
Chapter 1: Introduction -- Chapter 2: Literature Review -- Chapter 3: Conceptualisation -- Chapter 4: Methods -- Chapter 5: Empirical Results -- Chapter 6: General Discussion and Implication -- Reference -- Appendix A -- Appendix B
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
Master of Research
Department, Centre or School
Department of Marketing
Year of Award
2025
Principal Supervisor
Jun Yao
Additional Supervisor 1
Darren Sung Uk Kim
Rights
Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer