
Language model pretraining and fine-tuning for emotion classification tasks using Vent data

thesis
posted on 2024-11-06, 05:50 authored by Allan Francia

Emotion classification of text data has numerous potential benefits across applications ranging from assessing community mental health to understanding customer feedback on products and services. The recent development of transformers for language modelling has raised benchmark results for numerous natural language processing tasks. Alongside the performance improvements, transformer models also present several challenges, such as the availability of pretraining data, language model size (including the number of parameters), training time and computing power requirements. This research project determines whether there are benefits in pretraining a transformer language model, BERT, using emotion-specific domain data from the social media platform Vent, and compares the pretrained model against fine-tuning the original model alone. In doing so, it also determines how much pretraining data is required for the BERT model to produce reasonable results, and how the results scale with the size of the pretraining data within a constrained computing power budget. The project also benchmarks its emotion classification results against the multilabel GoEmotions classification project. Although the pretraining results did not outperform the original BERT models on either the multi-class or the multilabel task, the results were close.
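
To illustrate the kind of fine-tuning setup the abstract describes, the following is a minimal sketch of multilabel emotion classification with BERT using the Hugging Face transformers library. The model name, label set, example text and targets are illustrative assumptions only, not the thesis's actual code or data.

    # Minimal sketch: fine-tuning BERT for multilabel emotion classification,
    # in the spirit of the GoEmotions-style task described in the abstract.
    # Label set, example text and targets are hypothetical placeholders.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    labels = ["joy", "anger", "sadness", "fear"]   # assumed subset of emotion labels
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=len(labels),
        problem_type="multi_label_classification",  # sigmoid outputs with BCE loss
    )

    texts = ["I can't believe how great today was!"]    # hypothetical input
    targets = torch.tensor([[1.0, 0.0, 0.0, 0.0]])      # one multilabel target vector

    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**enc, labels=targets)   # returns BCE-with-logits loss and logits
    out.loss.backward()                  # one illustrative training step
    probs = torch.sigmoid(out.logits)    # per-label emotion probabilities

The same classification head can be fine-tuned either from the original bert-base checkpoint or from a checkpoint further pretrained on domain data (here, Vent posts), which is the comparison the thesis makes.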

History

Table of Contents

1 Introduction -- 2 Literature review -- 3 Research methods -- 4 Results and discussion -- 5 Conclusion and future work -- Bibliography

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

Master of Research

Department, Centre or School

School of Computing

Year of Award

2024

Principal Supervisor

Diego Molla-Aliod

Additional Supervisor 1

Cecile Paris

Rights

Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

54 pages

Former Identifiers

AMIS ID: 331893
