Macquarie University

Gradual unfreezing transformer-based language models for biomedical question answering

posted on 2022-03-28, 02:39 authored by Urvashi Khanna
Pretrained transformer-based language models have achieved state-of-the-art results on various Natural Language Processing (NLP) tasks. These models can be fine-tuned on a range of downstream tasks with minimal modifications. However, fine-tuning a language model may result in catastrophic forgetting, and the model tends to overfit on smaller training datasets. Gradually unfreezing the pretrained weights is therefore a possible approach to avoid catastrophic forgetting of the knowledge learnt from the source task. Multi-task fine-tuning is an intermediate fine-tuning step on a high-resource dataset that yields good results for low-resource tasks. In this project, we investigate the strategies of multi-task fine-tuning and gradual unfreezing on DistilBERT, which have not yet been applied to the biomedical domain. First, we explore whether DistilBERT improves accuracy on a low-resource dataset, BioASQ, with the question answering (QA) task as our NLP use case. Second, we investigate the effect that gradual unfreezing has on the performance of DistilBERT. We observe that, despite being 40% smaller and without any domain-specific pretraining, DistilBERT achieves results comparable to those of a larger model, BERT, on the smaller BioASQ dataset. However, gradually unfreezing DistilBERT has no significant impact on the results of our QA task in comparison to standard non-gradual fine-tuning.
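
The gradual unfreezing idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the schedule used in the thesis: the function name `unfreezing_schedule`, the one-layer-per-epoch rate, and the top-down order are assumptions made for illustration, and the task head is assumed to be trainable throughout. The example uses six layers, the number of transformer layers in DistilBERT.

```python
def unfreezing_schedule(num_layers, num_epochs, layers_per_epoch=1):
    """Return, for each epoch, the indices of the encoder layers that are trainable.

    Layers are indexed from 0 (closest to the input) to num_layers - 1
    (closest to the task head). Hypothetical helper: the layer nearest the
    head is unfrozen first, and more layers are added every epoch.
    """
    schedule = []
    for epoch in range(num_epochs):
        # Number of layers unfrozen so far, capped at the total layer count.
        n_unfrozen = min(num_layers, (epoch + 1) * layers_per_epoch)
        schedule.append(list(range(num_layers - n_unfrozen, num_layers)))
    return schedule


# Assumed setting: 6 transformer layers, 4 training epochs.
schedule = unfreezing_schedule(num_layers=6, num_epochs=4)
for epoch, layers in enumerate(schedule):
    print(f"epoch {epoch}: trainable layers {layers}")
```

In a PyTorch training loop, such a schedule would typically be applied at the start of each epoch by setting `requires_grad` to `True` on the parameters of the listed layers and to `False` on the rest. Standard non-gradual fine-tuning corresponds to all layers being trainable from the first epoch.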


Table of Contents

1 Introduction -- 2 Background and literature review -- 3 Approach -- 4 Fine-tuning DistilBERT -- 5 Gradual unfreezing experiments -- 6 Conclusion and future work


Bibliography: pages 49-57
Theoretical thesis.

Awarding Institution

Macquarie University

Degree Type

Thesis MRes


MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award


Principal Supervisor

Diego Mollá Aliod


Copyright Urvashi Khanna 2021




1 online resource (xi, 61 pages) illustrations

Former Identifiers