Posted on 2022-03-28, 02:39, authored by Urvashi Khanna
Pretrained transformer-based language models have achieved state-of-the-art results on various Natural Language Processing (NLP) tasks. These models can be fine-tuned on a range of downstream tasks with minimal modifications. However, fine-tuning a language model can lead to catastrophic forgetting, and the model tends to overfit on smaller training datasets. Gradually unfreezing the pretrained weights is one approach to avoid catastrophic forgetting of the knowledge learnt from the source task. Multi-task fine-tuning, an intermediate fine-tuning step on a high-resource dataset, is another strategy that yields good results for low-resource tasks. In this project, we investigate multi-task fine-tuning and gradual unfreezing on DistilBERT, strategies that have not yet been applied in the biomedical domain. First, we explore whether DistilBERT improves accuracy on a low-resource dataset, BioASQ, using question answering (QA) as our NLP use case. Second, we investigate the effect that gradual unfreezing has on the performance of DistilBERT. We observe that, despite being 40% smaller and having no domain-specific pretraining, DistilBERT achieves results comparable to the larger BERT model on the small BioASQ dataset. However, gradually unfreezing DistilBERT has no significant impact on the results of our QA task compared with standard non-gradual fine-tuning.
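To make the gradual unfreezing strategy concrete, the sketch below shows one way it could be set up for DistilBERT with the Hugging Face transformers library: the pretrained encoder starts frozen and its six transformer blocks are unfrozen top-down, one per epoch. This is a minimal illustration only; the checkpoint name, unfreezing schedule, learning rate, and epoch count are illustrative assumptions, not the settings used in the thesis.

```python
# Minimal sketch of gradual unfreezing for DistilBERT on a QA task.
# Assumptions: Hugging Face transformers, one extra block unfrozen per epoch,
# AdamW with lr=3e-5 -- none of these are the thesis's actual hyperparameters.
import torch
from transformers import DistilBertForQuestionAnswering

model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")

# Start with the whole pretrained encoder frozen; only the QA head is trainable.
for param in model.distilbert.parameters():
    param.requires_grad = False

# DistilBERT has 6 transformer blocks, accessible as a ModuleList.
encoder_layers = list(model.distilbert.transformer.layer)

def unfreeze_top_layers(n_unfrozen: int) -> None:
    """Unfreeze the top `n_unfrozen` encoder blocks (highest layers first)."""
    for block in encoder_layers[len(encoder_layers) - n_unfrozen:]:
        for param in block.parameters():
            param.requires_grad = True

num_epochs = 6
for epoch in range(num_epochs):
    unfreeze_top_layers(epoch + 1)  # one more block becomes trainable each epoch
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=3e-5)  # rebuilt over the current trainable set
    # ... run one epoch of QA fine-tuning (e.g. on BioASQ) with `optimizer` ...
```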
Table of Contents
1 Introduction -- 2 Background and literature review -- 3 Approach -- 4 Fine-tuning DistilBERT -- 5 Gradual unfreezing experiments -- 6 Conclusion and future work
Notes
Bibliography: pages 49-57
Theoretical thesis.
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing