01whole.pdf (2.04 MB)
Gradual unfreezing transformer-based language models for biomedical question answering
thesisposted on 2022-03-28, 02:39 authored by Urvashi Khanna
Pretrained transformer-based language models have achieved state-of-the-art results on various Natural Language Processing (NLP) tasks. These models can be fine-tuned on a range of downstream tasks with minimalistic modifications. However, fine-tuning a language model may result in the problem of catastrophic forgetting and tend to overfit on smaller training datasets. Therefore, gradually unfreezing the pretrained weights is a possible approach to avoid catastrophic forgetting of the knowledge learnt from the source task. Multi-task fine-tuning is an intermediate step on a high-resource dataset that yields good results for low-resource tasks. In this project, we will be investigating the strategies of multi-task fine-tuning and gradual unfreezing on DistilBERT, which have not yet been applied for biomedical domain. First, we explore whether DistilBERT improves the accuracy of a low-resource dataset, BioASQ, with question answering (QA) task as our NLP use-case. Second, we investigate the effect that gradual unfreezing has on the performance of DistilBERT. We observe that despite being 40% smaller and without any domain-specific pretraining, DistilBERT achieves comparable results to a larger model, BERT on smaller BioASQ dataset. However, we observed that gradually unfreezing DistilBERT has no significant impact on the results of our QA task in comparison to standard non-gradual fine-tuning.