Captioning images containing novel object

posted on 28.03.2022, 16:39 by Yufei Wang
Image captioning is the task of describing images using natural language. Recent advances in deep neural networks have boosted the performance of image captioning systems on large benchmark datasets such as COCO [2]. However, these data-driven approaches produce low-quality captions for images containing novel objects (i.e., objects whose textual labels do not appear in the parallel image-caption training data). This thesis aims to improve the quality of captions generated for such images. We identify limitations in previous novel object captioning benchmarks and systems, and make two contributions. The first is a new evaluation dataset, nocaps, for novel object captioning, intended for evaluating image captioning models trained on COCO. The nocaps benchmark is sampled from the Open Images Dataset [3] and covers more than 400 object classes that are rarely seen in the COCO data. The second is an improved novel object captioning model, UpDown-C, which balances generation quality between in-domain and novel object captions. The evaluation results show that UpDown-C outperforms several strong baselines, including the previous state-of-the-art Up-Down model with CBS and the NBT model, improving substantially on previous work and setting a new state of the art on the nocaps benchmark.


Table of Contents

1. Introduction -- 2. Background and related work -- 3. nocaps: novel object captioning at scale -- 4. UpDown-C: a new novel object captioner -- 5. Summary and conclusions.


Theoretical thesis. Bibliography: pages 57-67

Awarding Institution

Macquarie University

Degree Type

Thesis MRes


MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award


Principal Supervisor

Mark Johnson


Copyright Yufei Wang 2019. Copyright disclaimer: http://mq.edu.au/library/copyright




1 online resource (xi, 67 pages) colour illustrations

Former Identifiers

mq:72079 http://hdl.handle.net/1959.14/1281171