Image captioning is the task of describing images in natural language. Recent advances in deep neural networks have boosted the performance of image captioning systems on large benchmark datasets such as COCO [2]. However, these data-driven approaches produce low-quality captions for images containing novel objects, i.e., objects whose textual labels do not appear in the parallel image-caption training data. This thesis aims to improve the quality of generated captions for images containing novel objects. We identify limitations in previous novel object captioning benchmarks and systems, and the contributions of this thesis are twofold. The first contribution is nocaps, a new evaluation benchmark for novel object captioning, intended for evaluating image captioning models trained on COCO. The nocaps benchmark is sampled from the Open Images Dataset [3] and covers more than 400 object classes that are rarely seen in COCO. The second contribution is UpDown-C, an improved novel object captioning model that balances generation quality between in-domain and novel object captions. The evaluation results show that UpDown-C substantially outperforms several strong baselines, including the state-of-the-art Up-Down model with constrained beam search (CBS) and the Neural Baby Talk (NBT) model, setting a new state of the art on the nocaps benchmark.
Table of Contents
1. Introduction -- 2. Background and related work -- 3. nocaps: novel object captioning at scale -- 4. UpDown-C: a new novel object captioner -- 5. Summary and conclusions.
Notes
Theoretical thesis.
Bibliography: pages 57-67
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing
Department, Centre or School
Department of Computing
Year of Award
2019
Principal Supervisor
Mark Johnson
Rights
Copyright Yufei Wang 2019.
Copyright disclaimer: http://mq.edu.au/library/copyright