Captioning images containing novel object

thesis
posted on 28.03.2022, 16:39 authored by Yufei Wang
Image captioning is the task of describing images in natural language. Recent advances in deep neural networks have boosted the performance of image captioning systems on large benchmark datasets such as COCO [2]. However, these data-driven approaches produce low-quality captions for images containing novel objects (i.e., objects whose textual labels do not appear in the parallel image-caption training data). This thesis aims to improve the quality of captions generated for such images, and it addresses limitations in previous novel object captioning benchmarks and systems. Its contributions are twofold. The first is nocaps, a new evaluation dataset for novel object captioning, intended for evaluating image captioning models trained on COCO. The nocaps benchmark is sampled from the Open Images Dataset [3] and covers more than 400 object classes that are rarely seen in the COCO data. The second is UpDown-C, an improved novel object captioning model that balances generation quality between in-domain and novel object captions. Evaluation results show that UpDown-C substantially outperforms several strong baselines, including the previous state-of-the-art Up-Down model with CBS and the NBT model, and sets a new state of the art on the nocaps benchmark.

Table of Contents

1. Introduction -- 2. Background and related work -- 3. nocaps: novel object captioning at scale -- 4. UpDown-C: a new novel object captioner -- 5. Summary and conclusions.

Notes

Theoretical thesis. Bibliography: pages 57-67

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2019

Principal Supervisor

Mark Johnson

Rights

Copyright Yufei Wang 2019. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (xi, 67 pages) colour illustrations

Former Identifiers

mq:72079 http://hdl.handle.net/1959.14/1281171