Captioning images containing novel object

Wang, Yufei

doi:10.25949/19436165.v1

01whole.pdf (4.07 MB)

Captioning images containing novel object

thesis

posted on 2022-03-28, 16:39 authored by Yufei Wang

Image captioning is the task of describing images using natural language. Recent advances in deep neural networks have boosted the performance of image captioning systems on large benchmark datasets such as COCO [2]. However, these data-driven approaches result in low quality captions for images containing novel objects (i.e., image objects whose corresponding textual labels are not included in the parallel image-caption training data). This thesis aims to improve generated caption quality for images containing novel objects. We notice the limitations in previous novel object captioning benchmark and systems. The contributions of this thesis are two fold. The first contribution is a new evaluation dataset nocaps for novel object captioning, which is intended for evaluation of image captioning models trained on COCO. The nocaps benchmark is sampled from Open Images Dataset [3] with more than 400 classes of objects that are rarely seen in the COCO data. The second contribution is an improved novel object captioning model UpDown-C, which balances generation quality between in-domain and novel object captions. The evaluation results show that UpDown-C outperforms several strong baselines, including the state-of-the-art Up-Down model with CBS and NBT model, with substantial improvement over previous work and sets a new state-of-the-art on the nocaps benchmark.

History

Notes

Theoretical thesis. Bibliography: pages 57-67

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2019

Principal Supervisor

Mark Johnson

Rights

Copyright Yufei Wang 2019. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (xi, 67 pages) colour illustrations

Former Identifiers

mq:72079 http://hdl.handle.net/1959.14/1281171

Usage metrics

Keywords

image captioning language-and-vision Image processing novel object captioning Computer vision

Licence

In Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Captioning images containing novel object

History

Table of Contents

Notes

Awarding Institution

Degree Type

Degree

Department, Centre or School

Year of Award

Principal Supervisor

Rights

Language

Extent

Former Identifiers

Usage metrics

Categories

Keywords

Licence

Exports