Macquarie University

Deep reinforcement learning as text generator in image captioning

posted on 2022-03-28, 16:03 authored by Farhad Amouzgar
Reinforcement learning (RL), one of the oldest AI paradigms, has produced exciting results in recent years. Although the research frontiers of the field are game playing and robotics, the natural language processing (NLP) community has also found many applications of RL as a way to optimize non-differentiable metrics in deep learning, including text generation, image captioning and chatbots. However, the current literature focuses mainly on the REINFORCE algorithm and its derivatives. REINFORCE is a robust algorithm, but it dates back to the 1990s and suffers from high variance compared with modern RL algorithms. To address this challenge, we study and analyze the recent state of the art in RL. Taking image captioning as our specific NLP use case, we identify Proximal Policy Optimization (PPO) algorithms as suitable successors to REINFORCE and propose PPO-based methods for optimizing non-differentiable captioning metrics. We evaluate these methods experimentally against the standard REINFORCE-based approach and find that, while PPO's static clipping mechanism is the key factor behind its state-of-the-art results in game playing, it does not improve over REINFORCE in image captioning; instead, the actor-critic aspect of the algorithms has a more significant impact on the model's convergence.
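To make the comparison in the abstract concrete, here is a minimal sketch, in plain Python, of the two per-caption losses being contrasted: a REINFORCE loss with a baseline and PPO's clipped surrogate loss. It is not the thesis's implementation. The function names, the clipping value `eps=0.2`, and the use of a single sequence-level reward (for example, a captioning metric score for the whole sampled caption) are illustrative assumptions.

```python
import math
from typing import Sequence


def reinforce_loss(log_probs: Sequence[float], reward: float, baseline: float) -> float:
    """REINFORCE with a baseline for one sampled caption.

    Assumption: `reward` is a sequence-level metric score and `baseline` is
    any reward estimate used to reduce variance (e.g. a greedy caption's score).
    Loss = -(reward - baseline) * sum_t log pi(w_t | w_<t, image).
    """
    advantage = reward - baseline
    return -advantage * sum(log_probs)


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantage: float,
    eps: float = 0.2,
) -> float:
    """PPO clipped surrogate loss, averaged over the caption's tokens.

    ratio_t = pi_new(w_t) / pi_old(w_t); the objective per token is
    min(ratio_t * A, clip(ratio_t, 1 - eps, 1 + eps) * A), negated to give a loss.
    Assumption: one shared advantage A for every token of the caption.
    """
    if len(new_log_probs) != len(old_log_probs) or not new_log_probs:
        raise ValueError("need two equal-length, non-empty log-probability lists")
    total = 0.0
    for new_lp, old_lp in zip(new_log_probs, old_log_probs):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * advantage, clipped * advantage)
    return -total / len(new_log_probs)


if __name__ == "__main__":
    # Hypothetical token log-probabilities of one caption under the sampling policy.
    sampled = [-0.4, -1.1, -0.7]
    print(reinforce_loss(sampled, reward=0.9, baseline=0.6))
    # Hypothetical log-probabilities of the same tokens after some updates.
    print(ppo_clip_loss([-0.3, -1.0, -0.9], sampled, advantage=0.3))
```

In an actor-critic variant, the baseline (and hence the advantage) would come from a learned value estimate rather than a fixed or heuristic score. The clip only removes the incentive to push a token's probability ratio outside 1 ± `eps` within one update; it does not change how the advantage itself is estimated.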


Table of Contents

1. Introduction -- 2. Background and literature review -- 3. OpenAI Gym Experiments -- 4. Proposed method for image captioning -- 5. Experimental results for image captioning -- 6. Conclusion and future work.


Theoretical thesis. Bibliography: pages 66-72

Awarding Institution

Macquarie University

Degree Type

Thesis MRes


MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award


Principal Supervisor

Amin Beheshti

Additional Supervisor 1

Mark Dras


Copyright Farhad Amouzgar 2019. Copyright disclaimer:




1 online resource (xiv, 72 pages) illustrations

Former Identifiers