Posted on 2022-03-28, 16:03. Authored by Farhad Amouzgar.
Reinforcement learning (RL), one of the oldest AI paradigms, has produced exciting results in recent years. Although the research frontiers of the field are game playing and robotics, the natural language processing (NLP) community has also found many applications for RL as a way to optimize non-differentiable metrics in deep learning, including text generation, image captioning, and chatbots. However, the current literature focuses mainly on the REINFORCE algorithm and its derivatives. REINFORCE is a robust algorithm, but it dates back to the 1990s and suffers from high variance compared to modern RL algorithms. To address this challenge, we study and analyze the recent state of the art in RL. Taking image captioning as our specific NLP use case, we identify Proximal Policy Optimization (PPO) algorithms as suitable successors to REINFORCE and propose PPO-based methods for optimizing non-differentiable captioning metrics. We evaluate them experimentally against the REINFORCE-based standard and find that, while the static clipping mechanism of PPO is the key to state-of-the-art results in game playing, it does not improve over REINFORCE in image captioning; rather, the actor-critic aspect of the algorithms has a more significant impact on the convergence of the model.
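The contrast between the two objectives discussed in the abstract can be sketched briefly. The following is an illustrative sketch only, not the thesis's implementation: it shows the REINFORCE policy-gradient loss and PPO's clipped surrogate loss in minimal NumPy form, with function names and the default clip range `eps=0.2` chosen here for illustration.

```python
import numpy as np

def reinforce_loss(log_probs, rewards, baseline=0.0):
    """REINFORCE objective: -E[(R - b) * log pi(a|s)].
    Subtracting a baseline b reduces variance without biasing the gradient,
    which is the weakness (high variance) the abstract attributes to REINFORCE."""
    advantages = rewards - baseline
    return -np.mean(advantages * log_probs)

def ppo_clipped_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate: -E[min(r * A, clip(r, 1-eps, 1+eps) * A)],
    where r = pi_new(a|s) / pi_old(a|s) is the probability ratio.
    Clipping r to [1-eps, 1+eps] caps how far a single update can move
    the policy away from the one that collected the data."""
    ratio = np.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies coincide (ratio of 1), the clipped surrogate reduces to the plain advantage-weighted objective; the clip only engages when an update would move the policy too far in one step, which is the "static clipping mechanism" the abstract finds unhelpful for captioning.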
Table of Contents
1. Introduction -- 2. Background and literature review -- 3. OpenAI Gym Experiments -- 4. Proposed method for image captioning -- 5. Experimental results for image captioning -- 6. Conclusion and future work.
Notes
Theoretical thesis.
Bibliography: pages 66-72
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing