Modeling topics and knowledge bases with vector representations

thesis
posted on 28.03.2022, 02:01 authored by Dat Quoc Nguyen
Motivated by the recent success of latent feature vector representations (i.e. embeddings) in various natural language processing tasks, this thesis investigates how such representations can help build better topic models and improve link prediction models for knowledge base completion. The first objective of this thesis is to incorporate latent feature word representations, which carry external information from a large corpus, in order to improve topic inference in a small corpus. The second objective is to develop new embedding models for predicting missing relationships between entities in knowledge bases. The major contributions of this thesis are summarized as follows.

We propose two latent feature topic models which integrate word representations trained on large external corpora into two Dirichlet multinomial topic models: a Latent Dirichlet Allocation model and a one-topic-per-document Dirichlet Multinomial Mixture model.

We introduce a triple-based embedding model named STransE to improve complex relationship prediction in knowledge bases. We also describe a new embedding approach, which combines the Latent Dirichlet Allocation model and the STransE model, to improve search personalization.

We present a neighborhood mixture model in which we formalize an entity representation as a mixture of its neighborhood in the knowledge base.

Extensive experiments show that our proposed models and approach obtain better performance than well-established baselines in a wide range of evaluation tasks.

Table of Contents

1. Introduction -- 2. Background -- 3. Improving topic models with word representations -- 4. STransE: a novel embedding model of entities and relationships -- 5. Neighborhood mixture model for knowledge base completion -- 6. Conclusion -- Bibliography.

Notes

Empirical thesis. Bibliography: pages 119-145

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

PhD, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2017

Principal Supervisor

Mark Johnson

Additional Supervisor 1

Mark Dras

Rights

Copyright Dat Quoc Nguyen 2017. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (xx, 145 pages)

Former Identifiers

mq:70217 http://hdl.handle.net/1959.14/1261406