Modeling topics and knowledge bases with vector representations

Nguyen, Dat Quoc

doi:10.25949/19427933.v1

thesis

posted on 2022-03-28, 02:01 authored by Dat Quoc Nguyen

Motivated by the recent success of utilizing latent feature vector representations (i.e. embeddings) in various natural language processing tasks, this thesis investigates how latent feature vector representations can help build better topic models and improve link prediction models for knowledge base completion. The first objective of this thesis is to incorporate latent feature word representations that contain external information from a large corpus in order to improve topic inference in a small corpus. The second objective is to develop new embedding models for predicting missing relationships between entities in knowledge bases.

In particular, the major contributions of this thesis are summarized as follows:

We propose two latent feature topic models which integrate word representations trained on large external corpora into two Dirichlet multinomial topic models: a Latent Dirichlet Allocation model and a one-topic-per-document Dirichlet Multinomial Mixture model.

We introduce a triple-based embedding model named STransE to improve complex relationship prediction in knowledge bases. In addition, we describe a new embedding approach, which combines the Latent Dirichlet Allocation model and the STransE model, to improve search personalization.

We present a neighborhood mixture model where we formalize an entity representation as a mixture of its neighborhood in the knowledge base.

Extensive experiments show that our proposed models and approach obtain better performance than well-established baselines in a wide range of evaluation tasks.

History

Notes

Empirical thesis. Bibliography: pages 119-145

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

PhD, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2017

Principal Supervisor

Mark Johnson

Additional Supervisor 1

Mark Dras

Rights

Copyright Dat Quoc Nguyen 2017. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (xx, 145 pages)

Former Identifiers

mq:70217 http://hdl.handle.net/1959.14/1261406

Keywords

knowledge base embedding; Computer simulation; topic models; Computational linguistics; Expert systems (Computer science); vector representations; Vector analysis

Licence

In Copyright
