Macquarie University

Knowledge Graph Representation Learning based on low-dimensional embedding space

thesis
posted on 2024-08-01, 03:11 authored by Kai Wang

Knowledge Graph Representation Learning has drawn great attention in the Artificial Intelligence (AI) and Knowledge Graph (KG) communities. It aims to represent the entities and relations of a KG as low-dimensional real-valued embedding vectors via a Knowledge Graph Embedding (KGE) model. By operating on these entity and relation vectors, link prediction with KGE models can infer the missing element of a KG triple, showing significant potential for automatic KG completion and KG reasoning. Effective knowledge graph representation learning acts as a data channel between discrete knowledge graphs and deep neural networks, greatly enhancing the value of KGs in a wide range of AI tasks and laying a foundation for further progress in cognitive intelligence and even artificial general intelligence.
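As a concrete illustration of link prediction with embedding vectors, the sketch below uses the classic TransE scoring function, which models a relation as a translation (h + r ≈ t). The toy entities and the relation name are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import numpy as np

# Toy vocabulary: 4 entities and 1 relation, embedded in a 2-dimensional space.
rng = np.random.default_rng(0)
entities = {name: rng.normal(size=2) for name in ["Paris", "France", "Berlin", "Germany"]}

# Construct the relation so that h + r ≈ t holds for the true triples,
# mimicking what TransE training would converge to on this toy data.
relations = {"capital_of": entities["France"] - entities["Paris"]}
entities["Germany"] = entities["Berlin"] + relations["capital_of"]

def transe_score(h, r, t):
    """TransE plausibility: negative Euclidean distance between h + r and t."""
    return -np.linalg.norm(h + r - t)

def predict_tail(head, relation):
    """Link prediction for (head, relation, ?): rank all entities as candidate tails."""
    h, r = entities[head], relations[relation]
    return sorted(entities, key=lambda e: transe_score(h, r, entities[e]), reverse=True)
```

Predicting the tail of ("Paris", "capital_of", ?) then amounts to ranking every entity by its score, with the correct tail expected at the top.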

To improve the prediction accuracy of KGE models, recent research has tended to propose large, high-dimensional models. These models typically represent each entity with a vector of hundreds or even thousands of dimensions, yet achieve only marginal accuracy gains on benchmark datasets. Moreover, such KGE models incur substantial training and storage costs on large-scale KGs with millions or billions of entities. This prevents downstream AI applications from promptly updating KG embeddings or being deployed on resource-limited edge devices, and it hinders further research progress in knowledge graph representation learning.

To this end, this thesis focuses on knowledge graph representation learning under the constraint of a low-dimensional vector space, aiming to avoid the parameter explosion of previous work and to meet the efficiency requirements of practical applications. The thesis analyzes the factors that influence model performance and optimizes the key technical components of KGE models from three perspectives, proposing several effective methods for lightweight, high-accuracy, and low-cost knowledge graph representation learning.

The research in this thesis comprises the following three parts:

• KGE enhancement based on multi-source information integration. First, to address the unbalanced distribution of external information across knowledge graph entities, a KGE enhancement framework based on composite neighbors is proposed. It integrates entity features from textual descriptions and topological neighbors to enrich the entity embeddings of the KGE model. Second, to address the limited representation capacity of low-dimensional vector spaces and the high training cost of high-dimensional teacher models, a novel multi-teacher active distillation framework is proposed, which integrates the predictions of multiple pre-trained low-dimensional models to provide effective supervision for the student model.

• Efficient KGE models based on low-dimensional Euclidean space. To counter the high computational complexity of existing hyperbolic geometric models, lightweight scoring functions based on Euclidean space are proposed, which improve representation ability while keeping computational complexity low. Further addressing the insufficient prediction accuracy of existing models and the difficulty of evaluating triple confidence, this thesis proposes a confidence measurement method for knowledge graph embedding models based on causal intervention theory, which effectively improves the prediction accuracy of high-confidence triples.

• KGE training strategies based on insights from contrastive learning. This thesis analyzes in depth the relationship between knowledge representation learning and self-supervised contrastive learning. Drawing on the latest analytical results from the contrastive learning field, a new KGE training strategy is proposed. To overcome the drawbacks of existing negative sampling loss functions, a loss function based on query sampling is designed, which efficiently achieves feature alignment of positive samples and uniformity of the entity distribution. Furthermore, a lightweight difficulty-aware activation mechanism is proposed to accelerate the training convergence of the knowledge graph embedding model.
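The multi-teacher idea in the first part can be illustrated with a generic distillation sketch: each teacher's scores over candidate entities are softened with a temperature softmax and averaged into a supervision target, which the student matches via KL divergence. This is a minimal sketch of standard multi-teacher distillation, not the thesis's active distillation framework; all scores below are hypothetical.

```python
import numpy as np

def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax with a max-shift for numerical stability."""
    z = scores / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_target(teacher_scores, temperature=2.0):
    """Average the softened candidate distributions of several teachers.

    teacher_scores: list of arrays, each of shape (num_candidates,), holding
    one teacher's plausibility scores for every candidate entity.
    """
    return np.mean([softmax(s, temperature) for s in teacher_scores], axis=0)

def kl_divergence(p, q, eps=1e-12):
    """Distillation loss: KL(target || student), zero iff the two match."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Two hypothetical low-dimensional teachers scoring 5 candidate tail entities.
t1 = np.array([3.0, 1.0, 0.5, -1.0, 0.0])
t2 = np.array([2.5, 1.5, 0.0, -0.5, 0.2])
target = multi_teacher_target([t1, t2])

# A (hypothetical) student distribution, softened at the same temperature.
student = softmax(np.array([1.0, 0.8, 0.2, -0.3, 0.1]), temperature=2.0)
loss = kl_divergence(target, student)
```

Minimizing this loss pushes the student's low-dimensional predictions toward the teachers' consensus without requiring any high-dimensional teacher.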
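For the second part, one common lightweight Euclidean design from the low-dimensional KGE literature (a RotE-style function) rotates the head entity in 2-D coordinate planes, translates it, and measures the negative Euclidean distance to the tail. This is an illustrative sketch of that family of scoring functions under assumed parameters, not necessarily the exact functions proposed in the thesis.

```python
import numpy as np

def rot2(vec, angles):
    """Apply an independent 2-D rotation to each consecutive coordinate pair."""
    x = vec.reshape(-1, 2)
    cos, sin = np.cos(angles), np.sin(angles)
    rotated = np.stack([cos * x[:, 0] - sin * x[:, 1],
                        sin * x[:, 0] + cos * x[:, 1]], axis=1)
    return rotated.reshape(-1)

def rote_score(h, t, angles, translation):
    """RotE-style score: rotate the head per relation, translate, then take the
    negative Euclidean distance to the tail (higher = more plausible)."""
    return -np.linalg.norm(rot2(h, angles) + translation - t)

# Tiny 4-dimensional example: each relation owns 2 rotation angles + a translation.
rng = np.random.default_rng(1)
h = rng.normal(size=4)
angles = rng.uniform(0.0, 2.0 * np.pi, size=2)
translation = rng.normal(size=4)
t_true = rot2(h, angles) + translation  # the tail this relation maps h to
```

The per-relation parameters grow linearly in the dimension, which is what keeps such Euclidean scoring functions cheap compared with hyperbolic alternatives.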
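For the third part, the alignment and uniformity properties mentioned above can be computed directly, following the standard contrastive-learning formulation: alignment is the mean squared distance between positive pairs, and uniformity is a log Gaussian-kernel measure of how evenly embeddings spread over the unit sphere. This is a sketch of those two generic metrics on toy data; the thesis's query-sampling loss itself is not reproduced here.

```python
import numpy as np

def normalize(x):
    """Project embeddings onto the unit sphere (row-wise L2 normalization)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def alignment(pos_a, pos_b):
    """Alignment: mean squared distance between positive pairs (lower is better)."""
    return float(np.mean(np.sum((pos_a - pos_b) ** 2, axis=-1)))

def uniformity(x, t=2.0):
    """Uniformity: log mean Gaussian-kernel similarity over all distinct pairs
    (more negative = embeddings spread more evenly on the sphere)."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(x), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dists[iu]))))

# Hypothetical toy data: 8 normalized "query" embeddings (e.g. h + r) and
# slightly perturbed positives standing in for their matching tail entities.
rng = np.random.default_rng(0)
queries = normalize(rng.normal(size=(8, 4)))
positives = normalize(queries + 0.1 * rng.normal(size=(8, 4)))

loss = alignment(queries, positives) + uniformity(np.concatenate([queries, positives]))
```

A training strategy built on these insights trades the usual negative-sampling term for an explicit pull toward low alignment and low (more negative) uniformity.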

Funding

http://purl.org/au-research/grants/arc/FT140101247

http://purl.org/au-research/grants/arc/DP200102298

History

Table of Contents

1 Introduction -- 2 Literature review -- 3 KGE enhancement based on multi-source information integration -- 4 Efficient KGE model based on low-dimensional Euclidean space -- 5 KGE training strategy based on contrastive learning -- 6 Conclusion -- List of symbols -- References

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

Doctor of Philosophy

Department, Centre or School

School of Computing

Year of Award

2022

Principal Supervisor

Michael Sheng

Additional Supervisor 1

Yu Liu

Rights

Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

200 pages
