Incremental graph pattern based node matching

Sun, Guohao

doi:10.25949/19431395.v1

01whole.pdf (1.61 MB)

Incremental graph pattern based node matching

thesis

posted on 2022-03-28, 10:21 authored by Guohao Sun

Graph Pattern based Subgraph Matching (GPSM) is used to identify all the matching subgraphs of a pattern graph GP in a data graph GD. The existing GPSM solutions are based on subgraph isomorphism or Bounded Graph Simulation (BGS), which aims to find the entire matching subgraphs in GD. However, in some real applications, such as group finding and expert recommendation, people are more interested in identifying some nodes based on a specified structure between them, leading to the Graph Pattern based Node Matching (GPNM) problem. GPNM aims to find all the matches of the nodes in a data graph GD based on a given pattern graph GP. Firstly, in real scenarios, both GP and GD are updated frequently. However, the existing GPNM methods must perform a new GPNM procedure from scratch to deliver the node matching results based on the updated GP and updated GD, which consumes significant time. Therefore, there is a pressing need for a method to efficiently deliver the node matching results on the updated graphs. To address this problem, in this thesis, we propose a novel incremental GPNM method called INC-GPNM, where we first build an index to incrementally record the shortest path length range between different label types in GD, and then identify the affected parts of GD in GPNM including nodes and edges with respect to the updates of GP and GD. Based on the index structure and our novel search strategies, INC-GPNM can efficiently deliver node matching results taking the updates of GP and GD as inputs, and can greatly reduce the query processing time with improved time complexity. Secondly, in some real applications (e.g., social graph searches on Facebook), many typical pattern graphs frequently and repeatedly appear in users' queries in a short period of time. In this situation, it is still time-consuming to apply the incremental GPNM procedure for each of the updates in data graph. To address this problem, in this thesis, we first analyze the updates in the data graph and find that not all the updates in the data graph essentially affect the GPNM results. Then, we propose the notion of elimination relationships and analyze the elimination relationships between multiple updates in the data graph. In addition, if one update Ua can eliminate the other update Ub, and Ub can eliminate update Uc, there exists the hierarchical structure between these elimination relationships. We further generate an Elimination Hierarchy Tree (EH-Tree) to index the elimination relationships. Based on the EH-Tree, we propose a GPNM method called EH-GPNM, that considers the elimination relationships between multiple updates in the data graph. EH-GPNM can efficiently deliver node matching results when facing frequent and repeated pattern graphs with multiple updates in the data graph. Thirdly, inspired by EH-GPNM, we noted that the elimination relationships not only exist among the updates in the data graph, but are also present among the updates in the pattern graph and even in the cross updates from GP and GD. To further improve the GPNM efficiency when both GP and GD are updated frequently, in this thesis, we propose a more efficient GPNM method, called UA-GPNM. UA-GPNM first detects the elimination relationships between multiple independent updates in GP and GD, and also the cross elimination relationships between the updates from GP and GD, then UA-GPNM generates an EH-Tree to index all the elimination relationships. In addition, we also propose a graph partition strategy in UA-GPNM to accelerate the GPNM procedure. The experiments show that UA-GPNM can achieve better efficiency compared with INC-GPNM and EH-GPNM when facing the updates of GP and GD. All the methods proposed above in this thesis have been validated and evaluated through sufficient experiments and theoretical analysis. The results have demonstrated that the proposed methods significantly outperform the existing work of GPNM -- abstract.

History

Chapter 1. Introduction -- Chapter 2. Literature review -- Chapter 3. Incremental graph pattern based node matching with multiple updates -- Chapter 4. Incremental graph pattern based node matching considering the elimination relationships among updates in data graphs -- Chapter 5. Incremental graph pattern based node matching considering the elimination relationships among updates in both pattern graphs and data graphs -- Chapter 6. Conclusions and future work -- Appendices

Notes

Theoretical thesis. Bibliography: pages 147-162

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

PhD, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2019

Principal Supervisor

Yan Wang

Additional Supervisor 1

Mehmet A. Orgun

Additional Supervisor 2

Guanfeng Liu

Rights

Copyright Guohao Sun 2019 Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (vii, 162 pages)

Former Identifiers

mq:72056 http://hdl.handle.net/1959.14/1280941

Usage metrics

Keywords

Graph databases Information retrieval graph matching elimination relationship graph partition Information storage and retrieval systems -- Databases Information storage and retrieval systems

Licence

In Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Incremental graph pattern based node matching

History

Table of Contents

Notes

Awarding Institution

Degree Type

Degree

Department, Centre or School

Year of Award

Principal Supervisor

Additional Supervisor 1

Additional Supervisor 2

Rights

Language

Extent

Former Identifiers

Usage metrics

Categories

Keywords

Licence

Exports