Macquarie University

A graph-based solution for data discovery in large organisations

thesis
posted on 2022-10-24, 23:32 authored by Hamidreza Ferdowsi

Being a data-driven organisation starts with finding the right datasets and then understanding and analysing the data. With a myriad of data sources spread across heterogeneous databases, modern organisations struggle to retrieve all the relevant data. In large organisations, data is stored in multiple databases, from cloud environments to legacy warehouses and mainframe applications. Hence, discovering data has become a time-consuming, iterative process for data analysts. Data discovery is a business intelligence term for the process of collecting data from various databases and consolidating it into a single source. Traditional data discovery solutions, such as dimensional search engines (e.g., OLAP), focused only on individual data units and not on the relationships among datasets. This is becoming problematic as the industry shows more interest in the relationships among datasets. Therefore, graph models are employed as an alternative method to manage and store the relationships among individual entities. Graphs have also shown their advantages in integrating data sources into a unified semantic graph under the virtual knowledge graph (VKG) paradigm. However, more research is needed to make these solutions practical and simple to use. This thesis demonstrates advantages of graphs that can remedy some of the weaknesses of existing technologies in the data discovery process. The proposed solution orchestrates a set of concepts and technologies to introduce a technology-agnostic layer, called Graph Gateway, that hides the complexity of the data environment from end-users. We replace the query language to help analysts formulate queries simply and flexibly. The newly proposed query language enables users to query several databases in a single query. Our model integrates data catalogues and data dictionaries in the graph layer to help users formulate and complete their queries.

Table of Contents

1. Introduction -- 2. Background & literature review -- 3. Methodology -- 4. Experiment and evaluation -- 5. Conclusion and future work -- References

Notes

A thesis submitted to Macquarie University for the degree of Master of Research

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

Thesis (MRes), Macquarie University, Faculty of Science and Engineering, 2021

Department, Centre or School

Department of Computing

Year of Award

2021

Principal Supervisor

Michael Sheng

Additional Supervisor 1

Alireza Jolfaei

Rights

Copyright: Hamidreza Ferdowsi. Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

78 pages
