Generating Actionable Knowledge from Big Data: Knowledge Extraction and Truth Discovery

Fang, Xiu

doi:10.25949/19435886.v1

thesis

posted on 2022-03-28, 16:14 authored by Xiu Fang

To transform modern society by harnessing the wisdom of Big Data, many knowledge bases (KBs) have been constructed to feed knowledge-driven applications with Resource Description Framework (RDF) triples. The main challenges in KB construction include extracting information from large-scale, possibly conflicting, and differently structured data sources (i.e., the knowledge extraction problem) and reconciling the conflicts that reside in those sources (i.e., the truth discovery problem). Considerable research effort has been devoted to each of these problems. However, existing KBs are still far from comprehensive and accurate.

In this dissertation, we first propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive KB, called GrandBase. We then address the research issues raised by GrandBase construction by developing a series of methodologies. Firstly, we study predicate extraction and implement ontology augmentation for knowledge base expansion. Secondly, we address truth discovery (on both single-valued and multi-valued objects or predicates) and the performance evaluation of truth discovery methods for knowledge base purification.

In particular, we first propose a framework for extracting new predicates from four types of data sources, namely Web texts, Document Object Model (DOM) trees, existing KBs, and query stream, to augment the ontology of the existing KB (i.e., Freebase). We use query stream and two major KBs, DBpedia and Freebase, to seed the predicate extraction from Web texts and DOM trees. Then, to estimate value veracity for multi-valued objects, we model the endorsement relations among sources by quantifying their two-sided inter-source agreements. Two aspects of source reliability are derived from the two graphs constructed by modeling the inter-source relations. To estimate source reliability more precisely for effective multi-valued truth discovery, our graph-based model incorporates four important implications: two types of source relations, object popularity, loose mutual exclusion, and the long-tail phenomenon on source coverage. After that, to fully leverage the advantages of existing truth discovery methods and achieve more robust truth discovery, we propose to extract truth from the prediction results of those methods. Our ensemble approach distinguishes between the single-valued and multi-valued truth discovery problems. Finally, because the ground truth may be very limited or even impossible to obtain, we attempt to evaluate truth discovery methods without using ground truth.

For each of the models and approaches presented in this dissertation, we have conducted extensive experiments using either real-world or synthetic datasets. Empirical studies show the effectiveness of our approaches. We also discuss future research directions regarding GrandBase construction and extension.

Table of Contents

1 Introduction -- 2 Background -- 3 Attribute Extraction for Knowledge Base Expansion -- 4 Multi-Valued Truth Discovery via Inter-Source Agreements -- 5 A Full-Fledged Graph-Based Model for Multi-Valued Truth Discovery -- 6 An Ensemble Approach For Better Truth Discovery -- 7 Performance Evaluation on Truth Discovery Methods -- 8 Conclusion -- References.

Notes

Theoretical thesis. Bibliography: pages 161-196

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

PhD, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Faculty of Science and Engineering

Year of Award

2018

Principal Supervisor

Michael Sheng

Additional Supervisor 1

Anne H.H. Ngu

Additional Supervisor 2

Jian Yang

Rights

Copyright Xiu Fang 2018. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

xx, 196 pages: illustrations

Former Identifiers

mq:70784 http://hdl.handle.net/1959.14/1267701

Keywords

Data mining; Computer science; knowledge extraction; big data; Document Object Model (Web site development technology); truth discovery; Expert systems (Computer science)

Licence

In Copyright
