Optimisation techniques for storing and querying XML data in relational database systems
Thesis posted on 2022-03-28, 14:36, authored by Mo'ad Maghaydah
Extensible Markup Language (XML) has recently emerged as a standard for electronic information interchange, owing to its flexibility and portability. For instance, in service-oriented applications, XML messages are commonly used for inter-company interactions. However, XML messages used for transactions and data interchange (for example, purchase orders and invoice statements) may also serve other purposes, and they must often be retained for later use and analysis. This requires scalable technology to store and query XML data effectively. Owing to their widespread availability and robustness, relational database management systems (RDBMS) still offer the most affordable technology for developing XML database systems. However, the XML data model presents new challenges, such as maintaining document order and supporting complex structural-join queries, which require tree-aware processing mechanisms. State-of-the-art approaches to supporting XML data in relational systems rely on new algorithms and indexing techniques that make them powerful, but it has been observed that some of those changes may not be directly applicable to relational database systems, or may present a trade-off between performance and storage usage. Further, modifying the relational system's kernel is hardly an option for many RDBMS vendors. There are still considerable benefits in developing solutions that do not involve changes to the RDBMS kernel, thereby reducing the cost of re-engineering relational database systems. To improve the storing and querying of XML data in relational systems, in this thesis we propose a new compact Dewey-based labelling scheme. The new label structure, composed of two components (parent, child) in the Dewey format, can significantly improve the performance of XML queries that are based on parent-child and sibling relationships.
Moreover, we propose advanced query optimisation techniques that exploit certain features of Dewey labels and make better use of the document schema summary of XML documents. Our techniques are portable and can be applied to any Dewey-based labelling technique proposed for storing and querying XML data. Through extensive experimental studies, we show that these techniques make off-the-shelf relational systems more tree-aware and significantly improve their capabilities to support XML data.
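The abstract does not spell out how the two-component label is encoded, so the following is only a minimal sketch of the general idea, not the thesis's actual scheme. It assumes each node stores its parent's Dewey path as a string plus its own ordinal among its siblings. The names `Label`, `dewey`, `is_parent`, `are_siblings`, `is_ancestor` and `doc_order_key` are hypothetical and chosen for illustration.

```python
from typing import NamedTuple, Tuple


class Label(NamedTuple):
    """Hypothetical two-component label: (parent Dewey path, child ordinal)."""
    parent: str  # Dewey path of the parent node; "" for the document root
    child: int   # 1-based position of this node among its siblings


def dewey(label: Label) -> str:
    """Full Dewey path of the node, e.g. Label("1.2", 1) -> "1.2.1"."""
    return f"{label.parent}.{label.child}" if label.parent else str(label.child)


def is_parent(p: Label, c: Label) -> bool:
    """Parent-child test: a single equality on the stored parent component."""
    return c.parent == dewey(p)


def are_siblings(a: Label, b: Label) -> bool:
    """Sibling test: same parent component, different ordinal."""
    return a.parent == b.parent and a.child != b.child


def is_ancestor(a: Label, d: Label) -> bool:
    """Ancestor test: a's full path equals d's parent path or prefixes it."""
    prefix = dewey(a)
    return d.parent == prefix or d.parent.startswith(prefix + ".")


def doc_order_key(label: Label) -> Tuple[int, ...]:
    """Sort key that restores document order (numeric, not lexicographic)."""
    return tuple(int(part) for part in dewey(label).split("."))


# Example tree (assumed for illustration): root / book / (title, author)
root = Label("", 1)
book = Label("1", 2)
title = Label("1.2", 1)
author = Label("1.2", 2)
```

Under this assumed encoding, the parent-child and sibling checks reduce to equality comparisons on the `parent` component, which a relational engine could evaluate with an ordinary equi-join, while ancestor-descendant checks still need a prefix comparison.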