Enabling scalable building analytics through metadata inference methods
Digitalisation of the built environment provides multiple benefits: it improves operational and energy productivity and supports the participation of buildings in the management of electricity networks. Automated methods to infer contextual information from Building Management System and Internet of Things (IoT) sensor metadata play a significant role in this process. Standards-based contextual information reduces the cost of deploying Smart Building Applications, which become portable because they are developed against a standard Application Programming Interface. In this thesis, we investigate how to scale the deployment of building analytics to a large portfolio of commercial buildings. This is challenging because Building Management System metadata is not standardised. The required contextual information must be inferred from unstructured text fields that are highly technical, informal, repetitive and heavily abbreviated, and the same concept is often expressed in many different ways.
The first contribution of this thesis is a comprehensive review of the metadata mapping area. We first investigate Building Automation Systems and provide background on how they are designed and operated. We then review the metadata schemas proposed for semantic interoperability. Finally, we review the latest research on mapping metadata from the representation obtained from the Building Automation System into a form that provides the semantic interoperability required by Smart Building applications.
The second significant contribution of this work is to investigate the use of a global model, that is, a single model trained on data from many buildings. We show that a transformer is capable of learning a generalised representation that can be used as the basis of a global model. We identify challenges in applying the transformer architecture to Building Automation System metadata and propose a tokenisation approach and a methodology to score the goodness of the representation. We demonstrate the ability of the model to scale using a classifier trained on data from 152 buildings.
The third contribution of this work is to develop a framework for applying weakly supervised machine learning to Building Automation System metadata mapping. We identify a lack of publicly available datasets that researchers can use to develop and evaluate models, and a lack of frameworks with which to create such datasets. We compare weak supervision to active learning by performing a case study using experts from a building analytics company. We show that weak supervision has the potential to scale better than active learning and is easier to implement in practice.
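To illustrate the general idea of weak supervision, the following is a minimal sketch rather than the framework developed in this thesis. The point names, class labels and labelling rules are all hypothetical, and votes are combined by a simple majority, whereas practical weak supervision systems often fit a statistical label model instead.

```python
import re
from collections import Counter

ABSTAIN = None  # a labelling function returns this when it has no opinion


def lf_temperature(name):
    # Hypothetical rule: "TMP" or "TEMP" suggests a temperature sensor.
    return "temperature_sensor" if re.search(r"te?mp", name, re.I) else ABSTAIN


def lf_setpoint(name):
    # Hypothetical rule: a trailing "SP" or "STPT" suggests a setpoint.
    return "temperature_setpoint" if re.search(r"(sp|stpt)$", name, re.I) else ABSTAIN


def lf_fan_status(name):
    # Hypothetical rule: "FAN" followed by "STS" or "STATUS" suggests a fan status.
    return "fan_status" if re.search(r"fan.*(sts|status)", name, re.I) else ABSTAIN


LABELLING_FUNCTIONS = [lf_temperature, lf_setpoint, lf_fan_status]


def weak_label(name, lfs=LABELLING_FUNCTIONS):
    """Combine labelling-function votes by majority; abstain when empty or tied."""
    votes = Counter(v for v in (lf(name) for lf in lfs) if v is not ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]


if __name__ == "__main__":
    for point in ["AHU1.ZN_TMP", "AHU1.SF_FAN_STS", "AHU1.ZN_TMP_SP", "XYZ"]:
        print(point, "->", weak_label(point))
```

In this sketch the expert writes rules once and they label every matching point in every building, whereas active learning asks the expert to label points one at a time. The tied case (`AHU1.ZN_TMP_SP`) shows why conflicting rules need a more careful combination step than a majority vote.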
Our final contribution is a hybrid model that leverages language modelling and programmatic rules. We observe many different forms of technical abbreviation for the same short text descriptions. This implies that the vocabulary of unabbreviated words is considerably smaller than the vocabulary of all abbreviated and unabbreviated forms. Deep learning models also require large amounts of labelled data, and it is expensive and time-consuming to collect and annotate such datasets. Instead, we investigate a form of model based on finite-state transducers. We show that our hybrid method achieves state-of-the-art accuracy using significantly less data than a purely machine-learning-based approach.
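The vocabulary observation can be illustrated with a small sketch. This is not the finite-state transducer model of the thesis: a plain lookup table stands in for the transducer, and the point names and the abbreviation table are hypothetical.

```python
import re

# Hypothetical abbreviation table; the entries are illustrative only.
EXPANSIONS = {
    "zn": "zone",
    "tmp": "temperature",
    "temp": "temperature",
    "sp": "setpoint",
    "stpt": "setpoint",
}


def tokenise(name):
    """Lower-case a point name and split it on any non-letter character."""
    return [t for t in re.split(r"[^a-z]+", name.lower()) if t]


def normalise(name):
    """Rewrite each known abbreviation to its unabbreviated form."""
    return [EXPANSIONS.get(t, t) for t in tokenise(name)]


# Three hypothetical spellings of the same concept.
names = ["ZN_TMP", "Zone-Temp", "zn.temperature"]
raw_vocab = {t for n in names for t in tokenise(n)}
norm_vocab = {t for n in names for t in normalise(n)}

if __name__ == "__main__":
    print(len(raw_vocab), sorted(raw_vocab))
    print(len(norm_vocab), sorted(norm_vocab))
```

Here three spellings produce five distinct raw tokens but only two after normalisation, which is the sense in which the unabbreviated vocabulary is the smaller one.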