Quantifying privacy threats in aggregated energy data analytics
The ever-growing technological landscape, information sharing among multiple stakeholders in business ecosystems and the commercialisation of user data together pose an increasing threat to the privacy of individual users. A massive amount of user-specific, fine-grained data is being collected from applications such as web browsers, social networks, location-enabled applications and cyber-physical systems (e.g., smart grids). This data can be put to beneficial use in enhancing user experience, sustainable urban planning, renewable energy planning, system monitoring and management, and data analytics. Further, this crowdsourced data is being shared with multiple stakeholders for business analytics and societal benefits. However, preserving the privacy of individual users has become increasingly difficult with the rapid growth of data analysis techniques and the use of artificial intelligence and machine learning algorithms.
In this thesis, we attempt to quantify privacy threats related to aggregated smart meter data analytics. Smart meters, an integral component of smart grid infrastructure, are now widely deployed by electricity providers and retailers to monitor the fine-grained energy consumption of households in real time. Smart meter data is collected and shared with the different stakeholders involved in a smart grid ecosystem. The fine-grained energy data is extremely useful for grid operations and maintenance, monitoring and market segmentation. However, sharing and releasing fine-grained energy data can directly expose private information about consumers. Service providers therefore share and release aggregated statistics, with the aim of reducing the risk that individual consumption traces are revealed and so preserving the privacy of consumers. However, our investigation shows that the fine-grained energy consumption traces of individual users can be inferred from aggregated statistics by adversaries who, with access to different levels of background resources, attempt to reconstruct individual consumption patterns from aggregated data.
First, we assess the capability of an adversary to reconstruct the fine-grained energy consumption traces of individual consumers by exploiting two properties of individual load patterns: consistency (a household's consumption pattern is similar over time) and distinctiveness (one household's consumption pattern is significantly different from that of others). We propose an unsupervised attack framework that recovers the hourly energy consumption time series of individual users without any prior knowledge. We pose the task of assigning aggregated energy consumption meter readings to individuals as an assignment problem and solve it using the Hungarian algorithm. Our findings highlight that individual consumption traces can be recovered from aggregated statistics with very high accuracy.
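The assignment step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the adversary holds an expected reading per household for the current hour (for instance, the value it assigned to that household in the same hour of the previous day) and receives the current hour's readings without identities. The cost function (absolute difference), the function names and the sample values are hypothetical and are not taken from the thesis; the Hungarian routine is a standard textbook O(n^3) formulation.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square (or wide) cost matrix.

    Returns a list where entry i is the column assigned to row i.
    Standard potentials-based Hungarian algorithm, 1-indexed internally.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j] = row currently matched to column j
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_readings(expected, readings):
    """Assign anonymous readings to households (hypothetical cost: |expected - reading|)."""
    cost = [[abs(e - r) for r in readings] for e in expected]
    return hungarian(cost)


# Illustrative values in kWh: three households, three unlabelled readings.
expected = [0.4, 2.1, 1.2]
readings = [1.3, 0.5, 2.0]
assignment = match_readings(expected, readings)
print(assignment)  # index of the reading assigned to each household
```

Repeating this matching hour by hour, with each household's expected value updated from its previously assigned readings, is one way the consistency and distinctiveness properties could be exploited; the actual framework may differ in its cost function and update rule.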
Second, we investigate the extent to which statistical predictive models leak information about their training data. More specifically, taking household electrical energy consumption as the use case, we evaluate whether white-box access to auto-regressive (AR) models trained on such data, combined with background information such as household energy data aggregates (e.g., monthly billing information) and publicly available weather data, enables inference of the fine-grained energy data of a particular household. We construct three adversarial models that aim to infer fine-grained energy consumption patterns, all of which use the monthly billing information of target households. The first adversary uses only monthly aggregates to estimate daily consumption and serves as a baseline. The second adversary additionally has access to the parameters of the statistical models trained on the target households, whereas the third adversary has access to the statistical model for a cluster of households containing the target household. We demonstrate that the second and third adversaries can apply maximum a posteriori (MAP) estimation to reconstruct the daily consumption of target households with significantly lower error than the baseline. Such fine-grained data can expose private information, such as occupancy levels.
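The difference between the baseline and a model-informed adversary can be sketched as follows. This is a deliberately simplified illustration, not the estimator used in the thesis: it assumes each day's consumption has an independent Gaussian prior whose mean and variance come from a hypothetical model forecast, and it treats the monthly bill as an exact constraint on the sum. Under those assumptions the MAP estimate has the closed form x_t = mu_t + v_t * (B - sum(mu)) / sum(v). All names and numbers are hypothetical.

```python
def uniform_baseline(monthly_bill, n_days):
    """Baseline adversary: spread the bill evenly over the days."""
    return [monthly_bill / n_days] * n_days


def map_daily_estimate(monthly_bill, prior_means, prior_vars):
    """MAP estimate under independent Gaussian priors and a sum constraint.

    Maximising the Gaussian log-prior subject to sum(x) == monthly_bill
    distributes the gap between the bill and the forecast total across
    days in proportion to each day's prior variance.
    """
    residual = monthly_bill - sum(prior_means)
    total_var = sum(prior_vars)
    return [m + v * residual / total_var for m, v in zip(prior_means, prior_vars)]


# Four-day "month" for brevity (kWh); forecasts are assumed, not real data.
bill = 52.0
forecast_means = [10.0, 12.0, 8.0, 14.0]
forecast_vars = [1.0, 1.0, 4.0, 2.0]

baseline = uniform_baseline(bill, len(forecast_means))
estimate = map_daily_estimate(bill, forecast_means, forecast_vars)
print(baseline)
print(estimate)
```

The baseline yields a flat profile, while the informed estimate keeps the day-to-day shape suggested by the model and still sums to the bill. A full AR model would introduce correlations between consecutive days, which this sketch ignores.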
Finally, we use differential privacy to mitigate the privacy threats posed by adversaries that disaggregate energy data. Our evaluations show that differentially private model parameters offer strong privacy protection against an adversary while retaining moderate utility, measured in terms of model fit. Furthermore, we apply local differential privacy (LDP) to perturb the aggregated monthly bills before they are released or shared with third-party agencies. We then evaluate how well an adversary with access to the perturbed model parameters and the perturbed monthly aggregates can infer fine-grained energy consumption traces. Lastly, we quantify the extent to which the utility of the aggregated energy consumption data (monthly energy bills) is affected, using the mean relative error (MRE) metric.
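The trade-off between perturbation and utility can be illustrated with a small sketch. The text above does not state which LDP mechanism is used; the example assumes the Laplace mechanism applied by each household to its own bill, with bills clipped to a hypothetical range [0, max_bill] so that the sensitivity equals max_bill. The MRE is computed here as the mean of |released - true| / |true|, which is one common definition and may differ from the thesis's exact formula. Function names, the bill range and the epsilon values are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_bill(bill, epsilon, max_bill, rng):
    """Locally perturb one bill; clipping to [0, max_bill] bounds the sensitivity."""
    clipped = min(max(bill, 0.0), max_bill)
    return clipped + laplace_noise(max_bill / epsilon, rng)


def mean_relative_error(true_values, released_values):
    """Mean of |released - true| / |true| over all households."""
    errors = [abs(r - t) / abs(t) for t, r in zip(true_values, released_values)]
    return sum(errors) / len(errors)


# Synthetic monthly bills in kWh (illustrative only, not real data).
rng = random.Random(0)
bills = [rng.uniform(200.0, 600.0) for _ in range(2000)]
for eps in (0.1, 1.0, 10.0):
    noisy = [perturb_bill(b, eps, 1000.0, rng) for b in bills]
    print(eps, round(mean_relative_error(bills, noisy), 3))
```

Smaller values of epsilon give stronger privacy but add more noise, so the MRE of the released bills grows as epsilon shrinks.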