Differential privacy for metric spaces: information-theoretic models for privacy and utility with new applications to metric domains
The problem of data privacy – protecting sensitive or personal data from discovery – has been a long-standing research issue. In this regard, differential privacy, introduced in 2006, is considered the gold standard. Differential privacy was designed to protect the privacy of individuals in statistical datasets, such as census datasets. Its widespread popularity has led to interest in applying it to new domains for which it was not originally designed, such as text documents. This raises questions about the interpretability of differential privacy’s guarantees, which are usually expressed in the language of statistical disclosure control. It also sharpens the need to answer core questions currently debated in the differential privacy community: how does the application of differential privacy protect against inference attacks? How can the use of noise-adding mechanisms guarantee the release of useful information? And how can this privacy-utility balance be achieved?

The goal of this thesis is to address these foundational questions. Firstly, we approach the problem of interpretability by exploring a generalisation of differential privacy to metric domains known as metric differential privacy, or d-privacy. Metric differential privacy abstracts away from the particulars of statistical databases and permits reasoning about privacy on more general domains endowed with a metric. This allows differential privacy’s guarantees to be understood in more general terms, applicable to arbitrary domains of interest, including text documents. Secondly, we propose to study the key questions surrounding privacy and utility in differential privacy using the Quantitative Information Flow (QIF) framework, an information-theoretic framework previously used to analyse threats to secure systems.
In this thesis, we repurpose QIF to analyse the privacy and utility guarantees provided by differentially private systems modelled as probabilistic channels. Using information flow analysis, we examine the privacy characteristics of d-private mechanisms, finding new ways to compare them with respect to the protection they afford against arbitrary adversarial threats; we examine the utility characteristics of d-private mechanisms, discovering a new characterisation of optimal mechanisms and a proof of the universal optimality of the Laplace mechanism; and we re-examine the well-known privacy-utility trade-off for d-private mechanisms, finding new models for describing the relationship between privacy and utility via correlations.

The second part of this thesis is dedicated to demonstrating the practical applicability of d-privacy to novel and complex domains. We present three new sample applications of d-privacy: to text document privacy, statistical utility and private nearest neighbour search. In each of these applications, we show how the use of d-privacy, and an understanding of the metrics on the domain, permit reasoning about privacy and utility. This opens up new methods for exploring privacy in these domains, and provides guidelines for further applications of differential privacy to new domains.
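As a minimal illustration of the d-privacy perspective discussed above (a sketch, assuming the Euclidean metric d on the reals and a privacy parameter epsilon; the function name and parameters are chosen here for exposition): adding Laplace noise with scale 1/epsilon to a real-valued input yields an epsilon·d-private mechanism, since the output densities at any two inputs x and x′ differ by at most a factor of exp(epsilon · d(x, x′)).

```python
import math
import random

def laplace_mechanism(x: float, epsilon: float) -> float:
    """Release x perturbed with Laplace noise of scale 1/epsilon.

    Illustrative sketch: under the Euclidean metric d on the reals,
    this satisfies epsilon*d-privacy, i.e. for any inputs x, x' the
    ratio of output densities is bounded by exp(epsilon * d(x, x')).
    """
    # Inverse-transform sampling for Laplace(0, 1/epsilon):
    # if U is uniform on (-1/2, 1/2), then -b * sgn(U) * ln(1 - 2|U|)
    # is Laplace-distributed with scale b.
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return x + noise
```

Because the noise scale depends only on epsilon and the metric, not on the particular input, the same sketch transfers to other metric domains by replacing the noise distribution with one calibrated to the domain's metric.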