Posted on 2022-03-28, 14:16; authored by Natasha Fernandes
The problem of obfuscating the authorship of a text document has received little attention in the literature to date. Current approaches are ad hoc and rely on assumptions about an adversary's auxiliary knowledge, which makes it difficult to reason about the privacy properties of these methods. Another approach to privacy, known as differential privacy, is advocated in the literature for its strong privacy guarantees. However, differential privacy has been dismissed as an option for text document privacy because it is designed around the release of aggregate statistics and depends on a notion of 'adjacency', neither of which applies to text document privacy. In addition, differential privacy does not permit the release of individual data points, as required for text document publishing. A newer approach, known as generalised differential privacy, extends differential privacy to arbitrary datasets with no notion of adjacency, and permits the private release of individual data points. In this thesis, we show how to apply generalised differential privacy to author obfuscation, drawing inspiration from the example of geo-location privacy and utilising existing tools and methods from the stylometry and natural language processing literature.