Macquarie University

Data reconstruction attack in distributed learning

thesis
posted on 2025-07-23, 05:21 authored by Weijun Li
As model parameter counts continue to grow, modern machine learning models demand ever larger amounts of data, raising significant concerns over data security. Distributed learning methods, such as Federated Learning, have been introduced to mitigate these concerns by keeping data on each user’s device. However, recent studies have shown that these methods are vulnerable to data reconstruction attacks through gradient inversion: private training data can be reconstructed from the shared gradients. In this work, we extend research on attack and defense strategies for such data reconstruction attacks in the context of training language models.

On the attack side, unlike previous efforts that typically use gradients from all model parameters, we hypothesize that most involved modules, or even their sub-modules, are at risk of training data leakage. We validate these vulnerabilities across various intermediate layers of language models. Our experiments reveal that gradients from a single Transformer layer, or even a single linear component with just 0.54% of the parameters, are susceptible to training data leakage; in the worst case, they leak information of the same magnitude as the gradients from all layers. Additionally, we observe that a weighted combination of key modules enhances the attack.

On the defense side, we examine the effect of Differentially Private Stochastic Gradient Descent (DP-SGD) against this new data leakage threat and find it challenging to balance privacy protection with downstream utility. As an alternative, we propose and validate a selective noise-adding paradigm, which achieves comparable defense performance while offering more flexibility by targeting specific components of SVD-decomposed gradients. This method demonstrates an improved balance between privacy and utility, potentially inspiring future research aimed at improving differential privacy practices.

History

Table of Contents

1. Introduction -- 2. Literature Review -- 3. Data Leakage from Partial Gradients -- 4. Defense Against Data Reconstruction Attacks -- 5. Conclusion -- References

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

Master of Research

Department, Centre or School

School of Computing

Year of Award

2025

Principal Supervisor

Mark Dras

Additional Supervisor 1

Qiongkai Xu

Rights

Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

103 pages

Former Identifiers

AMIS ID: 413285

    Macquarie University Theses
