Augmented Mixup Procedure for Privacy-Preserving Collaborative Training

Published: 02 May 2026; Last Modified: 02 May 2026. Accepted by TMLR. License: CC BY 4.0.
Abstract: Mixup involves training neural networks on convex combinations of input samples and labels and has been adapted for privacy-preserving collaborative training, most notably in InstaHide. However, mixing-based obfuscation schemes create structured linear systems that can be exploited to reconstruct the underlying private data. We propose a singularized mixup procedure that injects controlled perturbations prior to forming convex combinations, rendering the resulting inverse problem ill-conditioned while preserving discriminative structure. We provide an average-case theoretical analysis that characterizes the security--utility trade-off via minimax reconstruction bounds and directional signal-to-noise ratio control. Empirically, we evaluate classification accuracy on MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet, and compare against InstaHide, observing competitive or improved accuracy under strong privacy settings. We assess robustness against both linear and nonlinear reconstruction attacks, including at-scale linear inversion experiments on CIFAR-5M. In a collaborative training setting with multiple parties and heterogeneous data partitions, we further compare against standard federated learning (FedProx), showing that singularized mixup enables accurate centralized training without iterative gradient exchange and yields improved robustness and performance in heterogeneous regimes. Overall, our results demonstrate that singularized mixup substantially degrades reconstruction quality while maintaining strong predictive performance, providing a practical and scalable approach to privacy-preserving collaborative learning.
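To make the abstract's core idea concrete, here is a minimal sketch of a mixing step in which controlled perturbations are injected before forming the convex combination of inputs and labels. This is an illustrative reconstruction, not the paper's actual algorithm: the function name `singularized_mixup`, the Gaussian perturbation, and the hyperparameters `alpha` and `sigma` are all assumptions for exposition.

```python
import numpy as np

def singularized_mixup(x1, y1, x2, y2, alpha=1.0, sigma=0.1, rng=None):
    """Illustrative sketch: perturb each input prior to mixing, as the
    abstract describes. `alpha` (Beta mixing parameter) and `sigma`
    (perturbation scale) are hypothetical choices, not the paper's."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    # Controlled perturbations injected before the convex combination,
    # intended to ill-condition the linear system an attacker would invert.
    x1p = x1 + sigma * rng.standard_normal(x1.shape)
    x2p = x2 + sigma * rng.standard_normal(x2.shape)
    x_mix = lam * x1p + (1.0 - lam) * x2p  # mixed (obfuscated) input
    y_mix = lam * y1 + (1.0 - lam) * y2    # mixed soft label
    return x_mix, y_mix
```

In standard mixup the combination is applied directly to clean samples; the sketch differs only in the perturbation step, which is where the security-utility trade-off analyzed in the paper would enter.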
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=NW8ebJTDON
Changes Since Last Submission: Dear Editors,

We wish to disclose that this submission was previously reviewed and rejected by ICLR 2026. Below, we include the full meta-review for your reference:

```
Meta Review of Submission18201 by Area Chair jfGi
04 Jan 2026, 18:10 (modified: 26 Jan 2026, 15:31)

Summary: My suggested decision to Reject is based on the synthesis of four reviews. While the authors provided a substantial rebuttal that effectively addressed concerns regarding heuristic parameter selection and experimental scope (adding Tiny-ImageNet and Federated Learning), fundamental concerns regarding the theoretical foundation remain. Specifically, reviewers (xaSY, 9hNT) noted that the new theoretical analysis relies on idealized assumptions (e.g., isotropic Gaussian data) that do not translate rigorously to real-world image distributions. Furthermore, the paper lacks a formal information-theoretic security proof against general nonlinear adversaries, relying instead on empirical demonstrations (U-Net failure). Consequently, while the work has significantly improved, the gap between theory and practice prevents a clear acceptance at this stage.

Effectively Addressed Concerns:
- Heuristic Parameters (L4sq, 9hNT): The introduction of Theorems 4.1 and 4.2 successfully shifted the noise selection from an empirical guess to a principled framework based on a privacy parameter (τ).
- Limited Experimental Scope (h7YU, 9hNT, L4sq): The addition of Tiny-ImageNet experiments, higher-capacity models (ResNet-50), and Federated Learning scenarios directly addressed concerns about scalability and practical utility.
- Specific Attack Vectors (h7YU, 9hNT): The authors convincingly rebutted the feasibility of the proposed clustering+BSS attack and empirically demonstrated robustness against powerful nonlinear adversaries (U-Net).

Outstanding / Partially Addressed Concerns:
- Idealized Theoretical Assumptions (xaSY, 9hNT): The theoretical guarantees heavily rely on simplified assumptions (e.g., isotropic Gaussian data). The disconnect between this idealized model and the complex manifold of real image data remains a significant theoretical gap.
- Lack of Formal Security Proof (9hNT): While the U-Net evaluation is promising, it constitutes an empirical validation rather than a formal, cryptographic, or information-theoretic guarantee against all possible nonlinear inversion strategies.
- Scalability Limits (9hNT): While Tiny-ImageNet is an improvement, the performance on full-scale ImageNet remains unverified.

Reviewer Scores:
- Reviewer L4sq (Initial: 2 → Estimated: 4): Would likely improve due to the resolution of the heuristic parameter issue, but may remain borderline due to the remaining theoretical gap.
- Reviewer xaSY (Initial: 6 → Estimated: 6): While utility concerns were resolved, the "tenuous connection" between the Gaussian theory and experiments prevents a higher score.
- Reviewer h7YU (Initial: 4 → Estimated: 6): Would likely lean towards acceptance as their specific concern regarding adaptive attacks was effectively refuted.
- Reviewer 9hNT (Initial: 2 → Estimated: 4): Significant improvements were made, but the lack of a formal proof against nonlinear attacks limits the score increase.
```

In this revised version, we have addressed the outstanding or partially addressed concerns as follows:

1. Idealized Theoretical Assumptions: This concern was already addressed in our ICLR revision by modeling the data as sub-Gaussian, which is a common and realistic assumption for images. In the present TMLR revision, we go further and remove any assumption about the data distribution from Theorem 4.2.
2. Lack of Formal Security Proof: This concern was also addressed in the ICLR revision, where Theorem 4.1 covered both linear and nonlinear attackers. In this version, we further clarify this aspect in the main text and explicitly acknowledge that our definition is not cryptographic, but reflects a practical privacy notion suitable for scalable collaborative private training.
3. Scalability Limits: While Tiny-ImageNet is a well-accepted proxy for ImageNet, we present our experiments for comparison with previous related work, which likewise did not include full-scale ImageNet results.

We recognize that the ICLR review process this year was affected by exceptional circumstances, and we appreciate the efforts of the area chairs and reviewers under these challenging conditions. We have further clarified and strengthened our manuscript in response to the feedback received. Thank you for your consideration.

Sincerely, Authors
Video: https://youtu.be/RYMP7kjLeRA
Code: https://github.com/basavyr/augmented-mixup-privacy-collaborative-training
Supplementary Material: zip
Assigned Action Editor: ~Fernando_Perez-Cruz1
Submission Number: 7201