Abstract: This paper presents an adaptation of the Agnostic Label-Only Membership Inference Attack (ALOA) designed specifically for two-tower neural network (NN) models used in recommendation systems. Unlike traditional membership inference attacks, which focus on categorical outputs, our approach targets models that produce continuous vector embeddings. We propose a methodology that combines synthetic datasets, shadow model training, and a suite of perturbation techniques, and that evaluates model robustness using the Maximum Mean Discrepancy (MMD) metric. Experimental results demonstrate that the attack model achieves exceptionally high accuracy and precision in determining whether a given sample was part of the original training dataset, even without direct access to that dataset. These findings extend the theoretical framework of membership inference attacks to continuous output spaces and highlight vulnerabilities in modern recommendation systems.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=xESNHVXt9H&noteId=6rrLFASkK6
Changes Since Last Submission:
Removed the term “label-only” from the title and body text to prevent confusion, as our work focuses on continuous outputs rather than categorical labels.
Added a new Subsection 1.1 to clearly summarize our key contributions.
Revised Section 2.2 to eliminate the incorrect use of b(u, v) and to use the correct function s(u, v), as defined earlier, in its place.
Corrected the citation for Theorem 1; it now correctly references Ye et al. (2022).
Updated the discussion in Section 4.4 to reflect that model accuracy thresholds of 0.8 and 0.91 are more appropriate than the previously used 0.5, and revised our interpretation accordingly.
Assigned Action Editor: ~Joonas_Jälkö1
Submission Number: 4451