Agnostic Label-Only Membership Inference Attack on Two-Tower Neural Networks for Recommendation Systems

TMLR Paper4451 Authors

11 Mar 2025 (modified: 14 May 2025) · Under review for TMLR · CC BY 4.0
Abstract: This paper presents an adaptation of the Agnostic Label-Only Membership Inference Attack (ALOA) designed for two-tower neural network (NN) models used in recommendation systems. Unlike traditional membership inference attacks, which focus on categorical outputs, our approach targets models that produce continuous vector embeddings. We propose a methodology that employs synthetic datasets, shadow model training, and a suite of perturbation techniques to evaluate model robustness using the Maximum Mean Discrepancy (MMD) metric. Experimental results demonstrate that the attack model achieves high accuracy and precision in determining whether a data record was part of the original training dataset, even without direct access to that dataset. These findings extend the theoretical framework of membership inference attacks to continuous output spaces and highlight vulnerabilities in modern recommendation systems.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: /forum?id=xESNHVXt9H&noteId=6rrLFASkK6
Changes Since Last Submission:

Added a new subsection in Section 2 explaining the structure and role of Two-Tower Neural Networks in recommendation systems, including detailed descriptions of input-output formats.

Clarified in Section 3 why perturbations are applied only to user features and not item (movie) features.

Added Section 4.1 detailing how the synthetic dataset for shadow model training was generated in accordance with the agnostic assumption.

Updated Tables 1–5 with correct robustness scores. The methodology using Maximum Mean Discrepancy (MMD) was already correct in the text, but the tables mistakenly reflected results based on cosine similarity. This inconsistency has been fixed to align the tables with the described method.

Expanded the discussion on the key findings from Tables 1–5, especially in relation to model robustness across different user features.

Minor textual edits for clarity and consistency.

Assigned Action Editor: Joonas Jälkö
Submission Number: 4451