Black-Box Privacy Attacks on Shared Representations in Multitask Learning

ICLR 2026 Conference Submission 20413 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: multitask learning, privacy, attacks
TL;DR: We propose a black-box task-inference threat model, where the goal is to determine if an entire distribution was used to train a multitask model. By leveraging task structure, we construct high-power attacks without reference models for calibration.
Abstract: The proliferation of diverse data across users and organizations has driven the development of machine learning methods that enable multiple entities to jointly train models while minimizing data sharing. Among these, *multitask learning* (MTL) is a powerful paradigm that leverages similarities among multiple tasks, each with insufficient samples to train a standalone model, to solve them simultaneously. MTL accomplishes this by learning a *shared representation* that captures common structure between tasks and generalizes well across them all. Despite being designed to be the smallest unit of shared information necessary to effectively learn patterns across multiple tasks, these shared representations can inadvertently leak sensitive information about the particular tasks they were trained on. In this work, we investigate privacy leakage in shared representations through the lens of inference attacks. Towards this, we propose a novel, *black-box task-inference* threat model where the adversary, given the embedding vectors produced by querying the shared representation on samples from a particular task, aims to determine whether the task was present in the multitask training dataset. Motivated by analysis of tracing attacks on mean estimation over mixtures of Gaussian distributions, we develop efficient, purely black-box attacks on machine learning models that exploit the dependencies between embeddings from the same task without requiring shadow models or labeled reference data. We evaluate our attacks across vision and language domains when MTL is used for personalization and for solving multiple distinct learning problems, and demonstrate that even with access only to fresh task samples rather than training data, a black-box adversary can successfully infer a task's inclusion in training.
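To make the threat model concrete, below is a minimal sketch of a black-box task-inference score, not the authors' actual attack: the adversary queries a shared encoder (here a hypothetical `embed` callable) on fresh samples from a candidate task and scores the pairwise dependence among the returned embeddings, with no shadow or reference models. The cosine-similarity statistic and the decision threshold are illustrative assumptions only.

```python
import numpy as np

def task_inference_score(embed, task_samples):
    """Illustrative black-box task-inference statistic (hypothetical, not the
    paper's exact attack): measure how strongly embeddings of samples from one
    candidate task depend on each other via mean pairwise cosine similarity.
    `embed` is assumed to be a black-box query returning an embedding vector."""
    Z = np.asarray([embed(x) for x in task_samples])       # (n, d) embedding matrix
    Z = Z - Z.mean(axis=0, keepdims=True)                   # center each dimension
    Z /= np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12   # unit-normalize rows
    sims = Z @ Z.T                                          # pairwise cosine similarities
    n = len(task_samples)
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))    # mean off-diagonal similarity

def infer_task_membership(embed, task_samples, threshold=0.1):
    """Flag the task as 'included in training' when the dependence score
    exceeds a (hypothetical) threshold; no reference-model calibration."""
    return task_inference_score(embed, task_samples) > threshold
```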
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 20413