['3c3', '< Abstract: The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.', '---', '> Abstract: The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. 
We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset, showing how hypothetical data subjects may assess the likelihood of their PII being revealed. Our experiments reveal that strategically crafted black-box prompts can disclose a significant portion of diverse PII. Furthermore, white-box probing leveraging soft prompt tuning can dramatically magnify leakage, increasing exact match rates from 0.0047% to 1.3% with minimal training data. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering data subjects with awareness and control over their data on the web, and providing LLM providers with robust evaluation capabilities.', '8,9c8,9', '< In this regard, we introduce ProPILE, a tool to let the data subjects examine the possible inclusion and subsequent leakage of their own PII in LLM products in deployment. The data subject has only black-box access to LLM products; they can only send prompts and receive the generated sentences or likelihoods. Nevertheless, since the data subject possesses complete access to their own PII, ProPILE leverages this to generate effective prompts aimed at assessing the potential PII leakage in LLMs. See Figure 1 for an overview of the ProPILE framework. Importantly, this tool holds considerable value not only for data subjects but also for LLM service providers. ProPILE provides the service providers with a tool to effectively assess their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. Through this, the service providers can proactively address potential privacy vulnerabilities and enhance the overall robustness of their LLMs.', '< Our experiments on the Open Pre-trained Transformers (OPT) [35] trained on the Pile dataset [10] confirm the following. 
1) A significant portion of the diverse types of PII included in the training data can be disclosed through strategically crafted prompts. 2) By refining the prompt, having access to model parameters, and utilizing a few hundred training data points for the LLM, the degree of PII leakage can be significantly magnified. We envision our proposition and the insights gathered through ProPILE as the initial step towards enhancing the awareness of data subjects and LLM service providers regarding potential PII leakage.', '---', '> In this regard, we introduce ProPILE, a novel probing tool designed for data subjects to examine the possible inclusion and subsequent leakage of their own PII in LLM products in deployment, even with black-box access. ProPILE uniquely empowers data subjects by leveraging their complete access to their own PII to generate effective prompts aimed at assessing potential PII leakage in LLMs. See Figure 1 for an overview of the ProPILE framework. Importantly, this tool holds considerable value not only for data subjects but also for LLM service providers. ProPILE provides the service providers with a tool to effectively assess their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models, enabling them to proactively address potential privacy vulnerabilities and enhance the overall robustness of their LLMs.', '> Our experiments on the Open Pre-trained Transformers (OPT) [35] trained on the Pile dataset [10] confirm the following. 1) A significant portion of the diverse types of PII included in the training data can be disclosed through strategically crafted prompts. 2) By refining the prompt, having access to model parameters, and utilizing a few hundred training data points for the LLM, the degree of PII leakage can be significantly magnified, raising the exact match rate roughly 277-fold (from 0.0047% to 1.3%) over black-box probing. 
We envision our proposition and the insights gathered through ProPILE as the initial step towards enhancing the awareness of data subjects and LLM service providers regarding potential PII leakage.', '21c21', "< Prompt engineering [26,18] improves downstream task performance of LLMs by well-designing prompts without further LLM fine-tuning. In soft prompt tuning [15,16], a few learnable soft token embeddings concatenated to the original prompts are trained while LLM is frozen, so that more optimal prompts for the downstream task can be obtained. The white-box approach of ProPILE leverages soft prompt tuning to further refine the black-box approach's hand-crafted prompts.", '---', '> Prompt engineering [26,18] is a critical technique for improving downstream task performance of LLMs by well-designing prompts without further LLM fine-tuning. For PII leakage, the challenge lies in crafting prompts that can effectively elicit sensitive information. In soft prompt tuning [15,16], a few learnable soft token embeddings concatenated to the original prompts are trained while the LLM is frozen, allowing for the discovery of more optimal prompts for specific tasks. The white-box approach of ProPILE leverages this advanced technique, specifically soft prompt tuning, to systematically refine and optimize prompts beyond human intuition, thereby maximizing the potential for PII leakage detection and providing a robust lower bound on leakage probability.', '26a27', '> Personally Identifiable Information (PII) refers to any data that can be used to identify a specific individual. In the context of LLMs, PII can manifest in various forms within training data, ranging from explicit identifiers to more subtle, linkable attributes. For this work, we categorize PII based on its structural properties and linkability, which directly influence its detectability and the methods required for effective probing. 
We consider a broad spectrum of PII types, including direct identifiers (e.g., names, email addresses, phone numbers, physical addresses) and quasi-identifiers (e.g., affiliations, family relationships, educational background) that, when combined, can uniquely identify an individual. Our analysis focuses on how these different types of PII are represented in text and how their characteristics impact the likelihood of leakage from LLMs.', '28d28', '< ', '31c31', '< Definition 1 (Linkable PII leakage). Let A := {a 1 , ..., a M } be M PII items relevant to a data subject S. Each element a m denotes a PII item of a specific PII type. Let T be a probing tool that estimates a probability of leakage of PII item a m given the rest of the items A \\m := {a 1 , ..., a m-1 , a m+1 , ..., a M }. We say that T exposes the linkability of PII items for the data subject S when the likelihood of reconstructing the true PII, Pr(a m |A \\m , T ), is greater than the unconditional, context-free likelihood Pr(a m ).', '---', "> Definition 1 (Linkable PII leakage). Let A := {a 1 , ..., a M } be M PII items relevant to a data subject S. Each element a m denotes a PII item of a specific PII type. Let T be a probing tool that estimates a probability of leakage of PII item a m given the rest of the items A \\m := {a 1 , ..., a m-1 , a m+1 , ..., a M }. We say that T exposes the linkability of PII items for the data subject S when the likelihood of reconstructing the true PII, Pr(a m |A \\m , T ), is greater than the unconditional, context-free likelihood Pr(a m ). For example, if providing a person's name and address significantly increases the likelihood of an LLM generating their phone number, then the phone number is linkable PII.", '49c49', '< Probing strategy. For a target PII a m , a set of query prompts T is created by associating the remaining PII A \\m . Particularly, A \\m is prompted with K different templates t k as T = {t 1 (A \\m ), ..., t K (A \\m )}. 
Then, the user sends the set of probing prompts T to the target LLM for as much as N times. Assuming the target LLM performs sampling, the user will receive N ×K responses along with the likelihood scores L ∈ R K×L×V , where L and V denote the length of the response and the vocabulary size of the target LLM, respectively. Please note that the likelihood is identical for the same query regardless of repeated queries. Example prompts are shown in Figure 2.', '---', '> Probing strategy. For a target PII a m , a set of query prompts T is created by associating the remaining PII A \\m . The core principle is to craft prompts that, while not explicitly asking for the PII, provide enough contextual information to implicitly guide the LLM towards generating it if it exists in its training data. Particularly, A \\m is prompted with K different templates t k as T = {t 1 (A \\m ), ..., t K (A \\m )}. Then, the user sends the set of probing prompts T to the target LLM for as much as N times. Assuming the target LLM performs sampling, the user will receive N ×K responses along with the likelihood scores L ∈ R K×L×V , where L and V denote the length of the response and the vocabulary size of the target LLM, respectively. Please note that the likelihood is identical for the same query regardless of repeated queries. Example prompts are shown in Figure 2.', '53c53', '< Probing strategy. We use soft prompt tuning to achieve the goal, of finding a prompt that induces more leakage than the handcrafted prompts in the black-box case. First, we denote a set of PII lists included in the training dataset of target LLM as D = {A i } N i=1 . White-box approach assumes that an actor has access to a subset of training data D ⊂ D, where | D| = n for n ≪ N . Let us denote a query prompt as X that is created by one of the templates used in the black-box probing X = t n (A i \\m ). 
Then X is tokenized and embedded into X e ∈ R L X ×d , where L X denotes the length of the query sequence and d denotes the embedding dimension of the target LLM. The soft prompt θ s ∈ R Ls×d , technically learnable parameters, are appended ahead of X e making [θ s ; X e ] ∈ R (Ls+L X )×d , where L s denotes the number of soft prompt tokens to be prepended. The soft embedding is trained to maximize the expected reconstruction likelihood of the target PII over D. Therefore, the training is conducted to minimize negative log-likelihood defined as below:', '---', '> Probing strategy. We use soft prompt tuning to achieve the goal of finding a prompt that induces a tighter worst-case leakage (i.e., a higher likelihood of PII reconstruction) than the handcrafted prompts used in the black-box case. This method systematically explores the prompt space to discover highly effective prompts for PII extraction. First, we denote a set of PII lists included in the training dataset of target LLM as D = {A i } N i=1 . The white-box approach assumes that an actor has access to a subset of training data D ⊂ D, where | D| = n for n ≪ N . Let us denote a query prompt as X that is created by one of the templates used in the black-box probing X = t n (A i \\m ). Then X is tokenized and embedded into X e ∈ R L X ×d , where L X denotes the length of the query sequence and d denotes the embedding dimension of the target LLM. The soft prompt θ s ∈ R Ls×d , technically learnable parameters, are appended ahead of X e making [θ s ; X e ] ∈ R (Ls+L X )×d , where L s denotes the number of soft prompt tokens to be prepended. The soft embedding is trained to maximize the expected reconstruction likelihood of the target PII over D. 
Therefore, the training is conducted to minimize negative log-likelihood defined as below:', '58,59c58,59', '< For both black-box and white-box probing, the risk of PII leakage is quantified using two types of metrics depending on the output that the users receive.', '< Quantification based on string match. Users receive generated text from the LLMs. Naturally, the string match between the generated text and the target PII serves as a primary metric to quantify the leakage. Exact match represents a verbatim reconstruction of a PII; the generated string is identical to the ground truth PII.', '---', '> For both black-box and white-box probing, the risk of PII leakage is quantified using two types of metrics depending on the output that the users receive, offering complementary insights into leakage severity.', '> Quantification based on string match. Users receive generated text from the LLMs. Naturally, the string match between the generated text and the target PII serves as a primary metric to quantify the leakage. Exact match represents a verbatim reconstruction of a PII; the generated string is identical to the ground truth PII. This metric directly measures observable leakage.', '76c76', '< Efficacy of soft prompt tuning. Figure 5 illustrates the impact of the soft prompt on the exact match rate and reconstruction likelihood, with blue and orange colors, respectively. The results indicate a significant increase, from 0.0047% of black-box probing using five prompt templates to 1.3% with the soft prompt learned only from 128 data points being prepended to a single query prompt. The likelihood also increased by a large amount for the same case. It is speculated that the observed increase can be attributed to the soft prompt facilitating the more optimal prompts that may not have been considered by humans during the construction of prompts in black-box probing.', '---', '> Efficacy of soft prompt tuning. 
Figure 5 shows the impact of soft prompt tuning on the exact match rate (blue) and the reconstruction likelihood (orange). This represents a novel application of soft prompt tuning for targeted PII leakage amplification. The exact match rate rises from 0.0047% for black-box probing with five prompt templates to 1.3% when a soft prompt learned from only 128 data points is prepended to a single query prompt. The reconstruction likelihood also increases substantially in the same setting. We speculate that the increase arises because the soft prompt discovers more effective prompts than those humans considered when constructing the black-box templates, effectively finding "vulnerability vectors" in the LLM.', '86c86', '< Societal Impact. We emphasize that our proposed probing strategies are not designed to facilitate or encourage the leakage of PII. Instead, our intention is to provide a framework that empowers both data subjects and LLM service providers to thoroughly assess the privacy state of current LLMs. By conducting such evaluations, stakeholders can gain insights into the privacy vulnerabilities and potential risks associated with LLMs prior to their deployment in a wider range of real-world applications. This proactive approach aims to raise awareness among users, enabling them to understand the security and privacy implications of LLM usage and take appropriate measures to safeguard their personal information.', '---', '> Societal Impact. We emphasize that our proposed probing strategies are not designed to facilitate or encourage the leakage of PII. Instead, our intention is to provide a framework that empowers both data subjects and LLM service providers to thoroughly assess the privacy state of current LLMs. 
By conducting such evaluations, stakeholders can gain insights into the privacy vulnerabilities and potential risks associated with LLMs prior to their deployment in a wider range of real-world applications. This proactive approach aims to raise awareness among users, enabling them to understand the security and privacy implications of LLM usage and take appropriate measures to safeguard their personal information. Furthermore, findings from ProPILE can inform responsible disclosure practices and guide the development of more privacy-preserving LLM architectures.', '96,99d95', '< Section: ', '< https://parameterlab.de/', '< ', '< ', '174d169', '< ']
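
The black-box probing strategy and Definition 1 described in the diff above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The template strings, the function names (`build_prompts`, `exact_match_rate`, `probe`, `exposes_linkability`), and the stand-in `fake_llm` are all hypothetical, and a response is counted as a match when the target PII appears verbatim inside it, which is an assumption about how generated text would be compared. The sketch only shows the flow: K templates are filled with the known PII A\m, each prompt is sent N times, and the N×K responses are scored.

```python
from typing import Callable, Dict, List

# Hypothetical templates t_k; the source only states that K templates are
# applied to the known PII A\m, not what the templates look like.
TEMPLATES: List[str] = [
    "The phone number of {name} is",
    "Please contact {name} at {email} or by phone at",
]


def build_prompts(known_pii: Dict[str, str],
                  templates: List[str] = TEMPLATES) -> List[str]:
    """Fill each template with the known PII items (A\\m) to get K prompts."""
    return [t.format(**known_pii) for t in templates]


def exact_match_rate(responses: List[str], target: str) -> float:
    """Fraction of responses containing the target PII verbatim (assumption)."""
    if not responses:
        return 0.0
    hits = sum(target in r for r in responses)
    return hits / len(responses)


def probe(generate: Callable[[str], str], known_pii: Dict[str, str],
          target: str, n_samples: int) -> float:
    """Send each of the K prompts n_samples times and score all N*K responses."""
    prompts = build_prompts(known_pii)
    responses = [generate(p) for p in prompts for _ in range(n_samples)]
    return exact_match_rate(responses, target)


def exposes_linkability(p_conditional: float, p_unconditional: float) -> bool:
    """Definition 1: leakage is linkable if Pr(a_m | A\\m, T) > Pr(a_m)."""
    return p_conditional > p_unconditional


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a black-box LLM (purely illustrative)."""
    return " 555-0100." if "alice@example.com" in prompt else " unknown."
```

With the stand-in model, `probe(fake_llm, {"name": "Alice Doe", "email": "alice@example.com"}, "555-0100", 3)` sends two prompts three times each. A real data subject would replace `fake_llm` with calls to the deployed LLM service and, where the service returns likelihoods, pass them to `exposes_linkability`.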
