{
    "title": "Language Model Personal Information Leakage Phenomenon",
    "turns": [{
        "turn": 1,
        "user": "What do you call it when a language model outputs personal information and the like?",
        "assistant": "There is no single official term for a language model outputting personal or sensitive information from its training data, but several technical terms and concepts are commonly used to describe it.\n\nAt its core, the phenomenon is called **memorization**, and deliberately exploiting it is called a **data extraction attack**.\n\nDepending on the context, the main terms are as follows:\n\n### 1. Terms Describing the Phenomenon Itself\n\n*   **Memorization**\n    *   This is the root cause and the key term. It refers to the model going beyond learning statistical patterns and storing specific training samples almost verbatim (e.g., a person's name, address, email, or password).\n    *   Memorization is more likely when the information appears repeatedly in the training dataset or is highly distinctive.\n*   **Information Leakage or Data Leakage**\n    *   This is the most commonly used expression. It refers broadly to the unintended exposure of sensitive training data through the model.\n*   **Regurgitation**\n    *   This is a metaphorical term for the model 'spitting out' training data verbatim without understanding its meaning. It is used almost interchangeably with 'memorization.'\n\n### 2. Terms from a Malicious Attack Perspective\n\nDeliberate attempts to extract information by exploiting memorization are called 'attacks.'\n\n*   **Data Extraction Attack**\n    *   This is the general term for any attack in which an attacker crafts prompts that induce the model to output parts of its training data verbatim. Extracting personal information is a representative example.\n*   **Model Inversion Attack**\n    *   This refers to attacks that analyze the model's outputs to infer or reconstruct the original data used in training. For example, an attacker might try to recover other writings (training data) by a person whose writing style the model has learned.\n*   **Membership Inference Attack**\n    *   This is an attack that determines whether a specific data point was included in the model's training dataset. For example, it answers 'yes' or 'no' to the question \"Was Hong Gil-dong's hospital record used to train this medical AI model?\" Although it stops short of extracting the information itself, it is still a serious privacy issue.\n\n### 3. Techniques Used in Attacks\n\n*   **Prompt Injection / Jailbreaking**\n    *   These are techniques in which an attacker supplies carefully crafted prompts to bypass the model's safety mechanisms and induce unintended responses, which may include personal information.\n\n* * *\n\n### Summary\n\n| Category | Term | Description |\n| --- | --- | --- |\n| **Phenomenon/Cause** | **Memorization** | The model memorizes training data verbatim |\n| **Result** | **Personal Information Leakage/Exposure** | The memorized sensitive information appears in the model's output |\n| **Malicious Act** | **Data Extraction Attack** | An attack that intentionally extracts information by exploiting memorization |\n\nSo, the most accurate answer to your question is: **\"Personal information leakage occurs because language models memorize training data, and exploiting this is called a 'data extraction attack.'\"**"
    }]
}