[
    {
        "problem_id": 791,
        "domain": [
            "Mathematics -> Applied Mathematics -> Statistics -> Probability -> Other"
        ],
        "difficulty": 4.0,
        "problem_text": "Values $a_{1}, \\ldots, a_{2013}$ are chosen independently and at random from the set $\\{1, \\ldots, 2013\\}$. What is expected number of distinct values in the set $\\{a_{1}, \\ldots, a_{2013}\\}$ ?",
        "sample_id": 1,
        "final_answer": "\\boxed{\\frac{2013^{2013} - 2012^{2013}}{2013^{2012}}}",
        "steps": [
            {
                "step_id": 1,
                "edge": "We establish the fundamental distribution of each random variable. The problem statement specifies that values are chosen uniformly at random from the set \\{1, \\ldots, 2013\\}, meaning each element has equal probability \\frac{1}{2013} of being selected for any a_i. This uniformity is critical for subsequent probability calculations and is given directly by the problem's setup.",
                "direct_dependent_steps": null,
                "node": "Each random variable $a_i$ is chosen uniformly at random from the set $\\{1,\\ldots,2013\\}$."
            },
            {
                "step_id": 2,
                "edge": "We note the independence of the random variables, which is explicitly stated in the problem as values being chosen independently. This independence allows us to compute joint probabilities as products of individual probabilities in later steps, forming a necessary foundation for handling multiple selections without correlation.",
                "direct_dependent_steps": null,
                "node": "The random variables $a_1,\\ldots,a_{2013}$ are independent."
            },
            {
                "step_id": 3,
                "edge": "We introduce a generic element n from the universal set \\{1, \\ldots, 2013\\} as a placeholder for analysis. This definition leverages basic set theory background knowledge, enabling us to focus on the behavior of a single representative element before generalizing across all elements in the set.",
                "direct_dependent_steps": null,
                "node": "Let $n$ be an element of the set $\\{1,2,\\ldots,2013\\}$."
            },
            {
                "step_id": 4,
                "edge": "Building on Step 3's definition of n, we construct an indicator variable X_n to track whether n appears in the sample. This is a standard technique in combinatorial probability where indicator variables simplify counting problems by converting set membership into binary outcomes. Specifically, X_n = 1 if n is observed at least once (Step 3's element appears), and 0 otherwise, providing a mathematical handle for distinctness.",
                "direct_dependent_steps": [
                    3
                ],
                "node": "Define the indicator variable $X_n$ by $X_n = 1$ if $n$ appears among $\\{a_1,\\ldots,a_{2013}\\}$ and $X_n = 0$ otherwise."
            },
            {
                "step_id": 5,
                "edge": "Using the indicator variables defined in Step 4, we express the total distinct count as the sum \\sum_{n=1}^{2013} X_n. This follows directly from the definition of X_n: each X_n contributes 1 for every distinct element present, so summing over all possible n (from Step 3) counts exactly the number of unique values in the sample, as required by the problem's objective.",
                "direct_dependent_steps": [
                    4
                ],
                "node": "The number of distinct values among $a_1,\\ldots,a_{2013}$ equals $\\sum_{n=1}^{2013} X_n$."
            },
            {
                "step_id": 6,
                "edge": "We apply the linearity of expectation principle to the sum from Step 5. This fundamental theorem in probability states that the expectation of a sum equals the sum of expectations, regardless of dependence between variables. Thus, \\mathbb{E}\\bigl[\\sum_{n=1}^{2013} X_n\\bigr] decomposes into \\sum_{n=1}^{2013} \\mathbb{E}[X_n], converting a complex expectation of a sum into manageable individual expectations.",
                "direct_dependent_steps": [
                    5
                ],
                "node": "By linearity of expectation, $\\mathbb{E}\\bigl[\\sum_{n=1}^{2013} X_n\\bigr] = \\sum_{n=1}^{2013} \\mathbb{E}[X_n]$."
            },
            {
                "step_id": 7,
                "edge": "Combining Step 6's result with Step 5's expression, we restate the expected distinct count as \\sum_{n=1}^{2013} \\mathbb{E}[X_n]. This step consolidates the problem into computing identical individual expectations for each n, leveraging the symmetry across all elements in the set (since the selection process treats all n equally per Steps 1 and 2).",
                "direct_dependent_steps": [
                    6
                ],
                "node": "Hence, the expected number of distinct values equals $\\sum_{n=1}^{2013} \\mathbb{E}[X_n]$."
            },
            {
                "step_id": 8,
                "edge": "For the indicator variable X_n defined in Step 4, we recall that the expectation of any indicator equals the probability of its success event. Thus, \\mathbb{E}[X_n] = \\Pr(X_n = 1), which is the probability that n appears at least once. This property is a direct consequence of the definition of expectation for Bernoulli random variables and simplifies our computation to a probability question.",
                "direct_dependent_steps": [
                    4
                ],
                "node": "For each $n$, $\\mathbb{E}[X_n]$ equals the probability that $X_n=1$."
            },
            {
                "step_id": 9,
                "edge": "Using the complement rule from basic probability, we rewrite \\Pr(X_n = 1) as 1 - \\Pr(X_n = 0). This transformation (from Step 8) is strategic because calculating the probability of absence (X_n = 0) is often simpler than directly computing the probability of at least one occurrence, especially with independent trials.",
                "direct_dependent_steps": [
                    8
                ],
                "node": "The probability that $X_n=1$ equals $1 - \\Pr(X_n=0)$."
            },
            {
                "step_id": 10,
                "edge": "We interpret the event X_n = 0 (from Step 9) as none of the a_i equaling n. This follows directly from Step 4's definition of X_n, where X_n = 0 precisely when n is absent from all 2013 selections. This rephrasing connects the indicator variable to concrete sample outcomes, setting up the probability calculation.",
                "direct_dependent_steps": [
                    4
                ],
                "node": "The event $X_n=0$ means that none of the $a_i$ equals $n$."
            },
            {
                "step_id": 11,
                "edge": "For any single index i, the probability that a_i \\neq n combines Step 1's uniform distribution (each value equally likely) and Step 3's fixed n. Since there are 2012 values other than n in the 2013-element set, \\Pr(a_i \\neq n) = \\frac{2012}{2013}. This elementary counting argument relies on the uniformity established in Step 1 and the set structure from Step 3.",
                "direct_dependent_steps": [
                    1,
                    3
                ],
                "node": "For any fixed index $i$, the probability that $a_i \\neq n$ is $\\frac{2012}{2013}$."
            },
            {
                "step_id": 12,
                "edge": "Applying Step 2's independence across all 2013 trials, the joint probability that no a_i equals n (Step 10's event) is the product of individual probabilities from Step 11. Thus, \\Pr(X_n = 0) = \\left(\\frac{2012}{2013}\\right)^{2013}. Independence (Step 2) justifies multiplying the identical probabilities (Step 11) for each of the 2013 independent selections.",
                "direct_dependent_steps": [
                    2,
                    10,
                    11
                ],
                "node": "By independence, the probability that none of the $a_i$ equals $n$ is $\\bigl(\\frac{2012}{2013}\\bigr)^{2013}$."
            },
            {
                "step_id": 13,
                "edge": "Substituting Step 12's result into Step 9's complement expression, we obtain \\mathbb{E}[X_n] = 1 - \\left(\\frac{2012}{2013}\\right)^{2013}. This combines the probability of absence (Step 12) with the complement rule (Step 9), yielding the expectation for a single indicator variable. The expression is identical for every n due to symmetry.",
                "direct_dependent_steps": [
                    9,
                    12
                ],
                "node": "Hence, $\\mathbb{E}[X_n] = 1 - \\bigl(\\frac{2012}{2013}\\bigr)^{2013}$."
            },
            {
                "step_id": 14,
                "edge": "We substitute Step 13's expression for \\mathbb{E}[X_n] into Step 7's sum. This replaces each term in \\sum_{n=1}^{2013} \\mathbb{E}[X_n] with 1 - \\left(\\frac{2012}{2013}\\right)^{2013}, resulting in the expected distinct count being \\sum_{n=1}^{2013} \\left(1 - \\left(\\frac{2012}{2013}\\right)^{2013}\\right). This step assembles the components from the expectation decomposition (Step 7) and individual probability calculation (Step 13).",
                "direct_dependent_steps": [
                    7,
                    13
                ],
                "node": "Substituting into the sum gives the expected number of distinct values as $\\sum_{n=1}^{2013} \\bigl(1 - (\\frac{2012}{2013})^{2013}\\bigr)$."
            },
            {
                "step_id": 15,
                "edge": "Recognizing that all 2013 terms in Step 14's sum are identical (since \\mathbb{E}[X_n] does not depend on n), we simplify the sum to 2013 \\left(1 - \\left(\\frac{2012}{2013}\\right)^{2013}\\right). This leverages the constant value across the summation index, a consequence of the symmetric treatment of all elements in Steps 1, 2, and 3, reducing the sum to a single multiplication.",
                "direct_dependent_steps": [
                    14
                ],
                "node": "Since the sum has 2013 identical terms, it equals $2013 \\bigl(1 - (\\frac{2012}{2013})^{2013}\\bigr)$."
            },
            {
                "step_id": 16,
                "edge": "We algebraically transform Step 15's expression: 2013 \\left(1 - \\frac{2012^{2013}}{2013^{2013}}\\right) = 2013 \\cdot \\frac{2013^{2013} - 2012^{2013}}{2013^{2013}} = \\frac{2013^{2013} - 2012^{2013}}{2013^{2012}}. The simplification uses common denominators and exponent rules (2013 / 2013^{2013} = 2013^{-2012}). Sanity check: for a smaller case with set size 2 and 2 draws, the formula gives (2^2 - 1^2)/2^{1} = 3/2 = 1.5, matching the expected distinct values (verified by enumerating outcomes: two identical pairs yield 1 distinct value each, two mixed pairs yield 2, so (1+1+2+2)/4 = 1.5).",
                "direct_dependent_steps": [
                    15
                ],
                "node": "This expression simplifies to $\\frac{2013^{2013} - 2012^{2013}}{2013^{2012}}$."
            },
            {
                "step_id": 17,
                "edge": "The simplified expression from Step 16 directly provides the final numerical expectation. We box this result as it satisfies the problem's request for the expected number of distinct values, having rigorously derived it through indicator variables, probability rules, and algebraic simplification while respecting all problem constraints.",
                "direct_dependent_steps": [
                    16
                ],
                "node": "The final answer is \\boxed{\\frac{2013^{2013} - 2012^{2013}}{2013^{2012}}}."
            }
        ]
    }
]
