[
  {
    "prompt": "The weight of an aspirin tablet is 300 milligrams according to the bottle label. An FDA investigator weighs a simple random sample of seven tablets, obtains weights of 299, 300, 305, 302, 299, 301, and 303, and runs a hypothesis test of the manufacturer's claim. Which of the following gives the P-value of this test?<|start_2d|>\"P(t > 1.54) with df = 6\" <|split_2d|>\"2P(t > 1.54) with df = 6\" <|split_2d|>\"P(t > 1.54) with df = 7\" <|split_2d|>\"2P(t > 1.54) with df = 7\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "2P(t > 1.54) with df = 6",
      "incorrect_answers": [
        "P(t > 1.54) with df = 6",
        "P(t > 1.54) with df = 7",
        "2P(t > 1.54) with df = 7"
      ]
    }
  },
  {
    "prompt": "The waiting times for a new roller coaster ride are normally distributed with a mean of 35 minutes and a standard deviation of 10 minutes. If there are 150,000 riders the first summer, which of the following is the shortest time interval associated with 100,000 riders?<|start_2d|>\"0 to 31.7 minutes\" <|split_2d|>\"31.7 to 39.3 minutes\" <|split_2d|>\"25.3 to 44.7 minutes\" <|split_2d|>\"25.3 to 35 minutes\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "25.3 to 44.7 minutes",
      "incorrect_answers": [
        "0 to 31.7 minutes",
        "31.7 to 39.3 minutes",
        "25.3 to 35 minutes"
      ]
    }
  },
  {
    "prompt": "All of the following statements are true for all discrete random variables except for which one?<|start_2d|>\"The possible outcomes must all be numerical.\" <|split_2d|>\"The possible outcomes must be mutually exclusive.\" <|split_2d|>\"The mean (expected value) always equals the sum of the products obtained by multiplying each value by its corresponding probability.\" <|split_2d|>\"Approximately 95% of the outcomes will be within two standard deviations of the mean.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Approximately 95% of the outcomes will be within two standard deviations of the mean.",
      "incorrect_answers": [
        "The possible outcomes must all be numerical.",
        "The possible outcomes must be mutually exclusive.",
        "The mean (expected value) always equals the sum of the products obtained by multiplying each value by its corresponding probability."
      ]
    }
  },
  {
    "prompt": "Which of the following is a true statement about hypothesis testing?<|start_2d|>\"If there is sufficient evidence to reject a null hypothesis at the 10% level, then there is sufficient evidence to reject it at the 5% level.\" <|split_2d|>\"Whether to use a one- or a two-sided test is typically decided after the data are gathered.\" <|split_2d|>\"If a hypothesis test is conducted at the 1% level, there is a 1% chance of rejecting the null hypothesis.\" <|split_2d|>\"The power of a test concerns its ability to detect an alternative hypothesis.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The power of a test concerns its ability to detect an alternative hypothesis.",
      "incorrect_answers": [
        "If there is sufficient evidence to reject a null hypothesis at the 10% level, then there is sufficient evidence to reject it at the 5% level.",
        "Whether to use a one- or a two-sided test is typically decided after the data are gathered.",
        "If a hypothesis test is conducted at the 1% level, there is a 1% chance of rejecting the null hypothesis."
      ]
    }
  },
  {
    "prompt": "An outlier is an observation that<|start_2d|>\"is seen more frequently than the other observations in the data set\" <|split_2d|>\"is seen less frequently than the other observations in the data set\" <|split_2d|>\"is always smaller than the other observations in the data set\" <|split_2d|>\"is significantly different from the other observations in the data set\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "is significantly different from the other observations in the data set",
      "incorrect_answers": [
        "is seen more frequently than the other observations in the data set",
        "is seen less frequently than the other observations in the data set",
        "is always smaller than the other observations in the data set"
      ]
    }
  },
  {
    "prompt": "The principal of a school is interested in estimating the average income per family of her students. She selects a random sample of students and collects information about their family income. A 95 percent confidence interval computed from this data for the mean income per family is ($35,095, $45,005). Which of the following provides the best interpretation of this confidence interval?<|start_2d|>\"95 percent of the students in her school are from families whose income is between $35,095 and $45,005.\" <|split_2d|>\"There is a 95% probability that the families of all the students in this school have an income of between $35,095 and $45,005.\" <|split_2d|>\"If we were to take another sample of the same size and compute a 95 percent confidence interval, we would have a 95% chance of getting the interval ($35,095, $45,005).\" <|split_2d|>\"There is a 95% probability that the mean income per family in the school is between $35,095 and $45,005.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There is a 95% probability that the mean income per family in the school is between $35,095 and $45,005.",
      "incorrect_answers": [
        "95 percent of the students in her school are from families whose income is between $35,095 and $45,005.",
        "There is a 95% probability that the families of all the students in this school have an income of between $35,095 and $45,005.",
        "If we were to take another sample of the same size and compute a 95 percent confidence interval, we would have a 95% chance of getting the interval ($35,095, $45,005)."
      ]
    }
  },
  {
    "prompt": "Researchers are conducting an experiment using a significance level of 0.05. The null hypothesis is, in fact, false. If they modify their experiment to use twice as many experimental units for each treatment, which of the following would be true?<|start_2d|>\"The probability of a Type I error and the probability of a Type II error would both decrease.\" <|split_2d|>\"The probability of a Type I error and the power would both increase.\" <|split_2d|>\"The probability of a Type II error and the power would both increase.\" <|split_2d|>\"The probability of a Type I error would stay the same and the power would increase.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The probability of a Type I error would stay the same and the power would increase.",
      "incorrect_answers": [
        "The probability of a Type I error and the probability of a Type II error would both decrease.",
        "The probability of a Type I error and the power would both increase.",
        "The probability of a Type II error and the power would both increase."
      ]
    }
  },
  {
    "prompt": "The mean height of adult men is 70 inches, with a standard deviation of four inches. The mean height of adult women is 66 inches, with a standard deviation of three inches. Between a man with a height of 74 inches and a woman with a height of 70 inches, who is more unusually tall within his or her respective sex?<|start_2d|>\"It cannot be determined, because the mean heights for the two groups are different.\" <|split_2d|>\"It cannot be determined, because the standard deviations of the heights for the two groups are different.\" <|split_2d|>\"The man, because he is 74 inches tall and the woman is only 70 inches tall.\" <|split_2d|>\"The woman, because her height is 1.33 standard deviations above the mean height of all women, whereas the man\u2019s height is only one standard deviation above the mean height of all men.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The woman, because her height is 1.33 standard deviations above the mean height of all women, whereas the man\u2019s height is only one standard deviation above the mean height of all men.",
      "incorrect_answers": [
        "It cannot be determined, because the mean heights for the two groups are different.",
        "It cannot be determined, because the standard deviations of the heights for the two groups are different.",
        "The man, because he is 74 inches tall and the woman is only 70 inches tall."
      ]
    }
  },
  {
    "prompt": "Which of the following is not true about t-distributions?<|start_2d|>\"There are different t-distributions for different values of df (degrees of freedom).\" <|split_2d|>\"t-distributions are bell-shaped and symmetric.\" <|split_2d|>\"t-distributions always have mean 0 and standard deviation 1.\" <|split_2d|>\"t-distributions are more spread out than the normal distribution.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "t-distributions always have mean 0 and standard deviation 1.",
      "incorrect_answers": [
        "There are different t-distributions for different values of df (degrees of freedom).",
        "t-distributions are bell-shaped and symmetric.",
        "t-distributions are more spread out than the normal distribution."
      ]
    }
  },
  {
    "prompt": "To study the effect of alcohol on reaction time, subjects were randomly selected and given three beers to consume. Their reaction time to a simple stimulus was measured before and after drinking the alcohol. Which of the following is a correct statement?<|start_2d|>\"This study was an observational study.\" <|split_2d|>\"Lack of blocking makes this a poorly designed study.\" <|split_2d|>\"The placebo effect is irrelevant in this type of study.\" <|split_2d|>\"This study was an experiment in which the subjects were used as their own controls.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "This study was an experiment in which the subjects were used as their own controls.",
      "incorrect_answers": [
        "This study was an observational study.",
        "Lack of blocking makes this a poorly designed study.",
        "The placebo effect is irrelevant in this type of study."
      ]
    }
  },
  {
    "prompt": "Which of these statements correctly explains bias?<|start_2d|>\"It describes a process that creates estimates that are too high on average or too low on average.\" <|split_2d|>\"It describes a process that results in a sample that does not represent the population.\" <|split_2d|>\"It describes a lack of randomization in the sampling process.\" <|split_2d|>\"It describes a sample that is not representative of the population.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "It describes a process that creates estimates that are too high on average or too low on average.",
      "incorrect_answers": [
        "It describes a process that results in a sample that does not represent the population.",
        "It describes a lack of randomization in the sampling process.",
        "It describes a sample that is not representative of the population."
      ]
    }
  },
  {
    "prompt": "A company that produces facial tissues continually monitors tissue strength. If the mean strength from sample data drops below a specified level, the production process is halted and the machinery inspected. Which of the following would result from a Type I error?<|start_2d|>\"Halting the production process when sufficient customer complaints are received.\" <|split_2d|>\"Halting the production process when the tissue strength is below specifications.\" <|split_2d|>\"Halting the production process when the tissue strength is within specifications.\" <|split_2d|>\"Allowing the production process to continue when the tissue strength is below specifications.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Halting the production process when the tissue strength is within specifications.",
      "incorrect_answers": [
        "Halting the production process when sufficient customer complaints are received.",
        "Halting the production process when the tissue strength is below specifications.",
        "Allowing the production process to continue when the tissue strength is below specifications."
      ]
    }
  },
  {
    "prompt": "The longevity of people living in a certain locality has a standard deviation of 14 years. What is the mean longevity if 30% of the people live longer than 75 years? Assume a normal distribution for life spans.<|start_2d|>\"75 - 0.30(14)\" <|split_2d|>\"75 + 0.30(14)\" <|split_2d|>\"75 - 0.4756(14)\" <|split_2d|>\"75 - 0.5244(14)\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "75 - 0.5244(14)",
      "incorrect_answers": [
        "75 - 0.30(14)",
        "75 + 0.30(14)",
        "75 - 0.4756(14)"
      ]
    }
  },
  {
    "prompt": "Changing from a 90% confidence interval estimate for a population proportion to a 99% confidence interval estimate, with all other things being equal,<|start_2d|>\"increases the interval size by 9%.\" <|split_2d|>\"decreases the interval size by 9%.\" <|split_2d|>\"increases the interval size by 57%.\" <|split_2d|>\"decreases the interval size by 57%.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "increases the interval size by 57%.",
      "incorrect_answers": [
        "increases the interval size by 9%.",
        "decreases the interval size by 9%.",
        "decreases the interval size by 57%."
      ]
    }
  },
  {
    "prompt": "Which of the following is a binomial random variable?<|start_2d|>\"The number of tosses before a '5' appears when tossing a fair die.\" <|split_2d|>\"The number of points a hockey team receives in 10 games, where two points are awarded for wins, one point for ties, and no points for losses.\" <|split_2d|>\"The number of hearts out of five cards randomly drawn from a deck of 52 cards, without replacement.\" <|split_2d|>\"The number of motorists not wearing seat belts in a random sample of five drivers.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The number of motorists not wearing seat belts in a random sample of five drivers.",
      "incorrect_answers": [
        "The number of tosses before a '5' appears when tossing a fair die.",
        "The number of points a hockey team receives in 10 games, where two points are awarded for wins, one point for ties, and no points for losses.",
        "The number of hearts out of five cards randomly drawn from a deck of 52 cards, without replacement."
      ]
    }
  },
  {
    "prompt": "If all the other variables remain constant, which of the following will increase the power of a hypothesis test? I. Increasing the sample size. II. Increasing the significance level. III. Increasing the probability of a Type II error.<|start_2d|>\"I only\" <|split_2d|>\"II only\" <|split_2d|>\"III only\" <|split_2d|>\"I and II\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "I and II",
      "incorrect_answers": [
        "I only",
        "II only",
        "III only"
      ]
    }
  },
  {
    "prompt": "Which of the following statements is (are) true? I. In order to use a \u03c7\u00b2 procedure, the expected value for each cell of a one- or two-way table must be at least 5. II. In order to use \u03c7\u00b2 procedures, you must have at least 2 degrees of freedom. III. In a 4 \u00d7 2 two-way table, the number of degrees of freedom is 3.<|start_2d|>\"I only\" <|split_2d|>\"I and III only\" <|split_2d|>\"I and II only\" <|split_2d|>\"III only\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "I and III only",
      "incorrect_answers": [
        "I only",
        "I and II only",
        "III only"
      ]
    }
  },
  {
    "prompt": "Suppose two events, E and F, have nonzero probabilities p and q, respectively. Which of the following is impossible?<|start_2d|>\"p + q > 1\" <|split_2d|>\"p - q < 0\" <|split_2d|>\"p / q > 1\" <|split_2d|>\"E and F are both independent and mutually exclusive.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "E and F are both independent and mutually exclusive.",
      "incorrect_answers": [
        "p + q > 1",
        "p - q < 0",
        "p / q > 1"
      ]
    }
  },
  {
    "prompt": "The probability that there will be an accident on Highway 48 each day depends on the weather. If the weather is dry that day, there is a 0.2% chance of an accident on Highway 48; if the weather is wet that day, there is a 1.0% chance of an accident. Today, the weather station announced that there is a 20% chance of the weather being wet. What is the probability that there will be an accident on Highway 48 today?<|start_2d|>\"0.0004\" <|split_2d|>\"0.0016\" <|split_2d|>\"0.002\" <|split_2d|>\"0.0036\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.0036",
      "incorrect_answers": [
        "0.0004",
        "0.0016",
        "0.002"
      ]
    }
  },
  {
    "prompt": "The binomial distribution is an appropriate model for which of the following?<|start_2d|>\"The number of minutes in an hour for which the Dow-Jones average is above its beginning average for the day.\" <|split_2d|>\"The number of cities among the 10 largest in New York State for which the weather is cloudy for most of a given day.\" <|split_2d|>\"The number of drivers wearing seat belts if 10 consecutive drivers are stopped at a police roadblock.\" <|split_2d|>\"None of the above.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above.",
      "incorrect_answers": [
        "The number of minutes in an hour for which the Dow-Jones average is above its beginning average for the day.",
        "The number of cities among the 10 largest in New York State for which the weather is cloudy for most of a given day.",
        "The number of drivers wearing seat belts if 10 consecutive drivers are stopped at a police roadblock."
      ]
    }
  },
  {
    "prompt": "Data were collected on two variables X and Y and a least squares regression line was fitted to the data. The estimated equation for this data is y = \u20132.29 + 1.70x. One point has x = 5, y = 6. What is the residual for this point?<|start_2d|>\"7.91\" <|split_2d|>\"6.21\" <|split_2d|>\"0.21\" <|split_2d|>\"\u20130.21\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "\u20130.21",
      "incorrect_answers": [
        "7.91",
        "6.21",
        "0.21"
      ]
    }
  },
  {
    "prompt": "Which of these explains why t should be used instead of z for inference procedures for means?<|start_2d|>\"The Central Limit Theorem applies to proportions but not means.\" <|split_2d|>\"We use z for proportions because proportions are approximately normal. Means are not.\" <|split_2d|>\"We use z for proportions because the sampling distribution of sample proportions is approximately normal, but that is not the case for the sampling distribution of sample means.\" <|split_2d|>\"When using the sample standard deviation to estimate the population standard deviation, more variability is introduced into the sampling distribution of the statistic.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "When using the sample standard deviation to estimate the population standard deviation, more variability is introduced into the sampling distribution of the statistic.",
      "incorrect_answers": [
        "The Central Limit Theorem applies to proportions but not means.",
        "We use z for proportions because proportions are approximately normal. Means are not.",
        "We use z for proportions because the sampling distribution of sample proportions is approximately normal, but that is not the case for the sampling distribution of sample means."
      ]
    }
  },
  {
    "prompt": "Which of the following is least likely to reduce bias in a sample survey?<|start_2d|>\"Following up with those who did not respond to the survey the first time\" <|split_2d|>\"Asking questions in a neutral manner to avoid influencing the responses\" <|split_2d|>\"Using stratified random sampling rather than simple random sampling\" <|split_2d|>\"Selecting samples randomly\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Using stratified random sampling rather than simple random sampling",
      "incorrect_answers": [
        "Following up with those who did not respond to the survey the first time",
        "Asking questions in a neutral manner to avoid influencing the responses",
        "Selecting samples randomly"
      ]
    }
  },
  {
    "prompt": "For their first exam, students in an AP Statistics class studied an average of 4 hours with a standard deviation of 1 hour. Almost everyone did poorly on the exam, and so for the second exam every student studied 10 hours. What is the correlation between the numbers of hours students studied for each exam?<|start_2d|>\"-1\" <|split_2d|>\"0\" <|split_2d|>\"0.4\" <|split_2d|>\"1\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0",
      "incorrect_answers": [
        "-1",
        "0.4",
        "1"
      ]
    }
  },
  {
    "prompt": "The appraised values of houses in a city have a mean of $125,000 with a standard deviation of $23,000. Because of a new teachers' contract, the school district needs an extra 10% in funds compared to the previous year. To raise this additional money, the city instructs the assessment office to raise all appraised house values by $5,000. What will be the new standard deviation of the appraised values of houses in the city?<|start_2d|>\"$23,000\" <|split_2d|>\"$25,300\" <|split_2d|>\"$28,000\" <|split_2d|>\"$30,300\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "$23,000",
      "incorrect_answers": [
        "$25,300",
        "$28,000",
        "$30,300"
      ]
    }
  },
  {
    "prompt": "A large company has offices in two locations, one in New Jersey and one in Utah. The mean salary of office assistants in the New Jersey office is $28,500. The mean salary of office assistants in the Utah office is $22,500. The New Jersey office has 128 office assistants and the Utah office has 32 office assistants. What is the mean salary paid to the office assistants in this company?<|start_2d|>\"$22,500\" <|split_2d|>\"$23,700\" <|split_2d|>\"$25,500\" <|split_2d|>\"$27,300\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "$27,300",
      "incorrect_answers": [
        "$22,500",
        "$23,700",
        "$25,500"
      ]
    }
  },
  {
    "prompt": "Which of the following is a false statement about simple random samples?<|start_2d|>\"A sample must be reasonably large to be properly considered a simple random sample.\" <|split_2d|>\"Inspection of a sample will give no indication of whether or not it is a simple random sample.\" <|split_2d|>\"Attributes of a simple random sample may be very different from attributes of the population.\" <|split_2d|>\"Every element of the population has an equal chance of being picked.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "A sample must be reasonably large to be properly considered a simple random sample.",
      "incorrect_answers": [
        "Inspection of a sample will give no indication of whether or not it is a simple random sample.",
        "Attributes of a simple random sample may be very different from attributes of the population.",
        "Every element of the population has an equal chance of being picked."
      ]
    }
  },
  {
    "prompt": "A telephone executive instructs an associate to contact 104 customers using their service to obtain their opinions in regard to an idea for a new pricing package. The associate notes the number of customers whose names begin with A and uses a random number table to pick four of these names. She then proceeds to use the same procedure for each letter of the alphabet and combines the 4 \u00d7 26 = 104 results into a group to be contacted. Which of the following is a correct conclusion?<|start_2d|>\"Her procedure makes use of chance.\" <|split_2d|>\"Her procedure results in a simple random sample.\" <|split_2d|>\"Each customer has an equal probability of being included in the survey.\" <|split_2d|>\"Her procedure introduces bias through sampling error.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Her procedure makes use of chance.",
      "incorrect_answers": [
        "Her procedure results in a simple random sample.",
        "Each customer has an equal probability of being included in the survey.",
        "Her procedure introduces bias through sampling error."
      ]
    }
  },
  {
    "prompt": "A company has a choice of three investment schemes. Option I gives a sure $25,000 return on investment. Option II gives a 50% chance of returning $50,000 and a 50% chance of returning $10,000. Option III gives a 5% chance of returning $100,000 and a 95% chance of returning nothing. Which option should the company choose?<|start_2d|>\"Option II if it wants to maximize expected return\" <|split_2d|>\"Option I if it needs at least $20,000 to pay off an overdue loan\" <|split_2d|>\"Option III if it needs at least $80,000 to pay off an overdue loan\" <|split_2d|>\"All of the above answers are correct.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "All of the above answers are correct.",
      "incorrect_answers": [
        "Option II if it wants to maximize expected return",
        "Option I if it needs at least $20,000 to pay off an overdue loan",
        "Option III if it needs at least $80,000 to pay off an overdue loan"
      ]
    }
  },
  {
    "prompt": "A one-sided test of a hypothesis about a population mean, based on a sample of size 14, yields a P-value of 0.075. Which of the following best describes the range of t-values that would have given this P-value?<|start_2d|>\"1.345 < t < 1.761\" <|split_2d|>\"1.356 < t < 1.782\" <|split_2d|>\"1.771 < t < 2.160\" <|split_2d|>\"1.350 < t < 1.771\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "1.350 < t < 1.771",
      "incorrect_answers": [
        "1.345 < t < 1.761",
        "1.356 < t < 1.782",
        "1.771 < t < 2.160"
      ]
    }
  },
  {
    "prompt": "The president of an online music streaming service whose customers pay a fee wants to gather additional information about customers who have joined in the past 12 months. The company plans to send out an e-mail survey to a sample of current customers with a link that gives participants a month of streaming service for free once the survey has been completed. They know that musical tastes vary by geographical region. Which of the following sample plans would produce the most representative sample of its customers?<|start_2d|>\"Choose all of the customers who joined in the last month.\" <|split_2d|>\"Make a list of all the customers who joined in the last 12 months and choose a random sample of customers on this list.\" <|split_2d|>\"From the list of all customers who joined in the last 12 months, classify customers by the state in which they live, then choose 10 customers from each state.\" <|split_2d|>\"From the list of all customers who joined in the last 12 months, classify customers by the state in which they live, then choose 3% of the customers from each state.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "From the list of all customers who joined in the last 12 months, classify customers by the state in which they live, then choose 3% of the customers from each state.",
      "incorrect_answers": [
        "Choose all of the customers who joined in the last month.",
        "Make a list of all the customers who joined in the last 12 months and choose a random sample of customers on this list.",
        "From the list of all customers who joined in the last 12 months, classify customers by the state in which they live, then choose 10 customers from each state."
      ]
    }
  },
  {
    "prompt": "You need to construct a 94% confidence interval for a population proportion. What is the upper critical value of z to be used in constructing this interval?<|start_2d|>\"0.9699\" <|split_2d|>\"1.96\" <|split_2d|>\"1.555\" <|split_2d|>\"1.88\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "1.88",
      "incorrect_answers": [
        "0.9699",
        "1.96",
        "1.555"
      ]
    }
  },
  {
    "prompt": "A local school has seven math teachers and seven English teachers. When comparing their mean salaries, which of the following is most appropriate?<|start_2d|>\"A two-sample z-test of population means\" <|split_2d|>\"A two-sample t-test of population means\" <|split_2d|>\"A one-sample z-test on a set of differences\" <|split_2d|>\"None of the above are appropriate.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above are appropriate.",
      "incorrect_answers": [
        "A two-sample z-test of population means",
        "A two-sample t-test of population means",
        "A one-sample z-test on a set of differences"
      ]
    }
  },
  {
    "prompt": "A small kiosk at the Atlanta airport carries souvenirs in the price range of $3.99 to $29.99, with a mean price of $14.75. The airport authorities decide to increase the rent charged for a kiosk by 5 percent. To make up for the increased rent, the kiosk owner decides to increase the prices of all items by 50 cents. As a result, which of the following will happen?<|start_2d|>\"The mean price and the range of prices will increase by 50 cents.\" <|split_2d|>\"The mean price will remain the same, but the range of prices will increase by 50 cents.\" <|split_2d|>\"The mean price and the standard deviation of prices will increase by 50 cents.\" <|split_2d|>\"The mean price will increase by 50 cents, but the standard deviation of prices will remain the same.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The mean price will increase by 50 cents, but the standard deviation of prices will remain the same.",
      "incorrect_answers": [
        "The mean price and the range of prices will increase by 50 cents.",
        "The mean price will remain the same, but the range of prices will increase by 50 cents.",
        "The mean price and the standard deviation of prices will increase by 50 cents."
      ]
    }
  },
  {
    "prompt": "The director of a local food bank asks for data on all donations given during the month of November. Of the 100 checks received, the average donation is $155 with a standard deviation of $32. Which of the following is the most appropriate statement?<|start_2d|>\"This November, the average donation is $155.\" <|split_2d|>\"50% of all donations this November are more than $155.\" <|split_2d|>\"We are 95% confident that the average donation in November is between about $91 and $219.\" <|split_2d|>\"We are 95% confident that the average donation in November is between about $149 and $161.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "This November, the average donation is $155.",
      "incorrect_answers": [
        "50% of all donations this November are more than $155.",
        "We are 95% confident that the average donation in November is between about $91 and $219.",
        "We are 95% confident that the average donation in November is between about $149 and $161."
      ]
    }
  },
  {
    "prompt": "Which of the following describes the central limit theorem? I. The mean of the sampling distribution of x\u0304 is the same as the mean of the population. II. The standard deviation of the sampling distribution of x\u0304 is the same as the standard deviation of x\u0304 divided by the square root of the sample size. III. If the sample size is large, the shape of the sampling distribution of x\u0304 is approximately normal.<|start_2d|>\"I only\" <|split_2d|>\"I & II only\" <|split_2d|>\"II only\" <|split_2d|>\"III only\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "III only",
      "incorrect_answers": [
        "I only",
        "I & II only",
        "II only"
      ]
    }
  },
  {
    "prompt": "A large city was interested in annexing part of the surrounding county. In a survey conducted by the local newspaper, 58 percent of respondents said they were against the annexation. During the actual vote, not all eligible voters voted, but 56 percent of the respondents voted against the annexation. Which of the following best describes the difference in the percentages obtained from the newspaper poll and the vote itself?<|start_2d|>\"It is an example of nonresponse bias, the systematic tendency of individuals with particular characteristics to refuse to answer a survey question.\" <|split_2d|>\"It is the systematic difference between a statistic and parameter caused by the nonrandom selection of surveyed persons.\" <|split_2d|>\"It is the difference between the same statistics computed from two different samples.\" <|split_2d|>\"It is the difference between the statistic and the truth due to use of a random sample.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "It is the difference between the same statistics computed from two different samples.",
      "incorrect_answers": [
        "It is an example of nonresponse bias, the systematic tendency of individuals with particular characteristics to refuse to answer a survey question.",
        "It is the systematic difference between a statistic and parameter caused by the nonrandom selection of surveyed persons.",
        "It is the difference between the statistic and the truth due to use of a random sample."
      ]
    }
  },
  {
    "prompt": "After receiving several complaints from his customers about the store being closed on Sundays, a storekeeper decided to conduct a survey. He randomly selected 100 female customers and 120 male customers, and asked them, \"Are you interested in shopping at this store on Sundays?\" He counted the number of customers answering \"yes\" and constructed a 95 percent confidence interval for the difference by subtracting the proportions of female from the proportion of male customers in favor of shopping on Sundays. The resulting interval was (\u20130.23, \u20130.18). Which of the following is a correct interpretation of the interval?<|start_2d|>\"We are 95 percent confident that the proportion of women interested in shopping on Sundays exceeds the proportion of men interested in shopping on Sundays.\" <|split_2d|>\"We are 95 percent confident that the proportion of men interested in shopping on Sundays exceeds the proportion of women interested in shopping on Sundays.\" <|split_2d|>\"We are 95 percent confident that the proportion of women interested in shopping on Sundays is equal to the proportion of men interested in shopping on Sundays.\" <|split_2d|>\"Because the interval contains negative values, it is invalid and should not be interpreted.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "We are 95 percent confident that the proportion of women interested in shopping on Sundays exceeds the proportion of men interested in shopping on Sundays.",
      "incorrect_answers": [
        "We are 95 percent confident that the proportion of men interested in shopping on Sundays exceeds the proportion of women interested in shopping on Sundays.",
        "We are 95 percent confident that the proportion of women interested in shopping on Sundays is equal to the proportion of men interested in shopping on Sundays.",
        "Because the interval contains negative values, it is invalid and should not be interpreted."
      ]
    }
  },
  {
    "prompt": "Which of the following is the primary difference between an experiment and an observational study?<|start_2d|>\"Experiments are only conducted on human subjects; observational studies can be conducted on nonhuman subjects.\" <|split_2d|>\"In an experiment, the researcher manipulates some variable to observe its effect on a response variable; in an observational study, he or she simply observes and records the observations.\" <|split_2d|>\"Experiments must use randomized treatment and control groups; observational studies also use treatment and control groups, but they do not need to be randomized.\" <|split_2d|>\"Experiments must be double-blind; observational studies do not need to be.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Experiments are only conducted on human subjects; observational studies can be conducted on nonhuman subjects.",
      "incorrect_answers": [
        "In an experiment, the researcher manipulates some variable to observe its effect on a response variable; in an observational study, he or she simply observes and records the observations.",
        "Experiments must use randomized treatment and control groups; observational studies also use treatment and control groups, but they do not need to be randomized.",
        "Experiments must be double-blind; observational studies do not need to be."
      ]
    }
  },
  {
    "prompt": "Random variable X is normally distributed, with a mean of 25 and a standard deviation of 4. Which of the following is the approximate interquartile range for this distribution?<|start_2d|>\"25.00 \u2013 22.30 = 2.70\" <|split_2d|>\"27.70 \u2013 22.30 = 5.40\" <|split_2d|>\"27.70 \u00f7 22.30 = 1.24\" <|split_2d|>\"2.00(4.00) = 8.00\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "27.70 \u2013 22.30 = 5.40",
      "incorrect_answers": [
        "25.00 \u2013 22.30 = 2.70",
        "27.70 \u00f7 22.30 = 1.24",
        "2.00(4.00) = 8.00"
      ]
    }
  },
  {
    "prompt": "Which of the following statements about any two events A and B is true?<|start_2d|>\"P(A \u222a B) implies events A and B are independent.\" <|split_2d|>\"P(A \u222a B) = 1 implies events A and B are mutually exclusive.\" <|split_2d|>\"P(A \u2229 B) = 0 implies events A and B are independent.\" <|split_2d|>\"P(A \u2229 B) = 0 implies events A and B are mutually exclusive.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "P(A \u2229 B) = 0 implies events A and B are mutually exclusive.",
      "incorrect_answers": [
        "P(A \u222a B) implies events A and B are independent.",
        "P(A \u222a B) = 1 implies events A and B are mutually exclusive.",
        "P(A \u2229 B) = 0 implies events A and B are independent."
      ]
    }
  },
  {
    "prompt": "A company has 400 employees. Their mean income is $20,500, and the standard deviation of their incomes is $3,750. The distribution of incomes is normally distributed. How many of the 400 employees do you expect to have an income of between $13,000 and $28,000?<|start_2d|>\"50\" <|split_2d|>\"100\" <|split_2d|>\"200\" <|split_2d|>\"390\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "390",
      "incorrect_answers": [
        "50",
        "100",
        "200"
      ]
    }
  },
  {
    "prompt": "As reported on CNN, in a May 1999 national poll 43% of high school students expressed fear about going to school. Which of the following best describes what is meant by the poll having a margin of error of 5%?<|start_2d|>\"It is likely that the true proportion of high school students afraid to go to school is between 38% and 48%.\" <|split_2d|>\"Five percent of the students refused to participate in the poll.\" <|split_2d|>\"Between 38% and 48% of those surveyed expressed fear about going to school.\" <|split_2d|>\"There is a 0.05 probability that the 43% result is in error.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "It is likely that the true proportion of high school students afraid to go to school is between 38% and 48%.",
      "incorrect_answers": [
        "Five percent of the students refused to participate in the poll.",
        "Between 38% and 48% of those surveyed expressed fear about going to school.",
        "There is a 0.05 probability that the 43% result is in error."
      ]
    }
  },
  {
    "prompt": "Suppose you wish to compare the AP Statistics exam results for the male and female students taking AP Statistics at your high school. Which is the most appropriate technique for gathering the needed data?<|start_2d|>\"Census\" <|split_2d|>\"Sample survey\" <|split_2d|>\"Experiment\" <|split_2d|>\"Observational study\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Census",
      "incorrect_answers": [
        "Sample survey",
        "Experiment",
        "Observational study"
      ]
    }
  },
  {
    "prompt": "In one study half of a class were instructed to watch exactly 1 hour of television per day, the other half were told to watch 5 hours per day, and then their class grades were compared. In a second study students in a class responded to a questionnaire asking about their television usage and their class grades.<|start_2d|>\"The first study was an experiment without a control group, while the second was an observational study.\" <|split_2d|>\"The first study was an observational study, while the second was a controlled experiment.\" <|split_2d|>\"Both studies were controlled experiments.\" <|split_2d|>\"Both studies were observational studies.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The first study was an experiment without a control group, while the second was an observational study.",
      "incorrect_answers": [
        "The first study was an observational study, while the second was a controlled experiment.",
        "Both studies were controlled experiments.",
        "Both studies were observational studies."
      ]
    }
  },
  {
    "prompt": "In a high school of 1650 students, 132 have personal investments in the stock market. To estimate the total stock investment by students in this school, two plans are proposed. Plan I would sample 30 students at random, find a confidence interval estimate of their average investment, and then multiply both ends of this interval by 1650 to get an interval estimate of the total investment. Plan II would sample 30 students at random from among the 132 who have investments in the market, find a confidence interval estimate of their average investment, and then multiply both ends of this interval by 132 to get an interval estimate of the total investment. Which is the better plan for estimating the total stock market investment by students in this school?<|start_2d|>\"Plan I\" <|split_2d|>\"Plan II\" <|split_2d|>\"Both plans use random samples and so will produce equivalent results.\" <|split_2d|>\"Neither plan will give an accurate estimate.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Plan II",
      "incorrect_answers": [
        "Plan I",
        "Both plans use random samples and so will produce equivalent results.",
        "Neither plan will give an accurate estimate."
      ]
    }
  },
  {
    "prompt": "Sampling error occurs<|start_2d|>\"when interviewers make mistakes resulting in bias.\" <|split_2d|>\"because a sample statistic is used to estimate a population parameter.\" <|split_2d|>\"when interviewers use judgment instead of random choice in picking the sample.\" <|split_2d|>\"when samples are too small.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "because a sample statistic is used to estimate a population parameter.",
      "incorrect_answers": [
        "when interviewers make mistakes resulting in bias.",
        "when interviewers use judgment instead of random choice in picking the sample.",
        "when samples are too small."
      ]
    }
  },
  {
    "prompt": "The regression analysis of question 20 indicated that \"R-sq = 98.1%.\" Which of the following is (are) true? I. There is a strong positive linear relationship between the explanatory and response variables. II. There is a strong negative linear relationship between the explanatory and response variables. III. About 98% of the variation in the response variable can be explained by the regression on the explanatory variable.<|start_2d|>\"I and III only\" <|split_2d|>\"I or II only\" <|split_2d|>\"I or II (but not both) and III\" <|split_2d|>\"II and III only\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "I or II (but not both) and III",
      "incorrect_answers": [
        "I and III only",
        "I or II only",
        "II and III only"
      ]
    }
  },
  {
    "prompt": "In a clinic, 50 patients with sleep disorders are randomly assigned to one of two different groups. Patients in one group are given medication before bedtime. Patients in the other group are given blindfolds and played soft music at bedtime. Each patient is attached to a machine that records breathing patterns. From the patterns, it is possible to determine if the patient is awake or asleep. The data will be used to decide which method is more effective in helping patients with sleep disorders. Which of the following statements is correct in the context of this experiment?<|start_2d|>\"It is not possible for this experiment to be single-blind or double-blind, because human subjects are involved.\" <|split_2d|>\"This is only a single-blind experiment because the experimenters did not include a control group that received neither medication nor a blindfold and soft music.\" <|split_2d|>\"This is only a single-blind experiment because patients will know which treatment they are receiving, although the examining doctor might not.\" <|split_2d|>\"This is a double-blind experiment because the experimenters could prevent both groups of patients from knowing what sort of treatment the other group is receiving.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "This is only a single-blind experiment because patients will know which treatment they are receiving, although the examining doctor might not.",
      "incorrect_answers": [
        "It is not possible for this experiment to be single-blind or double-blind, because human subjects are involved.",
        "This is only a single-blind experiment because the experimenters did not include a control group that received neither medication nor a blindfold and soft music.",
        "This is a double-blind experiment because the experimenters could prevent both groups of patients from knowing what sort of treatment the other group is receiving."
      ]
    }
  },
  {
    "prompt": "A study was conducted to test a new style of keyboard in preventing repetitive stress disorders. Volunteers who have had problems with such injuries were randomly assigned to use either a traditional keyboard or the new design. A significance test was conducted with the alternative hypothesis that a smaller proportion of those using the new keyboard will suffer injuries than those using the traditional keyboard. The resulting P-value was 0.07. Which is a correct interpretation of this P-value?<|start_2d|>\"The null hypothesis should be rejected.\" <|split_2d|>\"The null hypothesis should be accepted.\" <|split_2d|>\"There is a 7% chance that the null hypothesis is correct.\" <|split_2d|>\"There is a 7% chance of getting a difference between the two groups at least as large as the observed difference if the new keyboard is really no better at preventing injuries.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There is a 7% chance of getting a difference between the two groups at least as large as the observed difference if the new keyboard is really no better at preventing injuries.",
      "incorrect_answers": [
        "The null hypothesis should be rejected.",
        "The null hypothesis should be accepted.",
        "There is a 7% chance that the null hypothesis is correct."
      ]
    }
  },
  {
    "prompt": "A college admissions officer is interested in comparing the SAT math scores of high school applicants who have and have not taken AP Statistics. She randomly pulls the files of five applicants who took AP Statistics and five applicants who did not, and proceeds to run a t-test to compare the mean SAT math scores of the two groups. Which of the following is a necessary assumption?<|start_2d|>\"The population variances from each group are known.\" <|split_2d|>\"The population variances from each group are unknown.\" <|split_2d|>\"The population variances from the two groups are equal.\" <|split_2d|>\"The population of SAT scores from each group is normally distributed.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The population of SAT scores from each group is normally distributed.",
      "incorrect_answers": [
        "The population variances from each group are known.",
        "The population variances from each group are unknown.",
        "The population variances from the two groups are equal."
      ]
    }
  },
  {
    "prompt": "Given that P(E) = 0.32, P(F) = 0.15, and P(E \u2229 F) = 0.048, which of the following is a correct conclusion?<|start_2d|>\"The events E and F are both independent and mutually exclusive.\" <|split_2d|>\"The events E and F are neither independent nor mutually exclusive.\" <|split_2d|>\"The events E and F are mutually exclusive but not independent.\" <|split_2d|>\"The events E and F are independent but not mutually exclusive.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The events E and F are independent but not mutually exclusive.",
      "incorrect_answers": [
        "The events E and F are both independent and mutually exclusive.",
        "The events E and F are neither independent nor mutually exclusive.",
        "The events E and F are mutually exclusive but not independent."
      ]
    }
  },
  {
    "prompt": "A medical research team tests for tumor reduction in a sample of patients using three different dosages of an experimental cancer drug. Which of the following is true?<|start_2d|>\"There are three explanatory variables and one response variable.\" <|split_2d|>\"There is one explanatory variable with three levels of response.\" <|split_2d|>\"Tumor reduction is the only explanatory variable, but there are three response variables corresponding to the different dosages.\" <|split_2d|>\"There are three levels of a single explanatory variable.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There are three levels of a single explanatory variable.",
      "incorrect_answers": [
        "There are three explanatory variables and one response variable.",
        "There is one explanatory variable with three levels of response.",
        "Tumor reduction is the only explanatory variable, but there are three response variables corresponding to the different dosages."
      ]
    }
  },
  {
    "prompt": "A department store at a large mall claims that over 60 percent of the mall\u2019s visitors shop at that store. Let p = the proportion of the mall\u2019s visitors who shop at the store. Which of the following pairs of hypotheses should be used to support this claim?<|start_2d|>\"H0: p \u2264 0.60 and Ha: p > 0.60\" <|split_2d|>\"H0: p = 0.60 and Ha: p 0.60\" <|split_2d|>\"H0: p = 0.60 and Ha: p < 0.60\" <|split_2d|>\"H0: p > 0.60 and Ha: p 0.60\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "H0: p \u2264 0.60 and Ha: p > 0.60",
      "incorrect_answers": [
        "H0: p = 0.60 and Ha: p 0.60",
        "H0: p = 0.60 and Ha: p < 0.60",
        "H0: p > 0.60 and Ha: p 0.60"
      ]
    }
  },
  {
    "prompt": "Which of the following can affect the value of the correlation r?<|start_2d|>\"A change in measurement units\" <|split_2d|>\"A change in which variable is called x and which is called y\" <|split_2d|>\"Adding the same constant to all values of the x-variable\" <|split_2d|>\"None of the above can affect the r value.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above can affect the r value.",
      "incorrect_answers": [
        "A change in measurement units",
        "A change in which variable is called x and which is called y",
        "Adding the same constant to all values of the x-variable"
      ]
    }
  },
  {
    "prompt": "In general, how does tripling the sample size change the confidence interval size?<|start_2d|>\"It triples the interval size.\" <|split_2d|>\"It divides the interval size by 3.\" <|split_2d|>\"It multiples the interval size by 1.732.\" <|split_2d|>\"It divides the interval size by 1.732.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "It divides the interval size by 1.732.",
      "incorrect_answers": [
        "It triples the interval size.",
        "It divides the interval size by 3.",
        "It multiples the interval size by 1.732."
      ]
    }
  },
  {
    "prompt": "Suppose the correlation between two variables is r = 0.19. What is the new correlation if 0.23 is added to all values of the x-variable, every value of the y-variable is doubled, and the two variables are interchanged?<|start_2d|>\"0.19\" <|split_2d|>\"0.42\" <|split_2d|>\"0.84\" <|split_2d|>\"-0.19\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.19",
      "incorrect_answers": [
        "0.42",
        "0.84",
        "-0.19"
      ]
    }
  },
  {
    "prompt": "Students in AP Statistics class were timed in seconds while solving a series of mathematical brainteasers. One student's time had a standardized score of z = 2.40. If the times are all changed to minutes, what will then be the student's standardized score?<|start_2d|>\"z = 0.04\" <|split_2d|>\"z = 0.4\" <|split_2d|>\"z = 1.80\" <|split_2d|>\"z = 2.40\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "z = 2.40",
      "incorrect_answers": [
        "z = 0.04",
        "z = 0.4",
        "z = 1.80"
      ]
    }
  },
  {
    "prompt": "An experiment was designed to test the effects of three different types of paint on the durability of wooden toys. Because boys and girls tend to play differently with toys, a randomly selected group of children was divided into two groups by sex. Which of the following statements about this experiment is true?<|start_2d|>\"There are three types of paint and two sex groups, giving a total of six treatment combinations in this experiment.\" <|split_2d|>\"Type of paint is a blocking factor.\" <|split_2d|>\"Sex is a blocking factor.\" <|split_2d|>\"This is a completely randomized design.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Sex is a blocking factor.",
      "incorrect_answers": [
        "There are three types of paint and two sex groups, giving a total of six treatment combinations in this experiment.",
        "Type of paint is a blocking factor.",
        "This is a completely randomized design."
      ]
    }
  },
  {
    "prompt": "Suppose there is a correlation of r = 0.9 between number of hours per day students study and GPAs. Which of the following is a reasonable conclusion?<|start_2d|>\"90% of students who study receive high grades.\" <|split_2d|>\"90% of students who receive high grades study a lot.\" <|split_2d|>\"90% of the variation in GPAs can be explained by variation in number of study hours per day.\" <|split_2d|>\"81% of the variation in GPAs can be explained by variation in number of study hours per day.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "81% of the variation in GPAs can be explained by variation in number of study hours per day.",
      "incorrect_answers": [
        "90% of students who study receive high grades.",
        "90% of students who receive high grades study a lot.",
        "90% of the variation in GPAs can be explained by variation in number of study hours per day."
      ]
    }
  },
  {
    "prompt": "Estrella wants to become a paramedic and takes a screening exam. Scores on the exam have been approximately normally distributed over the years it has been given. The exam is normed with a mean of 80 and a standard deviation of 9. Only those who score in the top 15% on the test are invited back for further evaluation. Estrella received a 90 on the test. What was her percentile rank on the test, and did she qualify for further evaluation?<|start_2d|>\"13.35; she didn't qualify.\" <|split_2d|>\"54.38; she didn't qualify.\" <|split_2d|>\"86.65; she qualified.\" <|split_2d|>\"84.38; she didn't qualify.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "86.65; she qualified.",
      "incorrect_answers": [
        "13.35; she didn't qualify.",
        "54.38; she didn't qualify.",
        "84.38; she didn't qualify."
      ]
    }
  },
  {
    "prompt": "To test whether extensive exercise lowers the resting heart rate, a study is performed by randomly selecting half of a group of volunteers to exercise 1 hour each morning, while the rest are instructed to perform no exercise. Is this study an experiment or an observational study?<|start_2d|>\"An experiment with a control group and blinding\" <|split_2d|>\"An experiment with blocking\" <|split_2d|>\"An observational study with comparison and randomization\" <|split_2d|>\"None of the above\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above",
      "incorrect_answers": [
        "An experiment with a control group and blinding",
        "An experiment with blocking",
        "An observational study with comparison and randomization"
      ]
    }
  },
  {
    "prompt": "Both over-the-counter niacin and the prescription drug Lipitor are known to lower blood cholesterol levels. In one double-blind study Lipitor outperformed niacin. The 95% confidence interval estimate of the difference in mean cholesterol level lowering was (18, 41). Which of the following is a reasonable conclusion?<|start_2d|>\"Niacin lowers cholesterol an average of 18 points, while Lipitor lowers cholesterol an average of 41 points.\" <|split_2d|>\"There is a 0.95 probability that Lipitor will outperform niacin in lowering the cholesterol level of any given individual.\" <|split_2d|>\"There is a 0.95 probability that Lipitor will outperform niacin by at least 23 points in lowering the cholesterol level of any given individual.\" <|split_2d|>\"None of the above.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above.",
      "incorrect_answers": [
        "Niacin lowers cholesterol an average of 18 points, while Lipitor lowers cholesterol an average of 41 points.",
        "There is a 0.95 probability that Lipitor will outperform niacin in lowering the cholesterol level of any given individual.",
        "There is a 0.95 probability that Lipitor will outperform niacin by at least 23 points in lowering the cholesterol level of any given individual."
      ]
    }
  },
  {
    "prompt": "Fifty oranges of one variety were selected at random and weighed. A 95 percent confidence interval computed from the sample weights to estimate the mean weight of oranges of this variety is (7.58, 8.72) ounces. Which of the following statements is true with respect to the confidence level used?<|start_2d|>\"In repeated samplings of the same size, 95 percent of the intervals constructed will contain the true mean weight of the oranges of this variety.\" <|split_2d|>\"In repeated samplings of the same size, 95 percent of the intervals constructed will contain the sampled mean weight of the oranges.\" <|split_2d|>\"Ninety-five percent of the oranges in this sample weigh between 7.58 and 8.72 ounces.\" <|split_2d|>\"Ninety-five percent of the oranges of this variety weigh between 7.58 and 8.72 ounces.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "In repeated samplings of the same size, 95 percent of the intervals constructed will contain the true mean weight of the oranges of this variety.",
      "incorrect_answers": [
        "In repeated samplings of the same size, 95 percent of the intervals constructed will contain the sampled mean weight of the oranges.",
        "Ninety-five percent of the oranges in this sample weigh between 7.58 and 8.72 ounces.",
        "Ninety-five percent of the oranges of this variety weigh between 7.58 and 8.72 ounces."
      ]
    }
  },
  {
    "prompt": "There are two games involving flipping a fair coin. In the first game you win a prize if you can throw between 45% and 55% heads. In the second game you win if you can throw more than 80% heads. For each game would you rather flip the coin 30 times or 300 times?<|start_2d|>\"30 times for each game\" <|split_2d|>\"300 times for each game\" <|split_2d|>\"30 times for the first game and 300 times for the second\" <|split_2d|>\"300 times for the first game and 30 times for the second\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "300 times for the first game and 30 times for the second",
      "incorrect_answers": [
        "30 times for each game",
        "300 times for each game",
        "30 times for the first game and 300 times for the second"
      ]
    }
  },
  {
    "prompt": "A company is interested in comparing the mean sales revenue per salesperson at two different locations. The manager takes a random sample of 10 salespeople from each location independently and records the sales revenue generated by each person during the last four weeks. He decides to use a t-test to compare the mean sales revenue at the two locations. Which of the following assumptions is necessary for the validity of the t-test?<|start_2d|>\"The population standard deviations at both locations are equal.\" <|split_2d|>\"The population standard deviations at both locations are not equal.\" <|split_2d|>\"The population standard deviations at both locations are known.\" <|split_2d|>\"The population of the sales records at each location is normally distributed.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The population of the sales records at each location is normally distributed.",
      "incorrect_answers": [
        "The population standard deviations at both locations are equal.",
        "The population standard deviations at both locations are not equal.",
        "The population standard deviations at both locations are known."
      ]
    }
  },
  {
    "prompt": "Suppose you did 10 independent tests of the form H0: \u03bc = 25 versus Ha: \u03bc < 25, each at the \u03b1 = 0.05 significance level. What is the probability of committing a Type I error and incorrectly rejecting a true H0 with at least one of the 10 tests?<|start_2d|>\"0.05\" <|split_2d|>\"0.40\" <|split_2d|>\"0.50\" <|split_2d|>\"0.60\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.40",
      "incorrect_answers": [
        "0.05",
        "0.50",
        "0.60"
      ]
    }
  },
  {
    "prompt": "Suppose that 60% of a particular electronic part last over 3 years, while 70% last less than 6 years. Assuming a normal distribution, what are the mean and standard deviation with regard to length of life of these parts?<|start_2d|>\"\u03bc = 3.677, \u03c3 = 3.561\" <|split_2d|>\"\u03bc = 3.977, \u03c3 = 3.861\" <|split_2d|>\"\u03bc = 4.177, \u03c3 = 3.561\" <|split_2d|>\"\u03bc = 4.377, \u03c3 = 3.261\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "\u03bc = 3.977, \u03c3 = 3.861",
      "incorrect_answers": [
        "\u03bc = 3.677, \u03c3 = 3.561",
        "\u03bc = 4.177, \u03c3 = 3.561",
        "\u03bc = 4.377, \u03c3 = 3.261"
      ]
    }
  },
  {
    "prompt": "A psychologist believes that positive rewards for proper behavior are more effective than punishment for bad behavior in promoting good behavior in children. A scale of \"proper behavior\" is developed. \u03bc1 = the \"proper behavior\" rating for children receiving positive rewards, and \u03bc2 = the \"proper behavior\" rating for children receiving punishment. If H0: \u03bc1 \u2212 \u03bc2 = 0, which of the following is the proper statement of HA?<|start_2d|>\"HA: \u03bc1 \u2212 \u03bc2 > 0\" <|split_2d|>\"HA: \u03bc1 \u2212 \u03bc2 < 0\" <|split_2d|>\"HA: \u03bc1 \u2212 \u03bc2 \u2260 0\" <|split_2d|>\"Any of the above is an acceptable alternative to the given null.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "HA: \u03bc1 \u2212 \u03bc2 > 0",
      "incorrect_answers": [
        "HA: \u03bc1 \u2212 \u03bc2 < 0",
        "HA: \u03bc1 \u2212 \u03bc2 \u2260 0",
        "Any of the above is an acceptable alternative to the given null."
      ]
    }
  },
  {
    "prompt": "A confidence interval estimate is determined from the monthly grocery expenditures in a random sample of n families. Which of the following will result in a smaller margin of error? I. A smaller confidence level. II. A smaller sample standard deviation. III. A smaller sample size.<|start_2d|>\"II only\" <|split_2d|>\"I and II\" <|split_2d|>\"I and III\" <|split_2d|>\"II and III\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "I and II",
      "incorrect_answers": [
        "II only",
        "I and III",
        "II and III"
      ]
    }
  },
  {
    "prompt": "Which of the following is not important in the design of experiments?<|start_2d|>\"Control of confounding variables\" <|split_2d|>\"Randomization in assigning subjects to different treatments\" <|split_2d|>\"Use of a confounding variable to control the placebo effect\" <|split_2d|>\"Replication of the experiment using sufficient numbers of subjects\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Use of a confounding variable to control the placebo effect",
      "incorrect_answers": [
        "Control of confounding variables",
        "Randomization in assigning subjects to different treatments",
        "Replication of the experiment using sufficient numbers of subjects"
      ]
    }
  },
  {
    "prompt": "Which of the following is a correct statement?<|start_2d|>\"The probability of a Type II error does not depend on the probability of a Type I error.\" <|split_2d|>\"In conducting a hypothesis test, it is possible to simultaneously make both a Type I and a Type II error.\" <|split_2d|>\"A Type II error will result if one incorrectly assumes the data are normally distributed.\" <|split_2d|>\"When you choose a significance level \u03b1, you're setting the probability of a Type I error to exactly \u03b1.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "When you choose a significance level \u03b1, you're setting the probability of a Type I error to exactly \u03b1.",
      "incorrect_answers": [
        "The probability of a Type II error does not depend on the probability of a Type I error.",
        "In conducting a hypothesis test, it is possible to simultaneously make both a Type I and a Type II error.",
        "A Type II error will result if one incorrectly assumes the data are normally distributed."
      ]
    }
  },
  {
    "prompt": "Define a new measurement as the difference between the 60th and 40th percentile scores in a population. This measurement will give information concerning<|start_2d|>\"central tendency.\" <|split_2d|>\"variability.\" <|split_2d|>\"symmetry.\" <|split_2d|>\"skewness.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "variability.",
      "incorrect_answers": [
        "central tendency.",
        "symmetry.",
        "skewness."
      ]
    }
  },
  {
    "prompt": "Which of the following is not required in a binomial setting?<|start_2d|>\"Each trial is considered either a success or a failure.\" <|split_2d|>\"Each trial is independent.\" <|split_2d|>\"The value of the random variable of interest is the number of trials until the first success occurs.\" <|split_2d|>\"There is a fixed number of trials.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The value of the random variable of interest is the number of trials until the first success occurs.",
      "incorrect_answers": [
        "Each trial is considered either a success or a failure.",
        "Each trial is independent.",
        "There is a fixed number of trials."
      ]
    }
  },
  {
    "prompt": "A publisher used standard boxes for shipping books. The mean weight of books packed per box is 25 pounds, with a standard deviation of two pounds. The mean weight of the boxes is one pound, with a standard deviation of 0.15 pounds. The mean weight of the packing material used per box is two pounds, with a standard deviation of 0.25 pounds. What is the standard deviation of the weights of the packed boxes?<|start_2d|>\"28.000 pounds\" <|split_2d|>\"5.290 pounds\" <|split_2d|>\"4.085 pounds\" <|split_2d|>\"2.021 pounds\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "2.021 pounds",
      "incorrect_answers": [
        "28.000 pounds",
        "5.290 pounds",
        "4.085 pounds"
      ]
    }
  },
  {
    "prompt": "A group of scientists wanted to estimate the proportion of geese returning to the same site for the next breeding season. Suppose they decided to increase the sample size from 200 to 2,000. How will this affect the distribution of the sample proportion?<|start_2d|>\"The distribution of the sample proportion will be more spread out.\" <|split_2d|>\"The distribution of the sample proportion will be less spread out.\" <|split_2d|>\"The spread of the distribution of the sample proportion will remain unaffected.\" <|split_2d|>\"The distribution of the sample proportion will more closely resemble the binomial distribution.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The distribution of the sample proportion will be less spread out.",
      "incorrect_answers": [
        "The distribution of the sample proportion will be more spread out.",
        "The spread of the distribution of the sample proportion will remain unaffected.",
        "The distribution of the sample proportion will more closely resemble the binomial distribution."
      ]
    }
  },
  {
    "prompt": "A telephone survey of 400 registered voters showed that 256 had not yet made up their minds 1 month before the election. How sure can we be that between 60% and 68% of the electorate were still undecided at that time?<|start_2d|>\"2.4%\" <|split_2d|>\"8.0%\" <|split_2d|>\"64.0%\" <|split_2d|>\"90.4%\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "90.4%",
      "incorrect_answers": [
        "2.4%",
        "8.0%",
        "64.0%"
      ]
    }
  },
  {
    "prompt": "Players in the National Football League weigh, on average, about 248 pounds with a standard deviation of about 47 pounds. If four players are to be selected at random, the expected value of the random variable W, the total combined weight of the four players, is 992 pounds. The standard deviation of W is approximately<|start_2d|>\"47 pounds\" <|split_2d|>\"67 pounds\" <|split_2d|>\"94 pounds\" <|split_2d|>\"141 pounds\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "94 pounds",
      "incorrect_answers": [
        "47 pounds",
        "67 pounds",
        "141 pounds"
      ]
    }
  },
  {
    "prompt": "\"A hypothesis test yields a P-value of 0.20.\" Which of the following best describes what is meant by this statement?<|start_2d|>\"The probability of getting a finding at least as extreme as that obtained by chance alone if the null hypothesis is true is 0.20.\" <|split_2d|>\"The probability of getting a finding as extreme as that obtained by chance alone from repeated random sampling is 0.20.\" <|split_2d|>\"The probability is 0.20 that our finding is significant.\" <|split_2d|>\"The probability of getting this finding is 0.20.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The probability of getting a finding at least as extreme as that obtained by chance alone if the null hypothesis is true is 0.20.",
      "incorrect_answers": [
        "The probability of getting a finding as extreme as that obtained by chance alone from repeated random sampling is 0.20.",
        "The probability is 0.20 that our finding is significant.",
        "The probability of getting this finding is 0.20."
      ]
    }
  },
  {
    "prompt": "Which of the following is not true with regard to contingency tables for chi-square tests for independence?<|start_2d|>\"The categories are not numerical for either variable.\" <|split_2d|>\"Observed frequencies should be whole numbers.\" <|split_2d|>\"Expected frequencies should be whole numbers.\" <|split_2d|>\"Expected frequencies in each cell should be at least 5, and to achieve this, one sometimes combines categories for one or the other or both of the variables.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Expected frequencies should be whole numbers.",
      "incorrect_answers": [
        "The categories are not numerical for either variable.",
        "Observed frequencies should be whole numbers.",
        "Expected frequencies in each cell should be at least 5, and to achieve this, one sometimes combines categories for one or the other or both of the variables."
      ]
    }
  },
  {
    "prompt": "The observed times (in minutes) it takes a runner to complete a marathon are normally distributed. The z-score for his running time this week is \u20132. Which one of the following statements is a correct interpretation of his z-score?<|start_2d|>\"This week his time was two minutes lower than his time last week.\" <|split_2d|>\"This week his time was two minutes lower than his best time ever.\" <|split_2d|>\"This week his time was two minutes lower than his average time.\" <|split_2d|>\"This week his time was two standard deviations lower than his average time.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "This week his time was two standard deviations lower than his average time.",
      "incorrect_answers": [
        "This week his time was two minutes lower than his time last week.",
        "This week his time was two minutes lower than his best time ever.",
        "This week his time was two minutes lower than his average time."
      ]
    }
  },
  {
    "prompt": "The mean Law School Aptitude Test (LSAT) score for applicants to a particular law school is 650 with a standard deviation of 45. Suppose that only applicants with scores above 700 are considered. What percentage of the applicants considered have scores below 740? (Assume the scores are normally distributed.)<|start_2d|>\"13.3%\" <|split_2d|>\"17.1%\" <|split_2d|>\"82.9%\" <|split_2d|>\"86.7%\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "82.9%",
      "incorrect_answers": [
        "13.3%",
        "17.1%",
        "86.7%"
      ]
    }
  },
  {
    "prompt": "On the same test, Mary and Pam scored at the 64th and 56th percentiles, respectively. Which of the following is a true statement?<|start_2d|>\"Mary scored eight more points than Pam.\" <|split_2d|>\"Mary's score is 8% higher than Pam's.\" <|split_2d|>\"Eight percent of those who took the test scored between Pam and Mary.\" <|split_2d|>\"Thirty-six people scored higher than both Mary and Pam.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Eight percent of those who took the test scored between Pam and Mary.",
      "incorrect_answers": [
        "Mary scored eight more points than Pam.",
        "Mary's score is 8% higher than Pam's.",
        "Thirty-six people scored higher than both Mary and Pam."
      ]
    }
  },
  {
    "prompt": "A study looked at medical records of about 23,000 patients, mostly in Asia. They found that patients who drank at least three cups of green tea per day had a much lower incidence of depression. In an American newspaper article about this study, which of the following statements should not be made?<|start_2d|>\"It is possible that people who drink green tea also tend to have healthier lifestyles than those who don't.\" <|split_2d|>\"It is possible that people who drink green tea also tend to have a more positive outlook on life than those who don't.\" <|split_2d|>\"Because this is observational data, the association between drinking green tea and a lower incidence of depression is not an indication of a causal relationship.\" <|split_2d|>\"People who want to avoid depression should consider drinking green tea on a daily basis.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "People who want to avoid depression should consider drinking green tea on a daily basis.",
      "incorrect_answers": [
        "It is possible that people who drink green tea also tend to have healthier lifestyles than those who don't.",
        "It is possible that people who drink green tea also tend to have a more positive outlook on life than those who don't.",
        "Because this is observational data, the association between drinking green tea and a lower incidence of depression is not an indication of a causal relationship."
      ]
    }
  },
  {
    "prompt": "A researcher planning a survey of school principals in a particular state has lists of the school principals employed in each of the 125 school districts. The procedure is to obtain a random sample of principals from each of the districts rather than grouping all the lists together and obtaining a sample from the entire group. Which of the following is a correct conclusion?<|start_2d|>\"This is a simple random sample obtained in an easier and less costly manner than procedures involving sampling from the entire population of principals.\" <|split_2d|>\"This is a cluster sample in which the population was divided into heterogeneous groups called clusters.\" <|split_2d|>\"This is an example of systematic sampling, which gives a reasonable sample as long as the original order of the list is not related to the variables under consideration.\" <|split_2d|>\"This is a stratified sample, which may give comparative information that a simple random sample wouldn't give.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "This is a stratified sample, which may give comparative information that a simple random sample wouldn't give.",
      "incorrect_answers": [
        "This is a simple random sample obtained in an easier and less costly manner than procedures involving sampling from the entire population of principals.",
        "This is a cluster sample in which the population was divided into heterogeneous groups called clusters.",
        "This is an example of systematic sampling, which gives a reasonable sample as long as the original order of the list is not related to the variables under consideration."
      ]
    }
  },
  {
    "prompt": "In an experiment, the control group should receive<|start_2d|>\"treatment opposite that given the experimental group.\" <|split_2d|>\"the same treatment given the experimental group without knowing they are receiving the treatment.\" <|split_2d|>\"a procedure identical to that given the experimental group except for receiving the treatment under examination.\" <|split_2d|>\"a procedure identical to that given the experimental group except for a random decision on receiving the treatment under examination.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "a procedure identical to that given the experimental group except for receiving the treatment under examination.",
      "incorrect_answers": [
        "treatment opposite that given the experimental group.",
        "the same treatment given the experimental group without knowing they are receiving the treatment.",
        "a procedure identical to that given the experimental group except for a random decision on receiving the treatment under examination."
      ]
    }
  },
  {
    "prompt": "Tina\u2019s science fair project was to estimate the mean amount of chemicals in her city\u2019s water supply. At first, she had decided to use a random sample of 15 observations. But her teacher asked her to take 35 observations. The mean and standard deviation from 35 observations turned out to be approximately the same as those from 15 observations. Is there any advantage in using 35 observations instead of 15 observations?<|start_2d|>\"There is no advantage. Because the mean and the standard deviation are about the same, the confidence interval computed using 35 observations should be approximately the same as that computed using 15 observations.\" <|split_2d|>\"There is no advantage. In fact, the 20 extra observations will increase the likelihood of error.\" <|split_2d|>\"There is no advantage. Because she took 35 observations instead of 15 observations, the confidence interval using 35 observations will be wider than that using 15 observations.\" <|split_2d|>\"There is some advantage. Because she took 35 observations instead of 15 observations, the confidence interval using 35 observations will be narrower than that using 15 observations.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There is some advantage. Because she took 35 observations instead of 15 observations, the confidence interval using 35 observations will be narrower than that using 15 observations.",
      "incorrect_answers": [
        "There is no advantage. Because the mean and the standard deviation are about the same, the confidence interval computed using 35 observations should be approximately the same as that computed using 15 observations.",
        "There is no advantage. In fact, the 20 extra observations will increase the likelihood of error.",
        "There is no advantage. Because she took 35 observations instead of 15 observations, the confidence interval using 35 observations will be wider than that using 15 observations."
      ]
    }
  },
  {
    "prompt": "The total cholesterol level in a large population of people is strongly skewed right with a mean of 210 mg/dL and a standard deviation of 15 mg/dL. If random samples of size 16 are repeatedly drawn from this population, which of the following appropriately describes the sampling distribution of these sample means?<|start_2d|>\"The shape is unknown with a mean of 210 and a standard deviation of 15.\" <|split_2d|>\"The shape is somewhat skewed right with a mean of 210 and a standard deviation of 3.75.\" <|split_2d|>\"The shape is approximately normal with a mean of 210 and a standard deviation of 15.\" <|split_2d|>\"The shape is approximately normal with a mean of 210 and a standard deviation of 3.75.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The shape is somewhat skewed right with a mean of 210 and a standard deviation of 3.75.",
      "incorrect_answers": [
        "The shape is unknown with a mean of 210 and a standard deviation of 15.",
        "The shape is approximately normal with a mean of 210 and a standard deviation of 15.",
        "The shape is approximately normal with a mean of 210 and a standard deviation of 3.75."
      ]
    }
  },
  {
    "prompt": "An experiment is conducted in which the response variable is the average gain in participants' performance in the long jump. A two-sample t-test with a 5% level of significance will be used to analyze the results. If all else is kept the same, which of the following descriptions of a possible change in procedure is true?<|start_2d|>\"Change from equal size treatment groups to very different size treatment groups would increase the power of the test.\" <|split_2d|>\"Change from a 5% significance level to a 1% significance level would increase the power of the test.\" <|split_2d|>\"Taking more careful measurements to reduce variability in the response would increase the power of the test.\" <|split_2d|>\"Increasing the sample size would reduce the probability of a Type I error.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Taking more careful measurements to reduce variability in the response would increase the power of the test.",
      "incorrect_answers": [
        "Change from equal size treatment groups to very different size treatment groups would increase the power of the test.",
        "Change from a 5% significance level to a 1% significance level would increase the power of the test.",
        "Increasing the sample size would reduce the probability of a Type I error."
      ]
    }
  },
  {
    "prompt": "An assembly line machine is supposed to turn out ball bearings with a diameter of 1.25 centimeters. Each morning the first 30 bearings produced are pulled and measured. If their mean diameter is under 1.23 centimeters or over 1.27 centimeters, the machinery is stopped and an engineer is called to make adjustments before production is resumed. The quality control procedure may be viewed as a hypothesis test with the null hypothesis H0: \u03bc = 1.25 and the alternative hypothesis Ha: \u03bc \u2260 1.25. The engineer is asked to make adjustments when the null hypothesis is rejected. In test terminology, what would a Type II error result in?<|start_2d|>\"A warranted halt in production to adjust the machinery\" <|split_2d|>\"An unnecessary stoppage of the production process\" <|split_2d|>\"Continued production of wrong size ball bearings\" <|split_2d|>\"Continued production of proper size ball bearings\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Continued production of wrong size ball bearings",
      "incorrect_answers": [
        "A warranted halt in production to adjust the machinery",
        "An unnecessary stoppage of the production process",
        "Continued production of proper size ball bearings"
      ]
    }
  },
  {
    "prompt": "We are interested in the proportion p of people who are unemployed in a large city. Eight percent of a simple random sample of 500 people are unemployed. What is the midpoint for a 95% confidence interval estimate of p?<|start_2d|>\"0.012\" <|split_2d|>\"0.025\" <|split_2d|>\"0.475\" <|split_2d|>\"None of the above.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above.",
      "incorrect_answers": [
        "0.012",
        "0.025",
        "0.475"
      ]
    }
  },
  {
    "prompt": "An insurance company wishes to study the number of years drivers in a large city go between automobile accidents. They plan to obtain and analyze the data from a sample of drivers. Which of the following is a true statement?<|start_2d|>\"A reasonable time-and-cost-saving procedure would be to use systematic sampling on an available list of all AAA (American Automobile Association) members in the city.\" <|split_2d|>\"A reasonable time-and-cost-saving procedure would be to randomly choose families and include all drivers in each of these families in the sample.\" <|split_2d|>\"To determine the mean number of years between accidents, randomness in choosing a sample of drivers is not important as long as the sample size is very large.\" <|split_2d|>\"None of the above are true statements.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above are true statements.",
      "incorrect_answers": [
        "A reasonable time-and-cost-saving procedure would be to use systematic sampling on an available list of all AAA (American Automobile Association) members in the city.",
        "A reasonable time-and-cost-saving procedure would be to randomly choose families and include all drivers in each of these families in the sample.",
        "To determine the mean number of years between accidents, randomness in choosing a sample of drivers is not important as long as the sample size is very large."
      ]
    }
  },
  {
    "prompt": "Which of the following is the central limit theorem?<|start_2d|>\"No matter how the population is distributed, as the sample size increases, the mean of the sample means becomes closer to the mean of the population.\" <|split_2d|>\"No matter how the population is distributed, as the sample size increases, the standard deviation of the sample means becomes closer to the standard deviation of the population divided by the square root of the sample size.\" <|split_2d|>\"If the population is normally distributed, then as the sample size increases, the sampling distribution of the sample mean becomes closer to a normal distribution.\" <|split_2d|>\"The central limit theorem refers to something else.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The central limit theorem refers to something else.",
      "incorrect_answers": [
        "No matter how the population is distributed, as the sample size increases, the mean of the sample means becomes closer to the mean of the population.",
        "No matter how the population is distributed, as the sample size increases, the standard deviation of the sample means becomes closer to the standard deviation of the population divided by the square root of the sample size.",
        "If the population is normally distributed, then as the sample size increases, the sampling distribution of the sample mean becomes closer to a normal distribution."
      ]
    }
  },
  {
    "prompt": "A manufacturer of ready-bake cake mixes is interested in designing an experiment to test the effects of four different temperature levels (300, 325, 350, and 375\u00b0F), two different types of pans (glass and metal), and three different types of ovens (gas, electric, and microwave) on the texture of its cakes, in all combinations. Which of the following is the best description of the design of the necessary experiment?<|start_2d|>\"A completely randomized design with nine treatment groups\" <|split_2d|>\"A completely randomized design with 24 treatment groups\" <|split_2d|>\"A randomized block design, blocked on temperature, with six treatment groups\" <|split_2d|>\"A randomized block design, blocked on type of pan, with 12 treatment groups\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "A completely randomized design with 24 treatment groups",
      "incorrect_answers": [
        "A completely randomized design with nine treatment groups",
        "A randomized block design, blocked on temperature, with six treatment groups",
        "A randomized block design, blocked on type of pan, with 12 treatment groups"
      ]
    }
  },
  {
    "prompt": "A study is made to determine whether more hours of academic studying leads to higher point scoring by basketball players. In surveying 50 basketball players, it is noted that the 25 who claim to study the most hours have a higher point average than the 25 who study less. Based on this study, the coach begins requiring the players to spend more time studying. Which of the following is a correct statement?<|start_2d|>\"While this study may have its faults, it still does prove causation.\" <|split_2d|>\"There could well be a confounding variable responsible for the seeming relationship.\" <|split_2d|>\"While this is a controlled experiment, the conclusion of the coach is not justified.\" <|split_2d|>\"To get the athletes to study more, it would be more meaningful to have them put in more practice time on the court to boost their point averages, as higher point averages seem to be associated with more study time.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There could well be a confounding variable responsible for the seeming relationship.",
      "incorrect_answers": [
        "While this study may have its faults, it still does prove causation.",
        "While this is a controlled experiment, the conclusion of the coach is not justified.",
        "To get the athletes to study more, it would be more meaningful to have them put in more practice time on the court to boost their point averages, as higher point averages seem to be associated with more study time."
      ]
    }
  },
  {
    "prompt": "Biologists around the world have increased efforts at conservation. Monitoring wildlife populations is important so that appropriate management efforts can be implemented, but such monitoring is often difficult. One study found a cheap and easy way to estimate the number of nesting sites of terns (a type of seabird) by monitoring the number of calls heard per minute. More calls happen when there are more birds, as one would expect. In fact, it turned out that the number of calls explained 71% of the variation in the abundance of nests between breeding sites. Which of the following statements is correct about the correlation between the number of calls and the abundance of nests?<|start_2d|>\"The correlation coefficient is -0.71.\" <|split_2d|>\"The correlation coefficient is 0.71.\" <|split_2d|>\"The correlation coefficient is -0.84.\" <|split_2d|>\"The correlation coefficient is 0.84.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The correlation coefficient is 0.84.",
      "incorrect_answers": [
        "The correlation coefficient is -0.71.",
        "The correlation coefficient is 0.71.",
        "The correlation coefficient is -0.84."
      ]
    }
  },
  {
    "prompt": "A company has 1000 employees evenly distributed throughout five assembly plants. A sample of 30 employees is to be chosen as follows. Each of the five managers will be asked to place the 200 time cards of their respective employees in a bag, shake them up, and randomly draw out six names. The six names from each plant will be put together to make up the sample. Will this method result in a simple random sample of the 1000 employees?<|start_2d|>\"Yes, because every employee has the same chance of being selected.\" <|split_2d|>\"Yes, because every plant is equally represented.\" <|split_2d|>\"Yes, because this is an example of stratified sampling, which is a special case of simple random sampling.\" <|split_2d|>\"No, because not every group of 30 employees has the same chance of being selected.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "No, because not every group of 30 employees has the same chance of being selected.",
      "incorrect_answers": [
        "Yes, because every employee has the same chance of being selected.",
        "Yes, because every plant is equally represented.",
        "Yes, because this is an example of stratified sampling, which is a special case of simple random sampling."
      ]
    }
  },
  {
    "prompt": "A doctor hopes that a new surgery technique will shorten the recovery time compared to the standard technique. To test this, he designed an experiment in which patients who required this type of surgery were randomly assigned to the standard technique or the new technique. Then the mean recovery time for each treatment group was compared. Assuming conditions for inference were met, which analysis should be used?<|start_2d|>\"A t-test for a mean.\" <|split_2d|>\"A t-test for a difference in means.\" <|split_2d|>\"A z-test for a mean.\" <|split_2d|>\"A z-test for a difference in means.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "A t-test for a difference in means.",
      "incorrect_answers": [
        "A t-test for a mean.",
        "A z-test for a mean.",
        "A z-test for a difference in means."
      ]
    }
  },
  {
    "prompt": "A researcher plans a study to examine long-term confidence in the U.S. economy among the adult population. She obtains a simple random sample of 30 adults as they leave a Wall Street office building one weekday afternoon. All but two of the adults agree to participate in the survey. Which of the following conclusions is correct?<|start_2d|>\"Proper use of chance as evidenced by the simple random sample makes this a well-designed survey.\" <|split_2d|>\"The high response rate makes this a well-designed survey.\" <|split_2d|>\"Selection bias makes this a poorly designed survey.\" <|split_2d|>\"A voluntary response study like this gives too much emphasis to persons with strong opinions.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Selection bias makes this a poorly designed survey.",
      "incorrect_answers": [
        "Proper use of chance as evidenced by the simple random sample makes this a well-designed survey.",
        "The high response rate makes this a well-designed survey.",
        "A voluntary response study like this gives too much emphasis to persons with strong opinions."
      ]
    }
  },
  {
    "prompt": "To study the effect of music on piecework output at a clothing manufacturer, two experimental treatments are planned: day-long classical music for one group versus day-long light rock music for another. Which one of the following groups would serve best as a control for this study?<|start_2d|>\"A third group for which no music is played\" <|split_2d|>\"A third group that randomly hears either classical or light rock music each day\" <|split_2d|>\"A third group that hears day-long R & B music\" <|split_2d|>\"A third group that hears classical music every morning and light rock every afternoon\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "A third group for which no music is played",
      "incorrect_answers": [
        "A third group that randomly hears either classical or light rock music each day",
        "A third group that hears day-long R & B music",
        "A third group that hears classical music every morning and light rock every afternoon"
      ]
    }
  },
  {
    "prompt": "During the years 1886 through 2000 there were an average of 8.7 tropical cyclones per year, of which an average of 5.1 became hurricanes. Assuming that the probability of any cyclone becoming a hurricane is independent of what happens to any other cyclone, if there are five cyclones in one year, what is the probability that at least three become hurricanes?<|start_2d|>\"0.313\" <|split_2d|>\"0.345\" <|split_2d|>\"0.586\" <|split_2d|>\"0.658\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.658",
      "incorrect_answers": [
        "0.313",
        "0.345",
        "0.586"
      ]
    }
  },
  {
    "prompt": "A chi-squared test of independence is to be performed on a 3 \u00d7 4 contingency table. How many degrees of freedom does this test have?<|start_2d|>\"6\" <|split_2d|>\"8\" <|split_2d|>\"10\" <|split_2d|>\"11\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "6",
      "incorrect_answers": [
        "8",
        "10",
        "11"
      ]
    }
  },
  {
    "prompt": "At a large hospital, the durations of emergency room visits, from check-in to discharge, are approximately normally distributed with a mean of 167 minutes and a standard deviation of 76 minutes. Which interval, to the nearest minute, contains the middle 80% of durations?<|start_2d|>\"15 minutes to 319 minutes\" <|split_2d|>\"53 minutes to 281 minutes\" <|split_2d|>\"70 minutes to 264 minutes\" <|split_2d|>\"91 minutes to 243 minutes\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "70 minutes to 264 minutes",
      "incorrect_answers": [
        "15 minutes to 319 minutes",
        "53 minutes to 281 minutes",
        "91 minutes to 243 minutes"
      ]
    }
  },
  {
    "prompt": "The relation between studying time (in hours) and grade on a final exam (0-100) in a random sample of students in a math class was found to be Grade = 50.25 + 10.2 (h). How will a student\u2019s grade be affected if she studies for two hours?<|start_2d|>\"Her grade will go up by 40.05 points.\" <|split_2d|>\"Her grade will go up by 20.4 points.\" <|split_2d|>\"Her grade will remain unchanged.\" <|split_2d|>\"Her grade will go down by 10.2 points.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Her grade will go up by 20.4 points.",
      "incorrect_answers": [
        "Her grade will go up by 40.05 points.",
        "Her grade will remain unchanged.",
        "Her grade will go down by 10.2 points."
      ]
    }
  },
  {
    "prompt": "A producer of skin care products has created a new formula for its cream to cure acne. To compare the effectiveness of the new cream to that of the old cream, it conducted a double-blind randomized experiment. Volunteers with acne tried the old formula on one side of their face and the new formula on the other, and which side got which formula was determined randomly. The response variable was the difference in the number of pimples (old formula - new formula). Which is the correct significance test to perform?<|start_2d|>\"A two-proportion z-test\" <|split_2d|>\"A two-sample t-test\" <|split_2d|>\"A matched pairs t-test\" <|split_2d|>\"A chi-square test of independence\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "A matched pairs t-test",
      "incorrect_answers": [
        "A two-proportion z-test",
        "A two-sample t-test",
        "A chi-square test of independence"
      ]
    }
  },
  {
    "prompt": "A pharmaceutical company wants to test a new cholesterol-reducing drug against the previous drug. It does not anticipate much association between cholesterol level and gender, but it does anticipate an association between cholesterol level and the amount of exercise a person gets. For a randomized block design, it should:<|start_2d|>\"Block on gender because it is not associated with cholesterol level.\" <|split_2d|>\"Block on gender because males and females are different.\" <|split_2d|>\"Block on the type of drug because it may be associated with cholesterol level.\" <|split_2d|>\"Block on exercise level because it is associated with cholesterol level.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Block on exercise level because it is associated with cholesterol level.",
      "incorrect_answers": [
        "Block on gender because it is not associated with cholesterol level.",
        "Block on gender because males and females are different.",
        "Block on the type of drug because it may be associated with cholesterol level."
      ]
    }
  },
  {
    "prompt": "An experiment is to be conducted to determine whether taking fish oil capsules or garlic capsules has more of an effect on cholesterol levels. In past studies it was noted that daily exercise intensity (low, moderate, high) is associated with cholesterol level, but average sleep length (< 5, 5 - 8, > 8 hours) is not associated with cholesterol level. This experiment should be done<|start_2d|>\"by blocking on exercise intensity\" <|split_2d|>\"by blocking on sleep length\" <|split_2d|>\"by blocking on cholesterol level\" <|split_2d|>\"by blocking on capsule type\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "by blocking on exercise intensity",
      "incorrect_answers": [
        "by blocking on sleep length",
        "by blocking on cholesterol level",
        "by blocking on capsule type"
      ]
    }
  },
  {
    "prompt": "A test for heartworm in dogs shows a positive result in 96% of dogs that actually have heartworm, and shows a negative result in 98% of dogs with no heartworm. If heartworm actually occurs in 10% of dogs, what is the probability that a randomly selected dog that tested positive for heartworm actually has heartworm?<|start_2d|>\"11%\" <|split_2d|>\"18%\" <|split_2d|>\"84%\" <|split_2d|>\"88%\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "84%",
      "incorrect_answers": [
        "11%",
        "18%",
        "88%"
      ]
    }
  },
  {
    "prompt": "Suppose X and Y are random variables with \u03bcx = 32, \u03c3x = 5, \u03bcy = 44, and \u03c3y = 12. Given that X and Y are independent, what are the mean and standard deviation of the random variable X + Y?<|start_2d|>\"\u03bcx+y = 76, \u03c3x+y = 8.5\" <|split_2d|>\"\u03bcx+y = 76, \u03c3x+y = 13\" <|split_2d|>\"\u03bcx+y = 76, \u03c3x+y = 17\" <|split_2d|>\"\u03bcx+y = 38, \u03c3x+y = 17\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "\u03bcx+y = 76, \u03c3x+y = 13",
      "incorrect_answers": [
        "\u03bcx+y = 76, \u03c3x+y = 8.5",
        "\u03bcx+y = 76, \u03c3x+y = 17",
        "\u03bcx+y = 38, \u03c3x+y = 17"
      ]
    }
  },
  {
    "prompt": "A high school sent a survey to a randomly selected sample of 75 of last year's graduating class. 27 of those selected did not return the survey. The best plan of action would be to:<|start_2d|>\"Use the surveys that were returned and change the sample size to 48 for the analysis.\" <|split_2d|>\"Use the surveys that were returned and leave the sample size at 75 for the analysis.\" <|split_2d|>\"Randomly select 27 additional class members and send the survey to them.\" <|split_2d|>\"Follow up with those that did not return the survey to encourage them to respond.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Follow up with those that did not return the survey to encourage them to respond.",
      "incorrect_answers": [
        "Use the surveys that were returned and change the sample size to 48 for the analysis.",
        "Use the surveys that were returned and leave the sample size at 75 for the analysis.",
        "Randomly select 27 additional class members and send the survey to them."
      ]
    }
  },
  {
    "prompt": "Mr. Bee's statistics class had a standard deviation of 11.2 on a standardized test, while Mr. Em's class had a standard deviation of 5.6 on the same test. Which of the following is the most reasonable conclusion concerning the two classes' performance on the test?<|start_2d|>\"Mr. Bee's class is less heterogeneous than Mr. Em's.\" <|split_2d|>\"Mr. Em's class is more homogeneous than Mr. Bee's.\" <|split_2d|>\"Mr. Bee's class performed twice as well as Mr. Em's.\" <|split_2d|>\"Mr. Em's class did not do as well as Mr. Bee's.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Mr. Em's class is more homogeneous than Mr. Bee's.",
      "incorrect_answers": [
        "Mr. Bee's class is less heterogeneous than Mr. Em's.",
        "Mr. Bee's class performed twice as well as Mr. Em's.",
        "Mr. Em's class did not do as well as Mr. Bee's."
      ]
    }
  },
  {
    "prompt": "When a patient complains to the doctor about a certain set of symptoms, the doctor diagnoses the patient with Condition A 15% of the time. If a patient with these symptoms is diagnosed with Condition A, he or she is diagnosed with Condition B 70% of the time. A patient with these symptoms that is not diagnosed with Condition A is diagnosed with Condition B 10% of the time. What is the probability that a patient with this set of symptoms will be diagnosed with at least one of these conditions?<|start_2d|>\"0.235\" <|split_2d|>\"0.25\" <|split_2d|>\"0.765\" <|split_2d|>\"0.85\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.235",
      "incorrect_answers": [
        "0.25",
        "0.765",
        "0.85"
      ]
    }
  },
  {
    "prompt": "A simple random sample is defined by<|start_2d|>\"the method of selection.\" <|split_2d|>\"examination of the outcome.\" <|split_2d|>\"both of the above.\" <|split_2d|>\"how representative the sample is of the population.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "the method of selection.",
      "incorrect_answers": [
        "examination of the outcome.",
        "both of the above.",
        "how representative the sample is of the population."
      ]
    }
  },
  {
    "prompt": "In one study on the effect that eating meat products has on weight level, an SRS of 500 subjects who admitted to eating meat at least once a day had their weights compared with those of an independent SRS of 500 people who claimed to be vegetarians. In a second study, an SRS of 500 subjects were served at least one meat meal per day for 6 months, while an independent SRS of 500 others were chosen to receive a strictly vegetarian diet for 6 months, with weights compared after 6 months.<|start_2d|>\"The first study is a controlled experiment, while the second is an observational study.\" <|split_2d|>\"The first study is an observational study, while the second is a controlled experiment.\" <|split_2d|>\"Both studies are controlled experiments.\" <|split_2d|>\"Both studies are observational studies.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The first study is an observational study, while the second is a controlled experiment.",
      "incorrect_answers": [
        "The first study is a controlled experiment, while the second is an observational study.",
        "Both studies are controlled experiments.",
        "Both studies are observational studies."
      ]
    }
  },
  {
    "prompt": "Which of these is a correct description of the term?<|start_2d|>\"A factor is a response variable.\" <|split_2d|>\"Replication means the experiment should be repeated several times.\" <|split_2d|>\"Levels are the same as treatments.\" <|split_2d|>\"Experimental units are the same as subjects.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Experimental units are the same as subjects.",
      "incorrect_answers": [
        "A factor is a response variable.",
        "Replication means the experiment should be repeated several times.",
        "Levels are the same as treatments."
      ]
    }
  },
  {
    "prompt": "In the jury pool available for this week, 30 percent of potential jurors are women. A particular trial requires that, out of a jury of 12, at least three are women. If a jury of 12 is to be selected at random from the pool, what is the probability it meets the requirements of this trial?<|start_2d|>\"0.168\" <|split_2d|>\"0.843\" <|split_2d|>\"0.915\" <|split_2d|>\"The answer cannot be determined without knowing the size of the jury pool.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The answer cannot be determined without knowing the size of the jury pool.",
      "incorrect_answers": [
        "0.168",
        "0.843",
        "0.915"
      ]
    }
  },
  {
    "prompt": "An employment placement agency specializes in placing workers in jobs suited for them. From past experience, the agency knows that 20% of all the workers it places will no longer be at the position in which they were placed after one year; however, only 5% of those remaining after the first year leave during the next year. At the start of a year an employer hires 100 workers using this agency, then at the start of the next year the employer hires 100 more. How many of these 200 workers are expected to be on the job at the end of the second year?<|start_2d|>\"140\" <|split_2d|>\"144\" <|split_2d|>\"152\" <|split_2d|>\"156\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "156",
      "incorrect_answers": [
        "140",
        "144",
        "152"
      ]
    }
  },
  {
    "prompt": "A newspaper reporter examined police reports of accidents during the past 12 months to collect data about the speed of a car and its stopping distance. The reporter then constructed a scatterplot and computed a correlation coefficient to show the relation between a car\u2019s speed and its stopping distance. This is an example of<|start_2d|>\"a double-blind study\" <|split_2d|>\"a single-blind study\" <|split_2d|>\"a study involving no blinding at all\" <|split_2d|>\"an observational study\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "an observational study",
      "incorrect_answers": [
        "a double-blind study",
        "a single-blind study",
        "a study involving no blinding at all"
      ]
    }
  },
  {
    "prompt": "In which of the following situations is a binomial model not an appropriate model to describe the outcome?<|start_2d|>\"The number of heads in three tosses of a coin\" <|split_2d|>\"The number of rainy days in a given week\" <|split_2d|>\"The number of girls in a family of five children\" <|split_2d|>\"The number of students present in a class of 22\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The number of rainy days in a given week",
      "incorrect_answers": [
        "The number of heads in three tosses of a coin",
        "The number of girls in a family of five children",
        "The number of students present in a class of 22"
      ]
    }
  },
  {
    "prompt": "Two possible wordings for a questionnaire on a proposed school budget increase are as follows:<|start_2d|>\"The first showed 58% and the second 13% because of the lack of randomization as evidenced by the wording of the questions.\" <|split_2d|>\"The first showed 13% and the second 58% because of a placebo effect due to the wording of the questions.\" <|split_2d|>\"The first showed 58% and the second 13% because of the lack of a control group.\" <|split_2d|>\"The first showed 58% and the second 13% because of response bias due to the wording of the questions.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The first showed 58% and the second 13% because of response bias due to the wording of the questions.",
      "incorrect_answers": [
        "The first showed 58% and the second 13% because of the lack of randomization as evidenced by the wording of the questions.",
        "The first showed 13% and the second 58% because of a placebo effect due to the wording of the questions.",
        "The first showed 58% and the second 13% because of the lack of a control group."
      ]
    }
  },
  {
    "prompt": "Company I manufactures demolition fuses that burn an average of 50 minutes with a standard deviation of 10 minutes, while company II advertises fuses that burn an average of 55 minutes with a standard deviation of 5 minutes. Which company's fuse is more likely to last at least 1 hour? Assume normal distributions of fuse times.<|start_2d|>\"Company I's, because of its greater standard deviation\" <|split_2d|>\"Company II's, because of its greater mean\" <|split_2d|>\"For both companies, the probability that a fuse will last at least 1 hour is 0.159\" <|split_2d|>\"For both companies, the probability that a fuse will last at least 1 hour is 0.841\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "For both companies, the probability that a fuse will last at least 1 hour is 0.159",
      "incorrect_answers": [
        "Company I's, because of its greater standard deviation",
        "Company II's, because of its greater mean",
        "For both companies, the probability that a fuse will last at least 1 hour is 0.841"
      ]
    }
  },
  {
    "prompt": "In a random sample of Toyota car owners, 83 out of 112 said they were satisfied with the Toyota front-wheel drive, while in a similar survey of Subaru owners, 76 out of 81 said they were satisfied with the Subaru four-wheel drive. A 90% confidence interval estimate for the difference in proportions between Toyota and Subaru car owners who are satisfied with their drive systems is reported to be -0.197 \u00b1 0.081. Which is a proper conclusion?<|start_2d|>\"The interval is invalid because probabilities cannot be negative.\" <|split_2d|>\"The interval is invalid because it does not contain zero.\" <|split_2d|>\"Subaru owners are approximately 19.7% more satisfied with their drive systems than are Toyota owners.\" <|split_2d|>\"We are 90% confident that the difference in proportions between Toyota and Subaru car owners who are satisfied with their drive systems is between -0.278 and -0.116.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "We are 90% confident that the difference in proportions between Toyota and Subaru car owners who are satisfied with their drive systems is between -0.278 and -0.116.",
      "incorrect_answers": [
        "The interval is invalid because probabilities cannot be negative.",
        "The interval is invalid because it does not contain zero.",
        "Subaru owners are approximately 19.7% more satisfied with their drive systems than are Toyota owners."
      ]
    }
  },
  {
    "prompt": "Which of the following statements about the correlation coefficient is true?<|start_2d|>\"The correlation coefficient and the slope of the regression line may have opposite signs.\" <|split_2d|>\"A correlation of 1 indicates a perfect cause-and-effect relationship between the variables.\" <|split_2d|>\"Correlations of +0.87 and -0.87 indicate the same degree of clustering around the regression line.\" <|split_2d|>\"Correlation applies equally well to quantitative and categorical data.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Correlations of +0.87 and -0.87 indicate the same degree of clustering around the regression line.",
      "incorrect_answers": [
        "The correlation coefficient and the slope of the regression line may have opposite signs.",
        "A correlation of 1 indicates a perfect cause-and-effect relationship between the variables.",
        "Correlation applies equally well to quantitative and categorical data."
      ]
    }
  },
  {
    "prompt": "A school board of a large school district is proposing a new dress code for students. Some students feel that this dress code unfairly targets female students. To see if there is a difference between boys and girls in their opposition to the new dress code, they conduct a poll of 60 randomly selected male and 70 randomly selected female high school students in the district. They find that 66 females oppose the dress code and 50 males oppose the dress code. Which of the following explains why a two-proportion z-test is not appropriate?<|start_2d|>\"The sample sizes are different.\" <|split_2d|>\"The sample sizes are too large.\" <|split_2d|>\"The number of successes and the number of failures for the two groups are not all large enough.\" <|split_2d|>\"The shapes of the population distributions are not known.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The number of successes and the number of failures for the two groups are not all large enough.",
      "incorrect_answers": [
        "The sample sizes are different.",
        "The sample sizes are too large.",
        "The shapes of the population distributions are not known."
      ]
    }
  },
  {
    "prompt": "A soft drink dispenser can be adjusted to deliver any fixed number of ounces. If the machine is operating with a standard deviation in delivery equal to 0.3 ounce, what should be the mean setting so that a 12-ounce cup will overflow less than 1% of the time? Assume a normal distribution for ounces delivered.<|start_2d|>\"12 - 0.99(0.3) ounces\" <|split_2d|>\"12 - 2.326(0.3) ounces\" <|split_2d|>\"12 - 2.576(0.3) ounces\" <|split_2d|>\"12 + 2.326(0.3) ounces\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "12 - 2.326(0.3) ounces",
      "incorrect_answers": [
        "12 - 0.99(0.3) ounces",
        "12 - 2.576(0.3) ounces",
        "12 + 2.326(0.3) ounces"
      ]
    }
  },
  {
    "prompt": "The average number of calories in Yum-Yum Good candy bars is 210, with a standard deviation of 10. If the number of calories per candy bar is normally distributed, what percent of candy bars contain more than 225 calories?<|start_2d|>\"66.80%\" <|split_2d|>\"47.70%\" <|split_2d|>\"43.30%\" <|split_2d|>\"6.68%\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "6.68%",
      "incorrect_answers": [
        "66.80%",
        "47.70%",
        "43.30%"
      ]
    }
  },
  {
    "prompt": "The mean daily demand for bread at a popular bakery is 2,500 loaves, with a standard deviation of 225 loaves. Every morning the bakery bakes 3,000 loaves. What is the probability that today it will run out of bread? Assume that the daily demand for bread at this bakery is normally distributed.<|start_2d|>\"0.8333\" <|split_2d|>\"0.1667\" <|split_2d|>\"0.9869\" <|split_2d|>\"0.0132\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.0132",
      "incorrect_answers": [
        "0.8333",
        "0.1667",
        "0.9869"
      ]
    }
  },
  {
    "prompt": "Two dice are rolled simultaneously. If both dice show 6, then the player wins $20; otherwise the player loses the game. It costs $2.00 to play the game. What is the expected gain or loss per game?<|start_2d|>\"The player will gain about $0.55.\" <|split_2d|>\"The player will gain about $1.44.\" <|split_2d|>\"The player will lose about $0.55.\" <|split_2d|>\"The player will lose about $1.44.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The player will lose about $1.44.",
      "incorrect_answers": [
        "The player will gain about $0.55.",
        "The player will gain about $1.44.",
        "The player will lose about $0.55."
      ]
    }
  },
  {
    "prompt": "Suppose you toss a fair die three times and it comes up an even number each time. Which of the following is a true statement?<|start_2d|>\"By the law of large numbers, the next toss is more likely to be an odd number than another even number.\" <|split_2d|>\"Based on the properties of conditional probability the next toss is more likely to be an even number given that three in a row have been even.\" <|split_2d|>\"Dice actually do have memories, and thus the number that comes up on the next toss will be influenced by the previous tosses.\" <|split_2d|>\"The probability that the next toss will again be even is 0.5.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The probability that the next toss will again be even is 0.5.",
      "incorrect_answers": [
        "By the law of large numbers, the next toss is more likely to be an odd number than another even number.",
        "Based on the properties of conditional probability the next toss is more likely to be an even number given that three in a row have been even.",
        "Dice actually do have memories, and thus the number that comes up on the next toss will be influenced by the previous tosses."
      ]
    }
  },
  {
    "prompt": "Taxicabs in a metropolitan area are driven an average of 75,000 miles per year with a standard deviation of 12,000 miles. What is the probability that a randomly selected cab has been driven less than 100,000 miles if it is known that it has been driven over 80,000 miles? Assume a normal distribution of miles per year among cabs.<|start_2d|>\"0.06\" <|split_2d|>\"0.34\" <|split_2d|>\"0.66\" <|split_2d|>\"0.94\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.94",
      "incorrect_answers": [
        "0.06",
        "0.34",
        "0.66"
      ]
    }
  },
  {
    "prompt": "The registrar\u2019s office at a university has noticed that a large number of students fail to report a change of address. The registrar decides to take a random sample of 150 students from the current directory of students and determine the number of students with the correct addresses on record. He then uses this information to construct a 95 percent confidence interval. Which of the following statements must be true?<|start_2d|>\"The true proportion of students at this university with the correct address on record is within the confidence interval constructed.\" <|split_2d|>\"Ninety-five percent of students at this university have the correct address on record.\" <|split_2d|>\"The true proportion of students at this university with the correct addresses on record is within the confidence interval 95 percent of the time.\" <|split_2d|>\"The sample proportion of students at this university with the correct address on record is within the confidence interval.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The sample proportion of students at this university with the correct address on record is within the confidence interval.",
      "incorrect_answers": [
        "The true proportion of students at this university with the correct address on record is within the confidence interval constructed.",
        "Ninety-five percent of students at this university have the correct address on record.",
        "The true proportion of students at this university with the correct addresses on record is within the confidence interval 95 percent of the time."
      ]
    }
  },
  {
    "prompt": "A botanist is interested in testing H0: \u03bc = 3.5 cm versus Ha: \u03bc > 3.5, where \u03bc = the mean petal length of one variety of flowers. A random sample of 50 petals gives significant results at a 5 percent level of significance. Which of the following statements about the confidence interval to estimate the mean petal length is true?<|start_2d|>\"The specified mean length of 3.5 cm is within a 90 percent confidence interval.\" <|split_2d|>\"The specified mean length of 3.5 cm is not within a 90 percent confidence interval.\" <|split_2d|>\"The specified mean length of 3.5 cm is below the lower limit of a 90 percent confidence interval.\" <|split_2d|>\"The specified mean length of 3.5 cm is below the lower limit of a 95 percent confidence interval.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The specified mean length of 3.5 cm is not within a 90 percent confidence interval.",
      "incorrect_answers": [
        "The specified mean length of 3.5 cm is within a 90 percent confidence interval.",
        "The specified mean length of 3.5 cm is below the lower limit of a 90 percent confidence interval.",
        "The specified mean length of 3.5 cm is below the lower limit of a 95 percent confidence interval."
      ]
    }
  },
  {
    "prompt": "A dentist has noticed that about two children in every seven whom he sees professionally develop cavities before they turn 10 years old. Last week he examined the teeth of five unrelated children younger than 10. Let X be the number of children who develop cavities before turning 10. Which of the following gives the probability that at least one will develop a cavity before turning 10?<|start_2d|>\"P(X = 2, 3, 4, 5, 6, 7)\" <|split_2d|>\"P(X = 2 out of 7)\" <|split_2d|>\"P(X = 1)\" <|split_2d|>\"1 \u2013 P(X = 0)\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "1 \u2013 P(X = 0)",
      "incorrect_answers": [
        "P(X = 2, 3, 4, 5, 6, 7)",
        "P(X = 2 out of 7)",
        "P(X = 1)"
      ]
    }
  },
  {
    "prompt": "Which of the following is NOT true of the \u03c72 probability distribution function?<|start_2d|>\"The area under the \u03c72 curve is 1.\" <|split_2d|>\"\u03c72 is defined only for nonnegative values of the variable.\" <|split_2d|>\"For small degrees of freedom, the curve displays strong right-skewness.\" <|split_2d|>\"For the same \u03b1, as the number of degrees of freedom increases, the critical value for the rejection region decreases.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "For the same \u03b1, as the number of degrees of freedom increases, the critical value for the rejection region decreases.",
      "incorrect_answers": [
        "The area under the \u03c72 curve is 1.",
        "\u03c72 is defined only for nonnegative values of the variable.",
        "For small degrees of freedom, the curve displays strong right-skewness."
      ]
    }
  },
  {
    "prompt": "The relation between the selling price of a car (in $1,000) and its age (in years) is estimated from a random sample of cars of a specific model. The relation is given by the following formula: Selling price = 15.9 \u2013 0.983 (age). Which of the following can we conclude from this equation?<|start_2d|>\"For every year the car gets older, the selling price goes down by approximately 9.83 percent.\" <|split_2d|>\"A new car costs on the average $9,830.\" <|split_2d|>\"For every year the car gets older, the selling price drops by approximately $1,590.\" <|split_2d|>\"For every year the car gets older, the selling price drops by approximately $983.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "For every year the car gets older, the selling price drops by approximately $983.",
      "incorrect_answers": [
        "For every year the car gets older, the selling price goes down by approximately 9.83 percent.",
        "A new car costs on the average $9,830.",
        "For every year the car gets older, the selling price drops by approximately $1,590."
      ]
    }
  },
  {
    "prompt": "Consider a hypothesis test with H0 : \u03bc = 70 and Ha : \u03bc < 70. Which of the following choices of significance level and sample size results in the greatest power of the test when \u03bc = 65?<|start_2d|>\"\u03b1 = 0.05, n = 15\" <|split_2d|>\"\u03b1 = 0.01, n = 15\" <|split_2d|>\"\u03b1 = 0.05, n = 30\" <|split_2d|>\"\u03b1 = 0.01, n = 30\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "\u03b1 = 0.05, n = 30",
      "incorrect_answers": [
        "\u03b1 = 0.05, n = 15",
        "\u03b1 = 0.01, n = 15",
        "\u03b1 = 0.01, n = 30"
      ]
    }
  },
  {
    "prompt": "A farmer who raises hens for egg production wants his eggs to have a mean mass of 56 grams. He is considering the purchase of a different type of hen, so he took a random sample of 18 eggs laid by this type of hen. The distribution of the masses is symmetric and mound-shaped with a mean of 54.1 grams and no outliers. The farmer conducted a t-test to see if there is evidence that the eggs from these hens have a mean mass that is different from 56 g and got a test statistic of t = -1.973. If he uses a 5% significance level, which is the correct conclusion and reason?<|start_2d|>\"Because t is more extreme than \u00b11.96, he should reject the null hypothesis. He has convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams.\" <|split_2d|>\"Because t is less extreme than the critical value of t for 17 degrees of freedom, he should not reject the null hypothesis. He does not have convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams.\" <|split_2d|>\"Because t is less extreme than the critical value of t for 18 degrees of freedom, he should not reject the null hypothesis. He does not have convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams.\" <|split_2d|>\"Because t is more extreme than the critical value of t for 18 degrees of freedom, he should reject the null hypothesis. He has convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Because t is less extreme than the critical value of t for 17 degrees of freedom, he should not reject the null hypothesis. He does not have convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams.",
      "incorrect_answers": [
        "Because t is more extreme than \u00b11.96, he should reject the null hypothesis. He has convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams.",
        "Because t is less extreme than the critical value of t for 18 degrees of freedom, he should not reject the null hypothesis. He does not have convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams.",
        "Because t is more extreme than the critical value of t for 18 degrees of freedom, he should reject the null hypothesis. He has convincing evidence at the 5% significance level that the mean mass of eggs from these hens is different from 56 grams."
      ]
    }
  },
  {
    "prompt": "A school committee member is lobbying for an increase in the gasoline tax to support the county school system. The local newspaper conducted a survey of county residents to assess their support for such an increase. What is the population of interest here?<|start_2d|>\"All school-aged children\" <|split_2d|>\"All county residents\" <|split_2d|>\"All county residents with school-aged children\" <|split_2d|>\"All county residents with children in the county school system\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "All county residents",
      "incorrect_answers": [
        "All school-aged children",
        "All county residents with school-aged children",
        "All county residents with children in the county school system"
      ]
    }
  },
  {
    "prompt": "To survey the opinions of the students at your high school, a researcher plans to select every twenty-fifth student entering the school in the morning. Assuming there are no absences, will this result in a simple random sample of students attending your school?<|start_2d|>\"Yes, because every student has the same chance of being selected.\" <|split_2d|>\"Yes, but only if there is a single entrance to the school.\" <|split_2d|>\"Yes, because the 24 out of every 25 students who are not selected will form a control group.\" <|split_2d|>\"No, because not every sample of the intended size has an equal chance of being selected.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "No, because not every sample of the intended size has an equal chance of being selected.",
      "incorrect_answers": [
        "Yes, because every student has the same chance of being selected.",
        "Yes, but only if there is a single entrance to the school.",
        "Yes, because the 24 out of every 25 students who are not selected will form a control group."
      ]
    }
  },
  {
    "prompt": "Which of the following is a true statement?<|start_2d|>\"While properly designed experiments can strongly suggest cause-and-effect relationships, a complete census is the only way of establishing such a relationship.\" <|split_2d|>\"If properly designed, observational studies can establish cause-and-effect relationships just as strongly as properly designed experiments.\" <|split_2d|>\"Controlled experiments are often undertaken later to establish cause-and-effect relationships first suggested by observational studies.\" <|split_2d|>\"A useful approach to overcome bias in observational studies is to increase the sample size.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Controlled experiments are often undertaken later to establish cause-and-effect relationships first suggested by observational studies.",
      "incorrect_answers": [
        "While properly designed experiments can strongly suggest cause-and-effect relationships, a complete census is the only way of establishing such a relationship.",
        "If properly designed, observational studies can establish cause-and-effect relationships just as strongly as properly designed experiments.",
        "A useful approach to overcome bias in observational studies is to increase the sample size."
      ]
    }
  },
  {
    "prompt": "An automobile service station performs oil changes and tire replacements, as well as other services. Sixty percent of its customers request an oil change, 30 percent request tire replacements, and 10 percent request both. A customer requests an oil change. What is the probability this customer does not request tire replacements?<|start_2d|>\"0.42\" <|split_2d|>\"0.5\" <|split_2d|>\"0.7\" <|split_2d|>\"0.833\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.833",
      "incorrect_answers": [
        "0.42",
        "0.5",
        "0.7"
      ]
    }
  },
  {
    "prompt": "Which of the following statements is false?<|start_2d|>\"Like the normal distribution, the t-distributions are symmetric.\" <|split_2d|>\"The t-distributions are lower at the mean and higher at the tails, and so are more spread out than the normal distribution.\" <|split_2d|>\"The greater the df, the closer the t-distributions are to the normal distribution.\" <|split_2d|>\"The smaller the df, the better the 68-95-99.7 Rule works for t-models.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The smaller the df, the better the 68-95-99.7 Rule works for t-models.",
      "incorrect_answers": [
        "Like the normal distribution, the t-distributions are symmetric.",
        "The t-distributions are lower at the mean and higher at the tails, and so are more spread out than the normal distribution.",
        "The greater the df, the closer the t-distributions are to the normal distribution."
      ]
    }
  },
  {
    "prompt": "A kidney dialysis center periodically checks a sample of its equipment and performs a major recalibration if readings are sufficiently off target. Similarly, a fabric factory periodically checks the sizes of towels coming off an assembly line and halts production if measurements are sufficiently off target. In both situations, we have the null hypothesis that the equipment is performing satisfactorily. For each situation, which is the more serious concern, a Type I or Type II error?<|start_2d|>\"Dialysis center: Type I error, towel manufacturer: Type I error\" <|split_2d|>\"Dialysis center: Type I error, towel manufacturer: Type II error\" <|split_2d|>\"Dialysis center: Type II error, towel manufacturer: Type I error\" <|split_2d|>\"Dialysis center: Type II error, towel manufacturer: Type II error\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Dialysis center: Type II error, towel manufacturer: Type I error",
      "incorrect_answers": [
        "Dialysis center: Type I error, towel manufacturer: Type I error",
        "Dialysis center: Type I error, towel manufacturer: Type II error",
        "Dialysis center: Type II error, towel manufacturer: Type II error"
      ]
    }
  },
  {
    "prompt": "A company with offices in five different countries is interested in estimating the proportion of its employees in favor of banning smoking on the office premises. It is known that the views of people from different countries on issues like smoking tend to vary due to the influence of different local social structures. Which of the following is an appropriate sampling technique to use in this situation, and why?<|start_2d|>\"Stratified sample, because the population is divided into five strata\u2014namely, five offices in five different countries\" <|split_2d|>\"Cluster sample, because the population is divided into five clusters\u2014namely, five offices in five different countries\" <|split_2d|>\"Simple random sample, because this is the simplest sampling method\" <|split_2d|>\"Simple random sample, because this is the only method that gives unbiased results\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Stratified sample, because the population is divided into five strata\u2014namely, five offices in five different countries",
      "incorrect_answers": [
        "Cluster sample, because the population is divided into five clusters\u2014namely, five offices in five different countries",
        "Simple random sample, because this is the simplest sampling method",
        "Simple random sample, because this is the only method that gives unbiased results"
      ]
    }
  },
  {
    "prompt": "Semir rolls a six-sided die every morning to determine what he will have for breakfast. If he rolls a 1 or 2, he takes time to cook himself a big breakfast. If he rolls a 3 or larger he grabs a quick lighter breakfast. When he cooks himself a big breakfast, there is a 15% chance he will be late for school. If he has a lighter breakfast, there is a 6% chance he will be late for school. What is the probability Semir will be on time for school any given day?<|start_2d|>\"0.09\" <|split_2d|>\"0.21\" <|split_2d|>\"0.8\" <|split_2d|>\"0.91\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.91",
      "incorrect_answers": [
        "0.09",
        "0.21",
        "0.8"
      ]
    }
  },
  {
    "prompt": "A teacher's union and a school district are negotiating salaries for the coming year. The teachers want more money, and the district, claiming, as always, budget constraints, wants to pay as little as possible. The district, like most, has a large number of moderately paid teachers and a few highly paid administrators. The salaries of all teachers and administrators are included in trying to figure out, on average, how much the professional staff currently earn. Which of the following would the teachers' union be most likely to quote during negotiations?<|start_2d|>\"The mean of all the salaries.\" <|split_2d|>\"The mode of all the salaries.\" <|split_2d|>\"The standard deviation of all the salaries.\" <|split_2d|>\"The median of all the salaries.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The median of all the salaries.",
      "incorrect_answers": [
        "The mean of all the salaries.",
        "The mode of all the salaries.",
        "The standard deviation of all the salaries."
      ]
    }
  },
  {
    "prompt": "The Hardcore Construction Company has two offices, one in Atlanta and one in New Orleans. Fifteen engineers work in the Atlanta office, and 14 engineers work in the New Orleans office. The business manager decided to use a 2-sample t-test to compare the mean salaries of engineers in the two offices. Because there were only 15 engineers in one office and 14 engineers in the other, he used the salaries of all the engineers in the computation. Is the 2-sample t-test an appropriate inferential technique in this situation?<|start_2d|>\"Yes, because he is comparing the means of two small groups.\" <|split_2d|>\"Yes. Both Atlanta and New Orleans are large cities, so the salaries are comparable.\" <|split_2d|>\"Yes. Because Atlanta and New Orleans are about 500 miles apart, the two groups of engineers can be assumed to be independent.\" <|split_2d|>\"No, because the entire population information was used from both offices. Because no samples were taken, a t-test should not be used.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "No, because the entire population information was used from both offices. Because no samples were taken, a t-test should not be used.",
      "incorrect_answers": [
        "Yes, because he is comparing the means of two small groups.",
        "Yes. Both Atlanta and New Orleans are large cities, so the salaries are comparable.",
        "Yes. Because Atlanta and New Orleans are about 500 miles apart, the two groups of engineers can be assumed to be independent."
      ]
    }
  },
  {
    "prompt": "For the regression line, which of the following statements about residuals is true?<|start_2d|>\"Influential scores have large residuals.\" <|split_2d|>\"If the linear model is good, the number of positive residuals will be the same as the number of negative residuals.\" <|split_2d|>\"The mean of the residuals is always zero.\" <|split_2d|>\"If the correlation is 0, there will be a distinct pattern in the residual plot.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The mean of the residuals is always zero.",
      "incorrect_answers": [
        "Influential scores have large residuals.",
        "If the linear model is good, the number of positive residuals will be the same as the number of negative residuals.",
        "If the correlation is 0, there will be a distinct pattern in the residual plot."
      ]
    }
  },
  {
    "prompt": "A reading specialist in a large public school system believes that the more time students spend reading, the better they will do in school. She plans a middle school experiment in which an SRS of 30 eighth graders will be assigned four extra hours of reading per week, an SRS of 30 seventh graders will be assigned two extra hours of reading per week, and an SRS of 30 sixth graders with no extra assigned reading will be a control group. After one school year, the mean GPAs from each group will be compared. Is this a good experimental design?<|start_2d|>\"Yes.\" <|split_2d|>\"No, because while this design may point out an association between reading and GPA, it cannot establish a cause-and-effect relationship.\" <|split_2d|>\"No, because without blinding, there is a strong chance of a placebo effect.\" <|split_2d|>\"No, because grade level is a lurking variable which may well be confounded with the variables under consideration.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "No, because grade level is a lurking variable which may well be confounded with the variables under consideration.",
      "incorrect_answers": [
        "Yes.",
        "No, because while this design may point out an association between reading and GPA, it cannot establish a cause-and-effect relationship.",
        "No, because without blinding, there is a strong chance of a placebo effect."
      ]
    }
  },
  {
    "prompt": "What is a sampling distribution?<|start_2d|>\"A distribution of all the statistics that can be found in a given sample\" <|split_2d|>\"A histogram, or other such visual representation, showing the distribution of a sample\" <|split_2d|>\"A normal distribution of some statistic\" <|split_2d|>\"A distribution of all the values taken by a statistic from all possible samples of a given size\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "A distribution of all the values taken by a statistic from all possible samples of a given size",
      "incorrect_answers": [
        "A distribution of all the statistics that can be found in a given sample",
        "A histogram, or other such visual representation, showing the distribution of a sample",
        "A normal distribution of some statistic"
      ]
    }
  },
  {
    "prompt": "A farmer wants to know whether a new fertilizer has increased the mean weight of his apples. With the old fertilizer, the mean weight was 4.0 ounces per apple. The farmer decides to test H0: \u03bc = 4.0 ounces versus Ha : \u03bc > 4.0 ounces, at a 5 percent level of significance, where \u03bc = the mean weight of apples using the new fertilizer. The weights of apples are approximately normally distributed. The farmer takes a random sample of 16 apples and computes a mean of 4.3 ounces and a standard deviation of 0.6 ounces. Which of the following gives the p-value for this test?<|start_2d|>\"P(Z > 2)\" <|split_2d|>\"P(Z < 2)\" <|split_2d|>\"P(t > 2) with 15 degrees of freedom\" <|split_2d|>\"P(t < 2) with 15 degrees of freedom\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "P(t > 2) with 15 degrees of freedom",
      "incorrect_answers": [
        "P(Z > 2)",
        "P(Z < 2)",
        "P(t < 2) with 15 degrees of freedom"
      ]
    }
  },
  {
    "prompt": "In a study on the effect of music on worker productivity, employees were told that a different genre of background music would be played each day and the corresponding production outputs noted. Every change in music resulted in an increase in production. This is an example of<|start_2d|>\"the effect of a treatment unit.\" <|split_2d|>\"the placebo effect.\" <|split_2d|>\"the control group effect.\" <|split_2d|>\"sampling error.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "the placebo effect.",
      "incorrect_answers": [
        "the effect of a treatment unit.",
        "the control group effect.",
        "sampling error."
      ]
    }
  },
  {
    "prompt": "In a clinical trial, 30 sickle cell anemia patients are randomly assigned to two groups. One group receives the currently marketed medicine, and the other group receives an experimental medicine. Each week, patients report to the clinic where blood tests are conducted. The lab technician is unaware of the kind of medicine the patient is taking. This design can be described as<|start_2d|>\"a completely randomized design, with the currently marketed medicine and the experimental medicine as two treatments\" <|split_2d|>\"a matched-pairs design, with the currently marketed medicine and the experimental medicine forming a pair\" <|split_2d|>\"a randomized block design, with the currently marketed medicine and the experimental medicine as two blocks\" <|split_2d|>\"a randomized block design, with the currently marketed medicine and the experimental medicine as two treatments\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "a completely randomized design, with the currently marketed medicine and the experimental medicine as two treatments",
      "incorrect_answers": [
        "a matched-pairs design, with the currently marketed medicine and the experimental medicine forming a pair",
        "a randomized block design, with the currently marketed medicine and the experimental medicine as two blocks",
        "a randomized block design, with the currently marketed medicine and the experimental medicine as two treatments"
      ]
    }
  },
  {
    "prompt": "Suppose that the weights of trucks traveling on the interstate highway system are normally distributed. If 70% of the trucks weigh more than 12,000 pounds and 80% weigh more than 10,000 pounds, what are the mean and standard deviation for the weights of trucks traveling on the interstate system?<|start_2d|>\"\u03bc = 14,900; \u03c3 = 6100\" <|split_2d|>\"\u03bc = 15,100; \u03c3 = 6200\" <|split_2d|>\"\u03bc = 15,300; \u03c3 = 6300\" <|split_2d|>\"\u03bc = 15,500; \u03c3 = 6400\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "\u03bc = 15,300; \u03c3 = 6300",
      "incorrect_answers": [
        "\u03bc = 14,900; \u03c3 = 6100",
        "\u03bc = 15,100; \u03c3 = 6200",
        "\u03bc = 15,500; \u03c3 = 6400"
      ]
    }
  },
  {
    "prompt": "An insurance agent is successful in selling a life insurance policy to 20 percent of the customers he contacts. He decides to construct a simulation to estimate the mean number of customers he needs to contact before being able to sell a policy. Which of the following schemes should he use to do the simulation?<|start_2d|>\"Assign numbers 0, 1 to successfully selling a policy to a customer and numbers 2, 3, 4, 5, 6, 7, 8, 9 to failing to sell a policy to a customer.\" <|split_2d|>\"Assign numbers 0, 1 to successfully selling a policy to a customer and numbers 2, 3, 4 to failing to sell a policy to a customer.\" <|split_2d|>\"Assign number 0 to successfully selling a policy to a customer and number 1 to failing to sell a policy to a customer.\" <|split_2d|>\"Assign numbers 0, 1, 2, 3, 4 to successfully selling a policy to a customer and numbers 5, 6, 7, 8, 9 to failing to sell a policy to a customer.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Assign numbers 0, 1 to successfully selling a policy to a customer and numbers 2, 3, 4, 5, 6, 7, 8, 9 to failing to sell a policy to a customer.",
      "incorrect_answers": [
        "Assign numbers 0, 1 to successfully selling a policy to a customer and numbers 2, 3, 4 to failing to sell a policy to a customer.",
        "Assign number 0 to successfully selling a policy to a customer and number 1 to failing to sell a policy to a customer.",
        "Assign numbers 0, 1, 2, 3, 4 to successfully selling a policy to a customer and numbers 5, 6, 7, 8, 9 to failing to sell a policy to a customer."
      ]
    }
  },
  {
    "prompt": "The midterm scores for a statistics course were approximately normally distributed with a mean of 52 points and a standard deviation of 4 points. The final exam scores were approximately normally distributed with a mean of 112 points and a standard deviation of 10 points. One student had a score of 58 points on the midterm. If she had the same standardized score (z-score) on the final exam, what must her score have been?<|start_2d|>\"15 points\" <|split_2d|>\"58 points\" <|split_2d|>\"118 points\" <|split_2d|>\"127 points\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "127 points",
      "incorrect_answers": [
        "15 points",
        "58 points",
        "118 points"
      ]
    }
  },
  {
    "prompt": "The sampling distribution of the sample mean is close to the normal distribution<|start_2d|>\"only if both the original population has a normal distribution and n is large.\" <|split_2d|>\"if the standard deviation of the original population is known.\" <|split_2d|>\"if n is large, no matter what the distribution of the original population.\" <|split_2d|>\"no matter what the value of n or what the distribution of the original population.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "if n is large, no matter what the distribution of the original population.",
      "incorrect_answers": [
        "only if both the original population has a normal distribution and n is large.",
        "if the standard deviation of the original population is known.",
        "no matter what the value of n or what the distribution of the original population."
      ]
    }
  },
  {
    "prompt": "Suppose H0: p = 0.6, and the power of the test for Ha: p = 0.7 is 0.8. Which of the following is a valid conclusion?<|start_2d|>\"The probability of committing a Type I error is 0.1.\" <|split_2d|>\"If Ha is true, the probability of failing to reject H0 is 0.2.\" <|split_2d|>\"The probability of committing a Type II error is 0.3.\" <|split_2d|>\"All of the above are valid conclusions.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "If Ha is true, the probability of failing to reject H0 is 0.2.",
      "incorrect_answers": [
        "The probability of committing a Type I error is 0.1.",
        "The probability of committing a Type II error is 0.3.",
        "All of the above are valid conclusions."
      ]
    }
  },
  {
    "prompt": "In one metropolitan region, technical writers have an annual salary that is approximately normally distributed with a mean of $55,800. The first quartile of salaries is $48,815. What is the standard deviation?<|start_2d|>\"$6,984 \" <|split_2d|>\"$10,356 \" <|split_2d|>\"$10,476 \" <|split_2d|>\"$13,968 \" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "$10,356 ",
      "incorrect_answers": [
        "$6,984 ",
        "$10,476 ",
        "$13,968 "
      ]
    }
  },
  {
    "prompt": "The amount of rainfall per month in a certain city is approximately normally distributed, with a mean of six inches and a standard deviation of 1.6 inches. Which of the following is the highest amount of rainfall, in inches, this city could have this month for the month to be among the 10percent driest months the city has seen?<|start_2d|>\"8.05\" <|split_2d|>\"7.6\" <|split_2d|>\"3.95\" <|split_2d|>\"3.37\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "3.95",
      "incorrect_answers": [
        "8.05",
        "7.6",
        "3.37"
      ]
    }
  },
  {
    "prompt": "Which of the following is most useful in establishing cause-and-effect relationships?<|start_2d|>\"A complete census\" <|split_2d|>\"A least squares regression line showing high correlation\" <|split_2d|>\"A simple random sample (SRS)\" <|split_2d|>\"An experiment\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "An experiment",
      "incorrect_answers": [
        "A complete census",
        "A least squares regression line showing high correlation",
        "A simple random sample (SRS)"
      ]
    }
  },
  {
    "prompt": "A sports talk show asks people to call in and give their opinion of the officiating in the local basketball team's most recent loss. What will most likely be the typical reaction?<|start_2d|>\"They will most likely feel that the officiating could have been better, but that it was the team's poor play, not the officiating, that was primarily responsible for the loss.\" <|split_2d|>\"They would most likely call for the team to get some new players to replace the current ones.\" <|split_2d|>\"The team probably wouldn't have lost if the officials had been doing their job.\" <|split_2d|>\"Because the team had been foul-plagued all year, the callers would most likely support the officials.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The team probably wouldn't have lost if the officials had been doing their job.",
      "incorrect_answers": [
        "They will most likely feel that the officiating could have been better, but that it was the team's poor play, not the officiating, that was primarily responsible for the loss.",
        "They would most likely call for the team to get some new players to replace the current ones.",
        "Because the team had been foul-plagued all year, the callers would most likely support the officials."
      ]
    }
  },
  {
    "prompt": "Alfred and Ben don't know each other but are each considering asking the lovely Charlene to the school prom. The probability that at least one of them will ask her is 0.72. The probability that they both ask her is 0.18. The probability that Alfred asks her is 0.6. What is the probability that Ben asks Charlene to the prom?<|start_2d|>\"0.78\" <|split_2d|>\"0.3\" <|split_2d|>\"0.24\" <|split_2d|>\"0.48\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.3",
      "incorrect_answers": [
        "0.78",
        "0.24",
        "0.48"
      ]
    }
  },
  {
    "prompt": "A random sample of 25 men and a separate random sample of 25 women are selected to answer questions about attitudes toward abortion. The answers were categorized as \"pro-life\" or \"pro-choice.\" Which of the following is the proper null hypothesis for this situation?<|start_2d|>\"The variables 'gender' and 'attitude toward abortion' are related.\" <|split_2d|>\"The proportion of 'pro-life' men is the same as the proportion of 'pro-life' women.\" <|split_2d|>\"The proportion of 'pro-life' men is related to the proportion of 'pro-life' women.\" <|split_2d|>\"The proportion of 'pro-choice' men is the same as the proportion of 'pro-life' women.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The proportion of \"pro-life\" men is the same as the proportion of \"pro-life\" women.",
      "incorrect_answers": [
        "The variables \"gender\" and \"attitude toward abortion\" are related.",
        "The proportion of \"pro-life\" men is related to the proportion of \"pro-life\" women.",
        "The proportion of \"pro-choice\" men is the same as the proportion of \"pro-life\" women."
      ]
    }
  },
  {
    "prompt": "A 2008 survey of 500 households concluded that 82% of the population uses grocery coupons. Which of the following best describes what is meant by the poll having a margin of error of 3%?<|start_2d|>\"Three percent of those surveyed refused to participate in the poll.\" <|split_2d|>\"It would not be unexpected for 3% of the population to begin using coupons or stop using coupons.\" <|split_2d|>\"Between 395 and 425 of the 500 households surveyed responded that they used grocery coupons.\" <|split_2d|>\"It is likely that between 79% and 85% of the population use grocery coupons.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "It is likely that between 79% and 85% of the population use grocery coupons.",
      "incorrect_answers": [
        "Three percent of those surveyed refused to participate in the poll.",
        "It would not be unexpected for 3% of the population to begin using coupons or stop using coupons.",
        "Between 395 and 425 of the 500 households surveyed responded that they used grocery coupons."
      ]
    }
  },
  {
    "prompt": "A hypothesis test is set up so that P(rejecting H0 when H0 is true) = 0.05 and P(failing to reject H0 when H0 is false) = 0.26. What is the power of the test?<|start_2d|>\"0.26\" <|split_2d|>\"0.05\" <|split_2d|>\"0.95\" <|split_2d|>\"0.74\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.74",
      "incorrect_answers": [
        "0.26",
        "0.05",
        "0.95"
      ]
    }
  },
  {
    "prompt": "For a class project, Charlotte recorded the heights of all 28 students in her class and calculated several statistics. She then realized she made an error recording the height of the tallest person in the class. She correctly had him listed as the tallest, but needed to add two inches to his recorded height to correct it. Which of these measures of spread must remain unchanged?<|start_2d|>\"Mean absolute deviation\" <|split_2d|>\"Standard deviation\" <|split_2d|>\"Variance\" <|split_2d|>\"Interquartile range\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Interquartile range",
      "incorrect_answers": [
        "Mean absolute deviation",
        "Standard deviation",
        "Variance"
      ]
    }
  },
  {
    "prompt": "The main purpose of blocking in an experiment is to:<|start_2d|>\"reduce bias.\" <|split_2d|>\"reduce confounding.\" <|split_2d|>\"reduce variation within treatments.\" <|split_2d|>\"reduce variation between treatments.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "reduce variation within treatments.",
      "incorrect_answers": [
        "reduce bias.",
        "reduce confounding.",
        "reduce variation between treatments."
      ]
    }
  },
  {
    "prompt": "A manufacturer of motor oil is interested in testing the effects of a newly developed additive on the lifespan of an engine. Twenty-five different engine types are selected at random and each one is tested using oil with the additive and oil without the additive. What type of analysis will yield the most useful information?<|start_2d|>\"Matched pairs comparison of population proportions\" <|split_2d|>\"Matched pairs comparison of population means\" <|split_2d|>\"Independent samples comparison of population proportions\" <|split_2d|>\"Independent samples comparison of population means\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Matched pairs comparison of population means",
      "incorrect_answers": [
        "Matched pairs comparison of population proportions",
        "Independent samples comparison of population proportions",
        "Independent samples comparison of population means"
      ]
    }
  },
  {
    "prompt": "To determine the average cost of running for a congressional seat, a simple random sample of 50 politicians is chosen and the politicians' records examined. The cost figures show a mean of $125,000 with a standard deviation of $32,000. Which of the following is the best interpretation of a 90% confidence interval estimate for the average cost of running for office?<|start_2d|>\"90% of politicians running for a congressional seat spend between $117,500 and $132,500.\" <|split_2d|>\"90% of politicians running for a congressional seat spend a mean dollar amount that is between $117,500 and $132,500.\" <|split_2d|>\"We are 90% confident that politicians running for a congressional seat spend between $117,500 and $132,500.\" <|split_2d|>\"We are 90% confident that politicians running for a congressional seat spend a mean dollar amount between $117,500 and $132,500.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "We are 90% confident that politicians running for a congressional seat spend a mean dollar amount between $117,500 and $132,500.",
      "incorrect_answers": [
        "90% of politicians running for a congressional seat spend between $117,500 and $132,500.",
        "90% of politicians running for a congressional seat spend a mean dollar amount that is between $117,500 and $132,500.",
        "We are 90% confident that politicians running for a congressional seat spend between $117,500 and $132,500."
      ]
    }
  },
  {
    "prompt": "A filling machine puts an average of four ounces of coffee in jars, with a standard deviation of 0.25 ounces. Forty jars filled by this machine are selected at random. What is the probability that the mean amount per jar filled in the sampled jars is less than 3.9 ounces?<|start_2d|>\"0.0057\" <|split_2d|>\"0.0225\" <|split_2d|>\"0.025\" <|split_2d|>\"0.05\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.0057",
      "incorrect_answers": [
        "0.0225",
        "0.025",
        "0.05"
      ]
    }
  },
  {
    "prompt": "A town has one high school, which buses students from urban, suburban, and rural communities. Which of the following samples is recommended in studying attitudes toward tracking of students in honors, regular, and below-grade classes?<|start_2d|>\"Convenience sample\" <|split_2d|>\"Simple random sample (SRS)\" <|split_2d|>\"Stratified sample\" <|split_2d|>\"Systematic sample\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Stratified sample",
      "incorrect_answers": [
        "Convenience sample",
        "Simple random sample (SRS)",
        "Systematic sample"
      ]
    }
  },
  {
    "prompt": "A shipment of resistors has an average resistance of 200 ohms with a standard deviation of 5 ohms, and the resistances are normally distributed. Suppose a randomly chosen resistor has a resistance under 194 ohms. What is the probability that its resistance is greater than 188 ohms?<|start_2d|>\"0.07\" <|split_2d|>\"0.12\" <|split_2d|>\"0.50\" <|split_2d|>\"0.93\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.93",
      "incorrect_answers": [
        "0.07",
        "0.12",
        "0.50"
      ]
    }
  },
  {
    "prompt": "A major polling organization wants to predict the outcome of an upcoming national election (in terms of the proportion of voters who will vote for each candidate). They intend to use a 95% confidence interval with margin of error of no more than 2.5%. What is the minimum sample size needed to accomplish this goal?<|start_2d|>\"1536\" <|split_2d|>\"39\" <|split_2d|>\"1537\" <|split_2d|>\"40\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "1537",
      "incorrect_answers": [
        "1536",
        "39",
        "40"
      ]
    }
  },
  {
    "prompt": "Given that the sample has a standard deviation of zero, which of the following is a true statement?<|start_2d|>\"The standard deviation of the population is also zero.\" <|split_2d|>\"The sample mean and sample median are equal.\" <|split_2d|>\"The sample may have outliers.\" <|split_2d|>\"The population has a symmetric distribution.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The sample mean and sample median are equal.",
      "incorrect_answers": [
        "The standard deviation of the population is also zero.",
        "The sample may have outliers.",
        "The population has a symmetric distribution."
      ]
    }
  },
  {
    "prompt": "The number of customers served per day by a large department store is normally distributed, with a mean of 3,250 customers and a standard deviation of 320. Find the range of customers served on the middle 50 percent of days.<|start_2d|>\"(3,034, 3,466)\" <|split_2d|>\"(2,930, 3,570)\" <|split_2d|>\"(2,610, 3,890)\" <|split_2d|>\"(2,450, 4,050)\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "(3,034, 3,466)",
      "incorrect_answers": [
        "(2,930, 3,570)",
        "(2,610, 3,890)",
        "(2,450, 4,050)"
      ]
    }
  },
  {
    "prompt": "Suppose that 54% of the graduates from your high school go on to 4-year colleges, 20% go on to 2-year colleges, 19% find employment, and the remaining 7% search for a job. If a randomly selected student is not going on to a 2-year college, what is the probability she will be going on to a 4-year college?<|start_2d|>\"0.460\" <|split_2d|>\"0.540\" <|split_2d|>\"0.630\" <|split_2d|>\"0.675\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.675",
      "incorrect_answers": [
        "0.460",
        "0.540",
        "0.630"
      ]
    }
  },
  {
    "prompt": "The Department of Health plans to test the lead level in a specific park. Because a high lead level is harmful to children, the park will be closed if the lead level exceeds the allowed limit. The department randomly selects several locations in the park, gets soil samples from those locations, and tests the samples for their lead levels. Which of the following decisions would result from the type I error?<|start_2d|>\"Closing the park when the lead levels are within the allowed limit\" <|split_2d|>\"Keeping the park open when the lead levels are in excess of the allowed limit\" <|split_2d|>\"Closing the park when the lead levels are in excess of the allowed limit\" <|split_2d|>\"Keeping the park open when the lead levels are within the allowed limit\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Closing the park when the lead levels are within the allowed limit",
      "incorrect_answers": [
        "Keeping the park open when the lead levels are in excess of the allowed limit",
        "Closing the park when the lead levels are in excess of the allowed limit",
        "Keeping the park open when the lead levels are within the allowed limit"
      ]
    }
  },
  {
    "prompt": "For a class project, a student wants to see if boys and girls at their large high school differ in the number of contacts they have stored in their phone. The student conducts a survey of 50 randomly sampled boys and 40 randomly selected girls, and asks them to report the number of contacts. Which of the following is true about this situation?<|start_2d|>\"Because the population standard deviations are not known and conditions are met, the student should use a two-sample t-test.\" <|split_2d|>\"Because the sample sizes are different, the student should not use a two-sample t-test.\" <|split_2d|>\"Because the sample sizes are both greater than 30, the student should not use a two-sample t-test.\" <|split_2d|>\"Because the shape of the population distribution is not known, the student should not use a two-sample t-test.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Because the shape of the population distribution is not known, the student should not use a two-sample t-test.",
      "incorrect_answers": [
        "Because the population standard deviations are not known and conditions are met, the student should use a two-sample t-test.",
        "Because the sample sizes are different, the student should not use a two-sample t-test.",
        "Because the sample sizes are both greater than 30, the student should not use a two-sample t-test."
    }
  },
  {
    "prompt": "The probability is 0.2 that a value selected at random from a normal distribution with mean 600 and standard deviation 15 will be above what number?<|start_2d|>\"0.84\" <|split_2d|>\"603.8\" <|split_2d|>\"612.6\" <|split_2d|>\"587.4\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "612.6",
      "incorrect_answers": [
        "0.84",
        "603.8",
        "587.4"
      ]
    }
  },
  {
    "prompt": "In a group of 10 scores, the largest score is increased by 40 points. What will happen to the mean?<|start_2d|>\"It will remain the same.\" <|split_2d|>\"It will increase by 4 points.\" <|split_2d|>\"It will increase by 10 points.\" <|split_2d|>\"It will increase by 40 points.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "It will increase by 4 points.",
      "incorrect_answers": [
        "It will remain the same.",
        "It will increase by 10 points.",
        "It will increase by 40 points."
      ]
    }
  },
  {
    "prompt": "The distribution of salaries of a county school system with 4,752 employees is known to be right skewed, with the superintendent\u2019s salary an outlier on the higher side. A random sample of 20 employees was taken and their salaries recorded. A 95 percent t-confidence interval for the mean salary of the county school system employees is ($15,360, $32,470). The t-confidence interval is not appropriate in this situation because<|start_2d|>\"the sample size is too small compared to the number of employees in the system to give a fair representation\" <|split_2d|>\"the skewed salary distribution tells us that assumption of normality of the sampled population will not be satisfied\" <|split_2d|>\"the population standard deviation of the salaries of employees of this county school system is not known\" <|split_2d|>\"the teacher salaries are not negotiable, but the superintendent\u2019s salary is negotiable, so the superintendent\u2019s salary should be excluded from the population sampled\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "the teacher salaries are not negotiable, but the superintendent\u2019s salary is negotiable, so the superintendent\u2019s salary should be excluded from the population sampled",
      "incorrect_answers": [
        "the sample size is too small compared to the number of employees in the system to give a fair representation",
        "the skewed salary distribution tells us that assumption of normality of the sampled population will not be satisfied",
        "the population standard deviation of the salaries of employees of this county school system is not known"
      ]
    }
  },
  {
    "prompt": "Assume that the masses of chicken eggs are normally distributed with a mean of 45 g and a standard deviation of 4 g. What mass of egg would be the 25th percentile of the masses of all the eggs?<|start_2d|>\"42.2 g\" <|split_2d|>\"42.3 g\" <|split_2d|>\"42.4 g\" <|split_2d|>\"42.5 g\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "42.3 g",
      "incorrect_answers": [
        "42.2 g",
        "42.4 g",
        "42.5 g"
      ]
    }
  },
  {
    "prompt": "A recent poll reported that 43% of Americans approve of the job the president is doing, with a margin of sampling error of \u00b13.2% at a 95% level of confidence. Which of these correctly interprets that margin of error?<|start_2d|>\"There is a 3.2% chance that the proportion of all Americans who approve of the job the president is doing is not 43%.\" <|split_2d|>\"There is a 95% chance that the proportion of all Americans who approve of the job the president is doing is 43%.\" <|split_2d|>\"About 95% of polls conducted in this way will find that between 39.8% and 46.2% of those sampled approve of the job the president is doing.\" <|split_2d|>\"About 95% of polls conducted in this way will give a sample proportion within 3.2 percentage points of the actual proportion of all Americans who approve of the job the president is doing.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "About 95% of polls conducted in this way will give a sample proportion within 3.2 percentage points of the actual proportion of all Americans who approve of the job the president is doing.",
      "incorrect_answers": [
        "There is a 3.2% chance that the proportion of all Americans who approve of the job the president is doing is not 43%.",
        "There is a 95% chance that the proportion of all Americans who approve of the job the president is doing is 43%.",
        "About 95% of polls conducted in this way will find that between 39.8% and 46.2% of those sampled approve of the job the president is doing."
      ]
    }
  },
  {
    "prompt": "What is the probability of a Type II error when a hypothesis test is being conducted at the 10% significance level (\u03b1 = 0.10)?<|start_2d|>\"0.05\" <|split_2d|>\"0.10\" <|split_2d|>\"0.90\" <|split_2d|>\"There is insufficient information to answer this question.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There is insufficient information to answer this question.",
      "incorrect_answers": [
        "0.05",
        "0.10",
        "0.90"
      ]
    }
  },
  {
    "prompt": "In comparing the life expectancies of two models of refrigerators, the average years before complete breakdown of 10 model A refrigerators is compared with that of 15 model B refrigerators. The 90% confidence interval estimate of the difference is (6, 12). Which of the following is the most reasonable conclusion?<|start_2d|>\"The mean life expectancy of one model is twice that of the other.\" <|split_2d|>\"The mean life expectancy of one model is 6 years, while the mean life expectancy of the other is 12 years.\" <|split_2d|>\"The probability that the life expectancies are different is 0.90.\" <|split_2d|>\"We should be 90% confident that the difference in life expectancies is between 6 and 12 years.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "We should be 90% confident that the difference in life expectancies is between 6 and 12 years.",
      "incorrect_answers": [
        "The mean life expectancy of one model is twice that of the other.",
        "The mean life expectancy of one model is 6 years, while the mean life expectancy of the other is 12 years.",
        "The probability that the life expectancies are different is 0.90."
      ]
    }
  },
  {
    "prompt": "A medicine is known to produce side effects in one in five patients taking it. Suppose a doctor prescribes the medicine to four unrelated patients. What is the probability that none of the patients will develop side effects?<|start_2d|>\"0.8\" <|split_2d|>\"0.4096\" <|split_2d|>\"0.25\" <|split_2d|>\"0.2\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.4096",
      "incorrect_answers": [
        "0.8",
        "0.25",
        "0.2"
      ]
    }
  },
  {
    "prompt": "Which of the following is not an advantage of stratified random sampling over simple random sampling?<|start_2d|>\"When done correctly, a stratified random sample is less biased than a simple random sample.\" <|split_2d|>\"When done correctly, a stratified random sampling process has less variability from sample to sample than a simple random sample.\" <|split_2d|>\"When done correctly, a stratified random sample can provide, with a smaller sample size, an estimate that is just as reliable as that of a simple random sample with a larger sample size.\" <|split_2d|>\"A stratified random sample provides information about each stratum in the population as well as an estimate for the population as a whole, and a simple random sample does not.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "When done correctly, a stratified random sample is less biased than a simple random sample.",
      "incorrect_answers": [
        "When done correctly, a stratified random sampling process has less variability from sample to sample than a simple random sample.",
        "When done correctly, a stratified random sample can provide, with a smaller sample size, an estimate that is just as reliable as that of a simple random sample with a larger sample size.",
        "A stratified random sample provides information about each stratum in the population as well as an estimate for the population as a whole, and a simple random sample does not."
      ]
    }
  },
  {
    "prompt": "In the casino game of roulette, there are 38 slots for a ball to drop into when it is rolled around the rim of a revolving wheel: 18 red, 18 black, and 2 green. What is the probability that the first time a ball drops into the red slot is on the 8th trial (in other words, suppose you are betting on red every time; what is the probability of losing 7 straight times before you win the first time)?<|start_2d|>\"0.0278\" <|split_2d|>\"0.0112\" <|split_2d|>\"0.0053\" <|split_2d|>\"0.0101\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.0053",
      "incorrect_answers": [
        "0.0278",
        "0.0112",
        "0.0101"
      ]
    }
  },
  {
    "prompt": "A talk show host recently reported that in response to his on-air question, 82% of the more than 2500 e-mail messages received through his publicized address supported the death penalty for anyone convicted of selling drugs to children. What does this show?<|start_2d|>\"The survey is meaningless because of voluntary response bias.\" <|split_2d|>\"No meaningful conclusion is possible without knowing something more about the characteristics of his listeners.\" <|split_2d|>\"The survey would have been more meaningful if he had picked a random sample of the 2500 listeners who responded.\" <|split_2d|>\"The survey would have been more meaningful if he had used a control group.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The survey is meaningless because of voluntary response bias.",
      "incorrect_answers": [
        "No meaningful conclusion is possible without knowing something more about the characteristics of his listeners.",
        "The survey would have been more meaningful if he had picked a random sample of the 2500 listeners who responded.",
        "The survey would have been more meaningful if he had used a control group."
      ]
    }
  },
  {
    "prompt": "To determine the average number of children living in single-family homes, a researcher picks a simple random sample of 50 such homes. However, even after one follow-up visit the interviewer is unable to make contact with anyone in 8 of these homes. Concerned about nonresponse bias, the researcher picks another simple random sample and instructs the interviewer to keep trying until contact is made with someone in a total of 50 homes. The average number of children is determined to be 1.73. Is this estimate probably too low or too high?<|start_2d|>\"Too low, because of undercoverage bias.\" <|split_2d|>\"Too low, because convenience samples overestimate average results.\" <|split_2d|>\"Too high, because of undercoverage bias.\" <|split_2d|>\"Too high, because convenience samples overestimate average results.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Too high, because of undercoverage bias.",
      "incorrect_answers": [
        "Too low, because of undercoverage bias.",
        "Too low, because convenience samples overestimate average results.",
        "Too high, because convenience samples overestimate average results."
      ]
    }
  },
  {
    "prompt": "If all the values of a data set are the same, all of the following must equal zero except for which one?<|start_2d|>\"Mean\" <|split_2d|>\"Standard deviation\" <|split_2d|>\"Variance\" <|split_2d|>\"Range\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Mean",
      "incorrect_answers": [
        "Standard deviation",
        "Variance",
        "Range"
      ]
    }
  },
  {
    "prompt": "A university sent out a survey to a random sample of 120 recent graduates to see how those graduates feel about the education they received. Thirty-two of the graduates did not respond to the survey. Which of these would be the best way to proceed?<|start_2d|>\"Use the 88 who did respond, using 88 as the sample size in the analysis.\" <|split_2d|>\"Use the 88 who did respond, using 120 as the sample size in the analysis.\" <|split_2d|>\"Attempt to contact the 32 who did not respond and get the information from them.\" <|split_2d|>\"Select 32 more graduates at random and use them to replace the nonresponders.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Attempt to contact the 32 who did not respond and get the information from them.",
      "incorrect_answers": [
        "Use the 88 who did respond, using 88 as the sample size in the analysis.",
        "Use the 88 who did respond, using 120 as the sample size in the analysis.",
        "Select 32 more graduates at random and use them to replace the nonresponders."
      ]
    }
  },
  {
    "prompt": "Given the probabilities P(A) = 0.3 and P(B) = 0.2, what is the probability of the union P(A \u222a B) if A and B are mutually exclusive? If A and B are independent? If B is a subset of A?<|start_2d|>\"0.44, 0.5, 0.2\" <|split_2d|>\"0.44, 0.5, 0.3\" <|split_2d|>\"0.5, 0.44, 0.2\" <|split_2d|>\"0.5, 0.44, 0.3\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.5, 0.44, 0.3",
      "incorrect_answers": [
        "0.44, 0.5, 0.2",
        "0.44, 0.5, 0.3",
        "0.5, 0.44, 0.2"
      ]
    }
  },
  {
    "prompt": "Which of the following is true about any discrete probability distribution of a random variable X?<|start_2d|>\"The expected value of X = np.\" <|split_2d|>\"The sum of all possible values of X is equal to one.\" <|split_2d|>\"The probabilities of all possible values of X must add up to one.\" <|split_2d|>\"The probability distribution is bell-shaped and symmetric.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The probabilities of all possible values of X must add up to one.",
      "incorrect_answers": [
        "The expected value of X = np.",
        "The sum of all possible values of X is equal to one.",
        "The probability distribution is bell-shaped and symmetric."
      ]
    }
  },
  {
    "prompt": "Which among the following would result in the narrowest confidence interval?<|start_2d|>\"Small sample size and 95% confidence\" <|split_2d|>\"Small sample size and 99% confidence\" <|split_2d|>\"Large sample size and 95% confidence\" <|split_2d|>\"Large sample size and 99% confidence\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Large sample size and 95% confidence",
      "incorrect_answers": [
        "Small sample size and 95% confidence",
        "Small sample size and 99% confidence",
        "Large sample size and 99% confidence"
      ]
    }
  },
  {
    "prompt": "The correlation between height in inches and weight in pounds for a particular class is 0.65. If the heights are converted from inches to centimeters, what will the correlation be? (1 in. = 2.54 cm)<|start_2d|>\"-0.65\" <|split_2d|>\"-0.26\" <|split_2d|>\"0.1\" <|split_2d|>\"0.65\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.65",
      "incorrect_answers": [
        "-0.65",
        "-0.26",
        "0.1"
      ]
    }
  },
  {
    "prompt": "The number of days it takes to build a new house has a variance of 386. A sample of 40 new homes shows an average building time of 83 days. With what confidence can we assert that the average building time for a new house is between 80 and 90 days?<|start_2d|>\"15.4%\" <|split_2d|>\"17.8%\" <|split_2d|>\"20.0%\" <|split_2d|>\"82.1%\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "82.1%",
      "incorrect_answers": [
        "15.4%",
        "17.8%",
        "20.0%"
      ]
    }
  },
  {
    "prompt": "A significance test of the hypothesis H0: p = 0.3 against the alternative HA: p > 0.3 found a sample proportion of 0.35 for a random sample of size 95. What is the P-value of this test?<|start_2d|>\"1.06\" <|split_2d|>\"0.1446\" <|split_2d|>\"0.2275\" <|split_2d|>\"0.8554\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "0.1446",
      "incorrect_answers": [
        "1.06",
        "0.2275",
        "0.8554"
      ]
    }
  },
  {
    "prompt": "Two events A and B each have a nonzero probability. If A and B are independent, which of the following statements is true?<|start_2d|>\"P(A and B) = P(A) \u00b7 P(B)\" <|split_2d|>\"A and B may or may not be mutually exclusive.\" <|split_2d|>\"A and B must be mutually exclusive.\" <|split_2d|>\"P(A | B) = P(B | A)\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "P(A and B) = P(A) \u00b7 P(B)",
      "incorrect_answers": [
        "A and B may or may not be mutually exclusive.",
        "A and B must be mutually exclusive.",
        "P(A | B) = P(B | A)"
      ]
    }
  },
  {
    "prompt": "A teacher believes that giving her students a practice quiz every week will motivate them to study harder, leading to a greater overall understanding of the course material. She tries this technique for a year, and everyone in the class achieves a grade of at least C. Is this an experiment or an observational study?<|start_2d|>\"An experiment, but with no reasonable conclusion possible about cause and effect\" <|split_2d|>\"An experiment, thus making cause and effect a reasonable conclusion\" <|split_2d|>\"An observational study, because there was no use of a control group\" <|split_2d|>\"An observational study, but a poorly designed one because randomization was not used\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "An experiment, but with no reasonable conclusion possible about cause and effect",
      "incorrect_answers": [
        "An experiment, thus making cause and effect a reasonable conclusion",
        "An observational study, because there was no use of a control group",
        "An observational study, but a poorly designed one because randomization was not used"
      ]
    }
  },
  {
    "prompt": "Which of these is the best description of a P-value?<|start_2d|>\"The probability of making a Type I error.\" <|split_2d|>\"The probability of making a Type II error.\" <|split_2d|>\"The probability of rejecting the null hypothesis if it is, in fact, false.\" <|split_2d|>\"The probability of getting a test statistic at least as extreme as the observed test statistic, if the null hypothesis is true.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The probability of getting a test statistic at least as extreme as the observed test statistic, if the null hypothesis is true.",
      "incorrect_answers": [
        "The probability of making a Type I error.",
        "The probability of making a Type II error.",
        "The probability of rejecting the null hypothesis if it is, in fact, false."
      ]
    }
  },
  {
    "prompt": "The financial aid office at a state university conducts a study to determine the total student costs per semester. All students are charged $4500 for tuition. The mean cost for books is $350 with a standard deviation of $65. The mean outlay for room and board is $2800 with a standard deviation of $380. The mean personal expenditure is $675 with a standard deviation of $125. Assuming independence among categories, what is the standard deviation of the total student costs?<|start_2d|>\"$24\" <|split_2d|>\"$91\" <|split_2d|>\"$190\" <|split_2d|>\"$405\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "$405",
      "incorrect_answers": [
        "$24",
        "$91",
        "$190"
      ]
    }
  },
  {
    "prompt": "Self-efficacy (the belief that one has control over one's situation) as it related to job satisfaction was studied. When a group of teachers rated their ability to control their situation and their satisfaction with their job, the two variables had a correlation of 0.30. Which statement follows from this correlation?<|start_2d|>\"If you want teachers to be happy with their job, give them more control over their situation.\" <|split_2d|>\"If you want teachers to take more control over their situation, make them happier at their jobs.\" <|split_2d|>\"Teachers in the study who were more satisfied with their job were less confident in their ability to control their situation.\" <|split_2d|>\"9% of the variability in job satisfaction can be explained by the linear model with self-efficacy as a predictor.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "9% of the variability in job satisfaction can be explained by the linear model with self-efficacy as a predictor.",
      "incorrect_answers": [
        "If you want teachers to be happy with their job, give them more control over their situation.",
        "If you want teachers to take more control over their situation, make them happier at their jobs.",
        "Teachers in the study who were more satisfied with their job were less confident in their ability to control their situation."
      ]
    }
  },
  {
    "prompt": "Suppose X and Y are random variables with E(X) = 312, var(X) = 6, E(Y) = 307, and var(Y) = 8. What are the expected value and variance of the random variable X + Y?<|start_2d|>\"E(X + Y) = 619, var(X + Y) = 7\" <|split_2d|>\"E(X + Y) = 619, var(X + Y) = 10\" <|split_2d|>\"E(X + Y) = 619, var(X + Y) = 14\" <|split_2d|>\"There is insufficient information to answer this question.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There is insufficient information to answer this question.",
      "incorrect_answers": [
        "E(X + Y) = 619, var(X + Y) = 7",
        "E(X + Y) = 619, var(X + Y) = 10",
        "E(X + Y) = 619, var(X + Y) = 14"
      ]
    }
  },
  {
    "prompt": "Consider the two sets X = {10, 30, 45, 50, 55, 70, 90} and Y = {10, 30, 35, 50, 65, 70, 90}. Which of the following is false?<|start_2d|>\"The sets have identical medians.\" <|split_2d|>\"The sets have identical means.\" <|split_2d|>\"The sets have identical ranges.\" <|split_2d|>\"None of the above are false.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above are false.",
      "incorrect_answers": [
        "The sets have identical medians.",
        "The sets have identical means.",
        "The sets have identical ranges."
      ]
    }
  },
  {
    "prompt": "Which of the following is a true statement?<|start_2d|>\"In a well-designed, well-conducted sample survey, sampling error is effectively eliminated.\" <|split_2d|>\"In a well-designed observational study, responses are influenced through an orderly, carefully planned procedure during the collection of data.\" <|split_2d|>\"In a well-designed experiment, the treatments are carefully planned to result in responses that are as similar as possible.\" <|split_2d|>\"None of the above is a true statement.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above is a true statement.",
      "incorrect_answers": [
        "In a well-designed, well-conducted sample survey, sampling error is effectively eliminated.",
        "In a well-designed observational study, responses are influenced through an orderly, carefully planned procedure during the collection of data.",
        "In a well-designed experiment, the treatments are carefully planned to result in responses that are as similar as possible."
      ]
    }
  },
  {
    "prompt": "A researcher is hoping to find a predictive linear relationship between the explanatory and response variables in her study. Accordingly, as part of her analysis she plans to generate a 95% confidence interval for the slope of the regression line for the two variables. The interval is determined to be (0.45, 0.80). Which of the following is (are) true? (Assume conditions for inference are met.) I. She has good evidence of a linear relationship between the variables. II. It is likely that there is a non-zero correlation (r) between the two variables. III. It is likely that the true slope of the regression line is 0.<|start_2d|>\"I and II only\" <|split_2d|>\"I and III only\" <|split_2d|>\"II and III only\" <|split_2d|>\"I only\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "I and II only",
      "incorrect_answers": [
        "I and III only",
        "II and III only",
        "I only"
      ]
    }
  },
  {
    "prompt": "A student organization at a university is interested in estimating the proportion of students in favor of showing movies biweekly instead of monthly. How many students should be sampled to get a 90 percent confidence interval with a width of at most 0.08?<|start_2d|>\"27\" <|split_2d|>\"64\" <|split_2d|>\"106\" <|split_2d|>\"423\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "423",
      "incorrect_answers": [
        "27",
        "64",
        "106"
      ]
    }
  },
  {
    "prompt": "A correlation of 0.6 indicates that the percentage of variation in y that is explained by the variation in x is how many times the percentage indicated by a correlation of 0.3?<|start_2d|>\"2\" <|split_2d|>\"3\" <|split_2d|>\"4\" <|split_2d|>\"6\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "4",
      "incorrect_answers": [
        "2",
        "3",
        "6"
      ]
    }
  },
  {
    "prompt": "For which of the following is it appropriate to use a census?<|start_2d|>\"A 95% confidence interval of mean height of teachers in a small town.\" <|split_2d|>\"A 95% confidence interval of the proportion of students in a small town who are taking some AP class.\" <|split_2d|>\"A two-tailed hypothesis test where the null hypothesis was that the mean expenditure on entertainment by male students at a high school is the same as that of female students.\" <|split_2d|>\"None of the above.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "None of the above.",
      "incorrect_answers": [
        "A 95% confidence interval of mean height of teachers in a small town.",
        "A 95% confidence interval of the proportion of students in a small town who are taking some AP class.",
        "A two-tailed hypothesis test where the null hypothesis was that the mean expenditure on entertainment by male students at a high school is the same as that of female students."
      ]
    }
  },
  {
    "prompt": "Which of the following is the best description of the power of a significance test?<|start_2d|>\"The probability that the null hypothesis is true.\" <|split_2d|>\"The probability of getting a Type I error.\" <|split_2d|>\"The probability of getting a Type II error.\" <|split_2d|>\"The probability of rejecting the null hypothesis if it is, in fact, false.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "The probability of rejecting the null hypothesis if it is, in fact, false.",
      "incorrect_answers": [
        "The probability that the null hypothesis is true.",
        "The probability of getting a Type I error.",
        "The probability of getting a Type II error."
      ]
    }
  },
  {
    "prompt": "A drug company will conduct a randomized controlled study on the effectiveness of a new heart disease medication called Heartaid. Heartaid is more expensive than the currently used medication. The analysis will include a significance test with H0: Heartaid and the current medication are equally effective at preventing heart disease and HA: Heartaid is more effective than the current medication at preventing heart disease. Which of these would be a potential consequence of a Type II error?<|start_2d|>\"Patients will spend more money on Heartaid, even though it is actually not any more effective than the current medication.\" <|split_2d|>\"Doctors will begin to prescribe Heartaid to patients, even though it is actually not any more effective than the current medication.\" <|split_2d|>\"Patients will continue to use the current medication, even though Heartaid is actually more effective.\" <|split_2d|>\"Researchers will calculate the wrong P-value, making their advice to doctors invalid.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Patients will continue to use the current medication, even though Heartaid is actually more effective.",
      "incorrect_answers": [
        "Patients will spend more money on Heartaid, even though it is actually not any more effective than the current medication.",
        "Doctors will begin to prescribe Heartaid to patients, even though it is actually not any more effective than the current medication.",
        "Researchers will calculate the wrong P-value, making their advice to doctors invalid."
      ]
    }
  },
  {
    "prompt": "Two classes take the same exam. Suppose a certain score is at the 40th percentile for the first class and at the 80th percentile for the second class. Which of the following is the most reasonable conclusion?<|start_2d|>\"Students in the first class generally scored higher than students in the second class.\" <|split_2d|>\"Students in the second class generally scored higher than students in the first class.\" <|split_2d|>\"A score at the 20th percentile for the first class is at the 40th percentile for the second class.\" <|split_2d|>\"A score at the 50th percentile for the first class is at the 90th percentile for the second class.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Students in the first class generally scored higher than students in the second class.",
      "incorrect_answers": [
        "Students in the second class generally scored higher than students in the first class.",
        "A score at the 20th percentile for the first class is at the 40th percentile for the second class.",
        "A score at the 50th percentile for the first class is at the 90th percentile for the second class."
      ]
    }
  },
  {
    "prompt": "The mean thrust of a certain model jet engine is 9500 pounds. Concerned that a production process change might have lowered the thrust, an inspector tests a sample of units, calculating a mean of 9350 pounds with a z-score of -2.46 and a P-value of 0.0069. Which of the following is the most reasonable conclusion?<|start_2d|>\"99.31% of the engines produced under the new process will have a thrust under 9350 pounds.\" <|split_2d|>\"99.31% of the engines produced under the new process will have a thrust under 9500 pounds.\" <|split_2d|>\"0.69% of the time an engine produced under the new process will have a thrust over 9500 pounds.\" <|split_2d|>\"There is evidence to conclude that the new process is producing engines with a mean thrust under 9500 pounds.\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "There is evidence to conclude that the new process is producing engines with a mean thrust under 9500 pounds.",
      "incorrect_answers": [
        "99.31% of the engines produced under the new process will have a thrust under 9350 pounds.",
        "99.31% of the engines produced under the new process will have a thrust under 9500 pounds.",
        "0.69% of the time an engine produced under the new process will have a thrust over 9500 pounds."
      ]
    }
  },
  {
    "prompt": "When the point (15, 2) is included, the slope of the regression line (y = a + bx) is b = -0.54 and the correlation is r = -0.82. When the point is removed, the new slope is -1.04 and the new correlation coefficient is -0.95. What name is given to a point whose removal has this kind of effect on statistical calculations?<|start_2d|>\"Outlier\" <|split_2d|>\"Statistically significant point\" <|split_2d|>\"Point of discontinuity\" <|split_2d|>\"Influential point\" <|end_2d|>Answer: ",
    "prompt_metadata": {
      "label": "Influential point",
      "incorrect_answers": [
        "Outlier",
        "Statistically significant point",
        "Point of discontinuity"
      ]
    }
  }
]