{
  "query": "Please explain survival analysis.",
  "query-id": 38,
  "gold_response": "Survival analysis: theory, core formulas, and major methodologies\n\nPurpose\n- Analyze a nonnegative time-to-event variable T in the presence of censoring (often right-censoring), and relate event timing to covariates.\n\nCore objects and identities\n- Survival function: S(t) = P(T > t).\n- Distribution and density: F(t) = P(T ≤ t), f(t) = dF(t)/dt = -dS(t)/dt.\n- Hazard function (instantaneous event rate): h(t) = lim_{Δt→0} P(t ≤ T < t+Δt | T ≥ t)/Δt.\n- Cumulative hazard: H(t) = ∫_0^t h(u) du.\n- Fundamental relationships: h(t) = f(t)/S(t), f(t) = h(t) S(t), and S(t) = exp[−H(t)].\n\nCensoring and basic assumptions\n- Right-censoring: observe (t_i, δ_i, x_i), where t_i is the observed time, x_i the covariates, and δ_i = 1 if the event occurred at t_i, 0 if the subject was censored at t_i.\n- Independent/non-informative censoring: C ⫫ T | X, where C is the censoring time (intuitively, conditional on covariates, the censoring mechanism carries no extra information about failure time). This underpins consistency of the estimators below.\n\n1) Nonparametric survival: Kaplan–Meier (product-limit)\n- Goal: Estimate S(t) without specifying a hazard function or a distribution for T.\n- Estimator: Ŝ(t) = Π_{t_i ≤ t} (1 − d_i/n_i) = Π_{t_i ≤ t} (n_i − d_i)/n_i,\n  where at each distinct event time t_i, d_i is the number of events and n_i is the number at risk just before t_i.\n- Interpretation: Ŝ(t) is a non-increasing step function that drops only at observed event times; it handles right-censoring naturally via the risk set.\n- Assumptions: independent right-censoring; events occur at observed times; no model for h(t) is imposed.\n- Related identity: One may instead estimate H(t) (for example with the Nelson–Aalen estimator) and recover S(t) via S(t) = exp[−H(t)].\n- Classic paper: Kaplan, E. L., and Meier, P. (1958), Nonparametric estimation from incomplete observations, JASA.\n\n2) Semiparametric regression: Cox proportional hazards (PH)\n- Model: h(t | x) = h0(t) exp(x^T β). 
Baseline hazard h0(t) is unspecified (nonparametric); covariate effects exp(x^T β) are parametric.\n- Proportional hazards assumption: For two covariate vectors x_a, x_b, the hazard ratio is HR = h(t|x_a)/h(t|x_b) = exp((x_a − x_b)^T β), constant in t.\n- Interpretation: exp(β_j) is the hazard ratio for a one-unit increase in covariate j (holding others fixed).\n- Estimation by partial likelihood (no need to specify h0):\n  L_p(β) = ∏_{i: δ_i=1} [ exp(x_i^T β) / Σ_{j ∈ R(t_i)} exp(x_j^T β) ],\n  ℓ_p(β) = Σ_{i: δ_i=1} [ x_i^T β − log{ Σ_{j ∈ R(t_i)} exp(x_j^T β) } ].\n  Here R(t_i) is the risk set just before time t_i. Ties are commonly handled by Breslow or Efron approximations.\n- Baseline and survival: After estimating β̂, estimate H0(t) via the Breslow estimator: Ĥ0(t) = Σ_{t_i ≤ t} d_i / Σ_{j ∈ R(t_i)} exp(x_j^T β̂), and S(t|x) = exp{−Ĥ0(t) exp(x^T β̂)}.\n- Assumptions: independent right-censoring; correct log-linear effect on the hazard; proportional hazards (time-constant HRs). Time-varying effects require extensions (e.g., interactions with time).\n- Classic paper: Cox, D. R. (1972), Regression models and life-tables, JRSS B.\n\n3) Accelerated failure time (AFT) models\n- Model (log-linear on survival time): log T = x^T β + σ ε, with ε following a specified distribution (e.g., extreme value → Weibull AFT, normal → log-normal, logistic → log-logistic).\n- Acceleration factor: For covariate j, AF_j = exp(β_j). A one-unit increase in X_j multiplies the typical survival time by AF_j (direct time ratio interpretation). Unlike PH, the HR generally varies over t.\n- Survival re-expression: T|x = exp(x^T β) T0 ⇒ S(t|x) = S0(t exp(−x^T β)).\n- Estimation: usually maximum likelihood under the specified ε distribution; semiparametric alternatives that leave the distribution of ε unspecified include rank-based estimators and the Buckley–James least-squares method.\n- Assumptions: correct error distribution; independent right-censoring.\n- Classic paper: Wei, L. J. 
(1992), The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis, Statistics in Medicine.\n\n4) Fully parametric survival models\n- Specify a full distribution for T and (optionally) embed covariates via PH or AFT links. Likelihood for censored data:\n  ℓ(θ) = Σ_i [ δ_i log f(t_i | x_i, θ) + (1 − δ_i) log S(t_i | x_i, θ) ].\n- Example (Weibull, with scale λ > 0 and shape κ > 0):\n  h(t) = λ κ t^{κ−1},  H(t) = λ t^κ,  S(t) = exp(−λ t^κ),  f(t) = h(t) S(t).\n  • κ = 1 reduces to Exponential (constant hazard).  κ > 1 gives increasing hazard; κ < 1 decreasing hazard.\n  • With covariates, Weibull supports both PH [h(t|x) = h0(t) exp(x^T β)] and AFT [log T = x^T β + σ ε] parameterizations.\n- Other common choices: Gompertz (monotone hazard), log-normal (non-monotone hazard that rises to a peak and then declines), log-logistic (monotone decreasing or single-peaked hazard, depending on the shape parameter).\n- Assumptions: correct parametric form; independent right-censoring. When correct, parametric models are efficient and enable direct time and quantile predictions.\n- Classic paper: Weibull, W. (1951), A statistical distribution function of wide applicability, Journal of Applied Mechanics.\n\nNotes on diagnostics and extensions (brief)\n- PH assessment: Schoenfeld residuals and log(−log Ŝ(t)) vs log t plots.\n- AFT fit checks: QQ-plots of log T against model-based quantiles.\n- Extensions: time-dependent covariates, competing risks, multi-state models, frailty (random effects), interval censoring.\n\nSummary\n- The core calculus is built on S(t), f(t), h(t), H(t), with S(t) = exp[−H(t)].\n- Methods span: nonparametric estimation of S(t) (Kaplan–Meier), semiparametric regression on hazards (Cox PH via partial likelihood), parametric time-scale regression (AFT), and fully parametric likelihoods (e.g., Weibull). Each method’s assumptions and interpretations (HR vs AF) guide appropriate choice and inference.",
  "gold_information": [
    "Survival analysis examines nonnegative time-to-event outcomes with censoring and covariate effects.",
    "The survival function S(t) equals P(T > t).",
    "The distribution function F(t) equals P(T ≤ t).",
    "The density satisfies f(t) = -dS(t)/dt.",
    "The hazard function h(t) is the instantaneous event rate conditional on survival to time t.",
    "The cumulative hazard H(t) equals ∫ h(u) du from 0 to t.",
    "Key identities include h(t) = f(t)/S(t), f(t) = h(t) S(t), and S(t) = exp(−H(t)).",
    "Right-censoring records a time and an event indicator for each subject.",
    "Independent censoring assumes the censoring mechanism gives no extra information about failure time given covariates.",
    "The product-limit estimator provides a nonparametric estimate of the survival function.",
    "The product-limit estimator is a non-increasing step function that naturally handles right-censoring via risk sets.",
    "A proportional hazards model specifies h(t|x) = h0(t) exp(x^Tβ).",
    "In a proportional hazards model, hazard ratios are constant over time.",
    "Coefficients in a proportional hazards model are estimated by partial likelihood using risk sets at event times.",
    "The baseline cumulative hazard can be estimated and used to compute predicted survival S(t|x) = exp{−Ĥ0(t) exp(x^Tβ̂)}.",
    "Tied event times in partial likelihood can be handled by common approximations.",
    "The proportional hazards approach assumes independent censoring and a correct log-linear covariate effect on the hazard.",
    "An accelerated failure time model specifies log T = x^Tβ + σε with a chosen error distribution.",
    "In an accelerated failure time model, exp(β_j) scales typical survival time as a time ratio.",
    "Under an accelerated failure time model, S(t|x) = S0(t exp(−x^Tβ)).",
    "Accelerated failure time models are estimated by maximum likelihood or rank-based methods.",
    "Fully parametric survival models specify a full distribution for event times and use the censored-data likelihood.",
    "A two-parameter model with h(t) = λ κ t^{κ−1} implies H(t) = λ t^κ and S(t) = exp(−λ t^κ).",
    "A shape parameter greater than one gives an increasing hazard, and a shape parameter less than one gives a decreasing hazard.",
    "Parametric models can incorporate covariates under proportional hazards or accelerated failure time parameterizations.",
    "Parametric models are efficient when correctly specified and enable direct prediction of event times and quantiles.",
    "Diagnostics for proportional hazards include residual checks and log(−log Ŝ(t)) versus log t plots.",
    "Diagnostics for accelerated failure time models include quantile plots of log times against model-based quantiles.",
    "Common extensions include time-dependent covariates, competing risks, multi-state models, frailty terms, and interval censoring.",
    "Method choice depends on assumptions and on whether hazard ratios or time ratios are the target of interpretation."
  ]
}