Awesome — here’s a ready-to-paste “Cursor writing task prompt” that will guide section-by-section drafting of your AAAI Workshop paper in English LaTeX. It includes a global master brief plus detailed prompts for every section, with objectives, sentence-by-sentence guidance, must-include items, LaTeX cues, placeholders, and acceptance checks.

⸻

Master Prompt (use this once at the top of the Cursor session)

Role & Target
	•	You are a senior academic writing agent.
	•	Venue: AAAI Workshop (EAIM 2026).
	•	Language/format: English, LaTeX (use \citep{} with natbib, tables with booktabs, figures via \includegraphics, math via amsmath).
	•	Paper topic: TS-RaMIA — Time- and Structure-Range Membership Inference Attacks for symbolic music generation.

Tone & Style
	•	Clear, precise, technical. Active voice. Avoid hype.
	•	Prefer short paragraphs (3–6 sentences).
	•	Numbers and claims must map to macros or TODO placeholders (no fabrication).

Global LaTeX Rules
	•	Do not invent references. Use placeholder keys (e.g., \citep{shokri2017membership}) and add \todo{Add bib} if unknown.
	•	Define reusable macros for key results at the top of main.tex:

% ==== global numbers (edit in one place) ====
\newcommand{\aucMain}{0.826}
\newcommand{\tprOne}{14.6\%}
\newcommand{\aucT52}{0.780}
\newcommand{\aucDebiased}{0.563}
\newcommand{\aucNotaGen}{0.81} % PROVISIONAL — replace after runs
\newcommand{\tprNotaGen}{12--15\%} % PROVISIONAL


	•	Refer to figures/tables as Figure~\ref{fig:...} / Table~\ref{tab:...}.
	•	Use \label{} right after \caption{}.
	•	Insert explicit TODO comments for missing results (e.g., NotaGen) like:

% TODO(Notagen): replace provisional numbers after runs complete.



Figures & Tables (filenames / placeholders)
	•	fig/tsramia_framework.pdf — overall attack pipeline (Method).
	•	fig/ckpt_auc_curve.png — checkpoint risk curve (Results).
	•	fig/roc_main.png and fig/roc_lowfpr.png — ROC overall & low-FPR zoom (Results).
	•	tab/main_results.tex — main table (AUC, low-FPR).
	•	tab/ablations.tex — top-k, windowing, calibration ablations.
	•	tab/notagen_results.tex — provisional cross-model results.

Section Workflow
For each section below:
	1.	Follow the Goal and Outline exactly.
	2.	Use the Sentence Plan as guidance (you may merge sentences if smoother).
	3.	Add required Citations/Artifacts.
	4.	End with the Acceptance Checklist as LaTeX comments to self-verify.

⸻

Section-by-Section Prompts

1) Abstract (180–220 words)

Goal: One concise paragraph that states the problem, gap, our idea (TS-RaMIA), core mechanisms (structural axis + tail-of-top-k + debiasing + meta-fusion), headline results, and takeaway.

Sentence Plan
	1.	Context: MIA for generative models is under-explored in symbolic music; unique structure & token families create leak channels.
	2.	Gap: Existing likelihood probes are confounded (length/density) and overlook structural tokens (bar/position/tempo).
	3.	Our idea: TS-RaMIA — a simple, reproducible attack exploiting time & structure ranges, computed from sample-level NLLs.
	4.	Mechanisms: (i) debias (length matching + conditional calibration), (ii) amplify sparse memorization via tail-of-top-k on structural tokens, (iii) meta-attacker to fuse a few cues.
	5.	Main results (Model A): report AUC \aucMain, TPR@1%FPR \tprOne, and that baseline debiased AUC is \aucDebiased.
	6.	Cross-representation (NotaGen, provisional): trends mirror Model A (AUC \aucNotaGen, TPR \tprNotaGen) — will be replaced post-run.
	7.	Takeaway: TS-RaMIA is simple, debiased, and transfers across models/representations; we release a complete protocol.

Citations: None in abstract.

Acceptance Checklist (as comments)

% [Abs-Check]
% - Mentions problem, gap, method, mechanisms, numbers, transfer, release.
% - No citations. ≈180–220 words. Uses macros for numbers. NotaGen marked provisional.


⸻

2) Introduction (¾–1.5 pages)

Relation to Abstract: Expands context and motivation; positions contributions; previews results; states threat model at a high level.

Outline (4–6 paragraphs)
	•	P1 (Context & Stakes): LLMs for symbolic music; privacy risks; why MIAs matter in creative content. Cite MIA classics and symbolic music modeling papers.
	•	P2 (Unique Challenge): Symbolic structure (bars, positions, tempo) + multiple token families; typical LM MIAs miss this; confounding (length/density) inflates metrics.
	•	P3 (Our Approach): TS-RaMIA overview: structural axis, sample-level NLL, debiasing, tail-of-top-k, meta-fusion; simple & reproducible.
	•	P4 (Contributions): Bullet list of 3–5 concrete contributions (structural axis insight; debiasing protocol; tail-of-top-k; meta-attacker; checkpoint-risk analysis; cross-representation NotaGen).
	•	P5 (Headline Results): Macro numbers: AUC \aucMain, TPR@1% \tprOne; debiased baseline \aucDebiased; NotaGen provisional trend.
	•	P6 (Roadmap): Brief section map.

Citations: \citep{shokri2017membership,yeom2018privacy,carlini2021extracting} for MIA; \citep{huang2018musictransformer,huang2020remi,hawthorne2019maestro} for symbolic models.
If you lack exact bib, add \todo{Add bib entry or replace}.

Figure? Don’t place the framework figure here (space is tight). Add in Method.

Acceptance Checklist

% [Intro-Check]
% - Problem importance, unique symbolic challenges, confounding stated.
% - Clear contributions bullets; headline numbers via macros.
% - Cites MIA + symbolic generation; framework fig deferred to Method.


⸻

3) Background & Related Work (0.5–1 page)

Goal: Define MIA notions; summarize symbolic representations and models; position our approach vs likelihood MIAs and structural modeling.

Outline
	•	B1 (MIA basics): Threat model flavors; why per-sample NLL is used for LMs.
	•	B2 (Symbolic music): REMI vs ABC; structural tokens significance (bars/positions/tempo).
	•	B3 (Generative models): Transformer LMs; hierarchical ABC models (NotaGen).
	•	B4 (Positioning): Likelihood MIAs and confounders; where TS-RaMIA differs (structural axis + debiasing + tail-top-k).

Citations: As above plus any LM-MIA for text LMs. Add TODO for missing items.

Acceptance Checklist

% [Bg-Check]
% - Clear definitions; concise survey; sets stage for our structural axis.
% - No method details yet (save for next section).


⸻

4) Threat Model & Problem Formulation (0.5 page)

Goal: Precisely specify attacker knowledge/access and the statistical decision problem.

Outline
	•	T1 (Attacker access): Black/gray-box likelihood via teacher forcing; no gradients; model logits accessible for NLL.
	•	T2 (Decision task): Given a piece $x$, decide member vs. non-member via a score $s(x)$ monotone in membership likelihood.
	•	T3 (Evaluation): ROC-AUC, TPR at FPR $\in \{1\%, 5\%, 10\%\}$, partial AUC over the $[0,1]\%$ FPR range; DeLong CIs.
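The metrics in T3 can be sketched as follows; the helper name `tpr_at_fpr` and the synthetic scores are illustrative assumptions, not part of the paper's codebase.

```python
# Sketch of the T3 evaluation metrics: ROC-AUC and TPR at a fixed FPR.
# labels: 1 = member, 0 = non-member; scores: attack score s(x).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def tpr_at_fpr(labels, scores, target_fpr):
    """TPR at a fixed FPR, read off the empirical ROC curve."""
    fpr, tpr, _ = roc_curve(labels, scores)
    ok = fpr <= target_fpr  # thresholds whose FPR does not exceed the target
    return float(tpr[ok].max()) if ok.any() else 0.0

rng = np.random.default_rng(0)
labels = np.r_[np.ones(500), np.zeros(500)]
scores = np.r_[rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500)]

auc = roc_auc_score(labels, scores)
tpr1 = tpr_at_fpr(labels, scores, 0.01)   # TPR@1%FPR
```

Partial AUC over the low-FPR range can be obtained analogously by restricting the ROC curve to $\mathrm{FPR} \le 1\%$ before integrating.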

Acceptance Checklist

% [Threat-Check]
% - Clear access assumptions; no hidden labels; fair evaluation metrics defined.


⸻

5) Method: TS-RaMIA (1.5–2 pages)

Goal: Formalize structural masking, sample-level NLL, debiasing, tail-of-top-k, and meta-attacker. Include a framework figure.

Outline
	•	M1 (Overview + Fig): One paragraph referencing Figure~\ref{fig:tsramia_framework} (pipeline: tokenization→mask→sample-NLL→debias→tail-k→fusion).
	•	M2 (Structural mask): Define masks for REMI (Bar/Position/Tempo) and ABC (barlines, meter, key, tempo, repeats, line breaks). Mention unit tests (100% header removal; structure tagging).
	•	M3 (Sample-level NLL): Teacher-forcing; exclude chunk-leading tokens; per-token loss $\ell_t$ and masked sum/mean.
	•	M4 (Debiasing): Length matching on $n_{\text{struct}}$; conditional calibration via residualizing against $\log n_{\text{struct}}$. Explain why batch-level PPL is flawed (Jensen's inequality).
	•	M5 (Tail-of-Top-$k$): Sort masked $\ell_t$; take the mean of the largest $k \in \{32, 64, 128\}$; rationale: sparse memorization pockets. (Avoid reverse-order $\Delta$; empirically weak.)
	•	M6 (Meta-attacker): 9-feature set (top-$k$ and windowed p95 variants); logistic regression, \texttt{class\_weight=balanced}, composer-stratified 5-fold CV; calibration to member-likelihood.
	•	M7 (Checkpoint scan): Explain procedure for T46 (epoch-wise risk).
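The M3–M5 steps above can be sketched in a few lines; all names (`per_token_nll`, `struct_mask`, `n_struct`) are illustrative stand-ins, not the paper's actual API.

```python
# Sketch of M3-M5: masked per-token losses, tail-of-top-k score, and the
# M4 residualization (conditional calibration) against log n_struct.
import numpy as np

def tail_of_topk(per_token_nll, struct_mask, k=64):
    """Mean of the k largest masked per-token losses (sparse-memorization tail)."""
    losses = per_token_nll[struct_mask.astype(bool)]
    k = min(k, losses.size)
    return float(np.sort(losses)[-k:].mean())  # k largest masked losses

def residualize(scores, n_struct):
    """Conditional calibration: remove the linear trend in log(n_struct)."""
    x = np.log(np.asarray(n_struct, dtype=float))
    a, b = np.polyfit(x, scores, deg=1)        # least-squares fit
    return scores - (a * x + b)                # residuals = debiased scores

rng = np.random.default_rng(1)
nll = rng.exponential(1.0, size=1024)          # fake per-token losses
mask = rng.random(1024) < 0.2                  # fake structural-token mask
score = tail_of_topk(nll, mask, k=64)

raw_scores = np.array([2.0, 2.5, 3.1, 3.6])    # fake per-piece scores
deb = residualize(raw_scores, n_struct=[100, 200, 400, 800])
```

This also makes the Jensen point concrete: averaging NLL per sample and then comparing samples is not the same as one batch-level perplexity over pooled tokens.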

Artifacts
	•	Insert \begin{figure} for fig/tsramia_framework.pdf with a clear caption.
	•	Provide 1–2 short equations (loss, top-k).
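The two short equations requested above could be sketched as follows; symbols ($\mathcal{M}$ for the structural mask, $\mathrm{Top}_k$ for the index set of the $k$ largest masked losses) are placeholders to align with the paper's notation.

```latex
% Sketch of the M3/M5 equations; adjust symbols to the paper's notation.
\begin{equation}
  \ell_t = -\log p_\theta\!\left(x_t \mid x_{<t}\right),
  \qquad
  s_k(x) = \frac{1}{k} \sum_{t \in \mathrm{Top}_k(\mathcal{M})} \ell_t ,
\end{equation}
```

where $\mathrm{Top}_k(\mathcal{M})$ indexes the $k$ largest per-token losses among structurally masked positions.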

Acceptance Checklist

% [Method-Check]
% - Figure included and referenced; all five components defined.
% - Equations for per-token loss and top-k score included.
% - Rationale for choices (why structural, why top-k) explained.


⸻

6) Experimental Setup (0.75–1 page)

Goal: Provide enough detail to reproduce.

Outline
	•	E1 (Data): Maestro-like corpus; piece counts; chunking (REMI: 1024; ABC: character chunks); exclusion of head tokens from loss.
	•	E2 (Models): Model A: ~67M, from scratch; training epochs; tokenizer. Model B (NotaGen): ABC, hierarchical GPT-2; evaluation via char-chunks, with provisional placeholders for results.
	•	E3 (Splits): train/val/test; composer-stratified CV for meta-attacker; fixed seeds.
	•	E4 (Metrics & Stats): AUC, TPR@FPR, partial AUC; DeLong CI; three views (raw, length-matched, conditionally calibrated).
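The E3/M6 CV protocol could be sketched as below, under the assumption that "composer-stratified" means composers never straddle train/test folds (sklearn's `GroupKFold`; `StratifiedGroupKFold` is the newer variant that also balances labels). Feature names are invented stand-ins for the 9-feature set.

```python
# Sketch of the meta-attacker protocol: logistic regression with balanced
# class weights, evaluated out-of-fold with composer-grouped 5-fold CV.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_predict

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 9))              # stand-in for the 9 attack features
y = rng.integers(0, 2, size=n)           # 1 = member, 0 = non-member
composers = rng.integers(0, 10, size=n)  # composer label per piece

clf = LogisticRegression(class_weight="balanced", max_iter=1000)
cv = GroupKFold(n_splits=5)              # composers disjoint across folds
proba = cross_val_predict(clf, X, y, cv=cv, groups=composers,
                          method="predict_proba")[:, 1]
```

Out-of-fold probabilities give the calibrated member-likelihood score without any composer leaking between folds.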

Acceptance Checklist

% [Setup-Check]
% - Concrete numbers for data/splits; model specs; CV protocol; metrics; stats.
% - NotaGen marked provisional where needed.


⸻

7) Results (1–1.5 pages)

Goal: Present main table and ROCs; emphasize low-FPR gains and debiasing.

Outline
	•	R1 (Main table): Table~\ref{tab:main} summarizing Baseline vs T52 vs T53v2 (AUC, TPR@1,5,10%); both raw and debiased columns. Use macros for Model A; provisional row for NotaGen with TODO note.
	•	R2 (ROC): fig/roc_main.png and fig/roc_lowfpr.png; discuss low-FPR improvement (meta-attacker lifts TPR@1% from ~1.5% to \tprOne).
	•	R3 (Debiasing impact): Baseline drops to \aucDebiased under length matching; calibration stabilizes but does not create signal.
	•	R4 (Checkpoint scan): Reference fig/ckpt_auc_curve.png—AUC rises with training progress.
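A minimal skeleton for tab/main_results.tex, using the global macros for the numbers that exist and \todo{} placeholders for the rest; row labels follow R1 and should be adjusted to the final method names.

```latex
% Sketch for tab/main_results.tex; non-macro cells are placeholders.
\begin{table}[t]
  \centering
  \caption{Main results on Model A (debiased evaluation).}
  \label{tab:main}
  \begin{tabular}{lcccc}
    \toprule
    Method & AUC & TPR@1\% & TPR@5\% & TPR@10\% \\
    \midrule
    Baseline (debiased)      & \aucDebiased & \todo{x} & \todo{x} & \todo{x} \\
    T52 (structural top-$k$) & \aucT52      & \todo{x} & \todo{x} & \todo{x} \\
    T53v2 (meta-attacker)    & \aucMain     & \tprOne  & \todo{x} & \todo{x} \\
    \bottomrule
  \end{tabular}
\end{table}
```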

Acceptance Checklist

% [Results-Check]
% - Main table + ROC figures + ckpt curve referenced.
% - Text highlights low-FPR and debiasing impacts.


⸻

8) Cross-Representation Study: NotaGen (0.75 page)

Goal: Describe ABC pipeline and present provisional parallels to Model A until runs complete.

Outline
	•	N1 (Why NotaGen): Different representation (ABC), hierarchical decoder, pretrained; tests transfer.
	•	N2 (Pipeline): MIDI→MusicXML→ABC; structure mask; char-chunks; evaluation identical to T52 (structural top-k).
	•	N3 (Results): Provide provisional numbers using macros \aucNotaGen, \tprNotaGen; clearly mark as TODO to replace after experiments.
	•	N4 (Takeaway): Trend parity suggests TS-RaMIA transfers across representations.

Acceptance Checklist

% [NotaGen-Check]
% - Clear provisional labeling; pipeline details; same metrics; TODO note.


⸻

9) Ablations & Robustness (0.75 page)

Goal: Show the effect of k, windowing, equal-N, and negative results.

Outline
	•	A1 (Top-$k$ sweep): $k \in \{32, 64, 128\}$; stability vs. performance trade-off; $k=64$ as the default.
	•	A2 (Windowed p95): When helpful vs not; explain dispersion.
	•	A3 (Equal-N): 8 segments per piece; AUC change < 0.01.
	•	A4 (Negative results): Note-only attack fails (AUC ≈ 0.32); EVT fails under small non-member tails (AUC ≈ 0.66).

Artifacts: tab/ablations.tex with means and 95% CIs.

Acceptance Checklist

% [Ablation-Check]
% - Covers top-k, windowing, equal-N, and negatives; cites numbers or Appendix table.


⸻

10) Discussion (0.5 page)

Goal: Interpret mechanism: why structural tokens leak; implications for training/deployment.

Outline
	•	Structural tokens are lattice coordinates (bars/positions/tempo) aligned with phrasing; memorization pockets become high-loss tails.
	•	Debiasing guards against overstated risk; tail-top-k isolates sparse pockets.
	•	Defenses: early stopping, regularization, data augmentation of structural patterns; auditing recommendations.

Acceptance Checklist

% [Disc-Check]
% - Mechanistic argument + practical guidance; no new experiments introduced.


⸻

11) Ethics & Broader Impact (short)

Goal: Acknowledge copyright/privacy concerns; auditing intent; responsible reporting.

Acceptance Checklist

% [Ethics-Check]
% - States research intent; risks; recommends responsible release & auditing.


⸻

12) Limitations & Future Work (short)

Goal: Bound the scope; point to next steps.

Outline
	•	AR focus: non-AR symbolic models (e.g., diffusion) lack direct likelihoods; proxy scores are left to future work.
	•	EVT is unreliable at small non-member sample sizes; larger non-member pools are needed.
	•	More models and corpora; stronger cross-domain validation.

⸻

13) Conclusion (short)

Goal: Re-assert contributions and impact in 3–5 sentences.

⸻

14) Reproducibility Checklist (bulleted, or Appendix)

Goal: Concrete artifacts.
	•	Seeds, environment (requirements.txt), Git commit, data SHA sums.
	•	Scripts for scoring, debiasing, meta-attacker, plots.
	•	Three evaluation views (raw/length-matched/calibrated).
	•	Composer-stratified CV details.
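The seed-pinning item above can be sketched as a single helper; only stdlib/NumPy seeds are shown, and framework seeds (e.g. `torch.manual_seed`) are an assumption to be added in the actual environment.

```python
# Sketch of the reproducibility seed-pinning step.
import random
import numpy as np

def set_all_seeds(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    # torch.manual_seed(seed)  # if PyTorch is present in the environment

set_all_seeds(42)
a = np.random.rand(3)
set_all_seeds(42)
b = np.random.rand(3)   # identical draw after re-seeding
```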

⸻

15) Appendix (as needed)

Goal: Extra tables/plots; algorithmic pseudocode; hyperparameters; unit tests for ABC mask; details of checkpoint scan; additional low-FPR curves.

⸻

Final Assembly Instructions (for Cursor)
	1.	Create sections/ directory; write each section into its own *.tex.
	2.	In main.tex, \input{sections/abstract.tex}, etc., in the expected order.
	3.	Add macros block near top of main.tex and only change numbers there.
	4.	Place figures/tables in fig/ and tab/, ensure paths compile.
	5.	Insert % TODO(Notagen) comments where provisional content appears.
	6.	Do a pass to ensure every claim is supported by a citation, figure, table, or macro value.
	7.	Keep total length appropriate for an AAAI Workshop paper (commonly 6–8 pages including figures, but formats and limits vary by workshop; verify the call for papers).

⸻
