\section{Related Work} \label{sec:background}

\subsection{Membership Inference for Generative Models}
Membership inference attacks (MIAs) test whether a specific sample was in the training set of a model~\citep{shokri2017membership}. A common observation is that models assign lower loss (for language-like models, lower negative log-likelihood, NLL) to training samples, a pattern often correlated with memorization and overfitting~\citep{yeom2018privacy,carlini2021extracting,carlini2023quantifying}. Attack signals span \emph{loss-based}, \emph{posterior/threshold}, \emph{feature/activation}, and \emph{robustness}-based families under black-, gray-, and white-box assumptions. Beyond loss, robustness-style MIAs posit that member samples require more effort to fool~\citep{choquette2021labelonly,jalalzai2022membership,xue2025imia}, while feature-based attacks train classifiers on intermediate activations~\citep{dealcala2025mint}. These trends indicate the value of domain-specific probes for structured data.
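
As a minimal illustration of the loss-based signal (a sketch in our own notation, not the exact formulation of any cited attack), let $x = (x_1, \dots, x_T)$ be a token sequence and $p_\theta$ an autoregressive model. A thresholded attack on the length-normalized NLL can be written as
\begin{equation}
\ell_\theta(x) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}),
\qquad
\hat{m}(x) = \mathbf{1}\!\left[\ell_\theta(x) < \tau\right],
\end{equation}
where $\hat{m}(x) = 1$ predicts that $x$ is a training member. The threshold $\tau$ is an assumed free parameter, for example one calibrated on samples of known membership status.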

\subsection{Pitfalls in MIA Evaluation}
Likelihood-based MIAs are sensitive to confounders such as sequence length and piece complexity~\citep{watson2021de}. More broadly, several benchmarks exhibit member/non-member distribution shifts from dataset construction; under such shifts, \emph{artifact-aware} non-query (``blind'') baselines can rival or surpass model-query MIAs~\citep{das2025blind}. These observations motivate debiased, controlled evaluations that isolate true membership leakage from artifacts.

\subsection{Domain-Adapted MIAs in Structured Modalities}
For modalities with internal structure, effective MIAs increasingly probe \emph{where} models memorize (specific components or token subsets) rather than averaging signals across entire sequences. Diffusion-model MIAs and audits tailor probes to generative trajectories and noise schedules~\citep{duan2023diffusion,matsumoto2023diffusion}; vision--language attacks adapt to cross-modal heads and alignment mechanisms~\citep{li2024vlm}. These precedents support moving beyond generic sequence averages when the structure of the data or model is explicit. Symbolic music, with its hierarchical grammar and structure tokens, is a clear case for such domain-adapted analysis.

\subsection{Positioning}
Applications of MIAs to symbolic music remain limited~\citep{hildt2023privacy}. Generic LM MIAs overlook hierarchical structure and are sensitive to sequence length and \emph{event density} (events per bar), both of which can inflate apparent attack performance. We study a structure-aware approach that targets structural tokens, aggregates tail-of-loss cues, and evaluates under confounder controls (length and event density); see Section~\ref{sec:method}. Checks across REMI and ABC representations provide evidence of robustness to changes in representation.