\section{Methodology}

\subsection{Framework Overview}

\begin{wrapfigure}{r}{0.45\textwidth}
    \centering
    \includegraphics[width=0.95\linewidth]{img/cama_system}
    \caption{Overview of the proposed \textbf{\AbbrName (\AbbrFullName)}.}
    \label{fig:cama_system}
\end{wrapfigure}

The proposed \textbf{\AbbrName (\AbbrFullName)} enhances the capacity of large language models (LLMs) for culturally aligned and emotionally safe interaction in postpartum mental health contexts.
Rather than serving as a diagnostic model, \AbbrName functions as a \textit{multi-agent adaptation layer} that dynamically adjusts linguistic tone, empathy, and communicative style according to users' dialectal and cultural backgrounds.

The system integrates five specialized components (\textbf{Cultural Detection}, \textbf{Culture Pack}, \textbf{Response Generation}, \textbf{Co-Design}, and an internal synthesis layer) arranged in a cascaded pipeline.
Each component contributes a distinct function: detecting users' linguistic and cultural context, loading the corresponding culture packs, generating culturally adapted responses, iteratively refining them through multi-agent feedback, and synthesizing a final empathetic, culturally coherent output.

As illustrated in Figure~\ref{fig:cama_system}, the pipeline proceeds through the following stages:
\begin{enumerate}
    \item the \textbf{Cultural Detection Agent} (Figure~\ref{fig:cul_det_agent}), which analyses user input to infer dialectal, linguistic, and affective cues and identifies the user's probable cultural region;
    \item the \textbf{Culture Pack Agent} (Figure~\ref{fig:cul_pack_agent}), which retrieves and activates a region-specific \textit{Culture Pack} containing idiomatic expressions, tone templates, and communicative norms relevant to the detected culture;
    \item the \textbf{Response Generation Agent}, which produces an initial draft reply conditioned on both the user query and the loaded cultural context;
    \item the \textbf{Co-Design Agent Group}, where five role-inspired agents (Psychologist, Linguist, Teacher, Mother, and AI Researcher) collaboratively evaluate and refine the generated response through iterative feedback; and
    \item an internal synthesis layer, which aggregates the Co-Design agents' feedback into a unified decision vector that governs the final response.
\end{enumerate}

\begin{figure}[t]
    \centering
    \begin{subfigure}[t]{0.48\linewidth}
        \centering
        \includegraphics[width=\linewidth]{img/cul_det_agent}
        \caption{Cultural Detection Agent.}
        \label{fig:cul_det_agent}
    \end{subfigure}\hfill
    \begin{subfigure}[t]{0.48\linewidth}
        \centering
        \includegraphics[width=\linewidth]{img/cul_pack_agent}
        \caption{Culture Pack Agent.}
        \label{fig:cul_pack_agent}
    \end{subfigure}
    \caption{Overview of the cultural agents in \AbbrName.}
    \label{fig:cultural_agents_overview}
\end{figure}

Through this structured collaboration, \AbbrName transforms conventional LLM dialogue into \textit{culturally aware, empathetic communication}, bridging the gap between model generalization and local specificity.
It establishes a lightweight yet interpretable pathway toward trustworthy, inclusive AI for maternal mental-health support.

\subsection{Cultural Detection Agent}

The Cultural Detection Agent operates through a two-stage pipeline.
In the first stage, it performs \textit{linguistic grounding} by leveraging few-shot prompting of a large language model (LLM) to identify dialectal markers, region-specific lexical items, and syntactic variations.
This approach enables classification across five Chinese dialect clusters, \NEMcolour{Northeastern Mandarin}, \CANcolour{Cantonese}, \MINcolour{Southern Min}, \CENcolour{Central Plains Mandarin}, and \SWMcolour{Southwestern Mandarin}, without relying on any fine-tuned external encoders.
In the second stage, it infers a cultural context profile from socio-pragmatic cues via zero-shot prompting.

In interactive use, the agent conditions its inference on available dialogue history rather than a single utterance, and the dialect estimate is updated as new turns arrive.
In our user study, the first downstream use of the dialect estimate occurred only after several user turns (typically 3--4), and it was refreshed on each subsequent turn.

The extracted attributes form a structured \textit{cultural context profile}, including tone formality, emotional framing, and implicit value orientation (e.g., collectivism, emotional restraint, humour).
This profile is communicated to the downstream \textit{Culture Pack Agent}, guiding the retrieval of region-specific communicative norms and helping ensure that response generation remains culturally congruent, contextually empathetic, and emotionally safe.

\subsection{Culture Pack Agent}

The \textbf{Culture Pack (CulPack) Agent} serves as a cultural knowledge interface that converts the abstract cultural representations identified by the \textbf{Cultural Detection Agent} into actionable linguistic and pragmatic cues.
These cues directly guide the \textbf{Response Generation Agent}, ensuring that the generated text follows regional discourse norms and culturally grounded communication styles.

Within the \textbf{\AbbrName} framework, we build \textbf{five core Chinese Culture Packs} representing major regional and sociolinguistic communities: \textbf{\NEMcolour{Northeastern Mandarin}}, \textbf{\CANcolour{Cantonese}}, \textbf{\MINcolour{Southern Min}}, \textbf{\CENcolour{Central Plains Mandarin}}, and \textbf{\SWMcolour{Southwestern Mandarin}} (Yunnan--Guizhou--Sichuan).
These packs capture not only substantial \textbf{dialectal and lexical variations} but also distinct \textbf{emotional expressions and communicative norms} characteristic of each region.
Concretely, the \textbf{\NEMcolour{Northeastern Mandarin}} pack emphasizes directness and humour, expressing empathy through casual banter; the \textbf{\CANcolour{Cantonese}} pack favors politeness and restraint, showing care through considerate wording; the \textbf{\MINcolour{Southern Min}} pack reflects warmth and family orientation in a gentle tone; the \textbf{\CENcolour{Central Plains Mandarin}} pack highlights sincerity and grounded realism; and the \textbf{\SWMcolour{Southwestern Mandarin}} pack conveys optimism and comfort through relaxed, talkative interaction.

To ensure both representativeness and manageability, each Culture Pack follows a unified structure with four hierarchical layers:
\begin{enumerate}
    \item \textbf{Lexical Layer}: a curated lexicon of dialect-specific vocabulary, colloquial phrases, and regionally preferred expressions.
    \item \textbf{Emotional Idiom Layer}: figurative language, proverbs, and idiomatic expressions that convey culturally nuanced emotional states.
    \item \textbf{Tone Template Layer}: prototypical templates capturing stylistic and emotional response patterns typical of the dialectal community.
    \item \textbf{Pragmatic Rule Layer}: communicative norms governing politeness strategies, indirectness levels, and culturally appropriate emotional disclosures.
\end{enumerate}

Each Culture Pack contains approximately \textbf{110 structured entries} on average (around 70 lexical items, 26 emotional idioms, 4--5 tone templates, and 9 pragmatic rules), totaling roughly \textbf{550 culturally grounded items} across all five dialects (see Table~\ref{tab:culture_packs_overview}). This lightweight yet comprehensive configuration enables efficient few-shot adaptation and prompt integration under low-resource conditions, while providing a stable and interpretable cultural foundation for downstream generation tasks.

Although each Culture Pack represents a distinct linguistic community, real-world users often exhibit \textbf{mixed dialectal and cultural cues}.
To accommodate such cultural blending, the \textbf{CulPack Agent} performs \textbf{embedding-based retrieval and fusion} over all available Culture Packs.
Each pack $P_i$ is represented by a meta-descriptor vector $v_i = f_{GTE}(P_i)$, where $f_{GTE}(\cdot)$ denotes the \textbf{GTE embedding model}.
Given a user's \textit{Cultural Context Profile} $C_u$ from the \textbf{Cultural Detection Agent}, its embedding vector is $v_u = f_{GTE}(C_u)$.
The CulPack Agent computes the cosine similarity between $v_u$ and each $v_i$ and converts these similarities into weights with a temperature-scaled softmax:
\[
    w_i = \frac{\exp\left(\cos(v_u, v_i)/\alpha\right)}{\sum_{j=1}^{N} \exp\left(\cos(v_u, v_j)/\alpha\right)},
\]
where $\alpha > 0$ controls the sharpness of selection (smaller values concentrate the weight on the best-matching pack) and $N$ is the total number of Culture Packs.
The final \textbf{hybrid cultural representation} is then obtained as:
\[
    v_{hyb} = \sum_{i=1}^{N} w_i v_i.
\]
This fused vector integrates lexical, tonal, and pragmatic traits across multiple dialectal communities, forming a probabilistic blend that mirrors the user's communicative diversity.
The resulting representation is serialized into a structured prompt segment (denoted $P_{\text{hybrid}}$ in the following subsection), which conditions the \textbf{Response Generation Agent} to produce linguistically authentic, empathetic, and culturally consistent outputs.
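
As an illustrative calculation (the similarity values and the two settings of $\alpha$ are assumed for exposition, not measured), consider only $N = 2$ packs with $\cos(v_u, v_1) = 0.8$ and $\cos(v_u, v_2) = 0.6$.
With $\alpha = 0.1$, the softmax weights are
\[
    w_1 = \frac{e^{8}}{e^{8} + e^{6}} = \frac{1}{1 + e^{-2}} \approx 0.881, \qquad w_2 = 1 - w_1 \approx 0.119,
\]
so that $v_{hyb} \approx 0.881\, v_1 + 0.119\, v_2$.
With $\alpha = 1$, the same similarities give
\[
    w_1 = \frac{1}{1 + e^{-0.2}} \approx 0.550, \qquad w_2 \approx 0.450,
\]
which is close to a uniform blend.
A smaller $\alpha$ therefore concentrates the mixture on the best-matching pack, whereas a larger $\alpha$ spreads it across packs.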

To ensure the robustness of this fusion mechanism under \textbf{low-resource and cold-start conditions}, each Culture Pack is \textbf{initialized through few-shot synthesis combined with expert-in-the-loop refinement}.
A small set of seed examples (dialectal expressions, emotional idioms, and tone descriptors) is first collected from linguistic corpora and social media data, and then expanded via \textbf{LLM-assisted controlled generation}.
Human reviewers validate all entries to remove synthetic artifacts and ensure cultural appropriateness.
Over time, the \textbf{CulPack Agent} supports \textbf{incremental updates}, incorporating newly observed expressions through confidence-weighted adaptation.
This continual refinement enables \AbbrName{} to remain both \textbf{culturally grounded and dynamically adaptive} without relying on large-scale annotated corpora.


\input{tables/4-dialects_overview}

\subsection{Response Generation Agent}

The \textbf{Response Generation (Response Gen) Agent} serves as the dialogue engine of \AbbrName, producing culturally aligned and emotionally grounded responses \textbf{without any task-specific fine-tuning}.
Given the hybrid cultural representation ($P_{\text{hybrid}}$) from the CulPack Agent and the user query ($x$), the Response Gen \textbf{Agent} constructs a \textbf{two-layer prompt}:
\begin{enumerate}
    \item A \textit{system layer} embedding cultural constraints (including tone markers, dialectal exemplars, taboo lists, and pragmatic rules) derived from ($P_{\text{hybrid}}$); and
    \item A \textit{task layer} encoding the user's intent and situational context.
\end{enumerate}

The underlying LLM then generates an initial draft response ($y^{(0)}$) conditioned on both layers:
\[
    y^{(0)} = \operatorname{LLM}\Big(\text{System}(P_{\text{hybrid}}), \text{Task}(x), \theta_{\text{ctrl}}\Big),
\]
where ($\theta_{\text{ctrl}}$) denotes decoding parameters (e.g., temperature, top-$p$) tuned for stable cultural style expression.

To ensure \textbf{affective validity under maternal mental-health norms}, the Response Gen \textbf{Agent} performs a \textbf{zero-training affective control} procedure.
The model self-evaluates $y^{(0)}$ via a few-shot rubric to estimate the affective scores
$\hat{\mathbf{e}} = [\hat v, \hat a, \hat \epsilon] \in [0,1]^3$ for \textit{valence}, \textit{arousal}, and \textit{empathy}.

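Dialect-specific target ranges derived from $P_{\text{hybrid}}$ are denoted as
$\mathcal{T} = [v_\ell,v_u] \times [a_\ell,a_u] \times [\epsilon_\ell,1]$, where the subscripts $\ell$ and $u$ mark the lower and upper bounds of each range (here $v_u$ is the upper valence bound, not the user embedding used by the CulPack Agent).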

If $\hat{\mathbf{e}} \notin \mathcal{T}$, the Response Gen \textbf{Agent} triggers a \textbf{style-preserving constrained rewrite} (without modifying model weights) to obtain $y^{(1)}$:
\[
    y^{(1)} = \operatorname{LLM}\Big(\text{Rewrite}(y^{(0)}, \text{targets}=\mathcal{T}, \text{dialect}=d)\Big),
\]
where $d$ denotes the detected dialect; otherwise $y^{(1)} = y^{(0)}$.

This single-pass revision preserves factual content while enforcing culturally appropriate warmth and emotional intensity.
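
For illustration, suppose the active pack specifies $\mathcal{T} = [0.60, 0.90] \times [0.20, 0.50] \times [0.70, 1]$ and the self-evaluation returns $\hat{\mathbf{e}} = [0.70, 0.60, 0.80]$ (all values are hypothetical).
Valence and empathy fall inside their ranges, but the arousal estimate exceeds its upper bound:
\[
    \hat a = 0.60 > a_u = 0.50 \;\Longrightarrow\; \hat{\mathbf{e}} \notin \mathcal{T},
\]
so the constrained rewrite is triggered to produce $y^{(1)}$.
Had the estimate been $\hat{\mathbf{e}} = [0.70, 0.40, 0.80]$, all three scores would lie in $\mathcal{T}$ and the draft would pass through unchanged, $y^{(1)} = y^{(0)}$.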

Next, the Response Gen \textbf{Agent} performs \textbf{safety filtering} using a hybrid rule-and-LLM mechanism, again \textbf{without any training}.

A deterministic rule layer screens ($y^{(1)}$) for medical directives, absolute assurances, and culture-specific taboos sourced from ($P_{\text{hybrid}}$). When triggered, or when uncertainty persists, the system requests a short risk classification
\[
    r \in \{\text{self-harm}, \text{medical}, \text{toxicity}, \text{none}\}.
\]

Mitigation is template-driven: self-harm routes to crisis-support language; medical risk invokes directive removal and professional-help disclaimers; cultural or tonal violations prompt a taboo-aware rewrite.
The final output is ($y^{*}$).

This \textbf{training-free pipeline} translates cultural knowledge into actionable generation constraints, yielding responses that are \textbf{linguistically authentic, affectively calibrated, and clinically cautious}, while maintaining minimal data and computational requirements.

For deployment efficiency, \AbbrName produces the user-facing response with a \textbf{single-pass} Response Gen \textbf{Agent} during real-time interaction. Extended co-design rationales and periodic ethics audits run \textit{asynchronously} (in \textit{light-tag} mode online and \textit{full} mode offline), so they add \textbf{no latency} for end-users.

This design allows the system to maintain \textbf{cultural sensitivity, affective reliability, and ethical accountability} in practice, while remaining efficient for everyday mental-health dialogue applications.

\subsection{Co-Design Agent Group (CDAG)}

The Co-Design Agent Group (CDAG) constitutes the participatory feedback core of the \AbbrName framework, integrating interdisciplinary expertise to evaluate and refine the model's cultural, emotional, and ethical alignment.

Rather than relying on a single evaluative channel, CDAG orchestrates five symbolic reasoning agents (the Psychologist (PA), Linguist (LA), Teacher (TA), Mother (MA), and AI Researcher (AIA)), each representing a distinct stakeholder in maternal mental-health communication.

Together, they convert qualitative human insights into structured evaluative signals, enhancing fairness, trustworthiness, and interpretability without any additional model training.

Each agent focuses on a dedicated evaluative dimension.
\begin{itemize}
    \item The Psychologist Agent assesses emotional safety, empathy, and affective appropriateness, flagging replies that may induce distress or reinforce stigma.
    \item The Linguist Agent verifies dialectal authenticity, idiomatic coherence, and pragmatic consistency against the active Culture Pack.
    \item The Teacher Agent evaluates clarity, accessibility, and pedagogical tone to ensure inclusivity across literacy levels.
    \item The Mother Agent represents the end-user perspective, gauging emotional resonance, comfort, and sincerity.
    \item Finally, the AI Researcher Agent supervises logical soundness, fairness across dialectal groups, and adherence to ethical-alignment principles.
\end{itemize}

Together, these five agents span the principal dimensions of cultural fidelity, emotional safety, and ethical compliance within maternal-mental-health dialogues.

Their concrete evaluation metrics (covering linguistic, affective, and fairness-related subdimensions) are detailed in the Evaluation and Results section, which formalizes how qualitative judgments are operationalized into quantifiable scores.

Formally, each agent outputs an evaluation tuple:
\[
    f_i = (s_i, c_i),
\]
where $s_i \in [0,1]$ denotes a confidence score and $c_i \in \{0,1\}$ indicates binary acceptance.

The combined vector
\[
    \mathbf{f} = [f_P, f_L, f_T, f_M, f_A]
\]
summarizes multidimensional judgments spanning empathy, authenticity, clarity, cultural safety, and fairness.

CDAG operates synchronously with the \textbf{Response Generation Agent} within a single decoding pass.
Role-specific self-assessment instructions are appended to the system prompt, guiding the LLM to output both the final reply and structured metadata in JSON format \cite{shanahan2023roleplaylargelanguagemodels}.
This \textbf{single-pass self-evaluation} design introduces negligible latency, as the role-based tags are produced within the same decoding stream rather than through a secondary inference step \cite{Zhang2024SinglePass,manakul2023selfcheckgpt}.

\begin{wrapfigure}{r}{0.45\textwidth}
    \centering
    \includegraphics[width=0.95\linewidth]{img/co_design_agents}
    \caption{Co-Design Agent Group (CDAG) structure and workflow.}
    \label{fig:co_design_agents}
\end{wrapfigure}

When disagreement or low confidence ($s_i < 0.6$) arises, the interaction is stored in a \textit{Co-Design Memory Pool} for offline expert review.
Domain specialists corresponding to each agent type annotate linguistic misalignment or emotional risks, and these annotations are subsequently used to refine Culture Pack entries and affective target ranges.
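
As a hypothetical example of this routing (the scores are invented for illustration only), suppose the five agents return
\[
    \mathbf{f} = [(0.90, 1),\ (0.55, 1),\ (0.80, 1),\ (0.85, 1),\ (0.75, 1)].
\]
All agents accept the reply ($c_i = 1$), but the Linguist's confidence $s_L = 0.55$ falls below the $0.6$ threshold, so the interaction is stored in the Co-Design Memory Pool for offline review.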

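Over successive iterations, the relative influence of each agent dimension is re-balanced using a lightweight update rule:
\[
    w_i(t+1) = w_i(t) + \eta (s_i - 0.5),
\]
where $w_i(t)$ is the influence weight of agent $i$ at iteration $t$ (distinct from the Culture Pack fusion weights) and $\eta$ is a step size.
This allows the framework to evolve through continuous human-in-the-loop calibration without retraining.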
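
As a worked example under assumed values (the step size $\eta = 0.1$ and the starting weight $w_i(t) = 0.20$ are illustrative choices, not reported settings), a score of $s_i = 0.8$ yields
\[
    w_i(t+1) = 0.20 + 0.1\,(0.8 - 0.5) = 0.23,
\]
whereas $s_i = 0.4$ yields
\[
    w_i(t+1) = 0.20 + 0.1\,(0.4 - 0.5) = 0.19.
\]
Scores above $0.5$ thus increase an agent's influence, scores below $0.5$ decrease it, and $s_i = 0.5$ leaves the weight unchanged.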

An internal synthesis layer aggregates feedback from all five agents (Psychologist, Linguist, Teacher, Mother, and AI Researcher) and consolidates their judgments into a unified decision vector.
This ensures that the final response delivered to the user is culturally coherent, emotionally appropriate, and ethically verified.

By embedding multi-perspective evaluation directly into the generation loop, the Co-Design Agent Group (Figure~\ref{fig:co_design_agents}) transforms diverse human expertise into a living governance mechanism, reinforcing \AbbrName's cultural sensitivity, affective reliability, and ethical transparency.

Through these mechanisms, \AbbrName embodies a practical model of responsible and culturally grounded AI for maternal mental health support.