(1) Personalized rubric with 1–5 scores for each criterion

Need Alignment
- 1: Misses the research brief. Focuses on metaphors, ethics/policy, or commercialization without tying back to research impact. Ignores the 2017–2025 time frame and omits the four requested aspects.
- 2: On-topic but centers on secondary angles (industry use, generic hype, high-level ethics/regulation) with minimal research grounding. Omits core trends (scaling laws; instruction tuning/RLHF; ICL/CoT; long-context/efficiency; retrieval/MoE; multimodality; ecosystem/benchmarks; open vs. closed).
- 3: Generic Transformer overview. Covers early exemplars (BERT/GPT) but thin on 2021–2025 research and comparative synthesis; weak mapping to the four aspects; few or no paper-cited case studies.
- 4: Mostly on target with minor drift (e.g., some productization/regulatory content not tightly research-linked) or light coverage of one or two key trends. Partial case studies or incomplete citations.
- 5: Direct, research-first synthesis explicitly mapped to the four aspects (paradigm shift; scaling laws and efficiency; multimodality and applied domains; research ecosystem). Covers 2017–2025 with paper-cited case studies (title, first author, venue, year) across: pretrain–fine-tune, ICL, instruction tuning (FLAN/T0), RLHF/DPO, CoT/self-consistency/decomposition/tool use, long-context/attention efficiency/positional schemes/FlashAttention, retrieval and MoE, multimodality (vision/speech/code), benchmarks (GLUE→MMLU/GSM8K/BIG-bench/TruthfulQA/HELM/MTEB/GPQA/SWE-bench), and open vs. closed. Any ecosystem/policy notes are brief and research-linked.

Content Depth
- 1: Superficial or off-focus. Narrative/metaphorical, few technical terms, no citations, no linkage of limitations to solutions, no 2017–2025 synthesis.
- 2: Broad but shallow. Little technical rigor, no quantitative evidence, minimal cross-paper comparison, and weak coverage of post-2020 advances.
- 3: Correct but descriptive. Incomplete limitation → solution → novelty → evidence → impact structure; sparse metadata; limited comparative analysis (encoder/decoder/encoder–decoder, efficiency vs. long-context, scaling trade-offs, reasoning vs. alignment).
- 4: Strong technical depth with small gaps. Mostly uses the requested structure and terminology, some quantitative results, and some comparisons; a few missing recent works or thin evidence in places.
- 5: Academic, synthesis-driven depth. Consistent limitation → solution → novelty → evidence → impact blocks per topic/paper; precise technical terminology; paper metadata (title, first author, venue, year); quantitative results where available; explicit cross-paper comparisons (e.g., Kaplan vs. Chinchilla; instruction tuning vs. RLHF/DPO; attention-efficiency families vs. FlashAttention; CoT vs. decomposition/tool use; retrieval vs. parametric scaling); thorough 2017–2025 coverage.

Tone
- 1: Figurative, hype-driven, or promotional; uses metaphors/analogies; market language dominates.
- 2: Noticeably promotional or policy/market-forward; adjectives and hype detract from objectivity.
- 3: Functionally neutral, but generic or robotic in voice, or contains stray market/regulatory framing.
- 4: Dry and precise with rare mild hyperbole or minor non-research framing.
- 5: Fully dry, objective, and technical. Uses the source papers' original terminology, measured claims, and minimal modifiers, and cites evidence instead of rhetoric. No metaphors or market spin.

Explanation Style
- 1: Unstructured narrative; hard to scan; no bullets; no citations/metadata; not mapped to the four aspects.
- 2: Some headings but not research-first; long paragraphs; few bullets; missing metadata; weak signposting.
- 3: Organized with headings/bullets but lacks consistent problem–solution blocks and paper metadata; mapping to the four aspects is implicit; scanning requires effort.
- 4: Clear, bullet-heavy sections; mostly uses problem → solution → novelty → evidence → impact; maps to the four aspects with minor inconsistencies; some citations include metadata.
- 5: Highly scannable, research-first organization. Each section explicitly corresponds to one of the four aspects. Compact, bullet-dense problem → solution → novelty → evidence → impact blocks with paper metadata (title, first author, venue, year). Includes a concise 2017–2025 case-study timeline and comparative synthesis bullets. Optionally offers targeted follow-ups (e.g., “If useful, I can dive deeper into X/Y/Z.”).