{
  "query": "What kinds of summarization models are there?",
  "gold_information_need": "I'm interested in AI models that automatically summarize long sentences. Could you explain the different modeling techniques used for text summarization? (1) Please describe the main types of summarization models, such as extractive and abstractive summarization, and how they work. (2) Compare the advantages and disadvantages of each summarization model. (3) Also, explain what kind of data formats or preprocessing methods are effective for inputting into summarization models. Please include examples or simple cases so that someone with a beginner's understanding of natural language processing or AI can understand.",
  "response_list": [
    {
      "contents": "Dear User,\n\nTo support your goal of introducing text summarization technology to your business intelligence (BI) platform, we have prepared this professional and detailed overview. The following focuses on the business application and value assessment of the technology.\n\n### **Comprehensive Analysis of Business Applications for Summarization Technology**\n\nSummarization technology is emerging as a key driver for data-driven decision-making by quickly processing large volumes of unstructured text data and extracting core information. Business intelligence (BI) refers to the processes and tools that analyze data and turn it into actionable insights, and summarization technology maximizes the efficiency of such BI systems.\n\n#### **1. Types of Summarization Models for Business Environments**\n\nSummarization models are generally divided into two types, each with distinct business use cases:\n\n*   **Extractive Summarization:** This method identifies and combines key sentences or phrases directly from the original text. It excels at ensuring accuracy and factuality, making it highly effective for reviewing legal documents, compliance checks, and news clipping where factual integrity is crucial. For example, financial institutions can use it to quickly grasp and respond to regulatory changes by summarizing lengthy regulatory documents.\n\n*   **Abstractive Summarization:** This approach understands the meaning and context of the original text and generates new sentences for the summary. It produces more readable and natural results, similar to human summarization. It is suitable for comprehensively understanding customer sentiment and needs by summarizing customer reviews, social media feedback, and call center transcripts. It is also widely used to automatically generate concise reports from long meeting notes, clearly conveying key decisions and action items.\n\n#### **2. 
Successful Industry Applications of Summarization Technology**\n\nSummarization technology is transforming business processes and creating new value across various industries.\n\n*   **Media & Information Services (News Clipping):**\n    *   **Model Used:** Primarily extractive summarization to quickly and accurately deliver the core content of articles.\n    *   **Business Value:** Media monitoring companies analyze thousands of news articles in real time and automatically summarize those related to specific keywords or topics for clients. This enables clients to quickly gain relevant insights from vast information, while companies significantly reduce the time and cost of information curation.\n\n*   **IT & Professional Services (Automated Meeting Minutes):**\n    *   **Model Used:** Based on abstractive summarization, often combined with speech recognition.\n    *   **Business Value:** After converting recorded meetings to text, abstractive summarization models automatically generate minutes that include key discussions, decisions, and action items. This saves employees time on minute-taking, allowing them to focus on core tasks and improving overall organizational productivity. It also prevents important details from being missed and enhances the accuracy of information sharing.\n\n*   **Commerce & Customer Service (Customer Review Analysis):**\n    *   **Model Used:** Combination of abstractive summarization and sentiment analysis.\n    *   **Business Value:** Analyzes thousands of customer reviews and feedback to summarize key positive and negative themes. For example, insights like \"Fast delivery but poor packaging\" or \"Satisfied with design but short battery life\" can be extracted. This helps product teams prioritize improvements and marketing teams strengthen messages around what customers value most.\n\n#### **3. 
Business Value-Oriented Evaluation Methods for Summarization Technology**\n\nTechnical metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) are useful for measuring word overlap between machine-generated and human reference summaries. However, a more comprehensive, user-centered approach is essential for assessing real business value.\n\n*   **Qualitative User Satisfaction Assessment:**\n    *   **Usability Testing:** Observing and interviewing actual users to evaluate how useful and understandable the summaries are, and whether they truly shorten the information acquisition process.\n    *   **A/B Testing:** Providing user groups with versions of the platform with and without summarization features, then comparing behavior patterns (e.g., time spent, click rates, task completion rates) to verify effectiveness.\n    *   **Customer Satisfaction Surveys (CSAT, NPS):** Regularly measuring satisfaction scores (CSAT) or Net Promoter Score (NPS) to quantitatively gauge user perception of the summarization feature.\n\n*   **Quantitative Business Performance Measurement:**\n    *   **Increased Work Efficiency (Time Saved):** Comparing the time spent on tasks like information retrieval, report writing, and data analysis before and after implementing summarization, and converting the time saved into monetary value.\n    *   **Faster and Better Decision-Making:** Analyzing case studies to see how much faster and more accurately executives or staff can make decisions using summarized information.\n    *   **Improved Customer Retention and Engagement:** Tracking whether improvements made based on summarized customer reviews lead to higher retention, repeat purchases, or platform engagement.\n    *   **Return on Investment (ROI) Analysis:** Calculating ROI by comparing the total cost of implementing and operating summarization technology with quantifiable business outcomes such as increased efficiency, revenue growth, and cost savings.\n\nSuccessful adoption of 
summarization technology depends not only on technical performance but also on clear alignment with business goals, continuous improvement of user experience, and demonstrating value through tangible performance indicators.\n\nWe hope this information supports your BI platform strategy. Please feel free to reach out if you have any further questions.",
      "need_alignment": {
        "score": "2",
        "feedback": "What I was asking about was a summarization model that summarizes text. Please respond again. This leans into BI value/ROI and evaluation instead of squarely listing model types with representative models (e.g., TextRank/LexRank, BERTSum, BART/T5/PEGASUS), how they work, pros/cons, and basic input/preprocessing tips."
      },
      "content_depth": {
        "score": "3",
        "feedback": "You explained the basics well, but for a beginner I needed clearer, model-specific detail: name some models under each type, briefly how they work, a simple example, pros/cons side-by-side, and a note on input/data preprocessing. This felt mostly business-focused and light on those specifics."
      },
      "tone": {
        "score": "3",
        "feedback": "I like it when you use emojis. This read pretty formal and corporate; a warmer, slightly playful tone with a couple of helpful emojis would land better for me."
      },
      "explanation_style": {
        "score": "3",
        "feedback": "If the explanation gets lengthy, please organize it in a table. I’d prefer the models grouped by type with tight bullets (how it works/pros/cons) or a small comparison table; right now it’s a long, business-heavy narrative."
      }
    },
    {
      "contents": "Hello, user! Let’s hop on the time machine of computational linguistics and explore the past, present, and future of summarization models together. Grab a cup of coffee and follow along at your leisure!\n\n### The Tumultuous Evolution of Text Summarization Models: A Step-by-Step Exploration\n\nIn this age of information overload, the desire for smart machines to extract the essence of text has remained unchanged. Let’s take a look at how summarization models have evolved over time.\n\n**Stage 1: The Era of Statistical ‘Word Picking’ (N-gram Models, etc.)**\n\n*   **How did it work?** Early summarization models were like highlighting key points in a textbook. They started from the simple but reasonable assumption that frequently appearing words or sentences in a document are important. Methods like N-gram models counted how often each word appeared.\n*   **Strengths:** Simple and fast to implement. They could easily pick out sentences containing important keywords.\n*   **Weaknesses:** They didn’t actually understand context or meaning. Sometimes, the summaries were nonsensical or the selected sentences didn’t fit together smoothly.\n\n**Stage 2: The Rise of Sequence and Memory—The RNN Era (RNN, LSTM)**\n\n*   **What changed?** Enter the Recurrent Neural Network (RNN), which emphasized the importance of word order and tried to capture context by remembering previous words. However, RNNs had a “short-term memory loss” problem and struggled with long sentences.\n*   **Upgrade: LSTM:** To address this, the Long Short-Term Memory (LSTM) model was introduced, with a smarter memory mechanism that could remember important information longer and forget less important details.\n*   **The Addition of Attention:** The revolutionary “attention” mechanism was added, allowing the model to focus on the most relevant words in the source text when generating a summary. 
This significantly improved summary quality.\n\n**Stage 3: “Attention Is All You Need!”—The Transformer Revolution**\n\n*   **A Game Changer:** In 2017, Google researchers published the paper “Attention Is All You Need,” introducing the Transformer model and completely changing the landscape.\n*   **What was innovative?** Transformers abandoned the sequential processing of RNNs and processed entire sentences in parallel, dramatically speeding up training and allowing for a much more nuanced understanding of word relationships.\n*   **The Era of Pretrained Models:** Transformers paved the way for large-scale pretrained models like BERT and GPT, which are first trained on massive amounts of text and then fine-tuned for specific tasks like summarization. This enables much more natural and contextually accurate summaries.\n\n### Hot Topics in Today’s Labs: How Smart Can Summarization Get?\n\nResearchers around the world are working hard to make summarization models even better. The hottest research topics right now include:\n\n*   **Overcoming the Limits of Long-Document Summarization:**\n    *   **The Challenge:** Summarizing an entire novel or a lengthy research paper is still tough. Models have limits on how much text they can process at once, and it’s easy to lose track of the overall flow and key points.\n    *   **Solutions:** Researchers are exploring hierarchical methods that break long documents into smaller chunks for summarization and then combine them, or selectively process only the most important parts. 
Retrieval-augmented generation (RAG), which brings in external knowledge, is also helping to enhance long-document summaries.\n\n*   **Summarizing with ‘Truth’ (Ensuring Factual Consistency):**\n    *   **The Challenge:** Generative summarization models sometimes “hallucinate” and make up information not found in the original text—a serious issue, especially for news summaries.\n    *   **Solutions:** There’s active research on developing evaluation metrics to check if summaries are faithful to the source, and on training models to generate fact-based summaries.\n\n*   **Customizable Summarization (Controllable Summarization):**\n    *   **The Challenge:** Not everyone wants the same kind of summary. Some prefer short and concise, others want detailed summaries focused on specific keywords.\n    *   **Solutions:** “Controllable summarization” technology lets users specify the length, style, or keywords they want included, aiming to provide more personalized and useful summaries.",
      "need_alignment": {
        "score": "2",
        "feedback": "The first part is the perspective I wanted, but the latter part is unnecessary. I was only curious about what summary models exist (extractive vs. abstractive with representative models). Instead it turns into a history + research trends section and doesn’t really list the types/models or include preprocessing/examples."
      },
      "content_depth": {
        "score": "3",
        "feedback": "I like the intuitive explanations, but it stays pretty high-level and historical. It doesn’t clearly break down extractive vs. abstractive with representative models, there’s no simple example, and there are no notes on input/preprocessing."
      },
      "tone": {
        "score": "4",
        "feedback": "I appreciate the friendly, playful tone—you did a great job being kind. Emojis would’ve helped match my preference."
      },
      "explanation_style": {
        "score": "3",
        "feedback": "I like that you used headings and bullets, but it reads more like a history narrative. It isn’t grouped by extractive vs. abstractive, there’s no compact side-by-side pros/cons or table, so it’s harder to scan for comparisons."
      }
    },
    {
      "contents": "### Text Summarization: Meet Two Masters—The Highlighter Fairy and the Story Wizard\n\nWelcome to the world of text summarization! This skill, which turns long and tedious texts into bite-sized pieces, is like watching two artists paint in their own unique styles. One is the meticulous \"Highlighter Fairy,\" and the other is the creative \"Story Wizard.\"\n\n#### The Meticulous Treasure Hunter: Extractive Summarization\n\nImagine the Highlighter Fairy of extractive summarization. This fairy is an expert at finding the brightest gems (key sentences) on the vast treasure map of the original text. She never creates new gems; instead, she carefully selects only the most valuable ones already on the map. It’s just like how we highlight the most important sentences in a textbook the night before an exam.\n\n*   **Her trusty sidekick, TextRank:** The Highlighter Fairy’s magic tool is the **TextRank** algorithm. It scores sentences based on how often they’re connected to others, like finding the most popular “insider” at a party. The highest-scoring sentences are gathered to create the treasure chest that is the summary.\n\n#### The Creative Story Wizard: Abstractive Summarization\n\nOn the other hand, the Story Wizard of abstractive summarization takes a different approach. This wizard digests the entire original text and then retells the story in his own words, using new expressions and vocabulary for a smoother, more understandable narrative. It’s like telling a friend about a dream you had last night—the key emotions and events remain, but the way you describe them is entirely your own.\n\n*   **His signature spell, Seq2Seq:** The wizard’s main spell is the **Seq2Seq (Sequence-to-Sequence)** model. Like a baby learning language, it listens to the story (input sentences) with one ear (the encoder), compresses and remembers the meaning (context vector), and then speaks out new sentences with the other mouth (the decoder). 
This process preserves the soul of the original but gives birth to a summary with a completely new form.\n\nIn conclusion, the Highlighter Fairy (extractive summarization) is a faithful recorder who preserves the integrity of the original, while the Story Wizard (abstractive summarization) is a free-spirited poet who sings the soul of the original in his own voice. Rather than saying one is better than the other, their value depends on what kind of story you need in a given situation.",
      "need_alignment": {
        "score": "3",
        "feedback": "You did explain extractive vs. abstractive with TextRank and Seq2Seq, which is the perspective I was looking for. But I asked for the kinds of summarization models with how they work, pros/cons, and brief input/preprocessing tips, plus a few representative models (e.g., LexRank/TF-IDF, BERTSum, BART/T5/PEGASUS). Those parts and a simple example are missing, so the alignment feels partial."
      },
      "content_depth": {
        "score": "3",
        "feedback": "It would be helpful if you could provide clear concepts and a simple input→output example (not just metaphors), quick pros/cons, a couple more named models, and a short note on preprocessing. Right now it’s beginner-friendly but a bit too light on specifics."
      },
      "tone": {
        "score": "4",
        "feedback": "I like it because it's kind. A tiny sprinkle of emojis would make it feel even more friendly."
      },
      "explanation_style": {
        "score": "3",
        "feedback": "Since you asked about various types of summarization models, it would be helpful if you could explain them in a table, grouped by type with how it works / pros / cons and representative models kept side-by-side."
      }
    },
    {
      "contents": "### The Three Musketeers of Summarization Models: A Guide for Developers—BART, T5, and PEGASUS Compared!\n\nThinking of adding a summarization feature to your application? Excellent choice! But like picking the perfect magic tool, choosing the right model can be tricky. Let us introduce your potential champions—BART, T5, and PEGASUS—in a humorous and metaphorical style, ready for the real-world battlefield.\n\n---\n\n### The Three Masters: Who Will You Choose?\n\nChoosing a summarization model is like hiring the best artist for your project. Let’s meet these three masters, each with their own unique style and flair.\n\n*   **BART: The Reliable Restorer**\n    BART is like a sturdy, trustworthy artist trained to restore damaged texts. It excels at producing grammatically perfect and natural sentences. If you want clean, polished summaries that don’t miss key information, BART is your go-to assistant. Versatile like a Swiss Army knife, it performs well in a variety of situations.\n\n*   **T5: The All-Purpose Text Transformer**\n    T5 is a master who believes “every NLP problem is a text-to-text problem.” Whether it’s summarization, translation, or Q&A, just add a prompt like “summarize:” or “translate:” and T5 gets the job done like a chameleon. This flexibility makes it relatively easy to fine-tune for specific domains. However, this powerful magic sometimes demands more resources (model size), making it a bit of a heavy eater.\n\n*   **PEGASUS: The Poet of Context**\n    As its name suggests (Pre-training with Extracted Gap-sentences for Abstractive Summarization), PEGASUS is a poet born solely for summarization. It identifies the most important sentences in a document and crafts entirely new summaries from them. The result is highly abstract, human-like summaries. 
If you want summaries that capture the essence like a poem, PEGASUS will be your muse.\n\n---\n\n### Head-to-Head Comparison: Who Excels Where?\n\n| Category | BART (Restorer) | T5 (Transformer) | PEGASUS (Poet) |\n| :--- | :--- | :--- | :--- |\n| **Summary Quality** | Delivers grammatically perfect and stable quality. Some studies show it may slightly lag behind T5 or PEGASUS. | Performs well across various tasks, especially excelling in specific areas like business news summarization. | Excels at abstractive summarization, often producing summaries most similar to those written by humans. |\n| **Inference Speed & Model Size** | Generally faster and more efficient than T5, making it suitable for real-time processing. | Powerful but larger in size, which can slow down inference. However, various model sizes are available. | Tends to be large and requires more computational resources for high-quality summaries. |\n| **Ease of Fine-Tuning** | Simple and effective to fine-tune on specific datasets. | The text-to-text framework makes fine-tuning for various tasks very flexible and intuitive. | Known for effective fine-tuning even with small datasets, making it quick to adapt to specific domains. |\n\n---\n\n### Ready for Action! Which Model for Which Mission?\n\n**🏃",
      "need_alignment": {
        "score": "3",
        "feedback": "On-topic, but it only compares BART, T5, and PEGASUS (all abstractive) and skips extractive vs. abstractive, how those types work, and any preprocessing/simple example I wanted."
      },
      "content_depth": {
        "score": "3",
        "feedback": "Understandable and developer-friendly, but too narrow and a bit metaphor-heavy; missing the basics (extractive vs. abstractive mechanics), a simple example, and basic preprocessing tips."
      },
      "tone": {
        "score": "4",
        "feedback": "Kind and playful; a bit heavy on metaphor, but okay. The small emoji is a nice touch."
      },
      "explanation_style": {
        "score": "4",
        "feedback": "I like that it's provided in a table and bullets. Still, it isn’t grouped by type (extractive vs. abstractive) and lacks a tight “how it works / pros / cons” block per type."
      }
    },
    {
      "contents": "## AI That Finds Treasure in a Sea of Text: All About Text Summarization\n\nImagine having your own personal assistant who can pinpoint the key points in a flood of information. AI that instantly summarizes lengthy reports, complex papers, and long articles is no longer just a fantasy. Let’s dive into the fascinating world of how AI understands text and creates impressive summaries in the blink of an eye.\n\n### 1. Two Summarization Styles: The “Smart Highlighter” and the “Creative Storyteller”\n\nAI text summarization generally takes two main approaches. One is “extractive summarization,” which pulls out key sentences directly from the original text. The other is “abstractive summarization,” which understands the content and generates new sentences.\n\n#### **(1) Extractive Summarization: Highlighting the Essentials Like a Smart Highlighter**\n\nExtractive summarization is like underlining important parts of a book with a highlighter. The AI selects and combines the most important sentences or phrases from the original text to create a summary.\n\n*   **How it works:** The model calculates the importance of each sentence and ranks them. Algorithms like TF-IDF, TextRank, and LexRank are used to analyze word frequency and relationships between sentences. The top-ranked sentences are chosen for the final summary. This method is relatively simple and can be implemented without neural networks.\n*   **Advantages:** Since it uses the original sentences, there’s little risk of distorting the content, ensuring high factual accuracy. It’s also simpler and faster than abstractive summarization.\n*   **Disadvantages:** The summary can feel awkward or disjointed, as it simply strings together important sentences. Sometimes, it may miss the overall context.\n\n**Example:**\n> **Original:** “Artificial intelligence has rapidly advanced thanks to developments in deep learning. 
Especially in natural language processing, services like text summarization, translation, and chatbots are changing our lives. At the core of these technologies are sophisticated models trained on massive datasets.”\n> **Extractive Summary:** “Artificial intelligence has rapidly advanced thanks to developments in deep learning. In natural language processing, services like text summarization, translation, and chatbots are changing our lives.”\n\n#### **(2) Abstractive Summarization: A Storyteller Who Understands and Rewrites Like a Human**\n\nAbstractive summarization grasps the core meaning of the text and generates entirely new sentences in its own words. It’s like listening to a long story and then explaining it briefly to a friend.\n\n*   **How it works:** This approach mainly uses encoder-decoder structures like Seq2Seq (Sequence-to-Sequence) models, or more advanced transformer-based models (BERT, GPT, etc.). The encoder compresses the meaning of the original text into a vector, and the decoder interprets this vector to generate a new summary, word by word.\n*   **Advantages:** The summary reads naturally and smoothly, like human writing, and can concisely combine information scattered across multiple sentences.\n*   **Disadvantages:** The model may misunderstand the content or generate information that wasn’t in the original text—a phenomenon known as “hallucination.” This can undermine the factual reliability of the summary. Also, these complex models require much more data and computing power to train.\n\n**Example:**\n> **Original:** (Same as above)\n> **Abstractive Summary:** “Deep learning-based AI, especially natural language processing technologies, are having a major impact on our daily lives through services like text summarization and chatbots.”\n\n### 2. Prepping the Ingredients for the Best Summaries: Data Preprocessing\n\nJust as a great chef prepares fresh ingredients, AI models need clean input data for optimal performance. 
The “preprocessing” stage helps the model focus on the core meaning of the text.\n\n*   **Tokenization:** Breaking sentences into words or meaningful units (tokens). For example, “AI changes the world.” becomes “AI”, “changes”, “the”, “world”, “.”\n*   **Stop Word Removal:** Removing common words like “a”, “the”, or Korean particles that appear frequently but carry little meaning, so the model can focus on important words.\n*   **Normalization:** Reducing different surface forms of a word to one standard form.\n    *   **Stemming:** Cutting off word endings to find the root form, e.g., turning “running” and “runs” into “run.” Irregular forms such as “ran” are not handled by suffix stripping.\n    *   **Lemmatization:** Converting words to their dictionary form, considering grammar and meaning, e.g., “is”, “are”, “am” to “be” and “ran” to “run.” This is more precise than stemming.\n*   **Data Format:** Typically, training data pairs the original text with a human-written summary. The model learns from many such pairs to generate summaries that closely match the reference when given new text.\n\nAI text summarization is more than just shortening text: it’s a sophisticated technology that understands complex language and pinpoints the essence. Extractive summarization offers speed and accuracy, while abstractive summarization provides fluency and creativity, each serving different needs and empowering us with powerful information processing. We can look forward to even richer knowledge exploration as AI summarization technology continues to advance.",
      "need_alignment": {
        "score": "4",
        "feedback": "I’d like to see a clearer list of representative summarization models under each type (e.g., TF-IDF/TextRank/LexRank; BERTSum, BART/T5/PEGASUS), but you stayed focused on extractive vs. abstractive, explained how they work, compared pros/cons, gave simple examples, and even noted preprocessing/data formats—which is what I asked."
      },
      "content_depth": {
        "score": "5",
        "feedback": "You wrote it well so a beginner can understand it: clear mechanisms per type, concise pros/cons, simple examples, and useful preprocessing/data-format tips."
      },
      "tone": {
        "score": "4",
        "feedback": "Friendly and clear; it’d be nice if you used emojis, too."
      },
      "explanation_style": {
        "score": "4",
        "feedback": "Instead of emphasizing the pros and cons as titles, I think it would be better to highlight them together with keywords; a small side-by-side table for extractive vs. abstractive (how it works / pros / cons) would make it even easier to scan."
      }
    },
    {
      "contents": "AI Text Summarization: What kinds of models are there? 🤖\n\nAt a glance:\n- Extractive 🔍: selects important sentences/phrases from the original text. Typical methods: TF‑IDF, TextRank/LexRank, BERTSum.\n- Abstractive ✍️: understands and rewrites the gist in new words. Typical models: Seq2Seq+Attention/Pointer‑Generator, Transformers like BART, T5, PEGASUS (and long‑doc variants like LED/BigBird‑PEGASUS).\n\n1) Extractive summarization 🔍\n- What it does: “smart highlighter” that builds a summary by picking original sentences.\n- How it works:\n  • Frequency scoring (TF‑IDF/BM25): rank sentences by how many distinctive terms they contain; pick top‑k.\n  • Graph‑based ranking (TextRank, LexRank; PacSum is an unsupervised variant that uses BERT sentence embeddings): build a sentence graph using similarity (e.g., cosine of TF‑IDF); PageRank‑style scoring picks central sentences; often add redundancy control (e.g., Maximal Marginal Relevance).\n  • Supervised neural extractors (BERTSum, SummaRuNNer): encode the document (often with BERT/Transformers) and classify each sentence as “keep/skip”; then de‑duplicate and order.\n- Pros 👍:\n  • Faithful to source (low hallucination). Fast, simple, and transparent.\n  • Works well even with limited data; good baseline for long articles.\n- Cons 👎:\n  • Can read choppy because it stitches sentences. Cannot paraphrase or fuse dispersed info.\n\n2) Abstractive summarization ✍️\n- What it does: “creative rewriter” that generates new sentences that capture the core meaning.\n- How it works:\n  • Seq2Seq with Attention: an encoder reads the text; a decoder writes a summary token by token while “attending” to key parts. 
Pointer‑Generator variants can copy rare words and reduce repetition via coverage.\n  • Transformer‑based models:\n    - BART: denoising autoencoder that excels at fluent, faithful rewrites.\n    - T5: “text‑to‑text” model; add a prompt like \"summarize:\" and decode a summary.\n    - PEGASUS: pre‑trained for summarization using gap‑sentence prediction, often very strong on news/science.\n    - Long‑doc options: LED (Longformer Encoder‑Decoder), BigBird‑PEGASUS handle thousands of tokens.\n- Pros 👍:\n  • Natural, concise, human‑like summaries; can compress and combine information across sentences.\n- Cons 👎:\n  • May hallucinate or omit facts; heavier compute; slower than extractive for long inputs.\n\n3) Quick chooser 🧭\n- Want speed, faithfulness, and simple setup? Choose Extractive.\n- Want smoother, shorter, human‑style summaries (and can afford compute)? Choose Abstractive.\n- Many real systems use Hybrid pipelines: extract key bits first, then rewrite abstractively.\n\n4) Effective input and preprocessing 🧰\n- Data format (for training or organized datasets):\n  • Document–summary pairs. Commonly stored as JSONL lines like {\"document\": \"...\", \"summary\": \"...\"} or two aligned text files (src.txt, tgt.txt), one example per line.\n- Preprocessing essentials:\n  • Cleaning: remove HTML/boilerplate, ads, headers/footers, duplicate lines; normalize whitespace and quotes.\n  • Sentence segmentation: split clearly into sentences (crucial for extractive/graph methods); keep paragraph breaks if helpful.\n  • Tokenization:\n    - Classical extractive: word tokens; often use lowercasing, stop‑word removal, stemming/lemmatization to improve TF‑IDF/TextRank quality.\n    - Neural/Transformer: use subword tokenizers (WordPiece/BPE). 
Do NOT remove stop‑words or punctuation aggressively (models learned with them).\n  • Length handling:\n    - Truncation/padding to model limits (e.g., 512–4096+ tokens depending on model).\n    - For long texts: chunk into overlapping windows, summarize each chunk, then summarize the summaries (hierarchical). Keep original order and use separators between chunks.\n  • Redundancy control: de‑duplicate near‑identical sentences before extractive ranking; for abstractive decoding, use n‑gram blocking to avoid repeated phrases.\n  • Task hints: for models like T5, prepend a short instruction (e.g., \"summarize in 2 sentences:\"). You can also include desired length (\"~50 words\").\n\n5) Tiny example 🧪\nOriginal (3 sentences): \"Deep learning has rapidly advanced AI. In NLP, tools like summarization and translation are changing daily life. These advances come from models trained on large datasets.\"\n- Extractive (pick important sentences): \"Deep learning has rapidly advanced AI. In NLP, tools like summarization and translation are changing daily life.\"\n- Abstractive (rewrite concisely): \"Deep learning–driven NLP models, trained on massive data, are reshaping everyday life through tools like summarization and translation.\"\n\nThat’s the big picture: extractive (pick from text) vs. abstractive (rewrite the gist), with clear trade‑offs. If you share your use case or input length, I can suggest a concrete model and preprocessing recipe 😊",
      "need_alignment": {
        "score": "5",
        "feedback": "This focuses directly on “what kinds of summarization models exist,” clearly separates extractive vs. abstractive, names representative models under each (TF‑IDF, TextRank/LexRank, BERTSum, Seq2Seq, BART/T5/PEGASUS/LED), explains how they work, compares pros/cons side by side, and includes practical input/preprocessing tips plus a simple example—no tangents."
      },
      "content_depth": {
        "score": "5",
        "feedback": "Beginner‑friendly yet specific: concise mechanisms per type, concrete model names, clear pros/cons, a tiny illustrative example, and actionable preprocessing/data‑format guidance. No deep training pipelines or evaluation detours."
      },
      "tone": {
        "score": "5",
        "feedback": "Warm and encouraging with light, helpful emojis and approachable wording—playful but not distracting."
      },
      "explanation_style": {
        "score": "5",
        "feedback": "Highly scannable with tight bullets grouped by type; each block keeps “how it works / pros / cons” close, adds an at‑a‑glance chooser, and ends with a simple example. Easy to skim without a wall of text."
      }
    }
  ],
  "query-id": 60
}