{
  "sims": {
    "qwen2.5-7b_zero-shot_bg_train-time-info_v1": [
      "Both datasets prominently feature analyst actions such as upgrades, downgrades, and price target adjustments linked to specific companies.",
      "Earnings reports, revenue figures, and financial metrics (e.g., EPS, sales growth) are central to headlines in both datasets.",
      "Stock price movements (e.g., gains, declines) are directly tied to news events, earnings results, or analyst sentiment in all samples.",
      "Company-specific strategic developments (e.g., partnerships, acquisitions, expansions) are highlighted as market-moving factors.",
      "Economic indicators (e.g., inflation, tariffs, GDP) and external risks (e.g., trade tensions, oil demand shocks) are contextual drivers of market trends in both.",
      "Sector-specific trends (e.g., energy, biotech, semiconductors) are analyzed, with performance tied to industry-wide developments.",
      "Regulatory, legal, or political developments (e.g., lawsuits, probes, policy changes) are cited as catalysts for market reactions.",
      "Dividend announcements, financial guidance revisions, and capital allocation strategies (e.g., capex cuts) are recurring themes.",
      "Commodities (e.g., oil, gold) and energy market dynamics (e.g., OPEC+ decisions, LNG demand) are frequently referenced.",
      "Structured use of ticker symbols, financial terminology (e.g., \"price target,\" \"revenue guidance\"), and quantitative data (e.g., percentages, dollar figures) is consistent across all samples."
    ],
    "qwen2.5-32b_zero-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing stock ticker symbols (e.g., $BYND, $NVCR, $XLE) to denote specific companies.",
      "Headlines in both datasets emphasize earnings reports, including beats/misses (e.g., \"EPS misses\" in A, \"Exceeds Analyst Expectations\" in B).",
      "Analyst actions (e.g., upgrades, downgrades, price target changes) are frequently mentioned in both (e.g., \"Berenberg reiterates 'Buy'\" in A, \"Barclays Maintains Hold Rating\" in B).",
      "Financial metrics like revenue, dividends, and guidance updates are central to headlines in both datasets (e.g., \"raises dividend\" in A, \"boosts dividend by 7%\" in B).",
      "Sector-specific coverage spans energy, tech, healthcare, and retail in both (e.g., Exxon in A, Chevron in B; PharmaCielo in A, Seres Therapeutics in B).",
      "Corporate events (e.g., partnerships, acquisitions, expansions) are highlighted (e.g., \"Partnering with Costco\" in A, \"strategic partnership\" in B).",
      "Regulatory/legal developments (e.g., lawsuits, FCC rulings) are addressed in both (e.g., \"Prevent sues Volkswagen\" in A, \"FTC Launches New Investigation\" in B).",
      "Market indices/ETFs (e.g., Dow, S&P, $SPY) and sector performance are tracked in headlines across both datasets.",
      "Supply chain disruptions and pandemic impacts (e.g., COVID-19) are recurring themes (e.g., \"COVID-19 slowing demand\" in A, \"supply chain disruptions\" in B).",
      "Forward-looking statements (e.g., economic forecasts, growth potential) are prevalent (e.g., \"long-term outlook\" in A, \"optimistic economic growth\" in B)."
    ],
    "qwen2.5-7b_zero-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing stock ticker symbols (e.g., $BYND, $NVCR in A; $RLGT, $JCP in B) to denote companies.",
      "Headlines frequently mention earnings reports, forecasts, or financial metrics (e.g., EPS, revenue, profit) across both datasets.",
      "Analyst actions (e.g., upgrades, downgrades, price target changes) are a recurring theme in headlines from both datasets.",
      "Company-specific news (e.g., partnerships, product launches, lawsuits) is granularly covered in both datasets.",
      "Market indices (e.g., Dow, S&P 500) and ETFs (e.g., $SPY, $GLD) are cited as performance benchmarks in both datasets.",
      "Headlines emphasize financial metrics like percentage changes in stock prices, sales growth, or dividend yields (e.g., \"+5.8% pre,\" \"17.1% increase\").",
      "Sector-specific developments (e.g., energy, tech, retail) are highlighted in granular detail across both datasets.",
      "Forward-looking statements (e.g., \"maintains long-term outlook,\" \"forecasting strong Q4\") are prevalent in both datasets.",
      "Regulatory or legal developments (e.g., lawsuits, FCC rulings) are explicitly mentioned in headlines from both datasets.",
      "Both datasets blend formal financial terminology (e.g., \"dividend yield,\" \"capex\") with informal language (e.g., \"buy the dip\") in headlines."
    ],
    "llama3.3-70b_zero-shot_bg_v1": [
      "Both datasets include headlines focused on financial markets, companies, or economic indicators.",
      "All samples report recent events, announcements, or analyses relevant to investors.",
      "Headlines reference financial entities (e.g., companies, indices, sectors, analysts).",
      "Samples use terminology specific to finance (e.g., earnings, dividends, valuations, price targets).",
      "Headlines are concise and structured for quick informational updates.",
      "All entries aim to inform readers about factors influencing investment decisions or market sentiment.",
      "Samples mention specific entities (e.g., firms like Morgan Stanley, Tesla) tied to financial contexts.",
      "Content pertains to publicly available information targeting market participants or investors.",
      "Headlines emphasize timeliness (e.g., quarterly results, upcoming earnings, recent downgrades).",
      "All entries highlight developments impacting financial valuations (e.g., stock prices, revenue forecasts)."
    ],
    "qwen2.5-32b_zero-shot_v1": [
      "Both datasets focus on earnings reports, including previews, results, and market reactions to earnings performance.",
      "Headlines in both datasets frequently mention stock price movements (e.g., surges, declines, premarket changes).",
      "Specific companies or sectors are highlighted, often with stock tickers (e.g., $BYND, $MSFT) or generic labels (e.g., \"Tech Giant\").",
      "Regulatory actions (e.g., FDA approvals, lawsuits) and government policies (e.g., Fed rate decisions) are recurring themes.",
      "Industry-specific updates are covered across diverse sectors, such as energy, tech, healthcare, and retail.",
      "References to macroeconomic indicators (e.g., GDP, inflation, unemployment) and market indices (e.g., Dow, S&P) appear in both datasets.",
      "Forward-looking statements and forecasts (e.g., price targets, economic projections) are prominent in headlines.",
      "Financial metrics like dividends, revenue, EPS, and profit/loss figures are frequently cited.",
      "Mentions of mergers, acquisitions, partnerships, or strategic corporate actions (e.g., contracts, expansions) are common.",
      "Both use financial jargon (e.g., \"dividend yield,\" \"capex,\" \"FDA approval\") and sector-specific terminology."
    ],
    "qwen2.5-32b_few-shot_bg_train-time-info_v1": [
      "Both datasets include headlines mentioning stock tickers prefixed with a dollar sign (e.g., $VIRT in A, $DISH in B).",
      "Earnings reports and financial metrics (e.g., revenue, EPS, net sales) are prominently featured in both datasets (e.g., A: \"TJX raises 2020 EPS outlook\", B: \"$CLR - Chesapeake Energy Reports Stronger Than Expected Q2 Earnings\").",
      "Analyst actions (e.g., upgrades, downgrades, price target changes) are a common theme (e.g., A: \"Berenberg reiterates 'Buy' rating on $BYND\", B: \"Morgan Stanley Lowers Rating\").",
      "References to specific fiscal quarters and years (e.g., Q4 2019 in A, Q3 2023 in B) are consistently included for contextualizing results.",
      "Company-specific strategic moves (e.g., partnerships, acquisitions, product launches) are highlighted in both (e.g., A: \"Partnering with Costco\", B: \"Fludometrics Announces New Partnership\").",
      "Market reactions (e.g., stock price changes, investor sentiment) are explicitly noted (e.g., A: \"Post Holdings -4% after earnings miss\", B: \"Shares Climb 10% on Earnings Beat\").",
      "Industry-specific trends (e.g., oil prices, renewable energy, semiconductor demand) are discussed (e.g., A: \"Saudis Slash Oil Prices\", B: \"Semiconductor Industry Slowdown\").",
      "Regulatory or geopolitical impacts (e.g., tariffs, lawsuits, policy changes) are addressed (e.g., A: \"New York hits Juul with a lawsuit\", B: \"Regulatory Hurdles\").",
      "Dividend declarations and capital allocation strategies are mentioned (e.g., A: \"WPT Industrial REIT declares dividend\", B: \"Schroders sets dividend payout\").",
      "Forward-looking guidance (e.g., revenue forecasts, growth potential) is included in headlines (e.g., A: \"Exxon cuts capex forecast\", B: \"Raises Full-Year Outlook\")."
    ],
    "llama3.3-70b_few-shot_bg_train-time-info_v1": [
      "Both datasets include headlines referencing stock tickers prefixed with '$' (e.g., $VIRT in A, $SAEX in B).",
      "Headlines frequently mention companies, financial institutions, or analysts (e.g., Morgan Stanley, Barclays).",
      "Earnings reports, revenue figures, and financial metrics (e.g., EPS, guidance) are central to headlines in both datasets.",
      "Analyst actions (upgrades, downgrades, price target changes) are explicitly highlighted in both datasets.",
      "Market indices (e.g., Dow, S&P) and ETFs (e.g., $SPY) are referenced to contextualize broader market movements.",
      "Industry-specific focus areas like energy, tech, healthcare, and retail are recurrent in both datasets.",
      "Regulatory/geopolitical factors (e.g., FCC rulings, interest rates) are cited as market drivers in both.",
      "Headlines use standardized financial terminology (e.g., 'dividend', 'revenue miss', 'mergers').",
      "Forward-looking statements (e.g., 'outlook', 'guidance', 'growth potential') are prevalent in both datasets.",
      "Headlines emphasize granular numerical data (e.g., percentage changes, stock price targets, sales figures)."
    ],
    "llama3.3-70b_zero-shot_bg_train-time-info_v1": [
      "Both datasets consistently include stock tickers denoted with the '$' symbol, referencing specific companies.",
      "Headlines frequently mention financial institutions (e.g., Morgan Stanley, Barclays, Oppenheimer) issuing analyst ratings, upgrades, or downgrades.",
      "Quarterly earnings reports, including beats/misses and forward guidance, are a central focus across all samples.",
      "Price target adjustments (e.g., 'raises price target,' 'cuts price target') are explicitly highlighted in both datasets.",
      "Corporate developments such as partnerships, product launches, or expansions directly tied to stock performance are common themes.",
      "Market indices (e.g., Dow, S&P, Russell 2000) and sector-specific ETFs (e.g., $SPY, $XLF) are referenced to contextualize broader trends.",
      "Macroeconomic factors (e.g., inflation, recession risks, regulatory changes) are cited as drivers of market sentiment in headlines.",
      "Financial metrics like EPS, revenue, dividends, and capital expenditures are explicitly quantified in updates.",
      "Industry-specific trends (e.g., oil demand, cloud computing, healthcare trials) are granularly tied to company performance.",
      "Events such as investor conferences, earnings calls, and clinical trial updates are systematically highlighted as catalysts."
    ],
    "qwen2.5-32b_few-shot_bg_test-time-info_v1": [
      "Both datasets include frequent use of stock ticker symbols prefixed with a dollar sign (e.g., $NVDA, $BYND, $SPY).",
      "Headlines often reference quarterly earnings reports, beats/misses, and financial metrics (e.g., revenue, EPS guidance).",
      "Analyst actions (upgrades, downgrades, price target adjustments) are prominently featured in both datasets.",
      "Company-specific developments (e.g., partnerships, expansions, product launches) are common themes.",
      "Market indices (e.g., Dow, S&P 500) and ETF references (e.g., XLE, HYG) appear in both datasets.",
      "Sector-specific news (e.g., energy, tech, biotech) is highlighted across headlines.",
      "Regulatory impacts and macroeconomic factors (e.g., Fed policy, trade deals) are discussed in both.",
      "Forward-looking statements about revenue guidance or growth potential are recurrent.",
      "Clinical trial results and R&D updates (e.g., phase II/III successes) appear in biotech/pharma contexts.",
      "Supply chain disruptions (e.g., delays, recalls) and operational challenges are addressed in both datasets."
    ],
    "qwen2.5-7b_zero-shot_bg_v1": [
      "Headlines in both datasets frequently reference specific companies using stock ticker symbols (e.g., $TSLA, $AAPL).",
      "Both datasets emphasize earnings reports, financial metrics (e.g., EPS, revenue), and quarterly performance updates.",
      "Analyst actions such as upgrades, downgrades, and price target adjustments are prominently featured in both sets.",
      "Market indices (e.g., S&P 500, Dow Jones) and their movements are regularly cited to contextualize trends.",
      "Numerical data (e.g., percentages, monetary values) quantifies financial changes or forecasts in most headlines.",
      "Macroeconomic factors like interest rates, inflation, and geopolitical events influence headlines in both datasets.",
      "Corporate events (e.g., mergers, layoffs, partnerships) are covered as key drivers of market sentiment.",
      "Industry-specific trends (e.g., oil prices, semiconductor demand, retail sales) are highlighted across sectors.",
      "Headlines include forward-looking statements about company performance, economic conditions, or market projections.",
      "Both datasets balance positive and negative developments affecting stock valuations or investor outlooks."
    ],
    "qwen2.5-32b_zero-shot_bg_train-time-info_v1": [
      "Both datasets include headlines referencing specific stock ticker symbols (e.g., $AAPL, $GE, $SPY).",
      "Headlines in both datasets frequently mention earnings reports, including beats/misses relative to analyst expectations (e.g., 'Q4 earnings preview,' 'exceeds revenue projections').",
      "Analyst actions (upgrades, downgrades, price target changes) are prominently featured in samples from both datasets.",
      "Company-specific strategic developments (e.g., partnerships, expansions, product launches) are common themes (e.g., 'Partnership with Costco,' 'new flagship store in New York City').",
      "Market indices, ETFs, or sector-specific funds (e.g., $XLF, $UNG, $SPY) are mentioned in both datasets.",
      "Economic factors impacting performance (e.g., supply chain disruptions, interest rates, geopolitical tensions) are cited in headlines across both datasets.",
      "Financial metrics like EPS, revenue, and guidance updates are explicitly highlighted (e.g., 'raises 2020 EPS outlook,' 'narrows Q4 losses').",
      "Stock price movements (e.g., 'shares surge,' 'tumbles 20%') are consistently tied to news events in both datasets.",
      "Sector-specific news (e.g., energy, biotech, retail) is a recurring focus in headlines from both datasets.",
      "Forward-looking statements or guidance (e.g., 'reaffirms full-year outlook,' 'cautious guidance') are present in samples from both datasets."
    ],
    "llama3.1-8b_zero-shot_bg_v1": [
      "Both datasets include headlines mentioning stock tickers prefixed with the '$' symbol (e.g., $AAPL, $TSLA, $BYND).",
      "Earnings reports and financial performance updates (e.g., revenue misses/beats, EPS guidance) are core subjects in both datasets.",
      "Analyst actions (price target changes, upgrades/downgrades) are prominently featured in headlines from both sets.",
      "Percentage-based stock price movements (e.g., '-5%', 'surges 7%') are consistently quantified in both datasets.",
      "References to macroeconomic factors (e.g., Fed policy, GDP growth, inflation) appear in both datasets as market drivers.",
      "Company-specific strategic developments (partnerships, product launches, M&A) are highlighted in both sets.",
      "Sector-specific trends (e.g., oil markets, tech, EVs) are contextualized within individual stock movements in both datasets.",
      "Regulatory/legal developments (lawsuits, license approvals, antitrust concerns) impact sentiment in both sets.",
      "Dividend announcements and capital allocation strategies (buybacks, investments) are present in both datasets.",
      "Identical financial terminology is used (e.g., 'misses consensus', 'beats estimates', 'downgraded to underweight')."
    ],
    "llama3.1-8b_zero-shot_v1": [
      "All headlines reference specific companies, stock tickers, financial indices, or economic indicators.",
      "Each headline reports on events or data with direct implications for financial markets or investor decisions.",
      "Both datasets include mentions of earnings reports, revenue figures, or profit/loss statements.",
      "All samples contain quantitative metrics (e.g., percentage changes, stock price targets, sales figures).",
      "Headlines frequently cite macroeconomic factors (e.g., interest rates, GDP, trade policies, oil prices).",
      "Both datasets highlight market reactions to news using terms like 'surge,' 'plunge,' 'rally,' or 'tumble.'",
      "Regulatory actions, government policies, or legal disputes are mentioned as market influencers.",
      "Dividend announcements, stock buybacks, or corporate financial strategies are recurring themes.",
      "Forward-looking statements (e.g., forecasts, analyst upgrades/downgrades, guidance revisions) are present.",
      "Sector-specific developments (tech, energy, retail, healthcare) are explicitly addressed in all samples."
    ],
    "llama3.3-70b_few-shot_v1": [
      "Both datasets include headlines referencing specific stock ticker symbols (e.g., $TSLA, $AAPL) and company names.",
      "Headlines in both datasets frequently mention quarterly earnings reports, revenue results, and earnings per share (EPS) metrics.",
      "Both emphasize stock price movements (e.g., \"surges,\" \"plummets\") with explicit percentage changes or numerical targets.",
      "References to macroeconomic indicators (e.g., GDP, unemployment rates, inflation) and central bank policies (e.g., Federal Reserve decisions) are common in both.",
      "Headlines frequently cite analyst actions such as upgrades, downgrades, and price target adjustments.",
      "Both datasets highlight sector-specific trends (e.g., tech, energy, retail) and their market impacts.",
      "Market sentiment terms like \"rally,\" \"plunge,\" \"misses expectations,\" and \"beats estimates\" are prevalent in both.",
      "Time-sensitive updates (e.g., premarket/after-hours trading, quarterly results) are a recurring feature across both datasets.",
      "Both include mentions of geopolitical or macroeconomic risks (e.g., trade tensions, recessions, COVID-19 impacts) affecting markets.",
      "Corporate events such as mergers, acquisitions, dividends, and strategic partnerships are frequently highlighted in both."
    ],
    "qwen2.5-32b_few-shot_bg_v1": [
      "Both datasets include headlines mentioning specific companies and their stock ticker symbols (e.g., $AAPL, $TSLA).",
      "Headlines frequently reference quarterly earnings reports, revenue figures, and analyst expectations (e.g., 'Q3 Earnings Beat Expectations').",
      "Analyst actions such as upgrades, downgrades, and price target adjustments are central to headlines in both datasets.",
      "Economic indicators (e.g., interest rates, inflation, GDP) and macroeconomic events (e.g., Fed decisions) are recurring themes.",
      "Supply chain disruptions and their impact on company performance are highlighted in both datasets (e.g., 'supply chain challenges').",
      "Market volatility and stock price movements (e.g., 'Shares Rise 5% Post-Market') are common focal points.",
      "Sector-specific trends (e.g., tech, energy, retail) and industry competition are frequently discussed.",
      "Regulatory actions, legal issues, and geopolitical risks (e.g., tariffs, lawsuits) appear in headlines from both datasets.",
      "Dividend announcements, financial metrics (e.g., EPS, revenue growth), and operational updates (e.g., capex cuts) are emphasized.",
      "A mix of formal and informal language is used, including abbreviations (e.g., 'YoY') and ticker-focused shorthand."
    ],
    "qwen2.5-32b_few-shot_v1": [
      "Both datasets include headlines referencing specific stock ticker symbols (e.g., $MSFT in A, $AAPL in B) alongside company names.",
      "Headlines in both datasets frequently mention earnings reports, including beats/misses (e.g., A: 'TJX raises 2020 EPS outlook,' B: 'Tech Giant Reports Lower-Than-Expected Earnings').",
      "Stock price movements are described with terms like 'surge,' 'plummet,' 'soar,' or 'dip' in response to news (e.g., A: 'Tesla\u2019s stock ticks up,' B: 'Shares Soar 5%').",
      "Both highlight macroeconomic indicators such as GDP, inflation, and employment rates (e.g., A: 'Building permits rise 5%,' B: 'inflation climbs to new heights').",
      "Central bank policies (e.g., Federal Reserve interest rate decisions) are a recurring theme (A: 'Fed Officials Weigh Risks,' B: 'Fed Announces Interest Rate Decision').",
      "Analyst actions (upgrades, downgrades, price targets) are explicitly cited (A: 'Berenberg reiterates Buy rating,' B: 'analysts predict a decline').",
      "Sector-specific trends (e.g., energy, tech, pharmaceuticals) are covered in granular detail (A: 'oil prices,' B: 'renewable energy stocks').",
      "Headlines use formal financial terminology like 'EPS,' 'revenue,' 'dividend,' and 'capex' (A: 'Q1 revs below consensus,' B: 'Q3 earnings exceed expectations').",
      "External factors impacting markets (e.g., geopolitical events, regulations, pandemics) are emphasized (A: 'COVID-19 impact,' B: 'new regulatory challenges').",
      "Structured formatting with colons, hyphens, or keywords like 'RECAP' is common (A: 'STOCKS SURGE INTO THE CLOSE: - Dow up 7.59%,' B: 'Federal Reserve to Announce...')."
    ],
    "qwen2.5-32b_zero-shot_bg_v1": [
      "Both datasets consistently include stock tickers prefixed with a dollar sign (e.g., $AAPL, $NVDA, $BYND).",
      "Headlines in both datasets frequently reference quarterly or annual earnings reports (e.g., 'Q4 2019 Earnings Preview' in A, 'Q2 Earnings Season' in B).",
      "Analyst ratings, price target adjustments, and institutional actions (e.g., 'Buy rating,' 'downgrades,' 'upgrades') are central to headlines in both datasets.",
      "Percentages and numerical metrics (e.g., 'sales increased 17.1%' in A, 'revenue up 32%' in B) quantify financial performance in both sets.",
      "Macroeconomic factors (e.g., tariffs, interest rates, GDP) are explicitly tied to company performance (e.g., 'oil rout' in A, 'China\u2019s economic slowdown' in B).",
      "Forward-looking statements (e.g., 'raises 2020 EPS outlook' in A, 'reaffirms production targets for 2023' in B) are prevalent in both datasets.",
      "Regulatory, legal, or geopolitical risks (e.g., 'FCC chairman voices support' in A, 'new legal challenges' in B) are highlighted in headlines.",
      "Sector-specific trends (e.g., semiconductor demand in B, blockchain job losses in A) are discussed granularly across both datasets.",
      "Explicit mentions of stock price movements (e.g., 'shares tumble 20%' in A, 'shares fall post-earnings' in B) are consistent in both sets.",
      "Headlines in both datasets reference institutional actors (e.g., 'Morgan Stanley,' 'Barclays') and their analyses of companies."
    ],
    "qwen2.5-7b_zero-shot_v1": [
      "Both datasets feature headlines focused on quarterly earnings reports, including beats, misses, and revisions to financial forecasts.",
      "Federal Reserve actions, such as interest rate decisions and economic guidance, are central themes in both datasets.",
      "Stock market index movements (e.g., Dow Jones, S&P 500) and sector-specific trends (e.g., tech, energy) are prominently highlighted.",
      "Company-specific events (e.g., partnerships, product launches, regulatory challenges) directly tied to stock price fluctuations are covered in both.",
      "Economic indicators like GDP growth, unemployment rates, and consumer spending data are cited as drivers of market sentiment.",
      "Analyst actions, including ratings changes, price target updates, and earnings previews, are recurring elements in headlines.",
      "External macroeconomic factors (e.g., geopolitical tensions, trade policies, pandemics) are discussed as market influencers.",
      "Commodity price dynamics, particularly oil and energy markets, are linked to sectoral and broader market performance.",
      "Terms describing stock volatility (e.g., 'surge,' 'plunge,' 'rally') are consistently used to frame market reactions.",
      "Forward-looking statements about corporate strategies, economic projections, and risk assessments are integral to both datasets."
    ],
    "llama3.1-8b_zero-shot_bg_train-time-info_v1": [
      "Both datasets consistently include stock ticker symbols prefixed with '$' within headlines.",
      "Headlines in both datasets frequently mention earnings reports, forecasts, or revisions (e.g., EPS, revenue misses/beats).",
      "Analyst actions (upgrades, downgrades, price target changes) are prominently featured in samples from both sets.",
      "Market index performance updates (e.g., Dow, Nasdaq, S&P) are explicitly quantified with percentage changes in both datasets.",
      "Sector-specific focus on energy, healthcare, retail, and technology is recurrent across all samples.",
      "Dividend declarations and yield metrics are regularly highlighted in headlines from both datasets.",
      "Forward-looking statements about macroeconomic trends (e.g., interest rates, GDP, trade deals) appear in both sets.",
      "Company-specific strategic moves (mergers, partnerships, cost-cutting, expansions) are central to headlines in both datasets.",
      "Regulatory/legal developments (e.g., FDA decisions, lawsuits, license approvals) are cited as market-moving factors in both.",
      "Premarket/post-market stock price reactions to news are explicitly referenced in samples from both A and B."
    ],
    "qwen2.5-7b_few-shot_v1": [
      "Both datasets include headlines referencing specific companies and their stock tickers (e.g., $BYND, $AAPL).",
      "Headlines in both datasets frequently mention quarterly earnings reports, beats/misses, and financial metrics (e.g., EPS, revenue).",
      "Market index performance updates (e.g., Dow, S&P 500) are common in both datasets.",
      "Both highlight stock price movements (e.g., surges, declines) and investor sentiment (e.g., bullish/bearish trends).",
      "Regulatory and legal developments (e.g., lawsuits, policy changes) impacting industries are covered in both datasets.",
      "Sector-specific news (e.g., tech, energy, automotive) is a recurring theme across headlines in both datasets.",
      "Macroeconomic indicators (e.g., interest rates, GDP, employment data) are frequently cited in both datasets.",
      "Forward-looking statements (e.g., forecasts, growth projections, outlook revisions) are prominent in both datasets.",
      "Both datasets discuss geopolitical and trade tensions (e.g., U.S.-China relations, tariffs) affecting markets.",
      "Mentions of mergers, acquisitions, partnerships, and strategic expansions appear in headlines from both datasets."
    ],
    "llama3.3-70b_few-shot_bg_v1": [
      "Both datasets include headlines containing stock tickers denoted with a '$' symbol.",
      "Headlines in both datasets frequently mention analyst actions such as upgrades, downgrades, or price target adjustments.",
      "Earnings reports (e.g., quarterly results, revenue, EPS) are a central focus across all samples in both datasets.",
      "Price target revisions by financial institutions (e.g., Morgan Stanley, Barclays) are explicitly cited in both datasets.",
      "References to financial institutions (e.g., Goldman Sachs, J.P. Morgan) issuing analysis or guidance appear consistently in both datasets.",
      "Headlines use concise, factual language focused on immediate market-moving events (e.g., earnings releases, guidance updates).",
      "Both datasets mention macroeconomic indicators or events (e.g., Federal Reserve reports, trade deals, GDP forecasts).",
      "Implicit or explicit references to stock price movements (e.g., \"surges,\" \"tumbles,\" \"trades flat\") are present in all samples.",
      "Sector-specific developments (e.g., tech, energy, automotive) are highlighted in headlines across both datasets.",
      "Forward-looking statements (e.g., earnings previews, growth potential, guidance updates) are consistently included."
    ],
    "llama3.1-8b_few-shot_v1": [
      "Both datasets include headlines referencing specific companies and stock ticker symbols (e.g., $BYND, $AAPL).",
      "Headlines in both datasets frequently mention earnings reports, revenue figures, or financial forecasts (e.g., 'Q4 earnings,' 'EPS outlook').",
      "Both datasets highlight market movements, such as stock price changes, index performance, or sector trends (e.g., 'Dow up 7.59%,' 'NASDAQ sinks 3.5%').",
      "Economic indicators like GDP growth, inflation rates, and employment data are mentioned in both datasets (e.g., 'housing starts climb,' 'economic growth slows').",
      "Regulatory or policy impacts on businesses are covered in both (e.g., 'Fed interest rates,' '1967 law,' 'OPEC talks').",
      "Industry-specific developments (e.g., energy, tech, retail) are a focus in headlines from both datasets.",
      "Both include references to geopolitical events affecting markets, such as trade wars or sanctions (e.g., 'US-China trade war,' 'tariffs').",
      "Dividend announcements, stock buybacks, or capital allocation strategies appear in both datasets (e.g., 'dividend declared,' 'buyback').",
      "Forward-looking statements or analyst predictions are common (e.g., 'growth potential,' 'price target raised').",
      "Technical financial terminology (e.g., 'cup base,' 'yield curve inversion') is used in both datasets."
    ],
    "llama3.1-8b_few-shot_bg_train-time-info_v1": [
      "Both datasets include headlines with stock tickers denoted by symbols (e.g., $VIRT, $MATX).",
      "Earnings reports, forecasts, and revisions (e.g., Q4 previews, EPS guidance) are central to headlines in both datasets.",
      "Analyst actions (upgrades, downgrades, price target changes) are frequently cited in both datasets.",
      "Percentage-based stock price movements (e.g., -4%, +7.59%) are explicitly mentioned in headlines across both datasets.",
      "Mentions of revenue, sales, and profit metrics (e.g., \"beats estimates,\" \"revenue increase\") are common in both.",
      "Industry-specific challenges (e.g., oil price volatility, COVID-19 impacts) are highlighted in both datasets.",
      "Dividend declarations and financial outlooks (e.g., dividend hikes, guidance cuts) appear in headlines from both sets.",
      "Regulatory, legal, or macroeconomic factors (e.g., tariffs, interest rates) influencing markets are discussed in both.",
      "Company-specific strategic moves (e.g., mergers, partnerships, product launches) are focal points in headlines.",
      "References to broader market indices (e.g., S&P 500, Dow Jones) and sector trends (tech, retail, energy) are present in both."
    ],
    "qwen2.5-7b_few-shot_bg_v1": [
      "Both datasets include frequent mentions of stock ticker symbols (e.g., $TSLA, $NVDA, $BYND, $PSMT) alongside company names.",
      "Headlines focus on financial performance metrics such as earnings reports, revenue, EPS guidance, and sales figures (e.g., \"Q4 earnings miss,\" \"raises 2020 EPS outlook\").",
      "Analyst actions (upgrades, downgrades, price target adjustments) are prominently featured (e.g., \"Berenberg reiterates 'Buy' rating,\" \"Morgan Stanley cuts price target\").",
      "Market-moving events like mergers, acquisitions, and partnerships are covered (e.g., \"Johnson & Johnson to acquire,\" \"Exxon and Papua New Guinea negotiate\").",
      "External factors impacting markets (e.g., tariffs, interest rates, geopolitical tensions) are highlighted in both datasets.",
      "Sector-specific news spans industries like tech, energy, healthcare, and retail (e.g., \"NVIDIA AI demand,\" \"oil price dynamics\").",
      "Regulatory and legal developments affecting companies are addressed (e.g., \"FCC supports airwaves sale,\" \"Microsoft antitrust scrutiny\").",
      "Quantitative data (percentage changes, monetary figures) is consistently paired with qualitative analysis (e.g., \"Dow up 7.59%,\" \"demand challenges persist\").",
      "Dividend declarations, shareholder returns, and capital allocation strategies are mentioned (e.g., \"WPT Industrial REIT declares dividend,\" \"reaffirms outlook\").",
      "Global economic influences (e.g., China trade data, U.K. Brexit impacts) and cross-border business risks are discussed in both datasets."
    ],
    "qwen2.5-7b_few-shot_bg_train-time-info_v1": [
      "All headlines reference specific companies by name or stock ticker symbol (e.g., $BYND, $DIS).",
      "Each headline includes financial metrics or events (e.g., earnings results, revenue figures, stock price movements).",
      "Stock ticker symbols are consistently prefixed with \"$\" in both datasets.",
      "Analyst actions (e.g., upgrades, downgrades, price target adjustments) are explicitly mentioned across all samples.",
      "Numerical data (e.g., percentages, price targets, sales figures) is present in every headline.",
      "Headlines focus on market-moving events (e.g., earnings reports, mergers, regulatory decisions).",
      "Specific financial institutions or firms (e.g., Oppenheimer, Citi, Morgan Stanley) are cited for analysis or ratings.",
      "Industry-specific terminology (e.g., EPS, guidance, dividends, revenue) is used in all samples.",
      "Mentions of quarterly or annual financial results (e.g., Q4 earnings, full-year outlook) are universal.",
      "All headlines imply or state direct impacts on stock performance (e.g., \"plummets,\" \"surges,\" \"downgrade\")."
    ],
    "qwen2.5-7b_few-shot_bg_test-time-info_v1": [
      "Both datasets consistently include stock tickers prefixed with a dollar sign (e.g., $AAPL, $SPY) to identify companies.",
      "Earnings reports (e.g., beats/misses, revenue figures, EPS guidance) are a central focus in headlines from both datasets.",
      "Analyst actions such as upgrades, downgrades, and rating changes (e.g., Morgan Stanley, Credit Suisse) are prominently featured.",
      "Price target adjustments by financial institutions are explicitly mentioned in headlines from both sets.",
      "Revenue growth/decline metrics and sales performance are quantified and compared to consensus estimates in both datasets.",
      "Strategic corporate moves (e.g., partnerships, expansions, acquisitions) are highlighted as market-moving events.",
      "Market indices (e.g., S&P 500, Dow Jones) and ETF performance are referenced in relation to broader economic trends.",
      "Supply chain disruptions and their financial impacts (e.g., costs, production forecasts) are cited as key drivers of volatility.",
      "Forward-looking guidance revisions (e.g., raising/lowering EPS, revenue projections) are critical narrative components.",
      "Immediate stock price reactions (e.g., surges, declines, premarket moves) are explicitly tied to news events in both datasets."
    ],
    "llama3.3-70b_few-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing stock tickers (e.g., $BYND, $MLM) to identify companies.",
      "Earnings reports, revenue figures, and financial metrics (e.g., EPS, sales growth) are central themes in both datasets.",
      "Analyst actions such as upgrades, downgrades, and price target adjustments (e.g., 'Morgan Stanley downgrades') are explicitly mentioned in all samples.",
      "Company-specific strategic developments (e.g., partnerships, product launches, clinical trials) are highlighted in headlines across both datasets.",
      "Market indices (e.g., Dow, S&P 500) and ETFs (e.g., $XLE, $SPY) are frequently cited to contextualize broader market movements.",
      "Forward-looking statements about earnings release dates, guidance updates, or clinical trial timelines are consistently included.",
      "Sector-specific focus (e.g., energy, biotech, retail) is granularly addressed in headlines across all samples.",
      "Technical financial terms (e.g., 'overweight,' 'valuation concerns,' 'premarket trading') are uniformly used in both datasets.",
      "Mentions of macroeconomic factors (e.g., inflation, interest rates, commodity prices) directly tie corporate performance to external conditions in all samples.",
      "Explicit references to institutional actors (e.g., Barclays, Morgan Stanley, Goldman Sachs) as sources of analysis or ratings are present in every headline."
    ],
    "llama3.1-8b_zero-shot_bg_test-time-info_v1": [
      "Both datasets include headlines that mention specific stock ticker symbols using the '$' notation (e.g., $BYND in A and $VZ in B).",
      "Headlines in both datasets frequently reference quarterly earnings reports (e.g., Q4 2019 in A and Q1 2023 in B).",
      "Analyst actions such as upgrades, downgrades, or price target adjustments are highlighted in both datasets (e.g., Berenberg's 'Buy' rating in A and Morgan Stanley's upgrade in B).",
      "Numerical data (e.g., percentage changes, revenue figures) is consistently used to quantify financial performance (e.g., 'EPS outlook raised to $2.61' in A and 'revenue beats estimates' in B).",
      "Market movements (e.g., stock surges, declines) are described with explicit percentage values in both datasets (e.g., 'Dow up 7.59%' in A and 'SRNE Soars' in B).",
      "Macroeconomic factors (e.g., GDP, inflation, oil prices) are cited as drivers of market behavior in both datasets (e.g., 'oil rout' in A and 'GDP down 2.5%' in B).",
      "Corporate developments like partnerships, mergers, or expansions are emphasized (e.g., Beyond Meat partnering with Costco in A and Uber\u2019s strategic alliance in B).",
      "Regulatory or legal impacts on companies are mentioned (e.g., 'FCC chairman voices support' in A and 'Freddie Mac ends mortgage deal' in B).",
      "Sector-specific updates (e.g., energy, tech, retail) are a focus (e.g., 'oil blocks offshore Colombia' in A and 'software division growth' in B).",
      "Forward-looking statements (e.g., guidance, projections) are included (e.g., '2020 EPS outlook' in A and '2024 sales guidance' in B)."
    ],
    "llama3.3-70b_zero-shot_bg_test-time-info_v1": [
      "Headlines in both datasets frequently mention stock ticker symbols using the '$' notation alongside company names.",
      "Both datasets include updates on quarterly earnings reports, including revenue figures and analyst expectations.",
      "Analyst ratings and price target adjustments (e.g., upgrades, downgrades) are prominently featured in headlines from both datasets.",
      "References to specific financial metrics like EPS (Earnings Per Share), revenue growth, and dividends are common in both datasets.",
      "Headlines often cite institutional actors such as investment banks (e.g., Morgan Stanley, Barclays) and research firms (e.g., Goldman Sachs, Oppenheimer).",
      "Market reactions to news (e.g., stock price movements, premarket/postmarket trading) are explicitly mentioned in headlines from both datasets.",
      "Both datasets highlight sector-specific developments, including energy (oil/gas), technology, healthcare, and retail industries.",
      "Mentions of corporate events like investor conferences, earnings calls, and shareholder meetings appear in both datasets.",
      "Macroeconomic factors (e.g., trade deals, tariffs, COVID-19 impacts) are contextualized in relation to individual companies or sectors in both datasets.",
      "Headlines frequently use standardized financial terminology such as 'price target,' 'downgraded/upgraded,' 'misses/beats estimates,' and 'dividend yield.'"
    ],
    "llama3.3-70b_zero-shot_v1": [
      "Headlines focus on financial market events, company performance, or economic policies affecting stock valuations.",
      "Mentions specific companies, stock tickers, indices (e.g., Dow, S&P), or regulatory bodies (e.g., Federal Reserve).",
      "Uses directional terms to describe market/price movements (e.g., 'surge,' 'plummet,' 'rally,' 'decline').",
      "References quantitative financial metrics (e.g., earnings, revenue, dividends, interest rates, GDP).",
      "Highlights cause-effect relationships between events (e.g., earnings reports, policy changes) and market reactions.",
      "Includes temporal markers tied to financial cycles (e.g., quarterly results, fiscal years, scheduled Fed meetings).",
      "Discusses sector-specific trends (e.g., tech, energy, retail) or macroeconomic factors (e.g., inflation, trade wars).",
      "Incorporates forward-looking statements (e.g., forecasts, analyst ratings, guidance revisions).",
      "Emphasizes investor sentiment (e.g., 'fears,' 'confidence,' 'optimism') driving market behavior.",
      "Covers both positive and negative market developments (e.g., earnings beats vs. misses, rate cuts vs. hikes)."
    ],
    "llama3.1-8b_few-shot_bg_v1": [
      "Both datasets include headlines with specific stock ticker symbols (e.g., $AAPL, $NVDA, $TSLA) to reference companies.",
      "Headlines frequently mention quarterly earnings reports, including results, forecasts, and revisions (e.g., 'Q4 earnings miss' or 'beats estimates').",
      "Analyst actions (e.g., upgrades, downgrades, price target adjustments) are central to headlines in both datasets (e.g., 'Morgan Stanley cuts price target').",
      "Numerical data (e.g., percentages, revenue figures, stock price changes) is consistently used to quantify financial performance or market movements.",
      "Economic indicators (e.g., GDP, CPI, unemployment claims) are cited as drivers of market sentiment in both datasets.",
      "External factors impacting markets (e.g., trade tensions, regulatory scrutiny, COVID-19) are highlighted across all samples.",
      "Sector-specific trends (e.g., tech, energy, automotive) are discussed, with granular focus on industry developments.",
      "Forward-looking statements (e.g., growth projections, demand forecasts) are prevalent in headlines from both datasets.",
      "Legal, regulatory, or governmental actions (e.g., FTC investigations, lawsuits) are mentioned as risks or catalysts.",
      "Market indices (e.g., S&P 500, Dow Jones) and their movements are tied to macroeconomic or corporate news in both datasets."
    ],
    "llama3.1-8b_few-shot_bg_test-time-info_v1": [
      "Headlines include stock ticker symbols prefixed with a dollar sign (e.g., $BYND, $NVDA).",
      "Mentions of quarterly earnings reports, earnings misses/beats, or financial guidance updates.",
      "References to analyst actions such as upgrades, downgrades, or price target adjustments.",
      "Discussions of sector-specific trends (e.g., energy, utilities, tech, retail).",
      "Highlights of mergers, acquisitions, or strategic partnerships impacting companies.",
      "Reports on stock price movements (e.g., premarket gains/losses, intraday volatility).",
      "Coverage of macroeconomic factors (e.g., trade tensions, oil prices, pandemic effects).",
      "Dividend announcements, changes, or yield analyses for companies or funds.",
      "Legal, regulatory, or competitive challenges affecting corporate performance.",
      "Use of technical analysis terms (e.g., price targets, support levels, chart patterns)."
    ]
  },
  "diffs_synth_from_real": {
    "qwen2.5-7b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently begin with the ticker symbol followed by a dash (e.g., \"$MRLN - ...\"), while Dataset A uses tickers more variably (embedded mid-sentence or omitted).",
      "Dataset B explicitly names analyst firms (e.g., Morgan Stanley, Jefferies, Oppenheimer) in every headline, whereas Dataset A often omits specific firm references for analyst actions.",
      "Dataset B headlines focus narrowly on analyst actions (upgrades/downgrades/price targets) as standalone events, while Dataset A integrates analyst sentiment within broader narratives (e.g., earnings context, strategic moves).",
      "Dataset B avoids informal language, social media snippets, or non-news elements (e.g., tweets, hashtags), which appear in Dataset A (e.g., \"@tictoc oh fuck yeah Bloomberg...\").",
      "Dataset B headlines lack references to geopolitical/economic events (e.g., trade wars, oil demand shocks) as primary drivers, which are recurrent contextual elements in Dataset A.",
      "Dataset B omits granular financial metrics (e.g., \"net merchandise sales increased 17.1% to $306.1M\") present in Dataset A, focusing instead on qualitative analyst opinions.",
      "Dataset B excludes company-specific operational details (e.g., partnerships, capex cuts, dividend declarations) unless directly tied to analyst rating changes, unlike Dataset A.",
      "Dataset B headlines rarely mention commodities/energy market dynamics (e.g., OPEC+ decisions) unless tied to sector-specific analyst actions, whereas Dataset A emphasizes these as standalone factors.",
      "Dataset B uses standardized phrases like \"maintains neutral outlook\" or \"cuts price target\" uniformly, while Dataset A employs diverse terminology (e.g., \"forms a cup base,\" \"blue skyscraper accumulation\").",
      "Dataset B avoids forward-looking speculative questions (e.g., \"Can $MSFT head higher?\") common in Dataset A, prioritizing declarative statements about analyst decisions."
    ],
    "qwen2.5-32b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently include the company name alongside the stock ticker symbol, while Dataset A often uses tickers alone or mentions companies without tickers.",
      "Dataset B headlines more frequently use full sentences with formal capitalization and punctuation, whereas Dataset A includes fragmented phrases, hashtags, or informal language (e.g., \"oh fuck yeah Bloomberg\").",
      "Dataset B emphasizes explicit mentions of fiscal quarters/years in earnings reports (e.g., \"Q2 2023\"), while Dataset A often omits temporal specificity (e.g., \"raises dividend\").",
      "Dataset B headlines systematically reference analyst firms (e.g., Barclays, Morgan Stanley) in all analyst action mentions, whereas Dataset A occasionally omits institutional names (e.g., \"Downgrades 4/7: $AAN...\").",
      "Dataset B uses standardized phrases like \"Exceeds Analyst Expectations\" uniformly across earnings beats, while Dataset A employs varied terminology (e.g., \"EPS misses,\" \"tops guidance\").",
      "Dataset B headlines avoid non-English characters or translations (e.g., Chinese text in A's \"Wipro\u8d62\u5f97Marelli...\"), maintaining English-only content.",
      "Dataset B consistently pairs corporate events with explicit growth intent (e.g., \"aiming for long-term growth\"), whereas A describes partnerships neutrally (e.g., \"Partnering with Costco\").",
      "Dataset B headlines prioritize forward-looking guidance updates (e.g., \"reaffirms guidance\") over real-time market reactions, which dominate in A (e.g., \"STOCKS SURGE INTO THE CLOSE\").",
      "Dataset B avoids user-generated content markers (e.g., Twitter handles like @tictoc in A) and maintains institutional tone.",
      "Dataset B includes granular price target adjustments (e.g., \"cuts price target to $20\") in all analyst actions, while A sometimes generalizes (e.g., \"downgraded to underperform\")."
    ],
    "qwen2.5-7b_zero-shot_bg_test-time-info_v1": [
      "Dataset A headlines include specific numerical financial forecasts (e.g., EPS, sales figures) with exact figures, while Dataset B focuses on directional analyst actions without precise numerical details.",
      "Dataset A references technical analysis terms (e.g., \"cup base,\" \"blue skyscraper\") in headlines, which are absent in Dataset B.",
      "Dataset A explicitly mentions dividend declarations and yields (e.g., \"$0.0633 dividend\"), while Dataset B emphasizes earnings outcomes and revenue guidance.",
      "Dataset A incorporates broader economic indicators (e.g., GDP, housing starts, job reports) in headlines, whereas Dataset B does not reference macroeconomic data.",
      "Dataset A includes informal language, slang, or social media elements (e.g., hashtags, @mentions), while Dataset B maintains a formal tone throughout.",
      "Dataset A discusses environmental or external factors (e.g., COVID-19, solar power adoption) impacting companies, while Dataset B headlines lack such context.",
      "Dataset A uses market-wide index performance (e.g., Dow, S&P 500) as benchmarks, while Dataset B references ETFs (e.g., $SPY) primarily in analyst rating contexts.",
      "Dataset A headlines provide forward-looking statements with numerical ranges (e.g., \"raises 2020 EPS outlook to $2.61-$2.63\"), while Dataset B\u2019s outlooks are qualitative (e.g., \"expects surge in EV sales\").",
      "Dataset A integrates legal/regulatory actions (e.g., lawsuits, FCC rulings) as standalone news, while Dataset B ties regulatory mentions to financial performance.",
      "Dataset A blends geopolitical events (e.g., U.S.-China trade deals, Brexit) with financial news, whereas Dataset B focuses strictly on company-specific analyst actions and earnings."
    ],
    "llama3.3-70b_zero-shot_bg_v1": [
      "Dataset B headlines predominantly focus on analyst actions (downgrades, price target changes) and earnings reports, while Dataset A includes a broader range of financial events (e.g., mergers, economic indicators, legal disputes).",
      "Dataset B headlines are formulaic and repetitive in structure (e.g., '[Institution] [action] [stock] - [reason]'), whereas Dataset A exhibits varied sentence structures and formats.",
      "Dataset B emphasizes tech companies (e.g., NVIDIA, Tesla, Alphabet) and financial institutions (e.g., Morgan Stanley, Barclays), while Dataset A covers diverse sectors (energy, retail, real estate, automotive).",
      "Dataset B headlines lack informal language or social media-style elements (e.g., hashtags, casual tone), unlike Dataset A, which includes tweets and colloquial phrasing.",
      "Dataset A includes international economic developments (e.g., India\u2019s gold imports, Canada\u2019s rail strike), while Dataset B is more U.S.-centric and narrowly focused on analyst-driven updates.",
      "Dataset B headlines frequently mention specific financial institutions (e.g., Morgan Stanley, Barclays) as primary actors, whereas Dataset A references a wider array of entities (companies, indices, governments).",
      "Dataset A incorporates forward-looking questions or speculative analysis (e.g., 'Can $MSFT head higher?'), while Dataset B focuses on reporting past or scheduled actions (e.g., earnings dates, downgrades).",
      "Dataset B uses consistent terminology around ratings (e.g., 'underweight,' 'equal weight') and price targets, whereas Dataset A employs diverse financial terms (e.g., dividends, permits, tariffs, valuations).",
      "Dataset A includes granular market data (e.g., percentage changes, EPS figures) and multi-point updates (e.g., index performances), while Dataset B emphasizes qualitative analyst assessments.",
      "Dataset B headlines are shorter and more uniform in length, while Dataset A varies in length and includes extended details (e.g., sales figures, contractual terms)."
    ],
    "qwen2.5-32b_zero-shot_v1": [
      "Dataset A headlines frequently include specific stock tickers (e.g., $BYND, $MSFT), while B uses generic labels (e.g., \"Tech Giant\") more consistently.",
      "Dataset A contains informal language, social media handles (e.g., @tictoc), and colloquial phrases (e.g., \"oh fuck yeah\"), whereas B maintains formal, standardized language throughout.",
      "Dataset A references granular technical analysis terms (e.g., \"cup base,\" \"blue skyscraper chart\"), while B avoids market-specific jargon beyond basic financial metrics.",
      "Dataset A includes premarket/after-hours price movements (e.g., \"premarket,\" \"post-market\"), whereas B focuses on broader intraday or post-announcement reactions.",
      "Dataset A covers a wider diversity of sectors (e.g., oil, REITs, shipping) with industry-specific details, while B emphasizes tech and renewable energy disproportionately.",
      "Dataset A frequently cites dividend yields, capital expenditures (capex), and granular financial guidance updates, whereas B focuses on earnings outcomes without detailed metric breakdowns.",
      "Dataset A includes legal/regulatory specifics (e.g., lawsuits, contract awards), while B mentions regulatory themes (e.g., \"antitrust investigations\") more generically.",
      "Dataset A features mixed formatting (e.g., bullet points, emojis, hashtags) and multi-line structures, whereas B uses uniform, single-sentence headlines.",
      "Dataset A references macroeconomic indicators in real-time contexts (e.g., \"housing starts climb 3.8%\"), while B discusses macroeconomic themes more abstractly (e.g., \"slowing GDP growth\").",
      "Dataset A includes company-specific operational updates (e.g., facility licenses, partnership terms), while B emphasizes earnings results and sector-wide policy impacts."
    ],
    "qwen2.5-32b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently include specific dates and times for earnings calls or events (e.g., 'July 15, 2023', '8:30 AM ET'), while Dataset A lacks such explicit scheduling details.",
      "Dataset B emphasizes full company names alongside tickers (e.g., 'Chesapeake Energy Reports...'), whereas Dataset A often omits company names, relying solely on tickers (e.g., '$BYND').",
      "Dataset B headlines focus more on partnerships and strategic initiatives (e.g., 'Fludometrics Announces New Partnership'), while Dataset A highlights immediate market reactions (e.g., '-4% after earnings miss').",
      "Dataset B includes granular analyst rationale (e.g., 'citing softening demand for consumer electronics'), whereas Dataset A typically states analyst actions without detailed explanations (e.g., 'reiterates Buy rating').",
      "Dataset B headlines are uniformly formal and structured, avoiding slang or social media elements, while Dataset A includes casual language (e.g., 'oh fuck yeah Bloomberg') and hashtags (e.g., '#economy').",
      "Dataset B frequently references future plans with explicit timelines (e.g., 'slated to open next year'), whereas Dataset A focuses on past or immediate results (e.g., 'Q1 2020 net merchandise sales increased').",
      "Dataset B specifies fiscal years (e.g., '2023') in most date references, while Dataset A often omits years or uses vague timeframes (e.g., 'next year').",
      "Dataset B highlights regulatory hurdles as a recurring theme (e.g., 'Faces Regulatory Hurdles'), whereas Dataset A mentions lawsuits or tariffs as isolated events (e.g., 'New York hits Juul with a lawsuit').",
      "Dataset B uses standardized phrases like 'Exceeds Analysts' Expectations' consistently, while Dataset A employs varied terminology (e.g., 'misses by $0.04', 'beats estimates').",
      "Dataset B focuses on tech, energy, and pharmaceuticals, while Dataset A covers broader sectors, including retail, real estate, and blockchain."
    ],
    "llama3.3-70b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently start with the stock ticker followed by the company name (e.g., '$SAEX - saexploration'), while Dataset A headlines often mention tickers mid-sentence or without explicit company names.",
      "Dataset B uses lowercase formatting for institutions/analyst firms (e.g., 'morgan stanley') even at sentence beginnings, whereas Dataset A uses standard capitalization (e.g., 'Morgan Stanley').",
      "Dataset B explicitly pairs analyst actions (e.g., 'downgraded', 'upgraded') with the institution name in every headline, while Dataset A sometimes omits institutional attribution for market movements.",
      "Dataset B headlines emphasize full phrases like 'price target raised/cut to $X at [institution]' as a recurring template, whereas Dataset A uses more varied numerical representations (e.g., 'raises 2020 EPS outlook to $2.61-$2.63').",
      "Dataset B includes granular references to investor events (e.g., 'upcoming goldman sachs technology conference') in headlines, while Dataset A focuses on broader market indices like the Dow/S&P.",
      "Dataset B systematically cites institutional rationales using 'citing' or 'due to' clauses (e.g., 'citing strong q2 earnings'), whereas Dataset A often states outcomes without explicit attribution.",
      "Dataset B uses standardized underperform/outperform/neutral rating terminology in every analyst action, while Dataset A occasionally uses less formal phrasing (e.g., 'Buy' rating).",
      "Dataset B headlines frequently reference scheduled earnings dates (e.g., 'to report on February 15th'), while Dataset A focuses on finalized results without date specificity.",
      "Dataset B emphasizes institutional verbs like 'initiates coverage', 'maintains', or 'reiterates' in every analyst-related headline, whereas Dataset A includes more diverse verbs like 'announces' or 'declares'.",
      "Dataset B prioritizes institutional analyst sentiment as the primary driver in headlines, while Dataset A interweaves analyst actions with regulatory/geopolitical drivers (e.g., 'FCC rulings')."
    ],
    "llama3.3-70b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently follow a structured format: '[Institution] [action] [stock] [reason]', while A uses varied sentence structures.",
      "All Dataset B samples explicitly mention the financial institution (e.g., Barclays, Morgan Stanley) driving the rating/price target change, whereas A sometimes omits institutional sources.",
      "Dataset B focuses narrowly on analyst actions (upgrades/downgrades/price targets) as primary news drivers, while A includes non-analyst catalysts like clinical trials, partnerships, and macroeconomic reports.",
      "Headlines in B systematically include phrases like 'according to [institution]' or 'cites [reason]' to attribute analysis, which are less frequent/consistent in A.",
      "Dataset B shows heavy concentration on Oppenheimer/Barclays/Morgan Stanley as recurring institutional sources, while A references a broader range of banks and non-bank entities.",
      "Price target adjustments in B are always explicitly tied to specific quarterly earnings results, whereas A includes target changes based on technical patterns or undefined catalysts.",
      "All Dataset B headlines use lowercase formatting for full sentences, while A maintains standard headline casing with proper nouns capitalized.",
      "B emphasizes scheduled corporate events (earnings releases/conferences) as predefined analysis timestamps, while A more frequently reports breaking/unscheduled news.",
      "Dataset B shows repeated emphasis on 'mixed quarterly results' as a recurring downgrade/upgrade trigger, absent as a defined category in A's samples.",
      "Institutional actions in B are consistently framed as reactions to company-specific fundamentals, while A includes market-wide technical analysis (e.g., 'cup base', 'blue skyscraper') as standalone drivers."
    ],
    "qwen2.5-32b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently include explicit mentions of analyst/investment firms (e.g., 'Goldman Sachs', 'Oppenheimer') initiating actions.",
      "Dataset B contains non-English terms only in translated contexts (e.g., Chinese characters used for price target explanations) without untranslated foreign phrases.",
      "Dataset A includes informal language/social media elements (e.g., '@tictoc oh fuck yeah', emojis) absent in B's professional tone.",
      "Dataset B emphasizes specific percentage beats/misses of expectations (e.g., 'beats by 20%') more systematically than A.",
      "Dataset A contains macroeconomic performance metrics (e.g., 'Building permits rise 5%', currency forecasts) not tied to specific companies.",
      "Dataset B shows consistent pattern of including both company actions and immediate analyst reactions in single headlines.",
      "Dataset A references geopolitical events/non-corporate entities (e.g., Fed chair statements, country-level trade impacts) more frequently.",
      "Dataset B maintains strict headline capitalization conventions (proper nouns only) vs. A's mixed formatting (all-caps phrases).",
      "Dataset A includes forward-looking dividend/stock yield analysis (e.g., '2.6% Dividend Yield Looks Interesting') absent in B.",
      "Dataset B focuses more narrowly on immediate earnings results and price target adjustments rather than A's broader operational updates (e.g., expansions, partnerships)."
    ],
    "qwen2.5-7b_zero-shot_bg_v1": [
      "Dataset B headlines focus more frequently on a narrower set of high-profile tech companies (e.g., $TSLA, $AAPL, $AMZN) compared to Dataset A, which covers a broader range of sectors like energy, retail, and real estate.",
      "Dataset B headlines consistently emphasize analyst rating changes (e.g., downgrades/upgrades) and price target adjustments as primary drivers, whereas Dataset A includes more diverse corporate actions (e.g., mergers, layoffs, dividend declarations).",
      "Dataset B uses standardized phrases like 'beats/misses expectations' and 'price target raised/cut' repetitively, while Dataset A employs varied narrative styles, including informal language, hashtags, and conversational tones.",
      "Dataset B headlines frequently reference the same financial institutions (e.g., Morgan Stanley, Barclays) for analyst actions, whereas Dataset A cites a wider array of sources (e.g., regional banks, niche analysts).",
      "Dataset A includes explicit numerical metrics beyond earnings (e.g., housing starts, dividend yields, permit rates), while Dataset B focuses predominantly on earnings results and stock price movements.",
      "Dataset B headlines are more U.S.-centric, with minimal references to international markets outside China, whereas Dataset A regularly covers global regions (e.g., India, Canada, Europe, Papua New Guinea).",
      "Dataset B features repetitive mentions of the same companies and analysts (e.g., Tesla and Morgan Stanley appear in 20+ samples), while Dataset A distributes coverage more evenly across entities.",
      "Dataset A integrates macroeconomic indicators (e.g., inflation, tariffs, GDP) directly into headlines, whereas Dataset B treats these factors as secondary context to company-specific news.",
      "Dataset B includes multilingual headlines (e.g., Chinese text), absent in Dataset A, which uses English exclusively despite global market references.",
      "Dataset B headlines prioritize forward-looking analyst projections (e.g., '2024 Bets,' 'long-term growth outlook'), while Dataset A balances forecasts with real-time event reporting (e.g., recalls, lawsuits, dividend declarations)."
    ],
    "qwen2.5-32b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines maintain a consistently formal tone, while Dataset A includes informal language, slang, and occasional profanity (e.g., 'oh fuck yeah Bloomberg').",
      "Dataset A headlines frequently use emojis, hashtags, or non-standard formatting (e.g., line breaks, colons), whereas Dataset B avoids these elements entirely.",
      "Dataset A references non-corporate entities (e.g., political figures, government actions) like 'Irish PM Varadkar' or 'NYC pension leader,' while Dataset B focuses strictly on corporate actors and analyst firms.",
      "Dataset A includes non-financial news such as product recalls (e.g., 'Ford recalls 262,000 pickup trucks') or lawsuits, whereas Dataset B headlines are exclusively tied to financial metrics, earnings, or corporate strategy.",
      "Dataset A covers smaller or niche companies (e.g., $SRNE, $PSMT), while Dataset B emphasizes large-cap firms like Disney, Alibaba, or Walmart.",
      "Dataset A incorporates technical analysis terms (e.g., 'cup base,' 'blue skyscraper'), while Dataset B avoids such jargon and focuses on fundamental financial data.",
      "Dataset A uses interrogative headlines (e.g., 'Can $MSFT head higher?'), whereas Dataset B employs declarative statements exclusively.",
      "Dataset A highlights dividend announcements (e.g., 'WPT Industrial REIT declares $0.0633 dividend'), while Dataset B lacks any mention of dividends.",
      "Dataset B emphasizes upcoming corporate events (e.g., 'CES 2024 Conference,' 'JPMorgan Healthcare Conference'), while Dataset A focuses on past/present results or broader economic trends.",
      "Dataset A includes international macroeconomic factors (e.g., 'India\u2019s gold imports,' 'Saudi Arabia tanker power play'), while Dataset B\u2019s economic references are company-specific or sector-driven."
    ],
    "llama3.1-8b_zero-shot_bg_v1": [
      "Dataset B headlines focus predominantly on large-cap tech stocks (e.g., $AAPL, $TSLA, $META) while Dataset A includes diverse sectors like energy, retail, real estate, and pharmaceuticals",
      "Dataset B emphasizes recurring analyst actions from specific firms (Barclays/Morgan Stanley) while Dataset A references a broader range of institutions like Berenberg/SunTrust",
      "Dataset B headlines use standardized formal language without social media tags (#) or conversational tone, unlike Dataset A's occasional informal tweets/links",
      "Dataset B features repetitive mentions of price target adjustments as primary catalysts, while Dataset A includes operational metrics like net sales growth or dividend declarations",
      "Dataset B concentrates on product-specific updates (iPhone sales, EV demand) while Dataset A discusses macro-sector trends (oil markets, housing starts)",
      "Dataset B shows heavier focus on competition dynamics within tech (cloud/AI/EVs) compared to Dataset A's regulatory/geopolitical business impacts",
      "Dataset B headlines frequently cite earnings estimate misses/beats as standalone events, while Dataset A contextualizes results with external factors like COVID-19",
      "Dataset B maintains consistent headline structure (Company + Analyst Action + Stock Reaction) unlike Dataset A's variable formats including multi-line updates",
      "Dataset B emphasizes institutional analyst ratings changes, whereas Dataset A includes retail investor-focused technical analysis terms (cup base, accumulation)",
      "Dataset B shows US-centric tech focus with limited geographic diversity, unlike Dataset A's global references (China imports, EU regulations, Canadian economy)"
    ],
    "llama3.1-8b_zero-shot_v1": [
      "Dataset B headlines use generic terms like 'Tech giants' or 'Stock Market' without specifying individual companies or tickers, unlike Dataset A which consistently names specific entities.",
      "Headlines in Dataset B focus more on broad market indices (e.g., 'Dow Jones', 'S&P 500') rather than granular metrics like stock price targets or sales figures common in Dataset A.",
      "Dataset B emphasizes macroeconomic outcomes (e.g., 'GDP Growth', 'Global Recession') without linking them to sector-specific developments, unlike Dataset A's explicit sector mentions.",
      "Forward-looking statements in Dataset B are vaguer (e.g., 'analysts predict') compared to Dataset A's precise guidance revisions or EPS outlooks with numerical targets.",
      "Dataset B headlines lack references to regulatory actions, legal disputes, or corporate strategies (e.g., buybacks, dividends) that are pervasive in Dataset A.",
      "Market reactions in Dataset B are described with generalized terms like 'plunge' or 'surge,' whereas Dataset A quantifies reactions (e.g., 'premarket down 6%').",
      "Dataset B omits granular financial instruments (e.g., dividends, buybacks) and technical analysis terms (e.g., 'cup base') present in Dataset A.",
      "Headlines in Dataset B rarely cite specific quantitative metrics (e.g., 'Q1 revenue increased 17.1%') compared to Dataset A's frequent numerical precision.",
      "Dataset B avoids mentions of geopolitical or localized events (e.g., 'China delays trade data') in favor of broader themes like 'Trade War' or 'Global Economic Crisis.'",
      "Dataset B lacks explicit references to earnings report dates, conference calls, or transcripts that are routine in Dataset A's headlines."
    ],
    "llama3.3-70b_few-shot_v1": [
      "Dataset A headlines include informal language, social media tags (@mentions, hashtags), and conversational tones not present in B",
      "Dataset A contains granular technical analysis terms (e.g., 'cup base,' 'blue skyscraper') absent in B's more general market movement descriptions",
      "Dataset A references niche financial instruments (e.g., dividends, REITs, specific contracts) while B focuses on mainstream equities and Fed policies",
      "Dataset A includes hyperlinks, truncated text ('...'), and external content references not found in B's self-contained headlines",
      "Dataset B headlines emphasize repetitive structural patterns (e.g., '[Company] Stock [Verb] After [Event]') compared to A's varied phrasing",
      "Dataset A features premarket/after-hours price movements as standalone updates, while B embeds timing context within broader market narratives",
      "Dataset A includes obscure companies (e.g., $PHCEF, $AQST) alongside majors, whereas B focuses exclusively on large-cap household names",
      "Dataset A contains explicit dividend declarations and yield percentages absent in B's earnings-centric updates",
      "Dataset B headlines systematically repeat identical phrasing across multiple entries (e.g., 'Federal Releases...') unlike A's diverse vocabulary",
      "Dataset A references geopolitical micro-impacts (e.g., 'Norway biofuel policy,' 'Papua New Guinea LNG') vs B's macro-level trade tension mentions"
    ],
    "qwen2.5-32b_few-shot_bg_v1": [
      "Dataset B headlines focus more heavily on analyst actions from specific financial institutions (e.g., Morgan Stanley, Barclays) in nearly every entry, while A references a broader range of entities",
      "B emphasizes recurring mentions of ad revenue concerns and AI-driven growth potential, particularly for tech giants like Alphabet/Meta, absent in A",
      "B shows standardized headline structures starting with analyst actions/ratings (e.g., '[Bank] Maintains/Cuts...'), whereas A uses varied formats including bullet points and informal commentary",
      "Non-English characters (e.g., Chinese text) appear exclusively in B's headlines for bilingual reporting",
      "B demonstrates tighter sector focus on FAANG-tier tech companies (Apple, Tesla, Nvidia), while A covers diverse industries like energy, REITs, and retail",
      "B's economic references center narrowly on Fed rates/consumer confidence, while A includes granular indicators (housing starts, jet fuel policies)",
      "B headlines repeat identical tickers/companies (e.g., 15+ $AAPL mentions) to track evolving analyst sentiment, unlike A's distributed ticker coverage",
      "B uses forward-looking analyst expectations as primary narrative drivers (e.g., 'awaiting earnings reports'), whereas A emphasizes concluded corporate actions",
      "B's supply chain mentions focus exclusively on tech hardware (iPhones/GPUs), while A discusses broader disruptions (oil, COVID, automotive)",
      "B maintains formal tone with institutional jargon ('neutral rating', 'price target revisions'), contrasting with A's mix of slang, emojis, and retail investor lingo"
    ],
    "qwen2.5-32b_few-shot_v1": [
      "Dataset A headlines frequently include specific premarket/after-hours stock price movements (e.g., 'premarket on risdiplam data') while B focuses on regular session movements",
      "Dataset A contains more granular financial metrics (exact percentages, EPS figures, sales numbers) whereas B uses general comparative terms ('lower-than-expected', 'surge')",
      "Dataset A includes non-English characters and hashtags (#MarketScreener) while B maintains formal English language conventions",
      "Dataset A features more technical analysis terminology ('cup base', 'institutional accumulation') absent in B's fundamental-focused reports",
      "Dataset B headlines consistently use complete sentence structure while A employs fragmented financial shorthand ('Q1 revs below consensus')",
      "Dataset A references specific analyst firms (Berenberg, SunTrust RH) while B uses generic 'analysts' without attribution",
      "Dataset A includes dividend declaration specifics (exact amounts, yield percentages) whereas B mentions dividends generally",
      "Dataset B focuses more on sector-wide trends while A mixes sector analysis with company-specific operational details (facility licenses, partnership deals)",
      "Dataset A contains conference call references ('Edited Transcript of PLC.TO earnings conference call') absent in B",
      "Dataset A includes retail investor-focused language ('sympathy play', 'buy the dip') while B maintains institutional tone"
    ],
    "qwen2.5-32b_zero-shot_bg_v1": [
      "Dataset B headlines focus predominantly on technology sector companies (e.g., $AAPL, $NVDA, $TSLA) across all samples, whereas Dataset A covers diverse sectors like energy, retail, real estate, and automotive.",
      "Headlines in Dataset B consistently include both company names and stock tickers (e.g., '$AAPL - Apple...'), while Dataset A often uses tickers alone without explicit company names.",
      "Dataset B headlines exhibit standardized, formal phrasing resembling professional news reports, while Dataset A includes informal language, slang, and social media-style annotations (e.g., 'buy the dip,' hashtags).",
      "Analyst actions in Dataset B are disproportionately attributed to Barclays and Morgan Stanley across all samples, whereas Dataset A references a broader range of institutions (e.g., Berenberg, SunTrust, BofA Merrill Lynch).",
      "Dataset B headlines frequently incorporate non-English characters (e.g., Chinese) in analyst commentary, which is absent in Dataset A.",
      "Forward-looking statements in Dataset B are narrowly focused on tech earnings guidance and product launches, while Dataset A includes varied forward projections like oil production targets and infrastructure investments.",
      "All Dataset B headlines emphasize quarterly earnings outcomes/revisions and price target adjustments as primary drivers, whereas Dataset A discusses dividends, M&A activity, regulatory rulings, and macroeconomic policies with equal prominence.",
      "Numerical metrics in Dataset B prioritize revenue growth percentages and round price targets (e.g., 'raises price target to $180'), while Dataset A often cites precise sales figures (e.g., '$306.1M') and operational metrics like housing starts.",
      "Dataset B headlines lack explicit references to dividends, share buybacks, or capital allocation strategies, which are recurrent themes in Dataset A samples.",
      "Regulatory/legal mentions in Dataset B focus on ad revenue impacts and sector-wide competition, whereas Dataset A highlights specific lawsuits, environmental policies, and geopolitical trade actions."
    ],
    "qwen2.5-7b_zero-shot_v1": [
      "Dataset A headlines frequently include specific stock ticker symbols (e.g., $NVCR, $BPTH), while Dataset B headlines omit tickers and refer to companies or sectors generically (e.g., 'Tech Giants').",
      "Dataset A contains informal or social media-style language (e.g., '@tictoc oh fuck yeah Bloomberg'), whereas Dataset B headlines maintain a formal, standardized tone throughout.",
      "Dataset A often cites granular numerical metrics (e.g., 'sales increased 17.1% to $306.1M'), while Dataset B uses qualitative or broad quantitative descriptors (e.g., 'steady growth').",
      "Dataset A includes technical trading terms (e.g., 'cup base,' 'institutional accumulation'), while Dataset B avoids market microstructure jargon.",
      "Dataset A references niche or hyper-specific events (e.g., '1967 nuclear power plant law,' 'biofuel blending mandates in Norway'), whereas Dataset B focuses on widely recognized macroeconomic themes.",
      "Dataset A headlines frequently mention dividend declarations, capital raises, or balance sheet adjustments (e.g., 'declares $0.0633 dividend'), while Dataset B rarely addresses corporate financial mechanics.",
      "Dataset A incorporates international market developments (e.g., Chinese retail impacts, Indian gold imports), while Dataset B emphasizes domestic (U.S.) economic conditions.",
      "Dataset A includes legal/regulatory specifics (e.g., 'sued Volkswagen for suppressing competition'), whereas Dataset B mentions regulations only at sector-wide abstraction levels.",
      "Dataset A uses diverse volatility terminology (e.g., 'tumble,' 'ticks up,' 'sympathy play'), while Dataset B relies on repetitive action verbs ('plunge,' 'surge').",
      "Dataset A headlines frequently contain forward-looking operational details (e.g., 'cuts capex forecast by 30%'), while Dataset B emphasizes generalized economic projections (e.g., 'steady improvement')."
    ],
    "llama3.1-8b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently specify the analyst firm/bank driving ratings changes (e.g., Morgan Stanley, Oppenheimer), while A rarely names specific institutions.",
      "B exclusively ties stock price movements directly to analyst actions (e.g., 'slumps after Morgan Stanley cut'), whereas A often attributes price changes to broader events like earnings misses or macro trends.",
      "All B samples include explicit forward-looking quantitative analyst projections (e.g., 'predict 15% revenue growth', 'PT trimmed to $116'), while A focuses on company-issued guidance without third-party numerical forecasts.",
      "B headlines systematically pair stock reactions (+/- percentages) with specific analyst decisions in single sentences, creating cause-effect framing absent in A's more fragmented reporting.",
      "100% of B's earnings mentions reference Wall Street analyst consensus beats/misses, while A frequently reports earnings without benchmarking against analyst expectations.",
      "B emphasizes institutional investor sentiment shifts (e.g., 'Credit Suisse maintains neutral', 'Oppenheimer downgrades') as primary news drivers, unlike A's mix of retail investor-focused technical analysis and fundamental developments.",
      "All B samples maintain formal corporate naming conventions (e.g., 'Home Depot Declares Dividend') versus A's occasional informal language ('oh fuck yeah Bloomberg').",
      "B consistently references fiscal quarters/years in context of analyst timelines (e.g., 'Q2 2023', 'FY2024'), while A uses more relative time markers ('today', 'premarket').",
      "100% of B's price target adjustments cite precise new targets ($265, $7.50), whereas A typically describes rating changes qualitatively ('Buy reiteration').",
      "B exclusively uses standardized headline structure: [Ticker] [Action] [Analyst Firm] [Quantitative Impact], contrasting with A's variable formats incorporating tweets, recaps, and commentary snippets."
    ],
    "qwen2.5-7b_few-shot_v1": [
      "Dataset B headlines focus more on sector-wide trends (e.g., tech, automotive) rather than granular company-specific operational details (e.g., contracts, dividend payouts) common in Dataset A.",
      "Dataset A includes informal language/social media fragments (e.g., \"oh fuck yeah Bloomberg\"), while Dataset B uses standardized, formal news headline structures.",
      "Dataset A frequently references precise financial metrics (e.g., \"17.1% increase to $306.1M\"), whereas Dataset B emphasizes qualitative outcomes (e.g., \"beats expectations\") without granular figures.",
      "Dataset A contains real-time trading updates (e.g., premarket declines, after-hours moves), while Dataset B focuses on post-announcement market reactions (e.g., \"shares surge\").",
      "Dataset B emphasizes macroeconomic policy decisions (e.g., Federal Reserve rate changes) as primary drivers, while Dataset A interweaves macro and microeconomic factors (e.g., company-specific capex cuts).",
      "Dataset A includes niche regulatory actions (e.g., lawsuits against specific firms like Volkswagen), while Dataset B discusses broader regulatory landscapes (e.g., \"new laws affecting tech giants\").",
      "Dataset A highlights dividend declarations and yield analyses (e.g., \"2.6% Dividend Yield\"), whereas Dataset B omits dividend-focused narratives entirely.",
      "Dataset B features recurring themes of global supply chain disruptions and semiconductor shortages, absent from Dataset A's company-specific operational hurdles.",
      "Dataset A incorporates social sentiment indicators (e.g., \"institutional accumulation,\" \"bullish trends\"), while Dataset B relies on institutional analyst consensus (e.g., \"surpasses analyst expectations\").",
      "Dataset B uses forward-looking macroeconomic forecasts (e.g., \"potential recession\") as central themes, whereas Dataset A anchors projections to explicit corporate guidance (e.g., \"raises 2020 EPS outlook\")."
    ],
    "llama3.3-70b_few-shot_bg_v1": [
      "Dataset B headlines predominantly focus on technology sector companies (e.g., $TSLA, $NVDA, $GOOGL) while Dataset A covers broader sector diversity including energy, retail, and real estate",
      "Dataset B shows repetitive template structures (\"[Institution] [action] [ticker] - [reason]\") while Dataset A uses varied sentence structures and narrative formats",
      "Dataset B contains frequent duplicate headlines about identical analyst actions (e.g., 14+ Morgan Stanley $TSLA downgrades) whereas Dataset A maintains unique content per headline",
      "Dataset B exclusively uses hyphen formatting after ticker symbols (\"$tsla - to underweight\") while Dataset A employs colons or natural language integration",
      "Dataset B emphasizes institutional analyst perspectives as primary drivers while Dataset A includes direct corporate announcements and macroeconomic data releases",
      "Dataset B lacks numerical specifics in price targets/guidance (\"raises to $250\") whereas Dataset A frequently includes precise financial figures and percentage changes",
      "Dataset B shows concentrated institutional focus (Morgan Stanley/Barclays account for 68% of actions) compared to Dataset A's diverse range of 30+ cited financial institutions",
      "Dataset B headlines omit temporal specificity (\"next week\", \"upcoming quarter\") while Dataset A frequently includes exact dates and fiscal periods (\"Q4 2019\", \"March 2020\")",
      "Dataset B contains minimal market reaction language (\"trades flat\") compared to Dataset A's rich price movement descriptors (\"surges\", \"tumbles\", \"ticks up\")",
      "Dataset B shows narrow thematic range (85% earnings/analyst actions) versus Dataset A's inclusion of dividends, M&A, recalls, and regulatory developments"
    ],
    "llama3.1-8b_few-shot_v1": [
      "Dataset B headlines predominantly use formal, structured language resembling traditional news articles, while Dataset A includes informal elements like tweets, slang, and incomplete sentences.",
      "Dataset A headlines frequently incorporate Twitter handles, hashtags, and social media references (e.g., '@tictoc', '#markets'), whereas Dataset B maintains a professional tone without social media markers.",
      "Dataset B emphasizes macroeconomic trends and national/global economic indicators (e.g., GDP revisions, inflation rates), while Dataset A focuses on micro-level corporate actions like dividend declarations or stock buybacks.",
      "Dataset A headlines contain granular technical analysis terminology (e.g., 'cup base,' 'blue skyscraper,' '8 EMA level') absent in Dataset B's more generalized market commentary.",
      "Dataset B headlines prioritize government policy impacts (e.g., Fed rate decisions, stimulus packages), whereas Dataset A emphasizes company-specific regulatory issues (e.g., pipeline negotiations, license revocations).",
      "Dataset A consistently includes stock ticker symbols in headlines (e.g., '$BYND,' '$XLU'), while Dataset B often omits tickers, using full company names instead.",
      "Dataset A features real-time market updates with percentage changes (e.g., 'premarket +5.8%'), while Dataset B reports retrospective performance summaries (e.g., 'Q3 growth misses forecasts').",
      "Dataset B headlines systematically reference institutional analyst actions (e.g., 'Morgan Stanley says,' 'Goldman Sachs outlook'), whereas Dataset A includes unsourced speculative statements (e.g., 'Money will flow into this').",
      "Dataset A contains niche financial instruments and strategies (e.g., 'unusual puts,' 'capital raise'), while Dataset B focuses on mainstream economic concepts accessible to general investors.",
      "Dataset B headlines frequently frame narratives around binary outcomes (e.g., 'recession fears,' 'trade war escalates'), while Dataset A emphasizes incremental developments like partnership announcements or dividend yields."
    ],
    "llama3.1-8b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently include the analyst firm name (e.g., Morgan Stanley, Deutsche Bank) when citing actions, while Dataset A often omits specific firm references.",
      "Dataset B headlines focus more narrowly on tech, retail, and energy sectors, whereas Dataset A covers a broader range of industries (e.g., real estate, automotive, biotech, utilities).",
      "Dataset B uses formal company names alongside stock tickers in parentheses (e.g., 'DocuSign (DOCU)'), while Dataset A often uses tickers alone.",
      "Dataset B headlines emphasize future guidance cuts/raises (e.g., '2023 Outlook Cut') and forward-looking analyst targets, whereas Dataset A focuses more on past performance metrics (e.g., 'Q1 revenue increased 17.1%').",
      "Dataset B includes more frequent mentions of ETFs (e.g., SPY, XLK) and sector indices, while Dataset A references broader market indices (Dow, S&P 500) with explicit percentage movements.",
      "Dataset A contains non-English terms/social media elements (e.g., Chinese text, '@tictoc'), while Dataset B maintains standardized English language throughout.",
      "Dataset B headlines structure strategic moves around M&A and partnerships (e.g., 'acquires dating app Hinge'), while Dataset A emphasizes expansions/operational changes (e.g., 'job losses', 'solar power transition').",
      "Dataset A explicitly references COVID-19 impacts on company performance (e.g., 'Q1 revs below consensus amid COVID-19'), while Dataset B discusses pandemic effects indirectly through consumer confidence trends.",
      "Dataset B features repetitive phrases like 'raises price target' or 'downgrades to' in 90% of analyst-action headlines, whereas Dataset A uses varied wording for similar events (e.g., 'reiterates Buy rating', 'surges after upgrade').",
      "Dataset A includes granular macroeconomic data (e.g., 'housing starts climb 3.8%', 'building permits rate'), while Dataset B focuses on corporate financial metrics (EPS/revenue beats/misses) without secondary economic indicators."
    ],
    "qwen2.5-7b_few-shot_bg_v1": [
      "Dataset B headlines focus more heavily on tech sector companies (e.g., $TSLA, $NVDA, $AAPL, $MSFT) compared to Dataset A's broader sector coverage",
      "Dataset B shows higher frequency of explicit price target numbers and specific valuation adjustments (e.g., 'cuts price target to $120') compared to Dataset A's more general analyst action mentions",
      "Dataset B contains more frequent references to specific executive leadership decisions and CEO-related developments (e.g., Musk's leadership changes) than Dataset A",
      "Dataset A includes more diverse macroeconomic indicators (e.g., housing starts, building permits) while B focuses primarily on corporate financial metrics",
      "Dataset B demonstrates heavier concentration on quarterly earnings results and revisions rather than Dataset A's mix of earnings with regulatory/legal developments",
      "Dataset A contains more international currency/forex market coverage (e.g., EUR/USD analysis) absent in Dataset B's US-centric focus",
      "Dataset B shows more frequent use of standardized analyst rating terminology ('Overweight', 'Underperform') compared to Dataset A's varied phrasing",
      "Dataset A includes more granular operational updates (e.g., 'net merchandise sales increased 17.1%') while B emphasizes broader financial guidance",
      "Dataset B features more direct comparisons between company performance and market indices/S&P 500 than Dataset A",
      "Dataset A contains more diverse data sources including tweets, conference calls, and press releases compared to B's institutional analyst-driven content"
    ],
    "qwen2.5-7b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines primarily focus on analyst actions (upgrades/downgrades/price target changes) as the central event, while Dataset A includes analyst actions alongside other financial events like partnerships or economic indicators",
      "Dataset B consistently cites analyst actions with explicit rationales (e.g., 'citing supply chain disruptions'), whereas Dataset A mentions analyst actions without always providing justification",
      "Dataset B headlines emphasize institutional analyst firms (e.g., Oppenheimer, Citi) as primary actors, while Dataset A occasionally references analysts generically or embeds their input within broader news",
      "Dataset B headlines use standardized phrasing for rating changes (e.g., 'downgraded to Underperform'), while Dataset A employs varied language (e.g., 'reiterates Buy rating') alongside non-analyst financial updates",
      "Dataset B headlines are structurally formulaic, often starting with tickers followed by analyst actions, while Dataset A uses diverse structures (e.g., company announcements, market summaries)",
      "Dataset B headlines prioritize quantifiable analyst actions (e.g., 'cuts price target to $15 from $20'), whereas Dataset A includes qualitative metrics (e.g., 'growth potential') alongside quantitative data",
      "Dataset B headlines lack mentions of macroeconomic trends (e.g., housing starts, GDP) present in Dataset A, which integrates broader economic context with company-specific news",
      "Dataset B excludes non-analyst corporate developments (e.g., mergers, regulatory decisions) common in Dataset A, focusing solely on third-party analyst evaluations",
      "Dataset B headlines frequently reference consecutive analyst actions (e.g., 'third consecutive price target cut'), while Dataset A highlights singular analyst endorsements or standalone events",
      "Dataset B avoids speculative language (e.g., 'could', 'potential') prevalent in Dataset A, opting for definitive statements about completed analyst actions"
    ],
    "qwen2.5-7b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently emphasize analyst rating changes (e.g., upgrades/downgrades) as primary triggers for stock reactions, while A includes a broader mix of catalysts like partnerships or macroeconomic data",
      "B systematically uses standardized phrases like 'beating analyst expectations' or 'misses estimates' in earnings reports, whereas A often specifies exact percentage changes (e.g., 'increased 17.1%')",
      "Dataset A contains non-English text and references to international markets (e.g., Chinese imports), while B focuses predominantly on US-centric analyst actions",
      "B frequently cites specific investment banks/firms initiating coverage (e.g., 'Morgan Stanley maintains'), while A more often mentions institutions in passing without attribution to specific actions",
      "A includes technical trading pattern descriptions (e.g., 'cup base', 'blue skyscraper'), whereas B remains focused on fundamental analysis terminology",
      "Dataset B headlines consistently pair stock reactions with earnings results in the same sentence structure (e.g., 'X reports Y, shares rise Z%'), while A separates these elements across different samples",
      "A contains explicit dividend declarations and REIT-specific financials, while B focuses exclusively on growth-oriented metrics and analyst ratings",
      "B shows recurring emphasis on supply chain costs as justification for revisions, whereas A cites broader operational factors (e.g., 'snow shortages', 'COVID-19 impacts')",
      "Dataset A includes forex/commodity-specific analysis ('EUR/USD Price Forecast'), while B remains strictly equity-focused",
      "B uses consistent formatting for price target updates (e.g., 'cuts target to $25 from $30'), whereas A presents targets more variably within narrative contexts"
    ],
    "llama3.3-70b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines are uniformly written in lowercase letters (except for proper nouns and tickers), while Dataset A uses standard capitalization and varied formatting styles.",
      "Dataset B consistently structures headlines with analyst firm names as the primary subject initiating actions (e.g., 'Morgan Stanley downgrades'), whereas Dataset A headlines use diverse subjects (e.g., companies, indices, or macroeconomic events).",
      "Dataset B includes explicit mentions of institutional rating terminology (e.g., 'overweight,' 'underweight,' 'neutral') in every analyst action, while Dataset A uses simpler terms like 'Buy' or 'Hold' less systematically.",
      "Dataset B headlines frequently specify exact earnings release dates (e.g., 'to report on January 25'), whereas Dataset A often references broader timeframes (e.g., 'Q4 2019 Earnings Preview').",
      "Dataset B emphasizes stock price reactions (e.g., 'trades flat,' 'sending shares lower') as a direct outcome of news, while Dataset A includes standalone price movements without explicit causal phrasing.",
      "Dataset B consistently provides rationales for analyst actions (e.g., 'citing concerns over revenue growth'), whereas Dataset A headlines occasionally omit explanations for upgrades/downgrades.",
      "Dataset B focuses heavily on investor conference participation (e.g., 'to present at the upcoming Oppenheimer healthcare conference'), a theme absent in Dataset A.",
      "Dataset B headlines prioritize price target adjustments as central to analyst actions, while Dataset A includes a wider variety of financial metrics (e.g., sales growth, dividend declarations).",
      "Dataset B uses hyphenated ticker references mid-sentence (e.g., '$mlm - stock trades flat'), whereas Dataset A often places tickers at the beginning or uses standalone mentions.",
      "Dataset B avoids informal language, hashtags, or emojis, while Dataset A occasionally includes colloquial phrases (e.g., 'oh fuck yeah Bloomberg') and social media markers."
    ],
    "llama3.1-8b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently include explicit percentage changes or numerical price targets in analyst actions (e.g., 'price target cut to $145.00 from $160.00'), whereas Dataset A typically mentions analyst actions without quantifying specific targets.",
      "Dataset B emphasizes near-term earnings report dates (e.g., 'Q2 Earnings on July 25th') and immediate market reactions, while Dataset A focuses more on broader quarterly performance summaries (e.g., 'Q4 2019 Earnings Preview').",
      "Dataset B frequently pairs analyst actions with the analyst firm\u2019s name and rating terminology (e.g., 'Morgan Stanley upgrades to Overweight'), whereas Dataset A often omits the firm or uses generic descriptions (e.g., 'Berenberg reiterates Buy').",
      "Dataset B headlines highlight regulatory impacts tied directly to company-specific outcomes (e.g., 'Freddie Mac ends mortgage outsourcing deal'), while Dataset A references regulatory themes at a macroeconomic or sector level (e.g., 'FCC chairman voices support').",
      "Dataset B includes forward-looking guidance with precise fiscal year references (e.g., '2024 sales guidance'), whereas Dataset A uses less granular forward-looking statements (e.g., 'long-term outlook').",
      "Dataset B features more frequent mentions of post-earnings price target adjustments (e.g., 'Raises Price Target to $80'), while Dataset A focuses on broader financial metrics like revenue or EPS figures without explicit target updates.",
      "Dataset B headlines often juxtapose conflicting analyst opinions (e.g., 'analysts remain cautious' vs. 'bullish price target'), whereas Dataset A tends to present analyst actions as standalone updates.",
      "Dataset B uses structured phrasing to link corporate developments to stock performance (e.g., 'expansion lifts earnings outlook'), while Dataset A describes partnerships or mergers without explicitly tying them to market reactions.",
      "Dataset B integrates sector-specific trends (e.g., 'retail demand wanes') directly into stock-specific headlines, whereas Dataset A separates sector updates into standalone statements (e.g., 'oil blocks offshore Colombia').",
      "Dataset B emphasizes cash flow challenges and liquidity risks (e.g., 'tight cash flow amid operating losses'), while Dataset A focuses on dividend yields or capital expenditures without highlighting liquidity concerns."
    ],
    "llama3.3-70b_zero-shot_bg_test-time-info_v1": [
      "Headlines in Dataset B consistently structure sentences around analyst actions (e.g., 'downgrades,' 'upgrades') from specific institutions (e.g., Barclays, Morgan Stanley), whereas Dataset A uses varied verbs (e.g., 'surges,' 'tumbles') and includes non-analyst-driven events.",
      "Dataset B headlines rigidly append ticker symbols (e.g., '$TAP') alongside full company names even when redundant, while Dataset A sometimes omits tickers or mentions companies without them.",
      "Dataset B frequently specifies future earnings release dates (e.g., 'Q2 earnings report on August 10'), while Dataset A focuses on immediate results or historical data without explicit future dates.",
      "Dataset A includes informal language, social media references, and non-news content (e.g., tweets), whereas Dataset B maintains formal, standardized language focused solely on financial actions.",
      "Dataset A covers broader macroeconomic events (e.g., trade deals, COVID-19 impacts) and sector-agnostic developments (e.g., real estate, automotive recalls), while Dataset B remains narrowly focused on company-specific analyst ratings and earnings.",
      "Technical analysis terms (e.g., 'cup base,' 'blue skyscraper') and chart patterns appear exclusively in Dataset A headlines.",
      "Dataset A explicitly mentions dividend figures and declarations (e.g., '$0.0633 dividend'), whereas Dataset B rarely references dividends.",
      "Dataset B occasionally uses lowercase ticker symbols (e.g., '$bynd,' '$tsla'), while Dataset A consistently uses uppercase tickers (e.g., '$BYND,' '$TSLA').",
      "Dataset A headlines often cite percentage movements of market indices (e.g., 'Dow up 7.59%'), while Dataset B focuses on individual stock reactions to analyst actions.",
      "Dataset B emphasizes institutional actors as the primary subject of headlines (e.g., 'Morgan Stanley downgrades...'), whereas Dataset A contextualizes institutional input within broader market or corporate developments."
    ],
    "llama3.3-70b_zero-shot_v1": [
      "Dataset A headlines frequently include specific stock ticker symbols (e.g., $VIRT, $PSMT) while Dataset B headlines omit them entirely.",
      "Dataset A incorporates precise numerical metrics (e.g., '17.1% increase', '$2.61-$2.63 EPS') whereas Dataset B uses qualitative descriptors (e.g., 'soar', 'plummet') without exact figures.",
      "Dataset A includes diverse financial events like dividend declarations, permits, and contract awards, while Dataset B focuses narrowly on earnings reports/Fed decisions.",
      "Dataset A contains informal language, Twitter handles, and conversational tones (e.g., 'oh fuck yeah Bloomberg'), absent in Dataset B's formalized structures.",
      "Dataset B headlines show repetitive phrasing patterns (e.g., 'Federal Reserve Announces...', 'Tech Stocks Plunge...') compared to Dataset A's varied syntactic structures.",
      "Dataset A references niche technical analysis terms (e.g., 'cup base', 'institutional accumulation') absent in Dataset B's generalized market commentary.",
      "Dataset A specifies localized/regional economic impacts (e.g., 'China imports', 'New York lawsuits') while Dataset B emphasizes broad macroeconomic trends.",
      "Dataset B disproportionately features the tech sector and Federal Reserve actions compared to Dataset A's balanced coverage across energy, retail, biotech, etc.",
      "Dataset A includes timestamped event details (e.g., 'Q1 2020', 'October permits') whereas Dataset B uses vague temporal references (e.g., 'next quarter', 'upcoming year').",
      "Dataset A highlights granular operational developments (e.g., 'pipeline negotiations', 'facility licenses') while Dataset B focuses on aggregated market sentiment shifts."
    ],
    "llama3.1-8b_few-shot_bg_v1": [
      "Dataset B headlines consistently reference specific analyst actions (e.g., downgrades, target adjustments) tied to named financial institutions (e.g., Morgan Stanley, Goldman Sachs), while Dataset A includes a broader mix of news sources and company-initiated announcements.",
      "Dataset B focuses heavily on large tech companies (e.g., $AAPL, $MSFT, $NVDA) and their sector-specific challenges (e.g., chip demand, cloud computing), whereas Dataset A covers diverse sectors like energy, retail, and real estate.",
      "Dataset B headlines explicitly tie stock price movements (e.g., 'shares sink 5%') to analyst actions or earnings results, while Dataset A more often reports numerical data (e.g., sales figures) without direct market reaction context.",
      "Dataset B integrates macroeconomic indicators (e.g., CPI, GDP) as central headline topics (e.g., 'CPI Data for October to be Released'), while Dataset A mentions them as contextual drivers of market events.",
      "Dataset B emphasizes quarterly earnings expectations and analyst consensus (e.g., 'analysts expect lower revenue'), whereas Dataset A highlights company-issued guidance updates (e.g., 'TJX raises 2020 EPS outlook').",
      "Dataset B headlines follow a rigid structure (Ticker - Analyst Action - Institution - Rationale), while Dataset A uses varied formats, including informal social media language and non-ticker-led entries.",
      "Dataset B prioritizes forward-looking analyst predictions (e.g., 'price target raised to $350') as primary news, whereas Dataset A includes retrospective performance metrics (e.g., 'net merchandise sales increased 17.1%').",
      "Dataset B frequently cites global economic risks (e.g., EU's $1 trillion loss, US-China trade tensions) as headline subjects, while Dataset A focuses on localized impacts (e.g., regional oil demand, Canadian rail strikes).",
      "Dataset B consistently quantifies analyst rationale (e.g., 'due to 20% decline in GPU demand'), whereas Dataset A often omits detailed causation in favor of event reporting (e.g., 'cuts dividend').",
      "Dataset B headlines explicitly mention competition between major firms (e.g., '$MSFT vs. $AAPL') as a catalyst, while Dataset A highlights external factors like legal/regulatory actions or partnerships."
    ],
    "llama3.1-8b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently specify the analyst firm or individual behind actions (e.g., 'Barclays cuts', 'Oppenheimer downgrades'), while Dataset A mentions analyst actions generically (e.g., 'downgrades 4/7: $AAN...').",
      "Dataset B headlines frequently include exact price target figures or percentage adjustments (e.g., 'cuts price target to $235 from $280'), whereas Dataset A rarely quantifies targets (e.g., 'raises price target').",
      "Dataset B emphasizes explicit guidance updates (e.g., 'raised guidance', 'maintains quarterly dividend'), while Dataset A often states outlook changes without granular financial metrics (e.g., 'raises 2020 EPS outlook').",
      "Dataset B uses structured phrases like 'maintains Neutral rating' or 'initiates coverage', whereas Dataset A employs informal commentary (e.g., 'buy the dip', 'shares soar 25%').",
      "Dataset B headlines explicitly tie stock movements to analyst rationale (e.g., 'due to rising competition'), while Dataset A reports price changes as standalone events (e.g., 'premarket gains').",
      "Dataset B includes mergers/acquisitions with deal values (e.g., 'acquires... for $350M'), while Dataset A mentions partnerships without financial terms (e.g., 'Partnering with Costco').",
      "Dataset B headlines reference future earnings dates (e.g., 'Q3 earnings release for October 26'), whereas Dataset A focuses on immediate results (e.g., 'Q1 revs below consensus').",
      "Dataset B specifies regional market impacts (e.g., 'expand operations in Asia'), while Dataset A discusses global macroeconomic trends broadly (e.g., 'Chinese cash dries up').",
      "Dataset B quantifies dividend changes precisely (e.g., 'Boost Dividend by 5%'), while Dataset A states dividend announcements without percentages (e.g., 'declares $0.0633 dividend').",
      "Dataset B frequently cites competitive pressures as catalysts (e.g., 'as competition mounts'), whereas Dataset A attributes challenges to external factors like pandemics or regulations."
    ]
  },
  "diffs_real_from_synth": {
    "qwen2.5-7b_zero-shot_bg_train-time-info_v1": [
      "Dataset A headlines consistently begin with ticker symbols in parentheses (e.g., $MRLN), while B often leads with company names or general topics without immediate ticker references.",
      "Dataset B includes headlines structured as broad market summaries (e.g., 'STOCKS SURGE INTO THE CLOSE') absent in A, which focuses narrowly on analyst actions.",
      "Dataset B incorporates social media elements (e.g., hashtags, @mentions) and informal language, whereas A maintains formal, standardized financial terminology.",
      "Dataset B features standalone macroeconomic updates (e.g., 'Building permits rise 5%') without explicit ties to analyst sentiment, unlike A\u2019s analyst-driven narratives.",
      "Dataset A exclusively uses structured phrases like 'cuts price target' or 'maintains neutral,' while B includes unstructured sentences (e.g., 'Partnering with Costco...').",
      "Dataset B contains direct quotes from executives or regulatory bodies (e.g., 'Mnuchin says...'), a feature absent in A\u2019s analyst-centric reporting.",
      "Dataset B references geopolitical or regulatory events (e.g., 'Saudis Slash Oil Prices in Asia') as standalone catalysts, whereas A links such events to analyst actions.",
      "Dataset B includes dividend declarations (e.g., 'WPT Industrial REIT declares $0.0633 dividend') without analyst commentary, unlike A\u2019s focus on guidance revisions.",
      "Dataset B covers international markets (e.g., India\u2019s gold imports, Qatar LNG) more prominently, while A\u2019s samples are predominantly U.S.-focused.",
      "Dataset B headlines occasionally pose questions (e.g., 'Can $MSFT head higher?') or speculative statements, whereas A strictly reports factual analyst decisions."
    ],
    "qwen2.5-32b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines frequently include technical analysis terms (e.g., 'cup base', 'blue skyscraper') and chart patterns, whereas A focuses on fundamental metrics without technical jargon.",
      "Dataset B contains more explicit mentions of geopolitical events (e.g., 'Saudis Slash Oil Prices in Asia', 'OPEC+ Maintains Output Cuts') impacting markets, while A emphasizes company-specific supply chain or operational challenges.",
      "Headlines in B often reference foreign markets (e.g., India\u2019s gold imports, European LNG demand) and global economic interdependencies, whereas A\u2019s international coverage is narrower and tied to company expansions.",
      "Dataset B includes granular forward-looking financial guidance with exact figures (e.g., 'TJX raises 2020 EPS outlook to $2.61-$2.63'), while A typically states general guidance (e.g., 'raises dividend').",
      "B features recurring mentions of non-equity asset classes (e.g., EUR/USD forex, Ethereum, oil futures), whereas A focuses almost exclusively on equities and ETFs.",
      "Dataset B headlines incorporate real-time market summaries (e.g., 'Dow up 7.59%') and event recaps (e.g., 'RECAP 12/10 Unusual Puts'), which are absent in A.",
      "B includes more regulatory or legal developments tied to historical laws (e.g., Disney\u2019s 1967 nuclear law) and broad policy shifts, while A\u2019s legal mentions focus on company-specific lawsuits or investigations.",
      "Dividend announcements in B specify yield percentages (e.g., '2.6% Dividend Yield'), while A highlights dividend actions qualitatively (e.g., 'raises dividend').",
      "Dataset B contains social media references and informal language (e.g., '@tictoc oh fuck yeah Bloomberg'), whereas A maintains formal, structured headlines.",
      "B emphasizes macroeconomic indicators (e.g., 'Building permits rise 5%', 'Q4 GDP: Trudging Along') as standalone topics, while A integrates macro trends into company-specific narratives."
    ],
    "qwen2.5-7b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines include non-financial topics (e.g., environmental shifts, geopolitical events) alongside financial metrics, whereas A focuses strictly on company/analyst-driven financial actions.",
      "Dataset B incorporates social media references, informal hashtags (e.g., #economy), and conversational language (e.g., \"buy the dip\"), while A uses formal analyst terminology consistently.",
      "Dataset B headlines frequently reference global macroeconomic factors (e.g., India\u2019s gold imports, Saudi oil prices), while A emphasizes U.S.-centric indices (e.g., $SPY) and companies.",
      "Dataset B includes technical chart analysis terms (e.g., \"cup base,\" \"blue skyscraper\"), absent in A\u2019s focus on fundamental metrics like EPS or revenue.",
      "Dataset B features explicit geopolitical developments (e.g., U.S.-China trade deals, U.K. Brexit) impacting markets, whereas A\u2019s market drivers are company-specific.",
      "Dataset B headlines use bullet-point formatting for market performance summaries (e.g., \"Dow up 7.59%\"), while A structures headlines as complete sentences.",
      "Dataset B covers forex markets (e.g., EUR/USD forecasts) and commodities (e.g., oil, gold), whereas A focuses on equities and ETFs.",
      "Dataset B includes direct quotes from executives or officials (e.g., Mnuchin, Fed Chair Powell), while A cites analyst firms (e.g., Morgan Stanley, Oppenheimer).",
      "Dataset B references broader legal/regulatory actions (e.g., lawsuits against Juul, FCC rulings), while A\u2019s legal mentions are tied to company-specific outcomes.",
      "Dataset B explicitly ties economic indicators (e.g., housing starts, GDP) to market sentiment, whereas A emphasizes corporate earnings and analyst ratings."
    ],
    "llama3.3-70b_zero-shot_bg_v1": [
      "Dataset B includes headlines referencing non-traditional financial entities (e.g., Costco, Juul, ConsenSys) not strictly tied to equity markets or analyst actions.",
      "Dataset B contains headlines with non-standard formatting (e.g., emojis, hashtags, social media handles like @tictoc) absent in Dataset A.",
      "Dataset B includes macroeconomic indicators unrelated to corporate performance (e.g., housing starts, oil imports, currency forecasts) as primary subjects.",
      "Dataset B features geopolitical/regulatory developments (e.g., FCC rulings, London Uber license decisions, Saudi oil policies) not present in Dataset A.",
      "Dataset B contains dividend declarations and REIT-related updates, whereas Dataset A focuses exclusively on analyst ratings/price targets and earnings.",
      "Dataset B includes forward-looking speculative questions (e.g., 'Can $MSFT head higher?') absent from Dataset A's declarative style.",
      "Dataset B references non-equity asset classes (e.g., cryptocurrencies like EOS/XRP, commodities like LNG) not mentioned in Dataset A.",
      "Dataset B contains explicit mentions of retail investor tools/strategies (e.g., options trades, technical analysis patterns) unlike Dataset A's institutional focus.",
      "Dataset B includes corporate operational developments (e.g., facility licenses, contract awards, product launches) beyond financial metrics emphasized in Dataset A.",
      "Dataset B features government policy impacts (e.g., biofuel mandates, rail strikes, coronavirus stimulus) as primary drivers versus Dataset A's market sentiment focus."
    ],
    "qwen2.5-32b_zero-shot_v1": [
      "Dataset B headlines include informal language, social media handles (@tictoc), hashtags (#MarketScreener), and conversational phrases (e.g., 'oh fuck yeah'), while A uses formal, standardized financial terminology exclusively.",
      "Dataset B explicitly references technical analysis terms (e.g., 'cup base,' 'blue skyscraper,' 'closing near highs'), whereas A focuses on fundamental metrics like EPS or revenue without technical jargon.",
      "Dataset B frequently includes granular numerical guidance (e.g., 'TJX raises 2020 EPS outlook to $2.61-$2.63') in forecasts, while A\u2019s forward-looking statements are more general (e.g., 'economic projections').",
      "Dataset B highlights international markets (e.g., India\u2019s gold imports, Colombia\u2019s oil blocks) and geopolitical events (e.g., U.K. exit from E.U.), whereas A focuses predominantly on U.S.-centric themes like Fed decisions.",
      "Dataset B specifies legal actions (e.g., 'Prevent sues Volkswagen') and partnerships (e.g., 'Partnering with Costco') with procedural or contractual details, while A mentions these topics generically (e.g., 'antitrust investigations').",
      "Dataset B incorporates cryptocurrency/blockchain coverage (e.g., 'Major blockchain developer ConsenSys') and niche sectors (e.g., cannabis licenses), which are absent in A\u2019s headlines.",
      "Dataset B includes real-time trading updates (e.g., 'Dow up 7.59%') and premarket/post-market price movements (e.g., 'down 6% premarket'), whereas A reports broader market reactions without timestamped granularity.",
      "Dataset B references dividend yields, specific dividend amounts (e.g., '$0.0633 dividend'), and shareholder agreements in detail, while A mentions dividends only generically.",
      "Dataset B uses colloquial phrases (e.g., 'buy the dip,' 'parabolic surge') and emojis (e.g., \ud83d\ude80), whereas A maintains a neutral, professional tone throughout.",
      "Dataset B covers niche industries (e.g., tanker shipping, geolocation services) and hyper-specific corporate actions (e.g., 'equity offering'), while A focuses on broader sectors like tech, energy, and healthcare."
    ],
    "qwen2.5-32b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines frequently include broader market indices performance (e.g., Dow, Nasdaq) and macroeconomic indicators (e.g., GDP, housing starts), while A focuses on company-specific metrics.",
      "Dataset B contains more social media elements (e.g., @mentions, hashtags like #MarketScreener) and informal language (e.g., \"oh fuck yeah Bloomberg\"), which are absent in A.",
      "Dataset A consistently includes precise stock price movement percentages (e.g., \"Shares Drop 7%\"), while B often omits granular price data in favor of qualitative reactions.",
      "Dataset B incorporates non-English text fragments (e.g., Chinese characters) and unstructured formatting (e.g., line breaks), whereas A maintains standardized English and clean formatting.",
      "Dataset A headlines explicitly name analyst firms (e.g., \"Barclays\", \"Goldman Sachs\") in most analyst actions, while B often omits specific firm names (e.g., \"Downgrades 4/7: $AAN...\").",
      "Dataset B includes more non-corporate entities (e.g., governments, central banks) in headlines (e.g., \"Fed Officials Weigh Risks\"), while A centers on companies and analysts.",
      "Dataset A frequently references exact conference call times (e.g., \"8:30 AM ET\"), whereas B lacks temporal specificity beyond quarters/years.",
      "Dataset B uses more speculative phrasing (e.g., \"Could fall beneath 50 cents\", \"Poised to Impact\"), while A emphasizes concrete past/current events (e.g., \"Reports Strong Q3\").",
      "Dataset A includes granular dividend amounts (e.g., \"$0.0633 dividend\"), while B mentions dividends qualitatively (e.g., \"interesting yield\") or omits specifics.",
      "Dataset B incorporates cryptocurrency/blockchain themes (e.g., Ethereum, Ripple\u2019s XRP) absent in A, which focuses on traditional equities."
    ],
    "llama3.3-70b_few-shot_bg_train-time-info_v1": [
      "Dataset B includes headlines without stock tickers (e.g., corporate announcements or market summaries), while all headlines in Dataset A consistently include tickers prefixed with '$'.",
      "Dataset B contains standalone market data updates (e.g., index percentage changes, economic indicators) without analyst attribution, whereas Dataset A ties market movements to explicit analyst/institutional commentary.",
      "Dividend declarations (e.g., yield specifics, payout announcements) are primary headlines in Dataset B, while Dataset A mentions dividends only within analyst-driven contexts (e.g., upgrades).",
      "Dataset B highlights legal/regulatory actions (e.g., lawsuits, rulings) as standalone news, whereas Dataset A references regulatory factors only as contextual drivers in analyst reports.",
      "Dataset B incorporates macroeconomic metrics (e.g., housing starts, GDP) as direct headlines, while Dataset A embeds such data within company-specific analyst evaluations.",
      "Dataset B includes international/non-U.S. economic developments (e.g., India\u2019s gold imports, European policies), while Dataset A focuses on U.S. institutions and domestic companies.",
      "Dataset B reports earnings results/previews (e.g., revenue misses) without attributing analysis to institutions, whereas Dataset A explicitly ties earnings to analyst expectations or reactions.",
      "Product launches/expansions (e.g., partnerships, tech rollouts) are standalone headlines in Dataset B but appear in Dataset A only alongside earnings/guidance from analysts.",
      "Dataset B uses informal/social media elements (e.g., tweets, casual language) in headlines, while Dataset A maintains formal, institution-centric language throughout.",
      "Dataset B includes technical trading terminology (e.g., chart patterns like 'cup base'), whereas Dataset A relies exclusively on fundamental analysis terms (EPS, revenue)."
    ],
    "llama3.3-70b_zero-shot_bg_train-time-info_v1": [
      "Dataset B includes headlines with explicit dividend declarations and amounts (e.g., 'WPT Industrial REIT declares $0.0633 dividend'), whereas Dataset A focuses on price target adjustments without dividend specifics.",
      "Dataset B features headlines with macroeconomic indicators (e.g., 'Building permits rise 5% in October to 1.461 million rate'), while Dataset A ties macroeconomic factors only to company/stock performance.",
      "Dataset B contains headlines formatted as summaries or recaps (e.g., 'RECAP 11/22 TRUMPSAYS...'), unlike Dataset A's structured analyst-action-centric phrasing.",
      "Dataset B references non-equity assets (e.g., cryptocurrencies like Ethereum, commodities like oil) explicitly, whereas Dataset A focuses solely on equities and ETFs.",
      "Dataset B includes headlines with social media mentions or hashtags (e.g., '@tictoc', '#economy'), while Dataset A avoids informal or non-institutional references.",
      "Dataset B highlights government policy impacts (e.g., '1967 law allowing Disney to build nuclear plants'), whereas Dataset A ties regulatory changes only to stock sentiment.",
      "Dataset B features international economic data (e.g., 'India's gold imports declined'), while Dataset A emphasizes U.S.-centric indices and sector ETFs.",
      "Dataset B includes headlines about legal actions (e.g., 'Prevent sues Volkswagen'), whereas Dataset A focuses on financial ratings and corporate developments.",
      "Dataset B uses bullet-point formatting for market index performance (e.g., 'Dow up 7.59%'), while Dataset A describes index trends narratively.",
      "Dataset B mentions M&A deals or corporate restructuring (e.g., 'Charles Schwab to buy TD Ameritrade'), whereas Dataset A emphasizes analyst-initiated actions (upgrades/downgrades)."
    ],
    "qwen2.5-32b_few-shot_bg_test-time-info_v1": [
      "Dataset B includes social media elements and informal language (e.g., tweets, hashtags, emojis) absent in Dataset A.",
      "Dataset B contains headlines with broader macroeconomic indicators (e.g., GDP, PMI, housing starts) not explicitly emphasized in Dataset A.",
      "Dataset B references geopolitical events (e.g., U.S.-China trade deals, Brexit) more frequently than Dataset A.",
      "Dataset B includes dividend declarations and payout ratios, which are rarely mentioned in Dataset A.",
      "Dataset B features event-driven market summaries (e.g., percentage surges/closes, premarket moves) without analyst actions, unlike Dataset A.",
      "Dataset B incorporates technical analysis terms (e.g., 'cup base', 'blue skyscraper') absent in Dataset A's fundamentals-focused headlines.",
      "Dataset B highlights commodity-specific price dynamics (e.g., oil, gold, LNG) as standalone topics, while Dataset A ties them to company impacts.",
      "Dataset B includes legal/regulatory actions (e.g., lawsuits, license revocations) as primary drivers, unlike Dataset A's operational/earnings focus.",
      "Dataset B covers cryptocurrency/blockchain developments, which are absent in Dataset A.",
      "Dataset B uses fragmented formatting (e.g., bullet points, RECAP summaries) compared to Dataset A's standardized sentence structures."
    ],
    "qwen2.5-7b_zero-shot_bg_v1": [
      "Dataset B headlines more frequently incorporate social media elements (hashtags, mentions, informal language) compared to Dataset A's formal tone.",
      "Dataset B includes more granular dividend declarations (e.g., exact dividend amounts) absent in Dataset A's earnings-focused updates.",
      "Dataset B features explicit references to non-corporate entities (e.g., national governments, FCC, FDA) influencing markets, unlike Dataset A's corporate-centric context.",
      "Dataset B contains more niche commodity-specific updates (e.g., LNG tanker movements, oil block acquisitions) versus Dataset A's broader industry trends.",
      "Dataset B headlines regularly highlight contractual specifics (e.g., '$39 million sole-source contract') absent in Dataset A's analyst-driven price targets.",
      "Dataset B includes granular macroeconomic data points (e.g., 'housing starts climb 3.8%') vs. Dataset A's qualitative economic forecasts.",
      "Dataset B references localized regulatory actions (e.g., NYC lawsuits, London Uber bans) not present in Dataset A's global index-focused narratives.",
      "Dataset B contains explicit dividend yield analyses (e.g., '2.6% Dividend Yield Looks Interesting') unlike Dataset A's EPS/revenue comparisons.",
      "Dataset B integrates retail investor-focused language (e.g., 'sympathy play,' 'buy the dip') absent in Dataset A's institutional analyst perspective.",
      "Dataset B features real-time event-driven updates (e.g., 'Dow up 7.59% into close') contrasting with Dataset A's forward-looking performance projections."
    ],
    "qwen2.5-32b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines more frequently include non-standard formatting (e.g., line breaks, emojis, hashtags) compared to Dataset A's formal structure.",
      "Dataset B contains more direct quotes from executives/analysts (e.g., \"I think the most exciting part...\") compared to Dataset A's third-person reporting.",
      "Dataset B includes more international economic data points (e.g., India's gold imports, Qatar LNG) compared to Dataset A's domestic/US-focused content.",
      "Dataset B features more technical analysis terminology (e.g., \"cup base\", \"blue skyscraper chart\") absent in Dataset A's fundamental-focused headlines.",
      "Dataset B contains more explicit political/regulatory developments (e.g., FCC spectrum sales, London Uber license) compared to Dataset A's corporate-centric news.",
      "Dataset B headlines more frequently reference specific price levels/dates (e.g., \"$0.0633 dividend\", \"Q4 2019\") compared to Dataset A's relative timeframe references.",
      "Dataset B includes more raw market data displays (e.g., percentage changes in indices) compared to Dataset A's qualitative descriptions of movements.",
      "Dataset B features more retail investor-focused language (e.g., \"buy the dip\", \"sympathy play\") compared to Dataset A's institutional tone.",
      "Dataset B contains more non-corporate economic indicators (e.g., housing starts, building permits) compared to Dataset A's company-specific metrics focus.",
      "Dataset B headlines more frequently pose direct questions to readers (e.g., \"Can $MSFT head higher?\") compared to Dataset A's declarative statements."
    ],
    "llama3.1-8b_zero-shot_bg_v1": [
      "Dataset B headlines include exact dividend amounts and specific financial metrics (e.g., 'declares $0.0633 dividend', 'Q1 revs below consensus'), while A focuses on general dividend announcements without precise figures.",
      "Dataset B contains informal language, social media references, and hashtags (e.g., '@tictoc oh fuck yeah', '#economy'), whereas A uses strictly formal, professional terminology.",
      "Dataset B features fragmented formatting (e.g., line breaks, bullet points, emojis) in headlines, unlike A\u2019s uniform, cohesive sentence structures.",
      "Dataset B references a wider range of asset classes (e.g., currencies like EUR/USD, commodities like oil) beyond equities, while A primarily centers on stocks and macroeconomic factors.",
      "Dataset B includes headlines structured as live updates or recaps (e.g., 'RECAP 11/22 TRUMPSAYS', 'Stock market live updates'), whereas A emphasizes standalone news events.",
      "Dataset B highlights localized or niche regulatory/legal developments (e.g., 'New York hits Juul with a lawsuit'), while A focuses on broader regulatory impacts (e.g., antitrust concerns).",
      "Dataset B frequently cites lesser-known or regional firms (e.g., Berenberg, SunTrust RH), while A predominantly references major global institutions (e.g., Morgan Stanley, Barclays).",
      "Dataset B incorporates non-English characters or international-centric updates (e.g., 'Wipro\u8d0f\u5f97Marelli'), whereas A\u2019s content is predominantly US-focused.",
      "Dataset B explicitly quantifies forward-looking corporate guidance (e.g., 'raises 2020 EPS outlook to $2.61-$2.63'), while A emphasizes retrospective performance (e.g., 'Q3 Earnings Miss Estimate').",
      "Dataset B covers granular sector-specific operational updates (e.g., 'awarded a $39 million contract', 'cuts dividend due to 737 MAX crisis'), whereas A emphasizes broader strategic developments (e.g., partnerships, product launches)."
    ],
    "llama3.1-8b_zero-shot_v1": [
      "Dataset B headlines explicitly include stock tickers (e.g., $VIRT, $PSMT), while Dataset A uses company names without tickers.",
      "Dataset B emphasizes granular corporate actions (e.g., dividend declarations, EPS guidance revisions, contract awards), whereas Dataset A focuses on broader market-wide reactions.",
      "Dataset B incorporates technical trading terminology (e.g., 'cup base,' 'blue skyscraper,' 'premarket moves') absent in Dataset A.",
      "Dataset B includes social media elements (e.g., hashtags, @mentions, tweet-like formatting), while Dataset A uses formal headline structures.",
      "Dataset B highlights niche sectors (e.g., LNG, specific pharmaceuticals) and hyper-localized geographic business developments (e.g., Norway\u2019s biofuel policy), unlike Dataset A\u2019s generalized sector references.",
      "Dataset B features structured bullet points or recap sections (e.g., 'RECAP 11/22 TRUMPSAYS') for trading updates, whereas Dataset A uses prose-only headlines.",
      "Dataset B references specific legal disputes (e.g., lawsuits against Juul, Prevent vs. Volkswagen), while Dataset A mentions regulatory impacts only at macroeconomic levels.",
      "Dataset B provides granular financial instrument details (e.g., put/call options, equity offerings) not present in Dataset A.",
      "Dataset B includes M&A announcements (e.g., Charles Schwab acquiring TD Ameritrade) as standalone headlines, whereas Dataset A treats M&A as part of broader market trends.",
      "Dataset B frequently cites analyst price targets and institutional actions (e.g., 'Deutsche Bank lifts price target'), while Dataset A refers to analyst sentiment generically."
    ],
    "llama3.3-70b_few-shot_v1": [
      "Dataset B includes informal social media elements (e.g., tweets, hashtags, colloquial language) absent in Dataset A",
      "Dataset B contains explicit references to dividend declarations and yield percentages (e.g., WPT Industrial REIT dividend), while A focuses on dividends only as part of corporate events without specific yield metrics",
      "Dataset B features granular technical trading terminology (e.g., cup base, puts/calls, premarket price movements, institutional accumulation) not prevalent in Dataset A",
      "Dataset B highlights niche sectors (e.g., cannabis, blockchain, LNG) and specific industrial operations (e.g., oil block acquisitions) beyond A's broader sector trends",
      "Dataset B includes international market specifics (e.g., India's gold imports, UK/EU regulatory impacts) not emphasized in A's US-centric macroeconomic focus",
      "Dataset B references contractual agreements and legal disputes (e.g., Prevent suing Volkswagen, licensing revocations) more frequently than A",
      "Dataset B shows explicit mentions of share offerings, capital raises, and equity dilution events absent in A",
      "Dataset B integrates mixed-content headlines (e.g., combined stock performance summaries with editorial commentary) contrasting with A's standardized news formatting",
      "Dataset B includes granular commodity market dynamics (e.g., LNG tanker idling, biofuel blending mandates) not covered in A's macroeconomic indicators",
      "Dataset B features earnings call transcripts, price target adjustments from specific analysts/firms, and conference call announcements absent in A"
    ],
    "qwen2.5-32b_few-shot_bg_v1": [
      "Dataset B headlines frequently include non-English characters or multilingual content (e.g., Chinese text, currency symbols like \u0080), while Dataset A uses standardized English.",
      "Dataset B contains more informal or conversational language, including slang (e.g., \"oh fuck yeah\"), emojis, and social media-style tagging (e.g., hashtags like #MarketScreener), unlike Dataset A's formal tone.",
      "Dataset B includes granular commodity-specific updates (e.g., oil prices, LNG tankers, gold imports) absent in Dataset A, which focuses on company-specific financial metrics.",
      "Dataset B headlines frequently reference specific dividend yields, small-cap stocks, and niche industries (e.g., cannabis licenses, REIT dividends), whereas Dataset A emphasizes large-cap tech and consumer companies.",
      "Dataset B incorporates technical trading jargon (e.g., \"cup base,\" \"blue skyscraper,\" \"institutional accumulation\") not seen in Dataset A\u2019s analyst-driven narratives.",
      "Dataset B includes geopolitical or localized economic events (e.g., London Uber license loss, Papua New Guinea LNG talks) with direct market impacts, while Dataset A focuses on macroeconomic trends (e.g., Fed rates, inflation).",
      "Dataset B features explicit references to retail investor strategies (e.g., \"hedge your stock investments,\" \"buy the dip\"), whereas Dataset A targets institutional perspectives (e.g., analyst ratings, EPS guidance).",
      "Dataset B headlines often omit company names, relying solely on tickers (e.g., \"$PHCEF - PharmaCielo reports Q3 results\"), while Dataset A consistently pairs tickers with full company names.",
      "Dataset B includes non-earnings operational updates (e.g., dividend declarations, contract awards, job cuts) as standalone headlines, whereas Dataset A ties such updates to financial performance or analyst actions.",
      "Dataset B uses dramatic, non-quantified phrases (e.g., \"parabolic surge,\" \"hard landing\") and clickbait-style formatting (e.g., \"RECAP\" lists), while Dataset A prioritizes numerical specificity (e.g., \"Revenue Jumps 8%\")."
    ],
    "qwen2.5-32b_few-shot_v1": [
      "Dataset B headlines often include social media elements such as hashtags (e.g., #economy), mentions (e.g., @tictoc), and informal language (e.g., 'oh fuck yeah Bloomberg'), whereas Dataset A uses strictly formal language without such features.",
      "Dataset B incorporates technical analysis terminology (e.g., 'cup base,' 'blue skyscraper') and trading jargon (e.g., 'puts,' 'shorts'), while Dataset A focuses on fundamental analysis and avoids market-specific trading terms.",
      "Dataset B includes explicit references to premarket/after-hours stock movements (e.g., 'down 6% premarket'), whereas Dataset A reports price changes without specifying trading sessions.",
      "Dataset B frequently structures headlines with bullet points, line breaks, or multi-part summaries (e.g., 'STOCKS SURGE INTO THE CLOSE:\\n- Dow up 7.59%'), while Dataset A uses cohesive sentences with colons or hyphens.",
      "Dataset B features granular financial metrics like exact dividend amounts (e.g., '$0.0633 dividend') and contract values (e.g., '$39 million sole-source contract'), whereas Dataset A mentions financial terms more generally (e.g., 'raises dividend').",
      "Dataset B highlights specific equity offerings, capital raises, or stock dilution events (e.g., 'announcing capital raise'), which are absent in Dataset A's headlines.",
      "Dataset B includes real-time updates on earnings call transcripts (e.g., 'Edited Transcript of PLC.TO earnings conference call'), whereas Dataset A focuses on earnings outcomes without referencing transcripts.",
      "Dataset B references niche financial instruments (e.g., 'Jan 6 P' for put options) and derivatives, while Dataset A avoids such specialized trading products.",
      "Dataset B headlines often preview upcoming earnings (e.g., 'Q4 2019 Earnings Preview'), whereas Dataset A primarily reports finalized results (e.g., 'reports record-breaking profits').",
      "Dataset B integrates investor advice or opinion pieces (e.g., 'Put Your Smaller Bonus to Better Use'), while Dataset A maintains an objective, news-only tone."
    ],
    "qwen2.5-32b_zero-shot_bg_v1": [
      "Dataset B headlines frequently include informal language, social media handles, or slang (e.g., '@tictoc oh fuck yeah', 'buy the dip') absent in Dataset A's formal tone.",
      "Dataset B contains explicit references to dividend declarations (e.g., 'WPT Industrial REIT declares $0.0633 dividend') not consistently featured in Dataset A.",
      "Dataset B includes non-earnings operational updates (e.g., 'Disney is allowed to build nuclear power plants', 'New markets for the C-390 Millennium') unrelated to financial metrics, unlike A's earnings-centric focus.",
      "Dataset B headlines often list multiple market indices/percentages in a single entry (e.g., 'Dow up 7.59% \n- Nasdaq up 7.35%') whereas A typically isolates one metric per headline.",
      "Dataset B references niche financial instruments (e.g., 'Unusual Puts' listings) and retail trading strategies not mentioned in Dataset A's institutional analysis.",
      "Dataset B includes explicit mentions of COVID-19 impacts on revenue/operations (e.g., 'Integra LifeSciences sees drop in Q1 revenue amid COVID-19') as recurring themes, less prominent in A.",
      "Dataset B headlines frequently incorporate hashtags (e.g., '#economy #MarketScreener') for categorization, a feature absent in Dataset A.",
      "Dataset B contains granular commodity market updates (e.g., 'Oil nears three-month high', 'biofuel blending mandates') not systematically covered in Dataset A.",
      "Dataset B includes verbatim conference call transcripts/announcements (e.g., 'Edited Transcript of PLC.TO earnings conference call') while A summarizes analyst interpretations.",
      "Dataset B references specific contractual agreements (e.g., '$KTOS awarded $39 million contract') as standalone news items, unlike A's focus on financial performance implications."
    ],
    "qwen2.5-7b_zero-shot_v1": [
      "Dataset B headlines frequently include specific stock ticker symbols (e.g., $NVCR, $BYND) and explicit financial metrics (e.g., EPS outlooks, dividend amounts), whereas Dataset A uses generic company or sector references.",
      "Dataset B contains informal social media elements (e.g., tweets, hashtags like #MarketScreener) and conversational language, while Dataset A maintains formal, traditional news tone without such features.",
      "Dataset B emphasizes granular technical trading terms (e.g., 'cup base,' 'blue skyscraper') and chart patterns, while Dataset A focuses on broader volatility descriptors like 'surge' or 'plunge.'",
      "Dataset B highlights niche corporate actions (e.g., dividend declarations, specific contract awards) and regulatory filings, whereas Dataset A prioritizes macroeconomic policy impacts (e.g., Fed rate decisions).",
      "Dataset B includes direct quotes from analysts or executives (e.g., \"I think the most exciting part...\"), while Dataset A summarizes analyst actions without verbatim statements.",
      "Dataset B references localized geopolitical or regulatory events (e.g., London\u2019s Uber license denial, NYC lawsuits) with explicit jurisdiction details, whereas Dataset A discusses geopolitical risks generically.",
      "Dataset B features granular commodity market updates (e.g., LNG tanker idling, biofuel mandates) tied to specific companies, while Dataset A links commodities to broad sector performance.",
      "Dataset B integrates cryptocurrency and alternative asset coverage (e.g., Ethereum, Bitcoin), whereas Dataset A focuses exclusively on traditional equities and commodities.",
      "Dataset B includes real-time trading updates (e.g., premarket moves, intraday index percentages) and earnings call transcripts, while Dataset A reports post-event outcomes without live data.",
      "Dataset B explicitly names mid-cap or lesser-known firms (e.g., NanoString, Sorrento Therapeutics) and hyper-specific industries, whereas Dataset A centers on blue-chip companies and major sectors."
    ],
    "llama3.1-8b_zero-shot_bg_train-time-info_v1": [
      "Dataset B includes earnings previews and forward-looking announcements without immediate stock price reactions, while Dataset A exclusively reports post-event outcomes with explicit price impact statements.",
      "Dataset B contains informal language, social media references, and conversational elements (e.g., '@tictoc oh fuck yeah'), whereas Dataset A maintains formal, structured financial reporting tone throughout.",
      "Dataset B covers non-equity instruments like currencies (EUR/USD) and commodities in standalone headlines, while Dataset A focuses strictly on equity-related content even when mentioning ETFs.",
      "Dataset B includes unattributed analyst actions (e.g., 'Downgrades 4/7: $AAN') without specifying institutions, whereas Dataset A always names originating firms (Morgan Stanley, Oppenheimer).",
      "Dataset B features international macroeconomic developments (e.g., 'India's gold imports declined') as standalone headlines, while Dataset A exclusively ties macroeconomic context to specific corporate impacts.",
      "Dataset B omits intraday timing markers (premarket/post-market) present in 100% of relevant Dataset A headlines reporting price movements.",
      "Dataset B shows currency/commodity price forecasts without ticker associations (e.g., 'Silver Weekly Price Forecast'), while Dataset A always connects market movements to specific tickers.",
      "Dataset B includes corporate announcements without financial quantification (e.g., 'GE names new CFO'), whereas Dataset A consistently pairs strategic moves with metrics like '$170M cost cut'.",
      "Dataset B references non-US regulatory bodies/events (e.g., 'London Uber license decision') without equivalent specificity to Dataset A's FDA/SEC-focused compliance reporting.",
      "Dataset B contains social/political developments (e.g., 'Madhya Pradesh Vyapam Scam') unrelated to corporate performance, which never appear in Dataset A's company-specific focus."
    ],
    "qwen2.5-7b_few-shot_v1": [
      "Dataset B headlines frequently include social media elements (e.g., @mentions, hashtags, informal language) absent in A's formalized news tone",
      "Dataset B contains granular trading terminology (e.g., 'cup base,' 'blue skyscraper') and technical analysis references not seen in A",
      "Dataset B includes raw, unformatted data snippets (e.g., earnings call transcripts, ticker-specific previews) lacking A's polished article structure",
      "Dataset B emphasizes real-time price action updates (e.g., premarket/premarket moves, intraday %) vs. A's focus on post-event analysis",
      "Dataset B features niche trading instruments (e.g., options chains, dividend yields, short interest) less prevalent in A's broader market narratives",
      "Dataset B headlines frequently reference specific numerical thresholds for stocks/indices (e.g., '$26 price target') absent in A's qualitative emphasis",
      "Dataset B includes fragmented, sentence-fragment headlines (e.g., 'Downgrades 4/7: $AAN...') contrasting A's complete-sentence structure",
      "Dataset B incorporates cryptocurrency/blockchain coverage (e.g., Ethereum, Bitcoin) not present in A's traditional asset class focus",
      "Dataset B shows higher frequency of regional/localized market impacts (e.g., Canadian rail strikes, NYC lawsuits) vs. A's global macroeconomic lens",
      "Dataset B uses non-standard formatting (e.g., ALL CAPS, emojis, broken URLs) reflecting social/web-scraped sources, unlike A's editorial consistency"
    ],
    "llama3.3-70b_few-shot_bg_v1": [
      "Dataset B includes headlines about corporate partnerships and strategic expansions (e.g., Beyond Meat & Costco) absent in Dataset A.",
      "Headlines in Dataset B contain specific dividend declarations and updates (e.g., WPT Industrial REIT) not found in Dataset A.",
      "Dataset B references technical analysis terms (e.g., 'cup base,' 'blue skyscraper') absent in Dataset A's purely fundamental focus.",
      "Legal/regulatory actions (e.g., lawsuits, license revocations) are explicitly cited in Dataset B headlines but never in Dataset A.",
      "Dataset B includes granular macroeconomic updates (e.g., building permits, manufacturing activity) beyond the Federal Reserve focus shared by both datasets.",
      "Conversational language and social media references (e.g., '@tictoc') appear in Dataset B, unlike Dataset A's formal tone.",
      "Dataset B specifies options trading details (e.g., strike prices, expiration dates), which Dataset A omits.",
      "Standalone summaries of broad market index movements (e.g., 'Dow up 7.59%') are unique to Dataset B.",
      "Environmental/energy policy impacts (e.g., Norway\u2019s biofuel mandate) are highlighted in Dataset B but absent in Dataset A.",
      "Dataset B covers M&A activity and capital raises (e.g., Schwab-Ameritrade deal), which Dataset A excludes."
    ],
    "llama3.1-8b_few-shot_v1": [
      "Dataset B headlines more frequently start with or prominently feature stock ticker symbols (e.g., '$NVCR', '$PSMT') as a structural element, whereas A includes tickers less conspicuously.",
      "Dataset B includes granular technical trading terminology (e.g., 'cup base', 'blue skyscraper', 'buy signal') and chart patterns absent in A\u2019s broader market-focused language.",
      "Dataset B contains real-time or premarket/after-hours price movement updates (e.g., 'down 6% premarket'), while A focuses on post-market or general trading-day summaries.",
      "Dataset B headlines often reference specific analyst actions (e.g., 'reiterates Buy rating', 'price target raised') and direct quotes from analysts, unlike A\u2019s generalized mentions of forecasts.",
      "Dataset B includes explicit mentions of dividend yields, capital raises, or equity offerings (e.g., 'declares $0.0633 dividend', 'equity offering') as standalone news, whereas A integrates these into broader narratives.",
      "Dataset B incorporates social media elements (e.g., hashtags, informal language, '@mentions') and fragmented formatting (e.g., line breaks, bulleted lists), while A maintains formal, polished news structures.",
      "Dataset B emphasizes contractual, legal, or regulatory specifics (e.g., '1967 law', 'FCC chairman voices support') with granular detail, unlike A\u2019s higher-level policy discussions.",
      "Dataset B frequently highlights institutional investor activity (e.g., 'hedge funds bullish', 'institutional accumulation') and retail trading strategies, absent in A\u2019s macro-focused headlines.",
      "Dataset B includes earnings conference call transcripts, shareholder meeting updates, or edited financial presentations as standalone headlines, which A does not explicitly reference.",
      "Dataset B features niche industry or operational updates (e.g., 'recalls 262,000 pickup trucks', 'snow shortages impacting resorts') rather than A\u2019s emphasis on sector-wide trends."
    ],
    "llama3.1-8b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines more frequently omit stock ticker symbols and refer to companies by full name (e.g., 'Beyond Meat,' 'Disney') compared to Dataset A, where tickers are always present.",
      "Dataset B includes explicit mentions of geopolitical/regulatory events (e.g., 'Papua New Guinea LNG negotiations,' 'Uber London license denial') as primary drivers, while Dataset A ties these factors to company-specific outcomes.",
      "Dataset B incorporates social media references, hashtags (#economy), or informal language (e.g., 'oh fuck yeah Bloomberg'), absent in Dataset A's formal tone.",
      "Dataset B features international economic data (e.g., 'India\u2019s gold imports,' 'Euro forecasts') more prominently than Dataset A, which focuses on U.S.-centric metrics.",
      "Dataset B highlights dividend declarations (e.g., 'WPT Industrial REIT declares $0.0633 dividend') as standalone news, whereas Dataset A ties dividends to broader financial outlooks.",
      "Dataset B includes real-time market index updates (e.g., 'Dow up 7.59%') as headline subjects, while Dataset A references indices contextually (e.g., '$SPY Index Holds Steady').",
      "Dataset B explicitly mentions legal actions (e.g., 'New York sues Juul,' 'Prevent sues Volkswagen') as central events, unlike Dataset A, which treats legal factors as secondary risks.",
      "Dataset B frequently references specific contracts or deals (e.g., '$39 million sole-source contract for Kratos Defense') as headline drivers, while Dataset A emphasizes analyst actions.",
      "Dataset B discusses consumer behavior trends (e.g., 'Chinese Consumers Stay Home,' 'Retail winners and losers') as standalone themes, whereas Dataset A ties these to company earnings.",
      "Dataset B includes forward-looking macroeconomic predictions (e.g., 'Euro as a structural short,' 'Oil demand shock') as focal points, while Dataset A embeds macro trends in company guidance."
    ],
    "qwen2.5-7b_few-shot_bg_v1": [
      "Dataset B includes raw statistical data or percentage changes directly in headlines (e.g., 'Dow up 7.59%', 'Building permits rise 5%') without explicit qualitative context, while A pairs quantitative data with analysis.",
      "Dataset B contains informal or social media-style content (e.g., hashtags, mentions like '@tictoc', casual language), whereas A uses formal news headlines consistently.",
      "Dataset B features explicit dividend yield disclosures (e.g., '2.6% Dividend Yield') as standalone metrics, while A mentions dividends in broader financial performance contexts.",
      "Dataset B includes technical trading terms (e.g., 'cup base', 'blue skyscraper', 'relief rally') absent in A, which focuses on fundamental analysis.",
      "Dataset B highlights global commodity dynamics (e.g., oil prices, LNG imports) as primary topics, whereas A discusses commodities as secondary factors affecting companies.",
      "Dataset B references real-time market closes or intraday updates (e.g., 'STOCKS SURGE INTO THE CLOSE'), while A focuses on pre/post-market reactions to events.",
      "Dataset B incorporates legal/regulatory actions (e.g., lawsuits, license revocations) as standalone headlines, whereas A ties these to company-specific impacts.",
      "Dataset B includes conference call transcripts or earnings call summaries (e.g., 'Edited Transcript of PLC.TO earnings conference call'), which A omits.",
      "Dataset B emphasizes macroeconomic indicators (e.g., housing starts, PMI data) as primary subjects, while A mentions them peripherally in company narratives.",
      "Dataset B features niche sectors (e.g., cannabis, blockchain, LNG) more prominently, whereas A focuses on mainstream industries like tech and healthcare."
    ],
    "qwen2.5-7b_few-shot_bg_train-time-info_v1": [
      "Dataset B includes general business developments/partnerships without direct analyst actions (e.g., 'Partnering with Costco is the latest major expansion for Beyond Meat')",
      "Dataset B contains broader market summaries (e.g., 'STOCKS SURGE INTO THE CLOSE...') rather than individual stock impacts",
      "Dataset B references dividend declarations (e.g., 'WPT Industrial REIT declares $0.0633 dividend') absent in A",
      "Dataset B uses hashtags/social media formatting (e.g., '#economy #MarketScreener') indicating platform-specific sourcing",
      "Dataset B includes macroeconomic indicators (e.g., 'Building permits rise 5%') unrelated to specific companies",
      "Dataset B features corporate operational decisions (e.g., 'Exxon cuts investments...') without analyst attribution",
      "Dataset B highlights legal/regulatory actions (e.g., 'New York hits Juul with a lawsuit') not tied to financial metrics",
      "Dataset B covers commodities/global markets (e.g., 'Oil nears three-month high...') beyond stock-specific news",
      "Dataset B includes truncated headlines with ellipses (e.g., '$PSMT: PriceSmart announced...$306.1...') suggesting character limits",
      "Dataset B incorporates informal/conversational language (e.g., '@tictoc oh fuck yeah Bloomberg...') absent in A"
    ],
    "qwen2.5-7b_few-shot_bg_test-time-info_v1": [
      "Dataset B includes headlines with social media references, hashtags, or informal language (e.g., '@tictoc', '#economy'), whereas A uses formal, structured language without such elements.",
      "Dataset B references geopolitical events, international trade dynamics, or government policies (e.g., Saudi oil cuts, U.S.-China tariffs) as primary news drivers, while A focuses on company-specific financial metrics.",
      "Dataset B contains headlines about macroeconomic indicators (e.g., building permits, housing starts, GDP) unrelated to specific companies, which are absent in A.",
      "Dataset B includes dividend declarations (e.g., 'WPT Industrial REIT declares $0.0633 dividend') without analyst actions, whereas A ties dividends to analyst ratings or price targets.",
      "Dataset B features regulatory warnings, legal actions, or lawsuits (e.g., 'New York hits Juul with a lawsuit') not linked to analyst opinions, unlike A's focus on institutional analyst actions.",
      "Dataset B incorporates conversational or speculative phrasing (e.g., 'Can $MSFT head higher?'), while A uses declarative statements tied to quantifiable outcomes.",
      "Dataset B includes non-English characters or multilingual text (e.g., Chinese characters in '$SRNE - \u8d44\u6e90\u80fd\u6e90\u516c\u53f8\u53d1\u5e03\u540e\u80a1\u4ef7\u6ce2\u52a8\u4e0d\u5927'), which are absent in A.",
      "Dataset B highlights non-corporate events (e.g., rail strikes, natural disasters) as market influencers, whereas A emphasizes corporate actions like partnerships or acquisitions.",
      "Dataset B references conference call transcripts, legal filings, or earnings presentation summaries (e.g., 'Edited Transcript of PLC.TO earnings conference call'), which A does not include.",
      "Dataset B covers broader sector-wide trends (e.g., 'EVN (VIE:EVN) Share Price Has Gained 60%') without explicit ties to analyst ratings, unlike A's granular analyst-driven narratives."
    ],
    "llama3.3-70b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines include explicit mentions of dividend declarations and merger/acquisition activities (e.g., 'WPT Industrial REIT declares $0.0633 dividend'), which are absent in Dataset A.",
      "Dataset B contains headlines formatted with line breaks, bullet points, or hashtags (e.g., 'STOCKS SURGE INTO THE CLOSE:\\n- Dow up 7.59%'), while Dataset A uses only plain text.",
      "Dataset B references international geopolitical events (e.g., 'Saudis Slash Oil Prices in Asia') and country-specific economic data (e.g., 'India's gold imports declined'), whereas Dataset A focuses on U.S.-centric corporate actions.",
      "Dataset A consistently includes percentage-based stock price movements (e.g., '$KRUS (+5.1%)'), while Dataset B omits such granular price-change metrics.",
      "Dataset B features direct questions (e.g., 'Can $MSFT head higher?') and speculative language, which are absent in Dataset A's declarative, analyst-driven tone.",
      "Dataset B incorporates social media mentions or informal language (e.g., '@tictoc oh fuck yeah Bloomberg'), unlike Dataset A's strictly formal, institutional tone.",
      "Dataset B highlights legal/regulatory actions unrelated to analyst ratings (e.g., 'New York hits Juul with a lawsuit'), whereas Dataset A ties legal concerns to analyst decisions (e.g., 'supply chain concerns').",
      "Dataset A headlines explicitly tie macroeconomic factors to specific company guidance (e.g., 'rising inflation concerns'), while Dataset B reports macroeconomic trends as standalone updates (e.g., 'Building permits rise 5%').",
      "Dataset B includes announcements of equity offerings or capital raises (e.g., '$AQST - Aquestive readies equity offering'), which are not present in Dataset A.",
      "Dataset B references retail investor-focused events (e.g., 'Hedge Funds Are Buying Costamare Inc') and dividend yield analyses, whereas Dataset A focuses solely on institutional analyst actions."
    ],
    "llama3.1-8b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines often omit direct mentions of specific analyst firms or individual analysts (e.g., 'Partnering with Costco' in B vs. 'Morgan Stanley upgrades $AMZN' in A).",
      "Dataset B includes non-English characters, informal language, or social media references (e.g., '@tictoc oh fuck yeah Bloomberg') absent in A.",
      "Dataset B features explicit dividend declarations (e.g., 'WPT Industrial REIT declares $0.0633 dividend'), while A focuses on analyst-driven price targets or ratings.",
      "Dataset B contains standalone macroeconomic data without direct stock ticker linkages (e.g., 'Building permits rise 5% in October'), whereas A ties macroeconomic factors to specific tickers (e.g., '$GDP down 2.5%').",
      "Dataset B includes legal/regulatory news unrelated to stock actions (e.g., 'Auto parts supplier Prevent sues Volkswagen'), while A links regulatory impacts to immediate stock performance (e.g., '$NYMT faces regulatory hurdles').",
      "Dataset B references international economic developments (e.g., 'Canada's economy could end up being the big loser') more frequently than A, which emphasizes company-specific regulatory or sector impacts.",
      "Dataset B incorporates non-equity asset classes (e.g., 'EUR/USD Price Forecast') and commodities (e.g., oil/gold), whereas A focuses exclusively on equities.",
      "Dataset B includes conference call transcripts or earnings call summaries (e.g., 'Edited Transcript of PLC.TO earnings conference call'), which are absent in A.",
      "Dataset B highlights dividend yields or income-focused metrics (e.g., '2.6% Dividend Yield Looks Pretty Interesting'), while A prioritizes growth metrics like revenue beats or EPS guidance.",
      "Dataset B features non-corporate entities (e.g., governments, central banks) as primary news drivers (e.g., 'Fed Officials Weigh Risks Of Covid-19'), whereas A centers on corporate actions and analyst sentiment."
    ],
    "llama3.3-70b_zero-shot_bg_test-time-info_v1": [
      "Headlines in Dataset B frequently reference broad market indices (e.g., Dow, Nasdaq, S&P) and their percentage movements, while Dataset A focuses solely on individual companies or sectors.",
      "Dataset B includes headlines with explicit dividend declarations (e.g., dividend amounts, yield percentages), whereas Dataset A primarily mentions dividends as a metric without specific payout details.",
      "Dataset B incorporates non-English characters or terms (e.g., Chinese text, international company names), reflecting a global scope, while Dataset A focuses on U.S.-centric entities and terminology.",
      "Headlines in Dataset B often feature formatted text (e.g., bullet points, colons, hashtags) and social media-style content (e.g., tweets, user mentions), unlike Dataset A's standardized news formatting.",
      "Dataset B includes macroeconomic data points (e.g., housing starts, building permits, oil inventory rates) as standalone headlines, whereas Dataset A contextualizes macro factors within company/sector performance.",
      "Legal actions, lawsuits, and regulatory decisions (e.g., FDA warnings, license revocations) are explicitly mentioned in Dataset B but rarely appear in Dataset A.",
      "Dataset B contains headlines posing direct questions about stock performance (e.g., 'Can $MSFT head higher?'), while Dataset A maintains declarative statements about analyst actions.",
      "Cryptocurrencies and blockchain developments are referenced in Dataset B (e.g., Ethereum, Ripple\u2019s XRP), whereas Dataset A exclusively covers traditional equities and ETFs.",
      "Dataset B includes granular commodity market updates (e.g., LNG tanker movements, OPEC decisions) beyond energy sector ETFs, which are absent in Dataset A.",
      "Headlines in Dataset B frequently mention employment data (e.g., job creation, layoffs) and labor market impacts, while Dataset A focuses on corporate financial metrics without workforce context."
    ],
    "llama3.3-70b_zero-shot_v1": [
      "Dataset B headlines frequently include specific stock tickers (e.g., $VIRT, $PSMT) directly in the title, whereas Dataset A does not.",
      "Dataset B incorporates granular financial metrics such as EPS outlook revisions (e.g., 'TJX raises 2020 EPS outlook') and dividend declarations (e.g., 'WPT Industrial REIT declares $0.0633 dividend'), while Dataset A focuses on broader metrics like earnings beats/misses.",
      "Dataset B includes non-traditional elements like social media tags (e.g., #economy, @tictoc), hyperlinks, and informal language, which are absent in Dataset A.",
      "Dataset B references niche sectors (e.g., blockchain, LNG, cannabis) and technical trading patterns (e.g., 'cup base,' 'blue skyscraper'), whereas Dataset A emphasizes mainstream sectors like tech and energy.",
      "Dataset B features headlines structured as real-time updates or recaps (e.g., 'RECAP 11/22 TRUMPSAYS...'), while Dataset A uses conventional headline formats.",
      "Dataset B highlights specific legal/regulatory actions (e.g., 'Prevent sues Volkswagen,' 'New York hits Juul with a lawsuit') rather than general policy impacts seen in Dataset A.",
      "Dataset B includes explicit technical analysis terms (e.g., 'institutional accumulation,' 'price forecast breakdown'), whereas Dataset A avoids such jargon.",
      "Dataset B uses bullet points, percentage changes (e.g., 'Dow up 7.59%'), and fragmented formatting, while Dataset A maintains prose-like sentences.",
      "Dataset B mixes financial updates with non-financial corporate developments (e.g., 'Disney is allowed to build nuclear power plants'), which Dataset A avoids.",
      "Dataset B emphasizes specific partnerships, contracts, or collaborations (e.g., 'Partnering with Costco...', '$KTOS awarded $39M contract'), whereas Dataset A focuses on macroeconomic or sector-wide trends."
    ],
    "llama3.1-8b_few-shot_bg_v1": [
      "Dataset B headlines frequently include social media elements (e.g., hashtags, @mentions, or informal language like 'oh fuck yeah Bloomberg') absent in Dataset A.",
      "Dataset B contains more headlines with incomplete sentences/fragments (e.g., 'Can $MSFT head higher?') compared to Dataset A's full-sentence structures.",
      "Dataset B includes more niche sector-specific updates (e.g., cannabis licensing, specific REIT dividends) versus Dataset A's focus on major sectors like tech/energy.",
      "Dataset B shows higher frequency of raw percentage/price movements in opening lines (e.g., 'Dow up 7.59%') without contextual analysis prevalent in Dataset A.",
      "Dataset B contains more headlines about small/micro-cap companies (e.g., $SRNE, $AQST) compared to Dataset A's focus on mega-cap stocks.",
      "Dataset B includes more technical trading terms (e.g., 'cup base', 'EMA level', 'sympathy play') not commonly found in Dataset A's headlines.",
      "Dataset B features more headlines structured as direct data points (e.g., 'Building permits rise 5%') without analyst attribution common in Dataset A.",
      "Dataset B shows greater use of non-English characters/formatting issues (e.g., \u2026 ellipses, URL fragments) compared to Dataset A's clean formatting.",
      "Dataset B includes more headlines about corporate actions with immediate numerical impacts (e.g., dividend declarations, share offerings) versus Dataset A's emphasis on forecasts.",
      "Dataset B contains more regional/local economic updates (e.g., Madhya Pradesh scam, NYC pensions) compared to Dataset A's global/macroeconomic focus."
    ],
    "llama3.1-8b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines occasionally omit stock ticker symbols even when discussing specific companies, whereas Dataset A consistently includes tickers prefixed with a dollar sign.",
      "Dataset B includes non-traditional content formats (e.g., social media snippets, tweets, or conversational language), while Dataset A maintains a formal, structured news style.",
      "Dataset B features broader macroeconomic or geopolitical narratives (e.g., oil demand shocks, trade deal impacts) without always linking them to specific tickers, whereas Dataset A ties such themes directly to company performance or stock movements.",
      "Dataset B headlines more frequently reference technical chart patterns (e.g., 'cup base,' 'blue skyscraper') or speculative price action, while Dataset A focuses on analyst actions (e.g., downgrades) and fundamental metrics.",
      "Dataset B includes international regulatory or political developments (e.g., EU policies, U.K. elections) unrelated to specific U.S. companies, whereas Dataset A emphasizes corporate-level regulatory challenges.",
      "Dataset B contains explicit mentions of non-equity financial instruments (e.g., forex pairs like EUR/USD, cryptocurrencies like Ethereum), which are absent in Dataset A.",
      "Dataset B headlines often lack granular numerical specificity (e.g., 'EPS misses by $0.04' in A vs. 'Q4 beat' in B) and instead use qualitative descriptors like 'misses' or 'beats.'",
      "Dataset A consistently attributes analyst actions to specific firms (e.g., 'Morgan Stanley says'), while Dataset B sometimes omits institutional sources or uses vague attributions (e.g., 'analysts warn').",
      "Dataset B includes headlines about non-corporate entities (e.g., governments, central banks, or trade organizations), whereas Dataset A focuses exclusively on companies or funds.",
      "Dataset B references real-time market index movements (e.g., 'Dow up 7.59%') as standalone updates, while Dataset A embeds index impacts within company-specific contexts (e.g., '$SPY falls 1.4%')."
    ]
  }
}