{
  "sims": {
    "qwen2.5-7b_zero-shot_bg_train-time-info_v1": [
      "Both datasets include headlines featuring stock tickers (e.g., $MAR in A, $MRLN in B) and company names to identify market entities.",
      "Earnings reports, financial metrics (e.g., EPS, revenue), and analyst consensus comparisons are central to headlines in both datasets.",
      "Analyst ratings (upgrades/downgrades) and price target adjustments are frequently cited as market drivers in both datasets.",
      "Sector-specific trends (e.g., energy, tech, biotech) and macroeconomic factors (e.g., inflation, trade tensions) are highlighted in both.",
      "Regulatory actions, legal issues, and government probes directly impacting companies are common themes across both datasets.",
      "Company-specific developments, such as product recalls, clinical trials, or technological advancements, drive headlines in both.",
      "References to ETFs (e.g., $SPY in A, $XLK in B) and indices contextualize broader market movements in both datasets.",
      "Forward-looking statements, including growth projections, guidance revisions, and economic forecasts, are prominent in both.",
      "Supply chain disruptions, operational challenges, and cost-related risks are cited as critical factors in headlines from both datasets.",
      "Analyst firms (e.g., Morgan Stanley, JPMorgan) and their reports serve as primary sources of sentiment in headlines across both datasets."
    ],
    "qwen2.5-32b_zero-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing specific stock tickers (e.g., $MAR in A, $COLL in B) alongside company names or news.",
      "Headlines frequently mention earnings reports, beats/misses, or financial metrics (e.g., EPS, revenue) relative to analyst expectations.",
      "Analyst actions (e.g., upgrades, downgrades, price target adjustments) are consistently highlighted in both datasets.",
      "Corporate events (e.g., mergers, acquisitions, partnerships, expansions) are a recurring theme across samples.",
      "Regulatory scrutiny, legal challenges, or compliance issues (e.g., recalls, licensing) are addressed in headlines from both datasets.",
      "Market movements (e.g., stock price fluctuations, sector performance) are explicitly tied to news catalysts in all samples.",
      "Sector-specific news (e.g., energy, tech, healthcare, retail) is granularly covered with industry jargon and metrics.",
      "Headlines incorporate macroeconomic indicators (e.g., oil prices, inflation, GDP) and their impact on companies or markets.",
      "Forward-looking statements (e.g., guidance, forecasts, clinical trial outcomes) are present in all samples.",
      "Formatting conventions (e.g., ticker symbols, percentages, mixed tenses for past results/future projections) are structurally consistent."
    ],
    "qwen2.5-7b_zero-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing stock tickers (e.g., $MAR, $CVX, $SPY) paired with company names or financial instruments.",
      "Headlines in both datasets frequently mention earnings reports, revenue results, or financial metrics (e.g., EPS beats/misses, revenue guidance).",
      "Analyst actions (e.g., upgrades, downgrades, price target revisions) are a recurring theme in both datasets.",
      "Both datasets reference specific fiscal quarters (e.g., Q3, Q4) and calendar periods (e.g., March 2020, December 2019).",
      "Company names are often explicitly stated alongside tickers (e.g., 'Chevron' with $CVX, 'Morgan Stanley' with $MS).",
      "Headlines in both datasets focus on stock price movements, volatility, or technical indicators (e.g., 'shares up 5%', '52-week high').",
      "Sector-specific news (e.g., energy, tech, pharmaceuticals) is prevalent in both datasets, with granular mentions of industries like oil, cloud computing, or biotech.",
      "Regulatory actions (e.g., FDA decisions, license revocations) and macroeconomic factors (e.g., inflation, trade tensions) are common themes.",
      "Both datasets include references to ETFs (e.g., $GLD, $USO) and market indices (e.g., S&P 500, Nasdaq-100).",
      "Mergers, acquisitions, and partnerships are frequently highlighted as catalysts for stock movements in both datasets."
    ],
    "llama3.3-70b_zero-shot_bg_v1": [
      "All headlines are related to financial markets, corporate performance, or economic conditions.",
      "All headlines mention specific companies, financial instruments (e.g., stocks, ETFs), or institutions (e.g., regulators, banks).",
      "All headlines include numerical data, metrics (e.g., stock prices, revenue figures, percentages), or quantitative references (e.g., earnings targets, dividend amounts).",
      "All headlines reference events with direct implications for investors, traders, or market sentiment (e.g., earnings reports, mergers, regulatory actions).",
      "All headlines use formal financial terminology (e.g., EPS, price targets, downgrades, revenue misses, dividend declarations).",
      "All headlines include time-sensitive information (e.g., quarterly results, upcoming earnings dates, real-time market movements).",
      "All headlines focus on cause-and-effect relationships (e.g., regulatory decisions impacting stock prices, earnings results affecting analyst ratings).",
      "All headlines adhere to concise, declarative structures typical of financial news reporting.",
      "All headlines incorporate abbreviations or symbols common in finance (e.g., $, %, tickers like $TSLA, Q2, EPS).",
      "All headlines target an audience of investors, analysts, or market participants seeking actionable insights."
    ],
    "qwen2.5-32b_zero-shot_v1": [
      "Both datasets include headlines about stock price reactions to company-specific news (e.g., earnings, product launches, or regulatory actions).",
      "Both reference quarterly/annual earnings reports and analyst expectations (e.g., EPS beats/misses, revenue results).",
      "Both mention regulatory decisions impacting companies or industries (e.g., FDA approvals, license revocations, antitrust investigations).",
      "Both highlight macroeconomic indicators (e.g., GDP growth, unemployment rates, inflation, retail sales) influencing market sentiment.",
      "Both feature central bank policies or interest rate decisions (e.g., Federal Reserve announcements, monetary policy changes).",
      "Both include sector-specific volatility (e.g., tech, energy, healthcare) driven by earnings, innovation, or external factors.",
      "Both discuss analyst actions (e.g., stock upgrades/downgrades, price target revisions) and their market impact.",
      "Both emphasize forward-looking guidance (e.g., profit outlooks, economic forecasts, recession risks) from companies or institutions.",
      "Both reference market-moving geopolitical or trade events (e.g., U.S.-China tensions, OPEC decisions, Brexit implications).",
      "Both report mergers, acquisitions, or partnerships (e.g., project awards, stake purchases) affecting company valuations."
    ],
    "qwen2.5-32b_few-shot_bg_train-time-info_v1": [
      "Both datasets consistently include stock ticker symbols (e.g., $MAR in A, $DISH in B) within headlines.",
      "Headlines in both datasets frequently reference analyst actions (upgrades/downgrades) and price target adjustments (e.g., SunTrust RH in A, Morgan Stanley in B).",
      "Earnings reports (beats/misses) and revenue performance are central themes across all samples in both datasets.",
      "Regulatory challenges (e.g., London Uber license in A, Morgan Stanley downgrades in B) are explicitly mentioned.",
      "Numerical specificity is maintained (e.g., '$1,481.20/oz' in A, 'Q3 revenue $4.625B' in B) for financial metrics.",
      "Sector-specific impacts (e.g., oil prices in A, semiconductor slowdown in B) are granularly addressed across industries.",
      "Company-specific strategic moves (e.g., mergers in A, contract wins in B) are consistently highlighted.",
      "Market reactions (e.g., 'shares dropped' in A, 'stock soars' in B) are explicitly tied to news events in all samples.",
      "Forward-looking statements (e.g., '2020 profit outlook' in A, '2023 growth plans' in B) are consistently included.",
      "Geopolitical/economic macro factors (e.g., Brexit in A, global chip shortage in B) directly tie to company performance."
    ],
    "llama3.3-70b_few-shot_bg_train-time-info_v1": [
      "Both datasets include headlines containing stock tickers prefixed with '$' (e.g., $MAR, $SAEX).",
      "Analyst actions (e.g., upgrades, downgrades, price target changes) are frequently mentioned in both datasets.",
      "Headlines reference quarterly earnings reports, revenue results, and EPS metrics (e.g., 'Q2 earnings,' 'beats/misses').",
      "Market movements (e.g., stock price changes, commodity price fluctuations) are central to headlines in both datasets.",
      "Regulatory actions (e.g., FDA approvals, recalls, license revocations) impact company performance in both sets.",
      "Mergers, acquisitions, and partnerships are cited as drivers of market activity in both datasets.",
      "Company-specific operational updates (e.g., production cuts, new product launches) are highlighted in both.",
      "Sector-specific trends (e.g., energy, biotech, retail) contextualize financial performance in headlines.",
      "Financial terminology (e.g., 'dividend,' 'revenue,' 'price target') is consistently used across both datasets.",
      "Headlines focus on macroeconomic factors (e.g., recession fears, interest rates) influencing market sentiment."
    ],
    "llama3.3-70b_zero-shot_bg_train-time-info_v1": [
      "Both datasets include headlines referencing stock tickers with a '$' symbol (e.g., $TSLA, $MAR).",
      "Headlines in both datasets frequently mention analyst actions (e.g., upgrades, downgrades, price target changes) from firms like Morgan Stanley, Barclays, and Oppenheimer.",
      "Both datasets highlight quarterly earnings reports, revenue results, and EPS metrics (e.g., 'misses on revenue,' 'meets expectations').",
      "Company performance updates (e.g., production delays, expansion plans, guidance withdrawals) are central to headlines in both datasets.",
      "Both include mentions of sector-specific developments (e.g., energy, tech, healthcare, retail) and regulatory/legal issues (e.g., FDA actions, recalls).",
      "Market reactions (e.g., stock price movements, investor sentiment) are explicitly noted in headlines (e.g., 'shares surge,' 'stock trades flat').",
      "Economic indicators (e.g., inflation, consumer confidence) and macroeconomic trends (e.g., oil demand, labor markets) are addressed in both datasets.",
      "Headlines in both datasets use similar formatting conventions like colons, hyphens, and abbreviations (e.g., 'Q2,' 'EPS').",
      "References to investor events (e.g., earnings calls, conferences) and corporate announcements (e.g., mergers, partnerships) appear in both.",
      "Both datasets emphasize forward-looking statements (e.g., 'future growth strategies,' 'await guidance') and industry-specific risks (e.g., competition, supply chain)."
    ],
    "qwen2.5-32b_few-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing specific company stock tickers using the '$' symbol (e.g., $MAR, $NVDA).",
      "Headlines in both datasets frequently mention analyst actions (e.g., upgrades, downgrades, price target adjustments).",
      "Earnings reports (e.g., revenue beats/misses, EPS results) are a central theme across all samples in both datasets.",
      "Both contain granular financial metrics such as percentages, dollar amounts, and comparative figures (e.g., 'Q3 adj. EPS 41 cents').",
      "Company-specific operational updates (e.g., partnerships, clinical trials, supply chain issues) appear consistently in both datasets.",
      "References to sector-specific trends (e.g., energy, tech, healthcare, retail) are present in all samples.",
      "Market reactions (e.g., 'shares rise/drop', 'stock price target raised/cut') are explicitly stated in every headline.",
      "Forward-looking statements (e.g., guidance, forecasts, strategic initiatives) are emphasized across all entries.",
      "Regulatory actions (e.g., FDA approvals, license revocations, recalls) are cited as market-moving events in both datasets.",
      "All headlines use standardized financial terminology (e.g., 'beats expectations', 'misses estimates', 'reaffirms guidance')."
    ],
    "qwen2.5-7b_zero-shot_bg_v1": [
      "Headlines in both datasets frequently include stock ticker symbols (e.g., $TSLA, $AAPL, $AMZN) to identify companies.",
      "Both datasets emphasize analyst actions (e.g., upgrades, downgrades, price target adjustments) from firms like Morgan Stanley, Barclays, and JPMorgan.",
      "Quarterly earnings reports (e.g., Q3, Q4) and performance metrics (EPS, revenue beats/misses) are central themes in headlines from both datasets.",
      "Stock price movements (e.g., \"shares up 5%\", \"plunges 10%\") and percentage changes are explicitly quantified in most samples.",
      "Headlines frequently reference macroeconomic factors (e.g., inflation, GDP growth, Fed policy) influencing market sentiment.",
      "Regulatory actions (e.g., FDA approvals, antitrust investigations) and geopolitical risks (e.g., supply chain disruptions) are common topics.",
      "Company-specific challenges (e.g., production delays, patent disputes, leadership changes) are highlighted in both datasets.",
      "Market indices (e.g., S&P 500, Nasdaq) and sector-specific trends (e.g., semiconductor demand, cloud services) are regularly mentioned.",
      "Forward-looking statements (e.g., earnings guidance, growth forecasts) and revisions by analysts appear consistently.",
      "Headlines blend granular financial data (e.g., revenue figures, dividend declarations) with qualitative narratives about investor sentiment."
    ],
    "qwen2.5-32b_zero-shot_bg_train-time-info_v1": [
      "Both datasets include headlines referencing stock ticker symbols (e.g., $MAR, $GE, $SPY, $IWM).",
      "Headlines frequently mention earnings reports, revenue results, or financial metrics (e.g., EPS beats/misses, revenue guidance).",
      "Analyst ratings, price target adjustments, and institutional commentary (e.g., Morgan Stanley, Barclays) are common themes.",
      "References to market indices, ETFs (e.g., SPY, IWM), or sector-specific funds (e.g., USO, XLE) appear in both datasets.",
      "Sector-specific news (e.g., energy, tech, retail, biotech) is granularly covered across headlines.",
      "Corporate actions like mergers, acquisitions, partnerships, or product launches are highlighted in both datasets.",
      "Regulatory, legal, or operational challenges (e.g., recalls, lawsuits, license revocations) are recurring topics.",
      "Forward-looking statements (e.g., earnings guidance, economic forecasts, strategic plans) are prevalent.",
      "Explicit mentions of stock price movements (e.g., \"shares rise/drop,\" \"surges,\" \"slides\") post-events are consistent.",
      "Global macroeconomic factors (e.g., trade tensions, currency fluctuations, geopolitical risks) influence headlines in both datasets."
    ],
    "llama3.1-8b_zero-shot_bg_v1": [
      "Both datasets include headlines referencing stock tickers with symbols (e.g., $AAPL, $TSLA, $NVDA).",
      "Headlines frequently mention earnings reports, revenue beats/misses, and company guidance updates.",
      "Analyst actions (e.g., price target changes, upgrades, downgrades) are central to headlines in both datasets.",
      "Market reactions (e.g., 'shares plunge,' 'stock rises') are explicitly tied to news events in all samples.",
      "References to macroeconomic indicators (e.g., GDP, inflation, unemployment, retail sales) are present in both.",
      "Sector-specific news (e.g., tech, energy, retail, biotech) is granularly covered across all samples.",
      "Regulatory or legal developments (e.g., FDA decisions, licensing issues, antitrust concerns) drive sentiment in headlines.",
      "Company-specific operational updates (e.g., product launches, mergers, layoffs) are consistently highlighted.",
      "Global economic trends (e.g., recession risks, trade tensions, central bank policies) contextualize market moves.",
      "Headlines use similar formatting conventions (e.g., colons, hyphens) to separate entities from news content."
    ],
    "llama3.1-8b_zero-shot_v1": [
      "Both datasets include headlines referencing stock price movements (e.g., gains, drops, surges, or tumbles) tied to corporate or macroeconomic events.",
      "Earnings reports (beats/misses) and financial metrics (EPS, revenue) are central themes in samples from both datasets.",
      "Headlines in both datasets frequently cite economic indicators like GDP, unemployment rates, retail sales, or consumer confidence.",
      "Central bank actions (e.g., Federal Reserve rate decisions, stimulus policies) directly influence market narratives in both datasets.",
      "Regulatory developments (e.g., FDA rulings, licensing decisions, antitrust scrutiny) drive market reactions in headlines from both sets.",
      "Market sentiment terms like \"surge,\" \"plunge,\" \"slump,\" \"rally,\" and \"caution\" are consistently used to frame volatility.",
      "Company-specific operational updates (layoffs, mergers, product launches, legal issues) are prominent in both datasets.",
      "Global macroeconomic risks (trade wars, recessions, geopolitical tensions) underpin market uncertainty in both sets.",
      "Analyst actions (upgrades/downgrades, price target revisions) are cited as catalysts for stock movements in both datasets.",
      "Sector-specific trends (tech, energy, retail, healthcare) are highlighted in headlines from both sets to contextualize performance."
    ],
    "llama3.3-70b_few-shot_v1": [
      "Both datasets include headlines referencing stock price movements (e.g., surges, drops) tied to company-specific news or earnings reports.",
      "Headlines in both datasets frequently mention quarterly earnings results, including beats/misses against analyst expectations.",
      "Both datasets highlight macroeconomic indicators (e.g., unemployment rates, GDP growth, consumer confidence) influencing market sentiment.",
      "References to Federal Reserve actions (e.g., interest rate decisions, monetary policy reports) appear in both datasets as market-moving events.",
      "Company-specific operational updates (e.g., product recalls, mergers, executive decisions) are common in headlines from both datasets.",
      "Ticker symbols or explicit company names are used to identify firms in nearly all headlines across both datasets.",
      "Analyst upgrades/downgrades and price target revisions are cited as catalysts for stock movements in both datasets.",
      "Numerical specificity (e.g., percentage changes, dollar amounts, metric values) is consistently used to quantify market impacts.",
      "Regulatory actions (e.g., FDA decisions, licensing changes, sanctions) are framed as key business risks or catalysts in both datasets.",
      "Global economic factors (e.g., trade tensions, geopolitical events, currency fluctuations) are contextualized as drivers of market volatility in both datasets."
    ],
    "qwen2.5-32b_few-shot_bg_v1": [
      "Both datasets include headlines referencing specific company stock tickers (e.g., $AAPL, $TSLA, $MAR, $GOOG).",
      "Headlines in both datasets frequently mention analyst actions (e.g., price target adjustments, upgrades/downgrades).",
      "Earnings reports (beats/misses) and revenue results are central themes in samples from both datasets.",
      "Both datasets include mentions of macroeconomic indicators (e.g., interest rates, inflation, consumer confidence).",
      "References to institutional financial firms (e.g., Morgan Stanley, Barclays, Goldman Sachs) appear in both datasets.",
      "Headlines in both datasets mix company-specific news with broader sector or market trends (e.g., tech volatility, supply chains).",
      "Stock price reactions (e.g., \"shares rise\", \"plummet\") are explicitly stated in headlines across both datasets.",
      "Both datasets highlight forward-looking statements (e.g., earnings guidance, economic forecasts, growth projections).",
      "Regulatory or geopolitical impacts (e.g., Fed policies, sanctions, trade disruptions) are recurring topics in both.",
      "Headlines in both datasets use a similar structure: company/event + quantitative metric + analyst/agency attribution."
    ],
    "qwen2.5-32b_few-shot_v1": [
      "Both datasets include headlines mentioning specific companies and stock tickers (e.g., $MAR in A, Tech Giant in B).",
      "Headlines in both datasets report earnings results, revenue misses/beats, and profit/loss metrics relative to analyst expectations.",
      "Both datasets highlight stock price movements (e.g., gains, drops, surges) linked to corporate news or financial performance.",
      "Analyst actions (upgrades, downgrades, price target adjustments) are frequently cited in headlines from both datasets.",
      "Economic indicators (e.g., inflation, GDP, unemployment rates, oil prices) are central themes in headlines across both datasets.",
      "References to central bank policies (e.g., Federal Reserve interest rate decisions) appear consistently in both datasets.",
      "Sector-specific news (e.g., tech, energy, healthcare, retail) is a recurring focus in headlines from both datasets.",
      "Regulatory or government actions (e.g., FDA approvals, OPEC decisions) impact market narratives in both datasets.",
      "Headlines in both datasets use market sentiment terms (e.g., 'optimism,' 'uncertainty,' 'recession fears') to frame outcomes.",
      "Quantitative data (e.g., percentages, monetary values, timeframes) is embedded in headlines across both datasets for precision."
    ],
    "qwen2.5-32b_zero-shot_bg_v1": [
      "Both datasets include stock ticker symbols (e.g., $MAR, $AAPL) to denote companies.",
      "Headlines reference analyst actions (e.g., upgrades, downgrades, price target changes) from firms like Morgan Stanley and Barclays.",
      "Earnings reports (e.g., EPS beats/misses, revenue results) are a central focus in both datasets.",
      "Market indices (e.g., S&P 500, Nasdaq) and macroeconomic indicators (e.g., unemployment rates, Fed policy) are mentioned.",
      "Sector-specific updates (e.g., tech, energy, retail) are covered across all samples.",
      "Company-specific developments (e.g., mergers, product launches, partnerships) are highlighted in headlines.",
      "Regulatory actions, legal challenges, and recalls are recurring themes (e.g., FDA rulings, license revocations).",
      "Headlines use inconsistent capitalization (e.g., mixed case, lowercase) and financial abbreviations (e.g., EPS, Q3, GDP).",
      "Explicit financial metrics (e.g., stock price percentages, revenue figures) are included in most samples.",
      "Forward-looking statements (e.g., forecasts, guidance revisions) and market sentiment (bullish/bearish) are emphasized."
    ],
    "qwen2.5-7b_zero-shot_v1": [
      "Both datasets include headlines referencing stock tickers and company names (e.g., $MAR, Apple, Tesla).",
      "Headlines frequently mention earnings reports, revenue results, or analyst estimates (e.g., 'EPS beats,' 'revenue misses').",
      "Market indices (e.g., S&P 500, Dow Jones) and sector-specific performance (tech, energy) are recurring themes.",
      "Monetary policy updates (e.g., Federal Reserve interest rates) and macroeconomic indicators (e.g., unemployment, GDP) are covered in both.",
      "Regulatory actions (e.g., FDA decisions, antitrust scrutiny) impact headlines in both datasets.",
      "Stock price movements (e.g., 'surges,' 'plunges,' 'hits record highs') are quantified with percentages or numerical targets.",
      "Forward-looking statements (e.g., 'guidance cut,' 'economic forecasts') appear consistently across samples.",
      "Headlines incorporate analyst actions (e.g., upgrades, downgrades, price target revisions) and institutional sentiment.",
      "Sector-specific challenges (e.g., semiconductor shortages, oil price volatility) drive narrative similarities.",
      "Both datasets blend formal financial terminology with colloquial phrases (e.g., 'cash is king,' 'hard landing')."
    ],
    "llama3.1-8b_zero-shot_bg_train-time-info_v1": [
      "Both datasets consistently include stock ticker symbols prefixed with '$' (e.g., $MAR in A, $NYMT in B).",
      "Headlines frequently reference quarterly earnings results, forecasts, or analyst estimates (e.g., 'EPS misses' in A, 'Q4 earnings beat' in B).",
      "Analyst actions (upgrades, downgrades, price target changes) are a recurring theme (e.g., 'downgraded to sell' in A, 'Morgan Stanley upgrades' in B).",
      "Company-specific operational developments dominate headlines (e.g., plant closures in A, partnership announcements in B).",
      "Sector-specific focus (energy, tech, healthcare) is granularly addressed (e.g., 'oil reserves' in A, 'renewable energy pipeline' in B).",
      "Explicit mentions of financial metrics like revenue, dividends, and margins appear across all samples (e.g., 'dividend boost' in A, 'share buyback' in B).",
      "Regulatory/legal impacts on businesses are consistently highlighted (e.g., 'FDA ban' in A, 'regulatory probe' in B).",
      "Stock price movements (gains, losses, volatility) are explicitly quantified (e.g., 'up 24.57%' in A, 'slumps 15%' in B).",
      "Macroeconomic factors (interest rates, inflation, GDP) contextualize market performance (e.g., 'Fed moves' in A, 'economic uncertainty' in B).",
      "Headlines use standardized abbreviations for temporal references (e.g., 'Q3 2020' in A, 'FY2024' in B)."
    ],
    "qwen2.5-7b_few-shot_v1": [
      "Both datasets include headlines referencing company-specific earnings reports and financial performance metrics (e.g., revenue, EPS).",
      "Headlines in both datasets frequently mention stock price movements, including gains, declines, and price targets.",
      "Regulatory actions (e.g., recalls, licensing, antitrust scrutiny) are a common theme in both datasets.",
      "Both datasets highlight macroeconomic indicators such as retail sales, GDP, employment data, and consumer confidence.",
      "Sector-specific news (e.g., tech, energy, healthcare) is prominently featured in headlines across both datasets.",
      "Market sentiment analysis (e.g., investor optimism, recession fears, risk-taking) is a recurring topic in both.",
      "Central bank policies (e.g., Federal Reserve rate decisions, monetary easing) are frequently cited in headlines.",
      "Both datasets include updates on mergers, acquisitions, and corporate strategic decisions (e.g., expansions, layoffs).",
      "Commodity price fluctuations (e.g., oil, gold) and their market implications are covered in both datasets.",
      "Headlines in both datasets use financial terminology and abbreviations (e.g., EPS, ETFs, stock tickers) consistently."
    ],
    "llama3.3-70b_few-shot_bg_v1": [
      "Both datasets include stock tickers prefixed with a '$' symbol.",
      "Headlines frequently reference analyst actions such as upgrades, downgrades, or revised price targets.",
      "Earnings reports (e.g., EPS, revenue) and release dates are prominently featured in both datasets.",
      "Quantitative financial data (e.g., percentages, price targets, dollar values) is consistently included.",
      "Company-specific performance metrics (e.g., revenue misses/beats, profit outlooks) are highlighted.",
      "Macroeconomic indicators (e.g., Federal Reserve reports, consumer confidence data) are mentioned.",
      "Regulatory, legal, or operational risks (e.g., recalls, sanctions, license issues) are addressed.",
      "Sector-specific trends (e.g., tech, energy, automotive) are discussed across samples.",
      "Stock price reactions to news (e.g., 'shares drop,' 'trades sideways') are explicitly stated.",
      "Financial institutions (e.g., Morgan Stanley, Barclays) are cited as sources of analysis or ratings."
    ],
    "llama3.1-8b_few-shot_v1": [
      "Both datasets focus on financial market movements and stock price reactions to news events.",
      "Headlines in both datasets frequently mention earnings reports, revenue figures, and analyst estimates.",
      "Both include updates on macroeconomic indicators such as GDP growth, unemployment rates, and consumer confidence.",
      "Regulatory actions, legal decisions, and government policy impacts on companies are common themes.",
      "Commodity price fluctuations (oil, gold) and energy sector developments are prominently covered.",
      "Central bank policies, interest rate decisions, and inflation concerns appear in both datasets.",
      "Both highlight company-specific developments like mergers, acquisitions, partnerships, and product launches.",
      "Market volatility driven by geopolitical events (e.g., Brexit, U.S.-China trade tensions) is a recurring theme.",
      "Both datasets emphasize investor sentiment through mentions of hedge fund activity, dividend changes, and P/E ratios.",
      "Currency exchange rate movements and their economic implications are discussed in headlines from both datasets."
    ],
    "llama3.1-8b_few-shot_bg_train-time-info_v1": [
      "Both datasets include headlines referencing stock tickers using the '$' symbol notation (e.g., $MAR in A, $TSLA in B).",
      "All samples mention corporate entities, financial instruments, or market indices (e.g., Exxon in A, SPDR S&P 500 ETF in B).",
      "Price movements and financial metrics are explicitly stated (e.g., 'settle at $1,481.20/oz' in A, 'slides 5%' in B).",
      "Analyst actions/ratings appear consistently (e.g., 'downgraded to in line from outperform' in A, 'Morgan Stanley downgrades' in B).",
      "Earnings reports and guidance updates are central to headlines (e.g., 'EPS misses by $0.33' in A, 'Q2 earnings preview' in B).",
      "Sector-specific developments are emphasized (e.g., 'U.S. Shale Producers' in A, 'Tech Sector Index ETF' in B).",
      "Regulatory/legal impacts are noted (e.g., 'FDA Approves First Drug' in A, 'FDA delays decision' in B).",
      "Forward-looking statements about performance appear in all samples (e.g., '2020 profit outlook' in A, '2023 Outlook Cut' in B).",
      "Market sentiment indicators are included (e.g., 'Consumer Confidence Surges' in A, 'consumer confidence amid pandemic' in B).",
      "Corporate actions like M&A and partnerships are highlighted (e.g., 'acquires diagnostic tests' in A, 'acquires dating app Hinge' in B)."
    ],
    "qwen2.5-7b_few-shot_bg_v1": [
      "Headlines focus on financial markets, corporate performance, or macroeconomic indicators.",
      "Mentions specific companies, stock tickers (e.g., $TSLA, $AAPL), or financial instruments.",
      "Includes analyst actions (e.g., upgrades, downgrades, price target revisions).",
      "References earnings reports, revenue results, or financial metrics (e.g., EPS, misses/beats).",
      "Discusses regulatory, legal, or geopolitical developments impacting businesses.",
      "Highlights market trends (e.g., oil prices, interest rates, consumer spending).",
      "Reports stock price movements, percentage gains/losses, or trading volume.",
      "Covers mergers, acquisitions, layoffs, product launches, or strategic corporate decisions.",
      "Uses standardized financial phrases (e.g., 'beats estimates,' 'raises guidance').",
      "Contains forward-looking statements (e.g., forecasts, economic outlooks, earnings guidance)."
    ],
    "llama3.1-8b_few-shot_bg_v1": [
      "Both datasets include headlines mentioning specific stock tickers (e.g., $AAPL, $GOOGL, $TSLA) to identify companies.",
      "Analyst actions (upgrades, downgrades, price target adjustments) are explicitly cited in headlines from both datasets.",
      "Headlines reference quarterly earnings reports, revenue results, and EPS figures (e.g., 'misses on revenue,' 'beats estimates').",
      "Economic indicators (e.g., GDP, CPI, unemployment claims, retail sales) are frequently cited as drivers of market sentiment.",
      "Regulatory or legal developments (e.g., FDA decisions, antitrust investigations, license revocations) impact company performance in both datasets.",
      "Market reactions to news (e.g., 'shares plunge,' 'stock surges') are explicitly tied to events in headlines.",
      "Sector-specific trends (tech, automotive, energy, retail) are highlighted across industries in both datasets.",
      "Global macroeconomic factors (e.g., trade tensions, Brexit, EU policies) are cited as catalysts for market movements.",
      "Quantitative financial metrics (e.g., percentage gains/losses, revenue figures, profit margins) are consistently included.",
      "Forward-looking statements (e.g., earnings guidance, growth forecasts, economic projections) are central to headlines in both datasets."
    ],
    "llama3.3-70b_zero-shot_v1": [
      "Both datasets include headlines focused on stock price movements, such as surges, plunges, or drops in response to company-specific or macroeconomic news.",
      "Earnings reports and financial results (beating/missing expectations) are central themes driving market reactions in both datasets.",
      "References to Federal Reserve decisions/meetings regarding interest rates and monetary policy appear frequently in both sets.",
      "Economic indicators like recession fears, inflation concerns, and GDP growth are cited as market drivers in both datasets.",
      "Sector-specific performance (e.g., tech stocks, energy, retail) is highlighted across headlines in both collections.",
      "Analyst upgrades/downgrades, price target changes, and institutional sentiment are explicitly mentioned in both datasets.",
      "Company-specific catalysts (e.g., product launches, regulatory approvals, leadership changes) drive individual stock reactions in both sets.",
      "Market-wide volatility triggers like trade tensions, geopolitical events, and pandemic impacts are addressed in both datasets.",
      "Quantitative metrics such as EPS figures, revenue comparisons, and percentage price changes are consistently included in headlines.",
      "Forward-looking statements about guidance adjustments, economic projections, and policy impacts appear in both datasets."
    ],
    "llama3.1-8b_few-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing stock ticker symbols (e.g., $MAR, $CMD) to identify companies.",
      "Headlines frequently mention analyst actions such as price target revisions, upgrades, or downgrades (e.g., Moody's in A, Barclays in B).",
      "Earnings reports, including EPS beats/misses and revenue results, are a common focus (e.g., BJ's Wholesale in A, Newmark Group in B).",
      "Regulatory or legal developments affecting companies are highlighted (e.g., FDA bans in A, FDA approvals in B).",
      "Sector-specific trends (e.g., energy, tech, healthcare) are discussed in both datasets (e.g., shale producers in A, natural gas in B).",
      "Mergers, acquisitions, and partnerships are frequently mentioned (e.g., Shell in A, Enbridge in B).",
      "Dividend announcements and changes are a recurring theme (e.g., Suncor in A, Kimco Realty in B).",
      "Macroeconomic indicators (e.g., retail sales, consumer confidence) are contextualized within market impacts (e.g., November sales in A, trade tensions in B).",
      "Geopolitical events (e.g., Brexit in A, Ukraine-Russia tensions in B) are linked to market movements.",
      "Product launches, patent disputes, or innovations are covered (e.g., Align patent case in A, new treatments in B)."
    ],
    "qwen2.5-7b_few-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing stock tickers (e.g., $MAR, $COLL, $SPY) and company-specific financial updates.",
      "Headlines frequently mention analyst actions such as upgrades, downgrades, or price target adjustments (e.g., 'Moody's turns negative,' 'Morgan Stanley upgrades').",
      "Earnings reports (e.g., 'EPS beats/misses,' 'revenue guidance') are a common focus in both datasets.",
      "Revenue growth, cost pressures, and margin challenges are recurring themes (e.g., 'misses on revenue,' 'supply chain disruptions').",
      "Regulatory or legal developments impacting businesses (e.g., 'FDA bans,' 'license suspensions') appear in both datasets.",
      "Market indices and ETFs (e.g., S&P 500, $GLD) are cited as benchmarks for performance or sentiment.",
      "Company-specific operational updates (e.g., partnerships, layoffs, product launches) are granularly covered in both.",
      "Merger/acquisition activity or strategic business shifts (e.g., 'explores sale,' 'stake sales') are highlighted in headlines.",
      "Sector-specific trends (e.g., energy markets, tech disruptions, retail performance) are addressed in granular detail.",
      "Forward-looking statements (e.g., 'guidance cuts,' 'growth forecasts') are consistently included to contextualize market impact."
    ],
    "llama3.3-70b_few-shot_bg_test-time-info_v1": [
      "Both datasets consistently include stock tickers (e.g., $GOOG, $TSLA) in headlines to reference specific companies.",
      "Headlines in both datasets frequently mention analyst firms like Morgan Stanley, Barclays, and Oppenheimer for ratings or price targets.",
      "Quarterly earnings results (e.g., beats/misses, revenue figures) are a focal point in both datasets.",
      "Price target adjustments (e.g., raised, cut) by analysts are explicitly highlighted in headlines across both datasets.",
      "Specific financial metrics such as EPS and revenue deviations from consensus estimates are quantified in both datasets.",
      "Analyst rating changes (e.g., upgrades to 'buy' or downgrades to 'underweight') are prominently featured in headlines.",
      "Forward-looking guidance from companies (e.g., earnings outlooks, production goals) is referenced in both datasets.",
      "Sector-specific developments (e.g., energy, healthcare, tech) are covered in granular detail across headlines.",
      "Market indices (e.g., S&P 500, Dow Jones) and ETFs (e.g., $SPY, $XLE) are cited as performance benchmarks.",
      "Regulatory actions, legal disputes, or operational risks (e.g., FDA decisions, license revocations) are reported in both datasets."
    ],
    "llama3.1-8b_zero-shot_bg_test-time-info_v1": [
      "Both datasets consistently include stock tickers prefixed with '$' (e.g., $MAR, $VZ, $T).",
      "Headlines frequently reference earnings reports, revenue results, or financial metrics (e.g., EPS misses, Q3 revenue beats).",
      "Analyst actions (upgrades, downgrades, price target adjustments) are prominently featured (e.g., Morgan Stanley, Oppenheimer, UBS).",
      "Company-specific developments (mergers, partnerships, expansions) are central to headlines (e.g., Shell acquisitions, Uber partnerships).",
      "Market indices and ETFs (e.g., $SPY, $XLE) are cited as performance benchmarks in both datasets.",
      "Regulatory or legal impacts on companies are highlighted (e.g., FDA approvals, license revocations, antitrust scrutiny).",
      "Industry-specific trends (e.g., oil price fluctuations, retail demand shifts) are contextualized within news updates.",
      "Quantitative data (percentages, price targets, revenue figures) are explicitly stated (e.g., '20% gain,' '$2.8 billion revenue miss').",
      "Economic indicators (GDP, inflation, consumer confidence) are tied to market movements in both datasets.",
      "Forward-looking statements (guidance, forecasts, projected growth) are emphasized (e.g., '2023 sales guidance,' 'plans to hire 500 staff')."
    ],
    "qwen2.5-7b_few-shot_bg_train-time-info_v1": [
      "Both datasets include headlines referencing analyst actions such as upgrades, downgrades, and ratings (e.g., Moody's, Oppenheimer).",
      "Price target adjustments (e.g., raised to $335, cut to $15) are explicitly mentioned in samples from both datasets.",
      "Earnings reports (e.g., EPS beats/misses, revenue results) are a central focus across all samples in both datasets.",
      "Headlines frequently incorporate financial terminology like EPS, revenue, dividends, and guidance.",
      "Specific stock ticker symbols prefixed with \"$\" (e.g., $MAR, $DIS) are consistently used to identify companies.",
      "Regulatory, economic, or geopolitical impacts on companies (e.g., FDA bans, sanctions) are highlighted in both datasets.",
      "Quantitative data such as percentages, stock prices, and monetary values are included in nearly all samples.",
      "Forward-looking statements (e.g., profit outlooks, guidance cuts) are common across headlines in both datasets.",
      "Sector-specific references (e.g., energy, biotech, tech) contextualize company performance in both datasets.",
      "Market reactions (e.g., stock price changes, investor sentiment shifts) are explicitly tied to news events in all samples."
    ],
    "llama3.3-70b_zero-shot_bg_test-time-info_v1": [
      "Both datasets include headlines referencing specific companies, stock tickers (using $ notation), or financial instruments.",
      "All headlines pertain to financial markets, corporate performance, economic indicators, or regulatory actions affecting businesses.",
      "Analyst actions (upgrades, downgrades, price target adjustments) are a recurring theme in both datasets.",
      "Earnings reports (past results, release dates, or expectations) are consistently mentioned across samples.",
      "Numerical data (e.g., stock prices, percentages, revenue figures) is present in all headlines, either explicitly or contextually implied.",
      "Industry-specific terminology (e.g., EPS, dividends, revenue, price target) is used in all samples.",
      "Headlines focus on events with direct market impact, such as mergers, regulatory decisions, or earnings surprises.",
      "Both datasets emphasize time-sensitive updates (e.g., quarterly results, conference dates, real-time stock movements).",
      "Mentions of sectors (e.g., energy, tech, healthcare) or macroeconomic trends (e.g., oil prices, interest rates) appear in all samples.",
      "All headlines adopt a concise, factual tone characteristic of financial journalism, prioritizing actionable information for investors."
    ]
  },
  "diffs_synth_from_real": {
    "qwen2.5-7b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines predominantly focus on analyst actions (upgrades/downgrades/price targets) as primary drivers, while A includes non-analyst factors like regulatory decisions, geopolitical events, and consumer trends.",
      "B headlines consistently name specific analyst firms (e.g., Morgan Stanley, Jefferies) in every entry, whereas A sometimes omits attribution to analyst sources.",
      "Dataset A contains headlines with non-English text/global market references (e.g., Myanmar tourism, Brazil politics), while B remains U.S.-centric with minimal international context.",
      "A includes headlines with non-financial content (e.g., Antonio Brown gossip, Halsey shower tweet) absent in B\u2019s strictly finance-focused entries.",
      "B emphasizes explicit stock price reactions to news (e.g., 'sends shares lower'), while A often reports events without immediate market impact statements.",
      "Dataset A references social media handles/calls-to-action (e.g., '@KristinReports', 'Subscribe to Seeking Alpha'), whereas B maintains formal, platform-agnostic language.",
      "A features granular commodity price updates (e.g., 'Feb. gold climbs $8.90'), while B focuses on sector/ETF trends through analyst ratings rather than raw metrics.",
      "B headlines systematically structure information as '[Ticker] - [Analyst Firm] + [Action]', while A uses variable formats (e.g., questions, narrative statements).",
      "Dataset A includes company-supplied forward guidance (e.g., Caterpillar\u2019s profit outlook), while B\u2019s forward-looking statements derive exclusively from analyst projections.",
      "A integrates breaking news about layoffs/political developments (e.g., WeWork jobs, Bolsonaro) unrelated to analyst sentiment, which B excludes entirely."
    ],
    "qwen2.5-32b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently include both company names and ticker symbols in a standardized format (e.g., '$COLL - Collabria Care'), whereas Dataset A often uses tickers or names separately or omits one.",
      "Dataset B emphasizes specific fiscal quarters (e.g., Q2, Q3) in earnings reports, while Dataset A frequently omits quarter references in favor of general timeframes (e.g., 'last quarter').",
      "Dataset B headlines strictly avoid informal language, emojis, or non-financial commentary (e.g., no equivalents to Dataset A's 'So Halsey needs a shower' or casual tweets).",
      "Dataset B consistently names specific analyst firms (e.g., Barclays, Morgan Stanley) in ratings actions, while Dataset A refers to analysts generically (e.g., 'analysts' expectations').",
      "Dataset B uses present tense for earnings announcements and forward-looking statements (e.g., 'reports,' 'announces'), whereas Dataset A mixes tenses (e.g., 'dropped,' 'has improved').",
      "Dataset A includes multilingual headlines (e.g., Chinese text) and regional regulatory actions (e.g., UK recalls), while Dataset B remains monolingual and focuses on global/U.S. entities.",
      "Dataset B headlines prioritize structured corporate updates (e.g., partnerships, guidance) without editorializing, unlike Dataset A's occasional opinion pieces (e.g., 'The FT View...').",
      "Dataset A incorporates market-specific timestamps and intraday price data (e.g., '9:45 am $SPY 272.99'), while Dataset B avoids real-time trading metrics.",
      "Dataset B features repetitive sector focus (e.g., energy, tech) with recurring tickers like $XLE and $AAPL, whereas Dataset A covers a broader, more eclectic mix of industries.",
      "Dataset B standardizes clinical trial/partnership impact language (e.g., 'sets stage for future collaborations'), while Dataset A uses varied phrasing (e.g., 'nabs accelerated review')."
    ],
    "qwen2.5-7b_zero-shot_bg_test-time-info_v1": [
      "Dataset A headlines include non-financial social/political content (e.g., Antonio Brown gossip, Ruth Bader Ginsburg health updates) absent in B's strictly financial focus",
      "Dataset A contains explicit mentions of geopolitical events (e.g., Brexit, Brazil's politics) while B focuses purely on corporate/analyst actions",
      "Dataset A includes macroeconomic commentary without ticker links (e.g., 'Cash is king' recession prep) whereas B ties all analysis to specific securities",
      "Dataset A features consumer-facing metrics (e.g., holiday debt surveys, retail sales) not present in B's institutional analyst perspective",
      "Dataset B systematically names analyst firms (Morgan Stanley, Oppenheimer) in every headline - A only sometimes references sources",
      "Dataset A contains non-corporate regulatory actions (e.g., London Uber license, vaping bans) while B's regulations focus on company-specific impacts",
      "Dataset B headlines emphasize rating changes (upgrade/downgrade) as primary drivers, whereas A shows more diverse catalysts like earnings gaps/volume spikes",
      "Dataset A includes technical trading language (e.g., 'cleared earnings gap at 71', 'STOCKS up vs down') absent in B's fundamentals-driven narratives",
      "Dataset A shows currency/FX market commentary (AUD/USD forecasts) while B remains equity/ETF focused",
      "Dataset B maintains consistent structure (Ticker - Analyst Action - Rationale) while A varies between news alerts, data points, and editorial commentary"
    ],
    "llama3.3-70b_zero-shot_bg_v1": [
      "Dataset B headlines consistently focus on analyst actions (e.g., downgrades, price target changes, rating updates) as primary news drivers, while A includes diverse triggers like earnings reports, regulatory decisions, and macroeconomic trends",
      "B exclusively uses standardized ticker symbol formatting ($NVDA) without variation, while A contains mixed formats (e.g., $MAR, NYSE:FBHS, LON:DCC)",
      "All B samples maintain a strict [Institution] + [Action] + [Ticker] syntactic structure, whereas A employs varied sentence structures including quotes, questions, and commentary",
      "B headlines emphasize rating terminology (underweight, equal weight, neutral) absent in A's samples",
      "100% of B samples reference specific financial institutions making recommendations (Morgan Stanley, Barclays, Oppenheimer), while A mentions diverse entities including regulators, companies, and non-financial organizations",
      "B exclusively focuses on large-cap tech stocks (NVDA, TSLA, GOOG, AAPL), while A covers diverse sectors including energy, healthcare, retail, and industrials",
      "All B headlines contain exactly one ticker symbol per headline, while A frequently includes multiple tickers or none",
      "B's numerical references are exclusively price targets and ratings, whereas A includes varied metrics (EPS, P/E ratios, sales figures, percentages)",
      "B headlines maintain consistent tense usage (present tense actions), while A mixes tenses to report past events and future projections",
      "100% of B samples use hyphen separators between institution and action (e.g., 'morgan stanley downgrades $nvda -'), a formatting pattern absent in A"
    ],
    "qwen2.5-32b_zero-shot_v1": [
      "Dataset B headlines use generic terms like 'Tech Giant' or 'Pharmaceutical Company' without naming specific companies or stock tickers, while Dataset A explicitly references real companies (e.g., $MAR, Uber) and individuals.",
      "Dataset B headlines are formulaic and repetitive in structure (e.g., 'Tech Giant Reports... Shares Surge/Plummet'), whereas Dataset A features diverse phrasing, including casual language, emojis, and non-standard formatting.",
      "Dataset B focuses narrowly on earnings reports, interest rate decisions, and sector trends (e.g., renewable energy), while Dataset A includes granular details like regulatory actions (e.g., FDA bans), geopolitical events, and macroeconomic data (e.g., retail sales figures).",
      "Dataset B avoids numerical specifics (e.g., 'Shares Plummet' without percentages), whereas Dataset A frequently includes precise metrics (e.g., 'up 3.8%', '$1,481.20/oz').",
      "Dataset B headlines emphasize broad sector movements (e.g., 'Renewable Energy Stocks Surge'), while Dataset A highlights company-specific volatility (e.g., 'Roku shares dropped after FOX channel removal').",
      "Dataset B lacks references to non-financial events (e.g., political developments, celebrity news), whereas Dataset A integrates such elements (e.g., Ruth Bader Ginsburg\u2019s hospitalization, Antonio Brown\u2019s comments).",
      "Dataset B rarely mentions regulatory agencies or specific policies (e.g., FDA, antitrust investigations), while Dataset A explicitly cites them (e.g., 'FDA issues ban on vaping products').",
      "Dataset B headlines are strictly financial in tone, avoiding colloquialisms, while Dataset A includes informal language (e.g., 'Cash is king', 'ghost white women') and promotional phrases (e.g., 'Get the best deals').",
      "Dataset B does not reference geopolitical events with granularity (e.g., Brexit, U.S.-China tensions), whereas Dataset A includes specific examples (e.g., 'BoJo\u2019s landslide victory').",
      "Dataset B avoids forward-looking guidance from named institutions (e.g., BlackRock\u2019s outlook), while Dataset A quotes or references specific entities (e.g., 'BlackRock says...', 'SunTrust RH raises target')."
    ],
    "qwen2.5-32b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently mention specific future dates for earnings calls, conferences, or product launches (e.g., 'July 15, 2023', 'June 10th, 2023'), whereas Dataset A does not reference scheduled future events.",
      "Dataset B headlines frequently include the full names of analysts or institutions initiating actions (e.g., 'Analysts at Goldman Sachs', 'Morgan Stanley'), while Dataset A often omits institutional names or uses abbreviated references (e.g., 'SunTrust RH').",
      "Dataset B headlines show standardized formatting with ticker symbols consistently placed at the beginning of headlines, whereas Dataset A headlines have variable ticker placement (e.g., mid-sentence or omitted).",
      "Dataset B includes mixed-language elements in some headlines (e.g., Chinese characters in AMC earnings report), while Dataset A maintains English-only text despite occasional non-Latin characters in names.",
      "Dataset B headlines emphasize corporate growth strategies (e.g., 'strategic growth initiatives', 'partnerships') as primary drivers of market reactions, whereas Dataset A more frequently ties reactions to external factors like geopolitical events or regulatory decisions.",
      "Dataset B headlines consistently quantify analyst sentiment changes using precise terminology (e.g., 'cuts price target to $38', 'raises price target to $55'), while Dataset A often describes adjustments qualitatively (e.g., 'turns negative', 'reassures analysts').",
      "Dataset B headlines focus narrowly on financial metrics and analyst ratings without ancillary context, whereas Dataset A incorporates broader socio-economic elements (e.g., consumer debt surveys, unemployment claims projections).",
      "Dataset B demonstrates strict adherence to corporate earnings cycle reporting (e.g., 'Q3 earnings', 'Q2 results'), while Dataset A includes non-earnings financial metrics like commodity prices and retail sales data.",
      "Dataset B headlines maintain consistent tense usage focused on recent/upcoming corporate actions, whereas Dataset A mixes historical market data ('Feb. gold climbs') with forward-looking statements.",
      "Dataset B shows repetitive syntactic structures focusing on 'Analyst Action + Ticker + Rationale', while Dataset A employs diverse narrative formats including consumer-focused alerts and political commentary."
    ],
    "llama3.3-70b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines predominantly focus on specific analyst actions (upgrades/downgrades) from major investment firms (e.g., Morgan Stanley, Barclays), while Dataset A references a broader range of entities (regulators, companies, economists).",
      "Dataset B headlines consistently use lowercase formatting for analyst firm names (e.g., 'morgan stanley', 'barclays'), whereas Dataset A uses formal capitalization for entities like SunTrust RH or BlackRock.",
      "Dataset B headlines are structurally formulaic, often starting with the analyst firm + action + ticker (e.g., 'morgan stanley downgrades $UTX'), while Dataset A headlines have varied structures (e.g., macro updates, company statements).",
      "Dataset B emphasizes forward-looking analyst guidance (e.g., 'plans to release Q3 financial results on November 15') more frequently than Dataset A, which focuses on retrospective results (e.g., 'EPS misses by $0.33').",
      "Dataset A includes non-financial content (e.g., celebrity news, political events) irrelevant to markets, while Dataset B remains strictly financial/analyst-focused.",
      "Dataset A incorporates informal elements like emojis (\u2705), hashtags (#markets), and conversational phrases ('here's why'), whereas Dataset B maintains a formal, standardized tone.",
      "Dataset A references global macroeconomic factors (e.g., Eurozone politics, China\u2019s cash flow) extensively, while Dataset B is more U.S.-centric and sector-specific (tech, biotech).",
      "Dataset B headlines explicitly cite rationales for analyst actions (e.g., 'citing concerns over revenue growth'), while Dataset A often states actions without detailed justification.",
      "Dataset A includes operational metrics (e.g., 'production cuts', 'new product launches') as primary drivers, whereas Dataset B prioritizes analyst sentiment and price targets as key catalysts.",
      "Dataset A features mixed tenses (past earnings results, present market movements, future forecasts), while Dataset B headlines are predominantly forward-looking (upcoming earnings, guidance updates)."
    ],
    "llama3.3-70b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines exclusively mention specific analyst firms (e.g., Barclays, Morgan Stanley, Oppenheimer) in every instance of ratings changes, while Dataset A sometimes references analysts generically without firm attribution.",
      "Dataset B consistently includes structured references to quarterly earnings report dates/schedules (e.g., 'schedules Q4 earnings release for February 22'), while Dataset A simply reports earnings results without date-specific scheduling language.",
      "Dataset B uses standardized phrases like 'sees price target cut/raised at [Firm]' as primary headline structure, while Dataset A employs more varied phrasing for similar analyst actions (e.g., 'downgraded to sell', 'target raised').",
      "Dataset B maintains consistent lowercase formatting throughout headlines (except tickers), while Dataset A uses title case capitalization and occasional uppercase emphasis.",
      "Dataset B focuses exclusively on company/stock-specific news, while Dataset A includes broader macroeconomic commentary (e.g., 'U.S. Shale Producers Slash Spending', 'Consumer Confidence Surges') within headlines.",
      "Dataset B consistently pairs analyst actions with specific rationales (e.g., 'due to weak demand outlook'), while Dataset A sometimes states analyst actions without explicit reasoning in the headline text.",
      "Dataset B headlines strictly maintain financial focus, whereas Dataset A includes non-financial content (e.g., celebrity news, political commentary) mixed with market updates.",
      "Dataset B uses standardized stock movement descriptors ('shares remain steady', 'trades flat'), while Dataset A employs more varied market reaction language ('surge', 'plunge', 'takes hit').",
      "Dataset B headlines avoid punctuation beyond commas/periods, while Dataset A frequently uses colons, emojis, hashtags, and special characters for formatting.",
      "Dataset B maintains consistent tense focus on recent/upcoming corporate events, while Dataset A includes historical comparisons and multi-temporal references (e.g., 'still paying off last year's holiday debt') within headlines."
    ],
    "qwen2.5-32b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently mention specific analyst firms (e.g., Goldman Sachs, Barclays, Oppenheimer) by name, whereas Dataset A often omits firm names or uses generic references.",
      "Headlines in Dataset B frequently include explicit rationales for analyst actions (e.g., 'citing concerns over global economic slowdown'), while Dataset A typically states actions without detailed justification.",
      "Dataset B emphasizes supply chain challenges as a recurring standalone theme (e.g., 'faces supply chain delays'), whereas Dataset A mentions supply chains only in broader operational contexts.",
      "Non-English characters or localized content (e.g., Chinese text) appear in Dataset B headlines, which are absent in Dataset A.",
      "Dataset B includes maintained ratings (e.g., 'Maintains Neutral Rating') as a distinct category, while Dataset A focuses primarily on upgrades/downgrades without emphasizing unchanged positions.",
      "Headlines in Dataset B exhibit inconsistent capitalization (e.g., 'chimerix inc. announces' in lowercase), unlike Dataset A\u2019s standardized title casing.",
      "Dataset B explicitly pairs company names with tickers in nearly every headline (e.g., '$NVDA - Nvidia...'), whereas Dataset A occasionally uses tickers or names alone.",
      "Forward-looking statements in Dataset B often tie directly to management guidance (e.g., 'Following Management Guidance'), while Dataset A cites broader forecasts without attribution.",
      "Dataset B headlines systematically reference price target adjustments with specific dollar amounts (e.g., 'Cuts Price Target to $25'), whereas Dataset A sometimes omits numerical targets.",
      "Sector-specific ETFs (e.g., $XLE, $HYG) and indices (e.g., $SPY) are cited more frequently in Dataset B compared to Dataset A."
    ],
    "qwen2.5-7b_zero-shot_bg_v1": [
      "Dataset B headlines consistently begin with stock ticker symbols (e.g., \"$TSLA - ...\"), whereas Dataset A often embeds tickers mid-headline or omits them entirely for non-ticker-specific news.",
      "Headlines in Dataset B focus narrowly on analyst actions (e.g., downgrades, price target adjustments) and earnings results, while Dataset A includes broader non-financial events (e.g., regulatory recalls, geopolitical risks, layoffs).",
      "Dataset B features repetitive mentions of specific firms like Barclays cutting Alphabet/Meta targets across multiple headlines, whereas Dataset A references a wider variety of institutions and topics.",
      "Dataset A includes promotional content (e.g., \"\u2705Get the best deals...\") and non-English characters (e.g., Chinese text), while Dataset B maintains a formal, analyst-centric tone without marketing language.",
      "Dataset A quantifies granular metrics (e.g., \"Feb. gold climbs $8.90\") and intraday updates (e.g., \"9:45 am $SPY 272.99\"), whereas Dataset B emphasizes qualitative analyst sentiment over precise figures.",
      "Dataset B exclusively uses standardized phrases like \"beats/misses expectations\" for earnings, while Dataset A varies phrasing (e.g., \"EPS misses by $0.33\", \"adj. EPS 41 cents\").",
      "Dataset A references macroeconomic indices (e.g., S&P 500) and sector trends (e.g., semiconductor demand) frequently, while Dataset B rarely mentions these beyond company-specific analysis.",
      "Dataset B headlines often include forward-looking verbs (e.g., \"signaling,\" \"prompting,\" \"reaffirming\") to frame analyst projections, whereas Dataset A uses past-tense reporting (e.g., \"reports Q3 results\").",
      "Dataset A includes non-financial entities (e.g., Uber\u2019s London license, Antonio Brown) and consumer-facing impacts (e.g., retail sales), while Dataset B remains strictly corporate/financial.",
      "Dataset B repeats identical themes (e.g., Tesla downgrades, Alphabet/Meta target cuts) across multiple headlines, whereas Dataset A shows greater thematic diversity (e.g., oil reserves, unemployment claims, dividend declarations)."
    ],
    "qwen2.5-32b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently follow a structured format (e.g., 'Company X Reports Y, Shares Move Z%'), whereas Dataset A includes fragmented or conversational phrasing (e.g., hashtags, emojis, incomplete sentences).",
      "Dataset B exclusively uses formal company names alongside tickers (e.g., 'MercadoLibre ($MELI)'), while Dataset A often omits full names (e.g., '$MAR - Marriott seen running up 20% gain').",
      "Dataset B emphasizes explicit numerical outcomes (e.g., 'Shares Rise 5%', 'Beats Analysts' Estimates'), while Dataset A more frequently uses qualitative descriptors (e.g., 'slides', 'surges').",
      "Dataset B focuses narrowly on quarterly earnings, analyst actions, and corporate guidance, whereas Dataset A integrates non-financial events (e.g., recalls, geopolitical risks, consumer trends).",
      "Dataset B headlines reference future corporate events (e.g., conferences, product launches) as catalysts, while Dataset A emphasizes real-time market data (e.g., intraday price levels, volume metrics).",
      "Dataset A includes social media elements (e.g., hashtags, @mentions, emojis) absent in Dataset B.",
      "Dataset B uses uniform verb tenses (e.g., 'reports', 'announces') for corporate actions, while Dataset A mixes tenses and perspectives (e.g., 'says', 'warns', 'could').",
      "Dataset A incorporates macroeconomic indicators (e.g., gold/oil prices, unemployment claims) directly into headlines, whereas Dataset B ties market movements strictly to company-specific news.",
      "Dataset B avoids speculative or editorial language (e.g., 'could', 'might'), favoring factual statements (e.g., 'misses', 'exceeds'), unlike Dataset A.",
      "Dataset B headlines consistently attribute analyst actions to specific firms (e.g., 'Morgan Stanley cuts'), while Dataset A occasionally omits sources (e.g., 'analysts downgrade')."
    ],
    "llama3.1-8b_zero-shot_bg_v1": [
      "Dataset B headlines predominantly focus on immediate stock price reactions (e.g., 'Stock Falls 5%') tied to earnings misses/beats or analyst actions, while A includes broader market/economic consequences (e.g., layoffs, regulatory impacts).",
      "B consistently specifies exact percentage changes in stock prices (e.g., 'plummets 5%', 'surges 7.3%'), whereas A often uses qualitative descriptors (e.g., 'shares dropped', 'stock up').",
      "B headlines repetitively cite specific analyst firms (Barclays, Morgan Stanley, Deutsche Bank), while A references a wider variety of institutions (SunTrust, MKM Partners, Stifel).",
      "B emphasizes EPS estimate revisions and guidance misses/hits as primary drivers, whereas A includes operational metrics (e.g., 'Q3 comparable-store sales', 'net merchandise sales').",
      "B shows fixation on Big Tech stocks ($AAPL, $TSLA, $NVDA, $MSFT) across most samples, while A covers diverse sectors (energy, biotech, retail, real estate).",
      "B headlines lack references to geopolitical/regulatory developments (e.g., Brexit, FDA decisions) that are frequent in A.",
      "B uses standardized headline structures (Ticker - Event - Analyst Firm - Price Action), whereas A employs variable formats (questions, commentary, multi-sentence alerts).",
      "B omits macroeconomic data point specifics (e.g., 'retail sales increase less than expected' in A) despite mentioning economic trends abstractly.",
      "B excludes non-equity financial instruments (e.g., gold/oil prices, forex pairs like AUD/USD) prevalent in A's coverage.",
      "B avoids promotional language (e.g., A's '\u2705Get the best deals'), hashtags, or embedded links present in A's samples."
    ],
    "llama3.1-8b_zero-shot_v1": [
      "Dataset B headlines use generalized company references (e.g., 'Tech giants') rather than specific ticker symbols or full corporate names prevalent in Dataset A",
      "Dataset B samples lack granular numerical specificity (e.g., exact price targets, percentage changes) that characterize most Dataset A headlines",
      "Dataset B excludes social media elements (hashtags, @mentions) and promotional language present in multiple Dataset A samples",
      "Dataset B headlines avoid non-financial human interest elements (e.g., celebrity news, political commentary) that occasionally appear in Dataset A",
      "Dataset B maintains consistent headline structure (event + market reaction) compared to Dataset A's mix of announcements, updates, and analytical commentary",
      "Dataset B uses standardized volatility terminology ('plunge','surge') without Dataset A's colloquial variations ('ghost white women','cash is king')",
      "Dataset B focuses on aggregate market/sector movements rather than Dataset A's frequent granular operational updates (e.g., specific drug trial phases, construction projects)",
      "Dataset B references regulatory impacts generically compared to Dataset A's explicit regulatory body citations (FDA, FTC, TfL)",
      "Dataset B emphasizes macroeconomic outcomes while Dataset A includes microeconomic guidance details (earnings previews, dividend declarations)",
      "Dataset B omits shareholder-specific developments (activist investors, dividend yields) that feature prominently in Dataset A"
    ],
    "llama3.3-70b_few-shot_v1": [
      "Dataset B headlines consistently use full company names (e.g., 'Amazon') rather than ticker symbols (e.g., '$MAR'), which are prevalent in Dataset A.",
      "Dataset B headlines lack informal elements (e.g., hashtags, emojis, URLs) that frequently appear in Dataset A.",
      "Dataset B headlines focus more narrowly on major indices (e.g., Dow Jones, Nasdaq) and macroeconomic events, whereas Dataset A includes niche operational updates (e.g., product recalls, local real estate).",
      "Dataset B headlines use standardized percentage changes (e.g., '10%') and round numbers, while Dataset A often includes granular metrics (e.g., 'settle at $1,481.20/oz').",
      "Dataset B headlines are formulaic in structure (e.g., '[Company] Stock [Action] After [Event]'), whereas Dataset A includes conversational phrases (e.g., 'Here's Why').",
      "Dataset B headlines avoid forward-looking guidance mentions (e.g., 'Q4 2019 Earnings Preview' in A) and focus strictly on finalized results.",
      "Dataset B excludes non-financial social/political commentary (e.g., 'Jair Bolsonaro should keep his nerve' in A) seen in Dataset A.",
      "Dataset B headlines emphasize broad market reactions (e.g., 'Stocks Plummet'), while Dataset A includes localized impacts (e.g., 'Chinese cash dries up' affecting LA real estate).",
      "Dataset B uses repetitive templates for Federal Reserve updates (e.g., 'Releases Latest Monetary Policy Report'), while Dataset A describes Fed actions contextually (e.g., 'Kansas City Fed warned about oil insolvencies').",
      "Dataset B headlines avoid company-specific dividend/stock operation details (e.g., 'retire $100 million of stock' in A), focusing instead on earnings and rate decisions."
    ],
    "qwen2.5-32b_few-shot_bg_v1": [
      "Dataset B headlines consistently include the company stock ticker symbol in every headline, whereas Dataset A sometimes omits tickers or includes non-standard ticker formats.",
      "Dataset B headlines focus more narrowly on analyst actions (e.g., price target changes, rating upgrades/downgrades) as primary drivers, while Dataset A includes more diverse catalysts like regulatory decisions, geopolitical events, and consumer trends.",
      "Dataset B uses standardized formatting with institutional firm names at the beginning of headlines (e.g., 'Morgan Stanley Cuts...'), while Dataset A uses more variable structures including clickbait phrases and social media-style calls to action.",
      "Dataset B headlines emphasize sequential quarterly earnings results (Q2/Q3/Q4) with explicit numerical comparisons to expectations, while Dataset A more frequently references annual results or non-earnings financial metrics.",
      "Dataset B shows repeated mentions of the same cluster of mega-cap tech companies (Apple, Tesla, Nvidia, Meta), while Dataset A covers a broader range of sectors including energy, biotech, and retail.",
      "Dataset B headlines maintain consistent tense/formality without colloquial language, whereas Dataset A includes informal elements like emojis, hashtags, and conversational phrases (e.g., 'Wow.....', 'my position remains intact').",
      "Dataset B demonstrates strict adherence to financial terminology without mixed content, while Dataset A intermittently includes non-financial news (e.g., celebrity statements, political commentary, product recalls).",
      "Dataset B headlines contain precise percentage changes for price targets and stock reactions, whereas Dataset A more frequently uses vague descriptors like 'shares rise' without quantification.",
      "Dataset B shows repetitive template structures ('[Institution] [Action] on [Ticker] - [Rationale]'), while Dataset A employs more varied narrative styles including complete sentences and editorial perspectives.",
      "Dataset B focuses exclusively on US-listed large-cap equities, while Dataset A includes international market references (UK regulators, European shares, South Africa rand) and alternative assets (gold, oil futures)."
    ],
    "qwen2.5-32b_few-shot_v1": [
      "Dataset B headlines use generic terms for companies (e.g., 'Tech Giant') instead of specific tickers or names (e.g., '$MAR') prevalent in Dataset A.",
      "Dataset B headlines lack granular quantitative metrics (e.g., exact dollar amounts, percentages) found in Dataset A, favoring broader statements like 'earnings miss' without numerical specifics.",
      "Dataset B focuses heavily on recurring themes like Federal Reserve meetings and interest rate decisions, whereas Dataset A includes diverse non-policy events (e.g., recalls, political developments).",
      "Dataset A includes social/political narratives (e.g., Chinese investment impacts, Uber\u2019s license revocation) absent in Dataset B\u2019s strictly financial/policy focus.",
      "Dataset B headlines are formulaic and repetitive in structure (e.g., '[Entity] reports [result]'), while Dataset A features varied formats (questions, hashtags, informal language).",
      "Dataset A references a wider range of sectors (e.g., retail, real estate, healthcare) compared to Dataset B\u2019s narrower emphasis on tech, energy, and pharmaceuticals.",
      "Dataset A incorporates opinionated or casual language (e.g., 'Cash is king,' emojis) absent in Dataset B\u2019s neutral, formal tone.",
      "Dataset B omits granular regulatory specifics (e.g., FDA actions, OPEC decisions) common in Dataset A, favoring generic terms like 'new regulations.'",
      "Dataset A includes time-bound, event-driven market impacts (e.g., Super Bowl, holiday debt) absent in Dataset B\u2019s broader quarterly/economic trends.",
      "Dataset B emphasizes macroeconomic trends (e.g., inflation, recession forecasts) over Dataset A\u2019s micro-level corporate updates (e.g., layoffs, dividend changes)."
    ],
    "qwen2.5-32b_zero-shot_bg_v1": [
      "Dataset B headlines frequently include multilingual content (e.g., Chinese translations) for analyst actions or price targets, while Dataset A does not.",
      "Dataset B focuses narrowly on earnings outcomes, analyst ratings, and company-specific financial guidance, whereas Dataset A includes broader non-earnings corporate developments (e.g., recalls, legal challenges, geopolitical events).",
      "Dataset A incorporates promotional content, guides, or social media elements (e.g., hashtags, emojis, links), while Dataset B headlines are strictly news-oriented without such features.",
      "Dataset A explicitly quantifies financial metrics (e.g., 'EPS 41 cents vs. 40 consensus'), whereas Dataset B often generalizes results (e.g., 'beats expectations') without granular figures.",
      "Dataset B headlines are formulaically structured around analyst firms and actions (e.g., '[Firm] maintains [rating] on $TICKER'), while Dataset A uses more varied narrative styles.",
      "Dataset A covers commodities (e.g., gold, oil), forex (e.g., AUD/USD), and macroeconomic indicators beyond equities, while Dataset B is equity-centric.",
      "Dataset A includes editorialized or opinion-driven headlines (e.g., 'The FT View...'), whereas Dataset B maintains a neutral, factual tone.",
      "Dataset B concentrates heavily on tech (e.g., Apple, Tesla) and financial sectors, while Dataset A spans diverse sectors like energy, healthcare, and retail.",
      "Dataset A references geopolitical events (e.g., Brexit, Lebanon-IMF) impacting markets, whereas Dataset B ties market movements primarily to earnings and economic data.",
      "Dataset A features non-financial corporate updates (e.g., layoffs, product recalls, regulatory bans), while Dataset B avoids such topics unless directly tied to financial performance."
    ],
    "qwen2.5-7b_zero-shot_v1": [
      "Dataset B headlines focus more on broad sector performance (e.g., 'Tech Sector Slumps') without referencing granular details like stock tickers (e.g., $MAR) or specific percentages seen in A.",
      "Headlines in B use formal, standardized phrasing (e.g., 'Federal Reserve Keeps Interest Rates Unchanged') compared to A's mix of colloquial phrases (e.g., 'cash is king') and informal hashtags.",
      "B emphasizes macroeconomic stability (e.g., 'Economic Indicators Remain Stable') rather than granular company-specific challenges (e.g., 'Chinese cash dries up in LA real estate') prevalent in A.",
      "Market index milestones (e.g., 'Dow Jones Exceeds 35,000') are explicitly highlighted in B, while A focuses on intraday price movements (e.g., '$SPY 272.99 at 9:45 am').",
      "B lacks references to regulatory actions (e.g., FDA bans, recalls) or institutional filings (e.g., 'Vereit files for senior notes offering') common in A.",
      "Forward-looking statements in B are generalized (e.g., 'Steady Improvement Predicted') versus A's specific guidance cuts or earnings previews (e.g., 'Martin Marietta Q4 2019 Earnings Preview').",
      "B avoids mixed-content headlines (e.g., A's non-financial references like 'Antonio Brown ghosting white women' or RBG hospitalization).",
      "Quantitative stock price movements in B use vague terms (e.g., 'Shares Plunge') instead of A's numerical precision (e.g., 'gold climbs $8.90, or 0.6%').",
      "B features repetitive thematic phrasing (e.g., 'Tech Stocks Plunge' as a recurring template) compared to A's diverse narrative structures.",
      "Earnings results in B are described generically (e.g., 'Post Strong Earnings') versus A's explicit metrics (e.g., 'EPS misses by $0.33, misses on revenue')."
    ],
    "llama3.1-8b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently begin with the stock ticker symbol at the start of the sentence, while A places tickers in varied positions",
      "B exclusively references specific analyst firms (e.g., Morgan Stanley, Oppenheimer) in action descriptions, whereas A sometimes uses generic 'analysts' without naming institutions",
      "B contains forward-looking statements about upcoming earnings dates (e.g., 'to Report Q1 Earnings on April 24') that don't appear in A's samples",
      "B consistently quantifies price target adjustments (e.g., 'PT Trimmed to $116'), while A mentions target changes without specific numbers in some cases",
      "B uses standardized analyst rating terminology (Equal-Weight, Underperform, Outperform) absent in A's more generic 'downgraded to sell' descriptions",
      "A contains promotional/non-news elements (e.g., '\u2705Get the best deals...') completely absent from B's professional-focused headlines",
      "B includes specific partnership types (e.g., 'European Expansion', 'IoT Initiatives') while A uses generic 'partnership announcements'",
      "A shows text formatting inconsistencies (ellipses, hashtags, truncated text) absent from B's polished, complete sentence structures",
      "B headlines frequently contain dual perspectives (e.g., 'Despite...', 'Amid...') to contextualize movements, unlike A's more linear reporting",
      "A includes non-corporate/non-financial references (sports figures, political figures) absent from B's strictly business-focused content"
    ],
    "qwen2.5-7b_few-shot_v1": [
      "Dataset A headlines frequently include stock ticker symbols (e.g., $MAR, $RHHBY), while Dataset B headlines consistently use full company names without tickers.",
      "Dataset A contains informal elements like emojis, casual language, and social media-style updates, whereas Dataset B headlines maintain a formal, polished news tone throughout.",
      "Dataset A often references granular financial metrics (e.g., specific EPS misses by $0.33, exact commodity prices), while Dataset B reports earnings outcomes or trends without numerical precision.",
      "Dataset A includes non-financial content (e.g., celebrity gossip, political commentary) interspersed with financial news, while Dataset B headlines remain strictly finance/economics-focused.",
      "Dataset A features real-time market updates with timestamps (e.g., '9:45 am') and intraday price movements, whereas Dataset B headlines lack time-specific data and focus on broader events.",
      "Dataset A headlines frequently mention analyst actions (e.g., price target revisions, rating changes), while Dataset B emphasizes company-reported results or macroeconomic developments.",
      "Dataset A covers niche sectors like energy commodities and regional real estate in detail, while Dataset B focuses disproportionately on tech giants and automotive (e.g., Tesla, Apple).",
      "Dataset A includes hyperlocal regulatory actions (e.g., UK appliance recalls, Los Angeles real estate), while Dataset B discusses regulatory themes at a national/global policy level.",
      "Dataset A headlines often use fragmented phrasing and abbreviations (e.g., 'EPS beats by $0.01'), whereas Dataset B employs complete sentences with standardized financial terminology.",
      "Dataset A incorporates non-English text and region-specific economic impacts (e.g., Chinese investment shifts), while Dataset B headlines are uniformly in English and emphasize globalized narratives."
    ],
    "llama3.3-70b_few-shot_bg_v1": [
      "Dataset B headlines are exclusively focused on technology and semiconductor sectors (e.g., $NVDA, $TSLA), while A covers diverse sectors including energy, retail, healthcare, and real estate",
      "Dataset B shows repetitive focus on a limited set of companies (Tesla, Alphabet, NVIDIA) across multiple entries, whereas A features a wider variety of corporations",
      "Dataset B consistently uses full company names alongside tickers (e.g., 'Tesla stock downgraded' with $TSLA), while A often uses either tickers or company names separately",
      "Dataset B contains numerous duplicate/rephrased entries about the same analyst actions (e.g., 20+ Morgan Stanley Tesla downgrades), while A maintains unique content across samples",
      "Dataset B emphasizes quarterly earnings calendar dates and precise reporting timelines, while A focuses more on general earnings results without specific date references",
      "Dataset B exclusively uses lowercase formatting for headlines, while A maintains standard headline capitalization",
      "Dataset B shows heavy concentration on 3 financial institutions (Morgan Stanley, Barclays, Goldman Sachs), while A references a broader range of analysts and firms",
      "Dataset B features recurring template structures (e.g., '[Institution] [action] [company] [rationale]'), while A uses more varied sentence constructions",
      "Dataset B lacks non-corporate entities/government agencies in headlines beyond the Federal Reserve, while A includes diverse actors like regulators, politicians, and consumers",
      "Dataset B maintains strict focus on analyst ratings and earnings, while A incorporates additional elements like dividend declarations, M&A news, and product development updates"
    ],
    "llama3.1-8b_few-shot_v1": [
      "Dataset A headlines frequently include specific stock ticker symbols (e.g., $MAR, $RHHBY), while Dataset B avoids tickers entirely.",
      "Dataset A uses informal language, emojis, and conversational phrases (e.g., 'Cantilever doubled its price target'), whereas Dataset B maintains a formal, standardized tone.",
      "Dataset A emphasizes granular numerical details (e.g., 'settle at $1,481.20/oz') and real-time price/volume updates, while Dataset B focuses on aggregated trends (e.g., 'GDP growth slows').",
      "Dataset A includes non-financial content (e.g., celebrity gossip, political figures' health) irrelevant to markets, while Dataset B stays strictly finance/economics-focused.",
      "Dataset A headlines often reference minor company-specific operational updates (e.g., 'Q3 adj. EPS 41 cents'), whereas Dataset B emphasizes macro-level impacts (e.g., 'global recession fears').",
      "Dataset B headlines contextualize events within broader policy frameworks (e.g., 'Fed hints at rate cuts'), while Dataset A prioritizes immediate stock reactions without systemic analysis.",
      "Dataset A includes investor advice or commentary (e.g., 'Here\u2019s Why You Need To Hedge'), whereas Dataset B presents factual updates without prescriptive language.",
      "Dataset B headlines follow a structured journalistic format (e.g., 'X Soars on Y Report'), while Dataset A uses fragmented, irregular phrasing (e.g., 'Buying dips.').",
      "Dataset A features real-time intraday market fluctuations (e.g., 'premarket stock up 2.6%'), absent in Dataset B\u2019s end-of-day or post-event summaries.",
      "Dataset B explicitly links events to geopolitical/economic themes (e.g., 'COVID-19 cases surge in Asia'), while Dataset A mentions these peripherally via company impacts."
    ],
    "llama3.1-8b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently include analyst or investment firm names directly in the main text (e.g., 'Morgan Stanley downgrades'), while A mentions rating agencies less prominently or omits them",
      "B uses standardized price target formats with explicit numerical values (e.g., 'sets $25.50 price target'), whereas A contains more varied financial metric descriptions without standardized target presentation",
      "B emphasizes sequential corporate guidance changes (e.g., 'raises full-year outlook', 'cuts 2023 outlook') as primary focus, while A mixes guidance updates with broader industry observations",
      "B shows greater focus on institutional investor actions (e.g., 'activist investors pressure', 'KKR announces purchase') compared to A's more general market commentary",
      "A contains geopolitical/regional economic impacts (e.g., Brexit, Chinese cash drying up), while B maintains tighter focus on company/sector-specific financial metrics",
      "B demonstrates stricter adherence to financial terminology standardization (e.g., consistent use of 'Q2', 'FY2023'), whereas A uses mixed temporal formatting (e.g., '2020', 'November')",
      "A includes non-corporate economic indicators (e.g., consumer confidence indices, unemployment claims), while B focuses exclusively on corporate financial performance metrics",
      "B exhibits systematic coverage of sector ETFs (e.g., XLK, SPY) across samples, whereas A references ETFs less frequently and more variably",
      "A contains promotional/non-news elements (e.g., '\u2705Get the best deals'), while B maintains strict financial reporting tone throughout",
      "B emphasizes sequential rating changes (e.g., 'downgraded to underweight', 'upgraded to buy') as structural components, whereas A mixes rating actions with other financial developments"
    ],
    "qwen2.5-7b_few-shot_bg_v1": [
      "Dataset B headlines predominantly reference major investment banks (e.g., Barclays, Morgan Stanley, JPMorgan) as sources of analyst actions, while Dataset A includes smaller or regional firms (e.g., SunTrust RH, MKM Partners).",
      "Dataset B focuses heavily on large-cap tech and semiconductor companies (e.g., Tesla, NVIDIA, Meta, Apple), whereas Dataset A covers a broader range of industries, including energy, retail, and industrials.",
      "Dataset B headlines frequently use a standardized structure pairing analyst actions with specific companies (e.g., '[Bank] upgrades/downgrades [Company]'), while Dataset A includes more varied sentence structures and non-analyst-driven news.",
      "Dataset B emphasizes price target revisions and rating changes as primary news drivers, while Dataset A includes more diverse triggers like regulatory decisions, geopolitical events, or macroeconomic trends.",
      "Dataset B shows repetitive mentions of the same companies (e.g., Tesla appears 20+ times) and banks (e.g., Barclays appears 15+ times), whereas Dataset A demonstrates greater diversity in entities covered.",
      "Dataset B explicitly ties stock movements to analyst opinions in most headlines, while Dataset A often reports price changes without attributing them to specific analyst actions.",
      "Dataset B contains more frequent forward-looking analyst projections (e.g., 'sees strong potential in AI growth'), whereas Dataset A's forward-looking statements often come from companies or macroeconomic forecasts.",
      "Dataset B uses numerical price targets with greater precision (e.g., 'raises to $350 from $310'), while Dataset A more commonly reports percentage movements or general metrics.",
      "Dataset B headlines frequently mention CEO-level developments (e.g., Musk's actions) alongside financial updates, a pattern less prevalent in Dataset A.",
      "Dataset B demonstrates tighter focus on quarterly earnings impacts (28% of samples reference earnings reports), while Dataset A distributes attention across earnings, M&A, regulatory changes, and consumer trends more evenly."
    ],
    "llama3.1-8b_few-shot_bg_v1": [
      "Dataset B headlines consistently mention specific analyst firms (e.g., Morgan Stanley, Goldman Sachs) when citing rating changes, while Dataset A typically references institutions generically (e.g., 'analysts say')",
      "Dataset B focuses more heavily on large-cap tech stocks ($AAPL, $MSFT, $NVDA) across multiple headlines, while Dataset A covers a broader range of sectors including energy, retail, and pharmaceuticals",
      "Dataset B headlines frequently include exact price target figures (e.g., 'cuts to $150 vs. $185') while Dataset A generally mentions price target changes without specific numbers",
      "Dataset B contains more structured references to economic data releases (e.g., 'CPI Data for October to be Released Today') compared to Dataset A's more general economic mentions",
      "Dataset B headlines emphasize quarterly earnings expectations and misses more systematically (e.g., 'Q2 Earnings Fall Short of Expectations') compared to Dataset A's varied earnings references",
      "Dataset B shows consistent pattern of mentioning both upgrades and downgrades in the same headline (e.g., 'downgrades to Hold; Price Target Cut') while Dataset A typically mentions one action per headline",
      "Dataset B contains more explicit mentions of competition as a key factor in analyst decisions (e.g., 'due to rising rivalry') compared to Dataset A",
      "Dataset B headlines frequently reference specific percentage growth projections (e.g., 'potential for 12% growth in FY22') while Dataset A uses more vague growth terminology",
      "Dataset B shows tighter integration of stock tickers with analyst actions (e.g., '$NVDA - Analysts cut price target') compared to Dataset A's more varied ticker usage",
      "Dataset B headlines maintain formal structure with complete institutional names (e.g., 'Raymond James', 'Piper Sandler') while Dataset A uses more casual abbreviations (e.g., 'SunTrust RH')"
    ],
    "llama3.3-70b_zero-shot_v1": [
      "Dataset B headlines are more formulaic and repetitive in structure (e.g., frequent use of 'Federal Reserve Announces Interest Rate Decision' as lead phrase), while A shows greater structural diversity",
      "Dataset B focuses almost exclusively on major tech companies and sector-wide tech movements, while A covers a broader range of sectors including energy, pharmaceuticals, retail, and industrial materials",
      "Dataset A contains numerous references to specific numerical metrics (e.g., 'settle at $1,481.20/oz'), while B uses more generic percentage movements ('plummets 10%') without precise figures",
      "Dataset A includes international market developments (e.g., Brazil, UK, Eurozone), while B focuses predominantly on US domestic market reactions",
      "Dataset B headlines emphasize binary market reactions (soar/plummet) to earnings reports, while A includes more nuanced operational updates (e.g., 'secures two wind construction project awards totaling $115M')",
      "Dataset A contains multiple headlines about regulatory actions and legal developments (e.g., FDA bans, license revocations), which are largely absent in B",
      "Dataset B shows heavy repetition of specific phrases ('economic uncertainty', 'recession fears'), while A demonstrates wider lexical diversity in describing market conditions",
      "Dataset A includes forward-looking statements with specific timeframes ('2020 profit outlook', 'March 2020 net merchandise sales'), while B's forward-looking language is more vague ('in coming quarter')",
      "Dataset A contains headlines mixing financial news with non-financial elements (e.g., 'Ruth Bader Ginsburg hospitalized'), while B maintains strict focus on financial/economic content",
      "Dataset A features numerous references to specific stock tickers and trading data (e.g., '$SPY 272.99'), while B uses generic company descriptors ('Tech Giant')"
    ],
    "llama3.1-8b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently place stock ticker symbols at the beginning of the headline (e.g., \"$CMD - Carnival Stock Plummets...\"), while Dataset A often embeds tickers mid-headline or omits structured placement.",
      "Dataset B emphasizes explicit analyst firm names and price target adjustments in nearly every headline (e.g., \"Barclays cuts AMTD's price target\"), whereas Dataset A occasionally omits firm names or specifics.",
      "Dataset B includes granular forward-looking guidance updates (e.g., \"withdraws FY20 guidance\") more systematically, while Dataset A focuses on retrospective results (e.g., \"EPS misses\").",
      "Dataset B headlines frequently quantify earnings/revenue deviations (e.g., \"misses estimates by 2%\"), while Dataset A often states beats/misses without precise numerical margins.",
      "Dataset B uses standardized phrases like \"maintains Neutral rating\" or \"downgrades to Underweight\" for analyst actions, whereas Dataset A employs less formulaic language (e.g., \"turns negative on Party City\").",
      "Dataset B explicitly links corporate actions to stock price movements in real time (e.g., \"sending $ABEO shares higher\"), while Dataset A often separates event reporting from market reactions.",
      "Dataset B incorporates merger/acquisition valuations (e.g., \"acquires...for $350M\") more consistently than Dataset A, which mentions deals without financial terms.",
      "Dataset B features recurring mentions of conference call dates (e.g., \"earnings release for October 26\"), absent in Dataset A's samples.",
      "Dataset B standardizes dividend change rationales (e.g., \"as Retail Rebound Accelerates\"), while Dataset A states dividend changes without explicit causality.",
      "Dataset B systematically references sequential performance metrics (e.g., \"Q2 results highlight growth potential\"), whereas Dataset A uses absolute timeframes (e.g., \"Q3 adj. EPS\")."
    ],
    "qwen2.5-7b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently focus on analyst actions (upgrades/downgrades) and earnings reports without diversifying into non-financial topics (e.g., politics, social issues), unlike Dataset A.",
      "Dataset B headlines lack non-English text or multilingual content, while Dataset A includes headlines with Chinese characters and other non-English phrases.",
      "Dataset B headlines are formulaically structured around specific financial metrics (e.g., 'beats/misses,' 'price target set at'), whereas Dataset A uses more varied phrasing and narrative styles.",
      "Dataset A includes headlines with explicit references to geopolitical events (e.g., Brexit, U.S.-China trade tensions), while Dataset B avoids geopolitical context entirely.",
      "Dataset B headlines emphasize quarterly earnings outcomes and forward guidance adjustments more uniformly, while Dataset A mixes earnings updates with operational news (e.g., layoffs, partnerships).",
      "Dataset A incorporates social media elements (e.g., hashtags, @mentions) and informal language, whereas Dataset B maintains a formal, standardized tone without such features.",
      "Dataset A includes headlines with regulatory/legal developments (e.g., FDA bans, license suspensions) as standalone topics, while Dataset B ties regulatory impacts indirectly to financial metrics (e.g., 'supply chain disruptions').",
      "Dataset B headlines systematically reference price targets and ratings changes (e.g., 'Morgan Stanley upgrades to Overweight'), whereas Dataset A mentions these less consistently.",
      "Dataset A features headlines with broader macroeconomic commentary (e.g., consumer confidence, recession risks), while Dataset B remains narrowly focused on company-specific financial data.",
      "Dataset B avoids non-corporate entities (e.g., no mentions of public figures like Ruth Bader Ginsburg), whereas Dataset A interweaves financial news with cultural/political events."
    ],
    "llama3.3-70b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently use lowercase formatting for company names and tickers (e.g., '$mlm', '$tsla'), while Dataset A uses uppercase or mixed case.",
      "Dataset B headlines strictly follow a structured template of '[Analyst Firm] [Action] [Ticker] [Rationale]', whereas Dataset A includes free-form sentences with diverse narrative styles.",
      "Dataset B exclusively focuses on analyst actions (downgrades/upgrades/price target changes) as primary news drivers, while Dataset A includes non-analyst-driven events like geopolitical developments or operational announcements.",
      "Dataset B headlines systematically mention specific conference participations (e.g., 'upcoming Oppenheimer healthcare conference') as material events, unlike Dataset A.",
      "Dataset B uses standardized phrases like 'reports quarterly earnings, [ticker] stock trades flat' for earnings coverage, whereas Dataset A varies descriptors (e.g., 'misses', 'beats', 'trailed estimates').",
      "Dataset B contains explicit references to scheduled earnings dates (e.g., 'to report on February 15th') in most relevant headlines, while Dataset A mentions dates inconsistently.",
      "Dataset B headlines avoid non-financial social/political commentary present in Dataset A (e.g., gig worker impacts, regulatory bans on vaping flavors).",
      "Dataset B maintains strict parity between analyst actions and ticker mentions (every rating change specifies a ticker), while Dataset A sometimes discusses companies without tickers.",
      "Dataset B uses standardized rating terminology ('underweight', 'equal weight', 'overweight') across all analyst actions, unlike Dataset A's varied terms ('bear call', 'bullish').",
      "Dataset B headlines consistently append analyst rationale clauses with 'citing...' or 'due to...', creating uniform causal structure absent in Dataset A's more varied explanations."
    ],
    "llama3.1-8b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently mention specific analyst actions (e.g., downgrades, price target changes) alongside the analyst firm name (e.g., Morgan Stanley, Oppenheimer), while Dataset A includes broader analyst references without always pairing actions with specific firms.",
      "Dataset B headlines frequently quantify risks or opportunities using explicit percentage drops/gains (e.g., '21% price drop risk') in the main clause, whereas Dataset A often states numerical data without framing it as forward-looking risk/opportunity metrics.",
      "Dataset B headlines systematically include both stock tickers and full company names in the same headline (e.g., 'Clover Health ($CLDR)'), while Dataset A typically uses either tickers or names separately.",
      "Dataset B consistently pairs earnings results with immediate analyst reactions (e.g., '10 analysts lower price target'), while Dataset A often reports earnings independently of contemporaneous analyst responses.",
      "Dataset B headlines emphasize current analyst consensus shifts (e.g., '25% Price Target Cut') as primary drivers, whereas Dataset A more frequently cites company-reported metrics without analyst consensus context.",
      "Dataset B shows uniform structural patterns starting with tickers followed by analyst actions (e.g., '$TIF to Report...'), while Dataset A uses varied headline structures with occasional non-financial interjections.",
      "Dataset B consistently references specific institutional analyst perspectives (e.g., 'JPMorgan analysts see...') as validation points, while Dataset A sometimes cites non-analyst entities like surveys or regulators as primary sources.",
      "Dataset B headlines maintain strict focus on equity-specific implications, whereas Dataset A occasionally includes tangential non-financial details (e.g., celebrity references, political commentary).",
      "Dataset B uses standardized phrasing for analyst actions (e.g., 'downgrades to Neutral', 'maintains buy recommendation'), while Dataset A employs more varied descriptive language for similar events.",
      "Dataset B consistently ties macroeconomic trends (e.g., inflation) to specific security impacts within the headline, whereas Dataset A sometimes reports economic indicators as standalone context."
    ],
    "qwen2.5-7b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines consistently specify the analyst firm (e.g., Oppenheimer, Citi) in every mention of ratings, upgrades, or downgrades, whereas Dataset A sometimes omits the analyst firm name.",
      "Dataset B headlines use a standardized structure focusing on analyst actions (e.g., '[Analyst Firm] [Action] [Ticker]'), while Dataset A includes more varied narrative styles (e.g., market commentary, geopolitical events).",
      "Dataset B headlines emphasize forward-looking analyst opinions (e.g., 'maintains neutral outlook') as primary drivers, while Dataset A often ties market reactions to external events (e.g., regulatory decisions, geopolitical shifts).",
      "Dataset B excludes non-earnings-related regulatory or geopolitical news (e.g., FDA bans, sanctions) present in Dataset A, focusing strictly on analyst-driven financial actions.",
      "Dataset B headlines rarely include quantitative market data (e.g., stock price percentages, volume stats) outside earnings metrics, unlike Dataset A, which frequently incorporates such details.",
      "Dataset B avoids non-analyst corporate announcements (e.g., product launches, partnerships) common in Dataset A unless directly tied to analyst evaluations.",
      "Dataset B headlines lack references to macroeconomic indicators (e.g., retail sales figures, unemployment claims) that Dataset A frequently includes alongside company-specific news.",
      "Dataset B consistently pairs ticker symbols with full company names (e.g., '$DIS - Disney') less frequently than Dataset A, which occasionally includes both.",
      "Dataset B excludes consumer-focused financial updates (e.g., holiday debt surveys, e-commerce issues) that appear in Dataset A, maintaining a strict corporate-analyst focus.",
      "Dataset B avoids mixed-topic headlines (e.g., combining earnings with geopolitical impacts) seen in Dataset A, maintaining singular focus on analyst actions and earnings."
    ],
    "llama3.3-70b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines consistently mention specific financial institutions (e.g., Barclays, Morgan Stanley, Goldman Sachs) as sources of analyst actions, while Dataset A sometimes omits institutional attribution for market movements or analyst decisions.",
      "Dataset B focuses exclusively on analyst actions (upgrades/downgrades/price targets) and earnings reports as primary drivers of headlines, whereas Dataset A includes broader market-impacting events like regulatory decisions, geopolitical developments, and macroeconomic trends.",
      "Dataset A contains headlines with non-financial contextual elements (e.g., political figures, consumer behavior surveys, unemployment claims) that indirectly affect markets, while Dataset B maintains strict focus on direct corporate/financial instrument developments.",
      "Dataset B uses standardized phrase structures (e.g., '[Company] sees price target [action] at [Institution]') across all samples, while Dataset A employs more varied sentence constructions and narrative styles.",
      "Dataset A includes truncated headlines with ellipses and incomplete thoughts (e.g., 'Transport for London has decided...'), whereas Dataset B headlines are consistently complete, self-contained statements.",
      "Dataset A contains social media elements (hashtags, @mentions, URL fragments) in some samples, while Dataset B maintains professional formatting without platform-specific artifacts.",
      "Dataset B emphasizes institutional consensus metrics (e.g., 'beats analyst expectations', 'in line with estimates') across all earnings-related headlines, while Dataset A sometimes presents numerical data without explicit analyst benchmark comparisons.",
      "Dataset B headlines are predominantly triggered by scheduled corporate events (earnings releases, investor conferences), while Dataset A includes more unscheduled/breaking news developments (lawsuits, regulatory bans, unexpected job cuts).",
      "Dataset A shows greater temporal diversity in market references (historical price comparisons, multi-year trends), while Dataset B focuses narrowly on immediate forward-looking guidance and quarterly performance.",
      "Dataset B maintains strict capitalization consistency for ticker symbols (e.g., $TSLA) throughout all samples, whereas Dataset A occasionally uses lowercase tickers (e.g., $phcef) and inconsistent formatting."
    ]
  },
  "diffs_real_from_synth": {
    "qwen2.5-7b_zero-shot_bg_train-time-info_v1": [
      "Dataset B includes headlines without explicit stock tickers or company names (e.g., 'Another coal-fired power plant closes its doors'), while all Dataset A headlines reference specific entities via tickers or names.",
      "Dataset B contains non-English characters, emojis (e.g., \u2705), or informal social media elements (e.g., hashtags, links), which are absent in Dataset A.",
      "Dataset B features general macroeconomic or geopolitical commentary (e.g., 'The Euro Is A Structural Short...') without tying these trends to specific companies, unlike Dataset A, which links sector/macro trends to named entities.",
      "Dataset B cites non-analyst entities (e.g., regulators, economists, companies) as primary sentiment drivers (e.g., 'UK regulator orders recall...'), whereas Dataset A exclusively uses analyst firms as sources.",
      "Dataset B includes headlines focused on raw financial data (e.g., 'Feb. gold climbs $8.90...') without analyst interpretation, while Dataset A always contextualizes data via analyst consensus or reactions.",
      "Dataset B references non-equity instruments (e.g., gold, oil futures, ETFs like $GLD) as standalone subjects, whereas Dataset A primarily discusses equities and ties ETFs to broader market context.",
      "Dataset B covers non-corporate events (e.g., political elections, legal rulings on licenses) impacting markets, while Dataset A\u2019s legal/regulatory themes are strictly company-specific probes.",
      "Dataset B includes headlines with casual language or non-news content (e.g., 'So Halsey needs a shower'), which are absent in Dataset A\u2019s formal, finance-focused structure.",
      "Dataset B reports company earnings/results without analyst comparisons (e.g., 'BJ\u2019s Wholesale Q3 adj. EPS 41 cents'), whereas Dataset A always contrasts results with analyst expectations.",
      "Dataset B features forward-looking statements based on general market sentiment (e.g., 'Oil rises as OPEC+ weighs action'), while Dataset A ties projections to explicit analyst guidance or firm upgrades/downgrades."
    ],
    "qwen2.5-32b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines include non-financial social or political events (e.g., celebrity statements, geopolitical shifts) absent in A.",
      "Dataset B contains fragmented headlines with truncated text or ellipses (e.g., 'Caterpillar Inc.'s 2020 profit outlook trailed analysts' estimates, with the heavy-equipment saying continued econo...'), unlike A's complete sentences.",
      "Dataset B incorporates casual language, slang, or opinionated phrasing (e.g., 'Buying dips,' 'Wow..... Morgan Stanley 13G 6.7%') not seen in A's formal tone.",
      "Dataset B features explicit hashtags, URLs, or social media callouts (e.g., '#markets #business #economy') absent in A.",
      "Dataset B includes real-time market data (e.g., '$SPY 272.99') and granular commodity price updates (e.g., 'Feb. gold climbs $8.90') not emphasized in A.",
      "Dataset B references non-corporate entities (e.g., countries, central banks) as primary subjects (e.g., 'Lebanon Turns to IMF') more frequently than A.",
      "Dataset B headlines often omit direct ties between macroeconomic indicators and specific companies/sectors, unlike A's explicit linkages (e.g., 'crude prices fall, $XLE down 1.5%').",
      "Dataset B includes dividend declarations, retail sales figures, or consumer sentiment metrics without tying them to corporate performance, unlike A's company-centric focus.",
      "Dataset B contains headlines resembling conversational updates or tweets (e.g., '$AIKI my position remains intact. Buying dips.') absent in A's structured news format.",
      "Dataset B features non-actionable headlines (e.g., 'Sony : Amazon names Sony executive...') lacking earnings/analyst actions, whereas A consistently ties events to market impacts."
    ],
    "qwen2.5-7b_zero-shot_bg_test-time-info_v1": [
      "Dataset B includes headlines without explicit stock tickers or financial instrument references, focusing on broader economic or corporate events (e.g., plant closures, regulatory actions).",
      "Dataset B contains headlines with non-analyst-driven market commentary (e.g., macroeconomic trends, geopolitical events) not tied to specific analyst actions or price targets.",
      "Dataset B incorporates non-financial social or political news impacting markets (e.g., Uber\u2019s London license loss, FDA vaping bans), unlike A\u2019s focus on direct financial metrics.",
      "Dataset B includes headlines with informal elements like hashtags, emojis, or conversational phrases (e.g., 'Cash is king,' '\u2705Don\u2019t break the bank'), absent in A\u2019s formal tone.",
      "Dataset B features granular commodity price updates (e.g., 'Feb. gold climbs $8.90') as standalone headlines, whereas A embeds commodity trends within analyst actions or ETF updates.",
      "Dataset B references non-corporate entities (e.g., governments, central banks) as primary news drivers more frequently than A\u2019s company/analyst-centric focus.",
      "Dataset B includes headlines about non-equity financial instruments (e.g., forex pairs like AUD/USD, crypto assets like Ether) not emphasized in A.",
      "Dataset B contains headlines with explicit numerical macroeconomic data (e.g., unemployment claims, retail sales figures) as standalone topics, unlike A\u2019s earnings-specific metrics.",
      "Dataset B integrates non-English terms or region-specific events (e.g., Brazil\u2019s political shifts, Hong Kong protests) more prominently than A\u2019s U.S.-centric corporate focus.",
      "Dataset B uses platform-specific language (e.g., 'premarket,' 'after hours,' '52-week high/low') as standalone status updates, whereas A ties these terms to analyst actions or company results."
    ],
    "llama3.3-70b_zero-shot_bg_v1": [
      "Dataset B includes headlines referencing geopolitical events or macroeconomic trends (e.g., Brexit, oil reserves) not directly tied to corporate actions, unlike Dataset A\u2019s company-specific focus.",
      "Dataset B incorporates informal elements like emojis (\u2705), hashtags (#markets), or conversational phrases (\"here's an idea\"), while Dataset A maintains strict formal language.",
      "Dataset B contains headlines with truncated text or ellipses (e.g., \"Boeing announces additional order for 737 MAX planes...\"), suggesting social media or real-time feed sources, unlike Dataset A\u2019s complete sentences.",
      "Dataset B covers non-financial sectors (e.g., real estate, consumer retail, utilities) in addition to tech/finance, whereas Dataset A focuses predominantly on tech and financial firms.",
      "Dataset B includes announcements about regulatory actions (e.g., recalls, licensing revocations) affecting companies, while Dataset A emphasizes analyst ratings and earnings metrics.",
      "Dataset B features headlines with qualitative projections (e.g., \"could increase unemployment claims\") without always specifying exact metrics, unlike Dataset A\u2019s reliance on concrete numerical data.",
      "Dataset B references legal, political, or social developments (e.g., impeachment impact on ad revenue) as market drivers, whereas Dataset A ties outcomes directly to financial performance.",
      "Dataset B mentions dividends, stock splits, or non-earnings financial updates (e.g., \"$JKHY increases quarterly dividend\") more frequently than Dataset A, which prioritizes earnings and price targets.",
      "Dataset B includes consumer-centric metrics (e.g., holiday debt surveys, shopper trends) impacting markets, while Dataset A focuses on institutional investor actions.",
      "Dataset B uses non-standard financial abbreviations or symbols (e.g., \"$XAUUSD:CUR\") and mixed formatting, whereas Dataset A consistently employs formal tickers and financial terminology."
    ],
    "qwen2.5-32b_zero-shot_v1": [
      "Dataset B headlines frequently include stock tickers (e.g., $MAR, $RHHBY) and specific price/percentage movements (e.g., gold settles at $1,481.20/oz), while Dataset A uses full company names and general terms like 'plummet' or 'surge'.",
      "Dataset B contains non-financial or tangential content (e.g., Antonio Brown\u2019s comments, Ruth Bader Ginsburg\u2019s hospitalization), whereas Dataset A strictly focuses on financial/economic events and their market impacts.",
      "Dataset B headlines often reference granular financial metrics (e.g., EPS misses by $0.33, revenue misses) without always tying them to stock price reactions, while Dataset A explicitly links earnings/results to share price changes.",
      "Dataset B includes social media-style updates, hashtags, and mentions (e.g., @KristinReports, #markets), while Dataset A maintains a formal, traditional news tone.",
      "Dataset B features editorial opinions or analyst quotes (e.g., 'BlackRock says...', 'Baird bullish on...'), whereas Dataset A reports facts neutrally (e.g., 'Fed Announces...').",
      "Dataset B covers niche or non-equity financial instruments (e.g., gold futures, AUD/USD forex, crypto), while Dataset A focuses on equities, central bank policies, and macroeconomic indicators.",
      "Dataset B includes headlines about non-publicly-traded entities (e.g., Uber\u2019s London license, Myanmar tourism), while Dataset A centers on publicly traded companies and institutional actions.",
      "Dataset B uses conversational or fragmented language (e.g., 'Buying dips.', 'Wow..... Morgan Stanley 13G 6.7%'), whereas Dataset A employs complete, structured sentences.",
      "Dataset B contains corporate operational updates (e.g., dividend raises, project awards) without explicit market reactions, while Dataset A emphasizes immediate stock price effects.",
      "Dataset B includes promotional or non-news elements (e.g., 'Get the best deals...', links to articles), while Dataset A is exclusively news-driven."
    ],
    "qwen2.5-32b_few-shot_bg_train-time-info_v1": [
      "Dataset B includes headlines specifically referencing commodities (e.g., gold, oil) with explicit price changes, while A focuses solely on company-specific financial metrics without standalone commodity updates.",
      "Headlines in B frequently mention ETFs or market indices (e.g., $SPY, $QQQ), whereas A exclusively references individual company tickers.",
      "B incorporates social media elements like hashtags (#markets) and platform citations (e.g., Seeking Alpha), which are absent in A's formal headlines.",
      "B features real-time market updates with timestamps (e.g., '9:45 am') or intraday price movements, while A only includes scheduled event times (e.g., earnings calls).",
      "B highlights macroeconomic indicators (e.g., consumer confidence, retail sales) as standalone topics, whereas A ties economic factors directly to company performance.",
      "Forward-looking statements in B address broad market/economy trends (e.g., 'extraordinary market returns are over'), while A's focus on company-specific guidance (e.g., 'raises full-year outlook').",
      "B cites external sources or authors (e.g., 'by @KristinReports'), while A omits such attributions in headlines.",
      "B discusses regulatory/economic actions impacting non-corporate entities (e.g., countries, central banks), whereas A limits regulatory mentions to company-specific challenges.",
      "B includes headlines about financial instruments like bonds, notes, and dividends (e.g., 'senior notes offering'), while A centers on equities and analyst actions.",
      "B uses colloquial language (e.g., 'Cash is king') and informal tone, contrasting with A's strictly formal, corporate announcement style."
    ],
    "llama3.3-70b_few-shot_bg_train-time-info_v1": [
      "Dataset B headlines include social media elements such as hashtags, mentions (@), and emojis (e.g., checkmarks), which are absent in Dataset A.",
      "Dataset B contains headlines with explicit geopolitical or macroeconomic commentary (e.g., Brexit impacts, U.S.-China trade tensions), whereas Dataset A focuses on company-specific analyst actions.",
      "Dataset B includes headlines with raw commodity price updates (e.g., gold, oil) and currency movements, while Dataset A emphasizes stock-specific price targets and analyst ratings.",
      "Dataset B features non-equity financial instruments (e.g., ETFs, bonds, currencies) and broader market indices, whereas Dataset A is strictly equity-focused with ticker-specific news.",
      "Dataset B incorporates sensational or non-financial events (e.g., Antonio Brown\u2019s comments, Ruth Bader Ginsburg\u2019s hospitalization) impacting markets, unlike Dataset A\u2019s strict focus on financial metrics.",
      "Dataset B includes headlines with real-time trading data (e.g., intraday price levels, volume spikes), while Dataset A focuses on post-event summaries (e.g., earnings results, rating changes).",
      "Dataset B references international regulatory actions (e.g., Transport for London revoking Uber\u2019s license) more frequently than Dataset A, which emphasizes U.S.-centric analyst firms.",
      "Dataset B contains headlines structured as conversational updates or opinion pieces (e.g., \u201cCash is king...\u201d), whereas Dataset A uses formal, standardized analyst report language.",
      "Dataset B includes headlines with explicit forward-looking macroeconomic forecasts (e.g., recession risks, unemployment claims) beyond company-specific guidance in Dataset A.",
      "Dataset B integrates non-English terms or global market references (e.g., Myanmar tourism, South Africa\u2019s rand) more prominently than Dataset A\u2019s U.S.-focused corporate updates."
    ],
    "llama3.3-70b_zero-shot_bg_train-time-info_v1": [
      "Dataset B includes headlines with informal elements like emojis (\u2705), hashtags (#markets), and social media-style formatting, absent in A.",
      "Dataset B covers non-financial events (e.g., regulatory actions, geopolitical developments) unrelated to analyst actions or earnings, unlike A's focus on corporate metrics.",
      "Dataset B references non-equity instruments (e.g., commodities like gold, forex pairs like AUD/USD) not emphasized in A.",
      "Dataset B features global market updates (e.g., South Africa's rand, Hong Kong elections) beyond A\u2019s U.S.-centric focus.",
      "Dataset B includes editorial/opinion-driven headlines (e.g., BlackRock\u2019s market outlook) rather than A\u2019s factual analyst actions.",
      "Dataset B contains non-corporate entity news (e.g., IMF, FDA actions) less prevalent in A\u2019s company-specific coverage.",
      "Dataset B uses casual language (e.g., 'ghost white women') and non-news content (e.g., celebrity tweets) absent in A.",
      "Dataset B highlights event-driven market reactions (e.g., Super Bowl channel loss impacting Roku) rather than A\u2019s analyst-driven updates.",
      "Dataset B includes headlines with incomplete sentences, bullet points, or promotional links, unlike A\u2019s structured formatting.",
      "Dataset B features broader macroeconomic commentary (e.g., unemployment claims, consumer debt) without direct ties to specific firms or tickers, whereas A ties metrics explicitly to companies."
    ],
    "qwen2.5-32b_few-shot_bg_test-time-info_v1": [
      "All headlines in Dataset A include at least one company stock ticker symbol (e.g., $NVDA), while Dataset B contains headlines with no ticker symbols.",
      "Dataset A headlines consistently reference specific analyst actions (e.g., upgrades, downgrades) or earnings results relative to analyst estimates, whereas Dataset B includes broader market/economic updates without analyst involvement.",
      "Dataset B includes headlines with non-financial content (e.g., social/political events, geopolitical news), while Dataset A focuses strictly on corporate financial or operational updates.",
      "Dataset A headlines always mention granular financial metrics (e.g., EPS, revenue beats/misses), while Dataset B sometimes lacks explicit quantitative financial figures.",
      "Dataset B contains headlines with incomplete sentences, truncated text (e.g., ellipses), or external links (e.g., URLs), which are absent in Dataset A.",
      "Dataset A headlines use standardized formatting (e.g., ticker first, analyst action, financial metric), while Dataset B exhibits varied structures (e.g., questions, opinions, conversational language).",
      "Dataset B includes macroeconomic indicators (e.g., commodity prices, PMI data) unrelated to specific companies, while Dataset A ties all metrics to corporate performance.",
      "All Dataset A headlines explicitly state market reactions (e.g., 'shares rise/drop'), whereas Dataset B sometimes omits direct stock price impact statements.",
      "Dataset B features headlines referencing non-corporate entities (e.g., governments, regulators) as primary actors, unlike Dataset A\u2019s company-centric focus.",
      "Dataset A headlines strictly adhere to formal financial terminology, while Dataset B occasionally uses casual language, slang, or non-standard phrases (e.g., 'Santa Claus rally')."
    ],
    "qwen2.5-7b_zero-shot_bg_v1": [
      "Headlines in Dataset B frequently include social media elements such as hashtags, links, and checkmark emojis, which are absent in Dataset A.",
      "Dataset B covers a broader range of non-equity financial instruments including commodities (e.g., gold, oil), forex pairs (e.g., AUD/USD), and cryptocurrencies (e.g., Ether), while Dataset A focuses predominantly on equities.",
      "Dataset B includes headlines about non-corporate regulatory actions (e.g., product recalls, vaping bans) unrelated to financial markets, whereas Dataset A\u2019s regulatory topics are tied to market impacts (e.g., FDA approvals for stocks).",
      "Headlines in Dataset B often present raw financial data (e.g., EPS misses, revenue figures) without contextual analyst commentary, unlike Dataset A, which ties metrics to analyst expectations or sentiment.",
      "Dataset B features geopolitical or macroeconomic events (e.g., Brexit, OPEC+ decisions) as standalone topics without direct linkage to specific companies, while Dataset A connects such events to corporate performance.",
      "Dataset B includes headlines about non-public companies or entities without stock tickers (e.g., WeWork, Uber), whereas Dataset A focuses exclusively on publicly traded firms with tickers.",
      "Headlines in Dataset B frequently reference retail investor metrics (e.g., P/E ratios, dividends) and consumer trends (e.g., holiday debt), which are rare or absent in Dataset A.",
      "Dataset B contains truncated headlines with ellipses, suggesting excerpts from longer articles or real-time feeds, unlike the polished, complete headlines in Dataset A.",
      "Dataset B covers labor market issues (e.g., gig workers affecting unemployment claims) and layoffs in non-financial sectors, topics not emphasized in Dataset A.",
      "Headlines in Dataset B mention sector-specific indices (e.g., Nasdaq-100 futures) and niche markets (e.g., luxury retail M&A) more diversely than Dataset A, which prioritizes broad indices like the S&P 500."
    ],
    "qwen2.5-32b_zero-shot_bg_train-time-info_v1": [
      "Dataset B includes headlines without explicit stock ticker symbols, whereas all samples in Dataset A contain tickers.",
      "Dataset B incorporates social media elements (e.g., hashtags, URLs, @mentions), while Dataset A avoids these entirely.",
      "Dataset B features non-corporate news impacting markets (e.g., geopolitical events, celebrity statements), unlike Dataset A\u2019s strict corporate/financial focus.",
      "Dataset B includes commodity price updates (e.g., gold, oil futures) and macroeconomic metrics, while Dataset A emphasizes company-specific earnings and analyst actions.",
      "Dataset B contains editorial/opinion-driven headlines (e.g., \"The FT View\"), whereas Dataset A maintains neutral, fact-based reporting.",
      "Dataset B uses informal language and grammatical fragments (e.g., \"Oil Prices' Hard Landing Initializing Descent\"), while Dataset A uses formal, structured phrasing.",
      "Dataset B references real-time market data (e.g., timestamps, intraday price movements), unlike Dataset A\u2019s post-event summaries.",
      "Dataset B covers broader labor market trends (e.g., gig workers, unemployment claims), whereas Dataset A focuses on company-level operational metrics.",
      "Dataset B includes non-English entities (e.g., European stocks, African retailers) without tickers, while Dataset A centers on U.S.-centric tickers and firms.",
      "Dataset B occasionally merges non-financial cultural/political events (e.g., RBG hospitalization) with market impact, a theme absent in Dataset A."
    ],
    "llama3.1-8b_zero-shot_bg_v1": [
      "Dataset B includes headlines about non-corporate entities (e.g., countries, regulators) without linking to stock-specific impacts, unlike A's focus on company-tied market reactions.",
      "Headlines in B frequently mention granular financial metrics (e.g., exact commodity prices, revenue figures) without analyst commentary or stock implications, whereas A ties metrics to analyst actions.",
      "Dataset B covers non-equity financial instruments (e.g., gold, oil futures, forex) absent in A, which focuses exclusively on equities and corporate news.",
      "B includes non-financial societal/consumer trends (e.g., holiday debt surveys, gig worker unemployment) not directly tied to market movements, unlike A's corporate-event-driven headlines.",
      "Dataset B features headlines with promotional formatting (e.g., checkmarks, hashtags, links) absent in A's standardized news style.",
      "B contains non-English characters/regional political events (e.g., Brexit, EU inflation) without explicit market sentiment links, while A emphasizes global trends affecting stocks.",
      "Headlines in B reference legal/regulatory updates (e.g., FDA bans, license recalls) without connecting them to stock sentiment, unlike A's focus on regulatory impacts on shares.",
      "Dataset B includes corporate actions (e.g., dividends, M&A) without tying them to price reactions, whereas A explicitly links such events to stock performance.",
      "B features non-analyst-driven data (e.g., earnings beats/misses without price targets) and neutral corporate updates, contrasting with A's emphasis on analyst-driven market narratives.",
      "Dataset B incorporates social/media-style content (e.g., celebrity tweets, sports references) irrelevant to financial analysis, absent in A's strictly market-focused headlines."
    ],
    "llama3.1-8b_zero-shot_v1": [
      "Dataset B headlines frequently include stock ticker symbols (e.g., $MAR, $RHHBY), while Dataset A uses company names without tickers.",
      "Dataset B contains granular financial metric specifics (e.g., EPS misses by $0.33, precise commodity price movements), whereas Dataset A reports outcomes generally (e.g., 'earnings beat').",
      "Dataset B includes dividend declarations, stock splits, and corporate debt offerings (e.g., 'Vereit files for senior notes offering'), which are absent in Dataset A.",
      "Dataset B headlines reference niche financial instruments (e.g., ETFs, futures contracts) and sector-specific funds, unlike Dataset A\u2019s focus on broad indices.",
      "Dataset B integrates informal elements like hashtags, promotional links, and social media-style commentary, absent in Dataset A\u2019s formal tone.",
      "Dataset B highlights non-macroeconomic commodity price drivers (e.g., 'Feb. gold climbs $8.90') as standalone news, while Dataset A ties commodities to macroeconomic trends.",
      "Dataset B cites minor operational updates (e.g., 'Acura Pharmaceuticals reports Q2 results') without explicit market impact, unlike Dataset A\u2019s emphasis on high-impact events.",
      "Dataset B features explicit analyst price targets and valuation metrics (e.g., 'price target raised to $335'), whereas Dataset A mentions analyst actions without numerical targets.",
      "Dataset B includes non-financial news with indirect market implications (e.g., 'Ruth Bader Ginsburg hospitalized'), unlike Dataset A\u2019s strict focus on direct financial catalysts.",
      "Dataset B covers cryptocurrency regulation and niche sectors (e.g., cannabis, vaping bans) more frequently than Dataset A."
    ],
    "llama3.3-70b_few-shot_v1": [
      "Dataset B includes headlines with promotional or non-news content (e.g., \u2705Get the best deals...) absent in A.",
      "Dataset B contains headlines with casual or conversational language (e.g., 'Cash is king...') not seen in A's formal tone.",
      "Dataset B references non-corporate entities (e.g., cities, countries) impacting markets indirectly (e.g., Lebanon's IMF turn) without direct stock ties.",
      "Dataset B includes social media-style updates (e.g., hashtags, @mentions) absent in A's traditional news format.",
      "Dataset B features headlines without explicit stock price impacts (e.g., plant closures) unlike A's strict stock-movement focus.",
      "Dataset B cites niche or non-financial events (e.g., Antonio Brown's comments) irrelevant to markets, unlike A's focused scope.",
      "Dataset B lacks numerical specificity in some headlines (e.g., 'Qiagen jumps') compared to A's consistent metrics (e.g., 'Plummets 10%').",
      "Dataset B includes forward-looking analyst opinions without immediate market reactions (e.g., 'Sees 25% Upside') vs. A's real-time earnings links.",
      "Dataset B uses ticker symbols inconsistently (e.g., '$MAR - Marriott...') mixed with non-standard formats, unlike A's uniform usage.",
      "Dataset B covers regulatory actions unrelated to stock catalysts (e.g., Uber\u2019s London license) vs. A\u2019s direct regulatory-stock links."
    ],
    "qwen2.5-32b_few-shot_bg_v1": [
      "Dataset B includes headlines with promotional or non-news content (e.g., guides, advertisements, social media links), absent in Dataset A.",
      "Dataset B contains headlines focused on non-corporate entities (e.g., geopolitical events, regulatory recalls, public health issues), while A focuses on company-specific financial metrics.",
      "Dataset B references retail-specific metrics (e.g., same-store sales, dividend yields, P/E ratios) more frequently than A, which emphasizes analyst ratings and earnings beats/misses.",
      "Dataset B includes headlines with casual language, humor, or non-financial topics (e.g., celebrity statements, sports references), unlike A's formal, finance-centric tone.",
      "Dataset B features commodity price updates (e.g., gold, oil) and macroeconomic data (e.g., unemployment claims) as standalone topics, whereas A ties these to company performance.",
      "Dataset B covers mergers, legal disputes, and regulatory filings (e.g., FDA approvals, license revocations) more prominently than A, which prioritizes analyst actions and earnings.",
      "Dataset B includes international or non-U.S. market developments (e.g., European Central Bank decisions, emerging markets) as primary topics, while A focuses on U.S. tech giants and institutional analysts.",
      "Dataset B incorporates user-generated content markers (e.g., hashtags, URLs, @mentions) and opinion columns (e.g., FT View), which are absent in A's structured headlines.",
      "Dataset B highlights dividend declarations, stock splits, and retail investor metrics (e.g., hedge fund activity) more than A, which centers on price targets and revenue guidance.",
      "Dataset B features standalone updates on non-earnings corporate actions (e.g., partnerships, project awards, dividend increases) without analyst attribution, unlike A's analyst-driven narratives."
    ],
    "qwen2.5-32b_few-shot_v1": [
      "Dataset B headlines frequently include stock tickers prefixed with '$' symbols at the start (e.g., '$MAR', '$RHHBY'), while Dataset A typically mentions tickers sparingly and without leading symbols in most samples.",
      "Dataset B contains headlines with informal elements like emojis (\u2705), hashtags (#markets), and social media-style formatting, whereas Dataset A maintains formal/professional language throughout.",
      "Dataset B includes granular financial instrument updates (e.g., 'Feb. gold climbs $8.90', 'AUD/USD Weekly Price Forecast') absent in Dataset A's macro-focused headlines.",
      "Dataset B features niche corporate actions like dividend declarations, patent litigation outcomes, and specific regulatory filings that don't appear in Dataset A's broader sector narratives.",
      "Dataset B headlines incorporate real-time trading updates (e.g., 'premarket', 'after hours', intraday price movements) unlike Dataset A's focus on completed quarterly results and scheduled events.",
      "Dataset B contains non-English characters/terms (e.g., Chinese text in stock price mentions, Portuguese names) and country-specific hyperlocal news absent from Dataset A's global perspective.",
      "Dataset B includes opinion-based statements from analysts/companies (e.g., 'BlackRock says...', 'economist predicts...') within headlines, while Dataset A presents facts without attribution.",
      "Dataset B shows frequent earnings guidance withdrawals/adjustments (e.g., 'withdraws FY20 guidance') and granular EPS misses/beats by exact cents, whereas Dataset A focuses on directional performance (beat/miss).",
      "Dataset B contains non-market content like celebrity gossip ('Antonio Brown ghosting white women') and political commentary absent from Dataset A's strictly financial focus.",
      "Dataset B features technical trading terminology (e.g., 'retracement zones', '52-week highs', 'volume analysis') missing from Dataset A's strategic/macroeconomic language."
    ],
    "qwen2.5-32b_zero-shot_bg_v1": [
      "Dataset B includes headlines with non-English text snippets and promotional content (e.g., '\u2705Get the best deals. \u2705Don't break the bank.') not present in Dataset A.",
      "Dataset B contains explicit commodity price updates (e.g., 'Feb. gold climbs $8.90, or 0.6%, to settle at $1,481.20/oz') absent in Dataset A.",
      "Dataset B features headlines about non-corporate entities (e.g., countries, cities, or regulatory bodies like 'Lebanon Turns to IMF') unlike Dataset A's company-centric focus.",
      "Dataset B includes social media tags, hashtags (e.g., '#markets #business #economy'), and direct user references (e.g., '@KristinReports') not found in Dataset A.",
      "Dataset B covers niche financial instruments (e.g., ETFs like 'First Trust Cloud Computing ETF') and non-equity assets (e.g., cryptocurrencies) absent in Dataset A.",
      "Dataset B references granular retail metrics (e.g., 'November retail sales increase less than expected') rather than corporate earnings guidance prevalent in Dataset A.",
      "Dataset B includes headlines structured as opinion pieces or commentary (e.g., 'The days of extraordinary market returns are over, says BlackRock') not seen in Dataset A's factual analyst actions.",
      "Dataset B contains explicit dividend declarations (e.g., 'INVESCO Perpetual UK Smaller Investment Trust : Dividend Declaration') absent in Dataset A.",
      "Dataset B features non-financial consumer news (e.g., 'Free agent Antonio Brown says it's time to ghost white women') irrelevant to corporate performance metrics in Dataset A.",
      "Dataset B uses colloquial trading jargon (e.g., '$KRUS fresh breakout and at resistance') and informal stock commentary lacking in Dataset A's formal analyst language."
    ],
    "qwen2.5-7b_zero-shot_v1": [
      "Dataset B headlines include precise numerical data (e.g., '$8.90, or 0.6%, to settle at $1,481.20/oz') more frequently than A, which focuses on directional trends without exact figures.",
      "Dataset B incorporates informal social media elements (e.g., hashtags, URLs, tweet-like syntax like 'Continue reading: #markets') absent in A's formal news style.",
      "Dataset B references niche or non-corporate entities (e.g., Uber\u2019s London license, specific ETFs) more often, while A centers on major indices and blue-chip companies.",
      "Dataset B includes granular financial instrument updates (e.g., 'dividend raised to $0.43/share', 'senior notes offering') not emphasized in A\u2019s broader market narratives.",
      "Dataset B features geopolitical or localized events (e.g., Brexit, Turkish banking) directly tied to market moves, whereas A focuses on macroeconomic themes like GDP or unemployment.",
      "Dataset B headlines frequently mention dividend declarations, stock price targets, and analyst actions (e.g., 'price target raised to $335') with explicit figures, unlike A\u2019s general references to 'upgrades' or 'downgrades'.",
      "Dataset B includes non-financial social/political events (e.g., RBG hospitalization, Antonio Brown comments) impacting markets, while A avoids such tangents.",
      "Dataset B uses fragmented, conversational language (e.g., 'Buying dips.', 'Wow..... Morgan Stanley 13G 6.7%') contrasting with A\u2019s polished, editorial tone.",
      "Dataset B highlights micro-level company metrics (e.g., 'Q3 adj. EPS 41 cents; FactSet consensus 40 cents') more than A, which prioritizes sector-wide performance.",
      "Dataset B contains non-English characters, truncated text (e.g., 'transit offic\u2026'), and informal abbreviations (e.g., 'revs'), whereas A maintains consistent formatting and formal terminology."
    ],
    "llama3.1-8b_zero-shot_bg_train-time-info_v1": [
      "Dataset B headlines include non-financial news elements (e.g., social commentary, geopolitical events) absent in A (e.g., 'Free agent Antonio Brown says it's time to ghost white women').",
      "Dataset B contains casual language and informal phrases (e.g., 'Santa Claus rally') compared to A's strictly formal financial terminology.",
      "Dataset B references non-equity instruments (e.g., gold, oil futures, currencies) more frequently than A, which focuses on equities.",
      "Dataset B headlines often omit explicit percentage quantification for price movements (e.g., 'Roku shares dropped') versus A's precise figures (e.g., 'slumps 15%').",
      "Dataset B includes global macroeconomic/political developments (e.g., Brexit, Chinese cash impacts) beyond A's narrower corporate/sector focus.",
      "Dataset B integrates social media elements (hashtags, handles like @KristinReports) absent in A's institutional tone.",
      "Dataset B features non-English corporate entities (e.g., 'Air Water : Notice of Execution...') unlike A's U.S.-centric tickers.",
      "Dataset B headlines lack granular analyst attribution (e.g., 'SunTrust RH' vs. A's 'Morgan Stanley upgrades to buy').",
      "Dataset B includes speculative market commentary (e.g., 'Can WTI crude fall beneath 50 cents?') not seen in A's fact-driven updates.",
      "Dataset B covers non-corporate entities (e.g., governments, NGOs) more prominently than A's exclusive corporate focus."
    ],
    "qwen2.5-7b_few-shot_v1": [
      "Dataset B headlines frequently include stock ticker symbols (e.g., $MAR, $RHHBY), while Dataset A does not use tickers in headlines.",
      "Dataset B contains fragmented or abbreviated updates (e.g., price changes, short alerts like 'Feb. gold climbs $8.90') without contextual explanations, whereas Dataset A headlines are more complete sentences with full context.",
      "Dataset B includes casual or non-financial content (e.g., 'So Halsey needs a shower', 'Free agent Antonio Brown says...') that diverges from strictly financial news, unlike Dataset A.",
      "Dataset B headlines often reference specific technical analysis terms (e.g., 'resistance levels', 'AUD/USD Weekly Price Forecast'), while Dataset A focuses on broader market trends without technical jargon.",
      "Dataset B incorporates social media elements such as user mentions (@KristinReports), hashtags (#markets), and embedded links, which are absent in Dataset A.",
      "Dataset B includes granular numerical updates (e.g., 'EPS 41 cents; FactSet consensus 40 cents') with precise metrics, whereas Dataset A emphasizes directional trends (e.g., 'beats expectations').",
      "Dataset B features niche or lesser-known companies (e.g., 'Plato Gold', 'Sally Beauty Holdings'), while Dataset A focuses on major corporations (e.g., Tesla, Apple) and macroeconomic themes.",
      "Dataset B headlines frequently mention regulatory filings (e.g., 'files for senior notes offering') and dividend declarations, whereas Dataset A emphasizes regulatory actions' market-wide impacts.",
      "Dataset B includes non-English characters and unedited text (e.g., Chinese script, truncated sentences), while Dataset A maintains formal, polished language throughout.",
      "Dataset B contains real-time trading updates (e.g., 'premarket', 'after hours') and futures/commodity price ticks, whereas Dataset A reports post-event outcomes and broader implications."
    ],
    "llama3.3-70b_few-shot_bg_v1": [
      "Dataset B includes non-English characters and phrases (e.g., Chinese political references, Portuguese names) absent in A.",
      "Dataset B contains headlines unrelated to analyst actions (e.g., Antonio Brown comments, Ruth Bader Ginsburg hospitalization) not seen in A.",
      "Dataset B features non-financial entities (e.g., Uber, FDA, political figures) as primary subjects, while A focuses strictly on corporations/institutions.",
      "Dataset B includes explicit dividend declarations and commodity price updates (e.g., gold/oil prices), which are rare/absent in A.",
      "Dataset B uses informal formatting (emojis, hashtags, URLs, Twitter handles) while A maintains formal financial reporting style.",
      "Dataset B references non-equity instruments (e.g., AUD/USD forex, U.S. Dollar Index Futures) not mentioned in A.",
      "Dataset B includes consumer-focused metrics (e.g., retail sales, holiday debt surveys) rather than purely corporate metrics as in A.",
      "Dataset B contains speculative future-tense predictions (e.g., 'seen running up 20% gain') unlike A's backward-looking analyst actions.",
      "Dataset B cites non-analyst sources (e.g., Reuters, Seeking Alpha, social media) whereas A exclusively uses institutional analysts.",
      "Dataset B addresses niche sectors (e.g., vaping bans, peanut allergy drugs) absent in A's tech/energy/auto focus."
    ],
    "llama3.1-8b_few-shot_v1": [
      "Dataset B headlines frequently include stock tickers and specific price targets (e.g., $MAR, $335 price target), while Dataset A uses formal company names without ticker symbols.",
      "Dataset B contains informal social media elements (e.g., hashtags, emojis, links) and conversational phrases (e.g., 'So Halsey needs a shower'), whereas Dataset A maintains a formal, traditional news tone.",
      "Dataset B emphasizes granular financial metrics (e.g., EPS misses by $0.33, dividend declarations) in headlines, while Dataset A focuses on broader earnings trends (e.g., 'Amazon's Earnings Soar Past Expectations').",
      "Dataset B includes opinion-based statements (e.g., 'The days of extraordinary market returns are over, says BlackRock'), whereas Dataset A headlines are strictly factual and event-driven.",
      "Dataset B features technical trading terms (e.g., 'resistance levels,' 'AUD/USD Weekly Price Forecast'), while Dataset A avoids market-specific jargon and targets general investors.",
      "Dataset B headlines often reference minor corporate actions (e.g., dividend hikes, senior notes offerings) and niche regulatory updates (e.g., product recalls), whereas Dataset A highlights macroeconomic policies and geopolitical events.",
      "Dataset B includes headlines structured as real-time updates (e.g., '9:45 am $SPY 272.99') or intraday trading data, absent in Dataset A's broader market summaries.",
      "Dataset B contains non-news content like advertisements (e.g., 'Get the best deals...'), blog excerpts, and investor commentary, while Dataset A is exclusively news-focused.",
      "Dataset B disproportionately covers hyper-specific industry developments (e.g., 'Lithium-Ion Battery Tech Breakthrough') compared to Dataset A's emphasis on sector-wide trends (e.g., energy sector developments).",
      "Dataset B headlines frequently cite niche or regional companies (e.g., 'Sally Beauty Holdings,' 'Plato Gold'), whereas Dataset A focuses on major global corporations (e.g., Apple, Amazon, Microsoft)."
    ],
    "llama3.1-8b_few-shot_bg_train-time-info_v1": [
      "Dataset A headlines consistently start with the stock ticker using the '$' symbol, while Dataset B headlines variably place tickers within the text or omit them entirely.",
      "Dataset A always references specific analyst firms (e.g., Morgan Stanley, Deutsche Bank), whereas Dataset B sometimes lacks explicit analyst/firm mentions.",
      "Dataset B includes non-professional elements (e.g., checkmarks \u2705, hashtags #markets, hyperlinks), absent in Dataset A.",
      "Dataset B headlines emphasize geopolitical/global events (e.g., Brexit, U.K. regulations), while Dataset A focuses on corporate-specific developments.",
      "Dataset A headlines frequently mention dividend announcements or consecutive payments, while Dataset B does not reference dividends.",
      "Dataset A provides granular financial metrics (e.g., 'EPS misses by $0.33'), whereas Dataset B often reports earnings/results without exact figures.",
      "Dataset B highlights broader regulatory actions (e.g., Uber\u2019s license revocation) affecting markets, while Dataset A\u2019s regulatory mentions are company-specific (e.g., FDA approvals).",
      "Dataset B references non-corporate entities (e.g., governments, central banks) impacting markets, unlike Dataset A, which centers on corporate entities and analysts.",
      "Dataset A consistently states precise price targets (e.g., 'target cut to $95'), while Dataset B often describes price movements qualitatively (e.g., 'slides 5%').",
      "Dataset A headlines follow a structured format (ticker + analyst action + metric), while Dataset B uses varied structures, including opinion pieces or macroeconomic commentary."
    ],
    "qwen2.5-7b_few-shot_bg_v1": [
      "Dataset B includes headlines with non-standard formatting elements like hashtags, emojis, or embedded links (e.g., '#markets #business #economy', '\u2705Get the best deals') absent in A.",
      "Dataset B contains headlines focused on non-analyst-driven corporate events (e.g., plant closures, regulatory recalls) without explicit financial metrics or analyst actions.",
      "Dataset B features headlines with incomplete or truncated text (e.g., 'Caterpillar Inc.'s 2020 profit outlook trailed analysts' estimates, with the heavy-equipment saying continued econo...') due to formatting issues.",
      "Dataset B includes headlines with non-English characters or phrases (e.g., Chinese text) not observed in A.",
      "Dataset B covers broader consumer behavior or societal trends (e.g., holiday debt surveys, gig worker impacts) rather than direct corporate financial performance.",
      "Dataset B references commodity-specific price movements (e.g., gold, oil futures) more frequently than A, which focuses on equities.",
      "Dataset B contains headlines with conversational or opinion-based statements (e.g., 'Cash is king') from non-analyst sources like economists or general commentators.",
      "Dataset B includes headlines about non-corporate entities (e.g., countries, regulatory bodies) impacting markets (e.g., Lebanon\u2019s IMF talks, Turkish banks).",
      "Dataset B reports on legal, health, or geopolitical events (e.g., FDA bans, Uber\u2019s license loss) without tying them to stock-specific financial outcomes.",
      "Dataset B features headlines with irregular structural patterns (e.g., social media handles, partial sentences) compared to A\u2019s formulaic analyst-action-driven structure."
    ],
    "llama3.1-8b_few-shot_bg_v1": [
      "Dataset B includes headlines without explicit stock ticker mentions, covering general economic events (e.g., coal plant closures, oil reserves), whereas A consistently references tickers.",
      "Dataset B incorporates informal or non-standard elements (e.g., emojis, casual language like 'So Halsey needs a shower'), while A maintains formal, structured financial reporting.",
      "Dataset B features non-equity asset coverage (e.g., gold prices, oil futures), whereas A focuses exclusively on equities and company-specific metrics.",
      "Dataset B includes headlines with truncated text or social media-style formatting (e.g., hashtags, ellipses), indicating real-time or platform-specific sourcing, unlike A's polished headlines.",
      "Dataset B references niche or non-traditional financial entities (e.g., crypto exchanges, specific ETFs) not prominently featured in A, which centers on major corporations and indices.",
      "Dataset B contains headlines on non-financial events impacting markets indirectly (e.g., celebrity statements, political figures' health), while A ties events directly to financial implications.",
      "Dataset B highlights localized or regional economic developments (e.g., real estate in Los Angeles, Turkish banks) without global macroeconomic framing, unlike A's global emphasis.",
      "Dataset B includes raw data snippets (e.g., 'Feb. gold climbs $8.90') without analyst interpretation, whereas A pairs metrics with expert analysis or reactions.",
      "Dataset B features non-English characters or region-specific content (e.g., Czech Republic reports, Hong Kong protests), absent in A\u2019s U.S.-centric, English-only headlines.",
      "Dataset B covers granular corporate actions (e.g., dividend declarations, retail sales) without tying them to analyst ratings, while A explicitly links events to analyst actions."
    ],
    "llama3.3-70b_zero-shot_v1": [
      "Dataset B includes explicit stock ticker symbols (e.g., $MAR, $RHHBY) in headlines, while Dataset A does not reference tickers.",
      "Dataset B headlines frequently cite precise numerical metrics (e.g., EPS misses by $0.33, gold prices at $1,481.20/oz), whereas Dataset A uses generalized terms like 'beats expectations' without exact figures.",
      "Dataset B incorporates non-financial news (e.g., Antonio Brown\u2019s statements, Ruth Bader Ginsburg\u2019s hospitalization) that indirectly impact markets, absent in Dataset A\u2019s strictly financial focus.",
      "Dataset B highlights regulatory/legal developments (e.g., FDA bans, Uber\u2019s license revocation) as market drivers, while Dataset A emphasizes macroeconomic or earnings-related catalysts.",
      "Dataset B features dividend declarations and specific distributions (e.g., 'declares $0.1404 dividend'), whereas Dataset A focuses on earnings outcomes and stock price reactions.",
      "Dataset B includes localized or regional economic events (e.g., Chinese cash drying up in LA real estate), while Dataset A centers on broader national/global macroeconomic trends.",
      "Dataset B uses informal elements like hashtags (#markets), social media handles (@KristinReports), and conversational language, unlike Dataset A\u2019s formal headline structure.",
      "Dataset B reports company-specific operational updates (e.g., layoffs, mergers, product suspensions) beyond earnings, while Dataset A prioritizes earnings reports and stock price movements.",
      "Dataset B frequently references commodity price movements (e.g., oil, gold) with exact values, absent in Dataset A\u2019s focus on equity markets.",
      "Dataset B includes analyst actions with explicit price targets (e.g., 'raised to $335') and ratings changes, whereas Dataset A mentions analyst sentiment in general terms."
    ],
    "llama3.1-8b_few-shot_bg_test-time-info_v1": [
      "Dataset B includes headlines without explicit stock ticker symbols (e.g., general market updates like 'Oil futures fall below $50'), whereas Dataset A consistently uses tickers in all headlines.",
      "Dataset B features more headlines focused on macroeconomic commentary (e.g., 'The days of extraordinary market returns are over') compared to Dataset A, which ties macroeconomic indicators directly to stock/ETF impacts (e.g., '$SPY falls 1.2% amid escalating trade tensions').",
      "Dataset B contains headlines with non-financial social/political events (e.g., 'Free agent Antonio Brown says it's time to ghost white women') that are absent in Dataset A, which maintains a strict focus on corporate/financial developments.",
      "Dataset B includes more international regulatory actions (e.g., 'UK regulator orders recall of fire-risk Whirlpool washing machines') compared to Dataset A, which emphasizes U.S.-centric regulatory news (e.g., FDA actions).",
      "Dataset B frequently references dividend announcements as reassurances during poor performance (e.g., 'Suncor's dividend boost reassures analysts amid weak Q4'), while Dataset A highlights dividend changes as standalone strategic updates (e.g., 'Kimco Realty to Boost Dividend by 5%').",
      "Dataset B uses sensationalist language in headlines (e.g., 'Tesla Tops $900 in Parabolic Surge') more often than Dataset A, which employs neutral, fact-based phrasing for analyst actions and earnings.",
      "Dataset B incorporates consumer-focused surveys (e.g., '1 in 6 shoppers are still paying off last year's holiday debt') absent in Dataset A, which prioritizes institutional analyst perspectives.",
      "Dataset B includes explicit geopolitical event price reactions (e.g., 'Gold rises' tied to tensions) without always naming specific assets, whereas Dataset A directly links events to tickers (e.g., '$UNG up 15% amid rising demand').",
      "Dataset B covers cryptocurrency/blockchain developments (e.g., 'Ether Turns Negative For the Year') absent in Dataset A, which focuses strictly on traditional equity markets.",
      "Dataset B features mergers/acquisitions framed as growth strategies for sectors (e.g., 'M&As in vogue for luxury retailers seeking growth'), while Dataset A emphasizes specific corporate transactions (e.g., 'Enbridge Energy Partners Merge to Create $130B Company')."
    ],
    "qwen2.5-7b_few-shot_bg_test-time-info_v1": [
      "Dataset B includes headlines with non-English characters or multilingual content (e.g., Chinese text, emojis), while Dataset A headlines are uniformly in English without symbolic annotations.",
      "Dataset B contains promotional or non-news content (e.g., '\u2705Get the best deals...'), whereas Dataset A is strictly focused on factual financial updates.",
      "Dataset B references real-time market price movements (e.g., 'Feb. gold climbs $8.90...') more frequently, while Dataset A focuses on analyst-driven price targets or ratings.",
      "Dataset B incorporates social media elements (e.g., hashtags, mentions like '@KristinReports'), absent in Dataset A's formalized financial reporting style.",
      "Dataset B includes broader macroeconomic commentary (e.g., 'The days of extraordinary market returns are over...'), while Dataset A emphasizes granular company-specific financial metrics.",
      "Dataset B features editorial/opinion pieces (e.g., 'The FT View: Jair Bolsonaro...'), unlike Dataset A's neutral, analyst-driven tone.",
      "Dataset B covers non-corporate entities (e.g., geopolitical events, regulatory bans without tickers), while Dataset A ties all updates to specific companies/tickers.",
      "Dataset B includes URLs, truncated text (e.g., '...'), and incomplete sentences, whereas Dataset A uses complete, structured headlines.",
      "Dataset B highlights dividend declarations, stock splits, or M&A explorations without analyst attribution, while Dataset A explicitly links such updates to analyst actions.",
      "Dataset B integrates non-financial cultural/political events (e.g., 'Ruth Bader Ginsburg hospitalized...'), absent in Dataset A's strictly financial scope."
    ],
    "llama3.3-70b_few-shot_bg_test-time-info_v1": [
      "Dataset B headlines include non-financial news (e.g., regulatory actions, political events) unrelated to analyst ratings or earnings, while A strictly focuses on company-specific financial metrics.",
      "Dataset B contains truncated sentences, social media handles (@KristinReports), URLs, and non-English characters (\u2705), which are absent in A's polished headlines.",
      "Dataset B references cultural/pop culture events (e.g., Antonio Brown's comments, Super Bowl accessibility) impacting stocks, unlike A's purely financial focus.",
      "Dataset B features macroeconomic commentary (e.g., 'Cash is king,' oil demand trends) without direct ties to specific companies, whereas A links all analysis to tickers/firms.",
      "Dataset B includes non-corporate entities (e.g., countries like Lebanon, currencies like AUD/USD) as focal points, while A centers exclusively on companies/ETFs/indices.",
      "Dataset B headlines often lack granular financial metrics (e.g., 'EPS misses by $0.33' vs. A's 'beats analyst expectations with $X revenue') and consensus comparisons.",
      "Dataset B incorporates sensationalized language (e.g., 'parabolic surge,' 'ghost white women') absent in A's neutral, formulaic tone.",
      "Dataset B covers non-earnings corporate updates (e.g., dividend declarations, IPO debuts) without analyst context, whereas A ties all news to ratings/targets.",
      "Dataset B includes non-structured data (e.g., stock index price levels, '% up/down' stats) without explicit analyst attribution, unlike A's firm-centric framing.",
      "Dataset B headlines frequently address geopolitical risks (e.g., Brexit, Brazil\u2019s economy) as market drivers, while A focuses on company-specific operational risks."
    ],
    "llama3.1-8b_zero-shot_bg_test-time-info_v1": [
      "Dataset B includes non-financial news headlines (e.g., celebrity gossip, political events) absent in Dataset A, which focuses strictly on company/stock-specific updates.",
      "Dataset B contains headlines with non-English entities or global geopolitical events (e.g., U.K. elections, Brazil\u2019s economy), whereas Dataset A emphasizes U.S.-centric corporate actions.",
      "Dataset B features commodity price updates (e.g., gold, oil) as standalone headlines, while Dataset A ties commodities to stock/ETF performance (e.g., $XLE).",
      "Dataset B uses informal language, hashtags, or social media-style formatting (e.g., 'Continue reading: #markets'), unlike Dataset A\u2019s formal, structured headlines.",
      "Dataset B includes non-corporate regulatory actions (e.g., FDA vaping bans, London Uber license revocation) without explicit stock tickers, unlike Dataset A\u2019s company-specific regulatory mentions.",
      "Dataset B headlines often lack analyst attribution (e.g., 'Oil Prices' Hard Landing...') compared to Dataset A, which consistently cites firms like Morgan Stanley or Oppenheimer.",
      "Dataset B covers macroeconomic surveys or consumer trends (e.g., holiday debt, unemployment claims) without linking them to specific equities, unlike Dataset A\u2019s direct stock-impact framing.",
      "Dataset B includes dividend declarations, merger updates, or earnings results without analyst commentary (e.g., 'BJ's Wholesale Q3 adj. EPS 41 cents'), whereas Dataset A integrates analyst reactions.",
      "Dataset B references non-equity financial instruments (e.g., forex pairs like AUD/USD) absent in Dataset A, which focuses on stocks/ETFs.",
      "Dataset B occasionally features trivial or non-actionable market data (e.g., '9:45 am...$SPY 272.99') lacking in Dataset A\u2019s analytical depth."
    ],
    "qwen2.5-7b_few-shot_bg_train-time-info_v1": [
      "Dataset B includes non-English characters, emojis, or social media-style elements (e.g., \u2705, hashtags, truncated text), while Dataset A uses formal, standardized language without such features.",
      "Headlines in Dataset B frequently reference macroeconomic indicators (e.g., gold prices, retail sales, oil reserves) or geopolitical events without tying them to specific companies, whereas Dataset A always links such impacts to explicit corporate entities or tickers.",
      "Dataset B contains headlines about non-equity assets (e.g., commodities like gold, currencies like AUD/USD) and broader market trends, while Dataset A focuses exclusively on equities and company-specific financial metrics.",
      "Dataset B includes opinion pieces, educational content, or analytical summaries (e.g., 'Here's Why You Need To Hedge...'), whereas Dataset A headlines are strictly factual reports of analyst actions or corporate financial results.",
      "Some headlines in Dataset B lack explicit ticker symbols or omit company names entirely (e.g., 'Another coal-fired power plant closes...'), while all Dataset A headlines include tickers prefixed with '$'.",
      "Dataset B features truncated or incomplete headlines (e.g., 'Caterpillar Inc.'s 2020 profit outlook trailed analysts' estimates, with the heavy-equipment saying continued econo...'), while Dataset A maintains complete, polished sentences.",
      "Dataset B incorporates social media or conversational language (e.g., '$AIKI my position remains intact. Buying dips.'), unlike Dataset A\u2019s formal, third-person reporting tone.",
      "Regulatory or non-corporate news in Dataset B (e.g., FDA vaping bans, Uber\u2019s London license) often lacks direct ties to stock-specific impacts, whereas Dataset A always connects regulatory events to company performance or ratings.",
      "Dataset B includes non-earnings financial data (e.g., dividend declarations, retail sales figures, consumer confidence indices) as standalone topics, while Dataset A centers on earnings reports, guidance, and analyst actions.",
      "Dataset B references non-financial events (e.g., celebrity statements, political developments) with indirect market implications, whereas Dataset A headlines are strictly financial in scope."
    ],
    "llama3.3-70b_zero-shot_bg_test-time-info_v1": [
      "Dataset B headlines frequently include non-analyst-driven content such as geopolitical events, regulatory actions unrelated to analyst ratings, and consumer trends (e.g., coal plant closures, vaping bans, holiday debt), while Dataset A focuses strictly on analyst actions (upgrades/downgrades) and corporate financial metrics.",
      "Dataset B incorporates non-standard formatting elements like emojis (\u2705), hashtags (#markets), and social media references (@KristinReports), whereas Dataset A maintains a uniform, formal structure without visual or social media markers.",
      "Dataset B includes macroeconomic or commodity-specific numerical data (e.g., gold prices, oil futures) and broader economic indicators (retail sales), while Dataset A emphasizes company-specific metrics (revenue, stock prices, earnings targets).",
      "Dataset B features headlines with forward-looking opinions or speculative statements (e.g., \"The days of extraordinary market returns are over\"), whereas Dataset A focuses on factual, time-sensitive updates (earnings releases, conference dates).",
      "Dataset B covers non-corporate entities (e.g., governments, regulatory bodies) and non-public companies (e.g., Uber\u2019s license issues), while Dataset A exclusively references publicly traded companies and their tickers.",
      "Dataset B includes headlines structured as advice or analysis pieces (e.g., \"Here's Why You Need To Hedge Your Stock Investments\"), whereas Dataset A maintains a strictly informational, news-driven tone.",
      "Dataset B references non-earnings corporate events (lawsuits, mergers, product recalls) unrelated to financial performance, while Dataset A centers on earnings reports, price targets, and analyst ratings.",
      "Dataset B contains headlines with fragmented formatting (line breaks, ellipses) and incomplete sentences, while Dataset A uses complete, concise sentences typical of financial journalism.",
      "Dataset B mentions non-equity financial instruments (e.g., futures, ETFs, commodities) more frequently, whereas Dataset A focuses on individual stocks and sector-specific equities.",
      "Dataset B includes international or non-U.S.-centric content (e.g., Brazil\u2019s economy, European shares) more prominently, while Dataset A emphasizes U.S. companies and analyst firms (Morgan Stanley, Barclays)."
    ]
  }
}