{
  "sims": {
    "google_maps": {
      "nnetnav_live_site=google_maps_num_tasks=75_portion=2": [
        "Tasks involve searching for specific businesses or services (e.g., restaurants, hotels, stores).",
        "Tasks require filtering results by location (e.g., zip codes, cities, landmarks).",
        "Tasks include route planning (e.g., driving, walking, transit directions between points).",
        "Tasks involve checking real-time availability or status (e.g., open now, traffic conditions).",
        "Tasks require sorting or prioritizing results by criteria (e.g., highest ratings, proximity).",
        "Tasks focus on extracting detailed attributes (e.g., pricing, amenities, operating hours).",
        "Tasks include multi-step actions (e.g., search \u2192 filter \u2192 book/share/compare).",
        "Tasks involve transactional actions (e.g., reservations, bookings, sharing links).",
        "Tasks target accessibility-related features (e.g., wheelchair access, transit stops).",
        "Tasks require parsing location-specific information (e.g., addresses, coordinates, public transit details)."
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=3": [
        "Tasks involve searching for specific locations or services using geographic parameters like city names, zip codes, or landmarks.",
        "Users must filter results by criteria such as ratings, price, hours of operation, or accessibility (e.g., wheelchair-friendly).",
        "Multi-step actions are required, such as locating a place followed by retrieving directions, reviews, or booking options.",
        "Navigation tasks require route planning between two or more points (e.g., driving, walking, public transit).",
        "Proximity-based queries (e.g., 'nearest,' 'closest') are central to many tasks.",
        "Tasks demand extraction of quantitative or qualitative details (e.g., number of results, amenities, reviews).",
        "Dynamic data like real-time traffic conditions, open/closed status, or availability checks are implied.",
        "Users interact with contextual filters (e.g., 'open now,' 'not 24 hours') to refine searches.",
        "Tasks involve cross-referencing multiple data points (e.g., combining location, reviews, and pricing).",
        "Actions extend beyond search to include sharing, saving, or booking (e.g., hotel reservations, ride bookings)."
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=1": [
        "Both datasets require users to perform location-based searches using specific geographic identifiers (e.g., zip codes, city names, landmarks).",
        "Tasks in both datasets involve route planning between multiple points (e.g., cities, landmarks, or custom coordinates).",
        "Both include tasks that filter results by criteria like business type, accessibility features, ratings, price range, or operating hours.",
        "Real-time or dynamic information retrieval is required (e.g., availability checks, current traffic conditions, open/closed status).",
        "Tasks necessitate interaction with user reviews, ratings, or popularity metrics to prioritize results.",
        "Both datasets involve multi-modal transportation options (e.g., walking, driving, transit) for route planning.",
        "Tasks require verifying or extracting detailed place attributes (e.g., parking availability, amenities, accessibility features).",
        "Both include tasks that demand parsing or synthesizing layered information (e.g., combining weather, traffic, and business details).",
        "Tasks involve iterative refinement of search results (e.g., filtering, sorting, or paginating through options).",
        "Both datasets require outputting structured details from map data (e.g., addresses, distances, time estimates, business names)."
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=0": [
        "Both datasets involve tasks requiring location-based searches (e.g., businesses, landmarks, routes).",
        "Tasks in both datasets require route planning between two or more geographic points.",
        "Both include filtering results by specific criteria (e.g., ratings, price ranges, accessibility features, real-time availability).",
        "Tasks demand retrieving operational details (e.g., hours, parking availability, amenities).",
        "Both datasets require users to parse and compare multiple search results (e.g., hotels, restaurants).",
        "Tasks involve validating real-time or dynamic information (e.g., traffic conditions, open/closed status).",
        "Both require interacting with user-generated content (e.g., reviews, ratings, photos).",
        "Tasks involve booking or reservation workflows with constraints (e.g., dates, budgets, guest counts).",
        "Both datasets include tasks requiring navigation metadata (e.g., distance, travel time, transport modes).",
        "Tasks in both datasets rely on map-based exploration to verify spatial relationships (e.g., proximity, routes)."
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=4": [
        "Tasks involve searching for specific businesses or points of interest (e.g., restaurants, hotels, stores) with contextual filters like location, ratings, or operational status.",
        "Navigation tasks require route planning between two or more locations using transportation modes (e.g., driving, walking, transit).",
        "Users frequently filter results by geographic proximity (e.g., 'near me,' zip codes, landmarks).",
        "Tasks demand verification of real-time or dynamic information (e.g., open hours, traffic conditions, availability).",
        "Queries include accessibility or usability criteria (e.g., wheelchair accessibility, parking availability).",
        "Users seek granular details about locations, such as amenities (e.g., parking lots), price ranges, or user reviews.",
        "Tasks often involve multi-step actions (e.g., search \u2192 filter \u2192 compare \u2192 retrieve details).",
        "There is a focus on comparative analysis (e.g., 'highest-rated,' 'shortest route,' 'most affordable').",
        "Tasks require parsing structured map data (e.g., labels, distances, travel times, elevation profiles).",
        "Users interact with location-based metadata (e.g., photos, menus, reservation systems, sharing links)."
      ]
    },
    "github": {
      "nnetnav_live_site=github_num_tasks=71_portion=3": [
        "Tasks require comparing features and pricing between GitHub plans (e.g. Free vs Pro, Copilot tiers)",
        "Navigation involves identifying repository attributes like stars, programming language, and update recency",
        "Tasks require locating product-specific pages (Copilot, Advanced Security, Codespaces)",
        "Users need to access security-related information (advisories, vulnerabilities, scanning features)",
        "Tasks involve finding customer success stories/case studies from organizations",
        "Navigation requires understanding GitHub's organizational structure (policies, terms, compliance)",
        "Tasks demand accessing educational resources (student packs, teacher tools, documentation)",
        "Users must locate system status/incident history pages for reliability checks",
        "Tasks require interaction with project management features (Issues, Projects, workflow automation)",
        "Navigation involves comparing AI-powered development tools versus traditional features"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=2": [
        "Tasks require locating repositories filtered by programming language, stars, or activity recency",
        "Tasks involve comparing pricing plans (e.g., Free vs Pro, Copilot tiers)",
        "Tasks require identifying GitHub Copilot features and implementation details",
        "Tasks demand navigation to customer success stories/case studies",
        "Tasks involve security-related navigation (advisories, Dependabot, Advanced Security)",
        "Tasks require API documentation lookup (REST API vs GraphQL)",
        "Tasks involve enterprise solution exploration (pricing quotes, Advanced Security)",
        "Tasks require open-source project analysis (contributors, licenses, popularity metrics)",
        "Tasks involve account management actions (sign-up flows, enterprise plan enrollment)",
        "Tasks require identification of trending/popular content (top repositories, ranked developers)"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=0": [
        "Tasks require locating repositories with specific criteria (stars, language, recency)",
        "Tasks involve comparing pricing tiers or plan features (e.g. Free vs Pro)",
        "Tasks require identifying product features from marketing/content pages",
        "Tasks involve security-related information gathering (vulnerabilities, compliance, data policies)",
        "Tasks require navigation between core platform sections (Copilot, Actions, Issues, Pricing)",
        "Tasks involve analyzing repository metadata (commits, releases, contributors)",
        "Tasks require understanding AI tool capabilities (Copilot features, extensions, autofix)",
        "Tasks involve account management actions (sign-up, configuration, policy review)",
        "Tasks require finding educational/developer resources (documentation, guides, tutorials)",
        "Tasks involve filtering/search mechanisms (language filters, date ranges, popularity sorts)"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=4": [
        "Tasks require navigating GitHub's repository search functionality",
        "Tasks involve comparing GitHub pricing plans and features",
        "Tasks focus on locating GitHub Copilot product information",
        "Tasks necessitate accessing GitHub's customer stories section",
        "Tasks require filtering repositories by programming language and stars",
        "Tasks involve finding security-related information and CVEs",
        "Tasks require understanding GitHub's account management features",
        "Tasks involve exploring GitHub's documentation and API resources",
        "Tasks require identifying repository maintenance patterns (last updated dates)",
        "Tasks involve analyzing GitHub's project management tools (Issues, Projects)"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=1": [
        "Tasks require locating and interpreting repository metadata (stars, creation date, language) for open-source projects.",
        "Tasks involve comparing features or limitations across different GitHub service tiers (Free vs Pro vs Enterprise).",
        "Tasks necessitate navigating GitHub's pricing pages to identify plan-specific capabilities.",
        "Tasks require searching/filtering repositories by programming language, topic tags, or update recency.",
        "Tasks involve extracting information from GitHub product pages (Copilot features, security tools, project management capabilities).",
        "Tasks require identifying technical documentation elements (wiki pages, release notes, issue trackers) within repositories.",
        "Tasks involve analyzing GitHub's marketing content (customer stories, benchmark statistics, product comparisons).",
        "Tasks require interaction with GitHub's authentication flows (sign-in/sign-up elements) for account-specific features.",
        "Tasks necessitate understanding GitHub's organizational structure (repository vs organization vs enterprise account levels).",
        "Tasks involve temporal filtering constraints (last week/month/year) for repository activity and feature updates."
      ]
    },
    "espn": {
      "nnetnav_live_site=espn_num_tasks=62_portion=0": [
        "Tasks focus on retrieving real-time or recent sports scores and game outcomes across leagues like NBA, NFL, and NCAA.",
        "Navigation requires accessing structured league-specific sections (e.g., NBA, NHL, NCAAW) for scores, standings, or schedules.",
        "Users frequently search for player-specific statistics (e.g., points, rebounds, passing yards) within games or seasons.",
        "Tasks involve identifying broadcast/streaming details (e.g., ESPN+, TNT) for live events or replays.",
        "Both datasets emphasize locating postseason/tournament information (e.g., Final Four, CFP brackets, Copa del Rey).",
        "Users interact with dynamic content like live game clocks, quarter/period updates, and in-progress scoreboards.",
        "Navigation includes cross-referencing team records (e.g., 30-5, 12-2) and head-to-head matchup histories.",
        "Tasks require parsing event metadata (e.g., game times, postponements, final/OT status) for accuracy.",
        "Users utilize search functionalities to filter teams by location, name keywords (e.g., 'Los Angeles', 'Golden'), or league affiliation.",
        "Both datasets involve accessing injury reports, roster details, and depth charts for fantasy sports or betting contexts."
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=4": [
        "Both datasets involve navigation tasks focused on retrieving live scores, game results, and team standings across multiple sports leagues.",
        "Tasks require interaction with dynamic content such as real-time game updates, box scores, and player statistics.",
        "Users must navigate through hierarchical menus to access specific sports categories (e.g., NBA, NHL, NCAA).",
        "Search functionality is critical for locating teams, players, or articles in both datasets.",
        "Navigation paths include accessing playoff brackets, tournament challenges, and postseason tracking features.",
        "Tasks involve identifying injured players or depth charts within team-specific pages.",
        "Both datasets require interaction with fantasy sports sections, including bracket predictions and player performance analysis.",
        "Users must parse multimedia content links (e.g., ESPN Radio, podcasts) embedded in sports articles.",
        "Navigation includes filtering by date/time for recent games or upcoming schedules.",
        "Tasks necessitate cross-referencing league standings with team performance metrics in both datasets."
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=1": [
        "Both datasets involve navigation tasks focused on retrieving real-time sports scores and game results across multiple leagues (NBA, NFL, NHL, NCAA, etc.).",
        "Tasks in both datasets require interaction with dynamic content such as live game updates, standings, and player statistics.",
        "Users must locate and interpret event-specific details (e.g., final scores, game times, team rankings) from structured lists or modules.",
        "Navigation paths involve accessing league-specific sections (e.g., NBA, EPL, Serie A) via categorized menus or headings.",
        "Both datasets include tasks that require identifying team abbreviations/logos and mapping them to full team names.",
        "Tasks frequently involve filtering content by date (e.g., 'latest games,' 'past 2 days') or competition stage (e.g., playoffs, finals).",
        "Users must interact with clickable elements (e.g., 'Gamecast,' 'Box Score') to retrieve granular match details.",
        "Both datasets emphasize cross-sport navigation (e.g., switching between basketball, football, hockey) within a single platform.",
        "Tasks require parsing composite data formats (e.g., scoreboards with team logos, abbreviated stats, time-sensitive statuses like 'Final/OT').",
        "Both involve accessing supplementary content like news articles, injury reports, or MVP predictions linked to specific games/players."
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=3": [
        "Both datasets focus on retrieving real-time sports scores and game results across multiple leagues (NBA, NFL, NHL, NCAA).",
        "Tasks require navigation to team-specific pages for details like rosters, standings, and player statistics.",
        "Users must interact with league-specific sections (e.g., NBA, NCAAF) to find schedules and matchups.",
        "Both involve accessing post-game details such as top performers, final scores, and game highlights.",
        "Tasks necessitate using search functionalities to locate teams/players by keywords (e.g., 'Golden,' 'Los Angeles').",
        "Navigation to playoff/bracket information and tournament challenges (e.g., March Madness) is common in both.",
        "Both require filtering temporal data (e.g., 'latest games,' 'past 2 days,' 'yesterday's matchups').",
        "Users must compare team/player stats across games or seasons in both datasets.",
        "Interaction with multimedia content (podcasts, live streams, ESPN+ tools) is required in tasks from both sets.",
        "Both datasets include tasks that involve parsing standings, conference rankings, and divisional hierarchies."
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=2": [
        "Tasks involve locating real-time or recent game scores across multiple sports leagues (e.g., NBA, NFL, NHL, NCAA).",
        "Navigation requires accessing team-specific pages for standings, schedules, or player stats (e.g., Buffalo Bills, Lakers, Red Sox).",
        "Users frequently search for league-specific structures like divisions, conferences, or playoff brackets (e.g., AFC East, Final Four).",
        "Tasks demand interaction with dynamic content such as live updates, play-by-play summaries, or gamecast data.",
        "Navigation includes filtering results by date (e.g., 'latest games,' 'next match,' 'Week 16 scores').",
        "Users retrieve player-specific performance metrics (e.g., Josh Allen\u2019s stats, Kevin Durant\u2019s trades).",
        "Tasks involve cross-referencing multiple data points (e.g., comparing team records, aggregating injury reports).",
        "Navigation paths include accessing fantasy sports tools or tournament challenges (e.g., bracket checks, fantasy stats).",
        "Users seek broadcast/streaming information (e.g., 'Where to Watch,' ESPN+ features, channel listings).",
        "Tasks require parsing structured data formats like scoreboards, standings tables, or play-by-play logs."
      ]
    },
    "huggingface": {
      "nnetnav_live_site=huggingface_num_tasks=76_portion=1": [
        "Tasks require searching/filtering models/datasets by criteria like recency, popularity, or domain (e.g. NLP, translation)",
        "Navigation involves interacting with documentation pages for API usage (e.g. Trainer API, Inference API)",
        "Tasks require identifying model metadata like update dates, license types, and evaluation metrics",
        "Users need to compare model performance statistics (download counts, likes, metrics)",
        "Tasks involve finding technical specifications like model architectures, tensor types, or hardware requirements",
        "Navigation requires accessing GitHub repositories and understanding code integration (e.g. Colab, Transformers.js)",
        "Tasks focus on exploring multimodal capabilities (text, image, audio) across different model types",
        "Users must locate licensing information and commercial usage terms for models/datasets",
        "Tasks require interacting with community features (forums, blog posts, Spaces applications)",
        "Navigation involves understanding model deployment options (GPU usage, Inference Endpoints, pricing)"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=0": [
        "Tasks require searching for models/datasets using specific criteria (license, modality, recency, performance metrics).",
        "Users must navigate documentation to find API usage, configuration parameters, or training methodologies.",
        "Tasks involve comparing models/datasets based on popularity (downloads/likes) or technical specifications.",
        "Users need to interpret model/dataset metadata (update dates, creator info, size, tensor types).",
        "Tasks require interaction with Hugging Face ecosystem components (Inference API, Spaces, Transformers library).",
        "Users must identify licensing information and usage restrictions for models/datasets.",
        "Tasks involve connecting technical specifications to real-world applications (medical NLP, fake news detection).",
        "Users need to cross-reference research papers/arXiv entries with implementations on the platform.",
        "Tasks require understanding of model evaluation methodologies and benchmark performance metrics.",
        "Users must navigate between organizational profiles and their contributed resources (models/datasets)."
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=4": [
        "Tasks require searching for models/datasets by specific criteria (license, task type, recency, popularity).",
        "Navigation involves using the Inference API for text/image generation or similarity checks.",
        "Tasks involve extracting model/documentation details (parameters, metrics, update dates, usage instructions).",
        "Users need to locate enterprise-related features (pricing tiers, security controls, support options).",
        "Tasks require identifying trending/community metrics (likes, downloads, GitHub stars, follower counts).",
        "Navigation includes filtering by technical specifications (model size, tensor type, precision levels).",
        "Tasks involve troubleshooting API/configuration errors (e.g., 'Task not found' errors, parameter tuning).",
        "Users must compare model performance through stated evaluation metrics and benchmarks.",
        "Navigation focuses on multimodal AI capabilities (text, image, audio, 3D processing).",
        "Tasks require accessing/understanding open-source library implementations (Transformers, PEFT, TRL)."
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=2": [
        "Tasks involve searching for specific models/datasets using platform navigation and search functionality",
        "Users need to retrieve technical details like model metrics, update dates, and evaluation results",
        "Navigation requires accessing documentation for libraries (e.g., Transformers, Diffusers) and APIs",
        "Tasks require understanding of model licensing information and commercial use restrictions",
        "Users must interact with trending/new content sections to find recent publications",
        "Tasks involve comparing model performance statistics and popularity metrics (downloads/likes)",
        "Navigation paths require understanding of organizational structure (Models/Datasets/Spaces tabs)",
        "Users need to locate and interpret API parameters for inference/training tasks",
        "Tasks require cross-referencing between model cards, documentation, and external research papers",
        "Navigation involves handling filtered searches (by modality, task type, or technical requirements)"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=3": [
        "Tasks require navigation through multiple hierarchical menu items (Models, Datasets, Spaces, Docs)",
        "Users must interact with search functionality to locate specific ML models/datasets",
        "Tasks involve extracting metadata from model/dataset cards (update dates, metrics, licenses)",
        "Navigation requires understanding of domain-specific categorization (NLP, ASR, text generation)",
        "Users need to interpret technical documentation about ML frameworks/APIs",
        "Tasks require comparison of model performance metrics across different versions",
        "Interaction with API demonstration interfaces (Inference API) is essential",
        "Navigation patterns involve alternating between model repositories and documentation",
        "Tasks demand understanding of licensing information and usage restrictions",
        "Users must parse complex model identifiers with organization/version components"
      ]
    },
    "coursera": {
      "nnetnav_live_site=coursera_num_tasks=72_portion=3": [
        "Tasks require filtering courses by criteria like duration, level (e.g., beginner), and completion certificates",
        "Users seek course details including instructor names, institutions, and learning outcomes/skills gained",
        "Navigation involves finding domain-specific courses (e.g., AI ethics, sustainability, Python programming)",
        "Tasks require comparing/identifying Specializations with project components and partner institution credentials",
        "Users look for pricing information and promotional discounts (e.g., Coursera Plus annual plans)",
        "Tasks involve verifying partner universities/companies and their country associations",
        "Queries focus on career-oriented credentials (professional certificates) from specific providers like Google/IBM",
        "Users validate course quality through star ratings and review percentages (e.g., 4.5+ stars)",
        "Tasks require extracting quantitative metadata: hours to complete, quiz counts, and salary/job opening statistics",
        "Navigation patterns emphasize discovering role-specific learning paths (e.g., Data Analyst, Cybersecurity Analyst)"
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=2": [
        "Tasks require searching for courses by specific topics (e.g., Python, AI, Data Science, Cybersecurity) using structured queries.",
        "Users must filter results by criteria like skill level (beginner/intermediate), duration ranges (1-3 months), and certification availability.",
        "Tasks involve validating course attributes such as star ratings (4+ stars), completion time (<20 hours), and credential types (Professional Certificates).",
        "Navigation requires identifying partner institutions/companies (e.g., Google, IBM, Yale) and their course offerings.",
        "Users must locate pricing details for subscription models like Coursera Plus and promotional discounts.",
        "Tasks demand cross-referencing career outcomes (median salaries, job openings) with role-specific credentials.",
        "Navigation involves verifying instructor credentials, institutional affiliations, and related course offerings.",
        "Users need to distinguish between course types (Specializations, Degrees, Projects) and their learning formats.",
        "Tasks require accessing detailed syllabus components like assessments (quiz counts), hands-on projects, and tool coverage (Excel, Tableau).",
        "Navigation includes identifying free course availability and financial aid eligibility criteria across subjects."
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=4": [
        "Users must filter courses by skill level (e.g., beginner, intermediate) to meet task requirements",
        "Tasks require identifying courses with specific technical focus areas like Python, JavaScript, or AI ethics",
        "Navigation involves verifying credential offerings (certificates, professional certs) for courses/specializations",
        "Price comparison tasks exist for subscription models like Coursera Plus with discount tracking",
        "Duration filtering is critical (e.g., <20 hours, 1-3 month ranges)",
        "Star rating verification required (4+ stars, 4.5+ thresholds)",
        "University/company partnership identification needed for course credibility checks",
        "Career outcome data analysis required (median salaries, job availability statistics)",
        "Free course discovery paths with certification options must be navigable",
        "Detailed course component verification needed (modules, quizzes, projects, instructor bios)"
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=0": [
        "Tasks require filtering courses by skill level (e.g., beginner, advanced) across both datasets",
        "Users must locate courses with specific duration ranges (e.g., <20 hours, 1-3 months) in both datasets",
        "Both require identifying courses/programs offering completion certificates",
        "Tasks involve searching for courses by specific technical skills (Python, JavaScript, Data Analysis)",
        "Both datasets require comparing course ratings (4+ stars, 4.5+ ratings)",
        "Users need to extract instructor information and institutional affiliations",
        "Tasks require identifying partner universities/companies in course listings",
        "Both involve checking for career-relevant outcomes (skills developed, salary data)",
        "Users must navigate between different program types (Courses, Specializations, Degrees)",
        "Tasks require price comparison features (Coursera Plus discounts, free vs paid options)"
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=1": [
        "Tasks require filtering courses by skill level (e.g. beginner/intermediate)",
        "Users search for credentials with specific program types like Professional Certificates or Specializations",
        "Navigation involves comparing duration ranges (1-3 months, <20 hours) across courses",
        "Tasks require identifying institution/company partnerships offering courses",
        "Users validate course quality through star ratings (4+, 4.5+ thresholds)",
        "Navigation patterns include checking certification availability upon completion",
        "Tasks involve matching courses to specific subject areas (e.g. AI Ethics, Sustainability)",
        "Users search for instructor credentials and alternative courses they offer",
        "Navigation requires comparing pricing models including discounts/Coursera Plus",
        "Tasks involve cross-referencing career outcomes with role-specific salary/job data"
      ]
    },
    "arxiv": {
      "nnetnav_live_site=arxiv_num_tasks=80_portion=1": [
        "Both datasets require users to search for academic papers using specific categories or subject codes (e.g., cs.CL, quant-ph)",
        "Tasks involve filtering results by submission date ranges (e.g., 'within last week' or '2023')",
        "Users need to locate and extract metadata such as author affiliations, submission dates, and paper versions",
        "Navigation requires understanding arXiv's hierarchical subject structure and subfield abbreviations",
        "Tasks demand accessing and interpreting technical content like abstracts, formulas, and methodology sections",
        "Both involve cross-referencing between arXiv categories and external academic platforms/institutions",
        "Users must handle multiple paper formats (HTML, PDF) and compare content across formats",
        "Tasks require using advanced search parameters including title, abstract, and specific field searches",
        "Navigation patterns involve recursive exploration of related papers through citation/reference chains",
        "Both datasets include verification tasks requiring comparison of metadata across different paper versions"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=4": [
        "Both datasets require searching for papers using specific categories/subcategories (e.g., cs.CL, quant-ph)",
        "Tasks involve accessing recent submissions through time-bound filters (last week, date ranges)",
        "Navigation requires interacting with hierarchical subject archives (e.g., Physics \u2192 Condensed Matter \u2192 Mesoscale Physics)",
        "Users must locate paper metadata including abstracts, submission dates, and version history",
        "Both require handling advanced search parameters (title keywords, author names, arXiv identifiers)",
        "Tasks involve format conversion/access (HTML vs. PDF views of papers)",
        "Both utilize category-specific abbreviations (astro-ph.CO, math.NA) for navigation",
        "Tasks require cross-referencing author affiliations and institutional links",
        "Tasks demand interpretation of bibliographic details (reference numbering, citation counts)",
        "Both involve accessing supplemental materials/source code through paper landing pages"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=0": [
        "Tasks require search functionality with field-specific filters (e.g., title/author/abstract)",
        "Navigation involves hierarchical subject categories with subfield abbreviations (e.g., cs.CL, nlin.CD)",
        "Tasks require date-range filtering for recent submissions (last week/yesterday/specific dates)",
        "Users must locate and interpret paper metadata (authors, submission dates, versions)",
        "Tasks involve cross-referencing between category listings and search results",
        "Requires understanding of arXiv's document formats (PDF/HTML) and format-specific content",
        "Tasks demand extraction of quantitative details from papers (formula counts, figure/table numbers)",
        "Navigation patterns involve category-specific \"recent submissions\" lists and search result filtering",
        "Tasks require identification of institutional affiliations through author metadata",
        "Users must distinguish between arXiv's primary research content and supplementary materials/help pages"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=2": [
        "Require search functionality using titles, keywords, or subject categories",
        "Involve navigation through hierarchical subject archives (e.g., Physics > Condensed Matter)",
        "Require access to paper metadata including submission dates and version history",
        "Need extraction of specific content sections like abstracts, introductions, or methodologies",
        "Involve format-based interactions (PDF download, HTML view)",
        "Require filtering by date ranges (last week, specific years)",
        "Depend on author information retrieval (affiliations, first author identification)",
        "Include cross-category comparison tasks (specific archive vs. all archives)",
        "Require quantitative analysis of paper components (formulas, figures, tables)",
        "Involve verification of paper existence or category classification"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=3": [
        "Both datasets involve navigation tasks requiring search functionality with field-specific filters (e.g., title, author, abstract).",
        "Tasks in both datasets require interaction with subject-specific categories and subcategories (e.g., Physics, Computer Science subfields).",
        "Users must locate and interpret submission dates or version history of papers in both datasets.",
        "Both require accessing paper metadata such as author affiliations, abstracts, and reference sections.",
        "Navigation tasks involve filtering results by date ranges (e.g., 'submitted within the last week').",
        "Users need to identify arXiv category abbreviations (e.g., cs.CL, quant-ph) in both datasets.",
        "Tasks require distinguishing between HTML and PDF formats for paper access in both datasets.",
        "Both datasets involve cross-referencing help documentation (e.g., submission guidelines, category descriptions).",
        "Users must parse hierarchical subject structures (e.g., 'Nonlinear Sciences > Chaotic Dynamics') in both datasets.",
        "Tasks in both datasets require comparing search results across specific archives versus all archives."
      ]
    },
    "bbc": {
      "nnetnav_live_site=bbc_num_tasks=69_portion=2": [
        "Tasks require navigating through categorized content sections (e.g., News, Sport, Business, Culture)",
        "Users must locate timestamped articles (e.g., '3 hrs ago') to verify recency of information",
        "Both datasets involve extracting key details from article headlines and summaries",
        "Navigation requires interaction with hierarchical menus (main categories \u2192 subcategories \u2192 articles)",
        "Tasks involve identifying regional/country-specific news sections (e.g., Middle East, Asia, Europe)",
        "Users must distinguish between different media formats (text articles vs videos vs podcasts)",
        "Both require filtering content by specific tags/categories (e.g., 'War', 'Innovation', 'Football')",
        "Tasks demand recognition of editorial sections (e.g., 'BBC InDepth', 'Most Watched', 'Special Reports')",
        "Users need to parse article metadata including timestamps and topic classifications",
        "Both involve cross-referencing multiple content sections to complete composite tasks"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=3": [
        "Tasks require locating recent articles within specific time frames (e.g., 'within the last two days', 'today', '3 hrs ago').",
        "Users must navigate hierarchical category structures (e.g., Sports \u2192 Football, Business \u2192 Stock Market, Travel \u2192 Destination guides).",
        "Queries demand identification and summarization of key points from news articles with clear headings/paragraph structures.",
        "Tasks involve finding multimedia content through dedicated sections (e.g., Video, Audio, Podcasts) or embedded media in articles.",
        "Navigation requires interaction with timestamped content blocks for recency validation (e.g., '2 hrs ago', '21 mins ago').",
        "Users must parse geographic/regional sections (e.g., Asia, Europe, US & Canada) for location-specific queries.",
        "Tasks rely on consistent labeling of thematic sections (e.g., 'MOST READ', 'OTHER TOP STORIES', 'SPORT') for orientation.",
        "Queries involve filtering by topic tags (e.g., 'Business', 'Climate', 'Culture') appended to article previews.",
        "Tasks require parsing article metadata (e.g., headlines, images with captions, timestamps) to verify relevance.",
        "Users must cross-reference content between sections (e.g., linking a video in 'MOST WATCHED' to its related article)."
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=1": [
        "Tasks require navigating hierarchical website structures (e.g., Home > Section > Article)",
        "Both involve time-sensitive information retrieval (e.g., 'latest', 'recent', 'today')",
        "Users must locate content within specific categories (e.g., Sport, Business, Technology)",
        "Tasks require understanding of geographic news categorization (e.g., Middle East, Europe)",
        "Both require identification of multimedia content (podcasts, videos, images)",
        "Tasks involve searching/filtering by temporal parameters (e.g., 'within last two days')",
        "Users need to parse article metadata (timestamps, categories, author credits)",
        "Both require cross-section navigation (e.g., finding climate change content across multiple categories)",
        "Tasks demand content summarization from article bodies",
        "Both involve interaction with featured/highlighted content modules (e.g., 'Top Stories', 'Most Read')"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=0": [
        "Tasks require navigating through categorized sections (e.g., News, Sport, Business) to locate specific content.",
        "Users must identify and filter time-sensitive information (e.g., 'latest news,' 'within the last two days').",
        "Tasks involve summarizing key points from articles or multimedia content (e.g., podcasts, videos).",
        "Both datasets include tasks requiring access to regional or geographical subsections (e.g., Asia, Europe, US & Canada).",
        "Users interact with live or real-time updates (e.g., 'LIVE' coverage, live sports results).",
        "Tasks demand locating multimedia elements like images, videos, or podcasts within specific categories.",
        "Navigation includes hierarchical structures (e.g., main menus \u2192 subsections \u2192 articles).",
        "Users must parse timestamps or publication dates to verify recency of information.",
        "Tasks involve cross-referencing multiple articles or sections to answer complex queries (e.g., economic impacts, event timelines).",
        "Both datasets require distinguishing between editorial content, fact-checking sections (e.g., BBC Verify), and external links."
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=4": [
        "Tasks require navigating hierarchical category structures (e.g., regional sections like Middle East/Asia or topics like Sport/Business)",
        "Users must locate time-sensitive content (e.g., 'latest', 'recent', or date-bound articles)",
        "Search targets combine contextual filters (section + topic + recency) for precision",
        "Multi-format content retrieval required (articles, videos, podcasts, live updates)",
        "Tasks demand parsing article metadata (headlines, timestamps, categories) for verification",
        "Navigation patterns involve alternating between broad sections and specific sub-sections",
        "Requires understanding of persistent section organization (e.g., fixed navigation menus)",
        "Tasks involve cross-referencing content between related categories (e.g., Politics \u2192 Business \u2192 Technology)",
        "Time-bound success criteria present (e.g., 'within last two days', 'today')",
        "Requires distinguishing between featured content vs. archive material through UI cues"
      ]
    },
    "amazon": {
      "nnetnav_live_site=amazon_num_tasks=63_portion=2": [
        "Tasks involve searching for products with specific attributes (e.g., price range, size, material, ratings).",
        "Users frequently filter results based on categories like electronics, home goods, fashion, or books.",
        "Navigation includes actions like adding items to cart, sorting results, or checking availability.",
        "Tasks require comparing prices across multiple products or sellers.",
        "Users seek deals, discounts, or items under a specific price threshold.",
        "Queries often specify product features (e.g., waterproof, stainless steel, capacity, compatibility).",
        "Tasks involve verifying customer reviews (e.g., minimum star ratings or review counts).",
        "Users prioritize delivery options (e.g., free shipping, availability in their region).",
        "Navigation includes exploring time-sensitive content (e.g., new releases, limited-time sales).",
        "Tasks target both informational (e.g., price checks, reviews) and transactional (e.g., purchasing) intents."
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=3": [
        "Both datasets require product searches with specific filters (price range, features, ratings)",
        "Both involve navigation through hierarchical category structures (e.g., Electronics > Computers > Laptops)",
        "Both require price comparison between multiple similar items",
        "Both include tasks needing interaction with product detail pages (specs, descriptions, images)",
        "Both necessitate use of sorting mechanisms (Best Sellers, Newest Arrivals, Price Low-High)",
        "Both involve cart management actions (adding/removing items, quantity selection)",
        "Both require understanding of deal/promotion formats (percentage off, limited-time offers)",
        "Both include tasks that demand size/color variant selection before purchase",
        "Both utilize search refinement tools (department filters, customer ratings thresholds)",
        "Both contain tasks requiring account interactions (sign-in, address verification, order history)"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=1": [
        "Tasks require product searches with multiple attribute filters (e.g., price range, size, material, ratings)",
        "Price constraints are consistently specified as primary filters in search criteria",
        "Tasks involve verification of customer reviews/ratings thresholds (e.g., 4+ stars, 20k+ reviews)",
        "Requires navigation through hierarchical category structures (e.g., Electronics > Computers > Accessories)",
        "Tasks demand comparison across similar products using sort/filter functions",
        "Contains actions requiring cart management (adding items, checking cart contents)",
        "Requires interpretation of promotional offers/sales terminology (e.g., 'Deal of the Day', percentage discounts)",
        "Tasks involve parsing product specifications from detail pages (e.g., dimensions, compatibility, power requirements)",
        "Time-sensitive constraints appear in both datasets (new releases, upcoming deals, limited stock indicators)",
        "Requires handling variant selection (color, size, style) before adding to cart"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=0": [
        "Tasks require searching with specific filters (price, size, brand, features).",
        "Users need to interact with product categories (electronics, home, fashion, etc.).",
        "Tasks involve comparing prices across multiple items or sellers.",
        "Navigation includes sorting results (e.g., by price, ratings, release date).",
        "Users must verify product availability (in-stock, color/size options).",
        "Tasks require parsing customer reviews and ratings (e.g., 4+ stars).",
        "Users need to identify deals/discounts (e.g., percentage off, under $50).",
        "Tasks involve adding items to cart after meeting criteria.",
        "Navigation includes filtering by material or functionality (e.g., stainless steel, waterproof).",
        "Users must locate time-sensitive content (new releases, seasonal sales)."
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=4": [
        "Tasks require precise product attribute filtering (e.g., price range, size, material).",
        "Navigation includes adding items to the shopping cart as a primary action.",
        "Users must locate items within specific categories (e.g., electronics, home goods, apparel).",
        "Tasks involve validating customer review metrics (e.g., 4+ stars, 100+ reviews).",
        "Price comparison across search results is a recurring objective.",
        "Users are prompted to identify deals, discounts, or sale-specific items.",
        "Search requires sorting mechanisms (e.g., newest arrivals, best sellers, price low-to-high).",
        "Tasks demand verifying product availability (e.g., in-stock status, delivery options).",
        "Users must navigate hierarchical menus (e.g., department filters, subcategories).",
        "Tasks involve keyword-based searches with contextual modifiers (e.g., \"waterproof,\" \"stainless steel\")."
      ]
    },
    "wolframalpha": {
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=4": [
        "Tasks require mathematical computation (e.g., algebra, calculus, statistics).",
        "Tasks involve scientific data retrieval (e.g., material properties, chemical equations).",
        "Tasks demand step-by-step problem-solving or explanations.",
        "Tasks include unit conversions or dimensional analysis.",
        "Tasks involve real-world data queries (e.g., population statistics, energy metrics).",
        "Tasks require plotting or visualizing mathematical functions.",
        "Tasks focus on academic or educational topics (e.g., physics, engineering, finance).",
        "Tasks utilize natural language input for computational queries.",
        "Tasks involve domain-specific knowledge (e.g., mathematics, chemistry, economics).",
        "Tasks target problem-solving across STEM disciplines (e.g., differential equations, thermodynamics)."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=0": [
        "Tasks require mathematical computations (e.g., equations, derivatives, integrals).",
        "Navigation involves retrieving structured data (e.g., scientific measurements, economic statistics).",
        "Queries demand integration of domain-specific knowledge (e.g., chemistry, physics, finance).",
        "Tasks involve unit conversions (e.g., mass to moles, energy units).",
        "Users must interact with dynamic input fields for calculations or visualizations.",
        "Tasks require parsing and interpreting scientific notations or formulas.",
        "Navigation includes accessing step-by-step solutions for problem-solving.",
        "Queries involve natural language processing for computational answers.",
        "Tasks focus on real-world applications (e.g., mortgages, climate models, health metrics).",
        "Users explore educational or reference content (e.g., definitions, historical data, theorem explanations)."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=1": [
        "Tasks require inputting mathematical expressions or equations for computation.",
        "Navigation involves accessing scientific data (e.g., physics, chemistry, astronomy).",
        "Queries demand unit conversions (e.g., mass to moles, temperature units).",
        "Tasks involve plotting or visualizing mathematical functions/curves.",
        "Users must retrieve statistical properties (e.g., mean, standard deviation, distributions).",
        "Navigation includes solving algebraic equations or calculus problems step-by-step.",
        "Tasks require accessing domain-specific examples (e.g., differential equations, finance).",
        "Queries involve natural language processing for computational answers (e.g., 'explain chemical thermodynamics').",
        "Tasks focus on real-world applications (e.g., climate data, nutrition facts, salary statistics).",
        "Navigation targets structured data outputs (e.g., molecular structures, astronomical event timings)."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=3": [
        "Tasks involve computational queries requiring mathematical operations (e.g., integrals, derivatives, equation solving).",
        "Queries require retrieval of scientific data (e.g., material properties, planetary metrics).",
        "Use of both natural language and structured mathematical input formats.",
        "Tasks necessitate unit conversions and measurement calculations (e.g., currency, weight, temperature).",
        "Generation of visual outputs like plots and graphs for mathematical functions.",
        "Access to real-world statistical data (e.g., economic indicators, population trends).",
        "Requirement for step-by-step solutions in problem resolution.",
        "Utilization of domain-specific knowledge bases (e.g., physics, chemistry, finance).",
        "Queries involve both simple arithmetic and complex algorithmic computations.",
        "Engagement with educational or homework-related problem-solving scenarios."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=2": [
        "Tasks involve mathematical computations requiring equation solving or numerical analysis.",
        "Queries often require scientific or technical data retrieval from domains like physics, chemistry, or engineering.",
        "Tasks include unit conversions and measurement-based calculations (e.g., temperature, mass, energy).",
        "Users interact with computational tools for plotting graphs or visualizing mathematical functions.",
        "Tasks span real-world applications such as finance, health, climate studies, and material properties.",
        "Queries necessitate structured outputs like tables, property lists, or formatted mathematical expressions.",
        "Tasks require natural language interpretation for complex inputs (e.g., derivatives, integrals, statistical metrics).",
        "Users explore predefined example categories (e.g., algebra, geometry) to refine queries or find templates.",
        "Tasks involve multi-step problem-solving (e.g., solving equations with intermediate explanations).",
        "Queries frequently compare or analyze multiple entities (e.g., elements, materials, geographic data)."
      ]
    },
    "allrecipes": {
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=0": [
        "Both datasets require users to search for recipes using specific criteria such as ingredients, dietary restrictions, or meal types.",
        "Tasks in both datasets involve filtering recipes based on user ratings (e.g., 4 stars or higher) and review counts (e.g., >50 reviews).",
        "Users navigate through structured categories (e.g., Dinners, Cuisines, Occasions) to locate recipes in both datasets.",
        "Recipe pages in both datasets include key metadata like preparation time, cooking time, and ingredient lists to fulfill task requirements.",
        "Both datasets support saving or bookmarking recipes, as tasks explicitly mention saving recipes for later use.",
        "Users interact with recipe reviews/ratings in both datasets to assess quality (e.g., 'most saved Easter recipe').",
        "Tasks require handling dietary-specific queries (e.g., vegetarian, keto, gluten-free) in both datasets.",
        "Both datasets include seasonal/event-based recipe filtering (e.g., Christmas desserts, Easter brunch).",
        "Users leverage ingredient-driven search in both datasets (e.g., 'zucchini lasagna' or 'cranberry sauce recipes').",
        "Both datasets enable community engagement through user-generated content like recipe reviews and ratings."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=4": [
        "Tasks require filtering recipes by user ratings (e.g., 4 stars or higher) and review counts (e.g., >500 reviews).",
        "Tasks involve searching for recipes with specific dietary constraints (e.g., vegetarian, vegan, keto, low-calorie).",
        "Tasks include time-related constraints such as prep/cook time limits (e.g., <1 hour) or quick weeknight meals.",
        "Tasks demand ingredient-based filtering (e.g., zucchini, shrimp, almond flour) or exclusion criteria.",
        "Tasks target occasion-specific recipes (e.g., holidays, Easter, Christmas, Ramadan).",
        "Tasks require interaction with user-generated content like saving recipes, leaving reviews, or comparing community feedback.",
        "Tasks involve nutritional criteria (e.g., calorie counts, high-protein, healthy substitutions).",
        "Tasks include comparative analysis (e.g., 'most-saved,' 'top-rated,' or side-by-side recipe comparisons).",
        "Tasks focus on meal-type categorization (e.g., dinners, desserts, snacks, brunches).",
        "Tasks reference specific cooking methods/appliances (e.g., slow cooker, air fryer, skillet, Instant Pot)."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=1": [
        "Both datasets require navigation tasks focused on recipe discovery with specific filters such as ratings, reviews, and preparation time.",
        "Tasks in both datasets emphasize searching for recipes with user-generated ratings (e.g., 4+ stars, 100+ reviews).",
        "Both involve filtering recipes by dietary preferences (e.g., vegetarian, vegan, keto, gluten-free).",
        "Tasks in both datasets require extracting ingredient lists and cooking instructions from recipe pages.",
        "Navigation tasks in both datasets include browsing recipe categories (e.g., dinners, holidays, cuisines).",
        "Both datasets involve saving or bookmarking recipes for later use (e.g., 'Save Recipe' functionality).",
        "Tasks in both datasets prioritize seasonal or event-specific recipes (e.g., Easter, Christmas, New Year\u2019s).",
        "Both include interactions with user reviews, such as reading or comparing feedback on recipes.",
        "Tasks in both datasets require parsing nutritional information or calorie counts for recipes.",
        "Both datasets involve navigating search results with keywords (e.g., 'chicken curry,' 'stuffed mushrooms')."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=3": [
        "Tasks require filtering recipes by user ratings (e.g., 4 stars or higher) and minimum review counts.",
        "Users must locate specific dietary-focused recipes (e.g., vegetarian, keto, gluten-free).",
        "Navigation through categorized sections (e.g., Dinners, Meals, Ingredients, Cuisines) is essential.",
        "Tasks involve extracting detailed recipe metadata like preparation time, cook time, and ingredient lists.",
        "Interaction with user-generated content (e.g., reviews, ratings) is required to evaluate recipe quality.",
        "Search functionality with precise keywords (e.g., 'vegetarian lasagna', 'grilled salmon') is critical for task completion.",
        "Seasonal or occasion-based recipe navigation (e.g., Easter, Christmas, Ramadan) is a common requirement.",
        "Users must identify recipes with explicit nutritional constraints (e.g., under 600 calories per serving).",
        "Tasks require parsing recipe collections or galleries (e.g., 'Top-Rated Recipes', 'Quick Skillet Dinners').",
        "Engagement with community-driven features (e.g., user profiles, Allstars) is integral to some tasks."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=2": [
        "Search functionality supports filtering by ratings, reviews, and preparation time",
        "User-generated reviews and ratings are prominently displayed for recipe evaluation",
        "Structured recipe categories (e.g., Dinners, Meals, Ingredients) enable systematic navigation",
        "Detailed recipe metadata includes ingredient lists and cooking durations",
        "Seasonal/holiday-specific recipe collections available (Easter, Christmas, Thanksgiving)",
        "Dietary preference filters present (vegetarian, vegan, gluten-free options)",
        "Recipe saving/bookmarking capability integrated",
        "Community interaction features for recipe reviews and ratings",
        "Editorial content about culinary experts and recipe testing processes",
        "Popular recipe highlights based on user engagement metrics"
      ]
    },
    "dictionary.cambridge": {
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=2": [
        "Both datasets require users to search for word definitions using a search functionality.",
        "Tasks in both datasets involve accessing UK and US English pronunciations of words.",
        "Both datasets include tasks that require navigating to the Thesaurus section for synonyms.",
        "Grammar sections are present in both datasets for researching grammatical concepts like modal verbs and adjectives.",
        "Translation features between multiple languages (e.g., English\u2013French, English\u2013Spanish) are utilized in tasks from both datasets.",
        "Both datasets include tasks that involve exploring the 'Plus' section for quizzes and word games (e.g., Word Scramble).",
        "Pronunciation guides with International Phonetic Alphabet (IPA) notation are referenced in tasks from both datasets.",
        "Example sentences for word usage are consistently required in tasks across both datasets.",
        "Both datasets include alphabetical browsing of dictionary entries (e.g., A-Z links).",
        "Tasks in both datasets involve accessing blog posts or new word updates (e.g., 'Word of the Day')."
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=3": [
        "Tasks require accessing word definitions, pronunciations, and example sentences from dictionary entries.",
        "Users must navigate to specialized sections like Grammar, Thesaurus, and Pronunciation guides.",
        "Tasks involve translating words between multiple languages via the dictionary interface.",
        "Interaction with the Plus section (e.g., quizzes, games) is required for some tasks.",
        "Users need to compare UK and US English variations for pronunciation and usage.",
        "Tasks require identifying synonyms, antonyms, or collocations using the Thesaurus.",
        "Navigation to blog posts or articles (e.g., Word of the Day, new words) is necessary for context.",
        "Users must search for grammatical rules (e.g., modal verbs, comparative adjectives) in the Grammar section.",
        "Tasks involve browsing alphabetical or categorized word lists (e.g., A-Z index).",
        "Users need to handle multi-step navigation across headers, footers, and pop-ups (e.g., cookie banners, login prompts)."
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=1": [
        "Tasks require searching for word definitions, including both UK and US English variations.",
        "Tasks involve retrieving example sentences provided within dictionary entries.",
        "Tasks necessitate accessing grammar explanations and usage rules in dedicated sections.",
        "Tasks require using the thesaurus to find synonyms and related phrases.",
        "Tasks involve translating words into multiple languages (e.g., Chinese, French, Spanish).",
        "Tasks require comparing phonetic notations (IPA) for UK vs. US pronunciations.",
        "Tasks involve navigating to specialized sections (e.g., Pronunciation, Grammar, Thesaurus) for specific information.",
        "Tasks include exploring new vocabulary through blog posts or new word features.",
        "Tasks require multi-step navigation across different website areas (e.g., Dictionary, Translate, Plus).",
        "Tasks involve understanding and applying part-of-speech distinctions (e.g., adjectives, adverbs, nouns) in definitions."
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=4": [
        "Tasks require searching for word definitions, pronunciations, and example sentences across both datasets.",
        "Users must navigate between UK and US English variants for pronunciation and usage details.",
        "Thesaurus functionality is used to find synonyms and related terms for queried words.",
        "Grammar sections are accessed to explore rules, parts of speech, and usage examples.",
        "Translation features are leveraged for converting words between multiple languages (e.g., English\u2013French, English\u2013Spanish).",
        "Interactive elements like quizzes (e.g., animal quiz, Grammar quiz) and word games (e.g., Word Scramble) are included.",
        "Navigation to specialized sections (e.g., Plus, Grammar, Thesaurus) is required to complete tasks.",
        "Example sentences within dictionary entries are referenced to understand word usage in context.",
        "Social media sharing options (e.g., Twitter, Facebook) are utilized in specific tasks.",
        "Alphabetical browsing (A-Z links) is used to explore dictionary entries systematically."
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=0": [
        "Tasks require using a search bar to look up words or phrases in the dictionary",
        "Navigation involves accessing pronunciation guides for both UK and US English variants",
        "Tasks utilize translation features for multilingual dictionary entries (e.g., English\u2013Spanish, Chinese)",
        "Users interact with grammar sections to explore rules like tenses, adjectives, or passive voice",
        "Thesaurus functionality is leveraged to find synonyms/antonyms for target words",
        "Tasks involve accessing example sentences accompanying word definitions",
        "Navigation paths include the '+Plus' section for quizzes/games like Word Scramble",
        "Word-of-the-Day or blog content is referenced for extended learning",
        "Tasks require switching between dictionary modes (e.g., Learner\u2019s Dictionary vs. Essential English)",
        "Footer/header menus are used to navigate between core sections (Dictionary, Grammar, Translate)"
      ]
    },
    "apple": {
      "nnetnav_live_site=apple_num_tasks=70_portion=1": [
        "Tasks require comparing product specifications and features across different models (e.g., iPhone variants, AirPods models).",
        "Tasks involve locating technical details like battery life, chip types, camera specs, or device dimensions.",
        "Tasks necessitate navigating through product purchase flows (availability checks, pricing, in-store pickup options).",
        "Tasks require interacting with trade-in/value estimation tools for Apple devices.",
        "Tasks involve identifying color/material options for products (e.g., Watch bands, iPhone cases).",
        "Tasks require accessing support/resolution paths for account issues (Apple ID recovery, repair methods).",
        "Tasks involve exploring Apple ecosystem integrations (compatibility between devices/services like Health Records).",
        "Tasks require comparing pricing tiers across product lines or subscription services (Apple One, Apple TV+).",
        "Tasks involve locating corporate initiative details (environmental efforts, healthcare partnerships, educational programs).",
        "Tasks require using hierarchical navigation through category menus (Store > iPhone > Accessories) or footer directories."
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=4": [
        "Tasks involve comparing product models (e.g., iPhone, MacBook) to identify differences in features, prices, or specifications.",
        "Navigation requires accessing product-specific pages (e.g., iPhone 16 Pro, AirPods Pro 2) to retrieve technical details or purchase options.",
        "Users must locate trade-in values, eligibility criteria, or promotional offers for Apple devices.",
        "Tasks require verifying product availability (e.g., in-store pickup, color variants) or regional purchasing constraints.",
        "Navigation paths include support sections for troubleshooting (e.g., battery life, warranty status, repair options).",
        "Price verification is critical across devices, accessories, and subscription services (e.g., Apple Fitness+).",
        "Tasks demand exploration of product customization workflows (e.g., storage, color, AppleCare+).",
        "Users navigate educational or business-specific portals to access discounts or bulk purchasing details.",
        "Tasks involve retrieving environmental or compliance information (e.g., material sustainability, regulatory policies).",
        "Navigation includes account management features (e.g., Family Sharing, Apple ID recovery, payment methods)."
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=0": [
        "Tasks require locating specific product details such as price, specifications, or features.",
        "Navigation involves accessing hierarchical product categories (e.g., iPhone, Mac, iPad, Watch).",
        "Users must interact with product-specific pages containing 'Learn more' and 'Buy' links.",
        "Tasks frequently involve comparing models or variants within a product line.",
        "Information retrieval includes technical specifications (e.g., battery life, chip types, display features).",
        "Tasks require accessing support sections for troubleshooting (e.g., Apple ID recovery, repair options).",
        "Trade-in value checks and promotional offers are common objectives.",
        "Users navigate to footer sections for policies, contact details, or regional settings.",
        "Tasks involve verifying product availability (in-store pickup, online stock) or release dates.",
        "Exploration of accessories or compatibility with specific devices is required."
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=2": [
        "Tasks involve comparing product specifications and prices across different models (e.g., iPhone, MacBook, Apple Watch).",
        "Navigation includes checking trade-in values and eligibility for Apple devices.",
        "Users frequently verify product availability (e.g., in-store pickup, color options, regional restrictions).",
        "Tasks require accessing technical specifications (e.g., battery life, chip details, device dimensions).",
        "Support-related queries focus on troubleshooting (e.g., forgotten passwords, device repairs, feature setup).",
        "Users explore product customization options (e.g., storage capacities, color variants, accessory compatibility).",
        "Navigation targets enterprise/business services (e.g., bulk purchases, device management, corporate support).",
        "Tasks involve price comparisons across storage tiers, models, or financing/trade-in combinations.",
        "Users seek accessory information (e.g., compatibility, availability, pricing for cases, AirPods variants).",
        "Navigation includes locating policy documents (e.g., privacy, environmental reports, warranty terms)."
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=3": [
        "Tasks involve comparing product models (e.g., iPhone, iPad) for specifications and pricing.",
        "Tasks require checking trade-in values and promotional offers for devices.",
        "Tasks involve verifying technical specifications like battery life, chip types, or camera features.",
        "Tasks require locating product availability (in-store/online) and scheduling purchases/pickups.",
        "Tasks involve navigating support pages for troubleshooting, warranty status, or repair options.",
        "Tasks focus on identifying color options, storage capacities, or accessory compatibility.",
        "Tasks require finding pricing details across different product configurations or regions.",
        "Tasks involve exploring product-specific features (e.g., camera zoom, health tracking, Siri integration).",
        "Tasks require accessing purchasing workflows like customization, financing, or carrier deals.",
        "Tasks involve validating compatibility of accessories or software versions with specific devices."
      ]
    },
    "google_search": {
      "nnetnav_live_site=google_search_num_tasks=72_portion=3": [
        "Tasks require retrieving specific factual information (e.g., dates, statistics, names, prices).",
        "Tasks involve querying real-time or frequently updated data (e.g., news, stock prices, sports scores).",
        "Navigation includes comparing multiple entities (e.g., products, teams, stocks, recipes).",
        "Tasks demand extracting structured data from unstructured sources (e.g., tables, search snippets, articles).",
        "Queries target specialized domains (e.g., sports, tech specs, health, finance, academic research).",
        "Tasks require filtering results by recency or relevance (e.g., latest news, recent research papers).",
        "Navigation involves multi-step exploration (e.g., finding specs \u2192 comparing prices \u2192 checking availability).",
        "Tasks focus on location-based information (e.g., event venues, travel destinations, local services).",
        "Queries include explicit platform or source directives (e.g., YouTube tutorials, Allrecipes.com, GitHub).",
        "Tasks require interpreting context-dependent terminology (e.g., 'top gainers,' 'high-risk groups,' 'entry-level jobs')."
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=2": [
        "Tasks require retrieving specific factual information from search results (e.g., statistics, dates, names, definitions).",
        "Queries often involve temporal constraints (e.g., 'latest', 'upcoming', 'recent', '2025', 'current').",
        "Tasks demand parsing structured data formats (e.g., tables, lists, rankings, commit hashes, stock prices, citations).",
        "User goals require multi-step interactions (e.g., search \u2192 filter \u2192 extract \u2192 validate \u2192 return).",
        "Many tasks involve comparative analysis (e.g., ratings, stock prices, recipe ingredients, SEO trends).",
        "Navigation requires identifying and prioritizing authoritative sources (e.g., IMDb, GitHub, academic articles, official websites).",
        "Tasks frequently target domain-specific platforms (e.g., GitHub for code, Duolingo for language learning, Google Trends for data).",
        "User intents focus on actionable outcomes (e.g., purchase tickets, apply for jobs, start courses, implement strategies).",
        "Queries often require resolving ambiguities (e.g., distinguishing similar entities, clarifying temporal/logical scope).",
        "Tasks emphasize precision in data extraction (e.g., exact numerical values, specific syntax like SHA hashes, formatted citations)."
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=4": [
        "Tasks require initiating a search query through a search bar input",
        "Queries involve retrieving specific factual information (e.g., dates, statistics, definitions)",
        "Tasks demand parsing search results for time-sensitive data (e.g., latest news, current records, real-time metrics)",
        "Navigation requires filtering results by granular criteria (e.g., geographical location, temporal recency, domain authority)",
        "Objectives often involve multi-step exploration (e.g., compare sources, cross-verify data points, navigate paginated results)",
        "Tasks target structured data extraction from unstructured web content (e.g., tables, lists, featured snippets)",
        "Queries frequently include compound intents (e.g., retrieve X AND Y, compare A vs B, filter by condition C)",
        "Tasks emphasize domain-specific precision (e.g., technical terminology in AI research, medical vaccine details, financial metrics)",
        "Navigation patterns show reliance on SERP features (e.g., knowledge panels, 'People also ask' sections, top stories carousels)",
        "Tasks require distinguishing between primary content and auxiliary information (e.g., ads, related links, comment sections)"
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=0": [
        "Tasks require entering precise search queries to retrieve specific information.",
        "Tasks involve retrieving time-sensitive or current data (e.g., news, stock prices, sports scores).",
        "Users must parse and extract multiple data points from search results (e.g., dates, locations, names).",
        "Navigation often includes accessing external websites or tools (e.g., IMDb, Google Translate, booking platforms).",
        "Tasks demand interaction with search filters or advanced search features (e.g., date ranges, product specifications).",
        "Queries frequently involve technical or domain-specific terminology (e.g., medical terms, software engineering jargon).",
        "Users need to validate information across multiple sources or results (e.g., cross-checking ratings, translations).",
        "Tasks require distinguishing between primary content and auxiliary links (e.g., ads, related articles).",
        "Actions involve iterative refinement of search terms to optimize results (e.g., clarifying keywords).",
        "Users must identify and interact with actionable elements (e.g., booking buttons, product pages, forms)."
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=1": [
        "Tasks require retrieving specific factual information such as dates, names, numerical data, or event outcomes.",
        "Navigation involves interacting with search bars, buttons, and hyperlinks to locate information.",
        "Users must identify and access specialized sections or services within websites (e.g., news, product pages, academic programs).",
        "Tasks often involve comparing information from multiple sources or categories (e.g., ratings, prices, stock performance).",
        "Queries demand time-sensitive or dynamically updated data (e.g., latest news, real-time scores, current prices).",
        "Tasks require parsing structured content like tables, lists, or summaries for precise answers.",
        "Users interact with forms, modals, or input fields (e.g., search queries, filters, subscriptions).",
        "Navigation includes multi-step processes, such as locating a resource and performing subsequent actions (e.g., copying data, submitting forms).",
        "Tasks involve domain-specific terminology (e.g., technical terms, brand names, scientific concepts) to refine searches.",
        "Users must verify information from authoritative sources or official platforms (e.g., academic sites, verified databases, corporate pages)."
      ]
    }
  },
  "diffs_synth_from_real": {
    "google_maps": {
      "nnetnav_live_site=google_maps_num_tasks=75_portion=2": [
        "Dataset B tasks require direct booking/reservation actions (e.g., hotel rooms, restaurant tables), while Dataset A focuses on locating/planning without explicit booking steps",
        "Dataset B tasks involve retrieving specific product/service details (e.g., menu items, pricing plans), whereas Dataset A focuses on general attribute extraction",
        "Dataset B includes tasks requiring interaction with user reviews/ratings (e.g., reading reviews), which are not mentioned in Dataset A tasks",
        "Dataset B tasks frequently specify price tiers (e.g., 'moderately priced', 'high-end'), while Dataset A focuses on price as a general attribute",
        "Dataset B contains tasks requiring date-specific availability checks (e.g., 'open on Sunday and Monday'), whereas Dataset A focuses on real-time availability ('open now')",
        "Dataset B includes multi-criteria combination filters (e.g., '3 stars + free Wi-Fi + specific location'), while Dataset A uses simpler single-criterion filters",
        "Dataset B tasks involve itinerary planning with waypoints (e.g., 'stopping at coffee shop'), whereas Dataset A focuses on point-to-point route planning",
        "Dataset B requires comparison of multiple business options based on reviews/ratings, while Dataset A focuses on sorting/filtering results",
        "Dataset B tasks specify dietary/accessibility needs (e.g., 'gluten-free', 'wheelchair-accessible'), while Dataset A mentions accessibility as a general feature",
        "Dataset B includes tasks requiring interaction with user-generated content (e.g., 'what people are saying'), absent in Dataset A tasks"
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=3": [
        "Dataset A tasks emphasize zip codes and city names for location parameters, while B uses neighborhoods or landmarks (e.g., 'Le Marais neighborhood,' 'Eiffel Tower').",
        "Dataset A includes explicit printing/sharing of route details (e.g., 'print the route details'), while B focuses on reservation/bookings (e.g., 'Book a hotel').",
        "Dataset A requires proximity-based actions (e.g., 'nearest to...') with transportation planning, while B emphasizes service-specific accessibility (e.g., 'wheelchair-accessible medical transportation').",
        "Dataset A tasks frequently involve multi-modal navigation (driving, walking, transit), while B focuses on localized exploration (e.g., 'restaurants near the Empire State Building').",
        "Dataset A includes quantitative verification (e.g., 'how many results are shown'), while B prioritizes qualitative filtering (e.g., 'highly rated parks').",
        "Dataset B tasks involve explicit budget constraints (e.g., '$400 budget') and date ranges (e.g., 'starting from January 10'), which are absent in A.",
        "Dataset A emphasizes real-time status checks (e.g., 'open now but not 24 hours'), while B focuses on amenity verification (e.g., 'hotels with a pool').",
        "Dataset B includes international locations (e.g., Paris, Buenos Aires), whereas A's tasks are U.S.-centric.",
        "Dataset A tasks frequently combine location search with subsequent actions (e.g., 'find a parking area... then find a route'), while B focuses on singular goal-oriented queries (e.g., 'Find hotels in Paris').",
        "Dataset B explicitly references user-generated content (e.g., 'view photos of...', 'reviews of Japanese restaurants'), while A prioritizes system-generated data (e.g., 'shortest walking route')."
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=1": [
        "Dataset B tasks require date-specific availability checks (e.g., future reservations, event dates) while A focuses on real-time/open-now statuses",
        "Dataset B includes explicit trip planning elements (e.g., multi-day stays, admission tickets) absent in A's immediate navigation focus",
        "Dataset B requires price comparison/numeric budget constraints ($350/night) where A uses relative price ranges ($-$$$)",
        "Dataset B contains tasks requiring hotel/bookable service reservations while A focuses purely on location identification",
        "Dataset B emphasizes accessibility requirements as primary filters (wheelchair-accessible routes) rather than secondary attributes",
        "Dataset B tasks involve complex lodging criteria (guest capacity, cancellation policies) unlike A's simple location-based hotel searches",
        "Dataset B includes international landmark-specific tasks (Eiffel Tower) while A focuses on domestic US locations",
        "Dataset B requires combining 3+ filters simultaneously (price+rating+accessibility) where A typically uses 1-2 filters",
        "Dataset B contains admission ticket purchases/price checks (attractions, transportation) absent in A",
        "Dataset B includes itinerary planning with sequential multi-stop requirements (A\u2192B\u2192C routes) rather than A's point-to-point navigation"
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=0": [
        "Dataset B tasks require international location coordination (e.g., Paris to California), while Dataset A focuses on domestic locations",
        "Dataset B includes explicit price comparison workflows (e.g., hotel rate comparisons), while Dataset A focuses on single-location pricing checks",
        "Dataset B tasks frequently combine travel planning with specific event timing (e.g., New Year's Eve bookings), while Dataset A uses generic date constraints",
        "Dataset B contains tasks requiring multi-destination itinerary planning (e.g., SF to Palo Alto with intermediate stops), while Dataset A focuses on point-to-point routes",
        "Dataset B emphasizes accessibility feature verification through user reviews, while Dataset A focuses on basic accessibility filtering",
        "Dataset B includes tasks requiring historical/geographical research about locations, while Dataset A focuses on current operational information",
        "Dataset B tasks frequently involve cross-referencing multiple service types (e.g., hotels near specific restaurants), while Dataset A maintains service category separation",
        "Dataset B requires interpretation of qualitative review analysis (e.g., patient experiences), while Dataset A focuses on quantitative review metrics",
        "Dataset B includes tasks demanding multimedia content interaction (e.g., 360\u00b0 views), while Dataset A focuses on textual/graphical data",
        "Dataset B features tasks requiring dynamic price tracking across dates, while Dataset A uses fixed budget parameters"
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=4": [
        "Tasks in dataset B require direct user engagement features (e.g., viewing photos, making reservations, checking menus) more frequently than dataset A.",
        "Dataset B tasks often include temporal constraints tied to specific dates/times (e.g., 'for New Year's Eve,' 'January 1, 2025'), while dataset A focuses on immediate availability (e.g., 'open now').",
        "Dataset A tasks explicitly demand output generation (e.g., 'print route details,' 'list three of them'), whereas dataset B focuses on information discovery without explicit output formatting requirements.",
        "Tasks in dataset B emphasize itinerary planning (e.g., 'day trip to wheelchair-accessible restaurants') more prominently than dataset A's single-journey route planning.",
        "Dataset B contains tasks requiring price comparison with cancellation policies (e.g., 'hotel with free cancellation'), while dataset A focuses on static price range filtering.",
        "Tasks in dataset B more frequently involve tourism-oriented actions (e.g., 'things to do near tourist attractions,' 'view photos of landmarks') compared to dataset A's practical navigation focus.",
        "Dataset A includes explicit numerical quantification requirements (e.g., 'how many results,' 'find 5 places'), while dataset B uses qualitative thresholds (e.g., 'highly-rated,' 'best').",
        "Dataset B tasks require evaluation of accessibility features along specific routes (e.g., 'check accessibility along route'), whereas dataset A only filters by accessibility status at locations.",
        "Dataset B contains tasks involving purchase intent (e.g., 'purchase a ticket to Eiffel Tower') absent in dataset A's information-retrieval focus.",
        "Tasks in dataset B utilize relative qualitative descriptors (e.g., 'most bike-friendly,' 'nicest restaurant') more frequently than dataset A's objective metrics (e.g., 'shortest route,' 'nearest')."
      ]
    },
    "github": {
      "nnetnav_live_site=github_num_tasks=71_portion=3": [
        "Dataset B tasks require direct price lookup for individual products (Copilot Pro, Codespaces) while Dataset A focuses on comparing plan features",
        "Dataset B includes specific security vulnerability reporting tasks (e.g. GHSA advisories) not present in Dataset A",
        "Dataset A tasks specify exact numerical criteria (e.g. '500 stars', 'last 2 days') while Dataset B uses more general quantitative terms",
        "Dataset B contains account management tasks (create account, privacy policy) absent from Dataset A",
        "Dataset A requires identification of trending/ranked content (monthly top developers) not seen in Dataset B",
        "Dataset B includes mobile app functionality inquiries not present in Dataset A tasks",
        "Dataset A tasks focus on repository contributor analysis while Dataset B emphasizes vulnerability research",
        "Dataset B contains explicit compliance/certification questions (security certifications) not in Dataset A",
        "Dataset A requires comparison of AI vs traditional features while Dataset B focuses on AI tool implementation",
        "Dataset B includes educational course enrollment tasks whereas Dataset A focuses on resource discovery"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=2": [
        "Dataset B tasks require explicit identification of security advisory severity levels (e.g., 'high-severity'), while Dataset A focuses on general security navigation",
        "Dataset B contains tasks specifically about compliance documentation retrieval (e.g., CSA STAR Certificate) not present in Dataset A",
        "Dataset B includes explicit requirements to compare REST API vs GraphQL technical implementations, while Dataset A focuses on general API documentation lookup",
        "Dataset B tasks demand direct interaction with legal/policy documents (data usage policies, terms of service) absent in Dataset A",
        "Dataset B requires precise cost extraction from enterprise pricing quotes, while Dataset A focuses on plan feature comparisons",
        "Dataset B contains tasks about employment information (job openings) not found in Dataset A",
        "Dataset B includes specific technical implementation queries (YAML code for GitHub Actions) where Dataset A focuses on general automation concepts",
        "Dataset B tasks require identification of AI training data sources (Copilot model training data) not addressed in Dataset A",
        "Dataset B contains explicit multilingual requirements (page translation tasks) absent from Dataset A",
        "Dataset B includes package management tasks (installation instructions, package searches) while Dataset A focuses on repository-level analysis"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=0": [
        "Dataset B tasks focus on GitHub Copilot-specific security features and data usage policies, while A focuses on general repository security vulnerabilities",
        "Dataset B requires understanding GitHub Copilot plan differences and upgrade paths, whereas A focuses on Free vs Pro plan comparisons",
        "Dataset B tasks involve GitHub Actions configuration/troubleshooting (e.g. Metrics embed), while A focuses on workflow analysis",
        "Dataset B includes specific queries about GitHub CLI setup and plugin integration, absent in A",
        "Dataset B tasks require navigating GitHub's educational resources for syntax/formats (Markdown tables), while A focuses on technical documentation",
        "Dataset B emphasizes GitHub Copilot extension exploration and integration, unlike A's general AI capability understanding",
        "Dataset B contains tasks about GitHub Enterprise trials and enterprise-specific features not present in A",
        "Dataset B requires investigation of GitHub's data collection policies for business users, while A focuses on general account management",
        "Dataset B includes explicit queries about GitHub Mobile features and configuration, absent in A",
        "Dataset B tasks involve GitHub Classroom/autograding setup, whereas A focuses on general educational resource discovery"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=4": [
        "Tasks in B require detailed analysis of GitHub Copilot's pricing tiers and upgrade paths, while A focuses on general plan comparisons.",
        "B includes tasks about GitHub Enterprise solutions and trials, which are absent in A.",
        "B tasks involve navigating GitHub's security certifications (e.g., CSA STAR) not mentioned in A.",
        "B requires understanding GitHub Copilot's data retention policies and security limitations, while A focuses on basic feature identification.",
        "B contains tasks about GitHub's GraphQL API integration development, whereas A focuses on general API documentation navigation.",
        "B tasks necessitate comparing GitHub Copilot plans across multiple dimensions (Free/Pro/Enterprise), while A compares basic Free vs Pro features.",
        "B requires investigation of GitHub's status page and system reliability reports, which A doesn't address.",
        "B includes tasks about GitHub Mobile app store ratings analysis, absent in A's desktop-focused tasks.",
        "B tasks demand understanding of GitHub's intellectual property policies and user-generated content terms, while A focuses on basic account management.",
        "B requires navigation through GitHub's educational program offerings and academic trials, which A doesn't include."
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=1": [
        "Dataset B tasks focus on GitHub's own product features and setup processes (e.g., creating projects, using Copilot) rather than repository discovery",
        "Dataset B requires navigation through GitHub's documentation pages (e.g., API documentation, security guides) rather than repository metadata pages",
        "Dataset B emphasizes understanding GitHub's internal tools (Copilot, Projects, Issues) rather than analyzing external open-source projects",
        "Dataset B tasks frequently involve account-specific configuration inquiries (e.g., plan upgrades, organizational settings)",
        "Dataset B includes explicit troubleshooting scenarios (e.g., rate limits, project creation errors) not present in Dataset A",
        "Dataset B requires comparisons between GitHub's technical components (REST vs GraphQL API) rather than service tiers",
        "Dataset B tasks focus on security vulnerability research and advisory navigation rather than repository security features",
        "Dataset B emphasizes understanding data handling policies (Copilot's training data, privacy terms) rather than repository content",
        "Dataset B includes specific documentation element retrieval (e.g., API scalar types, project layouts) rather than general technical docs",
        "Dataset B tasks involve navigation through GitHub's blog and research content (e.g., AI adoption surveys) rather than marketing comparisons"
      ]
    },
    "espn": {
      "nnetnav_live_site=espn_num_tasks=62_portion=0": [
        "Dataset B includes explicit betting odds integration (e.g., spreads, moneylines) within game listings for navigation tasks",
        "Dataset B tasks emphasize upcoming game schedules with precise start times (e.g., '4:00 PM PT') rather than in-progress game clock statuses",
        "Dataset B requires navigation through conference-specific network branding (e.g., SEC Network, BTN) for college sports content access",
        "Dataset B contains tennis match navigation with player nationality flags and WTA/ATP tournament structures",
        "Dataset B tasks involve navigating multi-platform broadcast details (e.g., 'TNT/truTV/Max') as combined streaming options",
        "Dataset B requires interaction with collegiate ranking numbers (e.g., '9 CONN') preceding team abbreviations in NCAA matchups",
        "Dataset B includes postponed game status handling (e.g., 'Postponed CGY 0 LA 0') as a distinct navigation scenario",
        "Dataset B tasks involve navigating tournament-specific formats like Spanish Supercopa's two-legged aggregate scoring",
        "Dataset B requires parsing of college basketball conference abbreviations (e.g., B1G+) in schedule navigation",
        "Dataset B contains expanded college softball navigation with inning-specific scoring (e.g., '7:08 - 2nd')"
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=4": [
        "Dataset B tasks focus on postseason/playoff content (bowl games, NFL playoff picture) while A emphasizes regular season games",
        "B includes explicit requests for historical data comparisons (current vs previous season performance) not present in A",
        "B requires navigation through sports betting odds and probability analysis more frequently than A",
        "B tasks involve international soccer leagues (Primeira Liga) beyond major leagues covered in A",
        "B contains more complex scenario simulations (NBA trade simulations between unrelated sports)",
        "B emphasizes collegiate sports (NCAAF, NCAAM) more prominently than A's professional league focus",
        "B tasks require cross-sport comparisons (NFL vs NCAAF odds) not seen in A",
        "B includes specific requests for broadcast/streaming information (ESPN Radio NBA podcasts)",
        "B tasks require tracking player movement (transfer rumors, free agents) beyond A's static roster checks",
        "B involves deeper fantasy sports analysis (rookie evaluations, specific positional stats) compared to A's basic fantasy interactions"
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=1": [
        "Dataset B tasks require navigating to postseason/championship content (e.g., College Football Playoff brackets, NFL Playoff Machine) while A focuses on regular season games",
        "Tasks in B emphasize retrieving full season schedules (e.g., '2024 NFL schedule') rather than recent date-filtered results as in A",
        "B includes explicit odds/betting information requests (e.g., 'AFC Champion odds') not present in A's tasks",
        "B requires comparison of team/player statistics across seasons (e.g., 'Cavaliers vs Thunder comparison') while A focuses on single-game performance",
        "Dataset B tasks specify historical player data retrieval (e.g., '2002-03 Michael Jordan stats') where A only requests current season data",
        "B contains fantasy sports ranking analysis tasks (e.g., '2024-25 fantasy basketball rankings') absent in A's requirements",
        "Tasks in B frequently reference specific competition weeks (e.g., 'Week 18 NFL') rather than relative time frames like 'last 2 days' in A",
        "B requires navigation through ESPN+ streaming content details (e.g., 'how to watch on ESPN+') not mentioned in A's tasks",
        "Dataset B emphasizes conference/division standings analysis (e.g., 'AFC West standings') while A focuses on individual game outcomes",
        "B includes bowl game/playoff bracket navigation tasks (e.g., '2024 Bowl schedule') whereas A focuses on regular league matchups"
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=3": [
        "Dataset B tasks focus more on college football (NCAAF) bowl games and playoff scenarios compared to Dataset A's broader league coverage",
        "Tasks in B require accessing specific player career stats (e.g. Scottie Barnes, Sidney Crosby) rather than just game-specific performances",
        "B includes requests for sports radio/podcast content navigation not emphasized in A",
        "B contains tasks related to coaching changes and team management updates (e.g. Saints coaching jobs)",
        "B requires navigation of college football transfer portal information and recruitment news",
        "Tasks in B emphasize current season standings (2024-25) rather than historical comparisons",
        "B includes specific requests for broadcast/streaming information (ESPN+ game streams)",
        "B tasks focus on NCAA football program comparisons (e.g. Vanderbilt vs Georgia Tech)",
        "B requires navigation of fantasy sports rankings (baseball) not present in A's tasks",
        "B includes requests for team vs team roster comparisons (e.g. Chiefs vs Cardinals)"
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=2": [
        "Tasks in B focus on retrieving specific season-based data (e.g., '2024 NFL season') rather than general recency filters",
        "B requires locating draft/transfer news updates (NBA/NFL drafts) not emphasized in A's tasks",
        "B includes explicit requests for injury reports (team/player injuries) not present in A's samples",
        "B contains tasks related to playoff/bowl game qualification scenarios and bracket probabilities",
        "B demands direct team-vs-team matchup comparisons (e.g., 'Raiders vs Saints') rather than single-team analysis",
        "B requires fantasy baseball-specific data retrieval absent from A's fantasy sports tasks",
        "B includes esports-related navigation queries not found in A's traditional sports focus",
        "B contains hypothetical trade scenario construction tasks between multiple teams",
        "B requires identification of preseason game schedules not mentioned in A's tasks",
        "B demands comprehensive play-by-play data retrieval for historical games rather than live gamecasts"
      ]
    },
    "huggingface": {
      "nnetnav_live_site=huggingface_num_tasks=76_portion=1": [
        "Tasks in B require accessing research papers or academic content linked to models/datasets",
        "B includes tasks involving image generation through text-to-image models (e.g. cityscapes, futuristic scenes)",
        "Navigation tasks in B require document format conversion (e.g. paper to HTML)",
        "B contains tasks focused on creative applications like generating artistic content",
        "B tasks involve finding model installation/implementation instructions for client libraries",
        "B includes explicit pricing/GPU instance cost investigation requirements",
        "B tasks target beginner-friendly resources and model discovery for newcomers",
        "B requires identifying code generation capabilities (e.g. creating algorithms)",
        "B tasks involve dataset structure analysis (e.g. row counts in splits)",
        "B includes commercial product development use case exploration for models"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=0": [
        "Dataset B tasks involve finding licensing restrictions for commercial use cases (non-commercial clauses, attribution requirements)",
        "Dataset B tasks require integration with external platforms (GitHub Actions, LM Studio, Ollama) not native to Hugging Face ecosystem",
        "Dataset B includes tasks requiring price comparisons for computational resources (GPU pricing, enterprise plans)",
        "Dataset B tasks focus more on multimodal applications (text-to-image generation, video processing, 3D models)",
        "Dataset B requires analysis of model performance on domain-specific benchmarks (MMMU, medical applications)",
        "Dataset B tasks involve community interaction features (forum discussions, model comments, pull request management)",
        "Dataset B emphasizes research paper analysis for implementation details (arXiv references, methodology replication)",
        "Dataset B includes tasks requiring identification of low-resource language support in models/datasets",
        "Dataset B tasks require ethical considerations analysis (AI ethics, responsible usage guidelines)",
        "Dataset B contains tasks needing workflow integration documentation (model deployment pipelines, CI/CD implementations)"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=4": [
        "Dataset B tasks require locating specific model versions (e.g. 'metaversions Llama-3.3-70B-Instruct') while Dataset A focuses on general model characteristics",
        "Dataset B includes tasks about framework-specific implementations (Keras, TensorRT-LLM) not emphasized in Dataset A",
        "Dataset B contains academic research paper retrieval tasks (e.g. 'find paper abstract') absent from Dataset A",
        "Dataset B tasks involve commercial integration considerations ('AI tools for commercial products') more prominently",
        "Dataset B requires handling non-English language content (e.g. German installation instructions)",
        "Dataset B includes GitHub platform integration tasks (GitHub Actions CI) not present in Dataset A",
        "Dataset B tasks focus on model version history tracking ('version history of codegen-350M...')",
        "Dataset B requires accessing model source code repositories ('find source code for ultravox model')",
        "Dataset B contains tasks about low-resource language support ('GlotLID for low-resource languages')",
        "Dataset B includes specific paper abstract retrieval tasks while Dataset A focuses on documentation"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=2": [
        "Dataset B tasks emphasize model discovery for specific applications (e.g. dog breed classification) rather than technical metric comparisons",
        "Dataset B requires handling SDK/package installation guidance (e.g. Nexa SDK) not present in Dataset A",
        "Dataset B contains tasks about resolving API errors (e.g. 'Task not found' errors) while Dataset A focuses on successful API usage",
        "Dataset B includes commercial deployment considerations (pricing queries, commercial use approvals) beyond Dataset A's license checks",
        "Dataset B tasks target niche model types (anime-style generation, OCR models) compared to Dataset A's focus on mainstream NLP models",
        "Dataset B requires navigating beginner educational resources (NLP tutorials) not emphasized in Dataset A's expert-oriented tasks",
        "Dataset B tasks involve paper-source tracing (find original research papers) while Dataset A focuses on model-card cross-referencing",
        "Dataset B includes deployment cost analysis (e.g. Gemma deployment pricing) absent from Dataset A's technical parameter queries",
        "Dataset B tasks require integration with external tools (GitHub Copilot, LM Studio) beyond Dataset A's platform-contained workflows",
        "Dataset B contains multilingual dataset extraction tasks (e.g. Bulgarian jokes) while Dataset A focuses on English-centric models"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=3": [
        "Dataset B tasks require locating exact model identifiers (e.g. meta-llama/Llama-3.3-70B-Instruct) rather than discovering new models",
        "Dataset B includes tasks involving commercial use case evaluation and ethics policy compliance checks",
        "Dataset B requires interaction with dataset format conversion documentation (e.g. Parquet format conversion)",
        "Dataset B tasks focus more on commercial product development applications compared to Dataset A's research focus",
        "Dataset B contains tasks requiring local model execution documentation rather than just cloud API usage",
        "Dataset B tasks demand navigation through research paper references and academic citations",
        "Dataset B includes multilingual translation model requirements beyond Dataset A's language-specific needs",
        "Dataset B tasks require understanding model versioning patterns (e.g. -Preview suffixes) for technical specifications",
        "Dataset B contains CPU optimization requirements for model inference not present in Dataset A",
        "Dataset B tasks involve GitHub repository navigation for tutorial implementations"
      ]
    },
    "coursera": {
      "nnetnav_live_site=coursera_num_tasks=72_portion=3": [
        "Dataset B tasks focus on career development through language-specific courses (e.g., English for career growth), while A emphasizes role-specific learning paths (e.g., Data Analyst).",
        "Dataset B requires identifying guided projects (e.g., Python programming for beginners), whereas A focuses on full courses/Specializations with project components.",
        "Dataset B tasks involve direct enrollment actions (e.g., 'Enroll in IBM Data Science Certificate'), while A focuses on comparison/identification of programs.",
        "Dataset B emphasizes technical skill combinations (e.g., Python with Flask for AI applications), whereas A focuses on discrete skill acquisition from domain-specific courses.",
        "Dataset B includes queries for course curriculum details (e.g., 'Find the curriculum of finance courses'), while A focuses on quantitative metadata like duration/quiz counts.",
        "Dataset B tasks target intermediate-level content (e.g., 'business analytics for intermediate data analysts'), whereas A primarily addresses beginner-level filtering.",
        "Dataset B requires locating financial analysis components (e.g., Apple stock analysis projects), while A focuses on salary/job opening statistics.",
        "Dataset B includes discipline-blended queries (e.g., 'Python for data science + computer science'), whereas A maintains domain-specific course searches.",
        "Dataset B emphasizes certification program requirements (e.g., Master's in Data Science prerequisites), while A focuses on certificate availability as a completion feature.",
        "Dataset B tasks validate course quality through detailed review analysis (e.g., AI application course reviews), whereas A uses star ratings/percentage metrics."
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=2": [
        "Dataset B includes tasks related to social impact, human rights, and non-profit sectors (e.g., 'finance for social good'), absent in Dataset A.",
        "Dataset B requires navigation for language-specific courses (e.g., German-language photography courses), while Dataset A focuses solely on English content.",
        "Dataset B tasks involve locating courses tied to university admissions processes (e.g., Duke University deadlines), unlike Dataset A's focus on course attributes.",
        "Dataset B emphasizes free course availability with explicit financial aid criteria, whereas Dataset A prioritizes subscription models like Coursera Plus.",
        "Dataset B includes tasks about arts/culture courses (e.g., 'Arts and Culture Strategy'), while Dataset A focuses on technical domains like cybersecurity or data science.",
        "Dataset B tasks demand comparisons of leadership/management courses, unlike Dataset A's structured filtering by skill level or certification.",
        "Dataset B requires validating course content granularity (e.g., 'modules in Introduction to Artificial Intelligence (AI)'), while Dataset A focuses on completion time or credential types.",
        "Dataset B tasks involve psychology/well-being courses (e.g., 'The Science of Well-Being'), absent in Dataset A's technical scope.",
        "Dataset B includes career-specific queries about recommended experience (e.g., 'Data Analyst role prerequisites'), whereas Dataset A emphasizes salary/job openings cross-referencing.",
        "Dataset B tasks prioritize newer AI-integrated professional certificates (e.g., Google Cybersecurity), while Dataset A references foundational tools like Excel or Tableau."
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=4": [
        "Dataset B tasks require identifying Generative AI and Prompt Engineering courses, not present in Dataset A",
        "Dataset B tasks involve career path exploration for roles like Full Stack Developer/Product Manager, absent in Dataset A",
        "Dataset B requires navigation through industry-specific analytics (Healthcare/HR analytics), while Dataset A focuses on general career outcomes",
        "Dataset B tasks include comparison of Python courses for business professionals, not seen in Dataset A",
        "Dataset B requires identification of technical tool-focused courses (TensorFlow/Docker/AWS), while Dataset A emphasizes programming languages",
        "Dataset B tasks involve Agile Project Management methodology verification, absent in Dataset A requirements",
        "Dataset B includes queries about Master's degree programs in Data Science/Machine Learning, not present in Dataset A",
        "Dataset B tasks require identification of newer course types (Google Prompting Essentials), not found in Dataset A samples",
        "Dataset B contains tasks about sustainability-focused courses in business contexts, while Dataset A's sustainability tasks are engineering-focused",
        "Dataset B emphasizes foundational AI/ML courses with Python/R implementation details, whereas Dataset A focuses on general AI ethics/application"
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=0": [
        "Dataset B tasks require configuring course language preferences, which are not mentioned in Dataset A tasks.",
        "Dataset B tasks involve navigating to and evaluating degree programs, while Dataset A focuses on individual courses/specializations.",
        "Dataset B emphasizes Professional Certificates with integrated AI skills (e.g., Google Certificates), whereas Dataset A includes general certificates.",
        "Tasks in Dataset B require interacting with enrollment processes (e.g., signing up, accessing enrollment pages), absent in Dataset A.",
        "Dataset B tasks require detailed exploration of course modules/syllabi (e.g., module lists), while Dataset A focuses on overviews.",
        "Dataset B includes interdisciplinary course searches (e.g., combining graphic design with culture), unlike Dataset A\u2019s single-topic focus.",
        "Dataset B tasks use advanced filters like course level (e.g., 'Advanced'), beyond Dataset A\u2019s basic duration or rating filters.",
        "Dataset B tasks prominently feature university-specific courses (e.g., Stanford, Yale), whereas Dataset A emphasizes partner institutions more broadly.",
        "Free course filtering is explicitly highlighted in Dataset B tasks, unlike Dataset A\u2019s general price comparisons.",
        "Dataset B tasks require identifying hands-on projects or labs within course requirements, while Dataset A focuses on completion certificates."
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=1": [
        "Dataset B tasks require gathering detailed program requirements and admission processes, unlike Dataset A",
        "Dataset B includes tasks focused on exploring career paths related to course subjects, absent in Dataset A",
        "Dataset B tasks involve comparing specialization tracks/degree concentrations, while Dataset A focuses on individual course attributes",
        "Dataset B requires analyzing refund policies and granular course content details (e.g., modules, tools) not present in Dataset A",
        "Dataset B contains tasks related to standardized test preparation (e.g., GRE) not found in Dataset A",
        "Dataset B includes technical tool-specific queries (e.g., EDA tools) absent from Dataset A tasks",
        "Dataset B tasks require investigation of enrollment processes for degree programs, unlike Dataset A",
        "Dataset B emphasizes curriculum structure analysis (course sequences, specializations) rather than single-course metrics",
        "Dataset B contains explicit job-to-course matching requirements not present in Dataset A",
        "Dataset B includes language teaching pedagogy tasks (e.g., English instruction) absent from Dataset A"
      ]
    },
    "arxiv": {
      "nnetnav_live_site=arxiv_num_tasks=80_portion=1": [
        "Dataset B tasks frequently require locating papers by explicit arXiv identifier codes (e.g., 2412.18601) while Dataset A uses title/keyword searches",
        "Dataset B contains tasks requiring analysis of HTML formatting features in papers, absent in Dataset A's format-agnostic requirements",
        "Dataset B includes troubleshooting tasks related to content accessibility (e.g., HTML conversion errors) not present in Dataset A",
        "Dataset B tasks more frequently require navigation through paper structure (e.g., finding specific sections like 'related work') compared to Dataset A's metadata-focused tasks",
        "Dataset B contains requests for copyright/license information retrieval not emphasized in Dataset A tasks",
        "Dataset B tasks include explicit requirements to compare content across academic platforms (e.g., arXiv vs dblp) beyond Dataset A's internal cross-referencing",
        "Dataset B requires handling of mixed-content papers with embedded multimedia (e.g., optical microscopy images) unlike Dataset A's text/formula focus",
        "Dataset B tasks involve temporal specificity challenges with phrases like 'recently published' without date ranges, unlike Dataset A's explicit date filters",
        "Dataset B includes definition-seeking tasks (e.g., 'stablecoin trilemma') requiring contextual understanding beyond paper content extraction",
        "Dataset B tasks require identification of paper versions through submission history patterns rather than explicit version numbers used in Dataset A"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=4": [
        "Dataset B tasks require accessing technical documentation like submission guidelines and format specifications",
        "Dataset B includes tasks involving error message interpretation during HTML paper viewing",
        "Dataset B contains requests for author reference tracking and citation network exploration",
        "Dataset B tasks demand interaction with supplemental source code repositories",
        "Dataset B requires navigation through quantum-HPC middleware architecture descriptions",
        "Dataset B involves analysis of urban social segregation models in papers",
        "Dataset B tasks focus on specific section extraction (e.g. experimental setups, introductions)",
        "Dataset B includes interdisciplinary research paper discovery across multiple fields",
        "Dataset B contains requests for quantum computing resource management frameworks",
        "Dataset B tasks require handling of specific arXiv identifier-based paper retrieval"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=0": [
        "Tasks in B require direct access to full-text sections (e.g., Results, Methodology) of papers, while A focuses on metadata extraction from abstracts or summaries",
        "B includes tasks involving arXiv paper IDs (e.g., 2412.18601) for direct retrieval, not present in A's requirements",
        "B contains tasks requiring identification and interpretation of licensing/copyright information for papers, absent in A's requirements",
        "B demands navigation to accessibility-friendly formats (e.g., HTML versions) rather than format-agnostic content analysis seen in A",
        "B requires troubleshooting interface issues (e.g., layout problems, HTML errors) not mentioned in A's task specifications",
        "B includes tasks involving citation analysis (e.g., reference lists, citation counts) rather than pure metadata extraction as in A",
        "B contains explicit requirements for downloading PDF versions of papers, while A focuses on content analysis without format-specific retrieval",
        "B tasks involve author-specific searches (e.g., papers by Ariel Shlosberg) rather than A's institution-based affiliation searches",
        "B requires identification of paper versions (e.g., v3 submissions) through direct version history inspection, unlike A's date-focused version tracking",
        "B includes tasks requiring navigation through legal/usage policies (e.g., license types) not addressed in A's academic-focused tasks"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=2": [
        "Dataset B tasks focus on retrieving entire papers or specific sections rather than extracting metadata components like version history or formulas",
        "Dataset B includes tasks requiring navigation to methodology, conclusions, or background sections not emphasized in Dataset A",
        "Dataset B contains tasks with vague search targets (e.g. 'groundbreaking discoveries') unlike Dataset A's specific paper titles/categories",
        "Dataset B tasks frequently require PDF downloading/reading while Dataset A emphasizes HTML view content extraction",
        "Dataset B includes broken/incoherent task descriptions not present in Dataset A samples",
        "Dataset B tasks require verification of paper existence through direct search rather than category classification",
        "Dataset B contains requests for source code availability checks absent in Dataset A tasks",
        "Dataset B tasks focus on reference tracking within papers (e.g. finding the 10th reference) not seen in Dataset A",
        "Dataset B includes cross-repository navigation tasks (e.g. finding code repositories) beyond arXiv's scope",
        "Dataset B tasks require understanding paper content structure rather than quantitative analysis of components"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=3": [
        "Dataset B tasks frequently require locating papers by exact title match, while Dataset A emphasizes keyword-based search within specific fields",
        "Dataset B includes tasks involving arXiv submission process troubleshooting not present in Dataset A",
        "Dataset B contains requests for technical document elements like LaTeX tables/source code that Dataset A lacks",
        "Dataset B tasks more commonly involve author name searches and citation tracking compared to Dataset A",
        "Dataset A requires more granular date filtering (e.g. 'within last week') while Dataset B uses broader time ranges",
        "Dataset B includes format conversion requests (e.g. TeX\u2192PDF) not found in Dataset A's HTML/PDF distinction tasks",
        "Dataset A tasks focus on arXiv-specific metadata (version history, category abbreviations) while B includes external references (NASA ADS)",
        "Dataset B contains more open-ended exploration tasks ('find information about...') compared to Dataset A's structured queries",
        "Dataset A requires interpretation of quantitative research elements (formulas, figure counts) absent in B's tasks",
        "Dataset B includes multi-step investigative tasks (author contributions analysis) while A focuses on discrete information retrieval"
      ]
    },
    "bbc": {
      "nnetnav_live_site=bbc_num_tasks=69_portion=2": [
        "Tasks in dataset B require locating future-dated content (e.g., 'January 2025 fixtures') unlike time-bound recency checks in A",
        "Dataset B tasks involve identifying cultural/artistic trends (e.g., book releases, dance styles) beyond news recency",
        "B includes explicit requirements to verify geopolitical accusations (e.g., 'Russia's involvement in crash') not present in A",
        "Tasks in B demand navigation through specialized sports coverage sections (e.g., cricket highlights, league tables)",
        "Dataset B requires interaction with user-generated content features (e.g., bookmarking South African Braai info)",
        "B tasks involve cross-referencing multimedia formats simultaneously (e.g., watching videos while reading articles)",
        "Dataset B contains tasks requiring identification of legislative changes (e.g., Iran's dress code laws)",
        "B includes explicit requirements to track ongoing conflict developments (e.g., Hamas hostage updates)",
        "Tasks in B demand understanding of academic/career development content (e.g., adult education closures)",
        "Dataset B requires navigation through localized weather forecasts (e.g., Rio de Janeiro) rather than general reports"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=3": [
        "Tasks in B require locating content without explicit time constraints (e.g., 'courses', 'hotels') rather than strictly recent articles",
        "B includes queries about static informational resources (e.g., education courses, hotel details) rather than time-sensitive news content",
        "Navigation in B requires finding permanent website sections (e.g., weather forecasts, sports schedules) rather than dynamic news categories",
        "Tasks in B involve locating specific entity information (e.g., Robbie Williams, Boeing crisis) rather than thematic news summaries",
        "B requires exploration of website structure (e.g., 'familiarize with layout') rather than content hierarchy navigation",
        "Queries in B target practical information (e.g., weather forecasts, stock prices) rather than news analysis",
        "Tasks in B include multimedia consumption (e.g., 'watch some videos') without requiring content summarization",
        "B contains requests for educational/professional resources (e.g., university courses) not present in news-focused A",
        "Navigation in B involves persistent content sections (e.g., sports fixtures, climate change info) rather than transient news blocks",
        "Tasks in B require open-ended exploration (e.g., 'discover something interesting') rather than targeted news retrieval"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=1": [
        "Dataset A tasks require locating content within explicitly named categories (e.g., Sport, Business), while Dataset B tasks often involve implied category navigation through broader topic searches",
        "Dataset A emphasizes time-sensitive filtering with strict temporal parameters ('today', 'last two days'), whereas Dataset B includes more flexible temporal ranges including historical events",
        "Dataset B contains tasks requiring navigation beyond news content (e.g., hotel bookings, book recommendations), while Dataset A focuses exclusively on news-related content retrieval",
        "Dataset A tasks frequently demand summarization of article bodies, while Dataset B includes more direct information extraction without summarization requirements",
        "Dataset B features tasks requiring cross-platform content integration (e.g., social media posts, external bookings), whereas Dataset A remains contained within BBC News domains",
        "Dataset A emphasizes geographic news categorization, while Dataset B includes more thematic categorization (e.g., 'AI', 'environmental crises')",
        "Dataset B contains tasks requiring understanding of specialized content formats (e.g., podcasts, video reports) as primary objectives, while Dataset A treats multimedia as secondary elements",
        "Dataset A tasks specify section-based navigation (e.g., 'Weather section'), while Dataset B tasks often require inferring content locations through keyword searches",
        "Dataset B includes functional tasks beyond information retrieval (e.g., checking weather forecasts, finding booking options), which are absent in Dataset A",
        "Dataset A focuses on current event tracking, while Dataset B incorporates both real-time updates and evergreen content exploration"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=0": [
        "Tasks in Dataset A require locating content within specific, predefined categories (e.g., 'Weather', 'Technology'), while Dataset B tasks involve broader exploratory navigation without explicit category constraints.",
        "Dataset A tasks emphasize precise time-bound filtering (e.g., 'within the last two days'), whereas Dataset B tasks lack strict recency requirements and focus more on open-ended research.",
        "Dataset A includes explicit references to multimedia format requirements (e.g., 'podcast episode', 'picture in Travel section'), while Dataset B tasks reference multimedia more generally (e.g., 'videos on AI').",
        "Tasks in Dataset A frequently specify exact article titles or sections (e.g., 'The SpeciaList section'), while Dataset B tasks use vague descriptors like 'recent news' or 'latest information'.",
        "Dataset B includes tasks requiring academic/technical research (e.g., university course details, graphene technology), which are absent in Dataset A's news-focused tasks.",
        "Dataset A tasks require identification of specific entities (e.g., named companies in AI headlines), while Dataset B tasks focus on conceptual understanding (e.g., 'environmental impact of flying').",
        "Dataset B contains tasks requiring cross-platform navigation (e.g., 'NASA's website', 'university pages'), unlike Dataset A's BBC-centric requirements.",
        "Dataset A emphasizes regional subsection navigation (e.g., 'Asia', 'Middle East'), while Dataset B tasks reference geopolitical contexts without explicit regional filtering.",
        "Dataset B includes practical action-oriented tasks (e.g., charity donations, course enrollment research) absent in Dataset A's informational focus.",
        "Dataset A tasks require verification through timestamp analysis, while Dataset B tasks prioritize content synthesis across multiple sources (e.g., 'business decisions impacting Formula 1')."
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=4": [
        "Dataset B tasks include explicit requests for localized/regional event coverage (e.g., New Orleans attacks, London celebrations) while A focuses on global sections",
        "B contains tasks requiring navigation of educational/occupational content (e.g., university courses) absent in A's scope",
        "B tasks involve future-dated event lookup (e.g., Jan 1 2025 cricket matches) vs A's focus on recent/past content",
        "B includes direct requests for multimedia exploration (\"watch nature walks\") while A emphasizes podcast/live format retrieval",
        "B tasks require navigating policy impact analyses (e.g., UK tax changes) through complex governance sections",
        "B contains exploratory browsing patterns (\"explore innovation page\") vs A's targeted hierarchical navigation",
        "B tasks demand cross-referencing cultural trends (e.g., 1970s dance styles) across multiple content types",
        "B includes requests for biographical/entertainment content (e.g., actor roles) requiring celebrity metadata parsing",
        "B tasks require identification of scientific concepts (e.g., Jupiter's magnetism) through specialized science sections",
        "B contains tourism/hospitality research tasks (e.g., 9h hotel features) needing service-oriented content navigation"
      ]
    },
    "amazon": {
      "nnetnav_live_site=amazon_num_tasks=63_portion=2": [
        "Dataset B tasks focus on general product categories without specific technical attributes (e.g., 'gift for a female' vs A's '1TB disk size')",
        "Dataset B includes tasks targeting pre-owned/refurbished items (e.g., 'pre-loved Louis Vuitton') not present in A",
        "Dataset B contains more vague budget constraints ('cheapest shampoo' vs A's 'priced between $50-$100')",
        "Dataset B tasks emphasize gift-finding without specific requirements vs A's deal-focused item searches",
        "Dataset B includes product protection plan purchases not mentioned in A",
        "Dataset B tasks reference luxury brand items (Louis Vuitton) not emphasized in A",
        "Dataset B contains more general exploration tasks ('Explore gourmet food') vs A's structured filtering",
        "Dataset B includes multi-product purchases ('multiple Harry Potter books') without specific version details",
        "Dataset B tasks mention brand names as primary criteria (SanDisk, Homtiem) more frequently than A",
        "Dataset B contains seasonal category browsing ('Winter Sale toys') without price/feature filters present in A"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=3": [
        "Dataset B tasks focus on purchasing actions without requiring variant selection (e.g. 'Add to cart' without size/color specification)",
        "Dataset B contains tasks involving Amazon services like Prime Video rentals and Kindle Unlimited subscriptions",
        "Dataset B includes tasks related to Amazon Fresh grocery items and perishable goods",
        "Dataset B features tasks requiring interaction with Amazon's luxury stores and high-end fashion brands",
        "Dataset B contains seasonal/event-based tasks (e.g. Halloween decorations, Christmas gifts)",
        "Dataset B includes tasks focused on account creation and Prime membership benefits",
        "Dataset B tasks frequently involve gift-related purchases without specific product requirements",
        "Dataset B contains tasks related to subscription services (Subscribe & Save) and recurring purchases",
        "Dataset B features tasks requiring interaction with Amazon's pre-owned/renewed products marketplace",
        "Dataset B includes tasks focused on comparing prices within product categories rather than technical specifications"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=1": [
        "Tasks in B emphasize lifestyle/event-specific products (e.g. party shoes, holiday gifts) rather than technical specifications",
        "B requires navigation through specialized verticals (Luxury Stores, Shopbop) not present in A",
        "B contains tasks involving Amazon Fresh grocery navigation with CAPTCHA challenges",
        "B includes explicit requirements for eco-friendly/sustainable product attributes",
        "B features time-sensitive seasonal constraints (Winter Sale, NFL Wild Card event)",
        "B requires handling pre-owned/pre-loved product conditions and resale options",
        "B emphasizes gift-oriented tasks with specific occasion requirements",
        "B contains queries for subscription-based digital content (Kindle Unlimited, Prime Video)",
        "B includes health/wellness budget constraints (under $10) as key filters",
        "B requires navigation through curated collections (Customer-Loved, Top 100) rather than raw search results"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=0": [
        "Dataset A tasks require multi-step filtering with exact numerical criteria (e.g., 30-inch length, 10x zoom)",
        "Dataset B tasks emphasize general product discovery (e.g., 'eco-friendly kitchen products', 'gift ideas') without granular specifications",
        "Dataset A tasks explicitly require sorting mechanisms (e.g., 'sort by Best Sellers', 'newest arrivals')",
        "Dataset B tasks focus on identifying discounts/sales events (e.g., 'Winter Sale deals', 'pre-owned devices')",
        "Dataset A requires verification of real-time stock status (e.g., color/size availability checks)",
        "Dataset B includes open-ended category exploration (e.g., 'office supplies', 'women's fashion items')",
        "Dataset A tasks demand direct price comparisons across sellers/items",
        "Dataset B contains product customization tasks (e.g., 'customize AUTOMET Shacket')",
        "Dataset A specifies strict customer review thresholds (e.g., '20,000+ reviews', '4+ stars')",
        "Dataset B features thematic shopping objectives (e.g., 'eco-friendly', 'gift') rather than technical specifications"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=4": [
        "Tasks in dataset A require exact numerical values for product attributes (e.g., 1TB disk, 30-inch length), while B uses qualitative descriptors (e.g., 'eco-friendly', 'luxury')",
        "Dataset A tasks demand validation of quantitative review thresholds (e.g., '20,000+ reviews'), whereas B only references general ratings ('highly-rated')",
        "Dataset A consistently requires price range filtering with dollar boundaries ($50-$100), while B uses vague cost terms ('low-cost', 'cheapest')",
        "Dataset A tasks specify technical compatibility requirements (e.g., MacBook Pro compatibility), absent in B's tasks",
        "Dataset B includes gift-oriented tasks (e.g., graduation/Christmas gifts) not present in A",
        "Dataset A requires temporal filters (e.g., 'published in 2024'), while B lacks date-sensitive constraints",
        "Dataset B contains meta-navigation tasks (e.g., testing CAPTCHA) not found in A's product-focused tasks",
        "Dataset A tasks mandate specific sorting methods (e.g., 'sort by Best Sellers'), while B uses undefined sorting ('most popular')",
        "Dataset A requires multi-attribute conjunction (e.g., 'stainless steel AND double bowls'), whereas B uses single-criterion searches",
        "Dataset B includes gift card purchase tasks absent in A's physical product requirements"
      ]
    },
    "wolframalpha": {
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=4": [
        "Tasks in dataset B include requests for definitions or explanations of general concepts (e.g., paradoxes, logic, triangulate).",
        "Dataset B contains tasks involving health-related calculations (e.g., BMI, metabolic properties) not present in dataset A.",
        "Tasks in dataset B require historical or biographical data retrieval (e.g., Industrial Revolution figures).",
        "Dataset B includes comparative queries (e.g., foreign debt between countries) absent in dataset A.",
        "Tasks in dataset B involve word definitions or linguistic analysis (e.g., \"termination\", \"hello\").",
        "Dataset B features real-world event queries (e.g., COVID-19, solar eclipses) with dynamic temporal relevance.",
        "Tasks in dataset B explicitly explore Wolfram Alpha's internal capabilities or terminology (e.g., AceFEM, LanguageCategory).",
        "Dataset B includes exploratory research tasks (e.g., \"learn about paradoxes\") rather than direct computational outputs.",
        "Tasks in dataset B request step-by-step solutions for elementary problems (e.g., 3\u00d75) unlike dataset A's complex computations.",
        "Dataset B incorporates cultural or mythological inquiries (e.g., native names of Ganesha) absent in dataset A."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=0": [
        "Dataset B tasks frequently require integration of humanities and social sciences knowledge (e.g., etymology, cultural history), absent in Dataset A.",
        "Tasks in Dataset B often demand conceptual explanations (e.g., definitions, hypotheses) alongside computational results, while Dataset A focuses purely on numerical or formulaic outputs.",
        "Dataset B includes exploratory tasks involving cultural or historical figure research (e.g., King Charles III, Ganesha), which are not present in Dataset A.",
        "Tasks in Dataset B involve linguistic analysis (e.g., word etymology) and philosophical/logical concepts (e.g., Liar Paradox), unlike Dataset A's STEM-centric focus.",
        "Dataset B tasks require retrieving multi-domain educational content (e.g., Riemann Hypothesis explanations), while Dataset A emphasizes direct problem-solving without extended context.",
        "Dataset B includes tasks to investigate system functionalities (e.g., Wolfram Language capabilities), whereas Dataset A focuses on predefined computational workflows.",
        "Tasks in Dataset B often involve comprehensive data trend analysis (e.g., COVID-19 case trends) rather than isolated data points as in Dataset A.",
        "Dataset B tasks integrate broader real-world applications (e.g., cultural, historical, linguistic contexts) beyond Dataset A's technical domains like physics or chemistry.",
        "Dataset B tasks may require generating summaries or contextual reports (e.g., climate models with anomalies), while Dataset A prioritizes discrete computational outputs.",
        "Dataset B includes tasks requiring retrieval of biographical, geographical, or cultural data (e.g., mountain elevations, political figures) absent in Dataset A."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=1": [
        "Dataset B tasks include requests for data over extended time periods (e.g., 'monthly temperature anomaly data for the last 10 years').",
        "Dataset B tasks involve linguistic or etymological exploration (e.g., 'etymology of the word \"love\"').",
        "Dataset B tasks focus on exploratory learning of platform features (e.g., 'get familiar with the content and examples provided by Wolfram Alpha').",
        "Dataset B tasks include queries for future event timings (e.g., 'sunrise and sunset time on December 22, 2024').",
        "Dataset B tasks require cross-disciplinary conceptual research (e.g., 'relationship between Fibonacci sequence and 3x+1 problem').",
        "Dataset B tasks involve explicit data export formatting (e.g., 'save molecular structure in XLS format').",
        "Dataset B tasks include health or medical information retrieval (e.g., 'colon cancer' or 'daily caloric intake').",
        "Dataset B tasks emphasize step-by-step arithmetic process explanations (e.g., 'multiplying two single-digit numbers using elementary arithmetic').",
        "Dataset B tasks explore philosophical/mathematical paradoxes (e.g., 'Liar Paradox', 'Russell's paradox').",
        "Dataset B tasks involve text/string manipulation queries (e.g., 'work with text and strings in the Wolfram Language')."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=3": [
        "Tasks in dataset B require accessing Wolfram Alpha's internal documentation or function-specific help resources (e.g., ProductLog function, generating functions).",
        "Dataset B includes explicit queries about Wolfram Alpha's platform features, pricing, or subscription plans (e.g., Pro features, educational tools).",
        "Tasks in dataset B involve retrieving conceptual definitions or theoretical explanations (e.g., paradoxes, theorem definitions, etymologies).",
        "Dataset B contains tasks focused on multi-step financial planning or real-world scenario modeling (e.g., annuities, weight loss plans, investment growth).",
        "Dataset B requires downloading or saving computational results (e.g., chemical properties, TeX-formatted solutions).",
        "Tasks in dataset B include humanities-focused queries (e.g., historical events, linguistic etymology, philosophical concepts).",
        "Dataset B emphasizes platform exploration (e.g., 'Explore Wolfram Alpha's resources,' 'Learn about features').",
        "Dataset B includes explicit references to Wolfram-specific terminology or datasets (e.g., sequence A000108, ProductLog function).",
        "Tasks in dataset B involve verifying platform permissions or usage guidelines (e.g., dissemination rules for solutions).",
        "Dataset B integrates cross-disciplinary research (e.g., combining physics concepts with biographical data on Einstein)."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=2": [
        "Dataset B tasks focus on basic arithmetic and algebraic equations more frequently than Dataset A",
        "Dataset B includes explicit requests for step-by-step explanations of solutions, while Dataset A assumes procedural knowledge",
        "Dataset B contains more exploratory/open-ended queries about platform capabilities (e.g., 'Explore features') compared to Dataset A's focused computational tasks",
        "Dataset B shows higher frequency of chemical element property queries compared to Dataset A's material science focus",
        "Dataset B includes direct requests for data format conversions (TeX, image downloads) not present in Dataset A",
        "Dataset B contains more user-specific personal data calculations (BMI, mortgage costs) compared to Dataset A's general scientific calculations",
        "Dataset B tasks frequently request fundamental mathematical constants/values (pi, e) to high precision, unlike Dataset A",
        "Dataset B includes basic nutritional/food science queries absent from Dataset A's technical domains",
        "Dataset B shows more requests for astronomical/calendar events (eclipses, moon phases) compared to Dataset A's physics/engineering focus",
        "Dataset B contains explicit learning-oriented tasks (e.g., 'Learn about Liar's Paradox') not present in Dataset A's problem-solving focus"
      ]
    },
    "allrecipes": {
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=0": [
        "Dataset B tasks focus on utilizing leftover ingredients in recipes, unlike Dataset A which emphasizes specific nutritional criteria.",
        "Dataset B includes tasks requiring recipes for specific named dishes (e.g., 'Juicy Roasted Chicken'), while Dataset A prioritizes attribute-based searches.",
        "Dataset B emphasizes holiday/event-specific recipes (e.g., Christmas desserts) more granularly than Dataset A's general seasonal filtering.",
        "Dataset B tasks involve active community engagement like leaving recipe reviews, whereas Dataset A focuses on passive review consumption.",
        "Dataset B requires recipe adaptation for meal prep/storage (e.g., 'tips for storing leftovers'), unlike Dataset A's immediate-use focus.",
        "Dataset B includes subjective search parameters like 'kid-friendly' or 'crowd-pleasing' without quantifiable metrics used in Dataset A.",
        "Dataset B tasks involve printing recipes as an explicit output requirement, absent in Dataset A's scope.",
        "Dataset B emphasizes ingredient repurposing (e.g., leftover cranberry sauce) rather than Dataset A's ingredient-driven searches for specific dishes.",
        "Dataset B contains multi-dietary combination queries (e.g., 'gluten-free cheesecake suitable for keto'), while Dataset A handles single dietary filters.",
        "Dataset B includes recipe similarity searches (e.g., 'recipes similar to Italian Ricotta Cookies') requiring relational understanding beyond Dataset A's categorical navigation."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=4": [
        "Dataset B tasks include meal planning for entire events or holidays (e.g., vegan holiday party menu), while Dataset A focuses on individual recipe searches.",
        "Dataset B requires finding ingredient substitutions (e.g., evaporated milk alternatives), which is absent in Dataset A tasks.",
        "Dataset B explicitly involves post-interaction actions like printing recipes or leaving user reviews, whereas Dataset A emphasizes information retrieval without explicit output actions.",
        "Dataset B contains broader exploratory tasks (e.g., 'research vegetarian recipes'), while Dataset A uses specific quantitative filters for all searches.",
        "Dataset B includes holiday-themed meal planning beyond specific recipes (e.g., hot chocolate bar ideas), while Dataset A focuses on occasion-specific recipe attributes.",
        "Dataset B tasks require nutritional value comparisons between multiple recipes, whereas Dataset A only filters by nutritional criteria within single recipes.",
        "Dataset B emphasizes kid-friendly recipe requirements across multiple tasks, while Dataset A doesn't specifically target age-related dietary needs.",
        "Dataset B includes utilization of leftovers as a core task constraint (e.g., leftover ham recipes), which is absent in Dataset A.",
        "Dataset B tasks involve physical recipe organization (e.g., saving/printing for meal prep), while Dataset A focuses on digital information gathering.",
        "Dataset B contains requests for non-recipe content (e.g., kitchen appliances for gatherings), while Dataset A strictly focuses on recipe attributes."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=1": [
        "Dataset B includes tasks focused on holiday-specific recipes (e.g., Christmas, Hanukkah, New Year\u2019s appetizers) not explicitly emphasized in A.",
        "Tasks in B require comparing recipes or nutritional information (e.g., clam chowder comparisons), while A focuses on individual recipe criteria.",
        "Dataset B emphasizes user-generated actions like saving, reviewing, or modifying recipes (e.g., leaving reviews, substituting ingredients).",
        "B includes tasks for kid-friendly recipes and family meals (e.g., granola bars, family dinners), which are less prominent in A.",
        "Dataset B tasks involve broader meal types (appetizers, cocktails, desserts) beyond primary dishes, unlike A\u2019s focus on main courses.",
        "B requires finding recipes based on leftovers or specific ingredient utilization (e.g., leftover potatoes), whereas A emphasizes dietary filters.",
        "Tasks in B prioritize festive and decorative dishes (e.g., Yule log cakes, gingerbread houses) not highlighted in A.",
        "Dataset B includes queries for seasonal beverages (e.g., mulled wine, holiday cocktails) absent in A\u2019s task samples.",
        "B tasks focus on recipe variations for holidays (e.g., Christmas fudge, eggnog desserts), while A emphasizes nutritional/caloric constraints.",
        "Dataset B involves exploring themed recipe collections (e.g., \"Ultimate Holiday Cookie Guide\"), whereas A targets specific ingredient or dietary searches."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=3": [
        "Tasks in dataset B require users to leave reviews or interact with community-driven content (e.g., rating recipes, asking for substitutions), while dataset A focuses on extracting existing reviews and ratings.",
        "Dataset B includes tasks related to finding recipes based on leftover ingredients (e.g., turkey, chicken), which are not present in dataset A.",
        "Tasks in dataset B involve searching for recipes tied to specific meal-planning durations (e.g., 7-day meal plans), whereas dataset A emphasizes immediate recipe metadata extraction.",
        "Dataset B tasks frequently require identifying recipes for niche events like Halloween snacks or Thanksgiving appetizers, while dataset A focuses on broader seasonal occasions (e.g., Easter, Ramadan).",
        "Dataset B contains tasks requesting price checks for kitchen appliances (e.g., KitchenAid mixers), absent in dataset A\u2019s navigation goals.",
        "Tasks in dataset B often lack explicit numerical constraints (e.g., \"best-rated\" without specifying star thresholds), unlike dataset A\u2019s strict requirements (e.g., 4.5 stars, 200+ reviews).",
        "Dataset B includes tasks for saving/printing recipes as a primary action, whereas dataset A emphasizes parsing and listing recipe details.",
        "Dataset B tasks involve broader dietary categories (e.g., \"low-carb\") without always specifying nutritional constraints, unlike dataset A\u2019s explicit calorie limits or ingredient counts.",
        "Tasks in dataset B require exploring alternative recipe versions (e.g., vegetarian spinach dip) or substitutions, which are not emphasized in dataset A.",
        "Dataset B tasks focus on holiday party planning and multi-recipe collection (e.g., appetizers, desserts), while dataset A prioritizes single-recipe precision."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=2": [
        "Dataset B tasks frequently involve user interaction features like leaving reviews or community questions, whereas Dataset A focuses on information retrieval without user contributions.",
        "Dataset B includes tasks related to repurposing leftovers and ingredient reuse, which are not present in Dataset A tasks.",
        "Dataset B tasks often specify budget-friendly or cost-effective meal solutions, unlike Dataset A's emphasis on ratings and prep time constraints.",
        "Dataset B contains more diverse seasonal/holiday tasks (e.g., Halloween, Valentine's Day) compared to Dataset A's focus on major holidays like Easter/Christmas.",
        "Dataset B tasks explicitly request nutrition information and healthy eating options, while Dataset A focuses on dietary preferences without nutritional details.",
        "Dataset B includes appetizer/snack-focused tasks (e.g., kid-friendly snacks) whereas Dataset A tasks are primarily main dish-oriented.",
        "Dataset B tasks require exploration of presentation/garnish ideas, unlike Dataset A's strict recipe parameter filtering.",
        "Dataset B emphasizes cooking methods/appliances (slow cooker, air fryer) as primary criteria, while Dataset A prioritizes review metrics and preparation time.",
        "Dataset B tasks involve multi-step actions like saving recipes + exploring related content, whereas Dataset A focuses on single-recipe retrieval.",
        "Dataset B includes broader cultural cuisine exploration tasks (e.g., 'international cuisines'), while Dataset A specifies narrower regional filters like Mediterranean-style."
      ]
    },
    "dictionary.cambridge": {
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=2": [
        "Tasks in dataset B require providing syllable count information for words, which is not present in dataset A.",
        "Dataset B includes tasks that involve exploring word etymology, absent in dataset A.",
        "Tasks in dataset B focus on business-specific vocabulary and terminology, not emphasized in dataset A.",
        "Dataset B tasks require identifying antonyms of words, while dataset A focuses solely on synonyms.",
        "Translation tasks in dataset B specifically target English\u2013Spanish, whereas dataset A includes languages like French and Chinese.",
        "Dataset B tasks involve providing feedback on example sentences, a feature not required in dataset A.",
        "Tasks in dataset B require comparing related concepts (e.g., health terms), whereas dataset A focuses on individual word exploration.",
        "Dataset B includes tasks about linguistic terms (e.g., phonetics, syllabification) not explicitly required in dataset A.",
        "Tasks in dataset B explore grammatical forms of adjectives (e.g., comparative/superlative), while dataset A focuses on modal verbs and passive voice.",
        "Dataset B tasks require investigating relationships between words (e.g., 'hello' and 'meeting'), absent in dataset A."
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=3": [
        "Tasks in dataset A require explicit interaction with pronunciation guides (e.g., IPA notation) for both UK/US English variants, while dataset B focuses on basic pronunciation lookups without IPA specificity.",
        "Dataset A tasks involve structured game-based activities (e.g., Word Scramble quizzes) in the Plus section, whereas dataset B lacks explicit gamified task requirements.",
        "Tasks in dataset A specify multi-language translation comparisons (e.g., Chinese/French), while dataset B focuses on single-language translations (e.g., English-Spanish).",
        "Dataset A tasks require navigation to time-bound content (e.g., March 2025 Word of the Day), while dataset B uses generic/static examples (e.g., December 2024 blog posts).",
        "Tasks in dataset A demand granular grammatical rule retrieval (e.g., modal verbs, comparative adjectives), while dataset B asks for broad grammar exploration without specific structural requirements.",
        "Dataset A includes tasks requiring synonym/antonym identification through dedicated Thesaurus use, while dataset B uses simpler \"find synonyms\" instructions without tool specificity.",
        "Tasks in dataset A require explicit handling of regional language variations (e.g., UK vs US comparisons), while dataset B focuses on universal English definitions.",
        "Dataset A tasks involve score-based outcomes from Plus section activities, while dataset B lacks performance metric requirements in its tasks.",
        "Tasks in dataset B emphasize basic word/phrase definitions (e.g., \"define environment\") without contextual usage requirements present in dataset A's example sentence retrieval.",
        "Dataset B includes conceptual exploration tasks (e.g., \"relationship between market research and advertising\") absent from dataset A's structured information retrieval focus."
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=1": [
        "Tasks in dataset B focus on basic word definitions without requiring exploration of specialized sections like quizzes or games.",
        "Dataset B tasks do not involve interactive elements such as games (e.g., Word Scramble) present in dataset A.",
        "Tasks in dataset B emphasize direct translation lookups without multi-language comparison requirements.",
        "Dataset B lacks tasks requiring navigation to blog posts or new vocabulary features for learning.",
        "Dataset B tasks do not mention retrieving example sentences for specific words as a standalone requirement.",
        "Tasks in dataset B prioritize simple grammar explanations without multi-step rule application or usage comparisons.",
        "Dataset B does not include tasks requiring synonym retrieval via the Thesaurus as a distinct action.",
        "Tasks in dataset B focus on single-step navigation (e.g., direct searches) rather than cross-section exploration.",
        "Dataset B tasks omit explicit requirements to compare UK/US phonetic notations (IPA) for pronunciations.",
        "Dataset B tasks lack references to user engagement features like quizzes, scores, or Plus section activities."
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=4": [
        "Dataset A tasks require specifying International Phonetic Alphabet (IPA) notation for pronunciations, while Dataset B does not",
        "Dataset A includes explicit navigation to 'Plus section' for quizzes/games, while Dataset B tasks omit section references for activities",
        "Dataset A requires translation tasks focusing on Chinese and French languages, while Dataset B emphasizes Spanish translations",
        "Dataset A tasks involve counting word meanings/definitions (e.g. 'how many meanings of unblemished'), absent in Dataset B",
        "Dataset B contains financial terminology lookup tasks (e.g. 'reinvest', 'investment') not present in Dataset A",
        "Dataset A includes Word Scramble game with definition-based challenges, while Dataset B features an animal-themed quiz",
        "Dataset B requires sharing specific word definitions via Twitter, while Dataset A's social tasks share example sentences/entries",
        "Dataset A grammar tasks target concrete usage rules (e.g. passive voice, comparatives), while Dataset B explores broader grammatical categories",
        "Dataset A quizzes specify 'easy' difficulty levels and final score reporting, while Dataset B omits difficulty/score parameters",
        "Dataset B includes multi-word translation tasks (e.g. 'apple and Friday'), while Dataset A focuses on single-word translations"
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=0": [
        "Tasks in B require navigating to collocation sections or topic-specific vocabulary links (e.g., business terms), while A focuses on general vocabulary lookup.",
        "B tasks involve detailed grammatical subcategories (e.g., adverb phrases, prefixes) rather than A's broader grammar topics like tenses or passive voice.",
        "B includes translation tasks for multi-word phrases (e.g., 'break a leg') instead of A's focus on single-word translations.",
        "B tasks require interaction with the SMART Vocabulary cloud for thematic word relationships, absent in A's structured synonym/antonym searches.",
        "B tasks demand analysis of phonetic symbols/pronunciation guides beyond A's basic UK/US variant comparisons.",
        "B requires distinguishing between parts of speech categories (e.g., adjective vs. adverb phrases) not explicitly covered in A's grammar tasks.",
        "B tasks involve exploration of annual Word of the Year features, while A only references general blog/Word-of-the-Day content.",
        "B includes multilingual definition lookups (e.g., Chinese definitions of English words) rather than A's direct translation pairs.",
        "B tasks require navigation through help/documentation sections to understand features, unlike A's direct interface interactions.",
        "B tasks involve collocation identification (e.g., 'foreign business vocabulary') as a core requirement, while A focuses strictly on synonyms/antonyms."
      ]
    },
    "apple": {
      "nnetnav_live_site=apple_num_tasks=70_portion=1": [
        "Tasks in B require product customization (e.g., Apple Watch bands/faces) while A focuses on selecting pre-configured color/material options",
        "B necessitates navigating enterprise/business-specific portals (Business Manager, enterprise solutions) absent in A's consumer-focused tasks",
        "B involves detailed environmental impact comparisons across product lines while A only requires locating general initiative descriptions",
        "B requires configuring family sharing settings and group management where A only addresses individual account recovery",
        "B tasks involve app-specific feature exploration (NBA 2K25 versions, Kino camera app) while A focuses on service-tier comparisons",
        "B demands compatibility verification for third-party accessories (Smart Home devices) where A only lists Apple-first ecosystem integrations",
        "B includes case studies/success stories (retailer implementations) while A focuses on technical specification sheets",
        "B requires accessing detailed repair manuals/guides where A only surfaces general repair method descriptions",
        "B tasks involve privacy configuration granularity (data usage controls) while A addresses privacy only at policy level",
        "B contains version-specific app update investigations (Arcade Edition changes) where A focuses on hardware generation comparisons"
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=4": [
        "Dataset B tasks require deeper navigation into environmental impact reports and sustainability metrics for specific products, while Dataset A focuses on general environmental/compliance information retrieval",
        "Dataset B includes tasks involving multi-step device customization workflows (e.g., storage/color selection with AppleCare+ integration), whereas Dataset A focuses on basic customization option identification",
        "Dataset B contains tasks requiring comparison of trade-in values across multiple device generations simultaneously, while Dataset A focuses on single-device trade-in value checks",
        "Dataset B tasks demand navigation through enterprise/business-specific procurement processes absent in Dataset A's educational/business portal tasks",
        "Dataset B requires accessing detailed technical specifications for camera systems and zoom capabilities, while Dataset A focuses on general feature comparisons",
        "Dataset B includes warranty status verification and repair procedure navigation tasks not present in Dataset A's support section usage",
        "Dataset B tasks involve financial program integration (e.g., upgrade programs with payment plans) unlike Dataset A's basic price verification",
        "Dataset B requires navigation through parental control settings and family sharing management interfaces absent in Dataset A's account tasks",
        "Dataset B contains tasks needing cross-referencing between product specifications and accessory compatibility, while Dataset A focuses on accessory availability checks",
        "Dataset B includes detailed battery optimization guidance retrieval across device categories, whereas Dataset A focuses on single-device battery spec verification"
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=0": [
        "Tasks in dataset B involve configuring or customizing products for purchase (e.g., selecting color, storage, accessories) during navigation, unlike dataset A.",
        "Dataset B tasks require accessing educational or business-specific sections (e.g., K-12 education discounts, business applications of Apple products), absent in dataset A.",
        "Tasks in dataset B focus on environmental sustainability reports (e.g., product environmental impact, recycled materials) as part of product research.",
        "Dataset B includes navigation to financial or corporate sections (e.g., quarterly earnings reports, investor relations) not present in dataset A.",
        "Tasks in dataset B explicitly require comparing product lines across use-case categories (e.g., business vs. education, healthcare vs. consumer) rather than purely technical specifications.",
        "Dataset B tasks involve researching parental controls or family sharing features, which are not referenced in dataset A.",
        "Tasks in dataset B emphasize purchasing workflows (e.g., adding to cart, exploring checkout steps) alongside informational retrieval, unlike dataset A's focus on pre-purchase research.",
        "Dataset B includes navigation to niche product categories (e.g., Apple Watch Herm\u00e8s, enterprise solutions) not emphasized in dataset A.",
        "Tasks in dataset B require verifying education or bulk purchase discounts (e.g., university pricing) as a core objective, unlike dataset A's general trade-in offers.",
        "Dataset B tasks involve exploring Apple\u2019s corporate initiatives (e.g., environmental commitments, healthcare achievements) beyond product-specific support sections."
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=2": [
        "Dataset B includes tasks focused on enterprise-level services like AppleCare Help Desk Support and enterprise device management solutions, which are not present in Dataset A.",
        "Dataset B requires navigation to configure specific product features (e.g., display size, storage) during purchasing, while Dataset A focuses on general comparisons.",
        "Dataset B tasks involve accessing financial results or environmental impact reports, absent in Dataset A's scope.",
        "Dataset B emphasizes Family Sharing setup and parental controls (e.g., \"Ask to Buy\"), which are not highlighted in Dataset A.",
        "Dataset B includes troubleshooting for niche hardware issues (e.g., water-damaged keyboards), whereas Dataset A focuses on general password/feature support.",
        "Dataset B tasks require product personalization (e.g., custom Apple Watch designs) beyond basic color/storage choices in Dataset A.",
        "Dataset B explicitly references child purchase restrictions and account management workflows, unlike Dataset A.",
        "Dataset B tasks involve direct business-to-business inquiries (e.g., enterprise plans, bulk purchasing), which Dataset A generalizes as \"enterprise services\".",
        "Dataset B includes granular comparisons of fitness/sleep tracking features between Apple Watch models, whereas Dataset A compares broad specifications.",
        "Dataset B requires locating policy documents (e.g., Business Conduct Policy) through dedicated navigation, while Dataset A focuses on general warranty/privacy policies."
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=3": [
        "Tasks in dataset B require configuring product options for business-specific purchases (e.g., iPads for business)",
        "Dataset B tasks involve accessing and interpreting app version histories and user reviews",
        "Tasks in B focus on warranty status checks and repair processes for specific device issues (e.g., cracked screens)",
        "Dataset B includes tasks related to healthcare data integration (e.g., Health Records patient enrollment)",
        "B contains tasks about Apple's business solutions and enterprise success stories",
        "Tasks in B require comparing device integration capabilities (e.g., iPhone-Watch connectivity)",
        "Dataset B includes explicit requests for parental control features and Family Sharing limits",
        "B tasks focus on privacy-specific features like Core Spotlight and data handling disclosures",
        "Tasks in B require identifying device-specific charging specifications and battery optimization tips",
        "Dataset B contains requests for accessory compatibility verification with specific new models (e.g., iPhone 16 cases)"
      ]
    },
    "google_search": {
      "nnetnav_live_site=google_search_num_tasks=72_portion=3": [
        "Dataset B tasks more frequently involve instructional steps (e.g., scheduling meetings, downloading software)",
        "Dataset B includes tasks requiring interaction with platform-specific tools (e.g., Google Ads specialist, Chrome download)",
        "Dataset B contains more health/medical advisory requests (e.g., flu complications, diabetes symptoms)",
        "Dataset B emphasizes job market navigation (e.g., remote engineering roles, entry-level positions)",
        "Dataset B features tutorial/content creation tasks (e.g., woodworking projects, YouTube tutorials)",
        "Dataset B includes explicit user preference configuration tasks (e.g., language settings, search settings)",
        "Dataset B tasks more frequently require consultation of authoritative institutions (e.g., Mayo Clinic, PBS Kids)",
        "Dataset B contains more actionable commercial tasks (e.g., product purchases, shipping details)",
        "Dataset B includes educational/parenting guidance tasks (e.g., toddler activities, memory improvement)",
        "Dataset B features more social/event planning tasks (e.g., venue research, party ideas)"
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=2": [
        "Dataset B tasks frequently involve recipe exploration and dietary management (e.g., smoothie recipes for health conditions)",
        "Dataset B emphasizes event planning tasks (venue research, amenities checks, ticket purchases) unlike Dataset A",
        "Dataset B contains explicit requests for tutorial/educational content (woodworking, language learning, tech guides)",
        "Dataset B includes health/medical information retrieval (symptom research, treatment options, risk factors)",
        "Dataset B tasks require interacting with user-generated content platforms (recipe rating, course enrollment, social sharing)",
        "Dataset B features product/service comparison tasks (stock prices, tech gadgets, hotel options)",
        "Dataset B includes explicit job search/application tasks (software engineer positions, career platforms)",
        "Dataset B tasks focus on Google product-specific features (Translate history, autocomplete mechanics, security updates)",
        "Dataset B contains procedural how-to queries (Python setup instructions, woodworking projects)",
        "Dataset B emphasizes personal productivity tasks (parenting advice, memory improvement, time management strategies)"
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=4": [
        "Dataset A tasks focus on retrieving singular factual answers (e.g. dates/records) while Dataset B emphasizes exploratory understanding of concepts/trends",
        "Dataset B contains tasks requiring engagement with evolving narratives (e.g. climate change impacts, AI ethics) rather than static facts",
        "Dataset A queries target discrete data points while Dataset B often requires synthesizing information from multiple sources/perspectives",
        "Dataset B shows increased focus on forward-looking information (e.g. 2025 movie trailers, 2024 trends) compared to A's historical/recent data focus",
        "Dataset B includes tasks requiring interaction with transactional systems (job applications, ticket purchases) absent in A",
        "Dataset B demonstrates stronger emphasis on academic/professional research processes (paper analysis, program comparisons)",
        "Dataset A tasks prioritize temporal precision (current/real-time data) while B includes timeless conceptual queries (machine learning definitions)",
        "Dataset B contains more open-ended \"how to\" objectives (recipes, tutorials) vs A's specific answer retrieval",
        "Dataset B shows increased need for interpreting technical documentation (SEC filings, research papers) compared to A's media/content parsing",
        "Dataset B tasks frequently involve user-generated content interaction (Wikipedia editing, recipe contributions) absent in A"
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=0": [
        "Dataset B tasks involve booking or reserving services (e.g., hotels, venues) directly through platforms, while A focuses purely on information retrieval without transactional actions.",
        "Dataset B includes explicit requests for local/near-me searches (e.g., stores, venues), while A tasks lack geographic proximity requirements.",
        "Dataset B tasks frequently require interacting with commercial platforms (e.g., Google Store, hotel booking sites) for product details/purchases, whereas A relies on informational platforms (e.g., IMDb, sports databases).",
        "Dataset B contains queries about eligibility criteria/health guidelines (e.g., flu vaccines, disease symptoms), which are absent in A\u2019s fact-oriented tasks.",
        "Dataset B emphasizes procedural/how-to exploration (e.g., woodworking tutorials, parenting tips), while A focuses on discrete factual answers (e.g., records, dates).",
        "Dataset B tasks often demand price comparisons or return policy checks (e.g., Pixel phones, benches), whereas A prioritizes non-commercial metrics like ratings or scientific data.",
        "Dataset B includes open-ended research goals (e.g., 'find inspiration,' 'explore trends') requiring broader content synthesis, while A tasks target narrow, verifiable answers.",
        "Dataset B features requests for job listings/academic papers (e.g., AI roles, research), while A focuses on pop culture, sports, and technical trivia.",
        "Dataset B tasks involve multi-step planning (e.g., trip itineraries, event coordination), whereas A tasks are single-step retrievals (e.g., 'find X's bio').",
        "Dataset B includes language translation tasks requiring cross-lingual validation, while A tasks validate information across domains (e.g., movie ratings across platforms)."
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=1": [
        "Tasks in dataset B require interaction with dynamic website elements (e.g., modals, subscription prompts) beyond basic search functionality.",
        "Dataset B includes tasks involving purchasing processes (e.g., buying tickets, checking hotel prices) rather than pure information retrieval.",
        "Tasks in B frequently require understanding and applying organizational structures (e.g., program requirements, corporate principles) rather than isolated facts.",
        "Dataset B contains tasks requiring interaction with specialized tools/features (e.g., language translation settings, search history management).",
        "Tasks in B often involve multi-source verification from authoritative platforms (e.g., CDC guidelines, corporate policies) rather than general fact-checking.",
        "Dataset B includes tasks requiring procedural knowledge (e.g., software installation, settings configuration) rather than simple data extraction.",
        "Tasks in B demand comparison of options for decision-making (e.g., hotel prices, stock comparisons) rather than factual comparisons.",
        "Dataset B contains tasks involving temporal event coordination (e.g., venue availability, event planning) rather than static temporal data retrieval.",
        "Tasks in B require interpretation of conceptual information (e.g., AI principles, technical definitions) rather than concrete numerical/statistical data.",
        "Dataset B includes tasks involving troubleshooting/problem-solving (e.g., device issues, feature errors) rather than straightforward information queries."
      ]
    }
  },
  "diffs_real_from_synth": {
    "google_maps": {
      "nnetnav_live_site=google_maps_num_tasks=75_portion=2": [
        "Tasks in dataset B focus on locating specific chain stores or franchises (e.g., Apple Stores, Target) rather than general business categories",
        "Dataset B tasks emphasize counting/quantifying results (e.g., 'how many results', 'list three') more frequently than dataset A",
        "Tasks in dataset B require printing/sharing physical route details more often than dataset A",
        "Dataset B includes explicit requests for parking-related information (availability, garages, lots) not emphasized in dataset A",
        "Tasks in dataset B more frequently use zip codes as primary location filters compared to dataset A's city/landmark focus",
        "Dataset B tasks contain more explicit instructions for post-action map manipulation (e.g., 'view the map to understand')",
        "Tasks in dataset B specify exclusion criteria (e.g., 'not open 24 hours') more systematically than dataset A",
        "Dataset B includes multi-phase logistical planning (e.g., arrival sequences, combined hotel/supermarket queries) not seen in A",
        "Tasks in dataset B request infrastructure-specific details (airport levels, parking lots) rather than service quality attributes",
        "Dataset B emphasizes public transit stop identification (bus stops) as standalone tasks more than dataset A"
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=3": [
        "Dataset B tasks require explicit generation/printing of route details (e.g., 'print route details') while A focuses on directional retrieval without output formatting",
        "Dataset B contains tasks requiring numerical quantification of results (e.g., 'how many results') where A focuses on qualitative existence of results",
        "Dataset B includes specific commercial chain queries (e.g., Apple Stores, Target) while A uses generic category searches (e.g., restaurants, hotels)",
        "Dataset B tasks involve parking infrastructure planning (e.g., 'locate parking area') where A focuses on service accessibility (e.g., wheelchair transport)",
        "Dataset B requires map interface manipulation (e.g., 'view map to understand route') while A focuses on information extraction without UI interaction",
        "Dataset B contains hierarchical data extraction (e.g., 'which level has least proportion') where A maintains flat data retrieval",
        "Dataset B specifies commercial infrastructure requirements (e.g., 'has parking lot') while A emphasizes service parameters (e.g., 'free cancellation')",
        "Dataset B uses exact numerical constraints (e.g., '5 places') where A employs relative quantifiers (e.g., 'highly rated')",
        "Dataset B features inter-city navigation (e.g., Chicago to Los Angeles) while A focuses on intra-city/local navigation",
        "Dataset B includes post-search map actions (e.g., 'generated sharing link') where A focuses on pre-search filtering (e.g., 'open now')"
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=1": [
        "Dataset A tasks frequently involve date-specific availability checks (e.g., hotel availability on January 11th), while Dataset B does not reference date constraints",
        "Dataset A requires reservation-making actions (e.g., booking hotels, restaurant reservations), whereas Dataset B tasks lack transactional requirements",
        "Dataset B tasks explicitly demand numerical outputs (e.g., 'how many results', 'list three', 'walking time'), while Dataset A focuses on qualitative filtering",
        "Dataset B includes map-sharing functionality requests (e.g., generating sharing links), absent in Dataset A tasks",
        "Dataset A emphasizes multi-criteria combinations (e.g., wheelchair accessibility + price + ratings), while Dataset B uses simpler single-criterion filters",
        "Dataset B contains hierarchical information parsing (e.g., airport level analysis), whereas Dataset A focuses on flat attribute verification",
        "Dataset B tasks require UI interaction confirmation (e.g., viewing maps, sharing workflows), unlike Dataset A's pure information retrieval focus",
        "Dataset B includes exclusion filters (e.g., 'not open 24 hours'), while Dataset A uses inclusive filters only",
        "Dataset A features multi-stop accessibility-focused itineraries, while Dataset B prioritizes single-route planning with basic parameters",
        "Dataset B tasks require counting/quantification of results, absent from Dataset A's qualitative prioritization tasks"
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=0": [
        "Dataset B tasks focus on chain/brand-specific business locations (e.g., Apple Stores, Target) while A focuses on generic categories (e.g., hotels, restaurants)",
        "Tasks in B require explicit output actions like printing route details or sharing map links, absent in A",
        "Dataset B uses zip codes as primary location references, whereas A uses city names/landmarks",
        "B includes tasks requiring numerical quantification of results (e.g., 'how many results'), unlike A",
        "B contains intersection-based navigation tasks (e.g., 'main street and Amherst street'), not present in A",
        "Tasks in B involve multi-step logistical coordination (e.g., parking then ride booking), while A focuses on single-service reservations",
        "B requires explicit sorting commands (e.g., 'sort by highest rating'), whereas A uses implicit filtering",
        "Dataset B includes analytical data extraction (e.g., 'level has least proportion in reviews'), absent in A",
        "B features time-sensitive urgency phrases (e.g., 'I will arrive soon'), while A uses predefined future dates",
        "Tasks in B emphasize proximity prioritization (e.g., 'nearest to...'), whereas A uses qualitative filters like ratings/price"
      ],
      "nnetnav_live_site=google_maps_num_tasks=75_portion=4": [
        "Dataset B tasks require explicit printing/sharing of route details (e.g., 'print the route details', 'generated sharing link') while A focuses on consumption of route information",
        "Tasks in B specify exact quantity requirements (e.g., 'list three of them', '5 places', 'how many results') unlike A's open-ended quantity requests",
        "B contains tasks requiring binary verification of facility presence (e.g., 'check if it has parking lot') rather than A's qualitative assessments of amenities",
        "B includes explicit numerical output requirements (e.g., 'walking time', 'which level has least proportion') where A focuses on comparative analysis without numerical precision",
        "Tasks in B reference specific organizational structures (e.g., 'level has least proportion in reviews') absent in A's more general amenity queries",
        "B requires identification of commercial chain locations (e.g., 'Target stores', 'Apple Stores') while A focuses on category-based business searches",
        "Tasks in B specify exclusion criteria (e.g., 'not open 24 hours') through negative filters more explicitly than A's inclusion-focused filtering",
        "B contains explicit geographic hierarchy references (e.g., 'zip code 90028', 'Ypsilanti, MI') where A uses relative location markers ('near me')",
        "Tasks in B demand atomic location verification (e.g., 'nearest to intersection') rather than A's route-based accessibility analysis",
        "B includes explicit transportation node analysis (e.g., 'bus stops', 'airport's information levels') while A focuses on transportation route planning"
      ]
    },
    "github": {
      "nnetnav_live_site=github_num_tasks=71_portion=3": [
        "Dataset B tasks focus on repository discovery through star counts and update dates, while Dataset A emphasizes plan comparisons and security information",
        "Dataset B requires filtering repositories by specific programming languages and recent activity, whereas Dataset A focuses on accessing policy pages and compliance documentation",
        "Dataset B tasks involve identifying trending/open-source projects through multiple search criteria, while Dataset A requires navigation to educational resource sections",
        "Dataset B emphasizes real-time repository metrics analysis (stars, contributors, commit history), whereas Dataset A focuses on static product feature comparisons",
        "Dataset B tasks require temporal filtering (last week/month updates) that aren't present in Dataset A's requirements",
        "Dataset B involves direct repository content analysis (READMEs, wiki pages, release notes) while Dataset A focuses on marketing/feature pages",
        "Dataset B requires identifying specific technical domains (e.g., machine learning, game development) in repositories, unlike Dataset A's product-focused comparisons",
        "Dataset B tasks demand cross-referencing multiple repository attributes simultaneously (language+stars+date), while Dataset A tasks follow linear product navigation paths",
        "Dataset B emphasizes community-driven metrics (top contributors, trending rankings), whereas Dataset A focuses on official documentation and enterprise features",
        "Dataset B requires interpreting repository purpose/main features from descriptions, while Dataset A tasks involve parsing structured product specifications"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=2": [
        "Dataset B tasks emphasize locating trending or popular repositories based on stars and recency, while Dataset A focuses on general repository filtering by language, stars, or activity.",
        "Dataset B requires identifying specific numerical thresholds (e.g., '50+ stars', 'last 2 days') in repository searches, whereas Dataset A tasks lack explicit quantitative constraints.",
        "Dataset B tasks demand analysis of contributor activity (e.g., 'top three contributors'), while Dataset A focuses on general open-source project popularity metrics.",
        "Dataset B includes explicit comparisons of plan limitations (e.g., 'maximum number of private repositories'), while Dataset A compares plan tiers more generically.",
        "Dataset B tasks require retrieval of repository structural details (e.g., 'files changed in last commit'), which Dataset A tasks do not mention.",
        "Dataset B emphasizes identification of trending developer rankings, while Dataset A focuses on trending repositories without developer leaderboards.",
        "Dataset B tasks involve specific category-based searches (e.g., 'machine learning', 'game development'), whereas Dataset A uses broader technical filters.",
        "Dataset B requires verification of project descriptions/purposes during repository searches, a requirement absent in Dataset A tasks.",
        "Dataset B includes action-oriented tasks like theme configuration in wiki pages, while Dataset A focuses on informational retrieval without implementation steps.",
        "Dataset B tasks demand real-time validation of temporal constraints (e.g., 'initiated in January 2023'), whereas Dataset A uses relative timeframes like 'recent commits'."
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=0": [
        "Dataset B tasks require precise real-time data retrieval (e.g. 'created in last week', 'updated in past 30 days') while A focuses on general recency",
        "B emphasizes numerical thresholds (e.g. '50+ stars', '1000 stars') in repository criteria where A uses qualitative popularity measures",
        "B requires direct interaction with repository content (commit details, changed files, closed issues) while A focuses on metadata analysis",
        "B tasks involve ranking/trend identification ('top contributors', 'ranked first this month') absent in A's tasks",
        "B specifies temporal constraints with exact timeframes ('last 2 days', 'January 2023') where A uses relative recency",
        "B requires comparison of quantitative plan limits ('max private repositories') while A compares feature sets",
        "B tasks demand categorical filtering ('tagged with web scraping', 'climate change projects') where A uses basic language filters",
        "B includes repository content analysis (wiki pages, release notes) while A focuses on surface-level feature discovery",
        "B requires identification of trending/ranked content ('top-trending', 'currently ranked first') not present in A",
        "B tasks involve multi-step verification (purpose description, feature listing) where A focuses on single information retrieval"
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=4": [
        "Dataset B tasks require precise filtering by repository attributes (stars, update dates) while Dataset A focuses on product feature exploration without such granular criteria.",
        "Dataset B tasks involve retrieving specific numerical data (e.g., star counts, contributor counts) whereas Dataset A emphasizes qualitative information (e.g., security policies, pricing descriptions).",
        "Dataset B tasks compare GitHub plan features using quantitative metrics (e.g., maximum private repositories) while Dataset A compares plans qualitatively (e.g., feature availability tiers).",
        "Dataset B tasks demand extraction of exact repository metadata (e.g., commit histories, release versions) while Dataset A requires navigation to broad sections (e.g., customer stories).",
        "Dataset B tasks utilize GitHub's search functionality with compound filters (language + stars + date) whereas Dataset A tasks rely on linear navigation through product pages.",
        "Dataset B focuses on identifying trending/popular repositories through GitHub's discovery features while Dataset A emphasizes understanding GitHub's internal tools (e.g., Copilot, Issues).",
        "Dataset B tasks require parsing specific technical details from repository content (e.g., READMEs, wikis) while Dataset A involves form interactions (e.g., account sign-up, trial requests).",
        "Dataset B tasks prioritize temporal constraints (e.g., repositories updated within X days) whereas Dataset A tasks focus on static product information without time sensitivity.",
        "Dataset B tasks involve direct comparison of plan limitations across multiple parameters while Dataset A focuses on single-plan feature comprehension.",
        "Dataset B requires identification of repository ownership/maintenance patterns (e.g., top contributors) while Dataset A focuses on organizational account management features."
      ],
      "nnetnav_live_site=github_num_tasks=71_portion=1": [
        "Dataset A tasks focus on understanding GitHub's own service features and plans, while Dataset B emphasizes locating third-party repository data",
        "Dataset B requires filtering repositories by exact numerical thresholds (e.g. 1000 stars) while Dataset A uses qualitative filters",
        "Dataset A contains tasks about account management processes (sign-up/upgrade flows) absent in Dataset B",
        "Dataset B tasks frequently require identifying temporal constraints within repository metadata (last 2 days/week/month)",
        "Dataset A focuses on comparing service tiers' abstract capabilities, while Dataset B compares concrete numerical limits (e.g. private repo counts)",
        "Dataset B requires extracting specific technical details from repository contents (commit changes, release notes) unlike Dataset A",
        "Dataset A tasks involve hypothetical scenario evaluation (organizational Copilot use) while Dataset B uses concrete existing data",
        "Dataset B contains tasks requiring identification of trending/ranking information absent in Dataset A",
        "Dataset A focuses on security vulnerability research while Dataset B focuses on repository contribution patterns",
        "Dataset B requires cross-referencing multiple metadata dimensions (language + topic + stars + date) simultaneously more than Dataset A"
      ]
    },
    "espn": {
      "nnetnav_live_site=espn_num_tasks=62_portion=0": [
        "Dataset B tasks require identifying league standings with team rankings (e.g., '9 CONN 12-3') not present in Dataset A",
        "Dataset B includes postponed game status indicators (e.g., 'Postponed CGY 0 LA 0') absent from Dataset A's live/final score focus",
        "Dataset B tasks involve cross-referencing betting odds (spread, moneyline) with game listings unlike Dataset A",
        "Dataset B requires navigation through tournament qualifiers/specialized brackets (e.g., Australian Open Qualifying)",
        "Dataset B tasks demand analysis of team performance trends (e.g., 'how Texas-Ohio State play out') beyond basic score retrieval",
        "Dataset B includes tennis matchups with player nationalities/rankings (e.g., '1 J. Pegula') not found in Dataset A",
        "Dataset B tasks require synthesizing headlines/main points from articles rather than direct stat extraction",
        "Dataset B contains explicit conference/division standings breakdowns (Eastern/Western) for team positioning analysis",
        "Dataset B tasks involve multi-platform broadcast details (e.g., 'ESPN2/ESPN DEPORTES/ESPN+') for single events",
        "Dataset B requires navigation to ticket purchasing interfaces/price comparisons absent from Dataset A tasks"
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=4": [
        "Dataset B tasks require summarizing article highlights and recent event summaries, while Dataset A focuses on retrieving specific statistical data and historical performance metrics.",
        "Dataset B includes navigation tasks related to college football playoffs (CFP) and bowl games, absent in Dataset A which emphasizes NCAA basketball tournaments.",
        "Dataset B tasks involve counting teams with specific name patterns (e.g., 'Golden' or location-based), whereas Dataset A tasks do not require such pattern-based queries.",
        "Dataset B requires interaction with external ticket purchasing platforms, while Dataset A tasks remain confined to ESPN's internal content.",
        "Dataset B emphasizes real-time or recent game highlights (e.g., 'latest NBA game broadcast'), while Dataset A includes dynamic in-game updates (e.g., 'End of 3rd' quarter scores).",
        "Dataset B tasks focus on identifying MVP candidates and top headlines, whereas Dataset A prioritizes injured player tracking and depth chart analysis.",
        "Dataset B's accessibility tree prominently features NFL Schedule and Playoff Machine links, while Dataset A emphasizes fantasy sports bracket challenges.",
        "Dataset B tasks involve retrieving player-specific physical attributes (e.g., 'heaviest weight among infielders'), which are absent in Dataset A's requirements.",
        "Dataset B includes navigation to ESPN+ Tools for content summaries, while Dataset A focuses on traditional fantasy sports interactions.",
        "Dataset B tasks require cross-referencing game results with external factors (e.g., ticket prices), whereas Dataset A focuses on internal league standings cross-referencing."
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=1": [
        "Dataset B tasks require summarizing article content (e.g., 'summarize MVP candidate discussions') while Dataset A focuses strictly on data retrieval",
        "Dataset B includes ticket purchasing navigation tasks (e.g., finding cheapest Lakers tickets) not present in Dataset A",
        "Dataset B tasks require comparative analysis between team/player statistics (e.g., 'loser high vs winner high') unlike Dataset A's direct stat lookups",
        "Dataset B contains ESPN+ platform interaction tasks (e.g., 'ESPN+ Tools summary') absent from Dataset A requirements",
        "Dataset B requires identification of positional player attributes (e.g., 'heaviest Yankees infielder') while Dataset A focuses on general stats",
        "Dataset B tasks involve multi-step navigation chains (e.g., find game \u2192 ticket site) unlike Dataset A's single-destination queries",
        "Dataset B includes semantic team name analysis (e.g., count teams with 'Golden' in name) not required in Dataset A",
        "Dataset B requires injury status identification within organizational charts (e.g., Jets depth chart injuries) absent from Dataset A",
        "Dataset B tasks demand temporal pattern recognition (e.g., 'games within last 2 days') while Dataset A uses fixed date ranges",
        "Dataset B includes content categorization tasks (e.g., 'map headlines to leagues') requiring higher-level analysis than Dataset A's direct lookups"
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=3": [
        "Tasks in dataset B require summarizing article content or video highlights instead of just accessing them",
        "Dataset B tasks involve counting teams with specific name patterns across leagues (e.g. 'Golden' teams)",
        "B requires retrieving player physical attributes (e.g. weight) from roster information",
        "Tasks in B involve commercial aspects like ticket purchasing links and price comparisons",
        "B includes identification of sports leagues associated with homepage headlines",
        "Dataset B tasks require analysis of players' recent performance history (last 5 games)",
        "B contains position-specific queries (e.g. infielders, 2nd string injured players)",
        "Tasks in B require locating MVP discussion articles rather than just statistics",
        "B includes price tracking elements (cheapest available tickets) for live events",
        "Dataset B tasks involve comparative analysis of game metrics between winners/losers"
      ],
      "nnetnav_live_site=espn_num_tasks=62_portion=2": [
        "Tasks in dataset B require identifying player attributes beyond performance stats (e.g., heaviest weight in a roster).",
        "Dataset B includes navigation to external platforms (e.g., ticket purchasing sites) from ESPN links.",
        "Tasks in B demand summarizing article headlines rather than just retrieving scores or stats.",
        "Dataset B requires cross-referencing standings data between divisions/conferences explicitly.",
        "B tasks involve counting teams by name patterns (e.g., 'Golden' or 'Los Angeles') across leagues.",
        "Dataset B requires temporal filtering for 'last 5 games' rather than single-game queries common in A.",
        "Tasks in B focus on cheapest ticket prices for games, which is absent in dataset A.",
        "Dataset B includes positional injury tracking (e.g., '2ND position injuries') as a subtask.",
        "B tasks require MVP candidate analysis from articles, not just aggregated stats or reports.",
        "Dataset B emphasizes post-game analytical elements (e.g., 'loser high vs. winner high') in schedules."
      ]
    },
    "huggingface": {
      "nnetnav_live_site=huggingface_num_tasks=76_portion=1": [
        "Tasks in Dataset B emphasize discovering recently released models (past month) more frequently than Dataset A",
        "Dataset B tasks require summarizing model/dataset descriptions while Dataset A focuses on direct information retrieval",
        "Dataset B includes explicit requirements to use Inference API for live model testing/execution",
        "Tasks in Dataset B specifically target parameter configuration details (e.g. temperature value) more than Dataset A",
        "Dataset B contains tasks requiring comparison/ranking based on GitHub stars of documentation resources",
        "Dataset B tasks focus more narrowly on NLP subdomains (translation, summarization) compared to Dataset A's broader scope",
        "Dataset B requires identification of domain-specific models (medical, fake news) more explicitly than Dataset A",
        "Tasks in Dataset B emphasize open-source license filtering (Apache-2.0) more prominently than Dataset A",
        "Dataset B includes tasks requiring manipulation of tokenizers/embedding layers through documentation",
        "Dataset B contains specific requirements to evaluate educational resources (classroom benefits) not present in Dataset A"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=0": [
        "Dataset B tasks require identifying models/datasets released within specific recent timeframes (e.g. 'past month'), while Dataset A tasks don't emphasize temporal recency constraints",
        "Dataset B contains tasks requiring direct API interaction demonstrations (e.g. 'use Inference API to generate story'), whereas Dataset A focuses on theoretical API usage understanding",
        "Dataset B emphasizes community engagement metrics (GitHub stars) for documentation resources, while Dataset A focuses on model/dataset popularity metrics (downloads/likes)",
        "Dataset B includes tasks requiring summarization of model features/benefits, while Dataset A focuses on direct information extraction",
        "Dataset B tasks require identification of specific technical parameter values (e.g. 'default temperature value'), while Dataset A focuses on general parameter identification",
        "Dataset B emphasizes open-source model identification, while Dataset A includes commercial/enterprise use case exploration",
        "Dataset B contains content generation tasks through API integration, while Dataset A focuses on trained model discovery",
        "Dataset B tasks reference newer library components (trl, timm), while Dataset A focuses on core Transformers library",
        "Dataset B includes educational resource exploration (Hugging Face Classroom), while Dataset A emphasizes enterprise solutions",
        "Dataset B requires filtering by strict license-popularity combinations, while Dataset A focuses on general license identification"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=4": [
        "Dataset B tasks emphasize discovering recently released models (past month) while Dataset A focuses on established models",
        "Dataset B requires identifying multiple models/datasets per task (e.g. 'three models') whereas Dataset A typically seeks single instances",
        "Dataset B tasks specifically track popularity metrics (downloads/likes) as primary criteria more frequently than Dataset A",
        "Dataset B contains explicit requirements to report numerical parameter defaults (e.g. temperature values) unlike Dataset A",
        "Dataset B includes tasks about model/documentation benefits analysis while Dataset A focuses on technical implementation details",
        "Dataset B emphasizes model creation/release dates more prominently than Dataset A's focus on general update dates",
        "Dataset B tasks require direct comparison of models based on recency and popularity metrics simultaneously",
        "Dataset B contains explicit summarization requirements for model features that Dataset A lacks",
        "Dataset B includes specific creative generation tasks through Inference API (e.g. stories) that Dataset A's image-focused tasks don't require",
        "Dataset B tasks combine license filtering with popularity metrics more frequently than Dataset A's separate treatment"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=2": [
        "Tasks in dataset B emphasize identifying models/datasets released within the past month (e.g., 'past month', 'latest')",
        "Dataset B includes tasks requiring real-time interaction with the Inference API (e.g., generating text/stories, calculating similarity scores)",
        "Tasks in B specifically request parameter default values from model settings (e.g., temperature parameter in GPT-J-6B)",
        "Dataset B contains tasks requiring analysis of GitHub star counts for documentation libraries",
        "B includes tasks that require summarizing model/project descriptions rather than just retrieving details",
        "Tasks in B focus on specialized NLP applications like medical summarization and fake news detection",
        "Dataset B requires identifying models based on specific license types combined with popularity metrics (Apache-2.0 + likes)",
        "B contains tasks involving direct manipulation of training configurations through Trainer API parameters",
        "Tasks in B emphasize creative generation through API usage (e.g., 'generate a short story about a dragon')",
        "Dataset B includes explicit requirements to examine tokenizer implementation details (e.g., parameter types/defaults)"
      ],
      "nnetnav_live_site=huggingface_num_tasks=76_portion=3": [
        "Dataset B tasks emphasize discovering recently released models/datasets (past month) more frequently than Dataset A",
        "Dataset B requires identifying 'most downloaded' or 'highest liked' items as core task objectives, unlike Dataset A",
        "Dataset B tasks involve explicit comparison/ranking of models based on popularity metrics (downloads, likes) more prominently",
        "Dataset B contains tasks requiring summarization of model/documentation descriptions as primary objectives",
        "Dataset B tasks focus more on temporal aspects (latest updates, recent releases) in model/dataset selection criteria",
        "Dataset B requires interaction with specific Inference API parameters (temperature settings) rather than general API use",
        "Dataset B tasks demand identification of model technical specifications (tensor types, model sizes) more explicitly",
        "Dataset B includes tasks requiring analysis of documentation GitHub stars/metrics not present in Dataset A",
        "Dataset B tasks involve specific NLP subdomains (fake news detection, recipe generation) as distinct filtering criteria",
        "Dataset B requires explicit investigation of tokenizer configuration details (adding tokens, special parameters) in documentation"
      ]
    },
    "coursera": {
      "nnetnav_live_site=coursera_num_tasks=72_portion=3": [
        "Dataset B tasks explicitly require identifying courses with ethics-related content (e.g., AI ethics, sustainability ethics) not emphasized in Dataset A",
        "Dataset B tasks include searching for courses with module-level specifications (e.g., 'Measuring Sustainability' modules) absent in Dataset A queries",
        "Dataset B requires filtering by precise duration ranges (e.g., <20 hours) rather than general duration tiers seen in Dataset A",
        "Dataset B tasks demand identification of credit-eligible programs, a criterion not present in Dataset A requirements",
        "Dataset B includes queries for testimonials/reviews tied to specific Specializations, unlike Dataset A's general review validation",
        "Dataset B tasks require summarizing business/team plan advantages, indicating organizational product focus absent in Dataset A",
        "Dataset B explicitly asks for skill development outcomes from Specializations, while Dataset A focuses on role-specific paths",
        "Dataset B tasks involve language-specific Specialization searches (e.g., Spanish), not observed in Dataset A samples",
        "Dataset B queries include partner institution geographical filters (e.g., Australian partners) not seen in Dataset A",
        "Dataset B requires identification of instructor bios/career details beyond basic credentials emphasized in Dataset A"
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=2": [
        "Dataset B tasks require identifying structured learning paths (e.g., Specializations with projects) rather than individual courses alone",
        "Dataset B tasks emphasize certificate eligibility as a mandatory completion requirement (e.g., 'includes certificate upon completion')",
        "Dataset B requires filtering by exact duration thresholds in weeks (e.g., <5 weeks) rather than month ranges",
        "Dataset B tasks demand verification of partner institution locations (e.g., Australian university partners)",
        "Dataset B requires price comparison between subscription models and promotional discounts (e.g., annual vs. sale pricing)",
        "Dataset B tasks involve locating testimonial evidence for course quality (e.g., student reviews/testimonials)",
        "Dataset B requires identification of credit-eligible programs through specific filters",
        "Dataset B tasks specify outcome-based skill lists as search criteria (e.g., 'skills developed' requirements)",
        "Dataset B requires cross-referencing multiple numeric constraints simultaneously (e.g., 4.5+ stars AND <20h completion time)",
        "Dataset B tasks involve program type distinction between standalone courses vs. multi-course Specializations as core requirements"
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=4": [
        "Dataset B includes tasks requiring identification of courses with specific credential types (e.g., Specializations) and explicit certificate offerings, while A focuses on general certification paths.",
        "Dataset B tasks require comparing subscription discounts for annual plans (e.g., $199/year) and team pricing models, whereas A emphasizes monthly subscription costs ($25/month).",
        "Dataset B tasks involve filtering courses by credit eligibility and multi-year duration ranges (1-4 years), which are absent in A's duration filtering requirements.",
        "Dataset B explicitly requires identifying partner companies/institutions from specific regions (e.g., Australia), while A focuses on general university/company partnership verification.",
        "Dataset B tasks demand extraction of quantitative review metrics (e.g., percentage of 5-star ratings), whereas A only requires verifying minimum star thresholds.",
        "Dataset B includes navigation tasks for industry-specific sustainability metrics (e.g., 'Measuring Sustainability' modules), which are not present in A's technical focus areas.",
        "Dataset B requires identifying instructor bios and cross-referencing their other course offerings, while A only verifies instructor bios as a course component.",
        "Dataset B tasks involve summarizing business product advantages (Coursera for Teams/Business), which are not required in A's career outcome analysis.",
        "Dataset B specifies filtering by exact duration thresholds (e.g., <5 weeks) rather than A's broader time ranges (1-3 months).",
        "Dataset B requires identifying learning outcomes for programming languages (e.g., C++ learning outcomes), while A focuses on general skill acquisition verification."
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=0": [
        "Tasks in B require extracting numerical data points (e.g., quiz counts, percentage ratings) not present in A",
        "B tasks involve summarizing business product comparisons (Coursera for Business vs Teams) absent in A",
        "B requires identifying regional partner institutions (e.g., Australian partners) while A only mentions general partnerships",
        "B tasks demand identification of specific course modules/content components (e.g., 'Measuring Sustainability') rather than general topics",
        "B includes credit eligibility filtering requirements not found in A's tasks",
        "B tasks require locating and analyzing user testimonials/reviews while A focuses on general ratings",
        "B explicitly requires verification of project inclusion in Specializations whereas A doesn't specify project requirements",
        "B tasks involve detailed instructor bios and cross-course offerings analysis beyond A's basic affiliation checks",
        "B uses multi-year duration filters (1-4 years) compared to A's hourly/monthly ranges",
        "B mandates explicit identification of structured learning outcomes/skills lists for Specializations rather than general skill mentions"
      ],
      "nnetnav_live_site=coursera_num_tasks=72_portion=1": [
        "Tasks in B require filtering courses by exact star rating thresholds (e.g., 4.5+ vs 4+ in A)",
        "Tasks in B involve identifying courses with specific duration ranges in weeks/hours (e.g., <20 hours) rather than month ranges",
        "Tasks in B require verification of partner institutions from specific geographic regions (e.g., Australian universities)",
        "Tasks in B demand retrieval of instructor biographical information and testimonials for courses",
        "Tasks in B specify requirements for project-based Specializations with tangible outcomes",
        "Tasks in B involve comparing business subscription models (Coursera for Business/Teams) rather than individual pricing",
        "Tasks in B require calculation of rounded percentage values for review score distributions",
        "Tasks in B focus on identification of specific course modules/content components (e.g., 'Measuring Sustainability' module)",
        "Tasks in B require multi-criteria matching including skills developed, project requirements, and credential types simultaneously",
        "Tasks in B involve filtering by credit eligibility status and multi-year duration ranges (1-4 years)"
      ]
    },
    "arxiv": {
      "nnetnav_live_site=arxiv_num_tasks=80_portion=1": [
        "Dataset B tasks require quantitative analysis of paper content (e.g., formula counts, figure counts) while Dataset A focuses on metadata extraction",
        "Dataset B includes tasks requiring comparison of author counts across search results, unlike Dataset A",
        "Dataset B contains explicit requirements for HTML format analysis (e.g., reading specific sections) while Dataset A mentions format comparison without in-format analysis",
        "Dataset B tasks demand version history tracking (e.g., v3 submission dates) while Dataset A only requires version comparison",
        "Dataset B requires temporal filtering with higher precision (e.g., 'last two days') compared to Dataset A's broader date ranges",
        "Dataset B includes category structure analysis through subfield abbreviations (e.g., 'nlin.CD') while Dataset A uses primary categories",
        "Dataset B tasks involve external platform data extraction (e.g., university statistics) beyond academic cross-referencing in Dataset A",
        "Dataset B requires content summarization (e.g., abstract summaries) while Dataset A focuses on direct information retrieval",
        "Dataset B contains operational queries (e.g., status notifications, merchandise) absent from Dataset A's academic focus",
        "Dataset B features conditional search parameter modification (e.g., category vs. all-archive comparison) not present in Dataset A"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=4": [
        "Dataset B tasks require precise quantitative extraction (e.g. formula counts, author counts) while A focuses on qualitative information retrieval",
        "B tasks demand temporal precision with exact date range filtering beyond basic 'last week' parameters used in A",
        "B requires numeric analysis of search results (paper counts per keyword/category) not present in A tasks",
        "B tasks involve cross-referencing paper metadata with external institutional data (author affiliations \u2192 university stats)",
        "B requires version history analysis (specific submission versions) while A only needs basic version awareness",
        "B tasks demand visual content analysis (figure/table counts) beyond A's text-focused requirements",
        "B requires comparative category searches (same query across multiple archives) while A uses single-category navigation",
        "B tasks involve blog/content summarization not present in A's purely research paper-focused requirements",
        "B requires merchandise discovery (arXiv logo shirts) indicating e-commerce elements absent in A",
        "B tasks need first-author identification and affiliation tracking while A only requires general author finding"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=0": [
        "Dataset B tasks require counting specific elements in papers (e.g., formula/table counts) while Dataset A focuses on locating these elements",
        "Dataset B includes temporal constraints beyond basic date ranges (e.g., 'last two days') whereas Dataset A uses standard temporal filters",
        "Dataset B tasks demand comparison of results across search scopes (category vs. all archives) not seen in Dataset A",
        "Dataset B requires affiliation tracing through author positions (e.g., 'first author') while Dataset A only needs general affiliation identification",
        "Dataset B contains tasks involving arXiv category structure analysis (e.g., subfield abbreviations) absent in Dataset A",
        "Dataset B includes content summarization requirements (e.g., abstract summaries) not present in Dataset A tasks",
        "Dataset B tasks involve arXiv blog content interaction while Dataset A focuses solely on research content",
        "Dataset B requires cross-referencing between paper versions (e.g., v3 submission dates) unlike Dataset A",
        "Dataset B contains external website navigation tasks (e.g., university websites) not found in Dataset A",
        "Dataset B tasks demand quantitative analysis of search result sets (e.g., paper counts) while Dataset A focuses on qualitative retrieval"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=2": [
        "Requires counting quantitative elements within papers (e.g., formulas, figures, authors)",
        "Involves precise date filtering constraints (e.g., submissions within last 24-48 hours)",
        "Tasks necessitate cross-category search result comparisons (e.g., specific archive vs. all archives)",
        "Requires accessing external institutional pages linked from author affiliations",
        "Demands summarization of specific paper sections (e.g., abstract, hypothesis)",
        "Involves verification of version history timestamps (e.g., v3 submission date)",
        "Requires identification of first author institutional affiliations",
        "Needs aggregation of publication statistics across date ranges (e.g., papers/day counts)",
        "Contains tasks requiring HTML format parsing for content extraction",
        "Includes comparative analysis of search results across multiple metadata fields"
      ],
      "nnetnav_live_site=arxiv_num_tasks=80_portion=3": [
        "Dataset B tasks require quantitative analysis of paper elements (e.g., formula counts, figure counts) while Dataset A focuses on qualitative retrieval",
        "Tasks in Dataset B specifically request temporal comparisons (e.g., 'submitted within last week') more frequently than Dataset A",
        "Dataset B includes tasks requiring comparison of results between specific category searches vs all-archive searches, while Dataset A focuses on single-category searches",
        "Tasks in Dataset B explicitly require version history analysis (e.g., 'when was v3 submitted?') more than Dataset A",
        "Dataset B contains tasks requiring affiliation tracking of specific authors, which is less prominent in Dataset A",
        "Tasks in Dataset B demand explicit category abbreviation identification (e.g., 'what are their abbreviations?') more than Dataset A",
        "Dataset B includes tasks requiring paper element localization (e.g., 'which formula is the loss function') not seen in Dataset A",
        "Tasks in Dataset B require explicit result counting operations (e.g., 'how many papers') more frequently than Dataset A",
        "Dataset B contains tasks requiring cross-category comparisons (e.g., Quantum Physics vs all archives) not present in Dataset A",
        "Tasks in Dataset B demand temporal range filtering precision (specific date ranges) while Dataset A uses more general time references"
      ]
    },
    "bbc": {
      "nnetnav_live_site=bbc_num_tasks=69_portion=2": [
        "Dataset B tasks require locating articles with specific publication timeframes (e.g., 'within last two days') while Dataset A focuses on relative timestamps (e.g., 'hrs ago')",
        "Dataset B contains tasks requiring identification of featured/editorially highlighted content (e.g., 'New Releases' podcasts) not present in Dataset A",
        "Dataset B includes tasks demanding summarization of key findings/recommendations from articles unlike Dataset A's information extraction requirements",
        "Dataset B tasks require navigation to specialized content verticals (Weather, Audio, Podcasts) not explicitly mentioned in Dataset A samples",
        "Dataset B contains queries about quantitative tournament details (team counts, match times) absent from Dataset A sports tasks",
        "Dataset B includes tasks requiring identification of commercial entities/companies involved in stories unlike Dataset A's general topic filtering",
        "Dataset B tasks demand location-specific weather event tracking (where and when) not seen in Dataset A's general weather requests",
        "Dataset B requires navigation through structured educational guides (e.g., climate change explainer) absent from Dataset A tasks",
        "Dataset B contains image-based content analysis requirements (identifying food in travel photos) not present in Dataset A",
        "Dataset B tasks involve fact-checking against specific reference articles while Dataset A focuses on general information gathering"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=3": [
        "Tasks in Dataset B require referencing specific titled guides/articles (e.g., 'What is climate change?') while Dataset A focuses on finding recent articles through timeframes/thematic tags",
        "Dataset B includes tasks demanding exact numerical data extraction (e.g., team counts, section quantities) absent in Dataset A",
        "Dataset B tasks involve identifying featured/highlighted content (e.g., 'New Releases' podcasts) not present in Dataset A\u2019s requirements",
        "Dataset B requires parsing granular details from articles (e.g., storm locations/dates) while Dataset A focuses on timestamp validation for recency",
        "Dataset B tasks reference uniquely named subsections (e.g., 'The SpeciaList') whereas Dataset A uses broad hierarchical categories",
        "Dataset B includes tasks requiring identification of regional origins for elements (e.g., food regions) not seen in Dataset A",
        "Dataset B tasks demand use of multimedia sections to identify featured episodes, whereas Dataset A focuses on general multimedia discovery",
        "Dataset B requires answering questions via reference materials (e.g., guides), while Dataset A emphasizes direct information retrieval",
        "Dataset B tasks specify identifying top headlines in niche sections (e.g., technology), whereas Dataset A seeks recency across general categories",
        "Dataset B tasks use exact article titles for retrieval, while Dataset A relies on metadata/topic tags for search"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=1": [
        "Dataset B tasks require direct summarization of article content, while Dataset A focuses on locating relevant sections or categories",
        "Dataset B tasks demand extraction of specific data points (e.g., names, dates, statistics) from content, whereas Dataset A focuses on general information retrieval",
        "Dataset B includes explicit requirements to interact with multimedia metadata (e.g., podcast episode titles, image captions) rather than just identifying multimedia presence",
        "Dataset B tasks frequently reference specific article titles or branded content sections (e.g., \"The SpeciaList\") that must be precisely located",
        "Dataset B requires answering comprehension questions using article content rather than simply finding information",
        "Dataset B contains tasks requiring quantitative analysis (e.g., counting sections, matching dates) absent in Dataset A",
        "Dataset B tasks specify tighter temporal constraints (e.g., 'within last two days') for information verification",
        "Dataset B includes visual content interpretation tasks (e.g., identifying food in images) not present in Dataset A",
        "Dataset B requires listing/enumeration of specific items (e.g., podcast recommendations) rather than general category exploration",
        "Dataset B tasks involve cross-referencing information within articles to answer complex questions about causality or processes"
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=0": [
        "Tasks in dataset B require locating specific articles or guides by exact titles (e.g., \"What is climate change? A really simple guide\") rather than general topics.",
        "Dataset B includes tasks that demand enumeration (e.g., listing podcast names, counting teams or sections) not emphasized in dataset A.",
        "Tasks in B focus on identifying current features (e.g., \"New Releases\" podcasts, top headlines in real-time) rather than general recency checks.",
        "Dataset B tasks involve extracting structured data (e.g., event dates from calendars, match start times) requiring precise parsing of schedules or tables.",
        "B requires identifying multimedia content within specific contexts (e.g., food images in the Travel section with regional attribution).",
        "Tasks in B emphasize summarization of niche topics (e.g., archaeological discoveries, health recommendations) with explicit detail requirements.",
        "Dataset B includes queries about section organization (e.g., counting war-related categories) to assess navigational hierarchy awareness.",
        "B tasks require cross-referencing specific sections (e.g., Athletics calendar) for temporal or quantitative data retrieval.",
        "Dataset B tasks involve answering fact-based questions using a single authoritative source (e.g., climate change causes from a specific guide).",
        "B tasks demand identifying localized content (e.g., regional food names, Scottish Premiership details) with granular geographical specificity."
      ],
      "nnetnav_live_site=bbc_num_tasks=69_portion=4": [
        "Dataset B tasks require summarization of content (e.g., 'summarize key points') while Dataset A focuses on direct information retrieval",
        "Dataset B includes explicit podcast/audio content exploration (e.g., 'find featured New Releases podcasts') absent in Dataset A tasks",
        "Dataset B tasks demand identification of quantitative data (e.g., 'how many teams in Scottish Premiership') unlike Dataset A's qualitative focus",
        "Dataset B requires interaction with curated lists (e.g., 'best PodCasts for 2023') while Dataset A focuses on time-sensitive content discovery",
        "Dataset B tasks involve explanatory analysis (e.g., 'what human activities cause climate change') vs Dataset A's verification-focused metadata parsing",
        "Dataset B includes specific content-type navigation (e.g., 'Weather section storm reports') as discrete tasks rather than general format retrieval",
        "Dataset B tasks require identification of featured/spotlight content (e.g., 'currently featured podcast') through UI patterns not emphasized in Dataset A",
        "Dataset B contains direct requests for categorical counts (e.g., 'how many War sections') unlike Dataset A's categorical cross-referencing",
        "Dataset B tasks involve content interpretation (e.g., 'significance of archaeological findings') beyond Dataset A's fact verification requirements",
        "Dataset B includes specific guide-based navigation (e.g., 'find really simple guide') as distinct task type absent in Dataset A"
      ]
    },
    "amazon": {
      "nnetnav_live_site=amazon_num_tasks=63_portion=2": [
        "Dataset B tasks require multi-attribute filtering (e.g., size + material + price + technical specs) while Dataset A uses single-attribute filters",
        "Dataset B tasks frequently specify exact measurement thresholds (e.g., '30 inches', '300 sq ft') not seen in Dataset A",
        "Dataset B includes time-bound requirements for product releases/availability (e.g., 'released within a month') absent in Dataset A",
        "Dataset B tasks demand explicit comparison operations between multiple results (e.g., 'compare top three prices') more than Dataset A",
        "Dataset B requires sorting mechanisms (e.g., 'sort by Best Sellers', 'newest arrivals') as explicit task requirements unlike Dataset A",
        "Dataset B tasks specify exact technical compatibility requirements (e.g., 'MacBook Pro compatible') not present in Dataset A",
        "Dataset B includes quantitative review analysis requirements (e.g., '100+ reviews') rather than qualitative thresholds in Dataset A",
        "Dataset B tasks require spatial/measurement validation (e.g., 'room size suitability', 'capacity quarts') absent in Dataset A",
        "Dataset B contains tasks requiring temporal deal verification (e.g., 'list current offer percentages') not seen in Dataset A",
        "Dataset B tasks mandate specific interface interactions (e.g., 'save lowest priced result') beyond basic add-to-cart in Dataset A"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=3": [
        "Dataset B tasks require multi-attribute filtering beyond basic price/rating (e.g. disk size + OS version + product type)",
        "Dataset B contains tasks demanding time-based constraints (e.g. publication year 2024, upcoming releases)",
        "Dataset B includes requirements for specific technical specifications (e.g. 10x zoom, HDMI port inclusion)",
        "Dataset B tasks necessitate explicit sorting criteria application (e.g. 'sort by Best Sellers then select')",
        "Dataset B requires quantitative review analysis (e.g. 'over 20,000 reviews' vs general rating thresholds)",
        "Dataset B contains tasks requiring result retention/recall (e.g. 'save the lowest priced among results')",
        "Dataset B includes compatibility verification tasks (e.g. 'compatible with MacBook Pro')",
        "Dataset B tasks demand energy efficiency certification checks (e.g. 'energy efficiency rating' requirements)",
        "Dataset B contains explicit delivery method constraints (e.g. 'FREE delivery' verification)",
        "Dataset B requires detailed review content analysis (e.g. 'give me the top review about...')"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=1": [
        "Tasks in B require handling multi-attribute technical specifications (e.g., Windows 11 OS, 10x zoom, MacBook compatibility)",
        "B includes time-bound product criteria (e.g., 'published in 2024', 'released within a month') not just stock/time constraints",
        "B tasks demand extraction of dynamic promotional details (e.g., '% off values', live deal names) from homepage banners",
        "B requires sorting by specialized metrics like 'Best Sellers' rank rather than basic price/rating sorting",
        "Tasks in B specify exact physical dimensions (e.g., '30-inch length', '2-3 quart capacity') as core filters",
        "B contains requirements to parse/compare energy efficiency certifications or technical ratings",
        "Tasks in B mandate saving/recalling specific search results positions (e.g., 'lowest priced among results')",
        "B requires compatibility verification with specific devices/models (e.g., 'MacBook Pro compatible')",
        "Tasks in B involve analyzing review content (e.g., 'top review about...') rather than just threshold checks",
        "B includes compound sale filters (e.g., 'on sale AND under $10') requiring promotional section navigation"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=0": [
        "Dataset B tasks require multi-step filtering (e.g., price + size + compatibility) while Dataset A uses single-filter criteria",
        "Dataset B tasks explicitly demand technical specifications (e.g., '10x zoom', 'HDMI port') unlike Dataset A's general feature requests",
        "Dataset B requires prioritization of 'Best Sellers' sorted results whereas Dataset A focuses on price/rating sorting",
        "Dataset B contains time-bound publication requirements (e.g., 'released within a month') not seen in Dataset A",
        "Dataset B tasks require verification of exact review thresholds (100+ reviews) vs Dataset A's general star ratings",
        "Dataset B specifies precise price brackets ($50-$100) while Dataset A uses open-ended ranges (under $50)",
        "Dataset B includes device compatibility checks (MacBook Pro) absent from Dataset A tasks",
        "Dataset B requires direct comparison of top search results where Dataset A compares across sellers",
        "Dataset B tasks demand exact physical measurements (30\" length) unlike Dataset A's categorical sizing",
        "Dataset B contains post-filtering actions (save/select lowest) while Dataset A focuses on initial add-to-cart"
      ],
      "nnetnav_live_site=amazon_num_tasks=63_portion=4": [
        "Dataset B tasks require explicit specification of multiple precise attributes (e.g., OS version + disk size) simultaneously, while A focuses on single attributes",
        "Dataset B tasks demand structured price range adherence (e.g., $50-$100) rather than open-ended price checks like in A",
        "Dataset B includes explicit sorting method requirements (e.g., 'sort by Best Sellers') as mandatory steps, unlike A's implied sorting needs",
        "Dataset B tasks require compatibility verification (e.g., MacBook Pro compatibility) not present in A's requirements",
        "Dataset B contains quantitative thresholds for pack sizes (e.g., 'minimum 10 lights') absent in A's tasks",
        "Dataset B tasks specify exact review count requirements (e.g., '20+ reviews') rather than A's general review quality checks",
        "Dataset B requires saving/selections based on sorted results (e.g., 'save lowest priced') unlike A's direct add-to-cart actions",
        "Dataset B includes time-based constraints (e.g., 'released within a month') not found in A's product searches",
        "Dataset B tasks demand explicit comparative analysis between multiple results (e.g., 'compare top three prices') beyond A's price checks",
        "Dataset B requires specific aesthetic pattern matching (e.g., 'floral pattern') rather than A's general category browsing"
      ]
    },
    "wolframalpha": {
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=4": [
        "Tasks in B require numerical results with specified precision (e.g., significant figures).",
        "Tasks in B involve parametric equations or specialized curve plotting (e.g., cat curve, Albert Einstein curve).",
        "Tasks in B demand real-time or dynamic data retrieval (e.g., current temperature, annual energy production).",
        "Tasks in B require advanced calculus applications (e.g., series convergence analysis, differential equations with special functions).",
        "Tasks in B involve multi-variable physics/engineering problems (e.g., projectile motion with velocity/position calculations).",
        "Tasks in B include compound unit conversions with compositional analysis (e.g., mass-to-mole conversions with element breakdown).",
        "Tasks in B require matrix operations on large systems (e.g., 6x6 Hilbert matrix determinants).",
        "Tasks in B involve polynomial manipulation for simplification (e.g., reducing complex expressions).",
        "Tasks in B specify exact measurement conditions (e.g., thermal conductivity at 25\u00b0C).",
        "Tasks in B include generation/listing of mathematical entities (e.g., prime number sequences within ranges)."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=0": [
        "Dataset B tasks require precise numerical outputs with specified significant figures or scientific notation formatting.",
        "Dataset B includes tasks involving parametric equations or specialized geometric curve generation.",
        "Tasks in B demand higher-order mathematical computations such as determinants of large matrices (e.g., 6x6 Hilbert matrix).",
        "Dataset B requires generating comparative analyses of material properties (e.g., thermal conductivity across elements).",
        "B tasks involve solving complex differential equations or evaluating series convergence/divergence.",
        "Dataset B tasks focus on multi-variable physics calculations (e.g., projectile motion with velocity and position).",
        "B includes explicit requests to generate enumerated lists (e.g., prime numbers within a range).",
        "Tasks in B require polynomial simplification or factorization of high-degree expressions.",
        "Dataset B involves unit conversions with compositional breakdowns (e.g., element percentages by weight).",
        "B tasks include real-time data retrieval (e.g., current weather metrics) as part of computations."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=1": [
        "Tasks in dataset B emphasize numerical precision and formatting requirements (e.g., significant figures, scientific notation).",
        "Dataset B tasks frequently involve generating parametric or non-standard geometric plots (e.g., Albert Einstein curve, cat curve).",
        "B requires direct computation of physical system parameters (e.g., projectile motion outcomes, energy production metrics).",
        "Tasks in B focus on algebraic expression manipulation and simplification rather than equation solving procedures.",
        "Dataset B contains specific requests for material property comparisons across multiple elements/compounds.",
        "B includes tasks requiring series analysis and convergence/divergence determinations for mathematical sequences.",
        "Dataset B tasks involve matrix operations with specific matrix types (e.g., Hilbert matrix determinants).",
        "B requires composite calculations combining multiple scientific parameters (e.g., mass ratios with temporal measurements).",
        "Tasks in B demand generation of numerical sets/ranges (e.g., prime number lists within specified bounds).",
        "Dataset B includes real-time data queries (e.g., current weather conditions) rather than historical data analysis."
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=3": [
        "Dataset A tasks focus on information retrieval and exploration of existing knowledge bases (e.g., properties, definitions, historical events)",
        "Dataset B tasks emphasize direct numerical computation with strict formatting requirements (e.g., significant figures, scientific notation)",
        "Dataset A contains queries requiring navigation through categorized educational/research resources",
        "Dataset B prioritizes execution of parametric/algorithmic operations with immediate visual output generation",
        "Dataset A includes tasks involving temporal financial calculations (e.g., present value, currency conversion across years)",
        "Dataset B focuses on physical/engineering calculations with dimensional analysis (e.g., memory allocation, projectile motion)",
        "Dataset A features tasks requiring contextual interpretation of real-world statistics (e.g., unemployment factors, climate models)",
        "Dataset B emphasizes pure mathematical transformations (e.g., series convergence analysis, determinant calculations)",
        "Dataset A contains explicit requests for downloadable data formats and licensing verification",
        "Dataset B requires compositional output formatting (e.g., percentage breakdowns, multi-variable comparisons)"
      ],
      "nnetnav_live_site=wolframalpha_num_tasks=66_portion=2": [
        "Dataset B tasks emphasize advanced mathematical operations (e.g., series convergence analysis, parametric equations) compared to Dataset A's focus on foundational equation-solving and data retrieval",
        "Dataset B requires explicit numerical precision specifications (e.g., significant figures, scientific notation) while Dataset A focuses on direct numerical solutions without formatting constraints",
        "Dataset B contains more complex visualization requirements (e.g., specialized curves like 'cat curve', parametric plots) compared to Dataset A's standard function plotting",
        "Dataset B includes dynamic real-world data queries (e.g., current weather conditions) whereas Dataset A focuses on static historical/archival data retrieval",
        "Dataset B features more advanced physics/engineering calculations (e.g., projectile motion analysis, energy production metrics) compared to Dataset A's basic unit conversions",
        "Dataset B emphasizes pure mathematical constructs (e.g., Hilbert matrices, prime number generation) while Dataset A prioritizes applied mathematics in practical contexts",
        "Dataset B contains more complex chemical computations (e.g., molar conversions with percentage composition) compared to Dataset A's elemental property lookups",
        "Dataset B requires expression simplification/manipulation tasks absent in Dataset A's problem set",
        "Dataset B includes higher-level calculus operations (e.g., differential equation solutions with special functions) beyond Dataset A's basic derivative/integral calculations",
        "Dataset B features constraint-based mathematical problems (e.g., inequality regions) not present in Dataset A's task structure"
      ]
    },
    "allrecipes": {
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=0": [
        "Dataset B tasks require precise numerical constraints (e.g., 'under 600 calories', 'over 300 reviews'), while Dataset A tasks use qualitative criteria (e.g., 'kid-friendly', 'healthy').",
        "Dataset B tasks explicitly demand recipe metadata synthesis (e.g., 'list key ingredients and total preparation time'), whereas Dataset A focuses on discovery without structured output requirements.",
        "Dataset B tasks frequently specify exact rating thresholds (e.g., '4.5 stars or higher'), while Dataset A uses general quality indicators like 'high-rated'.",
        "Dataset B includes multi-step objectives (e.g., 'find recipe + create shopping list'), unlike Dataset A's single-action tasks like 'find and save'.",
        "Dataset B requires cross-referencing quantitative metrics (e.g., '>50 reviews + 4-star rating'), while Dataset A prioritizes categorical filters (e.g., 'vegetarian', 'Easter').",
        "Dataset B tasks involve nutritional parameter adherence (e.g., 'high-protein', 'low-calorie'), absent in Dataset A's broader dietary queries.",
        "Dataset B emphasizes popularity validation through review volume thresholds (e.g., 'over 500 reviews'), whereas Dataset A mentions ratings without quantified minimums.",
        "Dataset B tasks demand ingredient/technique specificity (e.g., 'zucchini lasagna', 'slow cooker'), while Dataset A uses generic descriptors like 'Italian cookies'.",
        "Dataset B requires extraction of exact cooking parameters (e.g., 'maximum oven temperature'), unlike Dataset A's general time-related filters.",
        "Dataset B includes content synthesis from non-recipe pages (e.g., 'About Us section'), while Dataset A focuses solely on recipe discovery and saving."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=4": [
        "Tasks in B require precise numerical thresholds (e.g., 4.5 stars, 600 calories) for filtering criteria, while A uses general thresholds (e.g., '4 stars or higher').",
        "B tasks demand structured outputs like ingredient shopping lists or preparation step summaries, which are absent in A.",
        "B includes explicit requirements to extract specific recipe metadata (e.g., 'maximum temperature mentioned in Directions'), whereas A focuses on general recipe discovery.",
        "B tasks specify exact review count thresholds (e.g., 'over 500 reviews'), while A uses qualitative terms like 'highly-rated'.",
        "B requires combining multiple granular filters (e.g., 'vegetarian + under 600 calories + <1 hour prep'), whereas A uses broader constraint combinations.",
        "B tasks frequently reference cuisine styles (e.g., 'Mediterranean-style') as explicit search parameters, while A focuses on general dietary categories.",
        "B emphasizes ingredient specificity beyond dietary constraints (e.g., 'must include zucchini'), whereas A focuses on ingredient categories.",
        "B tasks require identification of primary seasoning/herbs used in recipes, a detail not requested in A.",
        "B includes explicit requirements to summarize cooking methods (e.g., 'slow cooker required'), while A references appliances as general filters.",
        "B tasks mandate outputting temporal breakdowns (e.g., 'total cooking and preparation time'), whereas A focuses on time constraints without output formatting requirements"
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=1": [
        "Tasks in dataset B require explicit calorie count constraints per serving (e.g., 'under 600 calories'), while dataset A tasks do not specify exact calorie limits",
        "Dataset B tasks mandate precise rating thresholds (e.g., '4.5 stars or higher'), whereas dataset A uses relative terms like 'highly-rated' without numeric thresholds",
        "Tasks in dataset B frequently require output formatting (e.g., 'list the key ingredients', 'create shopping list'), while dataset A focuses on discovery/saving without structured outputs",
        "Dataset B tasks specify exact review quantity requirements (e.g., 'over 500 reviews'), while dataset A uses qualitative descriptors like '100+ reviews'",
        "Tasks in dataset B emphasize Mediterranean/regional cuisine specifications (e.g., 'Mediterranean-style', 'Italian-style'), while dataset A uses broader cuisine categories",
        "Dataset B requires identification of specific protein sources (e.g., 'shrimp and mussels'), while dataset A uses general protein categories like 'chicken'",
        "Tasks in dataset B demand time-bound constraints for preparation (e.g., 'prep time under 45 minutes'), while dataset A uses vague terms like 'quick'",
        "Dataset B tasks require multi-step analysis (e.g., 'find recipe + list ingredients + include prep time'), while dataset A focuses on single actions",
        "Tasks in dataset B specify alternative ingredients (e.g., 'uses zucchini'), while dataset A focuses on dietary categories without ingredient specifics",
        "Dataset B includes technical baking specifications (e.g., 'maximum temperature mentioned'), while dataset A focuses on general recipe characteristics"
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=3": [
        "Tasks in dataset B require recipes to meet exact numerical thresholds (e.g., 'over 300 reviews', 'under 600 calories') while dataset A uses general thresholds (e.g., '4 stars or higher').",
        "Dataset B tasks explicitly demand inclusion of preparation/cook time constraints (e.g., 'prep time under 45 minutes') whereas dataset A only references time extraction without constraints.",
        "Dataset B requires creation of shopping lists from recipe ingredients, a feature absent in dataset A tasks.",
        "Tasks in dataset B specify exact review count requirements (e.g., 'more than 500 reviews') while dataset A only requires minimum review counts without specific quantities.",
        "Dataset B tasks require identification of specific recipe components (e.g., 'primary seasoning used', 'maximum temperature') not explicitly required in dataset A.",
        "Tasks in dataset B frequently combine multiple precision filters (rating + review count + ingredient constraints) while dataset A uses simpler filter combinations.",
        "Dataset B includes requests for cooking method overviews/step summaries whereas dataset A focuses only on metadata extraction.",
        "Tasks in dataset B require validation of specific ingredient inclusions (e.g., 'must include zucchini', 'shrimp and mussels') beyond general dietary categories used in dataset A.",
        "Dataset B tasks specify style/cuisine hybrid requirements (e.g., 'Mediterranean-style grilled fish') while dataset A uses broader cuisine categories.",
        "Dataset B includes explicit output formatting requirements (e.g., 'list the first five ingredients') not present in dataset A tasks."
      ],
      "nnetnav_live_site=allrecipes_num_tasks=79_portion=2": [
        "Tasks in dataset B require specific numerical constraints (e.g., calorie counts, prep time limits) not seen in dataset A",
        "Dataset B tasks explicitly demand minimum review counts (e.g., 'over 500 reviews') while dataset A only references general prominence of reviews",
        "Dataset B requires precise rating thresholds (e.g., '4.5 stars or higher') whereas dataset A tasks reference ratings more generally",
        "Tasks in dataset B frequently require compilation of multiple data points (ingredients + time + steps) unlike single-focus tasks in dataset A",
        "Dataset B includes meal-specific nutritional requirements (e.g., 'high-protein', 'under 600 calories') not present in dataset A tasks",
        "Dataset B tasks require creation of derived content (shopping lists, step summaries) beyond simple recipe location in dataset A",
        "Tasks in dataset B specify exact ingredient requirements (e.g., 'zucchini', 'shrimp and mussels') rather than general categories in dataset A",
        "Dataset B requires identification of specific preparation methods (e.g., 'slow cooker', 'baked') as mandatory filters unlike dataset A",
        "Tasks in dataset B demand quantitative analysis of community engagement (review counts + ratings) rather than general popularity in dataset A",
        "Dataset B includes multi-criteria filtering combinations (e.g., dietary + rating + cook time) not required in dataset A's single-filter tasks"
      ]
    },
    "dictionary.cambridge": {
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=2": [
        "Dataset B includes tasks requiring users to complete specific quiz types (e.g., Image quizzes) in the Plus section, while A only references general word games",
        "Tasks in B explicitly require identification of translation service providers/company attribution for translations",
        "Dataset B tasks require counting/quantifying the number of definitions/meanings listed for a word",
        "B includes tasks demanding differentiation between grammatical concepts through comparative analysis (e.g., fewer vs less)",
        "Dataset B tasks require identification of specific content types (word/phrase/idiom) related to a concept within single tasks",
        "B contains tasks that specify difficulty levels for quizzes (e.g., 'easy quiz about Animals')",
        "Tasks in B require explicit identification of grammatical rules (e.g., 'rules for forming comparative adjectives') rather than general concept research",
        "Dataset B includes tasks requiring confirmation of authentication status (e.g., 'without login') for Plus features",
        "B tasks specify particular grammatical structures to analyze (e.g., passive voice) rather than general grammar sections",
        "Dataset B requires identification of content hierarchy through numbered lists (e.g., 'one word, one phase, one idiom') in results"
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=3": [
        "Tasks in B require explicit reporting of quiz/game scores from the Plus section",
        "B tasks demand identification of IPA notation specifically for pronunciations",
        "B requires comparison of word meanings across multiple dictionary editions (Learner's/Essential)",
        "B tasks involve interacting with 'Popular searches' lists for common word lookups",
        "B requires recognition of advertisement attribution for translation services",
        "B tasks necessitate handling cookie consent banners with 'Do Not Sell' options",
        "B includes tasks requiring identification of word class/category in definitions",
        "B tasks demand interaction with ranked search result lists (e.g., top 10 searches)",
        "B requires differentiation between blog categories (New Words vs General Blog)",
        "B tasks involve identifying publication years for Word of the Year features"
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=1": [
        "Dataset B tasks require interactive engagement with quizzes/games (e.g., Word Scramble, Grammar quiz), while Dataset A focuses on passive information retrieval.",
        "Dataset B tasks explicitly demand reporting activity outcomes (e.g., quiz scores), absent in Dataset A tasks.",
        "Dataset B includes tasks requiring identification of third-party service providers (e.g., translation company), not present in Dataset A.",
        "Dataset B tasks necessitate simultaneous retrieval of multiple data points (e.g., pronunciation + definition + example sentence) per query, whereas Dataset A tasks often target single elements.",
        "Dataset B tasks require explicit use of the '+Plus' section for feature exploration (e.g., quizzes), while Dataset A only references '+Plus' as a navigation endpoint.",
        "Dataset B tasks involve direct interaction with grammar rules (e.g., forming comparatives/superlatives), whereas Dataset A focuses on retrieving pre-existing grammar explanations.",
        "Dataset B tasks specify IPA notation requirements for pronunciations, while Dataset A only implies phonetic comparison.",
        "Dataset B tasks require quantitative analysis (e.g., counting word meanings), absent in Dataset A's qualitative focus.",
        "Dataset B tasks mandate synthesis of cross-sectional information (e.g., grammar rules + usage examples), while Dataset A tasks follow linear navigation paths.",
        "Dataset B contains tasks requiring completion of actions without authentication (e.g., 'without login'), introducing account state considerations not present in Dataset A"
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=4": [
        "Dataset B tasks require users to complete interactive quizzes and report final scores, while Dataset A only requires accessing quiz sections.",
        "Dataset B includes specific requests for International Phonetic Alphabet (IPA) notation in pronunciation tasks, unlike Dataset A.",
        "Dataset B tasks explicitly require comparing UK/US English variants within single queries, while Dataset A handles them separately.",
        "Dataset B contains tasks requiring identification of translation service providers/APIs, absent in Dataset A.",
        "Dataset B features explicit requests for quantitative analysis (e.g., counting definitions/synonyms), not present in Dataset A.",
        "Dataset B requires categorization of related terms (word/phrase/idiom) in thesaurus use, while Dataset A asks for general synonyms.",
        "Dataset B tasks specify particular grammar sub-topics (e.g., passive voice, comparatives) rather than general grammar exploration in Dataset A.",
        "Dataset B includes tasks that can be completed without authentication (\"without login\"), unlike Dataset A's implicit logged-in requirements.",
        "Dataset B emphasizes multi-format content combination (pronunciation+definition+example in single task) more systematically than Dataset A.",
        "Dataset B contains explicit references to image-based quizzes and visual learning elements absent in Dataset A tasks"
      ],
      "nnetnav_live_site=dictionary.cambridge_num_tasks=54_portion=0": [
        "Tasks in B require reporting quiz/game scores (e.g., 'tell me your final score') while A does not",
        "B includes tasks that explicitly reference International Phonetic Alphabet (IPA) notation for pronunciation details",
        "Tasks in B involve quantitative verification (e.g., 'how many meanings') not present in A",
        "B requires identifying translation service providers ('tell me which company provided the translation') unlike A",
        "Grammar tasks in B specify completion without authentication ('without login') whereas A doesn't mention access constraints",
        "B includes explicit navigation to categorized quizzes (e.g., 'Image quizzes about Animals') while A references general quizzes",
        "Tasks in B combine multiple content types in single actions (e.g., 'pronunciation + definition + example sentence') more systematically than A",
        "B requires distinguishing between content formats (e.g., 'one word, one phase, one idiom') within entries unlike A",
        "Tasks in B demand identification of linguistic metadata (e.g., 'rules for forming comparative adjectives') beyond A's general grammar exploration",
        "B includes direct interaction with game mechanics (e.g., 'unscrambling letters') while A focuses on content access without gameplay participation"
      ]
    },
    "apple": {
      "nnetnav_live_site=apple_num_tasks=70_portion=1": [
        "Dataset B tasks focus on current/recently released products (e.g. iPhone 14/15 Pro, iOS 17) while A includes hypothetical/future products (e.g. iPhone 16, WWDC25)",
        "B requires checking time-bound availability (e.g. 'schedule in-store pickup for January 10, 2024') not present in A",
        "B includes retrieving specific marketing slogans (e.g. Apple Watch slogan) absent in A's technical focus",
        "B tasks involve verifying software-hardware compatibility (e.g. iOS 17 with iPhone 12) while A focuses on hardware specs",
        "B requires precise location-based queries (e.g. zip code 90038 availability checks) not seen in A",
        "B's trade-in tasks target older devices (iPhone 11 Pro Max) vs A's newer device trade-ins (iPhone 12+)",
        "B emphasizes immediate color/configuration availability checks while A focuses on material/design comparisons",
        "B tasks include checking exact release dates/prices rather than A's environmental/sustainability comparisons",
        "B requires identifying specific accessory features (Siri Remote) vs A's ecosystem compatibility checks",
        "B tasks demand measurement retrieval (size/weight of Apple TV) while A focuses on performance specs"
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=4": [
        "Tasks in B emphasize immediate purchase logistics (e.g., in-store pickup scheduling, zip code-based availability checks)",
        "B requires direct verification of technical specifications (e.g., battery life metrics during specific usage scenarios)",
        "Tasks in B focus on identifying marketing content (e.g., product slogans, feature names)",
        "B includes time-bound availability checks with specific dates rather than general availability inquiries",
        "Tasks in B prioritize identification of physical product attributes (e.g., dimensions, weight, color counts)",
        "B requires comparison of consecutive product generations (e.g., iPhone 14 Pro vs. 15 Pro) rather than concurrent models",
        "Tasks in B demand listing quantitative product details (e.g., exact color count, accessory quantities)",
        "B focuses on surface-level feature identification rather than customization workflows",
        "Tasks in B require retrieval of precise release dates rather than general product timelines",
        "B emphasizes verification of current retail inventory status rather than hypothetical purchasing scenarios"
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=0": [
        "Dataset B tasks require verifying immediate in-store pickup availability with specific dates, while Dataset A focuses on general product availability checks or release dates.",
        "Dataset B tasks involve checking exact technical specifications (e.g., battery life during web browsing), whereas Dataset A emphasizes broader technical comparisons (e.g., chip types).",
        "Dataset B includes explicit requests for color/model variations (e.g., HomePod mini colors), while Dataset A focuses on general product line comparisons without color specificity.",
        "Dataset B tasks prioritize current trade-in offers for older devices (e.g., iPhone 11 Pro Max), while Dataset A emphasizes trade-in value ranges for newer models.",
        "Dataset B requires identifying precise price differences between consecutive models (e.g., iPhone 14 Pro vs. 15 Pro), whereas Dataset A compares prices across product categories.",
        "Dataset B tasks demand listing specific accessory names (e.g., Smart Folio for iPad), while Dataset A explores accessory compatibility without naming exact products.",
        "Dataset B includes troubleshooting steps for account recovery (e.g., forgotten Apple ID), while Dataset A focuses on support section navigation for device repairs.",
        "Dataset B tasks verify marketing slogans (e.g., MacBook Pro taglines), which are absent in Dataset A's objectives.",
        "Dataset B requires checking regional availability via zip codes (e.g., 90038), while Dataset A navigates regional settings without location-specific verification.",
        "Dataset B tasks emphasize immediate purchase logistics (e.g., stock availability dates), whereas Dataset A involves product configuration customization (e.g., color/storage selection)."
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=2": [
        "Dataset A tasks involve detailed product customization (specific storage, color, size combinations) while Dataset B focuses on pre-configured model comparisons",
        "Dataset A includes enterprise-specific navigation (business plans, bulk purchases) not present in Dataset B tasks",
        "Dataset B tasks emphasize checking release dates/new model availability while Dataset A focuses on existing product configurations",
        "Dataset A contains tasks requiring navigation through multiple policy documents (environmental reports, business conduct) absent in Dataset B",
        "Dataset B tasks frequently reference specific dates/times (e.g., store pickup scheduling) unlike Dataset A",
        "Dataset A includes troubleshooting scenarios for physical device damage while Dataset B focuses on account/password recovery",
        "Dataset B tasks explicitly request slogan/feature tagline identification not present in Dataset A",
        "Dataset A shows navigation through educational/K-12 specific content not found in Dataset B tasks",
        "Dataset B emphasizes accessory availability checks while Dataset A focuses on accessory compatibility",
        "Dataset A contains financial report navigation tasks absent in Dataset B's product-focused queries"
      ],
      "nnetnav_live_site=apple_num_tasks=70_portion=3": [
        "Dataset A tasks require exploring business/enterprise solutions and healthcare integrations not present in Dataset B",
        "Dataset B focuses on immediate in-store pickup scheduling with specific dates while Dataset A emphasizes general availability checks",
        "Dataset A includes tasks about future OS compatibility (e.g. iOS 18) and beta features like Apple Intelligence",
        "Dataset B requires direct comparison of consecutive model generations (e.g. iPhone 14 Pro vs 15 Pro) rather than configuration variants",
        "Dataset A contains tasks about warranty status verification and extended service purchases (AppleCare) absent in B",
        "Dataset B emphasizes physical product attributes (size, weight, colors) more prominently than Dataset A",
        "Dataset A includes privacy/security-related tasks (Core Spotlight, data handling) not found in Dataset B",
        "Dataset B tasks require identifying marketing slogans and promotional language absent in Dataset A",
        "Dataset A contains healthcare-specific functionality tasks (Health Records enrollment) not present in B",
        "Dataset B focuses on accessory compatibility checks for specific new products (Vision Pro) rather than general accessory searches"
      ]
    },
    "google_search": {
      "nnetnav_live_site=google_search_num_tasks=72_portion=3": [
        "Dataset B tasks emphasize retrieving exact numerical values or precise measurements (e.g., air quality index, astronomical distances)",
        "Dataset B includes queries requiring identification of ordinal positions/sequence data (e.g., 'first 7 bits', 'next visible solar eclipse after a specified one')",
        "Dataset B tasks frequently request direct copy-paste operations of technical identifiers (e.g., GitHub commit SHAs)",
        "Dataset B contains more temporal specificity requirements (e.g., 'as of today's date', 'most recent... before...')",
        "Dataset B shows higher prevalence of astronomical/geographic factual queries (e.g., solar system distances, eclipse paths)",
        "Dataset B tasks often require verification against authoritative lists/rankings (e.g., world records, championship winners)",
        "Dataset B includes explicit requests for metadata about information sources (e.g., ratings from specific review platforms)",
        "Dataset B tasks more frequently involve temporal comparisons within a single entity (e.g., an athlete's best season performance)",
        "Dataset B shows stronger focus on current atmospheric/environmental conditions (e.g., air quality, celestial events)",
        "Dataset B contains more explicit requests for complete enumeration of items (e.g., 'all discovered planets', 'names of all kids')"
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=2": [
        "Dataset A tasks focus on user engagement with platforms (e.g., starting courses, applying for jobs), while B tasks focus on passive data retrieval without platform interaction.",
        "Dataset A includes tasks requiring subjective validation (e.g., author credibility, recipe ratings), whereas B tasks prioritize objective, verifiable facts (e.g., records, dates).",
        "Dataset A tasks involve transactional goals (e.g., purchasing tickets, buying books), while B tasks lack transactional intent.",
        "Dataset A tasks frequently target health/medical information (e.g., symptoms, treatments), which is absent in B.",
        "Dataset B emphasizes real-time or dynamically updated data (e.g., current air quality, today's Earth-Mars distance), while A focuses on stable information (e.g., recipes, strategies).",
        "Dataset B contains more sports/athletics-related queries (e.g., player stats, game scores), which are rare in A.",
        "Dataset A tasks often require personalization (e.g., \"near me\" venues, user-specific recipe databases), while B tasks are location-agnostic.",
        "Dataset B includes astronomical/geographic queries (e.g., solar eclipses, star systems), which are absent in A.",
        "Dataset A tasks involve content creation/curation (e.g., rating recipes, building ingredient lists), while B focuses solely on extraction.",
        "Dataset B requires precise technical operations (e.g., copying/pasting commit SHAs), whereas A emphasizes conceptual understanding (e.g., learning AI applications)."
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=4": [
        "Dataset A tasks focus on multi-domain research requiring synthesis of information across sources (e.g., academic programs + climate impacts + job applications)",
        "Dataset B tasks prioritize singular factual retrievals with unambiguous answers (e.g., championship winners, record times, exact dates)",
        "Dataset A contains transactional objectives requiring user actions beyond information retrieval (e.g., job applications, ticket purchases, Wikipedia edits)",
        "Dataset B emphasizes numerical precision in scientific/athletic metrics (e.g., planetary distances, sprint records, air quality indexes)",
        "Dataset A requires contextual interpretation of evolving concepts (e.g., 'relevance of AI innovations this year', 'effects of climate change')",
        "Dataset B focuses on discrete historical/current events verification (e.g., tournament results, celebrity bios, championship locations)",
        "Dataset A tasks frequently involve comparative analysis between entities (e.g., stock performance comparisons, recipe ingredient alternatives)",
        "Dataset B tasks demand exact string matching/output replication (e.g., commit SHAs, first 7 bits of hashes, precise measurement values)",
        "Dataset A includes open-ended exploration of creative domains (e.g., event planning ideas, fashion trends, recipe customization)",
        "Dataset B concentrates on time-bound verification of system-generated data (e.g., real-time sports scores, astronomical distance updates, trending search analytics)"
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=0": [
        "Dataset B tasks predominantly require retrieving exact numerical or statistical data (e.g., records, dates, distances).",
        "Dataset B tasks frequently involve verifying information from authoritative databases or platforms (e.g., IMDb, Rotten Tomatoes, GitHub).",
        "Dataset B tasks focus on factual, non-transactional outcomes (e.g., championship winners, biographical details, astronomical data).",
        "Dataset B tasks emphasize retrieving singular, unambiguous answers (e.g., specific years, record holders, SHA hashes).",
        "Dataset B tasks often target public figures' career milestones or achievements (e.g., athlete stats, actor projects).",
        "Dataset B tasks include requests for real-time scientific or environmental metrics (e.g., air quality index, planetary distances).",
        "Dataset B tasks prioritize cross-platform comparisons (e.g., movie ratings across IMDb and Rotten Tomatoes).",
        "Dataset B tasks require extracting technical or cryptographic identifiers (e.g., GitHub commit SHAs).",
        "Dataset B tasks focus on sequential or future events (e.g., next solar eclipse dates).",
        "Dataset B tasks center on globally recognized events or entities (e.g., World Cup, UEFA Champions League)."
      ],
      "nnetnav_live_site=google_search_num_tasks=72_portion=1": [
        "Tasks in dataset B require retrieving exact numerical values or codes (e.g., SHA hashes, record times, air quality indices) more frequently than dataset A.",
        "Dataset B tasks focus more on retrieving real-time or near-real-time data (e.g., current air quality, today's planetary distance) compared to dataset A's time-sensitive but less granular temporal requirements.",
        "Tasks in dataset B emphasize direct comparisons between authoritative platforms (e.g., IMDb vs. Rotten Tomatoes ratings) rather than general multi-source comparisons seen in dataset A.",
        "Dataset B includes explicit requests to copy/paste technical identifiers (e.g., GitHub commit SHAs) as part of task completion, unlike dataset A.",
        "Tasks in dataset B more frequently involve astronomical/scientific measurements (e.g., solar eclipse dates, star system distances) compared to dataset A's focus on commercial/health domains.",
        "Dataset B requires identification of ordinal rankings (e.g., 'top-10 destinations', 'most goals in a season') more systematically than dataset A.",
        "Tasks in dataset B focus on biographical/achievement data for public figures (e.g., athlete records, celebrity projects) as a distinct category compared to dataset A's broader person-related queries.",
        "Dataset B includes explicit temporal chaining (e.g., 'next eclipse and the one after that') more frequently than dataset A's single-instance time queries.",
        "Tasks in dataset B require parsing technical repositories/version control systems (e.g., GitHub commits) unlike dataset A's focus on mainstream platforms.",
        "Dataset B emphasizes global-scale events/records (e.g., World Cup, UEFA Champions League) more prominently than dataset A's locally anchored tasks (e.g., hotel prices in NYC)."
      ]
    }
  }
}