AI Respondents for Policy Monitoring: From Data Extraction to AI-Driven Survey Responses in the OECD STIP Compass

ICLR 2026 Conference Submission 20485 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Large Language Models (LLMs), Long-Context In-Context Learning (ICL), Policy Data Extraction, STIP Compass (OECD), AI Respondents, Human-AI Agreement

Abstract: Science, Technology, and Innovation (STI) policies are central to national and international competitiveness, yet their complexity makes systematic mapping and continuous monitoring a persistent challenge. This study draws on one of the largest initiatives in the field, the OECD STIP Compass survey, which collects and organizes data on STI policy from OECD countries and has historically relied on extensive manual survey efforts to ensure global consistency. Large Language Models (LLMs) are redefining representation learning in NLP and can process and internalize knowledge from long unstructured documents. This paper presents a novel application of LLMs to structured information extraction and generation from STI policy documents, focusing on OECD data for six sample countries. We develop a data extraction pipeline based on long-context in-context learning that encodes task-specific schemas, allowing the model to learn survey taxonomy labels from public URLs referencing policy initiatives. The pipeline integrates validation steps: relevance and evidence scoring by a secondary LLM, and a comparison with survey responses completed by human respondents. For evaluation, we apply multiple measures, including overlap ratios and agreement scores between human-generated and LLM-generated policy indicators, and K-fold cross-validation for AI-generated labels. Our findings indicate that LLMs can achieve high overlap with human respondents for policy indicators (84-95%). Qualitative analysis reveals that the model tends to provide more detailed descriptions, complementing human-written content. Our approach points to the potential of an AI-assisted framework for STI policy monitoring that enhances both efficiency and quality in international policy intelligence.
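The abstract names overlap ratios between human-generated and LLM-generated policy indicators but does not define them. As a minimal sketch only, one plausible reading treats each response as a set of taxonomy labels and measures how many of the human labels the model also produced. The function names, the choice of the human set as the denominator, the Jaccard variant, and the example labels are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Iterable


def overlap_ratio(human_labels: Iterable[str], ai_labels: Iterable[str]) -> float:
    """Share of human-assigned labels that the AI respondent also assigned.

    Assumption: the human response is the reference set (denominator).
    """
    human, ai = set(human_labels), set(ai_labels)
    if not human:
        return 0.0  # no reference labels, so nothing to overlap with
    return len(human & ai) / len(human)


def jaccard(human_labels: Iterable[str], ai_labels: Iterable[str]) -> float:
    """Symmetric alternative: intersection over union of the two label sets."""
    human, ai = set(human_labels), set(ai_labels)
    union = human | ai
    if not union:
        return 1.0  # two empty label sets agree trivially
    return len(human & ai) / len(union)


# Hypothetical labels for a single policy initiative (illustrative only).
human = ["grants", "tax_relief", "sme_support", "green_innovation"]
ai = ["grants", "tax_relief", "sme_support", "digital_skills"]

ratio = overlap_ratio(human, ai)   # 3 of the 4 human labels are matched
similarity = jaccard(human, ai)    # 3 shared labels out of 5 distinct labels
```

The asymmetric ratio rewards recovering what the human respondent reported, while the Jaccard variant also penalizes extra labels the model adds; which of the two, if either, corresponds to the reported 84-95% figure is not stated in the abstract.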

Primary Area: generative models

Submission Number: 20485
