Using Large Language Models to measure and classify occupations in surveys

Published: 26 Jul 2025, Last Modified: 06 Oct 2025, Venue: NLPOR 2025, License: CC BY 4.0
Keywords: occupation coding, LLM, survey
TL;DR: Using an LLM to code occupation during the interview and probe for more detail when needed.
Submission Type: Non-Archival
Abstract: We present the results of a new approach to measuring the occupations of survey respondents using Large Language Models (LLMs). Occupation is a notoriously difficult variable to measure accurately because of the very large number of occupations and the technical ways they are described in standard classifications. These features of occupational classification systems mean that respondents cannot feasibly pick their occupation from a list, even with dynamic text prediction. The measurement and classification stages are therefore usually conducted separately, with open responses about job title and tasks coded in a subsequent 'office coding' stage. In our new approach, an LLM integrated into the questionnaire script codes the job title response to the occupational classification during the interview. Where the job title does not contain sufficient information to be coded with confidence, the LLM probes for further relevant detail, such as job tasks, industry, and qualifications. The approach has the potential to reduce respondent burden, lower costs, and yield more timely and accurate data. We evaluate the methodology by comparing the LLM-coded data to codes applied by human coders in a field experiment using the Verian Public Voice online probability panel.
Submission Number: 8
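
The code-then-probe flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `CodingResult`, `PROBES`, `code_occupation`, and `toy_classify`, the 0.8 confidence threshold, the order of the probes, and the placeholder code `PLACEHOLDER-001` are all assumptions made for illustration. The LLM call is replaced by a toy rule so that the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CodingResult:
    code: Optional[str]  # occupation code, or None if none could be assigned
    confidence: float    # classifier's self-reported confidence in [0, 1]


# Hypothetical follow-up topics, in the order they would be asked.
PROBES = ["job tasks", "industry", "qualifications"]


def code_occupation(
    job_title: str,
    classify: Callable[[str, Dict[str, str]], CodingResult],
    ask: Callable[[str], Optional[str]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the abstract
) -> Tuple[CodingResult, Dict[str, str]]:
    """Code a job title, probing for more detail while confidence is low."""
    details: Dict[str, str] = {}
    result = classify(job_title, details)
    for topic in PROBES:
        if result.confidence >= threshold:
            break  # confident enough: stop probing the respondent
        answer = ask(topic)
        if answer:
            details[topic] = answer
        result = classify(job_title, details)
    return result, details


def toy_classify(job_title: str, details: Dict[str, str]) -> CodingResult:
    # Stand-in for the LLM call: "manager" alone is treated as ambiguous
    # until the industry is known. Real behaviour would depend on the model.
    if job_title.lower() == "manager" and "industry" not in details:
        return CodingResult(code=None, confidence=0.3)
    return CodingResult(code="PLACEHOLDER-001", confidence=0.9)


# Simulated respondent answers keyed by probe topic.
answers = {"job tasks": "run a shop", "industry": "retail"}
result, details = code_occupation("Manager", toy_classify, answers.get)
```

In this sketch an ambiguous title triggers probes one at a time and stops as soon as the classifier reports enough confidence, so a respondent whose title is already codable is asked nothing further. Passing `classify` and `ask` as functions keeps the probing logic separate from whichever model and survey platform supply them.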