Challenges in Technical Regulatory Text Variation Detection

Shriya Vaagdevi Chikati, Samuel Larkin, David Minicola, Chi-kiu Lo

Published: 2025, Last Modified: 06 Jan 2026, COLING Workshops 2025, CC BY-SA 4.0

Abstract: We present a preliminary study on the feasibility of using current natural language processing (NLP) techniques to detect variations between the construction codes of different jurisdictions. We formulate the task as a sentence alignment problem and evaluate various sentence representation models on this task. Our results show that embeddings trained specifically for the task perform marginally better than other models, but overall accuracy remains a challenge. We also show that domain-specific fine-tuning hurts task performance. The results highlight the challenges of developing NLP applications for technical regulatory texts.
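The sentence alignment formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' method. It assumes a toy bag-of-words vector (`embed`) in place of the sentence representation models evaluated in the paper, a per-sentence best match by cosine similarity, and hypothetical thresholds and labels (`threshold`, `identical`, `match`/`variation`/`unaligned`). The example sentences are invented for illustration and are not quoted from any construction code.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence encoder: lower-cased bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def align(source, target, threshold=0.5, identical=0.999):
    """Align each source sentence to its most similar target sentence.

    Hypothetical labelling scheme, for illustration only:
      "match"     - near-identical wording (score >= identical)
      "variation" - aligned, but wording differs (threshold <= score < identical)
      "unaligned" - no target sentence is similar enough (score < threshold)
    """
    target_vecs = [embed(t) for t in target]
    results = []
    for i, sentence in enumerate(source):
        vec = embed(sentence)
        scores = [cosine(vec, tv) for tv in target_vecs]
        if not scores:  # empty target document: nothing to align to
            results.append((i, None, 0.0, "unaligned"))
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        best = scores[j]
        if best >= identical:
            label = "match"
        elif best >= threshold:
            label = "variation"
        else:
            label = "unaligned"
        # Report no target index when the best score is below threshold.
        results.append((i, j if best >= threshold else None, best, label))
    return results


# Invented sentences from two hypothetical jurisdictions' codes.
code_a = [
    "Handrails shall be installed on both sides of stairs.",
    "Guards shall be not less than 900 mm high.",
]
code_b = [
    "Guards shall be not less than 1070 mm high.",
    "Handrails shall be installed on both sides of stairs.",
]

for i, j, score, label in align(code_a, code_b):
    print(i, j, round(score, 3), label)
# prints:
# 0 1 1.0 match
# 1 0 0.889 variation
```

Swapping `embed` for a trained sentence encoder would leave the alignment and labelling logic unchanged. Note that this sketch lets several source sentences map to the same target sentence; a strict one-to-one alignment would need an additional assignment step.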

External IDs: dblp:conf/coling/ChikatiLML25