GDCF: A Generalizable Digital Cognition Framework Incorporating Teacher Model Generated Pseudo-label Verification into Instruction Tuning

ACL ARR 2025 February Submission278 Authors

05 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: To address the semantic gap between specialized cultural heritage terminology and everyday public language, this paper proposes the Generalizable Digital Cognition Framework (GDCF), which targets cross-domain semantic alignment in low-resource scenarios. Using a teacher-student architecture and instruction tuning, GDCF accurately maps everyday language to domain-specific vocabulary in a few-shot setting with only 100 annotated samples. The teacher model generates initial pseudo-labels, and a dynamic label masking strategy guides the smaller student model through instruction tuning, enabling it to reach performance comparable to the teacher's. When the teacher and student have the same parameter size, the student can even outperform the teacher. Experiments show that the method achieves a keyword extraction accuracy of 0.39 on a cultural heritage review dataset, a 73% improvement over the baseline LLM. The framework also introduces a 3D visualization space that integrates semantic vectors with cognitive dynamics, revealing deep semantic relationships between public discourse and professional terminology. Its modular design has been validated for transferability on architectural heritage conservation assessments, offering a scalable benchmark paradigm for interdisciplinary digital humanities research.
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: data-efficient training, NLP in resource-constrained settings, few-shot learning, lexical relationships
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data analysis
Languages Studied: English, Chinese
Submission Number: 278