CLUE-NAS: A CLIP-Inspired Contrastive Learnable Unifying Encoder for Neural Architecture Search

ICLR 2026 Conference Submission 13655 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: predictor-based NAS, encoder-based NAS, semantic-based NAS, language model-based NAS, CLIP-based NAS
Abstract: Conventional encoder-based neural architecture search (NAS) methods typically encode candidate architectures as graphs based on their information flow and operations. Such graph-based embeddings primarily capture topological features, such as nodes and edges, but lack high-level semantic representations, which limits the robustness and generalization of encoder-based NAS. This limitation manifests in several ways: typical NAS methods cannot interpret previously unseen operations, and they benefit little from joint training across multiple search spaces. To mitigate these limitations, we propose the Contrastive Learnable Unifying Encoder for NAS (CLUE-NAS), a novel framework that leverages the text encoder of Contrastive Language-Image Pre-training (CLIP) to generate context embeddings enriched with high-level semantics and integrates them with graph-based embeddings through contrastive learning. CLUE-NAS further emulates human expert behavior by employing a coarse-to-fine strategy to improve performance. Experiments on NAS-Bench-101, NAS-Bench-201, and NAS-Bench-301 show that CLUE-NAS not only generalizes well to unseen operations but also benefits substantially from joint training, achieving competitive results against state-of-the-art NAS baselines.
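
As a rough illustration of the alignment mechanism the abstract describes, below is a minimal PyTorch sketch of a CLIP-style symmetric contrastive (InfoNCE) loss between graph-based architecture embeddings and text-encoder context embeddings. The class name `ArchContrastiveAligner`, the embedding dimensions, and the projection-head design are illustrative assumptions only; the paper's actual encoders, prompts, and training details are not specified in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArchContrastiveAligner(nn.Module):
    """Hypothetical sketch: align graph-based architecture embeddings with
    CLIP text-encoder context embeddings via a symmetric InfoNCE loss,
    following the CLIP training objective."""

    def __init__(self, graph_dim: int, text_dim: int, joint_dim: int = 256):
        super().__init__()
        # Projection heads map both modalities into a shared joint space.
        self.graph_proj = nn.Linear(graph_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)
        # Learnable temperature, initialized to ln(1/0.07) as in CLIP.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, graph_emb: torch.Tensor, text_emb: torch.Tensor):
        # L2-normalize the projected embeddings of each modality.
        g = F.normalize(self.graph_proj(graph_emb), dim=-1)
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        # Pairwise cosine similarities, scaled by the temperature.
        logits = self.logit_scale.exp() * g @ t.t()
        targets = torch.arange(g.size(0), device=g.device)
        # Symmetric cross-entropy: graph-to-text and text-to-graph.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

# Toy usage with hypothetical embedding sizes for a batch of 8 architectures:
graph_emb = torch.randn(8, 128)   # e.g. from a GNN over the architecture DAG
text_emb = torch.randn(8, 512)    # e.g. frozen CLIP text-encoder outputs
aligner = ArchContrastiveAligner(graph_dim=128, text_dim=512)
print(aligner(graph_emb, text_emb).item())
```

Under this reading, the diagonal of the similarity matrix holds matched (architecture, description) pairs, and the symmetric loss pulls matched pairs together while pushing mismatched pairs apart, which is how CLIP-style training injects textual semantics into the graph embedding space.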
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 13655