Keywords: Architecture search, Meta-learning, Multimodal learning, Neural architecture evolution, Vision-language model
Abstract: Modern deep learning architectures, particularly Vision-Language Models (VLMs), have achieved remarkable success across a wide range of multimodal tasks. However, these models are often constrained by manually engineered, static topologies whose predefined architectural blueprints limit their adaptability, diversity, and evolutionary potential. Such rigidity hampers their ability to generalize across domains, scale efficiently, and innovate beyond human design. To address these limitations, we present AI Architect Thyself, a meta-learned evolutionary framework that enables neural networks to design, diversify, and evolve their own architectures. Unlike conventional neural architecture search or fixed multimodal blueprints, our approach treats topology as a dynamic, learnable variable optimized jointly with network parameters. Our Thyself Architect introduces three key innovations: (i) Parametric Plurality (PP), in which multiple instantiations of diverse archetypes (e.g., Transformers, LSTMs, ResNets, Squeeze-and-Excitation modules) coexist with distinct hyperparameters; (ii) a Graph Attention Router (GAR) that performs per-sample expert routing across a dynamically evolving module zoo; and (iii) a co-evolutionary hybridization engine that recombines architectural traits of high-performing ancestors to generate novel configurations beyond human design. Across 12 multimodal and vision-language benchmarks, including Hateful Memes, VQA v2.0, COCO Captions, Food-101, and OpenImages, our framework consistently surpasses state-of-the-art baselines with improvements of +0.9\% to +4.1\% in accuracy, AUC, and F1 score. These results suggest a paradigm shift: models can progress from engineered artifacts into self-directed, evolving organisms, advancing the frontier of autonomous machine intelligence.
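To make the routing idea concrete, the following is a minimal sketch of per-sample expert routing over a heterogeneous module zoo, in the spirit of the Graph Attention Router (GAR) described in the abstract. It is not the authors' implementation: the class, its attention-based scoring, and all parameter names (`GraphAttentionRouter`, `expert_emb`, `top_k`) are illustrative assumptions.

```python
# Hypothetical sketch of GAR-style per-sample routing (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionRouter(nn.Module):
    """Scores each expert per sample via attention between the sample embedding
    and learned expert-node embeddings, then mixes the top-k expert outputs."""
    def __init__(self, dim: int, experts: nn.ModuleList, top_k: int = 2):
        super().__init__()
        self.experts = experts                              # the evolving module zoo
        self.expert_emb = nn.Parameter(torch.randn(len(experts), dim))
        self.query = nn.Linear(dim, dim)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, dim)
        q = self.query(x)                                   # (B, D)
        scores = q @ self.expert_emb.t() / q.size(-1) ** 0.5  # (B, num_experts)
        topv, topi = scores.topk(self.top_k, dim=-1)        # per-sample expert choice
        gates = F.softmax(topv, dim=-1)                     # (B, k) mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e                   # samples routed to expert e
                if mask.any():
                    out[mask] += gates[mask, slot, None] * expert(x[mask])
        return out

# Usage: a zoo of diverse archetypes (stand-ins here, with matching I/O dims).
zoo = nn.ModuleList([nn.Linear(64, 64), nn.Sequential(nn.Linear(64, 64), nn.GELU())])
router = GraphAttentionRouter(dim=64, experts=zoo, top_k=1)
y = router(torch.randn(8, 64))  # each sample dispatched to its highest-scoring expert
```

Routing per sample, rather than per batch, is what lets structurally distinct modules coexist in one network; the graph-attention scoring assumed above is one plausible way to keep that dispatch differentiable as the zoo evolves.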
Supplementary Material: pdf
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 13308