Autonomous Discovery of Robot Structure and Motion Control Through Large Vision Models

Xiaohui Li, Lian Liu, Zhao Zhang, Xiaoyu Guo, Jinqiang Cui

Published: 2024, Last Modified: 20 Jan 2026. Venue: CIS-RAM 2024. License: CC BY-SA 4.0

Abstract: This paper introduces an algorithm for autonomous robot self-modeling that integrates a Large Vision Model (LVM) with a Large Language Model (LLM). Unlike traditional robotic approaches, it enables robots to independently discover and refine their own body structure and control strategies using only partial information. The algorithm forms a self-learning loop in which the LLM generates predictive control code from limited prompts and the LVM's visual reasoning validates and improves those predictions. This cycle consists of an inner loop of assumptions, observations, and adjustments, supplemented by an outer loop that gradually increases the information provided until convergence is reached. The difference between expected and actual joint motion serves as a cost function, which quantifies the effectiveness of the process and is used to determine the minimum feasible prompt (MVP). Simulation results indicate that the algorithm is capable of self-modeling with minimal initial information.
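The two nested loops described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_propose` is a hypothetical stand-in for the LLM, `toy_observe` is a hypothetical stand-in for the LVM's visual measurement, and the two-joint robot, the hint names, the mean-absolute-difference cost, and the tolerance are all assumptions chosen only to make the example runnable.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def motion_cost(expected: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute difference between expected and observed joint motion.
    (Assumed cost form; the abstract only says 'difference'.)"""
    return sum(abs(e - o) for e, o in zip(expected, observed)) / len(expected)


def find_minimal_prompt(
    hints: Sequence[str],
    propose: Callable[[List[str], Optional[Vector]], Tuple[Vector, Vector]],
    observe: Callable[[Vector], Vector],
    tol: float = 1e-3,
    max_inner: int = 5,
) -> Tuple[List[str], float]:
    """Outer loop: add one hint at a time. Inner loop: assume, observe, adjust."""
    prompt: List[str] = []
    cost = float("inf")
    for hint in [None, *hints]:          # first pass uses an empty prompt
        if hint is not None:
            prompt.append(hint)
        feedback: Optional[Vector] = None
        for _ in range(max_inner):
            command, expected = propose(prompt, feedback)  # assumption
            observed = observe(command)                    # observation
            cost = motion_cost(expected, observed)
            if cost <= tol:
                return prompt, cost                        # converged
            feedback = observed                            # adjustment
    return prompt, cost


# --- Hypothetical toy stand-ins (not from the paper) ---
TRUE_GAINS = [1.0, -0.5]   # hidden joint response of the toy robot
COMMAND = [1.0, 1.0]       # unit command sent to every joint


def toy_observe(command: Vector) -> Vector:
    """Stand-in for the LVM: reports the motion each joint actually made."""
    return [g * c for g, c in zip(TRUE_GAINS, command)]


def toy_propose(prompt: List[str], feedback: Optional[Vector]) -> Tuple[Vector, Vector]:
    """Stand-in for the LLM: it can only model joints named in the prompt."""
    expected = []
    for i, c in enumerate(COMMAND):
        if f"joint_{i}" not in prompt:
            expected.append(0.0)          # unknown joint: predicts no motion
        elif feedback is None:
            expected.append(c)            # first guess: unit gain
        else:
            expected.append(feedback[i])  # refine from the last observation
    return COMMAND, expected


if __name__ == "__main__":
    prompt, cost = find_minimal_prompt(["joint_0", "joint_1"], toy_propose, toy_observe)
    print(prompt, cost)
```

In this toy setting the search stops only once both joints are named in the prompt, since the stand-in proposer cannot predict motion for a joint it has not been told about. In a real system the proposer and observer would be model calls, and the cost would be computed from visually estimated joint motion.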

External IDs: dblp:conf/ram/LiLZGC24