Keywords: face reconstruction, vision language model, unsupervised learning
Abstract: We introduce FaceGPT, a self-supervised learning framework that enables large vision-language models (VLMs) to reason about 3D human faces from images and text. Typical 3D face analysis algorithms are specialized and lack semantic reasoning capabilities. FaceGPT overcomes this limitation by embedding the parameters of a 3D morphable face model (3DMM) into the token space of a VLM, enabling the generation of 3D faces from both textual and visual inputs. FaceGPT is trained as a model-based autoencoder in a self-supervised manner on in-the-wild images. Specifically, a dedicated face token is projected to 3DMM parameters, which are then rendered as a 2D face image; the image-based reconstruction error between this rendering and the input guides the self-supervised learning process. Without relying on expensive 3D annotations, FaceGPT learns to generate 3D faces from visual or textual inputs, achieving competitive performance compared to methods that are specialized for each of these tasks. Most importantly, FaceGPT leverages the world knowledge in VLMs to achieve semantic reasoning capabilities, allowing the model to perform speculative generation of 3D faces purely from subtle textual prompts that do not explicitly describe facial features. This opens a new way of generating 3D faces from indirect descriptions of emotions or general everyday situations.
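The model-based autoencoder loop described in the abstract (face token, then 3DMM parameters, then a rendered 2D face, then a reconstruction loss) can be illustrated with a minimal sketch. This is not the authors' implementation: every name, dimension, and weight below is hypothetical, the 3DMM is a toy linear model with random bases, and the differentiable renderer is replaced by a plain orthographic projection of the mesh vertices so that the example runs with the Python standard library alone.

```python
import random

random.seed(0)

# All sizes are hypothetical toy values; real models are far larger.
HIDDEN_DIM = 8   # size of the VLM hidden state holding the face token
N_PARAMS = 4     # number of 3DMM coefficients
N_VERTS = 5      # number of mesh vertices


def rand_matrix(rows, cols, scale=1.0):
    """Random Gaussian matrix stored as a list of rows."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-based matrices."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Toy linear 3DMM: a mean shape plus a linear basis (both random here).
MEAN_SHAPE = [random.gauss(0.0, 1.0) for _ in range(N_VERTS * 3)]
BASIS = rand_matrix(N_VERTS * 3, N_PARAMS)

# Hypothetical projection head from the face-token embedding to 3DMM parameters.
PROJECTION = rand_matrix(N_PARAMS, HIDDEN_DIM, scale=0.1)


def project_face_token(face_token):
    """Map the face-token embedding to 3DMM coefficients."""
    return matvec(PROJECTION, face_token)


def decode_3dmm(params):
    """Linear 3DMM decoding: vertices = mean + basis @ params."""
    offsets = matvec(BASIS, params)
    flat = [m + o for m, o in zip(MEAN_SHAPE, offsets)]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def toy_render(vertices):
    """Stand-in for a renderer: orthographic projection that drops z."""
    return [(x, y) for x, y, _ in vertices]


def reconstruction_loss(face_token, target_2d):
    """Mean squared 2D error between the rendered face and the target."""
    rendered = toy_render(decode_3dmm(project_face_token(face_token)))
    errors = [(px - tx) ** 2 + (py - ty) ** 2
              for (px, py), (tx, ty) in zip(rendered, target_2d)]
    return sum(errors) / len(errors)


# A face token as it might be read from the VLM's last hidden layer.
face_token = [random.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)]
target = toy_render(decode_3dmm(project_face_token(face_token)))
shifted_target = [(x + 1.0, y) for x, y in target]

print(reconstruction_loss(face_token, target))
print(reconstruction_loss(face_token, shifted_target))
```

In a trained system the loss would be computed on rendered pixels against the input photograph and backpropagated through the renderer into the projection head and the VLM; the sketch only shows how the pieces connect, which is why no 3D annotations appear anywhere in the loop.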
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4499