FNFORMER: A Transformer-Based Face Normal Estimator

Published: 01 Jan 2024 · Last Modified: 13 Nov 2024 · ICME 2024 · CC BY-SA 4.0
Abstract: Face normal estimation is a crucial step in the development of 3D facial applications, particularly face modeling and relighting. U-shaped networks are widely used for this task and have achieved remarkable success. However, CNN-based methods often generalize poorly to out-of-distribution/unseen data because they do not adequately model long-range dependencies. To address this limitation, Transformer-based approaches have been developed, which benefit from the global self-attention mechanism. Nevertheless, using them alone to learn face normals can limit localization ability due to insufficient low-level detail. In this work, we design a hybrid model called FNFormer that combines a Transformer and a CNN to achieve accurate face normal estimation. The proposed model encodes tokenized image patches from CNN feature maps as input to Transformer blocks that extract global context features, while a U-shaped CNN extracts detailed local spatial information. The CNN and Transformer features are then integrated for further learning, enabling the network to account for both local and global information effectively. Extensive experimental results demonstrate that our proposed FNFormer achieves state-of-the-art performance on various datasets. Our code is available at https://github.com/AutoHDR/FNFormer.
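To make the described design concrete, below is a minimal PyTorch sketch of such a hybrid architecture: a CNN branch extracts local features, the feature map is tokenized and passed through Transformer blocks for global self-attention, and the two streams are fused before a normal-prediction head. All module names, channel sizes, and the fusion strategy here are illustrative assumptions, not the authors' implementation (which is available at the GitHub link above).

```python
# A minimal, hypothetical sketch of the hybrid CNN + Transformer design the
# abstract describes. Layer sizes and the fusion choice are assumptions for
# illustration only; see https://github.com/AutoHDR/FNFormer for the real code.
import torch
import torch.nn as nn


class HybridNormalEstimator(nn.Module):
    """Illustrative hybrid encoder: a shallow CNN captures local detail,
    its feature map is tokenized for Transformer blocks that model global
    context, and both streams are fused to predict per-pixel normals."""

    def __init__(self, in_ch=3, feat_ch=64, num_heads=4, num_layers=2):
        super().__init__()
        # Local branch: a shallow CNN stack (stands in for the U-shaped CNN).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Global branch: Transformer encoder over tokens taken from the
        # CNN feature map (each spatial location serves as one token).
        layer = nn.TransformerEncoderLayer(
            d_model=feat_ch, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Fusion: concatenate local and global features, mix with a 1x1 conv.
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, 1)
        # Head: predict a 3-channel normal map, unit-normalized per pixel.
        self.head = nn.Conv2d(feat_ch, 3, 1)

    def forward(self, x):
        local = self.cnn(x)                        # (B, C, H, W) local detail
        b, c, h, w = local.shape
        tokens = local.flatten(2).transpose(1, 2)  # (B, H*W, C) tokens
        glob = self.transformer(tokens)            # global self-attention
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        fused = self.fuse(torch.cat([local, glob], dim=1))
        return nn.functional.normalize(self.head(fused), dim=1)


if __name__ == "__main__":
    model = HybridNormalEstimator()
    normals = model(torch.randn(1, 3, 32, 32))
    print(normals.shape)  # torch.Size([1, 3, 32, 32])
```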