Efficient Realistic Avatar Generation via Model Compression and Enhanced Rendering

Shengjia Zhang

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Head Avatars, Knowledge distillation, Neural network compression, Generative adversarial renderer

Abstract: Integrating digital avatars into people's lives requires generating complete, realistic, and animatable avatars efficiently. However, as parameter counts and model sizes grow, efficiency factors such as training speed and model size become obstacles when models are deployed on devices. Meanwhile, graphics rule-based micro-renderers, which simplify real-world photorealistic mechanisms such as illumination and reflection, cannot generate photorealistic images. To address these issues, we propose a two-stage model compression and optimization architecture. The first stage compresses the model with our proposed distillation architecture; the second stage uses our proposed generative adversarial renderer, with its inverse version customized to the student network, to further improve the realism of the digital avatars. Specifically, in the knowledge distillation process, we achieve multi-scale feature fusion by concatenating the output features of RandLA-Net and GCN, combining global and local information to better capture the details and context of the point cloud. We construct auxiliary supervision that enables point-level supervision by building the graph topology of the entire point cloud. We also feed the extracted point cloud features as latent codes into our neural renderer to render more realistic facial images. Experiments show that, compared with existing SOTA methods, our method improves network performance while reducing the parameters and computation of the whole network: knowledge distillation reduces the teacher model's parameter count by about 95% and its computation by about 90%.
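As a rough illustration of the fusion step described in the abstract, the sketch below concatenates two per-point feature sets along the channel axis and computes the fraction of parameters removed by compression. It is only a sketch under stated assumptions: `randla_feats` and `gcn_feats` stand in for the outputs of the RandLA-Net and GCN branches, the function names are hypothetical, and the parameter counts in the test values are made-up numbers chosen to match the reported ratio of about 95%, not figures from the paper.

```python
from typing import List

# One feature vector per point; both branches are assumed to describe
# the same points in the same order.
Features = List[List[float]]


def fuse_features(randla_feats: Features, gcn_feats: Features) -> Features:
    """Concatenate the two branches' features point by point (channel axis)."""
    if len(randla_feats) != len(gcn_feats):
        raise ValueError("both branches must cover the same number of points")
    return [a + b for a, b in zip(randla_feats, gcn_feats)]


def parameter_reduction(teacher_params: int, student_params: int) -> float:
    """Fraction of the teacher's parameters removed in the student."""
    if teacher_params <= 0:
        raise ValueError("teacher_params must be positive")
    return 1.0 - student_params / teacher_params
```

With concatenation, the fused feature width is the sum of the two branch widths, so any layer that consumes the fused features must be sized accordingly.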

Primary Area: representation learning for computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 4643
