Abstract: Current blockchain systems suffer from low scalability. To improve scalability and enable more critical facial data to be stored on the blockchain, we propose a novel cross-modal face reconstruction model. To reduce the heavy communication overhead of model synchronization on the blockchain, we use model-independent sketches and natural language text as intermediate representations of the face. To obtain accurate natural language representations of faces, we design a precise progressive questioning process that performs targeted knowledge mining of large multimodal models. The mixed-modality intermediate representation of sketches and text greatly reduces the storage volume of faces while improving visual fidelity. Uploading this mixed-modality representation to the blockchain effectively reduces the blockchain's communication and storage pressure. Finally, we use a diffusion generative model that supports mixed-modality input to reconstruct the face. Experiments demonstrate the potential and application prospects of our method.