Abstract: Point cloud compression plays a critical role in efficiently handling large 3D datasets, enabling their practical usage. An accurate entropy model is essential for learned codecs to achieve good compression performance. Although autoregressive entropy models can exploit dependencies in large contexts, they are computationally inefficient and may miss information from the reverse direction. To improve coding efficiency while guaranteeing fast decoding, we propose a bidirectional mask Transformer entropy model (BMTEM) for point cloud geometry compression, leveraging bidirectional self-attention in the mask Transformer. Pre-defined mask schedules enable group-wise autoregression, which allows parallel computation and faster inference than a fully autoregressive approach. Comparative evaluations against the point cloud coding standards G-PCC and V-PCC show that BMTEM achieves significant bitrate savings of 53.22% and 47.18% in terms of D1-PSNR, respectively. Compared to other deep learning-based methods such as PCGCv2 and ANFPCGC, our approach achieves average bitrate reductions of 20.01% and 17.67% in terms of D1-PSNR, respectively.
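To illustrate the group-wise autoregression idea, the sketch below shows how a pre-defined mask schedule can partition N symbols into a few decoding steps, so each step predicts a whole group in parallel instead of one symbol at a time. This is a minimal, hypothetical example using a cosine schedule; the actual schedule and grouping in BMTEM may differ, and the function names are illustrative only.

```python
import math

def mask_schedule(num_steps):
    """Hypothetical cosine schedule: fraction of symbols still masked
    after each step (1.0 at step 0, ~0.0 after the final step)."""
    return [math.cos(math.pi / 2 * t / num_steps) for t in range(num_steps + 1)]

def group_sizes(num_symbols, num_steps):
    """Turn the schedule into per-step group sizes: the number of symbols
    decoded (unmasked) at each step. All symbols in one group are predicted
    in parallel, conditioned bidirectionally on previously decoded groups."""
    masked = [round(num_symbols * r) for r in mask_schedule(num_steps)]
    masked[-1] = 0  # everything is decoded after the last step
    return [masked[t] - masked[t + 1] for t in range(num_steps)]

# Example: 1024 symbols decoded in 8 group-wise steps rather than
# 1024 fully sequential autoregressive steps.
sizes = group_sizes(1024, 8)
print(sizes, sum(sizes))
```

Because the cosine schedule is monotonically decreasing, early steps decode few symbols (little context available) and later steps decode many symbols at once, which is where the parallelism gain over fully autoregressive decoding comes from.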