Exploiting Latent Properties to Optimize Neural Codecs

14 Mar 2023 (modified: 07 Nov 2024) · Rejected by TMLR · CC BY 4.0
Abstract: End-to-end image/video codecs have become competitive with traditional compression techniques developed through decades of manual engineering. Thanks to their learning ability, these trainable codecs offer advantages over traditional techniques, such as easy adaptation to perceptual distortion metrics and high performance on specific domains. However, current state-of-the-art neural codecs do not fully exploit the benefits of vector quantization or the availability of the gradient of entropy at the decoder. In this work, we propose leveraging these two properties to improve the performance of off-the-shelf codecs. First, we demonstrate that non-uniform scalar quantization cannot improve performance over uniform scalar quantization; we therefore propose a predefined optimal uniform vector quantization to improve performance. Second, we show that the gradient of entropy, which is available at the decoder, is correlated with the gradient of the reconstruction error, which is not available there; we therefore use the former as a proxy to enhance compression performance. Our experimental results show that these approaches save 2-4% of the bitrate at the same quality across various pre-trained methods.
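To make the second idea concrete, below is a minimal PyTorch sketch of a decoder-side latent shift. This is not the paper's implementation: `entropy_model` (a callable returning per-element likelihoods), `step_size`, and `n_steps` are hypothetical placeholders, and the sign and size of the step are illustrative only.

```python
# Minimal sketch of the decoder-side entropy-gradient ("LatentShift") idea.
# Assumption: `entropy_model` is any differentiable callable that maps decoded
# latents to per-element likelihoods p(y_hat); this is NOT the paper's code.
import torch


def latent_shift(y_hat, entropy_model, step_size=0.01, n_steps=1):
    """Nudge decoded latents along the negative gradient of the rate.

    The rate R = -sum(log2 p(y_hat)) is computable at the decoder, so its
    gradient w.r.t. y_hat is available there. The paper argues this gradient
    is correlated with the (unavailable) gradient of the reconstruction
    error, so it can serve as a proxy for a small corrective step.
    """
    y_hat = y_hat.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        likelihoods = entropy_model(y_hat)                    # p(y_hat)
        rate = -torch.log2(likelihoods.clamp(min=1e-9)).sum() # bits
        (grad,) = torch.autograd.grad(rate, y_hat)
        with torch.no_grad():
            y_hat = y_hat - step_size * grad                  # proxy step
        y_hat.requires_grad_(True)
    return y_hat.detach()


if __name__ == "__main__":
    # Toy stand-in for a learned entropy model: an i.i.d. unit Gaussian.
    normal = torch.distributions.Normal(0.0, 1.0)
    entropy_model = lambda y: normal.log_prob(y).exp()
    y_hat = torch.randn(1, 192, 16, 16)   # latents decoded from the bitstream
    y_shifted = latent_shift(y_hat, entropy_model, step_size=0.05)
```

Because the entropy model is already shared by the encoder and decoder, such a shift requires no extra side information in the bitstream; only the decoder's computation changes.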
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

### First Revision

Following the reviewers' feedback, we re-phrased most of the sentences and fixed grammatical errors and errors in mathematical notation. We did not highlight those changes because doing so would make the revision hard to follow. We also made structural changes and added new text; those changes are highlighted in red, as follows:

1. At Reviewer jS5i's request, we added a block diagram of state-of-the-art neural codecs (Figure 1, page 3) to make the problem statement easier to follow.
2. At Reviewer jS5i's request, we improved the caption of Figure 2a on page 4.
3. At Reviewers BrCz's and jS5i's requests, we clearly state the relevance and applicability of our theorems to our proposed VQ method on page 4, just before Theorem 1, and in the first sentences of Section 3.1 on page 5.
4. At Reviewer jS5i's request, we added a detailed explanation of how the simulations for Figures 3a, 3b, and 3c were performed. This explanation starts near the end of page 5 and ends with the first paragraph of page 6.
5. At Reviewer jS5i's request, we highlighted the insight behind using vector quantization in end-to-end compression in the last two sentences before Section 3.2 on page 6.
6. At Reviewer BrCz's request, we explain how the BD-rate is calculated in the first paragraph of Section 5.1 on page 8.
7. At Reviewers MXQX's, BrCz's, and jS5i's requests, we moved the complexity analysis from the appendix to the main text as Section 5.2 on page 10.
8. In response to Reviewer MXQX's request for ideas on reducing the complexity of LatentShift, we added a sentence at the end of Section 5.2 on page 10 and used this idea in Section 5.4.
9. At Reviewer jS5i's request, we moved the LatentShift ablation study from the appendix to the main text as Section 5.3 on page 10.
10. In response to an indirect request from Reviewer MXQX, we added a new traditional codec (ECM 8.0) as a baseline and tested the LatentShift idea on it in Section 5.4 on page 11. To that end, we also added two sentences in Section 5.3 on page 11.
11. At Reviewer BrCz's request, we added new tests, reported in Section 5.5 on page 12.
12. At Reviewer jS5i's request, we fixed the issue in Eq. 10 on page 20.

Thanks for all the feedback and criticism.

### Second Revision

All changes in this revision address Reviewer jS5i's concerns about the relevance of the theorems to our claims. We used blue font to highlight our changes:

1. In the abstract, one sentence was adapted to our updated claims on the relevance of Theorem 1.
2. In the last two paragraphs of the introduction, we changed two sentences to clarify the relevance of the theorems to the paper (page 2).
3. We improved the caption and image of Figure 1 and added the necessary information on page 3.
4. At the beginning of Section 3 on page 4, we added a new paragraph describing why non-uniform SQ is better than uniform SQ in general, but not in neural codecs, as stated in Theorem 1.
5. Just before Section 3.1 on page 5, we show the relevance of Theorem 1 to the paper and our motivation for using VQ.
6. Just after Theorem 2 on page 7, we changed two sentences on the relevance of Theorem 2 to the paper, and also noted in which case the KKT conditions justify the correlation.

### Third Revision

This revision fixes an error in the proof of Theorem 2. We used brown font for the changed parts:

1. On page 7, we slightly changed Theorem 2 and added a new corollary just after the theorem.
2. Since the new corollary prescribes using the gradient of the total information's entropy instead of the gradient of the main information's entropy, we updated the equations accordingly and added footnotes on pages 7 and 8.
3. We rewrote the proof of Theorem 2 in Appendix C on pages 17-18 and added a new subsection with the proof of the corollary on page 18.

All changes are highlighted in brown font.
Assigned Action Editor: ~Jakub_Mikolaj_Tomczak1
Submission Number: 946