Abstract: Joint optimization of compression and machine vision algorithms is widely regarded as an effective strategy for improving compression performance in coding for machines. However, existing joint optimization methods usually place a semantics parsing module at the tail of the pipeline, which raises a critical question: does the performance improvement stem from joint optimization itself, or is it driven primarily by the tailed semantics parsing module? To answer this, we disentangle the tailed semantics parsing module from the joint optimization pipeline by exploiting the simplicity of the person re-identification task, where semantics parsing consists of deterministic feature matching rather than a learned neural network. First, we propose one separate optimization pipeline and two joint optimization pipelines to systematically investigate the effectiveness of joint optimization. Our findings reveal that joint optimization alone does not guarantee performance improvement. Second, we evaluate the influence of the tailed semantics parsing module by equipping it with varying capabilities, demonstrating that higher parsing capability directly correlates with better machine vision performance. These findings underscore the pivotal role of tailed semantics parsing in enhancing machine vision performance and challenge the assumption that joint optimization alone drives improvement. This work offers new insights for designing effective coding methods, emphasizing the interplay between optimization strategies and tailed semantics parsing.
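To make concrete what "deterministic feature matching" can mean in person re-identification, the following is a minimal sketch, not the authors' implementation. It assumes cosine similarity as the matching score (the abstract does not name a metric), and the names `cosine_similarity`, `rank_gallery`, and the toy feature vectors are hypothetical. The point is only that the matching step has no learned parameters, so it cannot absorb any of the gains attributed to joint optimization.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query_feature, gallery):
    """Rank gallery entries by similarity to the query, best match first.

    `gallery` is a list of (identity, feature) pairs. The ranking is a fixed
    function of the features: nothing here is trained.
    """
    scored = [
        (identity, cosine_similarity(query_feature, feature))
        for identity, feature in gallery
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy features (hypothetical values, standing in for decoded re-ID features).
gallery = [
    ("person_a", [1.0, 0.0, 0.0]),
    ("person_b", [0.0, 1.0, 0.0]),
    ("person_c", [0.7, 0.7, 0.0]),
]
query = [0.9, 0.1, 0.0]

ranking = rank_gallery(query, gallery)
for identity, score in ranking:
    print(f"{identity}: {score:.3f}")
```

Under this assumption, any change in retrieval quality must come from the features produced upstream (the codec and feature extractor), which is what allows the effect of joint optimization to be examined in isolation.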
External IDs: dblp:conf/icmcs/GaoLLLWL25