Abstract: Recent 3D object detectors leverage multi-frame data, including past and future frames, to improve performance. However, the temporal fusion strategies they employ have not yet realized their full potential. Existing multi-frame methods fuse specific features only according to ego-motion, and they cannot be applied directly to long sequences because of their high computation and memory cost. We find that current methods do not efficiently exploit history information, such as historical predictions and object motion. Building on these observations, we present a novel hybrid query formulation composed of history queries and the original queries. The history queries consist of position and content queries inferred from historical predictions and features, and they account for the motion of all objects in the current scene. Moreover, our method can be readily applied to other DETR-like models to boost performance without introducing substantial computation or memory cost. As a result, History-DETR achieves a remarkable improvement (+1.1% NDS) with a negligible increase in inference time.
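The hybrid query construction described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a constant-velocity motion model for each object, a 4×4 homogeneous ego-pose transform from the previous frame to the current one, top-k selection of historical predictions by confidence, and reuse of the previous decoder output features as content queries. All function and parameter names (`propagate_history_centers`, `build_hybrid_queries`, `k`, and so on) are invented for this sketch.

```python
import numpy as np


def propagate_history_centers(centers_prev, velocities_prev, dt, T_prev_to_cur):
    """Move previous-frame box centers into the current ego frame.

    centers_prev:    (N, 3) predicted box centers in the previous ego frame.
    velocities_prev: (N, 2) predicted BEV velocities (vx, vy), same frame.
    dt:              time gap between the two frames, in seconds.
    T_prev_to_cur:   (4, 4) homogeneous transform, previous -> current ego frame.
    """
    moved = centers_prev.astype(float).copy()
    # Object motion: assumed constant velocity in the BEV plane.
    moved[:, :2] += velocities_prev * dt
    # Ego motion: re-express the moved centers in the current ego frame.
    homo = np.hstack([moved, np.ones((moved.shape[0], 1))])
    return (homo @ T_prev_to_cur.T)[:, :3]


def build_hybrid_queries(scores_prev, centers_prev, velocities_prev, feats_prev,
                         dt, T_prev_to_cur, init_pos, init_content, k):
    """Concatenate k history queries with the detector's original queries."""
    # Keep the k most confident historical predictions (hypothetical selection rule).
    idx = np.argsort(-scores_prev)[:k]
    hist_pos = propagate_history_centers(centers_prev[idx], velocities_prev[idx],
                                         dt, T_prev_to_cur)
    # Content queries reuse the decoder features of the selected predictions.
    hist_content = feats_prev[idx]
    pos = np.concatenate([hist_pos, init_pos], axis=0)
    content = np.concatenate([hist_content, init_content], axis=0)
    return pos, content


# Usage: the ego vehicle drives 1 m forward along x while a tracked object
# moves at 2 m/s along x for 0.5 s, so it stays at x = 10 in the current frame.
T = np.eye(4)
T[0, 3] = -1.0
centers = np.array([[10.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
vels = np.array([[2.0, 0.0], [0.0, 0.0]])
scores = np.array([0.9, 0.2])
feats = np.ones((2, 8))
pos, content = build_hybrid_queries(scores, centers, vels, feats, 0.5, T,
                                    np.zeros((4, 3)), np.zeros((4, 8)), k=1)
print(pos.shape, content.shape)  # (5, 3) (5, 8)
```

Because this sketch adds only k extra queries per frame, the additional decoder cost scales with k rather than with the number of fused frames, which is in the spirit of the abstract's claim of low overhead.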