Abstract: We formulate 3D object retrieval as a visual search problem in which a database of videos of objects, captured manually from different viewpoints, is queried using a single image. We propose to aggregate the visual information of similar views and to use the Fisher vector (FV) framework to represent a database of objects compactly. Large-scale experiments on an existing video dataset, which we complemented with image queries, show that our aggregation schemes significantly outperform standard retrieval techniques. When the database is represented with only 4 FVs per object, our approach achieves a mean average precision (mAP) of 73.0% on our dataset, while the baseline (no aggregation) reaches only 43.8%. It can also reach 72.0% mAP with a database 10× smaller than that of the baseline.