The VIP model uses only DINOv3 to address the training-free open-vocabulary semantic segmentation task.
The complete codebase will be publicly released upon acceptance of the paper.