DVI: A Derivative-based Vision Network for INR

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We propose DVI, a novel Derivative-based Vision network for INR, capable of handling a variety of vision tasks across various data modalities.
Abstract: Recent advances in computer vision have made Implicit Neural Representations (INR) a dominant form of data representation owing to their compactness and expressive power. Vision networks that solve tasks on INR data fall into two camps: purely INR-based networks, which are limited by simplistic operations and constrained performance, and networks that incorporate raster-based methods, which tend to lose crucial structural information of the INR when converting it to a raster. To address these issues, we propose DVI, a novel Derivative-based Vision network for INR that handles a variety of vision tasks across various data modalities and achieves the best performance among existing methods by incorporating state-of-the-art raster-based methods into an INR-based architecture. DVI extracts semantic information from the high-order derivative map of the INR and seamlessly fuses it into a pre-existing raster-based vision network, enhancing that network with deeper, task-relevant semantic information. Extensive experiments on five vision tasks across three data modalities demonstrate DVI's superiority over existing methods. Additionally, comprehensive ablation studies confirm the efficacy of each component of DVI and examine the influence of different derivative computation techniques and of the derivative order. Reproducible code is provided in the supplementary materials.
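The abstract does not specify DVI's INR architecture, derivative operator, or fusion module, so the following is only a minimal, stdlib-only sketch of the general idea under stated assumptions. It uses a hypothetical one-layer sine INR, computes a per-pixel derivative map up to second order by central finite differences (one possible derivative computation technique; automatic differentiation is another), checks those values against the closed-form derivatives of the toy INR, and applies a placeholder fusion that simply concatenates derivative channels onto raster features. All function names (`make_toy_inr`, `derivative_map`, `fuse_channels`, etc.) are hypothetical and are not the authors' code.

```python
import math
import random


def make_toy_inr(width=16, omega=3.0, seed=0):
    """Hypothetical toy INR (assumption, not DVI's architecture):
    f(x, y) = sum_k a_k * sin(u_k * x + v_k * y + b_k) with fixed random weights."""
    rng = random.Random(seed)
    params = [
        (rng.uniform(-1, 1), omega * rng.uniform(-1, 1),
         omega * rng.uniform(-1, 1), rng.uniform(-math.pi, math.pi))
        for _ in range(width)
    ]

    def f(x, y):
        return sum(a * math.sin(u * x + v * y + b) for a, u, v, b in params)

    return f, params


def analytic_derivatives(params, x, y):
    """Closed-form (fx, fy, fxx, fyy, fxy) of the toy sine INR, used as a reference."""
    fx = fy = fxx = fyy = fxy = 0.0
    for a, u, v, b in params:
        z = u * x + v * y + b
        s, c = math.sin(z), math.cos(z)
        fx += a * u * c
        fy += a * v * c
        fxx -= a * u * u * s
        fyy -= a * v * v * s
        fxy -= a * u * v * s
    return fx, fy, fxx, fyy, fxy


def finite_difference_derivatives(f, x, y, h=1e-3):
    """Central finite differences for first- and second-order partial derivatives."""
    f0 = f(x, y)
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    fx = (fxp - fxm) / (2 * h)
    fy = (fyp - fym) / (2 * h)
    fxx = (fxp - 2 * f0 + fxm) / (h * h)
    fyy = (fyp - 2 * f0 + fym) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)
    return fx, fy, fxx, fyy, fxy


def derivative_map(f, height, width, h=1e-3):
    """H x W grid of per-pixel feature vectors [f, fx, fy, fxx, fyy, fxy]."""
    grid = []
    for i in range(height):
        row = []
        for j in range(width):
            # Pixel centers mapped to [-1, 1] (an assumed coordinate convention).
            y = -1 + (2 * i + 1) / height
            x = -1 + (2 * j + 1) / width
            row.append([f(x, y), *finite_difference_derivatives(f, x, y, h)])
        grid.append(row)
    return grid


def fuse_channels(raster_features, deriv_map):
    """Placeholder early fusion: append derivative channels to each pixel's raster features."""
    return [[r + d for r, d in zip(r_row, d_row)]
            for r_row, d_row in zip(raster_features, deriv_map)]


if __name__ == "__main__":
    f, params = make_toy_inr()
    fd = finite_difference_derivatives(f, 0.3, -0.2)
    an = analytic_derivatives(params, 0.3, -0.2)
    # Maximum finite-difference error against the closed form; should be small.
    print(max(abs(p - q) for p, q in zip(fd, an)))
    dmap = derivative_map(f, 4, 4)
    # One-channel "raster": the INR sampled at the same pixel centers.
    raster = [[[f(-1 + (2 * j + 1) / 4, -1 + (2 * i + 1) / 4)] for j in range(4)]
              for i in range(4)]
    fused = fuse_channels(raster, dmap)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 4 7
```

Finite differences are used here only because they need nothing beyond the standard library and work on any black-box INR; the closed-form check illustrates that, for a smooth INR, they recover the same derivative map. Concatenation stands in for whatever fusion module DVI actually uses inside its raster backbone.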
Lay Summary: Computer vision systems struggle to effectively process Implicit Neural Representations (INR) data, either using limited INR-based methods or losing crucial structural information when converting to traditional formats. We developed DVI, a Derivative-based Vision network that extracts structural information from high-order derivatives of INR data and seamlessly integrates it with existing vision networks. DVI handles multiple vision tasks across various data types with superior performance. This advancement helps computers better "understand" complex visual data, potentially improving applications from medical imaging to autonomous driving.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Computer Vision
Keywords: Data Compression, Implicit Neural Representation
Submission Number: 1966