Radon Implicit Field Transform (RIFT): Learning Scenes from Radar Signals

ICLR 2025 Conference Submission 12994 Authors (anonymous)

28 Sept 2024 (modified: 25 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: AI for Science, Representation Learning, Scene Rendering, Implicit Neural Representation, 3D Reconstruction, Inverse Problems
TL;DR: We combine implicit neural representations with signal processing algorithms to build a model that reconstructs scenes by learning from radar signals.
Abstract: Data acquisition in array signal processing (ASP) is costly because achieving high angular and range resolutions necessitates large antenna apertures and wide frequency bandwidths, respectively. The data requirements of ASP problems grow multiplicatively with the number of viewpoints and frequencies, significantly increasing the burden of data collection, even in simulation. Implicit Neural Representations (INRs), neural network-based models of 3D objects and scenes, offer compact and continuous representations that require minimal radar data. They can interpolate to unseen viewpoints and thus potentially reduce the sampling cost of ASP problems. In this work, we take Synthetic Aperture Radar (SAR) as a case study from ASP and propose the \textit{\textbf{R}adon \textbf{I}mplicit \textbf{F}ield \textbf{T}ransform} (RIFT). RIFT consists of two components: a classical forward model for radar (the Generalized Radon Transform, GRT) and an INR-based scene representation learned from radar signals. The method extends to other ASP problems by replacing the GRT with the forward model appropriate to the data modality. In our experiments, we first synthesize radar data using the GRT. We then train the INR model on this synthetic data by minimizing the reconstruction error of the radar signal. After training, we render the scene with the trained INR and evaluate the representation against the ground-truth scene. Because no existing benchmarks apply, we introduce two new error metrics: \textit{\textbf{p}hase-\textbf{R}oot \textbf{M}ean \textbf{S}quare \textbf{E}rror} (p-RMSE) for radar signal interpolation and \textit{\textbf{m}agnitude-\textbf{S}tructural \textbf{S}imilarity \textbf{I}ndex \textbf{M}easure} (m-SSIM) for scene reconstruction. These metrics adapt traditional error measures to the complex-valued nature of radar signals. Compared with traditional scene models in radar signal processing, RIFT achieves up to a 188\% improvement in scene reconstruction with only a 10\% data footprint. With the same amount of data, RIFT reconstructs up to $3\times$ better and generalizes 10\% better to unseen viewpoints.
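To make the pipeline concrete, below is a minimal, self-contained sketch of the training loop the abstract describes: an INR maps 3D coordinates to complex reflectivity, a discretized GRT renders radar signals from it, and the loss is the complex reconstruction error. All names (`SceneINR`, `grt_forward`) and hyperparameters are illustrative assumptions, not the authors' code, and the far-field two-way phase term $e^{-j4\pi f R / c}$ is a standard GRT discretization that may differ from the paper's exact forward model.

```python
# Minimal sketch of the RIFT pipeline inferred from the abstract.
# SceneINR, grt_forward, and all hyperparameters are illustrative.
import math
import torch
import torch.nn as nn

class SceneINR(nn.Module):
    """MLP mapping 3D coordinates to complex scene reflectivity."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # real and imaginary parts
        )

    def forward(self, xyz):
        out = self.net(xyz)
        return torch.complex(out[..., 0], out[..., 1])

def grt_forward(reflectivity, xyz, antenna_pos, freqs, c=3e8):
    """Discretized GRT: sum scene reflectivity weighted by the two-way
    propagation phase exp(-j*4*pi*f*R/c) for each viewpoint and frequency."""
    # ranges: (viewpoints V, scene points N)
    ranges = torch.linalg.norm(xyz[None, :, :] - antenna_pos[:, None, :], dim=-1)
    # phase: (V, frequencies F, N)
    phase = -4j * math.pi * freqs[None, :, None] * ranges[:, None, :] / c
    return (reflectivity[None, None, :] * torch.exp(phase)).sum(dim=-1)  # (V, F)

# Toy data; in the paper's setting, `measured` would be GRT-synthesized
# radar signals and `xyz` would sample the imaging region of interest.
xyz = torch.rand(4096, 3)                              # scene sample points
antenna_pos = torch.randn(8, 3) * 10.0                 # viewpoints
freqs = torch.linspace(9.0e9, 10.0e9, 64)              # frequency sweep
measured = torch.randn(8, 64, dtype=torch.complex64)   # stand-in measurements

model = SceneINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(100):
    pred = grt_forward(model(xyz), xyz, antenna_pos, freqs)
    loss = (pred - measured).abs().pow(2).mean()  # MSE on complex signals
    opt.zero_grad()
    loss.backward()
    opt.step()
```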
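The two metrics can be read as complex-valued adaptations of RMSE and SSIM. The sketch below shows one plausible formulation, assuming p-RMSE compares wrapped phase differences and m-SSIM applies standard SSIM to signal/scene magnitudes; the paper's exact definitions may differ.

```python
# Illustrative definitions of p-RMSE and m-SSIM, inferred from their names
# in the abstract; these are assumptions, not the paper's formulations.
import numpy as np
from skimage.metrics import structural_similarity

def p_rmse(pred, target):
    """phase-RMSE: RMSE of the phase error between complex radar signals,
    wrapped to (-pi, pi] so that 2*pi-periodic phases compare correctly."""
    dphi = np.angle(pred * np.conj(target))  # wrapped phase difference
    return np.sqrt(np.mean(dphi ** 2))

def m_ssim(pred_scene, gt_scene):
    """magnitude-SSIM: standard SSIM computed on the magnitudes of the
    complex-valued reconstructed and ground-truth scenes."""
    pred_mag, gt_mag = np.abs(pred_scene), np.abs(gt_scene)
    return structural_similarity(
        pred_mag, gt_mag, data_range=gt_mag.max() - gt_mag.min()
    )
```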
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12994