# ATOM of Understanding: Information-Theoretic Decomposition for Interpretable 3D Visual Reasoning

<p align="center"><img width="100%" src="./docs/ATOM.png"></p>

This is the official repository of **ATOM of Understanding: Information-Theoretic Decomposition for Interpretable 3D Visual Reasoning**.

## 🏠 Abstract
3D Question Answering requires integrating heterogeneous data representations including point clouds, multi-view images, 
and natural language. Existing approaches function as black boxes, providing limited interpretability into how different 
modalities contribute to predictions. We present ATOM (**A**daptive **T**ask-aware m**O**dular **M**odel), 
an information-theoretic framework that operationalizes Partial Information Decomposition (PID) for interpretable 3D 
question answering. ATOM explicitly models the multimodal interactions as four information atoms: redundancy, 
modality-specific uniqueness, and synergy. Our framework employs a Query-driven View Aggregator (QVA) to extract geometrically 
consistent and question-relevant visual features, a Contextual Grounding Module (CGM) for description-guided visual grounding, 
a Question-aware PID (Q-PID) module with theoretically grounded regularization losses, and a Dynamic Atom Modulation (DAM) 
mechanism for adaptive atom reweighting. Extensive experiments demonstrate that ATOM achieves state-of-the-art performance 
(23.52 EM@1 on ScanQA, 49.71 EM@1 on SQA3D) while providing transparent reasoning through explicit modeling of information 
atom dynamics.
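
For readers new to PID, the four atoms can be read against the standard two-source identity. The symbols below ($X_1$ and $X_2$ for two sources, $Y$ for the target, and $R$, $U_1$, $U_2$, $S$ for the atoms) are generic notation assumed for illustration; how ATOM assigns its modalities to the sources is not specified in this README.

$$
I(X_1, X_2; Y) = R + U_1 + U_2 + S, \qquad I(X_1; Y) = R + U_1, \qquad I(X_2; Y) = R + U_2
$$

Here $R$ is the redundancy shared by both sources, $U_1$ and $U_2$ are the parts unique to each source, and $S$ is the synergy available only when both sources are observed together.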

## 📚 Installation

Please refer to the [installation guide](docs/installation.md).

## 📋 Dataset

Please refer to [data preparation](docs/dataset.md) and [ScanNet preparation](data/scannet/README.md) for preparing the ScanNet v2, ScanQA and SQA3D datasets.

## 🤖 Usage

### Framework implementation
The main framework can be found in [`mv_vlm_atom_qa.py`](embodiedqa/models/framework/mv_vlm_atom_qa.py).

The Q-PID module can be found in [`decomposition.py`](embodiedqa/models/framework/decomposition.py).

The CGM module can be found in [`context_grounding.py`](embodiedqa/models/framework/context_grounding.py).

We provide detailed comments in the code to help readers understand both ATOM and DSPNet.

### Training
- (Optional) Pre-train PointNet++ on the ScanNet object detection task, or download the checkpoint directly. 
We found that skipping PointNet++ pre-training has little impact on the ScanQA and SQA tasks.
  ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/train.py \
    configs/scannet-det/scannet-votenet-12xb12.py --work-dir=work_dirs/scannet-det/scannet-votenet --launcher pytorch
  ```

- Training ATOM on ScanQA task:
  ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/train.py \
    configs/scanqa/atom.py --work-dir=work_dirs/mv-scanqa/atom --launcher pytorch
  ```

- Training ATOM on SQA task:
  ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/train.py \
    configs/sqa/atom.py --work-dir=work_dirs/mv-sqa/atom --launcher pytorch
  ```
- For ease of analysis, evaluation runs at the end of every `interval` epochs during training.

- (Optional) Training DSPNet on ScanQA task:
  ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/train.py \
    configs/scanqa/dspnet.py --work-dir=work_dirs/mv-scanqa/dspnet --launcher pytorch
  ```

- (Optional) Training DSPNet on SQA task:
  ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/train.py \
    configs/sqa/dspnet.py --work-dir=work_dirs/mv-sqa/dspnet --launcher pytorch
  ```
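
The inference commands in this README expect a checkpoint named `best_EM@1_epoch_<epoch_id>.pth` inside the training work dir. The helper below is a minimal sketch for filling in `<epoch_id>` without looking it up by hand. The function name `find_best_checkpoint` is hypothetical and not part of this repository, and the rule of picking the highest epoch when several files match is an assumption.

```python
import re
from pathlib import Path
from typing import Optional

# Filename pattern taken from the inference commands in this README.
_BEST_CKPT = re.compile(r"best_EM@1_epoch_(\d+)\.pth$")


def find_best_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the best_EM@1 checkpoint with the highest epoch id, or None."""
    candidates = []
    for path in Path(work_dir).glob("best_EM@1_epoch_*.pth"):
        match = _BEST_CKPT.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Assumption: if several files match, the latest epoch is the current best.
    return max(candidates, key=lambda item: item[0])[1]
```

For example, `find_best_checkpoint("work_dirs/mv-scanqa/atom")` returns the path to pass to `tools/test.py`, or `None` if training has not yet saved a best checkpoint.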

### Inference
- Evaluation of trained models with the ScanQA test dataset:

  ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/test.py \
    configs/scanqa/atom.py work_dirs/mv-scanqa/atom/best_EM@1_epoch_<epoch_id>.pth \
    --work-dir=work_dirs/scanqa_atom_test_w_object --launcher pytorch
  ```

- The [ScanQA benchmark](https://eval.ai/web/challenges/challenge-page/1715/overview) is hosted on [EvalAI](https://eval.ai/). 
Submit `work_dirs/scanqa_test_{split}/test_result.json` there to evaluate on the test splits with and 
without objects. To select the test split, modify the `ann_file` and `qa_file` fields of `test_dataloader` 
in [`atom.py`](configs/scanqa/atom.py).


- Evaluation of trained models with the SQA3D test dataset:

  ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/test.py \
    configs/sqa/atom.py work_dirs/mv-sqa/atom/best_EM@1_epoch_<epoch_id>.pth \
    --work-dir=work_dirs/sqa_atom_test --launcher pytorch
  ```

- Due to the inherent randomness of multi-GPU distributed evaluation, re-evaluation results may differ slightly from 
those obtained during training. The results in our report are those obtained during training.
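
Checkpoints and reported results are selected by EM@1, so small differences between runs can be checked by hand. The snippet below is a minimal sketch of exact match at rank 1, assuming answers are compared after lowercasing and collapsing whitespace; the repository's own metric may normalize differently. The names `normalize` and `exact_match_at_1` are hypothetical.

```python
from typing import Iterable, Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(answer.lower().split())


def exact_match_at_1(top1_predictions: Sequence[str],
                     references: Sequence[Iterable[str]]) -> float:
    """Percentage of questions whose top-1 answer equals any reference answer."""
    if len(top1_predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not top1_predictions:
        return 0.0
    hits = sum(
        normalize(pred) in {normalize(ref) for ref in refs}
        for pred, refs in zip(top1_predictions, references)
    )
    return 100.0 * hits / len(top1_predictions)
```

Each entry of `references` is a collection because a question may have several accepted answers; with a single accepted answer per question, pass one-element lists.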

## 📦 Checkpoints

| Checkpoint           | Link                                                         | Note                                              |
| :------------------- | ------------------------------------------------------------ | ------------------------------------------------- |
| VoteNet  | [link](https://drive.google.com/file/d/1OTj-q4aPmsAg0rSq8T3uqCEUHXuCRQtb/view?usp=drive_link) | VoteNet Pre-trained on ScanNet.                  |

Please download the checkpoint and save it under ``work_dirs/scannet-det/scannet-votenet-12xb12``.

## 👏 Acknowledgements
We would like to thank [EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan) for the codebase of the 3D training pipeline and [mmdetection3d](https://github.com/open-mmlab/mmdetection3d) for the codebase of PointNet++ and VoteNet.