Title: GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation

Abstract: Accurately estimating the physical properties of deformable objects (system identification) from visual observations is a critical yet challenging task for various applications. We introduce GIC, a novel geometry-informed continuum framework that synergistically leverages 3D Gaussian representations to capture explicit object shapes and empower simulated continuums to render 2D shape surrogates (object masks) for robust physical property estimation. Our approach first employs a motion-factorized dynamic 3D Gaussian framework to precisely reconstruct objects as 3D Gaussian point sets across various time states. Subsequently, a coarse-to-fine filling strategy generates dense object continuum fields from these reconstructions, enabling the extraction of both object surfaces and Gaussian-informed continuum particles. These particles are then utilized to render object masks during simulations, providing crucial 2D-shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that GIC achieves state-of-the-art performance across multiple benchmarks and metrics, and we further validate its practical utility through real-world demonstrations. Our project page is at https://jukgei.github.io/project/gic.

Section: Introduction
Identifying the physical properties of objects (i.e., system identification) is essential for numerous applications such as games, digital twins, and robotic manipulation [1][2][3]. Despite its importance, estimating these properties solely from visual observations remains a long-standing challenge for computational perceptual algorithms, even though humans can intuitively deduce them from object deformation at a single glance.
Many established methods [4][5][6] address this by assuming elastic materials [7] and employing physics-based modeling with mass-spring systems (MSS) or finite element method (FEM). However, this assumption inherently limits their applicability to more general material types, such as fluids or granular media. Furthermore, a significant drawback of prior work [8][9][10] is the requirement for ground-truth object geometry, which severely restricts their practical deployment. While some methods [5,4] attempt to decouple geometry and property recovery using stereo observations or dynamic neural reconstruction [11], the resulting noisy geometric reconstructions often lead to degraded system identification performance.
Recently, PAC-NeRF [12] integrated neural radiance fields (NeRF) [13] with a continuum dynamic model to address some of these issues by capturing object geometries and physical properties within a unified framework. Despite its effectiveness, this method suffers from two primary limitations. First, the implicit shapes represented by NeRF often yield inferior geometries, which can result in inaccurate trajectories during simulation. Second, PAC-NeRF renders novel views of deformed objects using an appearance radiance field reconstructed from a static scene, which can introduce significant texture distortion and discrepancies between rendered and observed images, especially under large deformations [14].
To overcome these limitations, this paper proposes GIC, a novel hybrid solution based on 3D Gaussians [15,16] and the Material Point Method (MPM) [17,18]. The core strength of GIC lies in its ability to leverage both explicit 3D shapes from dynamic 3D Gaussian reconstruction and 2D shape surrogates (object masks) rendered by a Gaussian-informed continuum for robust physical property estimation.
Specifically, to generate more precise shapes for physical property reasoning, we first introduce a motion-factorized dynamic 3D Gaussian network for high-fidelity dynamic scene reconstruction. We then extract the continuum from the reconstructed 3D Gaussians at each frame by employing a coarse-to-fine filling strategy to progressively generate the object's density fields. These density fields are used to sample continuum particles for simulation and to extract object surfaces, which serve as 3D-shape supervision during physical property estimation. To mitigate the appearance distortion observed in PAC-NeRF under large deformations, we further assign Gaussian attributes to the continuum particles, where opacity and scale are derived from the density field. This Gaussian-informed continuum enables the rendering of object masks during simulation, providing robust 2D-shape guidance for estimation and effectively circumventing the use of potentially inaccurate RGB renderings.
To demonstrate the superiority of GIC over existing baselines, we conduct comprehensive experiments, including evaluations of physical properties, dynamic reconstruction, and future state simulation. We also showcase a real-world application in digital twins and robotic manipulation, highlighting the practical utility of our proposed method.
Our contributions are summarized as follows:
• We propose GIC, a novel hybrid pipeline that leverages 3D Gaussian representations to simultaneously acquire explicit 3D shapes and empower simulated continuums to render 2D shapes (object masks) for enhanced physical property estimation.
• We introduce a novel motion-factorized dynamic 3D Gaussian framework for more precise dynamic scene reconstruction. Additionally, we develop a coarse-to-fine filling strategy to generate object density fields, enabling the extraction of object surfaces and the creation of Gaussian-informed continuum particles.
• Extensive experiments demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. We also present compelling real-world demonstrations showcasing the practical efficiency and effectiveness of the proposed method.

Section: Related Work
Dynamic reconstruction. Reconstructing dynamic scenes from monocular or multi-view video(s) is a long-standing problem in the computer vision community [19,20]. Previous works exploit neural implicit representations [21,22] for non-rigid reconstruction. These methods either reconstruct the scene in a frame-wise manner [23,24] or maintain a canonical shape and model the deformation with a neural network [25,26,11,27]. While effective for novel view synthesis, these methods often require extensive training time and can produce noisy deformations owing to the implicit representation, which may compromise the utility of the recovered geometries for physical property estimation [12]. More recently, 3D Gaussian Splatting (3DGS) [15] has become a prevalent method for 3D reconstruction and novel view synthesis because of its explicit shape modeling and extremely fast view rendering. Similar to non-rigid NeRF, many follow-up works extend 3DGS to 4D by treating each frame separately [28] or by decomposing a scene into a canonical 3D Gaussian point cloud and a deformation model that warps the canonical shape into a specific scene [16,29,30]. In this paper, we draw upon these prior studies [16,29] and propose a novel motion-factorized dynamic 3D Gaussian network to achieve better performance on reconstruction and novel view synthesis.
System identification. Understanding the physical laws of the 3D world is beneficial for simulation [31][32][33][34][35][6] and manipulation [2,3][36][37][38]. However, recovering physical properties from visual information is extremely difficult due to the ambiguity introduced by incomplete observation and the high degrees of freedom of the scene. Early works [39,40] study the problem by learning physical properties via interactions. With recent improvements in differentiable physics simulation [17,18][41][42][43][44][45], many methods estimate the physical properties by comparing the rendering results with 2D ground truth, given prior knowledge of the object geometry. VEO [5] presents a differentiable simulator to learn patterns from 4D reconstruction and force-displacement measurements. Another approach [4] eliminates the dependence on captured forces by proposing a framework that iterates between deformation tracking and parameter optimization. While these methods demonstrate promising results, inferior reconstruction might lead to degraded performance, and the assumption of elastic material restricts their applicability. PAC-NeRF [12] instead proposes a single framework to recover both the unknown geometry and physical properties of deformable objects from multi-view video sequences. However, the inferior geometries and blurry rendered images might have detrimental effects on physical property reasoning. In this work, we adopt MPM as our simulation framework, following PAC-NeRF, due to its ability to simulate a variety of materials [6][46][47][48]. Unlike previous approaches, we utilize dynamic 3D Gaussians to reconstruct explicit 3D geometries and generate simulatable continuum particles. Furthermore, we enhance the particles with Gaussian attributes, facilitating the rendering of 2D shapes and thereby improving physical parameter estimation.

Section: Preliminary
In this section, we briefly review the core idea of 3D Gaussian Splatting (3DGS) [15] and introduce its point-based alpha blending to render depth maps and foreground masks. Typically, 3DGS utilizes 3D Gaussians, each defined by a central point $\mu_0$, a covariance matrix $\Sigma_0$, a density value $\sigma$, and a color attribute $c$, to efficiently render images from specific viewpoints. Each point is denoted as
$$G(x) = \exp\left(-\tfrac{1}{2}(x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0)\right), \tag{1}$$
where $\Sigma_0$ can be factorized as $\Sigma_0 = R_0 S_0 S_0^T R_0^T$, in which $R_0$ is a rotation matrix represented by a quaternion vector $r_0 \in \mathbb{R}^4$, and $S_0$ is a diagonal scaling matrix characterized by a 3D vector $s_0 \in \mathbb{R}^3$. If we consider an isotropic Gaussian representation, the scaling matrix can be written as $s_0 I$, where $s_0$ is a scalar and $I$ is the identity matrix. When performing splatting, the 3D Gaussians are projected into 2D with the covariance matrix defined as $\Sigma'_0 = J W \Sigma_0 W^T J^T$, where $J$ is the Jacobian of the affine approximation of the projective transformation [49], and $W$ is the viewing transformation matrix. The rendered color $I(u)$ with its foreground mask $A(u)$ at pixel $u$ is then evaluated by integrating $N$ ordered splatted Gaussians via point-based alpha blending. Since the depth of each Gaussian point at a specific view can be obtained from its transformation matrix, we can further render the depth map $D$ using the same blending method [16,50], as
$$I(u) = \sum_{i \in N} T_i \alpha_i c_i, \quad A(u) = \sum_{i \in N} T_i \alpha_i, \quad D(u) = \sum_{i \in N} T_i \alpha_i d_i, \tag{2}$$
where $T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)$ is the accumulated transmittance, $\alpha_i$ is the probability of termination at point $i$, and $d_i$ is the depth of the Gaussian point at the specific view.
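As an illustration of Eq. 2, the following minimal sketch blends front-to-back ordered samples for a single pixel. It is not the 3DGS rasterizer: the function name `alpha_blend` and the plain-list inputs are assumptions made for this example, and the per-sample alphas are taken as given rather than computed from projected Gaussians.

```python
def alpha_blend(alphas, colors, depths):
    """Blend ordered samples for one pixel following Eq. 2.

    alphas: termination probabilities alpha_i in [0, 1], nearest sample first.
    colors: per-sample RGB tuples c_i.
    depths: per-sample depths d_i.
    Returns (color, mask, depth), i.e. I(u), A(u), D(u).
    """
    transmittance = 1.0            # T_1 is an empty product, hence 1
    color = [0.0, 0.0, 0.0]
    mask = 0.0
    depth = 0.0
    for a, c, d in zip(alphas, colors, depths):
        weight = transmittance * a             # T_i * alpha_i
        color = [acc + weight * ch for acc, ch in zip(color, c)]
        mask += weight
        depth += weight * d
        transmittance *= 1.0 - a               # T_{i+1} = T_i * (1 - alpha_i)
    return color, mask, depth


color, mask, depth = alpha_blend(
    alphas=[0.5, 0.5],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    depths=[1.0, 2.0],
)
```

The three outputs share the same weights, which is why the mask and depth map come at almost no extra cost once the color is rendered.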

Section: Method


Section: Problem Definition and Overview
In this work, we aim to reconstruct the geometries and the physical properties of various object types from multi-view videos. Formally, given a set of video sequences $\{V_i \mid i = 1, \dots, n\}$ of a moving object and the corresponding camera extrinsic and intrinsic parameters $\{(T_i, K_i) \mid i = 1, \dots, n\}$, the goal of this task is to recover the explicit geometries of the object represented by continuum particles $P(t)$ and its corresponding physical parameters $\Theta$ (e.g., Young's modulus $E$ and Poisson's ratio $\nu$ for elastic objects). We follow the assumption in PAC-NeRF and PhysGaussian [12,51] that the object types (e.g., elastic, granular, Newtonian/non-Newtonian, plastic) are known and the physical phenomenon follows continuum mechanics [17,52]. The overview of the proposed pipeline is illustrated in Fig. 1, which consists of three modules: a motion-factorized dynamic 3D Gaussian network (Sec. 4.2) for 4D reconstruction of the object, a coarse-to-fine density field generation strategy (Sec. 4.3) for continuum generation, surface extraction, and Gaussian attribute assignment, and a procedure (Sec. 4.4) showing how we leverage the Gaussian-informed continuum and extracted surfaces to estimate physical properties.

Section: Motion-factorized Dynamic 3D Gaussian Network
Our dynamic 3D Gaussian network follows existing frameworks [16,29,30] that simultaneously maintain a canonical 3D Gaussian set and a deformation field modeled by a neural network to warp the canonical shape into object states at specific times. The core idea of this pipeline, presented in Fig. 2, is that the motion of every point in the object can be decomposed into a small range of motion bases.
Architecture. We first factorize the entire motion into $N_m$ bases that are modeled by a fully connected neural network, where every basis shares a common backbone except the final layer. The output of each basis consists of the deformations at position $d\mu_i(t) \in \mathbb{R}^3$ and at scale $ds_i(t) \in \mathbb{R}$. To model the exact deformation for each position, we next propose a lightweight coefficient network that maps the positions in canonical space, together with a specific time, to their corresponding motion coefficients $w(\mu_0, t) \in \mathbb{R}^{N_m}$. Therefore, the deformed position and the scale for each Gaussian point are evaluated by the linear combination of the motion bases according to the motion coefficients:
$$\mu(t) = \mu_0 + \sum_{i=1}^{N_m} w_i(\mu_0, t)\, d\mu_i(t), \quad s(t) = s_0 + \sum_{i=1}^{N_m} w_i(\mu_0, t)\, ds_i(t). \tag{3}$$
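The linear combination in Eq. 3 can be sketched for a single isotropic Gaussian at one time step as follows. In the pipeline the basis offsets and coefficients are produced by neural networks; here they are passed in as plain lists, and the function name `deform` is an assumption made for this example.

```python
def deform(mu0, s0, coeffs, d_mu, d_s):
    """Apply Eq. 3 to one isotropic Gaussian.

    mu0:    canonical position (x, y, z).
    s0:     canonical scalar scale.
    coeffs: motion coefficients w_i(mu0, t), length N_m.
    d_mu:   per-basis position offsets d_mu_i(t), N_m tuples of length 3.
    d_s:    per-basis scale offsets ds_i(t), N_m floats.
    """
    mu = list(mu0)
    s = s0
    for w, dm, ds in zip(coeffs, d_mu, d_s):
        for axis in range(3):
            mu[axis] += w * dm[axis]   # weighted position offset of basis i
        s += w * ds                    # weighted scale offset of basis i
    return tuple(mu), s


mu_t, s_t = deform(
    mu0=(0.0, 0.0, 0.0),
    s0=1.0,
    coeffs=[0.5, 2.0],
    d_mu=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    d_s=[0.2, -0.1],
)
```

Because the bases depend only on time and the coefficients on position and time, every Gaussian reuses the same $N_m$ basis outputs per frame.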
In this work, we regard all the Gaussians as isotropic kernels, which has been shown to simplify the model and improve scene reconstruction [6,53]. Note that although previous works [29,54] also perform motion decomposition, our pipeline differs in two major ways: 1) instead of modeling each basis with an independent neural network, our module shares a common backbone. Our key observation is that, when reconstructing a dynamic object, all points on the object should follow a similar motion tendency, so the final heads of the network are sufficient to model the details of different parts of the object; 2) to better fit the high-rank motion of a dynamic scene [16], we model the motion coefficients as time-variant variables rather than constant Gaussian attributes [29].
Optimization. We employ the same setting as [16] to train our pipeline. Concretely, the canonical 3D Gaussians are initialized with points randomly sampled from the given bounding box of the scene. We start training the deformation network after 3,000 warm-up iterations for the 3D Gaussians. Similar to previous works [16,29], we optimize the pipeline by computing the L1 norm and the Structural Similarity Index Measure (SSIM) between the rendered image I and the ground truth image Ĩ. Moreover, since large scales may lead to inaccurate reconstructed shapes [55], we apply an L1 penalty to the scale attributes of all the points to recover more fine-grained shapes of the object. The overall loss function is therefore defined as:
$$\mathcal{L}_{gs} = \mathcal{L}_1(I, \tilde{I}) + \lambda_1 \mathcal{L}_{ssim}(I, \tilde{I}) + \lambda_2 \mathcal{L}_1(s(t)), \tag{4}$$
where $\lambda_1$ and $\lambda_2$ are balancing hyperparameters. A more in-depth analysis of the proposed pipeline, including implementation details and the effects of scale regularization, is presented in Appendix A.1.
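To make the role of the scale term in Eq. 4 concrete, the sketch below assembles the three terms from already-available quantities. It rests on several assumptions: the SSIM loss is passed in precomputed rather than implemented here, both L1 terms are taken as means, and the names `gs_loss` and `l1_mean` are hypothetical. The default weights of 1 follow the values reported in Appendix A.1.1.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equally long flat sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def gs_loss(rendered, target, ssim_loss, scales, lam1=1.0, lam2=1.0):
    """Combine the terms of Eq. 4.

    rendered, target: flattened pixel intensities of I and the ground truth.
    ssim_loss:        precomputed SSIM loss term (assumed supplied by the caller).
    scales:           per-Gaussian scalar scales s(t).
    """
    image_term = l1_mean(rendered, target)
    # Scale regularization: penalizing |s| pushes the kernels to stay small.
    scale_term = sum(abs(s) for s in scales) / len(scales)
    return image_term + lam1 * ssim_loss + lam2 * scale_term


loss = gs_loss(
    rendered=[0.5, 0.5],
    target=[0.0, 1.0],
    ssim_loss=0.2,
    scales=[0.1, 0.3],
)
```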

Section: Gaussian-informed Continuum Generation
Coarse-to-fine density field generation. Since the reconstructed Gaussian particles serve only for rendering, they are not evenly distributed over the objects and cannot be directly used for simulation [51]. Therefore, we propose a novel coarse-to-fine filling strategy to iteratively generate density fields of the object based on the reconstructed Gaussian particles from Eqn. 3 and the internal particles filtered by the rendered depth maps. The proposed strategy is presented in Alg. 1.
The implementation details and visual results are illustrated in Appendix A.2.
Concretely, the internal particles, initialized by uniform sampling from the bounding box of Gaussian particles, are filtered by projecting the particles to various images to compare the projected depth with rendered depth values (lines 1-6 in Alg. 1). The resulting particles can roughly represent the shape of the object. However, as denoted in Eqn. 2, the rendered depth maps are evaluated in an accumulated manner, making them less precise in representing the object surface.
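The depth-based filtering of internal particles can be sketched as below. This is an illustrative reading of the description above, under stated assumptions: a pinhole camera with a 3x4 world-to-camera matrix and intrinsics given as `(fx, fy, cx, cy)`, depth maps stored as nested lists indexed `[v][u]`, background pixels holding `float("inf")`, and particles that project outside the image being discarded. The helper names `project` and `filter_internal` are hypothetical.

```python
def project(point, extrinsic, intrinsic):
    """Project a world-space point to integer pixel indices and camera depth."""
    x, y, z = point
    cam = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in extrinsic]
    fx, fy, cx, cy = intrinsic
    u = fx * cam[0] / cam[2] + cx
    v = fy * cam[1] / cam[2] + cy
    return int(round(u)), int(round(v)), cam[2]


def filter_internal(particles, views):
    """Keep particles that lie at or behind the rendered surface in every view.

    views: list of (depth_map, extrinsic, intrinsic) tuples.
    """
    kept = list(particles)
    for depth_map, extrinsic, intrinsic in views:
        survivors = []
        for p in kept:
            u, v, d = project(p, extrinsic, intrinsic)
            in_image = 0 <= v < len(depth_map) and 0 <= u < len(depth_map[0])
            # A particle in front of the rendered depth is outside the object.
            if in_image and d > 0 and depth_map[v][u] <= d:
                survivors.append(p)
        kept = survivors
    return kept


inf = float("inf")
depth = [[inf, inf, inf], [inf, 2.0, inf], [inf, inf, inf]]
identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
inside = filter_internal(
    [(0.0, 0.0, 3.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(depth, identity, (1.0, 1.0, 1.0, 1.0))],
)
```

Only the first particle survives: the second sits in front of the rendered surface and the third projects onto a background pixel.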
Therefore, we employ a coarse-to-fine filling strategy by iteratively upsampling the density field and reassigning the densities on the indices computed from both the Gaussian and internal particles (lines 8-16 in Alg. 1). Fig. 3 provides a sketch illustration of the proposed strategy. Specifically, due to the large grid size at the initial stage, the object is completely inside the voxels with high densities.
Next, we sequentially perform upsampling (line 10), mean filtering (line 13), and reassigning the field (line 14) at each iteration. The first two operations produce more fine-grained shapes, and the reassigning operation ensures high densities at the surface to avoid over-erosion caused by the first two steps. Finally, the continuum particles with the corresponding object surfaces can be extracted by thresholding the density field (lines 16-17 in Alg. 1).

Fig. 3 (caption, partially recovered): The particles are again used to correct the voxels that contain particles with high densities. (d) and (e) repeat the previous operations to achieve a more detailed shape.

Algorithm 1 (lines 4-17):
 4:     (u_in, v_in), d_in ← Proj(P_in, T_i, K_i);    ▷ obtain image indices and depths of P_in at view i
 5:     P_in ← P_in[D_i(u_in, v_in) ≤ d_in];    ▷ filter out particles that are outside the object
 6: end for
 7: Initialize the zero-value density field F(t) with ∆x and the bounding box of {µ(t)};
 8: for j ← 1, n_u do
 9:     if j ≠ 1 then
10:         F(t) ← TrilinearInterpolation(F(t), 2);    ▷ upsample F(t) with scale factor 2
11:         F(t)[p, q, r] = 1, where p, q, r ← Discretize(P_in ∪ {µ(t)});
12:     end if
13:     F(t) ← MeanFiltering(F(t));
14:     F(t)[p, q, r] = 1, where p, q, r ← Discretize(P_in ∪ {µ(t)});
15: end for
16: P(t) ← GetPosition(th_min ≤ F(t));
17: S(t) ← GetPosition(th_min ≤ F(t) ≤ th_max);
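The loop in lines 7-17 of Alg. 1 can be sketched in pure Python on a small cubic grid. This is a simplified illustration rather than the authors' implementation: nearest-neighbour upsampling stands in for trilinear interpolation, the mean filter is assumed to be a 3x3x3 box with zero padding, the grid is stored as a dictionary, and all function names are hypothetical. The default thresholds follow the values given in the appendix.

```python
import math


def discretize(points, origin, cell):
    """Map 3D points to the integer indices of the grid cells containing them."""
    return {tuple(math.floor((p[a] - origin[a]) / cell) for a in range(3))
            for p in points}


def upsample(field, n):
    """Double the resolution of an n^3 grid (nearest neighbour, not trilinear)."""
    m = 2 * n
    fine = {(i, j, k): field[(i // 2, j // 2, k // 2)]
            for i in range(m) for j in range(m) for k in range(m)}
    return fine, m


def mean_filter(field):
    """3x3x3 box filter; cells outside the grid count as zero."""
    offsets = (-1, 0, 1)
    out = {}
    for (i, j, k) in field:
        total = sum(field.get((i + a, j + b, k + c), 0.0)
                    for a in offsets for b in offsets for c in offsets)
        out[(i, j, k)] = total / 27.0
    return out


def reassign(field, points, origin, cell):
    """Set every cell that contains a particle back to density 1."""
    for idx in discretize(points, origin, cell):
        if idx in field:
            field[idx] = 1.0


def coarse_to_fine(points, origin, n, cell, n_u, th_min=0.5, th_max=0.8):
    """Return (particle centres, surface centres, final cell size)."""
    field = {(i, j, k): 0.0
             for i in range(n) for j in range(n) for k in range(n)}
    for step in range(n_u):
        if step > 0:                       # lines 9-12: refine, then re-mark
            field, n = upsample(field, n)
            cell /= 2.0
            reassign(field, points, origin, cell)
        field = mean_filter(field)         # line 13
        reassign(field, points, origin, cell)   # line 14

    def centre(idx):
        return tuple(origin[a] + (idx[a] + 0.5) * cell for a in range(3))

    particles = [centre(i) for i, v in field.items() if v >= th_min]
    surface = [centre(i) for i, v in field.items() if th_min <= v <= th_max]
    return particles, surface, cell


particles, surface, cell = coarse_to_fine(
    points=[(0.5, 0.5, 0.5)], origin=(0.0, 0.0, 0.0), n=2, cell=1.0, n_u=2)
```

In this toy run, a single point first marks a coarse voxel; after one refinement only the fine voxel that actually contains the point stays above the threshold, which is the erosion-then-correction behaviour the strategy relies on.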
Gaussian-informed continuum. In PAC-NeRF, the particles are equipped with appearance features to enable image rendering for the continuum at different states. We could achieve the same by treating the particles as Gaussian kernels and retraining them on the visual data. However, this process is cumbersome and would face the same issue as PAC-NeRF, where distorted RGB images are rendered when large deformation occurs. Therefore, instead of injecting appearance attributes, we opt to assign density and scale attributes to the particles, where the densities originate from the density field and the scale attributes are obtained directly from the field grid size. The Gaussian-informed continuum is defined as a set of triplets:
$$P_P = \{(p, s_{\Delta x}, \sigma_F)\}, \tag{5}$$
where $p \in P$, $s_{\Delta x} = \Delta x / 2^{n_u}$, and $\sigma_F = F[\mathrm{Discretize}(p)]$ (we neglect $t$ in the notation for simplicity). Therefore, we only render object masks as 2D shape surrogates for supervision.
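As a small worked example of Eq. 5, the sketch below attaches a scale and a density to each continuum particle. The dictionary-based density field, the `cell` argument for the final grid size, and the function name `gaussian_informed_continuum` are assumptions made for illustration. With the appendix settings of ∆x = 0.1 and n_u = 4, every particle receives the scale 0.1 / 2^4 = 0.00625.

```python
import math


def gaussian_informed_continuum(particles, field, origin, cell, dx, n_u):
    """Build the triplets (p, s_dx, sigma_F) of Eq. 5.

    particles: continuum particle positions p.
    field:     density field as a dict from grid index (i, j, k) to density.
    origin:    world-space corner of the grid.
    cell:      cell size of the final density field.
    dx, n_u:   initial grid size and number of filling iterations.
    """
    scale = dx / (2 ** n_u)        # shared by all particles
    triplets = []
    for p in particles:
        idx = tuple(math.floor((p[a] - origin[a]) / cell) for a in range(3))
        triplets.append((p, scale, field[idx]))   # density looked up in F
    return triplets


triplets = gaussian_informed_continuum(
    particles=[(0.01, 0.01, 0.01)],
    field={(0, 0, 0): 0.9},
    origin=(0.0, 0.0, 0.0),
    cell=0.05,
    dx=0.1,
    n_u=4,
)
```

No color or appearance feature is stored, so these particles can only be rendered as masks.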

Section: Geometry-aware Physical Property Estimation
With the Gaussian-informed continuum at the initial state $P_P(0)$ and the extracted surfaces $\tilde{S}(t)$ in place, we can employ MPM to simulate the continuum and evaluate the difference in terms of both the 3D and 2D shapes. Concretely, after a rollout by MPM given the current estimate of the physical parameters, we obtain a trajectory $P(t)$ with corresponding simulated object surfaces $S(t)$, from which we can render object masks over the trajectory. The loss of the current rollout is then computed as:
$$\mathcal{L}_{ppe} = \frac{1}{m} \sum_{i=1}^{m} \left[ \mathcal{L}_{CD}(S(t_i), \tilde{S}(t_i)) + \frac{1}{n} \sum_{j=1}^{n} \mathcal{L}_1(A_j(t_i), \tilde{A}_j(t_i)) \right], \tag{6}$$
where $\mathcal{L}_{CD}$ and $\mathcal{L}_1$ are the chamfer distance and the L1 norm, respectively, $S(t_i)$ denotes the simulated surface at time $t_i$, $\tilde{S}(t_i)$ is the surface extracted from the dynamic reconstruction at the same time, $A_j(t_i)$ is the rendered mask at view $j$, and $\tilde{A}_j(t_i)$ represents the object mask of the image extracted from video $V_j$ at time $t_i$. Because the simulator is differentiable, the gradient of this loss can be used to optimize the target physical parameters $\Theta$.
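The structure of Eq. 6 can be sketched as a plain evaluation routine. This only computes the loss value; it is not differentiable and does not perform the optimization. The chamfer variant used here (symmetric, mean of squared nearest-neighbour distances) and all function names are assumptions, and masks are given as flattened lists.

```python
def chamfer(a, b):
    """Symmetric chamfer distance with squared Euclidean distances."""
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def mask_l1(rendered, observed):
    """Mean absolute difference between two flattened masks."""
    return sum(abs(x - y) for x, y in zip(rendered, observed)) / len(rendered)


def ppe_loss(sim_surfaces, ref_surfaces, sim_masks, obs_masks):
    """Evaluate Eq. 6.

    sim_surfaces[i], ref_surfaces[i]: surface point lists at time t_i
        (simulated vs. extracted from the reconstruction).
    sim_masks[i][j], obs_masks[i][j]: flattened masks at time t_i, view j
        (rendered from the continuum vs. extracted from the video).
    """
    m = len(sim_surfaces)
    total = 0.0
    for i in range(m):
        shape_term = chamfer(sim_surfaces[i], ref_surfaces[i])      # 3D guidance
        n = len(sim_masks[i])
        mask_term = sum(mask_l1(sim_masks[i][j], obs_masks[i][j])
                        for j in range(n)) / n                      # 2D guidance
        total += shape_term + mask_term
    return total / m


loss = ppe_loss(
    sim_surfaces=[[(0.0, 0.0, 0.0)]],
    ref_surfaces=[[(1.0, 0.0, 0.0)]],
    sim_masks=[[[1.0, 0.0]]],
    obs_masks=[[[1.0, 1.0]]],
)
```

Dropping the shape term leaves a mask-only objective, which corresponds to the "Ours*" variant evaluated in the experiments.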

Section: Experiments
Datasets. To thoroughly assess our proposed method, we employ two sources of data introduced by PAC-NeRF [12] and Spring-Gaus [6]. Concretely, PAC-NeRF contributes two synthetic datasets generated by MLS-MPM framework [18]. Each object in both datasets includes RGB images from 11 distinct viewpoints, with approximately 14 frames per viewpoint. The datasets feature a range of materials, including elastic and plastic objects, granular media, and both Newtonian and non-Newtonian fluids. The first dataset contains 45 cross-shape objects with different initial conditions and ground-truth values of physical properties, while the second one consists of 9 objects with different shapes. The interpretation of the physical parameters is listed in Appendix A.9 and A.10. Spring-Gaus generates a synthetic dataset of elastic objects and collects a real-world dataset containing both static and dynamic scenes. The synthetic data contains 30 frames in each of 10 viewpoints. While the real-world data only contains 3 viewpoints for each object in the dynamic scene, it captures 50-70 images from various viewpoints for the static scene. Moreover, we follow previous works [12,6] and use the off-the-shelf matting [56] or segmentation [57] techniques to obtain object masks.
Baselines. For dynamic reconstruction, we compare with PAC-NeRF and the current state-of-the-art deformable 3D Gaussian method DefGS [16] on the PAC-NeRF synthetic dataset. More comparison of our dynamic 3D Gaussian pipeline on other widely-used datasets such as D-NeRF [25] is presented in Appendix A.1.3. For system identification, we employ PAC-NeRF as the baseline and evaluate the performance using the two datasets introduced in PAC-NeRF. To further demonstrate the precision of the proposed method in terms of geometry recovery and future prediction, we perform experiments on the Spring-Gaus synthetic dataset and compare the results with PAC-NeRF and Spring-Gaus.
Metrics. The evaluation metrics in the experiments include 1) Chamfer Distance (CD), with units expressed in $10^3\,\mathrm{mm}^2$; 2) Earth Mover's Distance (EMD); 3) Peak Signal-to-Noise Ratio (PSNR); 4) Structural Similarity Index Measure (SSIM) [58]; and 5) Mean Absolute Error (MAE), with values scaled by a factor of 100. The first two metrics are used to evaluate discrepancies between the reconstructed and ground-truth point clouds. PSNR and SSIM are used on the Spring-Gaus dataset to validate the precision of future state prediction. We compute the mean absolute error to evaluate physical property estimation.
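For reference, two of the simpler metrics can be written down directly. The sketch below uses the standard definitions of MAE and PSNR on flat lists of values; the function names and the assumption that image intensities lie in [0, 1] are ours, and the factor-of-100 scaling applied to MAE in the tables is left to the caller.

```python
import math


def mae(pred, true):
    """Mean absolute error between predicted and ground-truth values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def psnr(image, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB for flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(image, reference)) / len(image)
    if mse == 0.0:
        return float("inf")        # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


error = mae([1.0, 3.0], [2.0, 1.0])
quality = psnr([0.0, 0.0], [0.1, 0.1])
```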

Section: Evaluation on PAC-NeRF Synthetic Dataset
Comparison on dynamic reconstruction. In this experiment, we first perform dynamic Gaussian reconstruction on the cross-shaped object dataset using DefGS and our proposed method, respectively. We then employ the same filling strategy on the reconstructed Gaussians at each time state to generate the continuum, which is regarded as the final recovered geometry of the object and used to make comparisons with the oracle shape to compute CD and EMD. Since PAC-NeRF jointly recovers both geometries and physical parameters, we use the final estimated results to generate the trajectory for evaluation.
The results, reported in Tab. 1, show that our method outperforms the baselines on both metrics and achieves more precise reconstruction on most objects. Specifically, we find that the NeRF representation used by PAC-NeRF usually leads to overly large shapes. While DefGS performs well on elastic objects, its performance degrades when modeling objects with large deformations, such as granular media and fluids. Our method can better handle these objects due to the flexibility of its trajectory representation.
Comparison on system identification. We evaluate the performance of system identification on the two datasets proposed by PAC-NeRF. For the first dataset, we compute the MAE of the parameters for each type of object. To demonstrate the effectiveness of the 2D shape representation, we also conduct experiments on the second dataset using only masks for supervision in our method, denoted "Ours*". For the second dataset, we run our method 10 times with different random seeds for each object instance and report the mean value of the estimation results. The training details are given in Appendix A.3.
The results, reported in Tab. 2 and Tab. 3, show that the proposed hybrid pipeline achieves more accurate estimation over a wide range of entries and objects, which demonstrates the effectiveness of the geometry-aware guidance. Fig. 4 visualizes the RGB images rendered by PAC-NeRF and the masks rendered by our method. We can see that when large deformation occurs, the rendered RGB image becomes distorted, while the rendered mask effectively reduces this effect and yields better performance. By leveraging both 3D and 2D shape guidance, our method obtains the best results on most entries. More qualitative results are available in the supplementary video.

Section: Evaluation on Spring-Gaus Synthetic Dataset
Comparison on future state simulation. To further demonstrate the performance of our proposed method, we follow the setting in Spring-Gaus [6] that uses the first 20 frames as training data and the subsequent 10 frames for evaluation. Concretely, we first perform system identification based on our method and then use the estimated physical parameters and the continuum to simulate a trajectory that includes the states of the 30 frames. Therefore, we can compute CD and EMD between the simulated continuum and the ground-truth point cloud. Since we know the exact position of the continuum at each time state after estimation, we can assign time-invariant Gaussian attributes by training Gaussians on the continuum using the first 20 frames of RGB images, which enable image rendering at novel views and states. Therefore, we can compute PSNR and SSIM at any time state.
The results of future state prediction are presented in Tab. 4, and the results of reconstruction on the training states are reported in Appendix A.4. We observe that our method significantly outperforms the baselines on the CD and EMD metrics over almost all object instances, which shows the superiority of our method for both geometry recovery and system identification. The results of PSNR and SSIM show that leveraging dynamic visual data to train the Gaussian attributes on the continuum improves rendering quality. This further indicates that the generated trajectories are precise enough for each particle to consistently contribute to rendering the same region of the object at different time states.

Section: Real-world Application: Digital Twins in Robotic Grasping Scenario
To demonstrate the efficacy of the proposed method in real-world scenarios, we perform system identification on the real-world dataset collected by Spring-Gaus [6], as shown in Fig. 5. Since the real-world dataset consists of static and dynamic scenes for each object, we follow the procedure introduced by Spring-Gaus to progressively 1) reconstruct a Gaussian set of the object from the static scene, 2) transform the static Gaussian set to the initial state of the dynamic scene based on a registration network similar to iNeRF [6,59], and 3) perform system identification from the dynamic observation with our mask-only variant "Ours*", since there are too few images for dynamic reconstruction. Subsequently, we establish robotic platforms in both simulated and real-world environments, each equipped with UR10 robot arms configured identically. We then execute grasp attempts on both the reconstructed objects with the estimated properties in simulation and the corresponding real-world objects under the same configuration. Results on more objects and further details of the training and experimental settings are presented in Appendix A.5. From the results shown in Fig. 5, we see that our method effectively models the deformation experienced by the objects upon impact with a surface. Furthermore, by applying identical gripper forces to both the simulated and real-world versions of the objects, we observe similar deformation behaviors. This consistency in deformation under identical conditions supports that the estimated physical parameters closely mirror the real-world properties of the objects.

Section: Conclusion and Limitations
This paper proposes a novel solution that leverages the 3D Gaussian representation of objects to acquire explicit shapes while concurrently enabling the simulated continuum to render 2D shapes to facilitate the estimation of physical properties. A novel motion-factorized dynamic 3D Gaussian framework is proposed to reconstruct precise dynamic scenes. Object surfaces and Gaussian-informed continuum are obtained by utilizing the proposed coarse-to-fine density field generation strategy. Extensive experiments demonstrate the efficacy and applicability of our method.
Despite the performance we achieve, this method still suffers from limitations, such as the assumption of continuum mechanics, the requirement of multi-view images with known camera poses, and the need for prior knowledge of object constitutive models. Integrating a pose-free method [60] or a generalized constitutive model [61] with our method will be an interesting direction for future work.
From the perspective of application, while this method can yield accurate estimations, it may pose risks for fragile objects, as the interaction required for property inference could potentially cause damage. Moreover, the computational demands of our framework are substantial: it requires at least 1.5 hours to recover both the geometry and the physical properties of each object. Future work could explore leveraging multi-modal large language models [62] and large reconstruction models [63][64][65][66] to facilitate the recovery process.

Section: A Appendix
A.1 Motion-factorized Dynamic 3D Gaussian Network

Section: A.1.1 Implementation details
We apply temporal and positional encoding to the time $t$ and position $\mu_0$, respectively, to introduce features with various frequencies. Specifically, the encoding module is denoted as $\gamma(x) = \left(\sin(2^k \pi x), \cos(2^k \pi x)\right)_{k=0}^{L-1}$, where $L = 10$ for both $t$ and $\mu_0$. All the modules within the proposed network are composed of fully connected layers. The intermediate layers are uniformly designed, featuring both input and output channels configured to 256, and employ ReLU activation. For training, we adhere to the protocol established in [16], utilizing the Adam optimizer [67] with the same learning rate as specified in [16]. The total number of iterations is set at 40,000, with densification and pruning operations conducted every 500 steps until reaching 15,000 iterations. Additionally, the number of motion bases $N_m$ is set to 8 for all objects in our network. $\lambda_1$ and $\lambda_2$ in Eqn. 4 are both set to 1. All the experiments are conducted on a single A10 GPU.

Section: A.1.2 Effects of scale regularization
When addressing the deformation of objects such as fluids or granular media, the network may struggle to fit transformations accurately due to significant discrepancies between the canonical and target shapes. As a compensatory mechanism, the network may employ Gaussians with enlarged scales to mitigate shape distortions during image rendering. This effect is visualized in the top row of Fig. 6. To rectify this issue, we implement scale regularization during network training, which forces the Gaussian kernels to maintain smaller scales. The efficacy of this operation is demonstrated in the second row of Fig. 6, where scale regularization enables the reconstruction of more precise shapes for rendering.


Section: A.1.3 Evaluation on D-NeRF Dataset
To further evaluate the performance of our method in terms of novel view synthesis, we conduct an experiment on the D-NeRF [25] dataset, a widely used benchmark consisting of moving items captured by a monocular camera. We compute PSNR on the D-NeRF test set and compare our method with previous dynamic approaches, including Tensor4D [68], K-Planes [69], TiNeuVox [70], and DefGS [16]. The results, reported in Tab. 5, demonstrate that the proposed dynamic 3D Gaussian pipeline can also achieve superior rendering performance.

Section: A.2.1 Implementation details
In Alg. 1, the number of iterations, denoted as $n_u$, is uniformly set to 4 for all objects. We set the initial grid size $\Delta x$ according to the volume of the object. For most objects, $\Delta x = 0.1$, while for small items such as the toothpaste in the PAC-NeRF dataset, $\Delta x = 0.01$. The parameters $th_{min}$ and $th_{max}$ are set to 0.5 and 0.8, respectively. The resulting particle count ranges from approximately 50,000 to 100,000.

Section: A.2.2 Visualization of coarse-to-fine filling
Fig. 7 visualizes the filling results of our proposed coarse-to-fine strategy with different numbers of iterations, along with the results from PAC-NeRF and the ground-truth shapes. The qualitative results show that our method generates more accurate shapes than PAC-NeRF, which tends to recover overly large shapes. Note that we were unable to reproduce the cat-shaped object as shown in [12], even though we used the official PAC-NeRF implementation without any modification.

Section: A.3 Training details on PAC-NeRF Dataset
The training process is divided into two sub-processes: we first estimate the initial velocity of the object using the first three frames of data, and then perform system identification. Both sub-processes use the Adam optimizer [67] to tune the parameters.

Section: A.4 More Experiments on Spring-Gaus Synthetic Dataset
In addition to evaluating the simulated future states in Sec. 5.2, we also evaluate CD and EMD on the states present in the training data; the results are reported in Tab. 6. Our method outperforms the baselines by a large margin, which further demonstrates its performance in terms of reconstruction and identification.

Each material contains 10 instances (5 for granular material) with various object orientations, initial velocities, and physical parameters.

To evaluate the necessity of 2D mask supervision, we run our method for system identification on the 45 cross-shaped object instances in the PAC-NeRF dataset using only object surface supervision. The results are reported in Tab. 9. Combining 2D and 3D shapes as supervision yields more accurate estimates than using 3D shapes only. We therefore believe that 2D mask supervision partly compensates for the errors introduced by the 3D object surfaces extracted from the dynamic 3D Gaussians.

Section: A.9 Physical Properties
In this work, we simulate five types of materials: elastic materials, plasticine, granular media, Newtonian fluids, and non-Newtonian fluids. Each material exhibits distinct physical properties, which we briefly introduce below.
Elasticity: The Young's modulus ($E$) measures the stiffness of a solid material, quantifying the relationship between stress and strain under elastic deformation. The Poisson's ratio ($\nu$) describes the tendency of a material to expand or contract along its width when it is stretched or compressed along its length.
Plasticine: The yield stress ($\tau_Y$) is the minimum stress required for a material to transition from elastic to plastic deformation, marking the onset of permanent deformation. Young's modulus ($E$) and Poisson's ratio ($\nu$) play the same roles as for elastic materials.
Granular Media: The friction angle ($\theta_{fric}$) measures the inherent resistance of a granular material to sliding or shearing, and is directly related to the angle at which the material can be piled without slumping.
Newtonian fluids: The bulk modulus ($\kappa$) measures a material's resistance to uniform compression, quantifying how much it compresses under a given external pressure. The fluid viscosity ($\mu$) describes a fluid's resistance to flow, quantifying how much it resists deformation at a given rate.
Non-Newtonian fluids: The plastic viscosity ($\eta$) measures a viscoplastic material's resistance to deformation and determines how it behaves under stress beyond its yield point. The bulk modulus ($\kappa$) and fluid viscosity ($\mu$) are comparable to those of Newtonian fluids, while the yield stress ($\tau_Y$) is akin to that of plasticine.

Section: A.10 Constitutive Models
A constitutive model describes how a material responds to stress, strain, or other external forces. It defines the material's behavior by relating stress and strain through constitutive equations, which can capture complex behaviors such as elasticity, plasticity, and fracture. The MPM simulator is capable of modeling a diverse range of materials by employing various constitutive models. In this work, we have implemented simulations for five distinct types of materials: elasticity, plasticine, granular, Newtonian fluids, and non-Newtonian fluids.
Elasticity. We use the Neo-Hookean model, a common nonlinear hyperelastic model, to simulate elastic materials and predict deformations. The Cauchy stress for this model is defined by
$$J\sigma = \mu\,(\mathbf{F}\mathbf{F}^{\top}) + \left[\lambda \log(J) - \mu\right]\mathbf{I}, \tag{7}$$
where $\mathbf{F}$ is the deformation gradient, $J = \det(\mathbf{F})$, and $\mu$, $\lambda$ are the Lamé parameters, which are related to Young's modulus ($E$) and Poisson's ratio ($\nu$) by
$$\mu = \frac{E}{2(1+\nu)}, \qquad \lambda = \frac{E\nu}{(1+\nu)(1-2\nu)}. \tag{8}$$
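As a small numerical illustration of Eqs. 7 and 8, the sketch below converts $(E, \nu)$ to the Lamé parameters and evaluates $J\sigma$ for a given deformation gradient. It is a NumPy example written for this appendix, not the simulator's implementation; the function names and the sample values of $E$ and $\nu$ are assumptions.

```python
import numpy as np


def lame_parameters(E, nu):
    """Eq. 8: Lame parameters (mu, lambda) from Young's modulus and Poisson's ratio."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam


def neo_hookean_kirchhoff_stress(F, E, nu):
    """Eq. 7: returns J * sigma for the Neo-Hookean model."""
    mu, lam = lame_parameters(E, nu)
    J = np.linalg.det(F)
    I = np.eye(F.shape[0])
    return mu * (F @ F.T) + (lam * np.log(J) - mu) * I


# Hypothetical material values, chosen only for illustration.
E, nu = 1.0e5, 0.3
F = np.diag([1.1, 1.0, 0.95])            # a mild stretch/compression
tau = neo_hookean_kirchhoff_stress(F, E, nu)
sigma = tau / np.linalg.det(F)           # Cauchy stress recovered from J * sigma
print(sigma)
```

At the rest state $\mathbf{F} = \mathbf{I}$ the two terms of Eq. 7 cancel, so the stress vanishes, which is a convenient sanity check for any implementation.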
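Plasticine. We use the Saint Venant-Kirchhoff (StVK) model together with the von Mises yield criterion to simulate plasticine. For this model, the stress is defined as
$$J\sigma = \mathbf{F}\left[2\mu\mathbf{G} + \lambda \operatorname{Tr}(\mathbf{G})\mathbf{I}\right]\mathbf{F}^{\top}, \tag{9}$$
where $\mathbf{G} = \frac{1}{2}(\mathbf{F}^{\top}\mathbf{F} - \mathbf{I})$ is the Green strain. The von Mises yield criterion determines whether the deformation exceeds the recoverable limit. When it does, the deformation gradient is mapped back onto the boundary of the elastic region using the following projection:
$$Z(\mathbf{F}) = \begin{cases} \mathbf{F} & \delta\gamma \le 0 \\ \mathbf{U} \exp\!\left(\epsilon - \delta\gamma \dfrac{\hat{\epsilon}}{\|\hat{\epsilon}\|}\right)\mathbf{V}^{\top} & \text{otherwise} \end{cases}, \tag{10}$$
where $\delta\gamma = \|\hat{\epsilon}\| - \frac{\tau_Y}{2\mu}$, $\epsilon = \log(\Sigma)$ is the Hencky strain, and $\hat{\epsilon}$ is the normalized Hencky strain. $\mathbf{U}$, $\Sigma$ and $\mathbf{V}$ are obtained by performing Singular Value Decomposition (SVD) on the deformation gradient $\mathbf{F}$.

Granular Media. As for plasticine, the StVK constitutive model is used to simulate granular media. The Drucker-Prager yield criterion [48] is selected as the yielding condition. It is defined as follows:
$$\operatorname{Tr}(\epsilon) > 0, \quad \text{or} \quad \delta\gamma = \|\hat{\epsilon}\|_F + \alpha\,\frac{(d\lambda + 2\mu)\operatorname{Tr}(\epsilon)}{2\mu} > 0, \tag{11}$$
where $d$ is the spatial dimension, $\alpha = \sqrt{\frac{2}{3}}\,\frac{2\sin\theta_{fric}}{3 - \sin\theta_{fric}}$, and $\theta_{fric}$ is the friction angle. The deformation gradient return mapping is defined by
$$Z(\mathbf{F}) = \begin{cases} \mathbf{U}\mathbf{V}^{\top} & \operatorname{Tr}(\epsilon) > 0 \\ \mathbf{F} & \delta\gamma \le 0,\ \operatorname{Tr}(\epsilon) \le 0 \\ \mathbf{U} \exp\!\left(\epsilon - \delta\gamma \dfrac{\hat{\epsilon}}{\|\hat{\epsilon}\|}\right)\mathbf{V}^{\top} & \text{otherwise} \end{cases}. \tag{12}$$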
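The two return mappings in Eqs. 10 and 12 can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions, not the authors' simulator code: the function names are hypothetical, the "normalized" Hencky strain is assumed to be the deviatoric part of $\log(\Sigma)$ (a common convention in MPM plasticity), and the friction angle is assumed to be given in radians.

```python
import numpy as np


def _hencky(F):
    """SVD of F, Hencky strain eps = log(Sigma), and its assumed normalized form."""
    U, sig, Vt = np.linalg.svd(F)
    eps = np.log(sig)
    eps_hat = eps - eps.mean()  # ASSUMPTION: normalized = deviatoric part
    return U, Vt, eps, eps_hat


def von_mises_return_mapping(F, mu, yield_stress):
    """Eq. 10: project F back onto the von Mises yield surface if needed."""
    U, Vt, eps, eps_hat = _hencky(F)
    norm = np.linalg.norm(eps_hat)
    delta_gamma = norm - yield_stress / (2.0 * mu)
    if delta_gamma <= 0.0:
        return F  # still inside the elastic region
    eps_new = eps - delta_gamma * eps_hat / norm
    return U @ np.diag(np.exp(eps_new)) @ Vt


def drucker_prager_return_mapping(F, mu, lam, friction_angle):
    """Eqs. 11 and 12: Drucker-Prager yield test and return mapping."""
    d = F.shape[0]
    U, Vt, eps, eps_hat = _hencky(F)
    trace = eps.sum()
    if trace > 0.0:
        return U @ Vt  # expansion: drop all stretch (first case of Eq. 12)
    s = np.sin(friction_angle)  # ASSUMPTION: angle in radians
    alpha = np.sqrt(2.0 / 3.0) * 2.0 * s / (3.0 - s)
    norm = np.linalg.norm(eps_hat)
    delta_gamma = norm + alpha * (d * lam + 2.0 * mu) * trace / (2.0 * mu)
    if delta_gamma <= 0.0:
        return F  # inside the yield cone
    eps_new = eps - delta_gamma * eps_hat / norm
    return U @ np.diag(np.exp(eps_new)) @ Vt


# Example: a strongly sheared state is pulled back to the von Mises yield surface.
F = np.diag([2.0, 1.0, 0.5])
F_proj = von_mises_return_mapping(F, mu=1.0, yield_stress=1.0)
print(np.linalg.svd(F_proj, compute_uv=False))
```

Under the deviatoric assumption, the von Mises projection leaves $\operatorname{Tr}(\epsilon)$ unchanged, so the volume $J$ is preserved and only the shape-changing part of the strain is reduced to the yield limit $\tau_Y / (2\mu)$.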
Newtonian Fluid. We adopt the approach used in PAC-NeRF [12], which employs a J-based fluid model combined with a viscosity term to simulate Newtonian fluids. The stress for this model is defined by
$$J\sigma = \frac{1}{2}\mu\left(\nabla\mathbf{v} + \nabla\mathbf{v}^{\top}\right) + \kappa\left(J - \frac{1}{J^{6}}\right), \tag{13}$$
where $\mu$ and $\kappa$ represent the fluid viscosity and the bulk modulus, respectively.
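A minimal NumPy sketch of Eq. 13 is given below for illustration. The function name is hypothetical, and the sketch assumes that the scalar pressure-like term $\kappa(J - 1/J^{6})$ acts isotropically, i.e. that it multiplies the identity matrix; this interpretation is an assumption of the example rather than something stated in the equation.

```python
import numpy as np


def newtonian_kirchhoff_stress(F, grad_v, viscosity, bulk_modulus):
    """Eq. 13: viscous term plus a J-based volumetric term.

    F      : (d, d) deformation gradient, used only through J = det(F)
    grad_v : (d, d) velocity gradient
    """
    J = np.linalg.det(F)
    I = np.eye(F.shape[0])
    viscous = 0.5 * viscosity * (grad_v + grad_v.T)
    volumetric = bulk_modulus * (J - 1.0 / J**6)
    # ASSUMPTION: the scalar volumetric term is applied as volumetric * I.
    return viscous + volumetric * I


# Simple shear flow with no volume change: only the viscous term contributes.
grad_v = np.zeros((3, 3))
grad_v[0, 1] = 1.0
print(newtonian_kirchhoff_stress(np.eye(3), grad_v, viscosity=2.0, bulk_modulus=1.0e3))
```

The volumetric term is zero at $J = 1$ and grows steeply under compression because of the $1/J^{6}$ factor, which penalizes volume loss much more strongly than volume gain.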
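Non-Newtonian Fluid. We employ the viscoplastic model [47] to simulate non-Newtonian fluids. Although we continue to use the von Mises criterion to delineate the elastic region, the presence of viscoplasticity implies that the deformation is not immediately mapped back onto the yield surface. The return mapping is defined as follows:
$$Z(\mathbf{F}) = \begin{cases} \mathbf{F} & \delta\gamma \le 0 \\ \mathbf{U} \exp\!\left(\dfrac{\hat{s}}{2\mu}\hat{\epsilon} + \dfrac{1}{d}\operatorname{Tr}(\epsilon)\mathbf{I}\right)\mathbf{V}^{\top} & \text{otherwise} \end{cases}, \tag{14}$$
$$\hat{\mu} = \frac{\mu}{d}\operatorname{Tr}(\Sigma^{2}), \qquad s = 2\mu\hat{\epsilon}, \qquad \hat{s} = \|s\| - \frac{\delta\gamma}{1 + \frac{\eta}{2\hat{\mu}\Delta t}}, \tag{15}$$
where $d$ is the spatial dimension. $\mathbf{U}$, $\Sigma$ and $\mathbf{V}$ are obtained by performing Singular Value Decomposition (SVD) on the deformation gradient $\mathbf{F}$.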

Section: Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [No]
Justification: The code is not included for now, but we will release it to the public soon.

Section: Experimental Setting/Details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [Yes]
Justification: These details are described in the experiment section and appendix.

Section: Experiment Statistical Significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [Yes]
Justification: The error bars are reported in the experiments section.

Section: Experiments Compute Resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [Yes]
Justification: This is described in the appendix section.

Justification: There is no societal impact of the work performed.

Section: Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: This paper poses no such risks.

Answer: [NA] Justification: This paper does not involve crowdsourcing nor research with human subjects.

Section: Acknowledgements
This research was supported by the Research Grant Council of the Hong Kong Special Administrative Region under grant number 16212623. We thank Licheng Zhong for providing us with details about real data collection and links for purchasing objects for real-world experiments.

Section: 
(3DGS) technique [15]. Subsequently, we transform the static Gaussian set to the initial configuration of the dynamic scene, guided by the relative pose between the two scenes. The pose is estimated iteratively based on the discrepancies observed between the rendered images and the actual images at the initial state of the dynamic scene. After pose estimation, we implement our methodology, which leverages only implicit shape guidance, to conduct system identification.
Experimental setting. We conducted grasping experiments using the UR10 robotic arm equipped with the Robotiq140 dexterous gripper in both simulated and real-world settings, ensuring consistency in the mass of the objects and their grasping poses across both environments. For the simulations, we employed the FEM-based Isaac Gym simulator [71] for its advanced capabilities in realistically simulating deformable objects [72]. To facilitate the simulation of deformable objects, we apply the Marching Cubes algorithm [73] to the generated density fields to derive the object meshes. Subsequently, we utilize fTetWild [74] for the tetrahedralization of these meshes.
More results. Qualitative results of grasp demonstrations on pig and dog objects are shown in Fig. 8.    

Section: A.6 System Identification Result on PAC-NeRF Dataset


Section: NeurIPS Paper Checklist

Section: Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
Answer: [Yes]
Justification: The main claims made in the abstract and introduction sections already reflect the paper's contributions.

Section: Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: The limitations have been discussed in the conclusion section.

Section: Theory Assumptions and Proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
Answer: [NA]
Justification: This paper does not include theoretical results.

Section: Experimental Result Reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: We have tried to include all the details and referenced work needed for reproduction. We will also release the code of our method.

Section: Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [Yes]
Justification: The used assets are properly cited in the experiments section.

Section: New Assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [NA]
Justification: This paper does not release new assets.

Section: Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?


References:
[b0] Hang Yin; Anastasia Varava; Danica Kragic (2021). Modeling, learning, perception, and control methods for deformable object manipulation. Science Robotics
[b1] Haochen Shi; Huazhe Xu; Samuel Clarke; Yunzhu Li; Jiajun Wu (2023). Robocook: Long-horizon elasto-plastic object manipulation with diverse tools. PMLR
[b2] Haochen Shi; Huazhe Xu; Zhiao Huang; Yunzhu Li; Jiajun Wu (2024). Robocraft: Learning to see, simulate, and shape elasto-plastic objects in 3d with graph networks. The International Journal of Robotics Research (IJRR)
[b3] Bin Wang; Longhua Wu; Kangkang Yin; Uri Ascher; Libin Liu; Hui Huang (2015). Deformation capture and modeling of soft objects. ACM Transactions on Graphics (TOG)
[b4] Hsiao-Yu Chen; Edith Tretschk; Tuur Stuyck; Petr Kadlecek; Ladislav Kavan; Etienne Vouga; Christoph Lassner (2022). Virtual elastic objects. 
[b5] Licheng Zhong; Hong-Xing Yu; Jiajun Wu; Yunzhu Li (2024). Reconstruction and simulation of elastic objects with spring-mass 3d gaussians. 
[b6] Matthias Müller; Markus H Gross (2004). Interactive virtual materials. 
[b7] Miguel Jaques; Michael Burke; Timothy Hospedales (2020). Physics-as-inverse-graphics: Unsupervised physical parameter estimation from video. 
[b8] Pingchuan Ma; Tao Du; Joshua B Tenenbaum; Wojciech Matusik; Chuang Gan (2021). Risp: Rendering-invariant state predictor with differentiable simulation and rendering for crossdomain parameter estimation. 
[b9] Pingchuan Ma; Peter Yichen Chen; Bolei Deng; Joshua B Tenenbaum; Tao Du; Chuang Gan; Wojciech Matusik (2023). Learning neural constitutive laws from motion observations for generalizable pde dynamics. PMLR
[b10] Edgar Tretschk; Ayush Tewari; Vladislav Golyanik; Michael Zollhöfer; Christoph Lassner; Christian Theobalt (2021). Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. 
[b11] Xuan Li; Yi-Ling Qiao; Peter Yichen Chen; Krishna Murthy Jatavallabhula; Ming Lin; Chenfanfu Jiang; Chuang Gan (2022). Pac-nerf: Physics augmented continuum neural radiance fields for geometry-agnostic system identification. 
[b12] Cheng Sun; Min Sun; Hwann-Tzong Chen (2022). Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. 
[b13] Yutao Feng; Yintong Shang; Xuan Li; Tianjia Shao; Chenfanfu Jiang; Yin Yang (2024). Pienerf: Physics-based interactive elastodynamics with nerf. 
[b14] Bernhard Kerbl; Georgios Kopanas; Thomas Leimkühler; George Drettakis (2023). 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG)
[b15] Ziyi Yang; Xinyu Gao; Wen Zhou; Shaohui Jiao; Yuqing Zhang; Xiaogang Jin (2024). Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. 
[b16] Chenfanfu Jiang; Craig Schroeder; Joseph Teran; Alexey Stomakhin; Andrew Selle (2016). The material point method for simulating continuum materials. ACM
[b17] Yuanming Hu; Yu Fang; Ziheng Ge; Ziyin Qu; Yixin Zhu; Andre Pradhana; Chenfanfu Jiang (2018). A moving least squares material point method with displacement discontinuity and two-way rigid body coupling. ACM Transactions on Graphics (TOG)
[b18] Li Zhang; Brian Curless; Steven M Seitz (2003). Spacetime stereo: Shape recovery for dynamic scenes. IEEE
[b19] Richard A Newcombe; Dieter Fox; Steven M Seitz (2015). Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time. 
[b20] Ben Mildenhall; Pratul P Srinivasan; Matthew Tancik; Jonathan T Barron; Ravi Ramamoorthi; Ren Ng (2021). Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM
[b21] Peng Wang; Lingjie Liu; Yuan Liu; Christian Theobalt; Taku Komura; Wenping Wang (2021). Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. Advances in Neural Information Processing Systems (NeurIPS)
[b22] Tianye Li; Mira Slavcheva; Michael Zollhoefer; Simon Green; Christoph Lassner; Changil Kim; Tanner Schmidt; Steven Lovegrove; Michael Goesele; Richard Newcombe (2022). Neural 3d video synthesis from multi-view video. 
[b23] Yiming Wang; Qin Han; Marc Habermann; Kostas Daniilidis; Christian Theobalt; Lingjie Liu (2023). Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. 
[b24] Albert Pumarola; Enric Corona; Gerard Pons-Moll; Francesc Moreno-Noguer (2021). D-nerf: Neural radiance fields for dynamic scenes. 
[b25] Keunhong Park; Utkarsh Sinha; Peter Hedman; Jonathan T Barron; Sofien Bouaziz; Dan B Goldman; Ricardo Martin-Brualla; Steven M Seitz (2021). Hypernerf: a higher-dimensional representation for topologically varying neural radiance fields. ACM Transactions on Graphics (TOG)
[b26] Ang Cao; Justin Johnson (2023). Hexplane: A fast representation for dynamic scenes. 
[b27] Jonathon Luiten; Georgios Kopanas; Bastian Leibe; Deva Ramanan (2024). Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. 
[b28] Agelos Kratimenos; Jiahui Lei; Kostas Daniilidis (2023). Dynmf: Neural motion factorization for real-time dynamic view synthesis with 3d gaussian splatting. 
[b29] Guanjun Wu; Taoran Yi; Jiemin Fang; Lingxi Xie; Xiaopeng Zhang; Wei Wei; Wenyu Liu; Qi Tian; Xinggang Wang (2024). 4d gaussian splatting for real-time dynamic scene rendering. 
[b30] Junbang Liang; Ming Lin; Vladlen Koltun (2019). Differentiable cloth simulation for inverse problems. Advances in Neural Information Processing Systems (NeurIPS)
[b31] Maziar Raissi; Paris Perdikaris; George E Karniadakis (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics
[b32] Priya Sundaresan; Rika Antonova; Jeannette Bohg (2022). Diffcloud: Real-to-sim from point clouds with differentiable simulation and rendering of deformable objects. IEEE
[b33] Yifei Li; Tao Du; Kui Wu; Jie Xu; Wojciech Matusik (2022). Diffcloth: Differentiable cloth simulation with dry frictional contact. ACM Transactions on Graphics (TOG)
[b34] Jinxi Li; Ziyang Song; Bo Yang (2024). Nvfi: Neural velocity fields for 3d physics learning from dynamic videos. Advances in Neural Information Processing Systems (NeurIPS)
[b35] Xiao Liang; Fei Liu; Yutong Zhang; Yuelei Li; Shan Lin; Michael Yip (2024). Real-to-sim deformable object manipulation: Optimizing physics models with residual mappings for robotic surgery. IEEE
[b36] Dongzhe Zheng; Siqiong Yao; Wenqiang Xu; Cewu Lu (2024). Differentiable cloth parameter identification and state estimation in manipulation. IEEE Robotics and Automation Letters
[b37] Yi-Ling Qiao; Alexander Gao; Ming Lin (2022). Neuphysics: Editable neural geometry and physics from monocular videos. Advances in Neural Information Processing Systems (NeurIPS)
[b38] Barbara Frank; Rüdiger Schmedding; Cyrill Stachniss; Matthias Teschner; Wolfram Burgard (2010). Learning the elasticity parameters of deformable objects with a manipulation robot. IEEE
[b39] Zhenjia Xu; Jiajun Wu; Andy Zeng; Joshua B Tenenbaum; Shuran Song (2019). Densephysnet: Learning dense physical object representations via multi-step dynamic interactions. 
[b40] Krishna Murthy; Miles Macklin; Florian Golemo; Vikram Voleti; Linda Petrini; Martin Weiss; Breandan Considine; Jérôme Parent-Lévesque; Kevin Xie; Kenny Erleben (2020). gradsim: Differentiable simulation for system identification and visuomotor control. 
[b41] Moritz Geilinger; David Hahn; Jonas Zehnder; Moritz Bächer; Bernhard Thomaszewski; Stelian Coros (2020). Add: Analytically differentiable dynamics for multi-body systems with frictional contact. ACM Transactions on Graphics (TOG)
[b42] Eric Heiden; Miles Macklin; Yashraj Narang; Dieter Fox; Animesh Garg; Fabio Ramos (2021). Disect: A differentiable simulation engine for autonomous robotic cutting. 
[b43] Tao Du; Kui Wu; Pingchuan Ma; Sebastien Wah; Andrew Spielberg; Daniela Rus; Wojciech Matusik (2021). Diffpd: Differentiable projective dynamics. ACM Transactions on Graphics (TOG)
[b44] Yiling Qiao; Junbang Liang; Vladlen Koltun; Ming Lin (2021). Differentiable simulation of soft multi-body systems. Advances in Neural Information Processing Systems (NeurIPS)
[b45] Chenfanfu Jiang; Craig Schroeder; Andrew Selle; Joseph Teran; Alexey Stomakhin (2015). The affine particle-in-cell method. ACM Transactions on Graphics (TOG)
[b46] Yonghao Yue; Breannan Smith; Christopher Batty; Changxi Zheng; Eitan Grinspun (2015). Continuum foam: A material point method for shear-dependent flows. ACM Transactions on Graphics (TOG)
[b47] Gergely Klár; Theodore Gast; Andre Pradhana; Chuyuan Fu; Craig Schroeder; Chenfanfu Jiang; Joseph Teran (2016). Drucker-prager elastoplasticity for sand animation. ACM Transactions on Graphics (TOG)
[b48] Matthias Zwicker; Hanspeter Pfister; Jeroen Van Baar; Markus Gross (2001). Surface splatting. 
[b49] Youtian Lin; Zuozhuo Dai; Siyu Zhu; Yao Yao (2024). Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle. 
[b50] Tianyi Xie; Zeshun Zong; Yuxin Qiu; Xuan Li; Yutao Feng; Yin Yang; Chenfanfu Jiang (2024). Physgaussian: Physics-integrated 3d gaussians for generative dynamics. 
[b51] Eduardo Wv Chaves (2013). Notes on continuum mechanics. Springer Science & Business Media
[b52] Vladimir Yugay; Yue Li; Theo Gevers; Martin R Oswald (2023). Gaussian-slam: Photo-realistic dense slam with gaussian splatting. 
[b53] Kai Katsumata; Duc Minh Vo; Hideki Nakayama (2023). An efficient 3d gaussian representation for monocular/multi-view dynamic scenes. 
[b54] Hanlin Chen; Chen Li; Gim Hee Lee (2023). Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance. 
[b55] Shanchuan Lin; Andrey Ryabtsev; Soumyadip Sengupta; Brian L Curless; Steven M Seitz; Ira Kemelmacher-Shlizerman (2021). Real-time high-resolution background matting. 
[b56] Alexander Kirillov; Eric Mintun; Nikhila Ravi; Hanzi Mao; Chloe Rolland; Laura Gustafson; Tete Xiao; Spencer Whitehead; Alexander C Berg; Wan-Yen Lo (2023). Segment anything. 
[b57] Zhou Wang; Alan C Bovik; Hamid R Sheikh; Eero P Simoncelli (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing
[b58] Lin Yen-Chen; Pete Florence; Jonathan T Barron; Alberto Rodriguez; Phillip Isola; Tsung-Yi Lin (2021). inerf: Inverting neural radiance fields for pose estimation. IEEE
[b59] Yang Fu; Sifei Liu; Amey Kulkarni; Jan Kautz; Alexei A Efros; Xiaolong Wang (2024). Colmap-free 3d gaussian splatting. 
[b60] Haozhe Su; Xuan Li; Tao Xue; Chenfanfu Jiang; Mridul Aanjaneya (2023). A generalized constitutive model for versatile mpm simulation and inverse learning with differentiable physics. 
[b61] Josh Achiam; Steven Adler; Sandhini Agarwal; Lama Ahmad; Ilge Akkaya; Florencia Leoni Aleman; Diogo Almeida; Janko Altenschmidt; Sam Altman; Shyamal Anadkat (2023). . 
[b62] Dmitry Tochilkin; David Pankratz; Zexiang Liu; Zixuan Huang; Adam Letts; Yangguang Li; Ding Liang; Christian Laforte; Varun Jampani; Yan-Pei Cao (2024). Triposr: Fast 3d object reconstruction from a single image. 
[b63] Jiaxiang Tang; Zhaoxi Chen; Xiaokang Chen; Tengfei Wang; Gang Zeng; Ziwei Liu (2024). Lgm: Large multi-view gaussian model for high-resolution 3d content creation. Springer
[b64] Zhengyi Wang; Yikai Wang; Yifei Chen; Chendong Xiang; Shuo Chen; Dajiang Yu; Chongxuan Li; Hang Su; Jun Zhu (2024). Crm: Single image to 3d textured mesh with convolutional reconstruction model. 
[b65] Jiale Xu; Weihao Cheng; Yiming Gao; Xintao Wang; Shenghua Gao; Ying Shan (2024). Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models. 
[b66] Diederik P Kingma; Jimmy Ba (2015). Adam: A method for stochastic optimization. 
[b67] Ruizhi Shao; Zerong Zheng; Hanzhang Tu; Boning Liu; Hongwen Zhang; Yebin Liu (2023). Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. 
[b68] Sara Fridovich-Keil; Giacomo Meanti; Frederik Rahbaek Warburg; Benjamin Recht; Angjoo Kanazawa (2023). K-planes: Explicit radiance fields in space, time, and appearance. 
[b69] Jiemin Fang; Taoran Yi; Xinggang Wang; Lingxi Xie; Xiaopeng Zhang; Wenyu Liu; Matthias Nießner; Qi Tian (2022). Fast dynamic radiance fields with time-aware neural voxels. 
[b70] Viktor Makoviychuk; Lukasz Wawrzyniak; Yunrong Guo; Michelle Lu; Kier Storey; Miles Macklin; David Hoeller; Nikita Rudin; Arthur Allshire; Ankur Handa (2021). Isaac gym: High performance gpu-based physics simulation for robot learning. 
[b71] Isabella Huang; Yashraj Narang; Clemens Eppner; Balakumar Sundaralingam; Miles Macklin; Ruzena Bajcsy; Tucker Hermans; Dieter Fox (2022). Defgraspsim: Physics-based simulation of grasp outcomes for 3d deformable objects. IEEE Robotics and Automation Letters
[b72] William E Lorensen; Harvey E Cline (1998). Marching cubes: A high resolution 3d surface construction algorithm. ACM SIGGRAPH Computer Graphics
[b73] Yixin Hu; Teseo Schneider; Bolun Wang; Denis Zorin; Daniele Panozzo (2020). Fast tetrahedral meshing in the wild. ACM Transactions on Graphics (TOG)

Figures:
Figure fig_0: 1
Type: figure
Caption: Figure 1: Overview. (a) Continuum Generation: Given a series of multi-view images capturing a moving object, the motion-factorized dynamic 3D Gaussian network is trained to reconstruct the dynamic object as 3D Gaussian point sets across different time states. From the reconstructed results, we employ a coarse-to-fine strategy to generate density fields, from which we recover the continuums and extract object surfaces. The continuum is endowed with Gaussian attributes to allow mask rendering. (b) Identification: The MPM simulates the trajectory from the initial continuum P(0) and the physical parameters Θ. The simulated object surfaces and the rendered masks are then compared against the previously extracted surfaces (colored in blue) and the corresponding masks from the dataset. The differences are quantified to guide the parameter estimation process. (c) Simulation: Digital twin demonstrations are displayed. Simulated objects (colored by stress increasing from blue to red), characterized by the properties estimated from observation, exhibit behavior consistent with real-world objects.
Data: 

Figure fig_1: 2
Type: figure
Caption: Figure 2: The pipeline of the proposed dynamic 3D Gaussian network. The motion network backbone consists of 8 fully connected (FC) layers. The output of the motion block is fed to N_m heads to generate motion residuals. The coefficient network contains 4 FC layers.
Data: 

Figure fig_2: 3
Type: figure
Caption: Figure 3: Sketch illustration of the coarse-to-fine filling strategy. Gaussian and internal particles are depicted in green and blue, respectively. (a) Voxels containing particles are assigned high densities. (b) Following the upsampling and smoothing of the field, densities near boundaries become blurred (indicated in light yellow). (c) The particles are again used to correct the field: voxels that contain particles are re-assigned high densities. (d) and (e) repeat the previous operations to achieve a more detailed shape.
Data: 

Figure fig_4: 4
Type: figure
Caption: Figure 4: Comparison between rendered and ground-truth images. (a) RGB images rendered by PAC-NeRF. (b) Masks rendered by our method. (c)-(d) Ground-truth RGB images and masks. Mask-based supervision introduces fewer discrepancies than RGB-based guidance when the estimated shapes are correct.
Data: 

Figure fig_5: 5121333
Type: figure
Caption: Estimated parameters per object, listed as PAC-NeRF | Ours | Ground Truth (rows of Table 3).
Droplet | µ = 2.09 × 10^2, κ = 1.08 × 10^5 | µ = 2.01 × 10^2, κ = 0.18 × 10^5 | µ = 200, κ = 10^5
Letter | µ = 83.85, κ = 1.35 × 10^5 | µ = 95.05, κ = 1.00 × 10^5 | µ = 100, κ = 10^5
Cream | µ = 1.21 × 10^5, κ = 1.57 × 10^6, τY = 3.16 × 10^3, η = 5.6 | µ = 1.03 × 10^4, κ = 1.48 × 10^6, τY = 2.98 × 10^3, η = 6.6 | µ = 10^4, κ = 10^6, τY = 3 × 10^3, η = 10
Toothpaste | µ = 6.51 × 10^3, κ = 2.22 × 10^5, τY = 228, η = 9.77 | µ = 4.19 × 10^3, κ = 9.24 × 10^4, τY = 226, η = 9.1 | µ = 5 × 10^3, κ = 10^5, τY = 200, η = 10
Torus | E = 1.04 × 10^6, ν = 0.322 | E = 0.99 × 10^6, ν = 0.295 | E = 10^6, ν = 0.3
Bird | E = 2.78 × 10^5, ν = 0.273 | E = 3.08 × 10^5, ν = 0.284 | E = 3 × 10^5, ν = 0.3
Playdoh | E = 3.84 × 10^6, ν = 0.272, τY = 1.69 × 10^4 | E = 1.58 × 10^6, ν = 0.322, τY = 1.56 × 10^4 | E = 2 × 10^6, ν = 0.3, τY = 1.54 × 10^4
Cat | E = 1.61 × 10^5, ν = 0.293, τY = 3.57 × 10^3 | E = 0.98 × 10^6, ν = 0.296, τY = 3.76 × 10^3 | E = 10^6, ν = 0.3, τY = 3.85 × 10^3
Trophy | θ_fric = 36.1° | θ_fric = 38.0° | θ_fric = 40°
Data: 

Figure fig_7: 5
Type: figure
Caption: Figure 5: Real-world application. Left: Identification and future state simulation. Right: Grasping simulation. The stress on the simulated object is indicated by blue (low) to red (high). The gripper widths from top to bottom are set to 6 cm, 4.5 cm, and 3.5 cm, respectively.
Data: 

Figure fig_8: 6
Type: figure
Caption: Figure 6: Visualization of trophy sequences. Row 1: rendering results from the network trained without scale regularization. Row 2: rendering results from the network trained with scale regularization.
Data: 

Figure fig_9: 2
Type: figure
Caption: (a) nu = 2 (b) nu = 3 (c) nu = 4 (d) nu = 5 (e) PAC-NeRF (f) Oracle
Data: 

Figure fig_10: 7
Type: figure
Caption: Figure 7: Visualization of coarse-to-fine filling. (a)-(d) Filling results of our method with different numbers of upsampling operations. (e) Point clouds recovered by PAC-NeRF. (f) Ground-truth shapes.
Data: 

Figure tab_1: 1
Type: table
Caption: Dynamic Reconstruction on PAC-NeRF Dataset
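Data: Type | CD ↓ (three method columns) | EMD ↓ (three method columns)
Newtonian | 0.277, 0.269, 0.243 | 0.027, 0.027, 0.025
Non-Newtonian | 0.236, 0.216, 0.195 | 0.025, 0.024, 0.022
Elasticity | 0.238, 0.191, 0.178 | 0.025, 0.022, 0.02
Plasticine | 0.429, 0.213, 0.196 | 0.029, 0.024, 0.022
Sand | 0.212, 0.281, 0.25 | 0.025, 0.028, 0.025
Mean | 0.278, 0.234, 0.212 | 0.026, 0.025, 0.023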

Figure tab_2: 2
Type: table
Caption: System identification performance on PAC-NeRF cross-shaped object Dataset
Data: Type | Parameter | PAC-NeRF | Ours* | Ours
Newtonian | log10(µ) | 11.6±6.60 | 1.53±1.45 | 1.53±1.31
Newtonian | log10(κ) | 16.7±5.37 | 16.0±22.4 | 14.8±19.2
Newtonian | v | 0.86±1.45 | 0.20±0.08 | 0.20±0.07
Non-Newtonian | log10(µ) | 24.1±21.9 | 32.9±44.6 | 13.5±18.2
Non-Newtonian | log10(κ) | 44.0±26.3 | 17.7±20.2 | 12.9±16.8
Non-Newtonian | log10(τY) | 5.09±7.41 | 3.74±3.72 | 4.80±3.92
Non-Newtonian | log10(η) | 28.7±23.3 | 34.9±24.1 | 40.7±24.6
Non-Newtonian | v | 0.29±0.13 | 0.68±0.28 | 0.19±0.09
Elasticity | log10(E) | 3.02±3.72 | 3.27±4.13 | 2.43±3.29
Elasticity | ν | 4.35±5.08 | 3.10±2.00 | 2.52±2.03
Elasticity | v | 0.50±0.23 | 0.78±0.26 | 0.82±0.32
Plasticine | log10(E) | 83.8±68.4 | 28.1±24.4 | 25.6±29.4
Plasticine | log10(τY) | 11.2±14.5 | 1.24±0.90 | 1.67±1.21
Plasticine | ν | 18.9±15.7 | 10.2±5.34 | 9.59±5.00
Plasticine | v | 0.56±0.17 | 0.13±0.04 | 0.22±0.10
Sand | θ_fric | 4.89±1.10 | 4.21±0.08 | 4.18±0.52
Sand | v | 0.21±0.08 | 0.24±0.08 | 0.17±0.05

Figure tab_3: 3
Type: table
Caption: System Identification Performance on PAC-NeRF Dataset
Data: Object | PAC-NeRF [12] | Ours | Ground Truth
Droplet | µ = 2.09 × 10^2

Figure tab_4: 4
Type: table
Caption: Future State Simulation on Spring-Gaus Synthetic Dataset
Data: Metric | Method | torus | cross | cream | apple | paste | chess | banana | Mean
CD ↓ | Spring-Gaus [6] | 2.38 | 1.57 | 2.22 | 1.87 | 7.03 | 2.59 | 18.48 | 5.16
CD ↓ | PAC-NeRF [12] | 2.47 | 3.87 | 2.21 | 4.69 | 37.7 | 8.2 | 66.43 | 17.94
CD ↓ | Ours | 0.75 | 1.09 | 0.94 | 0.22 | 2.79 | 0.77 | 0.12 | 0.95
EMD ↓ | Spring-Gaus [6] | 0.087 | 0.051 | 0.094 | 0.076 | 0.126 | 0.095 | 0.135 | 0.095
EMD ↓ | PAC-NeRF [12] | 0.055 | 0.111 | 0.083 | 0.108 | 0.192 | 0.155 | 0.234 | 0.134
EMD ↓ | Ours | 0.034 | 0.058 | 0.050 | 0.030 | 0.096 | 0.059 | 0.017 | 0.049
PSNR ↑ | Spring-Gaus [6] | 16.83 | 16.93 | 15.42 | 21.55 | 14.71 | 16.08 | 17.89 | 17.06
PSNR ↑ | PAC-NeRF [12] | 17.46 | 14.15 | 15.37 | 19.94 | 12.32 | 15.08 | 16.04 | 15.77
PSNR ↑ | Ours | 20.24 | 30.51 | 19.15 | 26.89 | 16.31 | 18.44 | 29.29 | 22.98
SSIM ↑ | Spring-Gaus [6] | 0.919 | 0.940 | 0.862 | 0.902 | 0.872 | 0.881 | 0.904 | 0.897
SSIM ↑ | PAC-NeRF [12] | 0.913 | 0.906 | 0.858 | 0.878 | 0.819 | 0.848 | 0.886 | 0.870
SSIM ↑ | Ours | 0.942 | 0.939 | 0.909 | 0.948 | 0.894 | 0.912 | 0.964 | 0.930

Figure tab_5: 5
Type: table
Caption: Results of PSNR (↑) on D-NeRF [25] Dataset
Data: Method | Hell Warrior | Mutant | Hook | Bouncing Balls | T-Rex | Stand Up | Jumping Jacks | Mean
Tensor4D [68] | 31.26 | 29.11 | 28.63 | 24.47 | 23.86 | 30.56 | 24.2 | 27.44
K-Planes [69] | 24.58 | 32.5 | 28.12 | 40.05 | 30.43 | 33.1 | 31.11 | 31.41
TiNeuVox [70] | 27.1 | 31.87 | 30.61 | 40.23 | 31.25 | 34.61 | 33.49 | 32.74
DefGS [16] | 41.54 | 42.63 | 37.42 | 41.01 | 38.1 | 44.62 | 37.72 | 40.43
Ours | 41.97 | 42.93 | 38.04 | 41.26 | 37.54 | 45.32 | 38.86 | 40.85

Figure tab_6: 6
Type: table
Caption: Dynamic Reconstruction on Spring-Gaus Synthetic Dataset
Data: Metric | Method | torus | cross | cream | apple | paste | chess | banana | Mean
CD ↓ | Spring-Gaus [6] | 0.17 | 0.48 | 0.36 | 0.38 | 0.19 | 1.80 | 2.60 | 0.85
CD ↓ | PAC-NeRF [12] | 4.92 | 1.10 | 0.77 | 1.11 | 3.14 | 0.96 | 2.77 | 2.11
CD ↓ | Ours | 0.13 | 0.13 | 0.14 | 0.15 | 0.17 | 0.41 | 0.03 | 0.17
EMD ↓ | Spring-Gaus [6] | 0.040 | 0.037 | 0.031 | 0.033 | 0.022 | 0.063 | 0.052 | 0.040
EMD ↓ | PAC-NeRF [12] | 0.056 | 0.052 | 0.041 | 0.045 | 0.054 | 0.052 | 0.062 | 0.052
EMD ↓ | Ours | 0.020 | 0.020 | 0.019 | 0.020 | 0.025 | 0.036 | 0.011 | 0.022

Figure tab_7: 8
Type: table
Caption: Notation of Algorithm 1
Data: Operator or symbol | Explanation
P_G(t) | Gaussian particle set at time t
P(t) | Sampled continuum particles at time t
S(t) | Sampled surface particles at time t, S(t) ⊂ P(t)
F(t) | 3D density field at time t
Proj | Operation projecting 3D particles into 2D image indices according to the camera parameters
Discretize | Operation mapping particle positions to voxel indices on the density field
GetPosition | Operation returning 3D positions of the binary field

Figure tab_8: 9
Type: table
Caption: System identification with/without mask supervision
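Data: Type | Parameter | w/o masks | w/ masks
Newtonian | log10(µ) | 2.19±2.90 | 1.53±1.31
Newtonian | log10(κ) | 24.2±22.2 | 14.8±19.2
Newtonian | v | 0.20±0.08 | 0.20±0.07
Non-Newtonian | log10(µ) | 19.4±27.7 | 13.5±18.2
Non-Newtonian | log10(κ) | 24.0±24.8 | 12.9±16.8
Non-Newtonian | log10(τY) | 4.58±9.11 | 4.80±3.92
Non-Newtonian | log10(η) | 49.1±40.5 | 40.7±24.6
Non-Newtonian | v | 1.33±0.54 | 0.19±0.09
Elasticity | log10(E) | 2.85±1.94 | 2.43±3.29
Elasticity | ν | 3.97±2.64 | 2.52±2.03
Elasticity | v | 0.22±0.10 | 0.82±0.32
Plasticine | log10(E) | 25.6±27.4 | 25.6±29.4
Plasticine | log10(τY) | 9.04±2.37 | 1.67±1.21
Plasticine | v | 1.16±0.00 | 0.22±0.10
Sand | θ_fric | 2.55±2.03 | 4.18±0.52
Sand | v | 0.31±0.18 | 0.17±0.05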


Formulas:
Formula formula_0: $G(x) = \exp\left(-\frac{1}{2}(x - \mu_0)^{T} \Sigma_0^{-1} (x - \mu_0)\right)$, (1)

Formula formula_1: $I(u) = \sum_{i \in N} T_i \alpha_i c_i, \quad A(u) = \sum_{i \in N} T_i \alpha_i, \quad D(u) = \sum_{i \in N} T_i \alpha_i d_i$, (2)

Formula formula_2: $T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)$
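
As an illustration of Eq. (2) together with the transmittance T_i, the following minimal sketch composites depth-sorted contributions along a single pixel ray. It assumes the per-Gaussian opacities and values have already been evaluated and sorted front to back; the function name `composite` and the scalar inputs are hypothetical and are not taken from the paper's code.

```python
from typing import Sequence, Tuple


def composite(alphas: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Front-to-back alpha compositing along one pixel ray.

    Returns (blended value, accumulated alpha), i.e. the sums in Eq. (2)
    with T_i = prod_{j<i} (1 - alpha_j). Inputs are assumed depth-sorted.
    """
    transmittance = 1.0  # T_1 = 1: nothing lies in front of the first Gaussian
    blended = 0.0
    accumulated_alpha = 0.0
    for alpha, value in zip(alphas, values):
        weight = transmittance * alpha  # T_i * alpha_i
        blended += weight * value       # contributes to I(u) or D(u)
        accumulated_alpha += weight     # contributes to A(u), the mask value
        transmittance *= 1.0 - alpha    # update to T_{i+1}
    return blended, accumulated_alpha


if __name__ == "__main__":
    color, mask = composite([0.5, 0.5], [1.0, 0.0])
    print(color, mask)
```

Passing colors as `values` gives I(u), passing depths gives D(u), and the second return value is the accumulated alpha A(u) that serves as the rendered mask.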

Formula formula_3: $\mu(t) = \mu_0 + \sum_{i=1}^{N_m} w_i(\mu_0, t)\, d\mu_i(t), \quad s(t) = s_0 + \sum_{i=1}^{N_m} w_i(\mu_0, t)\, ds_i(t)$. (3)
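
The blending in Eq. (3) can be sketched as a weighted sum of N_m residuals added to the canonical value. In the paper the weights and residuals come from the coefficient network and the motion heads; in this sketch they are plain numbers supplied by the caller, and `blend_motion` is a hypothetical helper name.

```python
from typing import List, Sequence


def blend_motion(
    canonical: Sequence[float],
    weights: Sequence[float],
    residuals: Sequence[Sequence[float]],
) -> List[float]:
    """Evaluate canonical + sum_i w_i * residual_i, as in Eq. (3).

    `canonical` stands for mu_0 (or s_0), `weights` for w_i(mu_0, t),
    and `residuals` for d mu_i(t) (or d s_i(t)) at a fixed time t.
    """
    if len(weights) != len(residuals):
        raise ValueError("need exactly one weight per motion residual")
    out = list(canonical)
    for w, residual in zip(weights, residuals):
        for k in range(len(out)):
            out[k] += w * residual[k]
    return out


if __name__ == "__main__":
    # Two motion bases (N_m = 2) acting on a Gaussian centred at the origin.
    print(blend_motion([0.0, 0.0, 0.0], [0.25, 0.75], [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]))
```

The same function covers both halves of Eq. (3), since position and scale are blended with the same weights.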

Formula formula_4: $L_{gs} = L_1(I, \tilde{I}) + \lambda_1 L_{ssim}(I, \tilde{I}) + \lambda_2 L_1(s(t))$, (4)

Formula formula_5: $P_{in} \leftarrow P_{in}[\, D_i(u_{in}, v_{in}) \le d_{in} \,]$;

Formula formula_6: $F(t) \leftarrow \mathrm{TrilinearInterpolation}(F(t), 2)$

Formula formula_7: $F(t) \leftarrow \mathrm{MeanFiltering}(F(t))$;
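
The upsample, smooth, and correct loop described in the Figure 3 caption can be illustrated with a one-dimensional analogue. This is only a sketch under simplifying assumptions: linear interpolation stands in for trilinear interpolation, a 3-cell window stands in for the mean filter, particle positions are normalized to [0, 1), and `HIGH`, `upsample2`, `mean_filter`, `stamp_particles`, and `coarse_to_fine` are hypothetical names. The depth-based filtering of internal particles is omitted.

```python
from typing import List, Sequence

HIGH = 1.0  # hypothetical density assigned to cells that contain a particle


def upsample2(field: Sequence[float]) -> List[float]:
    """Double the resolution by linear interpolation (1D stand-in for trilinear)."""
    n = len(field)
    out = []
    for j in range(2 * n):
        x = (j + 0.5) / 2.0 - 0.5      # fine cell centre in coarse index units
        x = min(max(x, 0.0), n - 1.0)  # clamp at the borders
        i0 = int(x)
        i1 = min(i0 + 1, n - 1)
        t = x - i0
        out.append((1.0 - t) * field[i0] + t * field[i1])
    return out


def mean_filter(field: Sequence[float]) -> List[float]:
    """Smooth with a 3-cell moving average, truncated at the borders."""
    n = len(field)
    out = []
    for i in range(n):
        window = field[max(i - 1, 0): min(i + 2, n)]
        out.append(sum(window) / len(window))
    return out


def stamp_particles(field: Sequence[float], particles: Sequence[float]) -> List[float]:
    """Re-assign HIGH density to every cell that contains a particle."""
    out = list(field)
    n = len(out)
    for p in particles:
        out[min(int(p * n), n - 1)] = HIGH
    return out


def coarse_to_fine(particles: Sequence[float], base_resolution: int, num_upsamples: int) -> List[float]:
    """Start coarse, then repeatedly upsample, smooth, and correct with the particles."""
    field = stamp_particles([0.0] * base_resolution, particles)
    for _ in range(num_upsamples):
        field = mean_filter(upsample2(field))
        field = stamp_particles(field, particles)
    return field


if __name__ == "__main__":
    print(coarse_to_fine([0.3, 0.35, 0.6], base_resolution=4, num_upsamples=2))
```

Cells between the particles inherit high density from the coarse level, which is how the gap between sparse particles gets filled, while cells far from any particle stay low and can later be removed by thresholding.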

Formula formula_8: $P_P = \{(p, s_{\Delta x}, \sigma_F)\}$, (5)

Formula formula_9: $L_{ppe} = \frac{1}{m} \sum_{i=1}^{m} \left[ L_{CD}(S(t_i), \tilde{S}(t_i)) + \frac{1}{n} \sum_{j=1}^{n} L_1(A_j(t_i), \tilde{A}_j(t_i)) \right]$, (6)
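
A minimal sketch of how the objective in Eq. (6) could be evaluated on plain Python lists is given below. The Chamfer convention used here (unsquared Euclidean distances, averaged per direction and summed over both directions) is an assumption, as are the flattened mask representation and all function names; the paper's differentiable implementation is not reproduced.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance, both directions."""
    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def mask_l1(rendered: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between two flattened masks of equal size."""
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)


def identification_loss(sim_surfaces, ref_surfaces, sim_masks, ref_masks) -> float:
    """Eq. (6): mean over m time steps of the surface term plus the mean mask term over n views.

    sim_surfaces[i], ref_surfaces[i]: point lists at time t_i.
    sim_masks[i][j], ref_masks[i][j]: flattened masks for view j at time t_i.
    """
    m = len(sim_surfaces)
    total = 0.0
    for i in range(m):
        surface_term = chamfer_distance(sim_surfaces[i], ref_surfaces[i])
        n = len(sim_masks[i])
        mask_term = sum(mask_l1(sim_masks[i][j], ref_masks[i][j]) for j in range(n)) / n
        total += surface_term + mask_term
    return total / m


if __name__ == "__main__":
    loss = identification_loss(
        [[(0.0, 0.0, 0.0)]], [[(1.0, 0.0, 0.0)]],
        [[[1.0, 0.0]]], [[[1.0, 1.0]]],
    )
    print(loss)
```

The surface term compares 3D geometry directly, while the mask term adds the 2D shape guidance from each camera view.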

Formula formula_10: $J\sigma = \mu (F F^{\top}) + [\lambda \log(J) - \mu] I$, (7)

Formula formula_11: $\mu = \frac{E}{2(1 + \nu)}, \quad \lambda = \frac{E\nu}{(1 + \nu)(1 - 2\nu)}$. (8)
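
To make Eqs. (7) and (8) concrete, the sketch below converts Young's modulus E and Poisson's ratio ν into the Lamé parameters and evaluates the neo-Hookean stress for a 3×3 deformation gradient stored as nested lists. It assumes J = det(F) > 0; the function names are hypothetical and the code is not taken from the paper's simulator.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def lame_parameters(E: float, nu: float) -> Tuple[float, float]:
    """Eq. (8): convert Young's modulus and Poisson's ratio to (mu, lambda)."""
    if not -1.0 < nu < 0.5:
        raise ValueError("Poisson's ratio must lie in (-1, 0.5)")
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam


def det3(m: Matrix) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def neo_hookean_stress(F: Matrix, mu: float, lam: float) -> Matrix:
    """Eq. (7): J*sigma = mu * F F^T + (lambda * log(J) - mu) * I."""
    J = det3(F)
    if J <= 0.0:
        raise ValueError("deformation gradient must have positive determinant")
    isotropic = lam * math.log(J) - mu
    stress = [
        [mu * sum(F[i][k] * F[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
    for i in range(3):
        stress[i][i] += isotropic
    return stress


if __name__ == "__main__":
    mu, lam = lame_parameters(1e6, 0.3)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(mu, lam, neo_hookean_stress(identity, mu, lam))
```

An undeformed state (F equal to the identity) yields zero stress, which is a quick sanity check on the signs in Eq. (7).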

Formula formula_12: $J\sigma = F [2\mu G + \lambda \mathrm{Tr}(G) I] F^{\top}$, (9)

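Formula formula_13: $Z(F) = \begin{cases} F & \delta\gamma \le 0 \\ U \exp\left(\epsilon - \delta\gamma \frac{\varepsilon}{\|\varepsilon\|}\right) V^{\top} & \text{otherwise} \end{cases}$, (10)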

Formula formula_14: $\mathrm{Tr}(\epsilon) > 0, \quad \text{or} \quad \delta\gamma = \|\varepsilon\|_F + \alpha \frac{(d\lambda + 2\mu) \mathrm{Tr}(\epsilon)}{2\mu} > 0$, (11)

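Formula formula_16: $Z(F) = \begin{cases} U V^{\top} & \mathrm{Tr}(\epsilon) > 0 \\ F & \delta\gamma \le 0, \ \mathrm{Tr}(\epsilon) \le 0 \\ U \exp\left(\epsilon - \delta\gamma \frac{\varepsilon}{\|\varepsilon\|}\right) V^{\top} & \text{otherwise} \end{cases}$. (12)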

Formula formula_18: $J\sigma = \frac{1}{2} \mu (\nabla v + \nabla v^{\top}) + \kappa \left(J - \frac{1}{J^{6}}\right)$, (13)

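Formula formula_19: $Z(F) = \begin{cases} F & \delta\gamma \le 0 \\ U \exp\left(\frac{\hat{s}}{2\mu} \varepsilon + \frac{1}{d} \mathrm{Tr}(\epsilon) I\right) V^{\top} & \text{otherwise} \end{cases}$, (14)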

Formula formula_20: $\hat{\mu} = \frac{\mu}{d} \mathrm{Tr}(\Sigma^{2}), \quad s = 2\mu\varepsilon, \quad \hat{s} = \|s\| - \frac{\delta\gamma}{1 + \frac{\eta}{2\hat{\mu}\Delta t}}$ (15)
