V2C: JOG3R (ours) vs. ours w/o generation loss vs. DUSt3R
Our method produces more accurate correspondences and camera trajectories than DUSt3R.
We also compare against a variant of our method trained without the generation loss.
For each pair of frames, we visualize only 10 correspondences to avoid clutter.
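As a rough illustration of the subsampling described above, the sketch below keeps 10 matches out of a dense set before drawing them. It is an assumption about how such a selection could be done, not the procedure used for these figures: the helper name `subsample_correspondences`, the uniform random selection, and the fixed seed are all hypothetical.

```python
import random

def subsample_correspondences(matches, k=10, seed=0):
    """Keep at most k matches for display (hypothetical helper).

    matches: list of ((x_a, y_a), (x_b, y_b)) pixel pairs between two frames.
    Selection is uniform at random; this is an assumption, not the page's
    documented method.
    """
    if len(matches) <= k:
        return list(matches)
    rng = random.Random(seed)  # fixed seed so the same lines are drawn each run
    keep = sorted(rng.sample(range(len(matches)), k))  # preserve original order
    return [matches[i] for i in keep]

# Toy dense matches: every point shifts 5 pixels to the right in the second frame.
matches = [((i, 2 * i), (i + 5, 2 * i)) for i in range(100)]
shown = subsample_correspondences(matches)
```

Sampling whole pairs, rather than sampling points in each frame separately, keeps each drawn line attached to the same match in both frames.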
da80d87326bf63b7 (JOG3R vs. DUSt3R)
![]()
our correspondences
|
![]()
DUSt3R's correspondences
(note the drift in the second-to-last line) |
---|---|
![]()
our camera trajectory
|
![]()
DUSt3R's camera trajectory
|
fb52f951d8a8ad11 (JOG3R vs. DUSt3R)
![]()
our correspondences
|
![]()
DUSt3R's correspondences
|
---|---|
![]()
our camera trajectory
|
![]()
DUSt3R's camera trajectory
|
26fe74c70177d694 (JOG3R vs. DUSt3R)
![]()
our correspondences
|
![]()
DUSt3R's correspondences
|
---|---|
![]()
our camera trajectory
(camera moves only rightwards) |
![]()
DUSt3R's camera trajectory
(camera jitters around) |
e0577a912fd116ea (JOG3R vs. DUSt3R)
![]()
our correspondences
|
![]()
DUSt3R's correspondences
|
---|---|
![]()
our camera trajectory
|
![]()
DUSt3R's camera trajectory (camera does not move horizontally)
|
1de1b73fe4d6aa77 (JOG3R vs. JOG3R w/o generation loss)
![]()
ours
|
![]()
ours w/o gen loss
|
---|---|
![]()
our camera trajectory
|
![]()
ours w/o gen loss
(camera moves in a straight line instead of following a curved trajectory; the green camera has a sudden jump) |
d48b66d36ec83707 (JOG3R vs. JOG3R w/o generation loss)
![]()
ours
|
![]()
ours w/o gen loss
|
---|---|
![]()
our camera trajectory
(camera moves only forward) |
![]()
ours w/o gen loss
(camera jitters back and forth) |
T2V: JOG3R (ours) vs. ours w/o reconstruction loss
We compare against a variant trained without the reconstruction loss; the examples show that the reconstruction loss helps generation.
an empty basement with wood paneling on the walls
ours
|
ours w/o reconstruction loss
(quality degradation) |
---|
an outdoor swimming pool surrounded by rocks and lounge chairs
ours
|
ours w/o reconstruction loss
(noticeable artifacts, no camera motion) |
---|
a dining room table with chairs and a vase of flowers
ours
|
ours w/o reconstruction loss
(left chair has artifacts) |
---|
a living room with a couch, coffee table, and entertainment center
ours
|
ours w/o reconstruction loss
(deformation artifacts appear on the left near the end) |
---|
a laundry room with a washer and dryer in it
ours
|
ours w/o reconstruction loss
(implausible washing machine configuration) |
---|
T2V+C
All videos in this section are generated by JOG3R.
Our T2V+C pipeline reconstructs 3D cameras that are consistent with those obtained from T2V->V2C.
For each pair of frames, we visualize only 10 correspondences to avoid clutter.
a living room with leather chairs and guitars
![]()
correspondences from T2V+C
|
![]()
correspondences from T2V->V2C
|
---|---|
![]()
camera poses from T2V+C
|
![]()
camera poses from T2V->V2C
|
a backyard with steps leading up to a blue house
![]()
correspondences from T2V+C
|
![]()
correspondences from T2V->V2C
|
---|---|
![]()
camera poses from T2V+C
|
![]()
camera poses from T2V->V2C
|
a hallway leading to a bathroom and bedroom
![]()
correspondences from T2V+C
|
![]()
correspondences from T2V->V2C
|
---|---|
![]()
camera poses from T2V+C
|
![]()
camera poses from T2V->V2C
|
an aerial view of a large house on the water
![]()
correspondences from T2V+C
|
![]()
correspondences from T2V->V2C
|
---|---|
![]()
camera poses from T2V+C
|
![]()
camera poses from T2V->V2C
|