V2C: JOG3R (ours) vs. ours w/o generation loss vs. DUSt3R
Our method produces more accurate correspondences and camera trajectories than DUSt3R.
We also compare against a variant of our method trained without the generation loss.
For each pair of frames, we visualize only 10 correspondences to avoid clutter.
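The subsampling step can be illustrated with a minimal sketch. The page does not say how the 10 correspondences are chosen, so uniform random sampling with a fixed seed is an assumption here, and `subsample_correspondences` is a hypothetical helper name, not part of JOG3R or DUSt3R.

```python
import random


def subsample_correspondences(matches, k=10, seed=0):
    """Return at most k correspondences for visualization.

    `matches` is a sequence of ((x1, y1), (x2, y2)) pixel pairs between two frames.
    Assumption: uniform sampling without replacement; the actual selection
    rule used for the visualizations is not stated in the source.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if len(matches) <= k:
        return list(matches)
    rng = random.Random(seed)  # fixed seed keeps the figure reproducible
    # Sort the sampled indices so the drawn lines keep their original order.
    indices = sorted(rng.sample(range(len(matches)), k))
    return [matches[i] for i in indices]
```

For example, `subsample_correspondences(matches, k=10)` on a list of several hundred matches returns 10 of them in their original order, which is enough to show drift without covering the frames in lines.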
da80d87326bf63b7 (JOG3R vs. DUSt3R)

| our correspondences | DUSt3R's correspondences (note the drift of the second-to-last line) |
|---|---|
| our camera trajectory | DUSt3R's camera trajectory |

fb52f951d8a8ad11 (JOG3R vs. DUSt3R)

| our correspondences | DUSt3R's correspondences |
|---|---|
| our camera trajectory | DUSt3R's camera trajectory |

26fe74c70177d694 (JOG3R vs. DUSt3R)

| our correspondences | DUSt3R's correspondences |
|---|---|
| our camera trajectory (camera moves only rightward) | DUSt3R's camera trajectory (camera jitters around) |

e0577a912fd116ea (JOG3R vs. DUSt3R)

| our correspondences | DUSt3R's correspondences |
|---|---|
| our camera trajectory | DUSt3R's camera trajectory (camera doesn't move horizontally) |

1de1b73fe4d6aa77 (JOG3R vs. JOG3R w/o generation loss)

| ours | ours w/o gen loss |
|---|---|
| our camera trajectory | ours w/o gen loss (camera moves in a straight line instead of along a curved trajectory; the green camera jumps suddenly) |

d48b66d36ec83707 (JOG3R vs. JOG3R w/o generation loss)

| ours | ours w/o gen loss |
|---|---|
| our camera trajectory (camera moves only forward) | ours w/o gen loss (camera jitters back and forth) |

T2V: JOG3R (ours) vs. ours w/o reconstruction loss
We compare against a variant trained without the reconstruction loss; the examples show that the reconstruction loss improves generation quality.
| an empty basement with wood paneling on the walls | ours | ours w/o reconstruction loss (quality degradation) |
|---|---|---|

| an outdoor swimming pool surrounded by rocks and lounge chairs | ours | ours w/o reconstruction loss (noticeable artifacts, no camera motion) |
|---|---|---|

| a dining room table with chairs and a vase of flowers | ours | ours w/o reconstruction loss (left chair has artifacts) |
|---|---|---|

| a living room with a couch, coffee table, and entertainment center | ours | ours w/o reconstruction loss (deformation artifacts appear on the left at the end) |
|---|---|---|

| a laundry room with a washer and dryer in it | ours | ours w/o reconstruction loss (implausible washing machine configuration) |
|---|---|---|

T2V+C
All videos in this section are generated by JOG3R.
Our T2V+C pipeline reconstructs 3D cameras that are consistent with those from T2V->V2C.
For each pair of frames, we visualize only 10 correspondences to avoid clutter.
a living room with leather chairs and guitars

| correspondences from T2V+C | correspondences from T2V->V2C |
|---|---|
| camera poses from T2V+C | camera poses from T2V->V2C |

a backyard with steps leading up to a blue house

| correspondences from T2V+C | correspondences from T2V->V2C |
|---|---|
| camera poses from T2V+C | camera poses from T2V->V2C |

a hallway leading to a bathroom and bedroom

| correspondences from T2V+C | correspondences from T2V->V2C |
|---|---|
| camera poses from T2V+C | camera poses from T2V->V2C |

an aerial view of a large house on the water

| correspondences from T2V+C | correspondences from T2V->V2C |
|---|---|
| camera poses from T2V+C | camera poses from T2V->V2C |