Abstract: This paper asks whether, in pose-aware variational autoencoders where the pose of each training view is already known, it is better to apply that pose through an exact geometric transformation or to pass pose coordinates to a decoder and have it learn the rendering rule from data. We study this question in a deliberately simple two-dimensional silhouette setting. The encoder extracts object content from two rotated views, the decoder predicts one canonical silhouette, and the proposed model renders each posed view with an analytic rotor-induced image warp. The matched baseline uses the same encoder, latent size, and decoder width, but concatenates the pose code to the decoder input. Across three random seeds in a compressed-capacity setting, the rotor pathway lowers validation canonical binary cross-entropy from $0.0919 \pm 0.0022$ to $0.0889 \pm 0.0051$, raises thresholded canonical Dice from $0.8339 \pm 0.0042$ to $0.8407 \pm 0.0035$, raises thresholded view Dice from $0.7981 \pm 0.0119$ to $0.8352 \pm 0.0030$, and reduces relative-pose composition error from $0.0319 \pm 0.0022$ to $0.0092 \pm 0.0004$. The baseline obtains lower probabilistic view cross-entropy, which we interpret cautiously because its smoother predictions can reduce cross-entropy while giving worse thresholded shape agreement. These results support the value of explicit analytic warping for this known-pose canonicalisation problem. They do not establish that planar Geometric Algebra is superior to all analytic matrix or spatial-transformer warps. Finally, we outline how the same design principle could be carried to three-dimensional rotors and Conformal Geometric Algebra motors, but we leave that extension as future empirical work.
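As an illustration of what an analytic rotor-induced image warp can look like, the following is a minimal NumPy sketch and not the paper's implementation. The function names (`rotor`, `compose`, `rotation_matrix`, `warp`), the two-coefficient rotor representation, and the nearest-neighbour sampling are all assumptions made for this example; the abstract does not specify the interpolation scheme or the rotor parameterisation used in the experiments.

```python
import numpy as np

def rotor(theta):
    """Planar rotor as (scalar, bivector) coefficients for a rotation by theta.

    Hypothetical representation: the half-angle form cos(theta/2), sin(theta/2).
    """
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])

def compose(r1, r2):
    """Product of two planar rotors (same rule as complex multiplication)."""
    a1, b1 = r1
    a2, b2 = r2
    return np.array([a1 * a2 - b1 * b2, a1 * b2 + b1 * a2])

def rotation_matrix(r):
    """2x2 rotation matrix induced by a unit planar rotor."""
    a, b = r
    c, s = a * a - b * b, 2.0 * a * b  # cos(theta), sin(theta)
    return np.array([[c, -s], [s, c]])

def warp(image, r):
    """Rotate a square image about its centre with no learned parameters.

    Uses inverse mapping with nearest-neighbour sampling (an assumption);
    the on-screen direction depends on the image axis convention.
    """
    n = image.shape[0]
    centre = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n]
    coords = np.stack([xs - centre, ys - centre], axis=-1)  # (n, n, 2) as (x, y)
    # Row vectors times R apply the transpose of R, i.e. the inverse rotation.
    src = coords @ rotation_matrix(r)
    src_x = np.rint(src[..., 0] + centre).astype(int)
    src_y = np.rint(src[..., 1] + centre).astype(int)
    valid = (src_x >= 0) & (src_x < n) & (src_y >= 0) & (src_y < n)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]  # pixels mapped from outside stay 0
    return out
```

In this sketch, composing two rotors adds their angles by construction, which is one way to see why a fixed analytic warp could yield a small relative-pose composition error, whereas a decoder that receives a concatenated pose code has to learn that consistency from data.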
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Francisco_J._R._Ruiz1
Submission Number: 8698