From Silence to Sound: Towards Audio-Visual Subject Customization

Paper ID: 972
Anonymous Authors

Showcases

[Showcase video panels: Reference Videos (8 examples)]


Comparisons with Previous Methods

The video depicts a scene set in what appears to be a historical or fantasy context, possibly during the Viking era. The character in focus has long hair and a beard, and is wearing a fur-lined garment, suggesting a cold environment. The background is dimly lit, with smoke or mist visible, adding to the atmosphere of an ancient or medieval setting. The character seems to be engaged in a serious conversation or interaction with another person who is partially visible on the right side of the frame. The overall mood of the scene is intense and dramatic.

[Video panels: Reference Videos | SadTalker | AniPortrait | Hallo3 | Ours]

The image depicts two characters in a dimly lit room, engaged in a conversation. The character on the right is wearing a dark, ornate tunic with intricate designs and a brooch at the collar. This character has short, light-colored hair and appears to be speaking or reacting to something. The character on the left, whose back is partially turned to the camera, has long, braided hair and is wearing a simple, light-colored garment. The setting suggests a medieval or fantasy context, possibly from a television show or movie. The lighting creates a dramatic atmosphere, highlighting the expressions and details of their attire.

[Video panels: Reference Videos | SadTalker | AniPortrait | Hallo3 | Ours]

In the video, a man in a suit and tie is walking down a street with a colorful mural on the wall behind him. He appears to be engaged in a conversation with another person who is partially visible on the left side of the frame. The man in the suit has a backpack over one shoulder and seems to be gesturing or explaining something as he walks. The setting suggests an urban environment with various signs and advertisements in the background. The interaction between the two individuals seems friendly and animated.

[Video panels: Reference Videos | SadTalker | AniPortrait | Hallo3 | Ours]

The video features two animated characters in a dimly lit, underwater-like environment with a greenish-blue hue. The character on the left is a young boy with black hair, wearing a red jacket over a white shirt and blue jeans. He appears to be speaking or reacting with an open mouth and expressive eyes. The character on the right is facing away from the camera, wearing a purple jacket and holding what looks like a piece of paper or a book. The background consists of rocky surfaces and water, suggesting they are in a cave or underground setting. The overall atmosphere is mysterious and slightly eerie.

[Video panels: Reference Videos | SadTalker | AniPortrait | Hallo3 | Ours]

The video features a man in a suit, standing indoors. He is wearing a patterned blazer over a white shirt and tie. The man has a cigarette in his mouth and appears to be speaking or reacting to something. The background includes framed pictures on the wall and a candelabra on a table to the left. The setting suggests a formal or professional environment, possibly an office or a study. The lighting is soft, creating a calm and composed atmosphere.

[Video panels: Reference Videos | SadTalker | AniPortrait | Hallo3 | Ours]


Ablations on Region-Selective Audio CFG

[Video panels: w/o CFG | Global CFG | Local CFG (Ours) | Attention Map]

Ablations on Decoupled Audio-Visual Learning

[Video panels: w/o audio pt. | w/o decoupling | Ours]