wanx_1: A man in a blue suit is giving a speech.
wanx_2: A man dressed in a suit is adjusting his tie.
wanx_3: An East Asian man with short, dark hair pulled back from his face looks downward.
wanx_4: An older man with short black hair wearing a white shirt, dark vest, and patterned tie stands looking at another person off-screen.

vc_1: An Asian man with short black hair is upset. He wipes tears with one hand.
vc_2: A person stands in front, with others behind, all dressed in white lab coats.
vc_3: A young Black woman with long, dark hair. She wears a green strapless top.

From left to right, the videos show the baseline (videoDPO), +IPR (named "win reward" in the videos), +ARS (named "reweighted dpo" in the videos), and our method (PG-DPO).
