# Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

This page provides supplemental material, including example videos generated by text-to-video diffusion models.
README.md and README.html have the same contents.


## RL-Finetuning with AI Feedback
We investigate a recipe for improving dynamic object interactions in text-to-video models using external feedback. We first generate videos from the pre-trained models, then annotate the generated videos with AI feedback and reward labels. For the choice of feedback, we test metric-based feedback on semantics, human preference, and dynamics, and we also propose using binary feedback obtained from large-scale VLMs capable of video understanding (such as Gemini and GPT). The labeled data are then used for offline and iterative RL-finetuning.

<video src="./asset/RLAIF_VDM_241130-2.mp4" controls="true" width="800"></video>
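
The data flow described above (generate, label with feedback, finetune, optionally repeat) can be sketched roughly as follows. This is only an illustrative outline, not the implementation used in this work: `generate`, `judge`, and `update` are hypothetical callables standing in for the text-to-video model, the feedback source (for example a VLM returning a binary accept/reject), and the finetuning step, whose actual objective is not specified here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class LabeledVideo:
    prompt: str
    video: Any     # placeholder for a generated clip
    reward: float  # 1.0 = accepted, 0.0 = rejected (binary feedback)


def collect_feedback(
    prompts: List[str],
    generate: Callable[[str], Any],     # hypothetical: prompt -> video
    judge: Callable[[str, Any], bool],  # hypothetical: (prompt, video) -> accept?
    samples_per_prompt: int = 4,
) -> List[LabeledVideo]:
    """Generate videos for each prompt and attach a binary reward label."""
    data = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            video = generate(prompt)
            reward = 1.0 if judge(prompt, video) else 0.0
            data.append(LabeledVideo(prompt, video, reward))
    return data


def iterative_finetune(
    prompts: List[str],
    generate: Callable[[str], Any],
    judge: Callable[[str, Any], bool],
    update: Callable[[Callable[[str], Any], List[LabeledVideo]], Callable[[str], Any]],
    rounds: int = 2,
    samples_per_prompt: int = 4,
) -> Tuple[Callable[[str], Any], List[float]]:
    """Alternate data collection and finetuning.

    `update` is a hypothetical finetuning step that takes the current
    generator plus labeled data and returns an improved generator.
    With rounds=1 this reduces to a single offline pass.
    """
    acceptance_history = []
    for _ in range(rounds):
        data = collect_feedback(prompts, generate, judge, samples_per_prompt)
        acceptance_history.append(sum(d.reward for d in data) / len(data))
        generate = update(generate, data)
    return generate, acceptance_history
```

In this sketch, the offline setting corresponds to `rounds=1` (label once, finetune once), while the iterative setting regenerates and relabels videos from the updated model in each round.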

## Example Videos

1. Prompt: ***taking rose bud from bush***
    - **Pre-Trained**

    <video src="./asset/rose_pt.mp4" controls="true"></video>

    - **RL-Finetuned (AIF)**

    <video src="./asset/rose_aif.mp4" controls="true"></video>


2. Prompt: ***taking a pen out of the book***
    - **Pre-Trained**

    <video src="./asset/taking_a_pen_out_of_the_book_pt.mp4" controls="true"></video>

    - **RL-Finetuned (AIF)**

    <video src="./asset/taking_a_pen_out_of_the_book_aif.mp4" controls="true"></video>



3. Prompt: ***taking one body spray of many similar***
    - **Pre-Trained**

    <video src="./asset/spray_pt.mp4" controls="true"></video>

    - **RL-Finetuned (AIF)**

    <video src="./asset/spray_aif.mp4" controls="true"></video>



4. Prompt: ***tearing receipt into two pieces***
    - **Pre-Trained**

    <video src="./asset/tearing_receipt_into_two_pieces_pt.mp4" controls="true"></video>

    - **RL-Finetuned (AIF)**

    <video src="./asset/tearing_receipt_into_two_pieces_aif.mp4" controls="true"></video>


5. Prompt: ***pushing a bottle so that it falls off the table***
    - **Pre-Trained**

    <video src="./asset/pushing_a_bottle_so_that_it_falls_off_the_table_pt.mp4" controls="true"></video>

    - **RL-Finetuned (AIF)**

    <video src="./asset/pushing_a_bottle_so_that_it_falls_off_the_table_aif.mp4" controls="true"></video>
