Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation

Paper ID 2539

More Video Visualizations

1. Text-to-Video: 5s 720p

The video begins with The Flash, a superhero in a red and gold suit, sprinting at the speed of light along the bustling city's streets.

In a realistic shot with a panoramic view, the camera moves slowly to capture a young Chinese woman with long, straight hair and a mature demeanor. She is wearing a modest white dress that adds to her elegant appearance. The woman is on a bustling street, walking casually with her hands in her pockets. Occasionally, she turns her head to glance at the camera. The background shows the lively street, bustling with people, shops, and cafés.

In a garden full of blooming flowers, a butterfly moves gracefully among the blossoms, eventually settling on a vibrant petal while gently fluttering its wings.

A handsome man in a dark suit and white turtleneck sweater sits indoors, possibly in an office or conference room, deeply absorbed in a task. He looks down with focused concentration. Beside him, a charming woman in a white shirt leans in, appearing equally engaged in the same activity. The background is softly lit and blurred, emphasizing the intimate, concentrated, and collaborative atmosphere between the two. The camera remains still, capturing their focused interaction within this serene setting.

In a warmly lit indoor setting with clean and tidy walls and floors, a fluffy white puppy is happily sitting on a chair. The scene is captured in a realistic style and in a medium shot, with the camera remaining stationary. The puppy appears cheerful with its tongue playfully sticking out, adding to the cozy atmosphere of the room. The ample lighting further emphasizes the inviting and comfortable environment.

A graceful sika deer with a sleek, dappled coat and elegant antlers kneels by the edge of a babbling stream, its head lowered as it drinks with gentle, rhythmic laps. The deer's large eyes and alert ears convey a serene yet vigilant presence in the tranquil woodland setting. The camera pans slowly from a distance, capturing the deer's graceful posture and the soft ripples formed in the clear water, before zooming in for a closer view of its delicate face and the intricate pattern of its coat.

A cinematic slow-motion shot captures a sleek, black motorbike speeding away from the camera down a coastal highway. The rider, a handsome man wearing a snug leather jacket and helmet, experiences the wind as it dramatically whips his jacket. The motorbike's tires grip the asphalt with remarkable precision, while the sunlit ocean glistens magnificently in the background, creating a captivating scene. The camera smoothly follows the motorbike's rapid motion, enhancing the sense of speed and adventure.

Realistic footage captures a panoramic view of a mountain pasture, where several sheep are leisurely grazing on the lush grass. The camera moves gently across the scene, showcasing the backdrop of mountains covered with green vegetation. The entire setting exudes a serene and natural atmosphere.

A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly towards the camera over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown and adorned with tiny windows. The soft, textured carpet provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting. The camera gently follows the ship's movement, enhancing the feeling of a serene voyage.

A man on horseback gallops energetically through the vast and arid Gobi Desert. The magnificent sunset behind him creates a dramatic silhouette, casting rich, warm hues across the expansive sands. The camera sweeps dynamically, capturing the man's focused expression and the horse's powerful strides, rendering the entire scene with a breathtaking cinematic quality.

2. Zero-shot Image-to-Video: 5s 360p

In a medium shot with a stationary camera, a young Chinese boy is seated at a table, deeply engrossed in his studies. He appears to be around eight years old, with short black hair and a focused expression. The boy is writing math problems, diligently working on his assignment. On the table, a tablet displays the questions he is working on. The setting is quiet and serene, with books neatly arranged on a bookshelf in the background. The overall scene is simple, emphasizing his concentration and thoughtful approach to learning.

An astronaut, clad in a sleek, silver spacesuit that gleams under the harsh lunar light, runs with graceful, smooth strides across the moon's surface. The low-angle shot captures the astronaut's steady progress, set against a backdrop of the vast, desolate lunar landscape. As the camera slowly pans to follow the astronaut's motion, the surface appears textured and uneven, creating a vivid sense of realism and the feeling of light, buoyant movement in this otherworldly setting.

A camera focuses on the delicate hands of a beautiful Chinese woman in her twenties as she gently caresses her face. The scene is set in a softly lit bedroom, creating a serene and comforting atmosphere.

A handsome man in a red and black plaid shirt and bright yellow helmet confidently rides a motorcycle along a forested road, raising his right arm while holding the handlebars with his left. Surrounded by tall trees and lush greenery, the motorcycle’s headlight illuminates the path ahead as sunlight filters through the foliage, captured from a dynamic, smooth-following camera angle.

A 40-year-old Chinese woman is depicted in a realistic style, facing the camera for a close-up shot. She is wearing light pink cotton-linen homewear. The camera remains still as she drops essence into the palm of her hand, then presses her palms together to warm the liquid. With gentle movements, she applies it to her face, starting from the sides of her nose and gradually spreading it across her cheeks and forehead. Her actions are relaxed, and she appears serene. The setting is a softly lit study room, enhancing the peaceful atmosphere.

The video captures a medium shot of a pot filled with boiling Fuzhou fish balls, filmed from an overhead perspective, with the camera stationary. The fish balls are seen rolling in the bubbling water, showcasing their white, tender, and smooth surface. The setting is a home kitchen with plenty of light, a clean environment, and a cozy atmosphere.

In a dense, lush forest where sunlight streams through the canopy, creating dappled patterns of light on the forest floor, a majestic bear with thick, glossy brown fur is climbing a tall, sturdy tree. The bear's powerful claws firmly grip the rough tree bark as it ascends gracefully. The camera pans upwards, closely following the bear's movements, capturing the intricate details of the bear's fur and the textured surface of the tree, while the background remains a blur of vibrant greenery.

The camera slowly zooms in, focusing on the legs of a Chinese woman around 30 years old. She is wearing yoga pants in a gym setting, emphasizing the strength and power in her legs. Her expression is one of concentration and determination, as she is engaged in a workout. The ambient gym environment enhances the scene, highlighting her movements and dedication to fitness.

The video features a realistic depiction with a medium shot and an overhead angle, capturing a stationary view. The scene showcases a bowl of bird's nest porridge placed on a wooden table. The environment is clean and well-lit, with a light blue wall in the background. The main subject in the shot is a hand holding a transparent plastic spoon, scooping porridge from the bowl, effectively highlighting the porridge's smooth texture and the distinctive purple-red grains within it. The overall atmosphere is tranquil and harmonious.

Two fencers, both with focused expressions and strong, agile bodies, duel on a narrow platform, their eyes locked in concentration. Their blades move in quick, precise strikes and parries, glinting under bright overhead lights. Each fencer wears a perfectly fitted white uniform complete with protective masks that obscure their faces, adding an air of mystery to the intense match. The slender, lightweight blades slice through the air with sharp precision, a testament to their years of training and dedication to mastering the art of fencing. Each lunge is countered with a swift retreat, their feet gliding across the floor in a series of rapid, rhythmic steps, resembling a perfectly coordinated dance. The metallic clash of swords punctuates the quick exchanges, creating a dramatic symphony of sound, while their white uniforms flutter with each agile motion, giving a sense of fluidity and grace to their fierce competition.

3. Zero-shot Video Extrapolation: 10s 360p, extrapolated from a 2s input prefix

The video features a realistic wide shot captured at eye level, with the camera steadily moving from a distant point to a closer perspective. It showcases a red train traveling through a snowy landscape, where the tracks are flanked by pine forests heavily blanketed in snow. Set against a winter wonderland backdrop, the natural lighting enhances the serene atmosphere. The train proceeds slowly along the curving tracks, presenting a tranquil and beautiful winter scene.

In a medium shot with a stationary camera, a young Chinese boy is seated at a table, deeply engrossed in his studies. He appears to be around eight years old, with short black hair and a focused expression. The boy is writing math problems, diligently working on his assignment. On the table, a tablet displays the questions he is working on. The setting is quiet and serene, with books neatly arranged on a bookshelf in the background. The overall scene is simple, emphasizing his concentration and thoughtful approach to learning.

The camera slowly zooms in on the lips of a 30-year-old Chinese woman who is applying moisturizing lipstick. She has a natural smile adorning her face, set against a bright indoor background.

In a realistic style and medium shot, a camera steadily moves to follow a middle-aged Chinese woman. She wears a light brown top paired with blue jeans and holds a smartphone in her hand. She walks casually inside a mall. In the background, there is a clothing store and a wall adorned with cartoon patterns. The ambient lighting is ample, creating a relaxed atmosphere.

In a serene, minimalist room with white walls, light floors, and subtle decor, a man in a white tee and black shorts stands near a wooden desk holding a black lamp and an item bearing the YouTube logo, gesturing expressively. Soft natural light enhances the calm ambiance as the camera smoothly captures his fluid movements against a backdrop of a beige couch and a black stool.

A man’s hand firmly holds a wine glass in a warmly lit, cozy family gathering.

A stationary medium close-up captures a Chinese man in his twenties sitting on a bed, leaning forward and gently pressing his abdomen, wearing a white cotton long-sleeve shirt and light gray pants. With a pale, fatigued face, furrowed brows, and a downturned mouth, he gazes at the bed linens in a simply furnished, softly lit bedroom with a wooden headboard.

In a cozy living room, a fluffy orange cat with a black pirate hat perched jauntily on its head and a tiny red bandana around its neck rides a circular robot vacuum cleaner with an air of adventurous mischief. The cat's bright green eyes glint with excitement as it maintains balance, its tail swishing behind. The robot hums steadily as it glides across the wooden floor, gently bumping into a soft, patterned rug. As the camera pans slowly to follow the vacuum's journey, it captures the sunlight streaming through a nearby window, casting a warm glow over the scene. The overall motion is smooth and whimsical, as if the cat were on a mini voyage across an ocean of floorboards, ready to embark on its next adventure in a world where the ordinary transforms into a playful fantasy.

A person sits on a comfortable train seat, gazing out the window at the passing landscapes. The individual is dressed casually, wearing a soft woolen sweater and jeans. Her hair is styled loosely, adding to her relaxed demeanor. The train moves smoothly forward, providing a serene view of rolling hills, expansive fields, and dense forests that change as the train progresses. The camera follows the train's steady movement, capturing glimpses of the natural scenery as it blurs gently in the background, creating a peaceful and reflective atmosphere.

An elderly Chinese woman in a light blue silk outfit performs slow-motion Tai Chi on a frost-covered, bamboo-lined path at sunrise, bathed in soft golden autumn light.