Large Language Models
Feb 15, 2024
Research on Video Generation Models Aims to Create General Purpose Simulators
AI Summary
A study investigates large-scale training of generative models on video data, focusing on text-conditional diffusion models. The largest model, Sora, can generate a minute of high-fidelity video, which suggests that scaling video generation models is a promising path toward general-purpose simulators of the physical world.

- The models are trained at large scale, jointly on videos and images, with inputs that vary in duration, resolution, and aspect ratio.
- A transformer architecture operates on spacetime patches of video and image latent codes.
- The largest model, Sora, is capable of generating a minute of high-fidelity video.
- The findings suggest that scaling video generation models could lead to general-purpose simulators of the physical world.
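
The idea of spacetime patches mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the function name `spacetime_patches`, the `(T, H, W, C)` latent layout, and the patch sizes are all assumptions chosen for illustration. The sketch cuts a latent video into non-overlapping blocks spanning a few frames and a small spatial window, then flattens each block into one token that a transformer could consume.

```python
import numpy as np


def spacetime_patches(latent, pt, ph, pw):
    """Split a latent video of shape (T, H, W, C) into flattened spacetime patches.

    Hypothetical helper: each patch covers pt frames and a ph x pw spatial
    window. Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Separate each axis into (number of patches, size within a patch).
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Bring the patch-index axes to the front, the within-patch axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch, i.e. one token per spacetime block.
    return x.reshape(-1, pt * ph * pw * C)


# Two latents with different durations, resolutions, and aspect ratios
# (shapes are made up for the example).
clip_a = np.zeros((8, 16, 16, 4), dtype=np.float32)
clip_b = np.zeros((4, 8, 24, 4), dtype=np.float32)

tokens_a = spacetime_patches(clip_a, pt=2, ph=4, pw=4)
tokens_b = spacetime_patches(clip_b, pt=2, ph=4, pw=4)
print(tokens_a.shape, tokens_b.shape)
```

Under these assumptions both clips produce tokens of the same width, and only the number of tokens changes with duration, resolution, and aspect ratio. That is one way a single sequence model can handle inputs of varying size.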