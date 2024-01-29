Google's latest video generation AI model, Lumiere, employs a novel diffusion model named Space-Time-U-Net (STUNet). STUNet works out where objects are in a video (space) and how they move and change over time (time). Lumiere initiates the process by creating a base frame from a given prompt. Utilising the STUNet framework, it then predicts how objects within that frame will move and generates additional frames that flow into one another, creating the illusion of smooth motion. Lumiere produces 80 frames per clip, a notable increase over Stable Video Diffusion's 25 frames.
Other models piece videos together from generated keyframes, in which the movement has already taken place, and then fill in the frames between them. STUNet instead lets Lumiere concentrate on the movement itself, based on where the generated content is expected to be at specific times in the video.
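
To make the contrast concrete, here is a toy sketch in Python. It is not Lumiere's or STUNet's actual method. The swinging-object motion, the 16-frame keyframe spacing and the use of linear interpolation between keyframes are all assumptions chosen for illustration; only the 80-frame clip length comes from the figures above. The sketch compares an object's position when every frame is modelled directly with its position when only sparse keyframes are known and the frames in between are filled in afterwards.

```python
import math

NUM_FRAMES = 80        # clip length quoted above for Lumiere
KEYFRAME_STRIDE = 16   # hypothetical keyframe spacing, chosen for illustration


def true_position(t):
    """Hypothetical motion: an object swinging back and forth every 20 frames."""
    return math.sin(2 * math.pi * t / 20)


def interpolate_from_keyframes(key_frames, num_frames):
    """Fill in every frame by linear interpolation between neighbouring keyframes."""
    positions = []
    for t in range(num_frames):
        # Find the keyframes that bracket frame t.
        left = max(k for k in key_frames if k <= t)
        right = min(k for k in key_frames if k >= t)
        if left == right:
            # Frame t is itself a keyframe, so its position is known exactly.
            positions.append(true_position(left))
        else:
            w = (t - left) / (right - left)
            positions.append((1 - w) * true_position(left) + w * true_position(right))
    return positions


# Sparse keyframes, plus the final frame so the whole clip is bracketed.
key_frames = list(range(0, NUM_FRAMES, KEYFRAME_STRIDE)) + [NUM_FRAMES - 1]

# Whole-clip view: the position at every frame is modelled directly.
whole_clip = [true_position(t) for t in range(NUM_FRAMES)]

# Keyframe view: only the keyframes are known; everything else is interpolated.
from_keyframes = interpolate_from_keyframes(key_frames, NUM_FRAMES)

max_error = max(abs(a - b) for a, b in zip(whole_clip, from_keyframes))
print(f"{len(key_frames)} keyframes, max position error: {max_error:.2f}")
```

In this toy setup the object completes a full swing between two keyframes, so the interpolated path misses most of the motion even though it matches the true position exactly at every keyframe. That is the kind of gap a model reasoning about movement across the whole clip is meant to avoid.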