This video is a little bit sloppy, but it's about a development feature - it will be in a release soon.
I just did some videos about how to set up the AI Interpolator to generate extra media data for the Media module's image, document, and audio types.
This got me thinking about a way to do video as well, and I spent some hours last night coding this. It's currently available in the development version of the module, but hopefully it will also be in the next release soon.
What this does is extract all the scene cuts/keyframes from the video and tile them onto a set of images. It then also extracts the audio and sends that to OpenAI Whisper to get a transcription.
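The module does this in PHP, but the underlying FFmpeg invocations can be sketched roughly like this - a hypothetical helper, with the scene-change threshold, tile grid, and audio sample rate all being assumptions on my part, not the module's actual defaults:

```python
import shlex

def keyframe_sheet_cmd(video, sheet_pattern, scene_threshold=0.4, grid="4x4"):
    # Select only frames whose scene-change score exceeds the threshold,
    # then tile them onto contact-sheet images the model can "see".
    vf = f"select='gt(scene,{scene_threshold})',tile={grid}"
    return ["ffmpeg", "-i", video, "-vf", vf, "-vsync", "vfr", sheet_pattern]

def extract_audio_cmd(video, audio):
    # Drop the video stream (-vn) and downmix to mono 16 kHz,
    # which is plenty for a Whisper transcription.
    return ["ffmpeg", "-i", video, "-vn", "-ac", "1", "-ar", "16000", audio]

print(shlex.join(keyframe_sheet_cmd("input.mp4", "scenes_%03d.png")))
print(shlex.join(extract_audio_cmd("input.mp4", "audio.mp3")))
```

Building the commands as argument lists (rather than one shell string) avoids quoting problems when filenames contain spaces.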
Then your prompt is sent together with these two components as context, so the model is able to "see" and "hear" the video.
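To make that concrete, here is a minimal sketch of how the prompt, the transcript, and the tiled keyframe images could be combined into one request in the OpenAI chat vision format - the helper name and the choice to pass raw image bytes are my own assumptions for illustration, not the module's actual code:

```python
import base64

def build_vision_payload(prompt, transcript, images):
    # images: list of raw PNG bytes, one per tiled keyframe sheet.
    # The text part carries both your prompt and the Whisper transcript.
    content = [{
        "type": "text",
        "text": f"{prompt}\n\nAudio transcription:\n{transcript}",
    }]
    for png_bytes in images:
        b64 = base64.b64encode(png_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    # A single user message holding text plus images, ready to POST
    # to a vision-capable chat model.
    return {"model": "gpt-4o", "messages": [{"role": "user", "content": content}]}
```

The transcript rides along as plain text while each image sheet becomes a base64 data URL, which is how vision-capable chat models accept inline images.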
Note that this only works with FFmpeg installed on your server.