Genmo, the San Francisco-based open-source AI video lab, has just announced a new add-on for Mochi 1, its state-of-the-art video generation model.
The new fine-tuning tool lets users customize the model's video output by training it on a modest number of additional video clips of their own. The ability to fine-tune video output like this is not new, but this is the first time we have seen it released as an open-source video product.
The tuning is done using LoRA (low-rank adaptation), a well-established technique that has long been used to fine-tune image models to produce a desired output.
With low-rank adapters like this, users can take a general-purpose model and customize it to taste, for example by teaching it a specific product logo so that it appears consistently in generated videos.
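To illustrate the general idea (this is a minimal sketch of how a LoRA adapter works in PyTorch, not Genmo's actual code; all names, ranks and dimensions below are hypothetical), the trick is to freeze the original weights and train only two small low-rank matrices whose product is added to the layer's output:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer and adds a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # original weights stay frozen

        # Low-rank factors: a down-projection followed by an up-projection
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus a small trainable correction learned from new clips
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Hypothetical usage: wrap a layer from a pretrained model, then train only
# the adapter parameters on the user's handful of video clips.
layer = nn.Linear(1024, 1024)
adapted = LoRALinear(layer, rank=8)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(trainable)  # far fewer parameters than retraining the full layer
```

Because only the tiny adapter matrices are updated, a handful of clips and a single GPU can be enough to steer a large pretrained model without retraining it from scratch.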
What is Mochi 1?
Mochi 1 caused quite a stir when it launched thanks to the superb quality of its video output, so this latest development is a significant milestone in the race towards versatile, cinema-quality video generation. It is available from the Genmo website.
As with many recent AI announcements, the Mochi 1 fine-tuner is less of a mass-market product and more of a research experiment. This early iteration is designed to run on a single graphics card, but that card needs to be an expensive top-end processor with at least 60GB of VRAM, which immediately puts it out of the reach of ordinary mortals.
The launch demo suggests that you’ll need no more than a dozen video clips to fine-tune the model to your needs, which is a pretty impressive feat for video. But interested parties will also need to have a good deal of familiarity with coding and command line interfaces in order to get the system working. Not for the faint of heart, therefore.
The value of open-source
Open-source video seems to be the flavor du jour, judging by the number of announcements coming down the pipe. The Allegro text-to-video model dropped this week, another open-source video technology that holds promise.
It generates six seconds of 720p video from a text prompt, but the key feature is that it does so within 9GB of VRAM, which sounds like an excellent use of space.
Again, there’s no fancy wrapper to make it easy for end users at the moment, but hopefully one will come soon.
In the meantime, I’m just gonna sit here with my jumbo box of hot buttered popcorn and keep looking towards the door for the arrival of Sora. Whatever happened to Sora? Anybody know? Any guesses? Sam? Any one?