A new AI video model seems to come along every week, and the latest, from Chinese tech giant Tencent, is a big deal. Hunyuan offers state-of-the-art video quality and motion while also being fully open-source.
Hunyuan Video is a 13-billion-parameter diffusion transformer model that can take a simple text prompt and turn it into a high-resolution 5-second video. Currently, there aren't many places to try it outside China, but as it's open-source that will change. One service, FAL.ai, has already created a version you can play with.
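If you'd rather script generations than use a web playground, FAL.ai also exposes its hosted models through a Python client. Below is a minimal sketch of what a call could look like; the endpoint ID "fal-ai/hunyuan-video" and the shape of the response are my assumptions based on how FAL's other endpoints work, so check the service's documentation for the exact names.

```python
# Minimal sketch of generating a clip through FAL.ai's hosted Hunyuan Video endpoint.
# Assumes `pip install fal-client` and a FAL_KEY environment variable for authentication.
# The endpoint ID "fal-ai/hunyuan-video" and the response fields are assumptions;
# check fal.ai's docs for the exact names.
import fal_client

result = fal_client.subscribe(
    "fal-ai/hunyuan-video",                      # assumed endpoint ID
    arguments={"prompt": "A dog on the train"},  # text-to-video prompt
)

# Hosted video endpoints typically return a URL to the finished clip.
print(result["video"]["url"])
```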
The demo video looks impressive, with short sequences each offering a glimpse at natural-looking human and animal motion in a photorealistic style. There are also clips showing different animation styles.
Current implementations I've tried take up to 15 minutes to generate 5 seconds of video, so I haven't had much time to experiment, but my early tests suggest its output is roughly on par with Runway Gen-3 and Luma Labs' Dream Machine, though its prompt adherence (at least in English) isn't as good.
How Hunyuan works
Hunyuan is an open-source AI video model with 13 billion parameters. This makes it much larger than similar open-source models, including the impressive Mochi-1 from Genmo. However, not all parameters are created equal, so the extra size could be more bloat than performance; it will take more testing to tell.
It works like any other AI video model: you give it text or an image and it gives you a video based on your input. It is available as a download, but the current version requires at least 60GB of GPU memory, so you're looking at data-center hardware such as an Nvidia H800 or H20.
It is open-source, though, and as with Mochi-1 there will likely be some fine-tuning and optimization to bring the requirements down so you can run it on something like an RTX 4090.
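If you do have the hardware (or a rented cloud GPU), the open weights can be pulled down programmatically before running Tencent's own inference code. Here's a minimal sketch using the Hugging Face Hub client; the repo ID "tencent/HunyuanVideo" is my assumption for where the checkpoints are published, so verify it against the official release notes.

```python
# Minimal sketch of downloading the open-source Hunyuan Video weights.
# Assumes `pip install huggingface_hub`; the repo ID "tencent/HunyuanVideo" is an
# assumption about where Tencent publishes the checkpoints.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="tencent/HunyuanVideo",
    local_dir="./HunyuanVideo-weights",
)
print(f"Weights saved to {local_dir}")

# Generation itself then runs through the inference scripts in Tencent's GitHub
# repository, which need a GPU with roughly 60GB of memory at default settings.
```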
Tencent says that during testing it was able to achieve high visual quality, motion diversity and generation stability, with human evaluations putting it on par with all the major commercial models. Being open-source does give it an advantage, in that the entire community can add features and improve the model.
The company said in the documentation that "this will empower everyone in the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem."
How well does Hunyuan work?
I've tried it out on FAL.ai and found that its prompt adherence and contextual understanding of physics weren't as good as the documentation promises, nor as good as Runway, Kling or Hailuo.
For example, I gave it my traditional test prompt: "A dog on the train." This tests how it handles a less descriptive prompt and one that requires an understanding of motion and speed.
It did OK, but its output was over-simplistic. When I try the same prompt with other models I get rapid motion outside the window, a clear train interior and a cute dog sitting on the seat. Hunyuan gave me a dog, but the setting looked more like a doctor's waiting room than a train carriage.
Mochi-1 achieved an output comparable to Runway and Kling from the same prompt. It's possible this was just a bad generation from Hunyuan and that trying again would produce a better result, but at 15 minutes per attempt I didn't have time.