Kling is one of the best AI video platforms on the market. It includes video and image generation as well as a virtual try-on model. The latest release of the core video model, v1.6, takes things to a new level with improved realism and prompt adherence.
According to Kling, the latest release has an uncanny ability to follow complex instructions, including specific camera movements, timing changes and the visual structure of the scene.
AI video is a rapidly evolving field across both open source and private models. Companies like Runway are working directly with movie studios, and Pika’s latest model allows for near-perfect character consistency and visual control.
OpenAI unveiled Sora in December but, having used it, I found it's no better than Runway, Pika, Kling or the other big Chinese model, Hailuo from MiniMax. The real challenger now is Google’s limited-availability Veo 2, which seems to have an impressive understanding of physics.
Putting Kling v1.6 to the test
To see if Kling v1.6 lives up to the hype, I’ve come up with seven complex prompts, some using an image as the starting point and some just straight text-to-video. Each prompt uses the base settings and is set to generate 5 seconds of video.
1. Soccer goalkeeper
Sport is often a struggle for AI video models. Before settling on soccer I attempted cricket and baseball, but neither worked effectively. I found that a fixed camera behind a goal got the most out of Kling.
The prompt: "A Premier League goalkeeper makes a diving save under floodlights. Set at pitch level, directly side-on to the goal. The keeper, wearing a vibrant yellow kit, springs right to left across the frame in slow motion (60fps). Their body fully horizontal at peak stretch, fingertips just grazing the ball as it heads for the top corner. Stadium lights create a natural rim light around the keeper's outline. Deep green pitch and crisp white goal frame provide clear contrast. The camera position remains completely fixed throughout the 5-second sequence."
2. A Roman battlefield
This might be the best AI video I've ever generated using any tool. The cut to the second scene is stunning, and the first scene, with the Roman legion, is like something from a movie. The problem is it doesn't really follow the prompt that well...
The prompt: "A Roman legionary, caked in mud and blood, drops his sword in disbelief. As he sinks to his knees, the camera begins at ground level capturing the sword impacting the earth in slow motion at 120fps, then arcs overhead at normal speed to reveal an entire battlefield frozen in time, except for a soft breeze moving through the grass. End with a rack focus from the lone soldier to the horizon where ravens take flight."
3. Sushi chef (image-to-video)
A common test for AI video is how well it renders something being cut in close-up. The only model I've seen do it properly is Veo 2, but I don't think Kling does a bad job here. The first cut is terrible, with nothing visible, but the second one gets closer.
The prompt: "Initial frame holds for 1.5 seconds, then transition to 96fps slow motion. Track the knife work with continuous lateral movement, maintaining focus on the blade's edge while the background blurs. At the midpoint, split screen vertically - left side continues at 96fps, right side returns to 24fps, showcasing the contrast between the artistic slow motion and the chef's actual speed. Camera gradually elevates to 45° angle, ending with both screens syncing back to 24fps for the final plating flourish."
4. Victorian pocket watch
As a rule I'd normally take the first response, but this was a test of how well I could get Kling to work. I gave it a few attempts, all using the same prompt, and on the third attempt I achieved the desired output.
The prompt: "A Victorian-era pocket watch falls through multiple environments: Start in a wood-panelled study (duration: 2s, warm lighting), transition through a modern subway station (1.5s, fluorescent flicker effect), then a quantum realm with floating mathematical equations (2s, ethereal blue glow), finally landing in desert sand (1s). Camera maintains focus on the watch's face throughout, which should reflect each environment while its hands spin increasingly faster. End with the sand blast from impact revealing the watch is actually a sophisticated time machine."
5. Ballet dancer (image-to-video)
I wanted this to be a ten-second video, but when I tried it at that length the transition wasn't as effective, so I'm sharing the short version. The ballerina doesn't move very much, but the way the scene transforms is stunning.
The prompt: "Begin with image frozen, crane up from floor level at 15° per second. At 45° elevation, initiate move at 60fps while adding 3 subtle motion echoes trailing the dancer by 0.3 seconds each. Camera continues rising to overhead position while rotating clockwise 180°, revealing an impossible scene: the stage below has become a mirror dimension where a counter-choreography happens in reverse. Maintain principal dancer in top third of frame throughout, with dramatic lighting transition from stage spots to cosmic auroras."
6. The Flying Scotsman (image-to-video)
My first attempt at creating a visual of a train failed miserably, producing a head-on view of a train too large for the tunnel. I tried again, starting instead with an image showing an old steam train from above. That is when it worked perfectly.
The prompt: "Flying Scotsman steaming through the Yorkshire Dales, viewed from 200 feet directly above. Begin with a stationary aerial shot as the locomotive approaches, its steam trail creating elegant patterns against the green landscape. Camera remains fixed overhead, letting the train pass directly beneath, steam billowing outward to create concentric patterns in the morning air. The length of the train is revealed in perfect symmetry from above, Brunswick Green carriages contrasting with the pastoral landscape and winding dry stone walls."
7. Potter at work (image-to-video)
Finally, a close-up of hands on clay as a potter crafts a work of art. Kling didn't follow the prompt perfectly, but it did get the structure correct. The pattern wasn't properly captured, but the video did correctly jump from the moulding to the final pot.
The prompt: "Ultra-close macro on hands transforming clay, maintaining 80mm equivalent focal length throughout. Every 30° camera rotation transitions between four distinct cultural pottery styles (Greek, Japanese, Native American, Contemporary) with authentic period-accurate techniques and environmental details reflected in the artist's workspace. Audio-reactive vibrations in the clay respond to a heartbeat sound design that intensifies with each transition. Ending reveals four finished pieces arranged in a ceremonial pattern, with the original artist reflected in each glaze."
Conclusion
Kling v1.6 is the best all-round AI video model I've used, but it does have some of the same issues as other models, such as Runway and Hailuo. It also doesn't match the consistency of the Pika Ingredients feature, but it is better overall than the others.
Its text-to-video model is particularly impressive. Others struggle to attain the level of realism seen in image generators. There is still a long way to go before we get to true realism with AI video, but we're getting close.