Grok gets eyes — X-based chatbot can now analyze images



Elon Musk's artificial intelligence company, xAI, has unveiled a major new update to its AI assistant called Grok. The latest iteration now incorporates vision capabilities, enabling Grok to analyze and comprehend images, alongside its existing text functionalities.

Grok could already generate images using the Flux model from Black Forest Labs, but until now it was the last of the major AI chat products without image analysis, also known as AI vision.

With the introduction of this vision feature, Grok can analyze images linked to posts on the X platform, interpret visual content such as documents, diagrams, and photographs and understand spatial relationships within images to help better describe the contents.

You could use this to come up with recipe ideas based on a photo of ingredients, identify the location of a landmark in a photo shared on X or even explain the results of a graph. The last part could be particularly useful on a news-heavy platform like X.

How vision works in Grok

Users will soon notice a new button on X posts that contain images. When clicked, it sends the image to Grok, allowing users to ask questions about the visual content or request an analysis of it. It could also be used to describe images for people with visual impairments.

We haven’t seen official benchmarks yet, but according to xAI, Grok's vision capabilities hold their own against established models from OpenAI, Google and Anthropic. To this end, the company has introduced a new benchmark, RealWorldQA, designed to evaluate the model’s proficiency in understanding and reasoning about the physical world through images.

The announcement drew varied reactions from the AI community and users, with some enthusiastic about how fast Grok is advancing while others remained cautious, questioning its performance against established AI models.


What comes next for Grok

Elon Musk-owned xAI has a 200,000-GPU data center built for the sole purpose of training future versions of Grok. I think it's safe to say we’re going to see big things from the model in the future.

The vision capabilities in particular could find their way into robots. Musk also leads Tesla, which has its own robotics division. In the future, we may also see video and voice analysis from Grok, as these features are already in place with Gemini and ChatGPT.

While this update marks a notable advancement for Grok, it's clear that the model is still in development compared to more mature AI models like Gemini or ChatGPT. As with all rapidly evolving AI technologies, we'll need to monitor both the upgraded capabilities and the ethical considerations of these developments in the months ahead.


Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?
