Elon Musk's artificial intelligence company, xAI, has unveiled a major update to its AI assistant, Grok. The latest version adds vision capabilities, enabling Grok to analyze and understand images alongside its existing text functions.
Grok could already generate images using the Flux model from Black Forest Labs, but until now it was the last of the major AI chat products without image analysis, also known as AI vision.
With this vision feature, Grok can analyze images linked to posts on the X platform, interpret visual content such as documents, diagrams, and photographs, and understand spatial relationships within images to better describe their contents.
You could use this to come up with recipe ideas based on a photo of ingredients, identify a landmark in a photo shared on X, or even explain what a graph shows. That last use could be particularly helpful on a news-heavy platform like X.
How vision works in Grok
Users will soon notice a new button on posts containing images on the X platform. When clicked, it sends the image to Grok, allowing users to ask questions about the visual content or request an analysis of it. It could also help describe images for people with visual impairments.
We haven't seen independent benchmarks yet, but according to xAI, Grok's vision capabilities hold their own against established models from OpenAI, Google, and Anthropic. To support that claim, the company has introduced a new benchmark, RealWorldQA, designed to evaluate a model's ability to understand and reason about the physical world through images.
The announcement drew mixed reactions from the AI community and users, with some enthusiastic about how quickly Grok is advancing while others remained cautious, questioning how it performs against established AI models.
What comes next for Grok
xAI has a 200,000-GPU data center built solely to train future versions of Grok. I think it's safe to say we're going to see big things from the model in the future.
Vision capabilities in particular could find their way into robots: Musk also runs Tesla, which has its own robotics division. We may also see video and voice analysis come to Grok, as these features are already available in Gemini and ChatGPT.
While this update marks a notable step forward for Grok, the model is clearly still maturing compared with established AI models like Gemini and ChatGPT. As with all rapidly evolving AI technologies, we'll need to keep an eye on both its new capabilities and the ethical questions they raise in the months ahead.