AI agents are one of the latest trends in the generative AI space, and Google may be prepping its own agent as a feature of an upcoming Gemini large language model (LLM).
The development, called Project Jarvis, is an AI agent based within the Google Chrome browser that will be able to execute common tasks from a short query or command, with more independence than current models offer. The inclusion of AI agents in the next Chrome update has the potential to be the biggest overhaul of the browser since it launched in 2008, according to The Information.
Google has already demonstrated how Gemini can serve as a shopping companion or trip planner with its current Gemini 1.5 Pro model, which powers the Gemini Advanced chatbot. With a future model, Project Jarvis would close the loop, tackling tasks such as visiting websites, filling out forms, and making payments in order to complete a user's request.
For example, a Gemini 1.5 Pro model would be able to execute the query “plan me a vacation in December with a $2,000 budget”; Project Jarvis would likely be able to execute the query “plan me a vacation in December with a $2,000 budget. Book the flights and hotel and send the details to my email,” Tom’s Guide noted.
Jarvis, or “Just A Rather Very Intelligent System,” is a nod to Tony Stark’s AI assistant in Iron Man. While the feature's official name remains unknown, the codename is fitting as Google adds capabilities to its model.
Google has been sharing details about AI agents since May, when it noted at its Google I/O developer conference that the technology would assist in processing speech and video content and enable faster response times in conversations.
As mentioned, Project Jarvis may be among the features of Google’s next LLM, which is rumored to be Gemini 2.0. The model may be announced in early December, but there is no word on how widely it will be available, The Information noted.
As Google potentially prepares to release its own AI agent, many other brands have already showcased their own takes on similar functionality. Anthropic recently revealed its Computer Use capability, which allows its Claude LLM to operate a computer independently to complete tasks; the feature is currently in beta. Similarly, OpenAI’s Swarm framework coordinates a system of agents that work together to complete tasks, though the company says the technology remains a research and educational experiment.
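For readers curious what a "system of agents that work together" looks like in practice, below is a minimal sketch adapted from the quickstart in OpenAI's public Swarm repository (github.com/openai/swarm). It assumes the experimental package is installed and an OpenAI API key is configured; the agent names here are purely illustrative.

```python
# Minimal multi-agent handoff sketch, adapted from the quickstart in
# OpenAI's public Swarm repository. Assumes:
#   pip install git+https://github.com/openai/swarm.git
# and an OPENAI_API_KEY set in the environment.
from swarm import Swarm, Agent

client = Swarm()

def transfer_to_agent_b():
    # Returning another Agent from a tool function hands the
    # conversation off to that agent.
    return agent_b

agent_a = Agent(
    name="Agent A",
    instructions="You are a helpful agent.",
    functions=[transfer_to_agent_b],
)

agent_b = Agent(
    name="Agent B",
    instructions="Only speak in haikus.",
)

# Swarm routes the request to agent_a, which hands off to agent_b.
response = client.run(
    agent=agent_a,
    messages=[{"role": "user", "content": "I want to talk to Agent B."}],
)
print(response.messages[-1]["content"])
```

The handoff-by-return-value pattern is the framework's core idea: each agent is a lightweight bundle of instructions and functions, and control passes between agents as those functions execute.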