OpenAI to Release an AI Agent “Operator” Soon: What We Know So Far


It’s not every day you hear about an AI tool that could literally take the wheel of your computer and do tasks for you. Yet, that’s exactly what the buzz is around OpenAI’s rumored “Operator.” While OpenAI hasn’t officially confirmed the release date, recent leaks suggest the launch could happen soon. Here’s what Operator is and what we know so far.

Let’s Start with the Big Leak

A software engineer named Tibor Blaho, who’s been pretty accurate about AI product leaks in the past, found some interesting clues in the ChatGPT macOS desktop app: hidden menus that let users set shortcuts for “Toggle Operator” and “Force Quit Operator.”

Why is that such a big deal? Because it aligns with earlier rumors that OpenAI has been working on a secret, agentic system capable of doing complex tasks on your behalf. This has been tentatively dubbed Operator.

What Exactly Is Operator?

Think of Operator as an AI assistant that doesn’t just reply but actually carries out tasks for you on your device. Whether that’s booking flights online, launching apps, or writing and testing code, Operator is expected to handle multi-step tasks without needing constant human input. It’s essentially an AI agent that can “see” and “click around” your computer with little or no human assistance.

In simple terms, Operator automates tasks the way Google Assistant or Alexa do, but far more intelligently. For example, if you ask Operator to “Send an email to John summarizing my recent meeting”:

It searches your notes or even transcribes meeting recordings to extract key points.
It generates a summary.
It personalizes the tone based on previous emails to John.
Finally, it sends an email—possibly after a quick verification from you.

In contrast, Google Assistant or Alexa would require you to provide the subject and body of the email, leaving much of the task in your hands. Of course, that also means Operator might make mistakes, which can be terrifying to think about.
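
To make that flow concrete, here is a rough Python sketch of the four steps with a confirmation gate before anything is sent. Nothing here comes from OpenAI: every function and class name is hypothetical, and the “AI” parts are replaced with simple string handling purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    recipient: str
    subject: str
    body: str


def extract_key_points(notes: str) -> list[str]:
    # Hypothetical stand-in for step 1: treat each non-empty line as a key point.
    return [line.strip("- ").strip() for line in notes.splitlines() if line.strip()]


def summarize(points: list[str]) -> str:
    # Hypothetical stand-in for step 2: join the points into one sentence.
    return "; ".join(points) + "."


def personalize(summary: str, recipient: str, casual: bool) -> str:
    # Hypothetical stand-in for step 3: pick a greeting that matches the tone.
    greeting = f"Hi {recipient}," if casual else f"Dear {recipient},"
    return f"{greeting}\n\nHere is a summary of the meeting: {summary}"


def run_email_task(
    notes: str,
    recipient: str,
    casual: bool,
    confirm: Callable[[Draft], bool],
    send: Callable[[Draft], None],
) -> bool:
    # Steps 1-3: build the draft from the notes.
    body = personalize(summarize(extract_key_points(notes)), recipient, casual)
    draft = Draft(recipient, "Meeting summary", body)
    # Step 4: send only if the user approves the draft.
    if not confirm(draft):
        return False
    send(draft)
    return True


if __name__ == "__main__":
    outbox: list[Draft] = []
    notes = "- Budget approved\n- Launch moved to March"
    run_email_task(notes, "John", True, confirm=lambda d: True, send=outbox.append)
    print(outbox[0].body)
```

The point of the `confirm` callback is the “quick verification” step: if the agent misreads your notes, you get a chance to catch it before the email leaves your outbox.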

OpenAI isn’t alone; other AI companies are developing their own agents. For example, Google is working on Project Mariner, which is designed to perform tasks within the Chrome browser. Similarly, Anthropic has introduced Claude Computer Use, which can currently control a virtual PC. Even open-source developers are joining the push toward AI agents.

Performance Leaks: The Good vs. The Not-So-Good

While the idea of having a personal digital assistant that never sleeps is exciting, the reality—at least for now—is that it’s far from perfect. According to leaked benchmarks (uncovered by Blaho):

OpenAI’s Computer Use Agent (CUA), which could be the core AI model running Operator, scores 38.1% on a test called OSWorld. Humans reportedly score 72.4% on the same test.
On WebVoyager, which measures how well an AI navigates and interacts with websites, Operator actually surpasses human scores. That’s promising!
However, on WebArena, another web-based benchmark, it doesn’t quite reach human-level performance.
In tasks like signing up for a cloud provider and launching a virtual machine, Operator only succeeded 60% of the time.
For creating a Bitcoin wallet, it was successful just 10% of the time.

OpenAI website already has references to Operator/OpenAI CUA (Computer Use Agent) – "Operator System Card Table", "Operator Research Eval Table" and "Operator Refusal Rate Table"

Including comparison to Claude 3.5 Sonnet Computer use, Google Mariner, etc.

(preview of tables… pic.twitter.com/OOBgC3ddkU

— Tibor Blaho (@btibor91) January 20, 2025

So, it’s definitely still a work in progress. Imagine telling Operator to book you a flight to New York and it ends up sending you to Toronto instead. That could happen—though hopefully these kinks get worked out before a public release.

So, When Can We Expect It?

Sources like TechCrunch and The Information have hinted that OpenAI has been targeting January for an Operator release (or at least a research/developer preview). While nothing official has been announced, seeing these hidden settings pop up in the macOS app suggests we might be close.

Could it be delayed? Absolutely. AI tools of this caliber aren’t trivial to build. Plus, there’s talk that OpenAI wants to ensure the tool is robust and safe before unleashing it on the world.

What About Safety and Privacy Concerns?

Any AI tool that can control parts of your computer and make purchases on your behalf is bound to raise eyebrows. The rumor mill says that Operator’s lengthy development cycle might be tied to safety testing—and for good reason.

One of the leaked charts allegedly shows Operator performing well on safety evaluations designed to see if it can be tricked into doing something malicious, like searching for sensitive personal data. But, as with any advanced AI system, there’s always a risk.

Some experts worry that if Operator (and competing AI agents) get too powerful, they could be manipulated into nefarious tasks. That’s probably why OpenAI co-founder Wojciech Zaremba recently took a swipe at Anthropic for releasing their AI agent Computer Use, claiming it lacked proper safety mitigations.

A Word on macOS vs Windows

So far, all the major clues about Operator have come from the macOS desktop app for ChatGPT. So what about Windows? Plenty of folks on social media have voiced concerns that the Windows app is getting less attention, especially considering Microsoft is a huge investor in OpenAI.

Ravi Teja KNTS

Tech writer with over 4 years of experience at TechWiser, where he has authored more than 700 articles on AI, Google apps, Chrome OS, Discord, and Android. His journey started with a passion for discussing technology and helping others in online forums, which naturally grew into a career in tech journalism. Ravi's writing focuses on simplifying technology, making it accessible and jargon-free for readers. When he's not breaking down the latest tech, he's often immersed in a classic film – a true cinephile at heart.
