AI researchers at Stanford and the University of Washington were able to train an AI “reasoning” model for under $50 in cloud compute credits, according to a new research paper released last Friday.
The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI’s o1 and DeepSeek’s R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.
The team behind s1 said they started with an off-the-shelf base model, then fine-tuned it through distillation, a process to extract the “reasoning” capabilities from another AI model by training on its answers.
The researchers said s1 is distilled from one of Google’s reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.
To some, the idea that a few researchers without millions of dollars behind them can still innovate in the AI space is exciting. But s1 raises real questions about the commoditization of AI models.
Where’s the moat if someone can closely replicate a multimillion-dollar model with relative pocket change?
Unsurprisingly, big AI labs aren’t happy. OpenAI has accused DeepSeek of improperly harvesting data from its API for the purposes of model distillation.
The researchers behind s1 were looking to find the simplest approach to achieve strong reasoning performance and “test-time scaling,” or allowing an AI model to think more before it answers a question. These were a few of the breakthroughs in OpenAI’s o1, which DeepSeek and other AI labs have tried to replicate through various techniques.
The s1 paper suggests that reasoning models can be distilled with a relatively small dataset using a process called supervised fine-tuning (SFT), in which an AI model is explicitly instructed to mimic certain behaviors in a dataset.
SFT tends to be cheaper than the large-scale reinforcement learning method that DeepSeek employed to train its competitor to OpenAI’s o1 model, R1.
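For readers who want to see what "mimicking a dataset" means in practice, here is a minimal sketch of the usual SFT objective: the average negative log-likelihood the student model assigns to the teacher's tokens. It is a generic illustration, not the s1 team's training code. The function name `sft_loss`, the toy probabilities, and the choice to skip the prompt tokens are all assumptions made for this example.

```python
import math

def sft_loss(token_logprobs, is_target):
    """Average negative log-likelihood over the teacher-written tokens only.

    token_logprobs: log-probability the student gave to each reference token.
    is_target: True where the token came from the teacher (assumed convention:
               prompt tokens are skipped so the model is not trained on them).
    """
    picked = [-lp for lp, keep in zip(token_logprobs, is_target) if keep]
    if not picked:
        raise ValueError("no target tokens to train on")
    return sum(picked) / len(picked)

# Toy example: two prompt tokens (skipped) followed by three teacher tokens.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.25), math.log(0.5)]
mask = [False, False, True, True, True]
loss = sft_loss(logprobs, mask)
```

Training lowers this number by nudging the student to assign higher probability to the teacher's tokens, which is why a small, well-chosen dataset can go a long way.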
Google offers free access to Gemini 2.0 Flash Thinking Experimental, albeit with daily rate limits, via its Google AI Studio platform.
Google’s terms forbid reverse-engineering its models to develop services that compete with the company’s own AI offerings, however. We’ve reached out to Google for comment.
S1 is based on a small, off-the-shelf AI model from Alibaba-owned Chinese AI lab Qwen, which is available to download for free. To train s1, the researchers created a dataset of just 1,000 carefully curated questions, paired with answers to those questions as well as the “thinking” process behind each answer from Google’s Gemini 2.0 Flash Thinking Experimental.
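One way to picture a single entry in such a dataset is sketched below: a question, the teacher model's "thinking," and its final answer, flattened into one training string. The field names, the `<think>` delimiters, and the sample question are hypothetical and chosen for illustration; the article does not describe the actual file format the researchers used.

```python
from dataclasses import dataclass

@dataclass
class DistillationExample:
    question: str   # the curated question
    thinking: str   # the teacher model's reasoning trace
    answer: str     # the teacher model's final answer

def to_training_text(ex, think_open="<think>", think_close="</think>"):
    """Flatten one example into a single string (delimiters are hypothetical)."""
    return (
        f"Question: {ex.question}\n"
        f"{think_open}\n{ex.thinking}\n{think_close}\n"
        f"Answer: {ex.answer}"
    )

ex = DistillationExample(
    question="What is 7 * 6?",
    thinking="7 * 6 is 7 * 5 + 7, which is 35 + 7 = 42.",
    answer="42",
)
text = to_training_text(ex)
```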
After training, which took less than 30 minutes on 16 Nvidia H100 GPUs, s1 achieved strong performance on certain AI benchmarks, according to the researchers. Niklas Muennighoff, a Stanford researcher who worked on the project, told TechCrunch he could rent the necessary compute today for about $20.
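A quick back-of-the-envelope check, using only the figures quoted above, shows what those numbers imply. Treating 30 minutes as an upper bound on training time, the run used at most 8 GPU-hours, so a $20 bill works out to roughly $2.50 per H100-hour or more. The variable names are illustrative, and the hourly rate is derived here, not quoted by the researchers.

```python
gpus = 16            # Nvidia H100 GPUs, per the researchers
hours = 0.5          # "less than 30 minutes", so this is an upper bound
quoted_cost = 20.0   # dollars, per Muennighoff's estimate

gpu_hours = gpus * hours                # at most 8 GPU-hours
implied_rate = quoted_cost / gpu_hours  # at least $2.50 per GPU-hour

print(f"{gpu_hours:.1f} GPU-hours, about ${implied_rate:.2f} per GPU-hour")
```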
The researchers used a nifty trick to get s1 to double-check its work and extend its “thinking” time: they told it to wait. Adding the word “wait” during s1’s reasoning helped the model arrive at slightly more accurate answers, per the paper.
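The trick can be sketched as a small loop around the model, assuming the word is appended at the point where the model would otherwise stop reasoning. Everything here is a stand-in: `think_with_wait` and `toy_model` are hypothetical names, and the canned replies replace a real language model so the example runs on its own.

```python
def think_with_wait(continue_thinking, prompt, extra_rounds=2, nudge="Wait"):
    """Extend a reasoning trace by appending a nudge word each time the model stops.

    continue_thinking: callable that takes the text so far and returns the
                       model's next chunk of reasoning (hypothetical interface).
    """
    trace = continue_thinking(prompt)
    for _ in range(extra_rounds):
        trace += f" {nudge},"                        # nudge the model to keep going
        trace += continue_thinking(prompt + trace)   # let it reconsider its work
    return trace

def toy_model(text):
    """Stand-in for a real model: picks a canned reply based on nudges seen so far."""
    canned = [
        " 2 + 2 * 3 = 12.",
        " multiplication comes first, so 2 + 6 = 8.",
        " checking again: 2 * 3 = 6 and 2 + 6 = 8.",
    ]
    return canned[min(text.count("Wait,"), len(canned) - 1)]

trace = think_with_wait(toy_model, "What is 2 + 2 * 3?")
```

In this toy run the first pass gets the arithmetic wrong and the nudge gives the model a chance to catch it. Real gains, per the paper, were modest: slightly more accurate answers.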
In 2025, Meta, Google, and Microsoft plan to invest hundreds of billions of dollars in AI infrastructure, which will partially go toward training next-generation AI models.
That level of investment may still be necessary to push the envelope of AI innovation. Distillation has proven to be a good method for cheaply recreating an AI model’s capabilities, but it doesn’t create new AI models vastly better than what’s available today.
Maxwell Zeff is a senior reporter at TechCrunch specializing in AI and emerging technologies. Previously with Gizmodo, Bloomberg, and MSNBC, Zeff has covered the rise of AI and the Silicon Valley Bank crisis. He is based in San Francisco. When not reporting, he can be found hiking, biking, and exploring the Bay Area’s food scene.