I tested ChatGPT's free new o3-mini model with 7 prompts to rate its problem-solving and reasoning capabilities — here’s what happened

OpenAI's o3-mini model is now part of the free tier of ChatGPT, which lets users take full advantage of a significant advancement in AI, particularly for tasks requiring complex reasoning and problem-solving.

Building upon the foundation laid by its predecessors, the o3-mini model introduces enhanced capabilities that set it apart.

The o3 model excels in tasks that demand step-by-step logical reasoning. Essentially, o3-mini has a "private chain of thought" approach, planning and reasoning through tasks, then performing intermediate steps to assist in problem-solving. This method results in more accurate and reliable outputs, especially in complex scenarios.

The o3-mini is a streamlined version of the o3 model, offering higher rate limits and lower latency, making it a compelling choice for coding, STEM and logical problem-solving tasks. It replaces the o1-mini model in the ChatGPT interface, providing users with improved performance for free.

This accessibility allows a broader audience to benefit from the model's enhanced performance.

o3 scores a 2727 ELO on Codeforces which places it 175th in the global ranking. That’s better than ~99.9% of humans on the website (who already tend to be far above average). pic.twitter.com/VGXeQ525nL December 20, 2024

Upgraded performance in coding and mathematics

In coding tasks, o3 has demonstrated exceptional proficiency. It achieved an Elo score of 2,727 on the Codeforces competitive programming platform, placing it among the top 2,500 programmers globally. Additionally, o3 scored 71.7% on the SWE-bench Verified benchmark, which assesses the ability to solve real-world software issues, outperforming its predecessor, o1, which scored 48.9%.

o3 also excels in scientific and mathematical benchmarks, achieving a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online. On the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, o3 attained three times the accuracy of o1, showcasing its advanced reasoning capabilities.

Prompts to try with o3-mini

For those looking to see where the o3-mini model truly shines, consider experimenting with the following queries, or similar ones that explore coding, math, and STEM tasks. Here’s a look at what happened when I put the o3-mini model to the test with seven varied prompts.

1. Coding challenge

Prompt: "Write a Python script that simulates a basic banking system with functionalities to deposit, withdraw and check balance."

This prompt is excellent for testing o3-mini because it combines multiple aspects of programming—from OOP and control structures to input validation and error handling—into one cohesive example. It challenges the model to produce a complete, functional, and well-structured piece of software, which is a solid measure of its code generation capabilities.

The prompt is not only a test of code generation but also serves as a learning tool. It provides a concrete example that can help users understand how to design and implement basic banking functionality in Python. This dual purpose of being both a test case and an educational example makes it useful and simple enough for even casual users to understand and implement.
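For readers who want a sense of what a reasonable answer to this prompt looks like, here is a minimal sketch. It is not the model's actual output; the class name `BankAccount`, the exception `InsufficientFundsError`, and the sample amounts are all illustrative assumptions.

```python
class InsufficientFundsError(Exception):
    """Raised when a withdrawal exceeds the available balance (illustrative name)."""


class BankAccount:
    """A hypothetical account supporting deposit, withdraw and balance checks."""

    def __init__(self, owner, balance=0.0):
        if balance < 0:
            raise ValueError("Opening balance cannot be negative.")
        self.owner = owner
        self._balance = float(balance)

    def deposit(self, amount):
        # Reject zero or negative deposits up front.
        if amount <= 0:
            raise ValueError("Deposit amount must be positive.")
        self._balance += amount
        return self._balance

    def withdraw(self, amount):
        # Validate the amount first, then check the funds available.
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive.")
        if amount > self._balance:
            raise InsufficientFundsError("Not enough funds for this withdrawal.")
        self._balance -= amount
        return self._balance

    def check_balance(self):
        return self._balance


if __name__ == "__main__":
    account = BankAccount("Alex", 100)  # sample values, chosen arbitrarily
    account.deposit(50)
    account.withdraw(30)
    print(f"Balance: ${account.check_balance():.2f}")  # prints "Balance: $120.00"
```

Even at this size, the example touches the areas the prompt is meant to probe: a small class, input validation, and explicit error handling for overdrafts.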

2. Mathematical proof

Prompt: "Prove the Pythagorean theorem using a geometric approach."

This prompt requires a blend of logical sequencing, mathematical rigor, clear communication, and integration of different types of reasoning. It demonstrates the o3-mini model's ability to handle complex, multi-faceted tasks as it successfully generated a clear and correct geometric proof of the Pythagorean theorem.
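As a point of comparison, one classic geometric argument is the rearrangement proof, sketched below. It is not necessarily the proof o3-mini produced. It assumes a right triangle with legs a and b and hypotenuse c, with four copies of the triangle arranged inside a square of side a + b so that they leave a tilted inner square of side c.

```latex
% Area of the large square, computed two ways:
% directly, and as four triangles plus the inner square.
\begin{align*}
(a + b)^2 &= 4 \cdot \tfrac{1}{2}ab + c^2 \\
a^2 + 2ab + b^2 &= 2ab + c^2 \\
a^2 + b^2 &= c^2
\end{align*}
```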

3. Scientific explanation

Prompt: "Explain the process of photosynthesis in detail."

This prompt makes evident the o3-mini model’s ability to cover a broad range of scientific concepts and to recall, organize, and articulate a multi-step process.

The logically organized, detailed response was clearly presented and flowed coherently. This prompt showcases the model’s ability to relay deep scientific knowledge and the ability to integrate interdisciplinary concepts into a cohesive explanation.

4. Historical analysis

Prompt: "Analyze the causes and effects of the French Revolution."

This prompt requires the integration of interdisciplinary historical knowledge, structured and coherent writing, and critical analysis of complex cause-and-effect relationships. That makes it an ideal test of the o3-mini model’s ability to generate accurate, detailed, and educationally valuable content on a multifaceted historical topic.

This prompt showcases how the o3-mini model can be used for educational or teaching purposes.

5. Literary critique

Prompt: "Provide a critical analysis of Shakespeare's 'Hamlet' focusing on its themes of madness and revenge."

The prompt requires a deep and critical analysis of Hamlet, focusing on multifaceted themes like madness and revenge. This tests the model’s ability to engage in high-level literary criticism, synthesizing various elements of the text to produce an insightful analysis.

This model successfully addressed the complex academic task and expertly produced a nuanced, well-supported argument about intricate themes in literature.

6. Philosophical discussion

Prompt: "Discuss the concept of utilitarianism and its implications in modern ethics."

By asking for both a discussion of utilitarianism as a concept and its implications in modern ethics, the prompt challenges the model to bridge historical philosophical theories with contemporary ethical issues. This demonstrates the model’s capacity to synthesize information across different time periods and contexts.

This prompt, and others like it, test the abstract reasoning ability of o3-mini. It also highlights the model’s ability to analyze critically, understand historical content, and apply ideas in practice, all of which are essential for generating an informative and nuanced response on complex ethical topics.

7. Urban planning

Prompt: "Design an integrated strategy to optimize urban transportation in a rapidly growing megacity. Your plan should address the following aspects."

This prompt effectively showcases the model’s problem-solving and complex reasoning abilities. The query requires an integrated, multifaceted solution that mirrors the challenges encountered in real-world scenarios, in this case, planning within an urban environment.

The prompt also dives deep into the o3-mini's ability to understand many "moving parts" including environmental science, technology, and socio-economics. Although I did not show the script of the model "thinking," it did take the time to thoughtfully process a response before offering a detailed, step-by-step plan and the rationale behind the solution.

Final thoughts

OpenAI's o3-mini model represents a significant advancement in AI, offering enhanced reasoning and problem-solving capabilities across various domains. Its integration into ChatGPT's free tier democratizes access to advanced AI tools, empowering users to tackle complex tasks with greater efficiency. By experimenting with diverse prompts, users can fully appreciate the model's versatility and potential.