The opinions expressed here are my own.
OpenAI announces new models: o3 and o3-mini
OpenAI’s new model o3, which will undergo public safety testing over the coming weeks, posts large gains over o1 across several benchmarks. On Software Engineering (SWE-bench Verified), it scores 71.7%, compared with 48.9% for o1. On Competition Code (Codeforces), o3 reaches an Elo rating of 2727, against 1891 for o1. The mathematics results are even more impressive: on Competition Math (AIME 2024), o3 scores 96.7%, compared with 83.3% for o1. On PhD-level science questions (GPQA Diamond), o3 scores 87.7%, whereas o1 scores 78.0%. Watch the full introduction here. With o1 previewed in September 2024 and o3 announced in December 2024, reasoning models could help LLMs move past the performance plateau they have recently faced. Wondering why there is no o2 model? OpenAI skipped the name to avoid potential copyright and trademark conflicts with the telecom operator O2.
Is Colossus really up and running with 100,000 Nvidia H100 chips?
xAI is building Colossus, which it describes as the largest AI supercomputer, equipped with 100,000 Nvidia H100 chips. Elon Musk claims that it is already running at full capacity. However, the available information suggests that he may be exaggerating: the data center currently lacks sufficient power from the grid, and connecting all these GPUs so that they work as a “single unit” is difficult with today’s networking technology. Some AI providers, such as OpenAI, have started raising concerns that xAI might have access to more GPU capacity than they do (read). Meanwhile, xAI has just raised another USD 6 billion, giving it a valuation of USD 35-40 billion (read).
Other news
Google has released its Gemini 2.0 Flash Thinking model, which ranked #1 in reasoning capabilities at the time of writing and is available for free. Google’s AI Studio lets users experiment with Gemini prompts using operators and code. DeepMind, which is part of Google, has also released a new benchmark for assessing the factuality of large language models. Meta plans to integrate its video generator into Instagram early next year, enabling users to create personalized videos. It has also published Apollo, a new family of models that can understand and explain videos up to one hour long. OpenAI now lets users call ChatGPT by phone, free for up to 15 minutes per month. Elon Musk said that “Grok 3.0 will be the most powerful A.I. in the world”, and Peter Diamandis predicts that it will reach an IQ above 140 in 2025.
Other readings
> OpenAI has an edge over Google in winning publishers’ business, read
> Data centers are consuming so much energy in the US that they may be distorting the normal flow of electricity for millions of Americans, read
> Nvidia’s Christmas presents, read
