The week on AI – December 29, 2024

The opinions expressed here are mine alone.

OpenAI releases new models: o3 and o3-mini

OpenAI’s new model o3, which will undergo public safety testing over the coming weeks, seems poised to break all benchmarks:

> Software Engineering (SWE-bench Verified): o3 scores 71.7, compared to o1’s 48.9
> Competition Code (Codeforces): o3 achieves 2727, while o1 scores 1891
> Competition Math (AIME 2024): o3 scores 96.7, compared to o1’s 83.3
> PhD-level Science Questions (GPQA Diamond): o3 receives 87.7, whereas o1 scores 78.0

Watch the full introduction here. With o1 previewed in September 2024 and o3 following in December 2024, reasoning models could address the plateauing that LLMs have faced recently. And if you are wondering why there is no o2 model: it’s because of potential trademark conflicts with the telecom operator O2.

Is Colossus really up and running with 100,000 Nvidia H100 chips?

xAI is building what would be the largest AI supercomputer, equipped with 100,000 Nvidia H100 chips. Elon Musk claims it is already operational at full capacity, but the available information suggests he may be exaggerating: the data center currently lacks the necessary power capacity from the grid, and connecting that many GPUs so they function as a “single unit” is not easy with today’s networking technology. Some AI providers, such as OpenAI, have started raising concerns that xAI might have access to more GPU capacity than they do. Read. Separately, xAI just raised another USD 6 billion, giving it a valuation of USD 35-40 billion. Read

Other news

> Google has released its Gemini 2.0 Flash Thinking model, ranked #1 in reasoning capabilities (note: is that still true as I publish this?), and it is free. Google’s AI Studio allows users to experiment with Gemini prompts using operators and code.
> DeepMind, which is part of Google, has released a new benchmark for assessing the factuality of large language models.
> Meta is planning to integrate its video generator into Instagram early next year, enabling users to create personalized videos, and has published Apollo, a new family of models that can understand and explain videos of up to one hour in length.
> OpenAI now offers a free 15-minute monthly phone call to ChatGPT.
> Elon Musk said that “Grok 3.0 will be the most powerful AI in the world,” and Peter Diamandis predicts that it will reach an IQ exceeding 140 in 2025.

Other readings

> OpenAI has an edge over Google in winning publishers’ business, read
> Data centers are consuming so much energy in the US that they may be distorting the normal flow of electricity for millions of Americans, read
> Nvidia’s Christmas presents, read

The week on AI – December 22, 2024

Nvidia Blackwell chips face ongoing challenges

The new Nvidia Blackwell chip appears to be running into one challenge after another. Following design flaws that delayed its release, the chip is now facing overheating issues, making the servers less reliable and reducing their performance. Nvidia has asked its suppliers to modify the design of the 72-chip racks multiple times, causing anxiety among customers about potential further delays. Delays may worsen because large cloud providers need to customize the racks to fit into their vast data centers, and Nvidia seems to be facing the same challenges with the smaller 36-chip racks. In the meantime, customers have decided to buy more Hopper chips.

Nvidia becoming a cloud and AI software provider

Nvidia has been quietly building its own cloud and AI software business (Nvidia AI Enterprise) and is already close to generating USD 2 billion in annual revenue. This is not surprising when we know that all major cloud providers (e.g., Microsoft, AWS, Google) are developing their own AI chips to become less dependent on Nvidia. The AI Enterprise suite includes the tools and frameworks needed to accelerate AI development and deployment, including but not limited to PyTorch and TensorFlow for deep learning, NVIDIA RAPIDS for data science, TAO for model optimization, industry-specific solutions, NVIDIA Riva for speech AI and translation, and much more. But don’t be mistaken: Nvidia is still far behind the major cloud providers and will continue to operate Nvidia DGX Cloud, its AI supercomputing service, on the infrastructure of its competitors. Does Nvidia have an edge over other big tech firms thanks to its proximity to AI hardware? Some believe so. Nvidia still has a long way to go before becoming a cloud and AI software provider, but it definitely has the means to succeed, and that could become another major revenue stream.
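As an illustration of one layer of that stack, here is a minimal sketch of RAPIDS in use. It assumes a machine with an Nvidia GPU and the cuDF library installed; the toy column names are mine, not from any Nvidia example.

```python
import cudf  # RAPIDS GPU DataFrame library; requires an Nvidia GPU

# cuDF deliberately mirrors the pandas API, so a groupby like this
# runs on the GPU with essentially the same code a pandas user writes.
df = cudf.DataFrame({
    "region": ["eu", "us", "eu", "us"],
    "sales": [10.0, 20.0, 15.0, 5.0],
})
print(df.groupby("region")["sales"].mean())
```

That it looks exactly like pandas is the point: the AI Enterprise pitch is less about new APIs than about GPU-accelerated, drop-in replacements for familiar tools.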

Apple moving into AI chips with Broadcom

Apple is working with Broadcom to develop its own AI chips for servers, aiming for mass production by 2026. These chips are expected to be used internally rather than entering the consumer market, highlighting Apple’s effort to reduce reliance on Nvidia and other competitors. This mirrors a broader industry shift, as many tech companies seek to create custom AI processors to cut their dependence on Nvidia. However, designing AI chips is a complex undertaking, and most firms continue to rely heavily on Nvidia, with Google being a notable exception. In most cases, tech companies collaborate with chip makers to leverage their intellectual property, design services, and manufacturing capabilities. The Apple-Broadcom deal appears to differ from those arrangements: Apple seems to still be managing chip production itself with TSMC. Read

Other readings

> A look at why the world’s powers are locked in a battle over computer chips. How will Europe continue to compete against China from an investment perspective? read
> Broadcom chief Hock Tan says AI spending frenzy to continue until end of decade, read
> Perplexity’s value triples to USD 9 billion in latest funding round for AI search engine, read; read more about Perplexity here

The week on AI – November 17, 2024

Are LLMs reaching a plateau?

The reasoning capabilities of LLMs may be reaching a plateau, suggesting that the scaling laws might be hitting a limit. Scaling laws, which are empirical observations rather than proper laws (like Moore’s Law), describe how machine learning models improve as a function of resource allocation, such as compute power, dataset size, or model parameters. Reports suggest that OpenAI’s upcoming model, Orion, is showing only modest improvements over GPT-4, falling short of the significant leaps seen in earlier model iterations. The industry is beginning to exhaust its data for training LLMs, and legal disputes over copyright are escalating. The use of synthetic data generated by AI presents its own set of challenges, and computing power is not limitless, even in the cloud, forcing hard decisions on LLM developers like OpenAI. The industry is working to overcome these challenges by developing new training approaches that align more closely with human thinking, an approach already used in the development of OpenAI’s o1 model.
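To make the idea concrete, here is a small sketch of one well-known empirical scaling law, the “Chinchilla” loss formula from Hoffmann et al. (2022). The constants are the fitted values commonly quoted from that paper and should be treated as illustrative, not authoritative.

```python
# Chinchilla-style scaling law: predicted training loss as a function of
# parameter count N and training tokens D (Hoffmann et al., 2022).
# Constants are the commonly quoted fitted values from the paper.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Power-law decay in both N and D: each term shrinks as resources
    grow, but with sharply diminishing returns."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Example: a 70B-parameter model trained on 1.4T tokens (Chinchilla-like).
print(predicted_loss(70e9, 1.4e12))   # ~1.94
# Doubling the data buys only a small further drop in predicted loss:
print(predicted_loss(70e9, 2.8e12))   # ~1.91
```

The diminishing-returns shape of those power laws is exactly why a plateau is plausible: each additional order of magnitude of data or parameters buys less improvement than the last.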

Google DeepMind has a new way to look inside AI models

As previously discussed, we currently do not fully understand how AI models work internally. Google DeepMind has taken on this challenge by introducing Gemma Scope, a collection of open, sparse autoencoders (SAEs) aimed at providing insights into the internal workings of language models. This research falls under the category of mechanistic interpretability. To better control AI, we will need to refine our approaches further, balancing the need to reduce or eliminate undesirable behaviors (like promoting violence) against preserving the model’s overall knowledge. Removing undesirable knowledge is a complex task, whether it involves information that should not be widely disseminated (such as bomb-making instructions) or knowledge that is simply incorrect on the internet. Mechanistic interpretability has the potential to enhance our understanding of AI, helping ensure that it is both safe and beneficial. Read
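For intuition, here is a minimal sparse autoencoder sketch in PyTorch. It shows the standard ReLU-plus-L1 formulation rather than Gemma Scope’s exact architecture (which uses a JumpReLU variant), and all dimensions are placeholder values.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Maps a model's internal activations into a wider, mostly-zero
    feature space and back, so individual features can be inspected."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps only positively firing features, so most entries
        # of `features` end up zero: that sparsity is what makes each
        # feature a candidate for a human-interpretable concept.
        features = torch.relu(self.encoder(activations))
        return features, self.decoder(features)

def sae_loss(x, features, recon, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes
    # feature activations toward zero (encouraging sparsity).
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()

# Toy usage with placeholder sizes: 2304 could be a residual-stream
# width, 16384 the number of candidate features to learn.
sae = SparseAutoencoder(d_model=2304, d_features=16384)
x = torch.randn(8, 2304)
feats, recon = sae(x)
print(sae_loss(x, feats, recon))
```

Once trained on a model’s activations, individual features can be inspected, and ideally steered or ablated, which is what makes SAEs attractive for removing specific behaviors without retraining the whole model.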

Elevating AI-coding to the next level

In a crowded landscape of AI coding tools such as GitHub Copilot, Codeium, Replit, and Tabnine, most options function primarily as coding assistants. Tessl aims to elevate AI-based coding to the next level. It envisions a future where software developers transition into roles more akin to architects or product managers, letting artificial intelligence handle the majority of the coding. Judging from the proposal on their website, Tessl is not attempting to turn everyone into a developer (at least not yet). Their tool will still target developers but will let them define what they want to build, while the Tessl AI defines the internal architecture of the solution and develops it. Let’s see how far they can push the concept. They have just raised another USD 100 million, making them worth a reported USD 750 million. Read

Other readings

> Inside Elon Musk’s Colossus supercomputer, watch (no content guarantee)
> Amazon to develop its own AI chips to take on Nvidia, read
> Nvidia’s message to global chipmakers, read
> A.I. Chatbots Defeated Doctors at Diagnosing Illness, Read

The week on AI – November 10, 2024

It’s too soon to call the hype on Artificial Intelligence

Predicting the future of technology has always been a challenge. It’s likely that optimists will face disappointment in the short term, while pessimists, some of whom are even predicting the end of humanity, may also end up being wrong. In other words, as per Amara’s law, we tend to overestimate the effect of a technology in the short run and underestimate it in the long run. New technologies can take decades to enhance productivity and often follow the J-curve pattern described by some economists, in which productivity initially declines before experiencing significant growth. According to Carlota Perez, this was true for the industrial revolution (1770), the steam and railway age (1830), and the electricity and engineering age (1870). Perez sees Artificial Intelligence as a revolutionary technology, not a technological revolution: AI depends on powerful microprocessors, computers, and the Internet, and she argues it is better seen as a key development of the ICT (information and communication technology) revolution that started in the 1970s. Read the whole essay here.

To fully exploit AI, new infrastructure will have to be built, new ways of working developed, and new products and services launched. But AI seems to be on a much faster trajectory than any technology before it, so we might not have to wait a decade. Read

ChatGPT is competing with Google for search

ChatGPT has introduced search capabilities that will compete with Google and startups like Perplexity. The search feature is directly accessible in the ChatGPT interface: the AI decides when to use the internet and when to rely on its internal knowledge, but users can also prompt it to perform a web search. To further enhance its search capabilities, ChatGPT is also developing long-term memory functionality. For now, Google’s search results appear to be more accurate, and Perplexity still seems better than ChatGPT at presenting source references. The ChatGPT search feature is not yet available on the free plan, but it should be over the coming months. Competition for the search market is definitely on. Use

The battle for the AI stack

Programming GPUs has historically been complicated, but this changed with the release of Nvidia’s CUDA platform in 2006, which abstracts the GPU’s complexity away from developers. CUDA is a general-purpose platform that allows C code to run on Nvidia GPUs, making it easier for programmers to use these powerful processors that are necessary for AI. Most AI engineers and researchers prefer Python, often with libraries such as PyTorch or Google’s JAX; under the hood, PyTorch calls into CUDA, which runs the computation on the GPU. The industry is now exploring alternatives to CUDA: AMD has introduced its ROCm platform (Radeon Open Compute), and Google has released XLA (Accelerated Linear Algebra), designed for its TPU (Tensor Processing Unit) chips. The key point is that developing artificial intelligence relies not only on the chips created by Nvidia, AMD, and others but also on the software platforms that support those chips, which are just as crucial as the hardware. Nvidia is definitely ahead, but the CUDA platform is getting some serious competition. Read
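This layering is easy to see from Python. A minimal sketch: the same PyTorch code runs on whatever back end is available, and notably AMD’s ROCm builds of PyTorch expose the GPU through the same "cuda" device string, precisely so existing CUDA-targeting code keeps working.

```python
import torch

# PyTorch abstracts the vendor platform: on an Nvidia GPU this
# dispatches to CUDA kernels; on a ROCm build of PyTorch, the same
# "cuda" device string targets AMD GPUs instead.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b  # the matrix multiply runs as a GPU kernel when a GPU is present
print(c.device)
```

The Python never mentions CUDA, ROCm, or XLA by name; that is the abstraction the whole battle for the AI stack is about.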

Other readings

> TSMC to close door on producing advanced AI chips for China from Monday, read
> Salesforce to Hire 1,000 People for AI Product Sales Push, read
> Evident AI index for banks, read

The week on AI – October 27, 2024

Perplexity AI search start-up targets USD 8bn valuation

Perplexity AI is an AI-powered search engine that leverages large language models to deliver fast, accurate answers to user queries. The company positions itself as a user-focused alternative to traditional search engines like Google, aiming to provide a more streamlined and informative search experience without relying on advertisements. Perplexity differentiates itself by offering concise summaries of search results with citations, enabling users to easily verify information and avoid the often overwhelming presence of sponsored content found on other platforms. Driven by the success of other AI ventures and the potential of AI-powered search, Perplexity is actively pursuing a new round of funding, aiming to raise between USD 500 million and USD 1 billion. This would increase its valuation to an impressive USD 8 billion, more than double its previous valuation of USD 3 billion in June. Perplexity’s current investors include prominent names like Nvidia, Jeff Bezos, Andrej Karpathy, Yann LeCun, and SoftBank’s Vision Fund 2, reflecting the strong belief in the company’s potential to disrupt the search engine landscape.

While Perplexity’s annualized revenues have increased from USD 5 million in January to USD 35 million in August, the company is not yet turning a profit, largely due to the substantial operating costs associated with training and maintaining its advanced AI models. The expenses related to these models reportedly amount to “millions of dollars,” creating a significant burn rate as the company strives to establish a sustainable business model. Perplexity’s reliance on venture capital funding underscores this financial challenge as it works to achieve profitability through subscriptions and other revenue streams. Read

NotebookLM from Google, to help with research and writing

NotebookLM (https://notebooklm.google.com) is a new tool from Google designed to help users quickly create content based on specified information. It utilizes only the information you provide and includes references to the sources, so the content can be quickly verified (helping eliminate hallucinations). It excels at writing and following instructions, which is not always the case with ChatGPT and other LLMs. Be sure to watch the video that explains NotebookLM. I have been experimenting with it for quite a few days now, and I must say that I love it.

The future of automation is [almost] here

Anthropic has released the “Computer Use” API, allowing developers to automate processes much like a human would use a computer. Although the AI is still in its early stages, slow, and not yet performing optimally, it will likely improve rapidly in the coming months. A demo is available on Anthropic’s website. Similar tools will be released in the coming weeks from companies like Microsoft, Asana, and Salesforce, among others. But it seems that Anthropic is ahead of the game on that one.
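For a sense of what this looks like to a developer, here is a minimal sketch based on Anthropic’s beta documentation at the time of writing; the model name, tool version, and beta flag are the then-current values and may change. In practice your code must run in a loop, executing the clicks and keystrokes the model requests and returning screenshots as tool results.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# "Computer use" is exposed as a beta tool: the model replies with
# actions (click, type, take a screenshot) that your own code must
# execute and feed back as tool results.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user",
               "content": "Open a browser and check tomorrow's weather."}],
    betas=["computer-use-2024-10-22"],
)
print(response.stop_reason)  # "tool_use" when the model wants to act
```

Note the division of labor: the API only decides what to do next; actually moving the mouse and capturing the screen is left entirely to the developer’s own environment.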

Arm CEO sees AI transforming the world much faster than the internet

Arm Holdings plc is a British semiconductor and software design company based in Cambridge, specializing in the architecture and licensing of central processing unit (CPU) technologies. Founded in 1990, Arm’s designs are integral to a wide range of devices, from smartphones to automotive systems. Rene Haas, the CEO of Arm, is optimistic about the future of AI and believes its evolution will be faster than the internet revolution’s. One of the main challenges Arm faces is the need for more engineers. As AI continues to grow, we will require more energy, which makes developing more efficient chips all the more crucial. Read

Other readings

> Intel has tough choices to make to survive. Read
> OpenAI to release its latest model Orion before the end of 2024, read. But it no longer seems to be coming in 2024! Read