
Microsoft Strung Together Tens of Thousands of Chips in a Pricey Supercomputer for OpenAI

Now the software maker’s cloud technology powers AI products for the company and its customers while it builds a successor

To train a large language model, the computational workload is portioned across thousands of GPUs in a cluster linked together in a high-throughput, low-latency network.

Source: Microsoft
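The caption’s idea, splitting one training job across thousands of GPUs that synchronize over a fast network, can be made concrete with a short sketch. Below is a minimal data-parallel training loop using PyTorch’s DistributedDataParallel; the tiny model, batch size, and launch setup are illustrative assumptions, not details of Microsoft’s or OpenAI’s actual systems.

```python
# Minimal sketch of data-parallel training across many GPUs.
# Each GPU runs one copy of this script (one process per GPU).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A stand-in for a far larger language model.
    model = torch.nn.Linear(1024, 1024).to(f"cuda:{local_rank}")
    # DDP replicates the model on every GPU and averages gradients
    # across the cluster's high-throughput, low-latency network.
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for step in range(10):
        # Each rank trains on its own shard of the data.
        batch = torch.randn(32, 1024, device=f"cuda:{local_rank}")
        loss = model(batch).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across GPUs here
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=8 train.py` on each node, every GPU processes its own slice of the data while the network keeps all the copies of the model in sync, which is why the interconnect matters as much as the chips themselves.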

When Microsoft Corp. invested $1 billion in OpenAI in 2019, it agreed to build a massive, cutting-edge supercomputer for the artificial intelligence research startup. The only problem: Microsoft didn’t have anything like what OpenAI needed and wasn’t totally sure it could build something that big in its Azure cloud service without it breaking.

OpenAI was trying to train an increasingly large set of artificial intelligence programs called models, which were ingesting greater volumes of data and learning more and more parameters, the variables an AI system susses out through training and retraining. That meant OpenAI needed access to powerful cloud computing services for long periods of time.
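To make “parameters” concrete, here is a hedged toy example in PyTorch: the layer sizes are arbitrary, and models at the scale OpenAI trains have billions of such learned weights rather than the millions shown here.

```python
# Toy illustration of "parameters": the learnable weights of a model.
import torch

model = torch.nn.Sequential(
    torch.nn.Embedding(50_000, 512),  # token embedding table
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
)

# Every weight and bias tensor counts toward the parameter total,
# and each one is adjusted repeatedly as the model trains.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} learnable parameters")  # about 27.7 million
```

Each additional layer or wider dimension multiplies the count, which is why ever-larger models demanded ever more compute for ever longer stretches.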