Technical editor: Mango Fruit, SegmentFault Editorial Department. Official account: SegmentFault


It was him, the same man, the familiar leather jacket.

On the evening of May 14, Huang Renxun delivered the Nvidia GTC 2020 keynote online from his own kitchen. Because of the novel coronavirus outbreak, Nvidia’s planned in-person events were canceled, and the news announcements originally scheduled for March 24 were shelved as well. Huang Renxun finally met his audience in front of an oven.

This year’s GTC took an unconventional approach from the warm-up onward: the day before the event, Huang Renxun pulled his new Ampere-based Nvidia A100 GPU out of that oven in a teaser video.

The “world’s biggest” didn’t lie

Surprisingly, even though it could not hold an in-person event, Nvidia did not bother with a live stream either; the company simply released a recorded video shot in Huang Renxun’s kitchen. As expected, when there are “hard goods” in hand, the packaging doesn’t matter.

Nvidia’s first Ampere-architecture GPU is billed as “the strongest ever”: 54 billion transistors on an 826 mm² die built on a 7nm process. Nvidia claims up to a 20-fold improvement over the Volta architecture, and the chip handles both training and inference.

The NVIDIA A100 introduces third-generation Tensor Cores with TF32, which raise single-precision (FP32) AI throughput 20-fold, to 19.5 TFLOPS, without requiring any code changes.
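
The “no code changes” claim refers to TF32 being applied automatically inside libraries such as cuBLAS and cuDNN. As an illustration (the article itself contains no code), here is a minimal PyTorch sketch showing where that behavior can be toggled explicitly; the `allow_tf32` flags are real PyTorch (1.7+) API, and the matrix sizes are arbitrary:

```python
import torch

# Opt in to TF32 explicitly (on Ampere GPUs these paths use Tensor Cores;
# on older hardware the same code silently falls back to regular FP32).
torch.backends.cuda.matmul.allow_tf32 = True  # TF32 for matrix multiplies
torch.backends.cudnn.allow_tf32 = True        # TF32 for cuDNN convolutions

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # runs as a TF32 Tensor Core matmul on an A100
```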

Multi-Instance GPU (MIG) can partition a single A100 into as many as seven independent GPUs, each providing computing power sized to its task, for optimal utilization and maximum ROI.
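
To give a feel for what MIG partitioning looks like to software, here is a hedged sketch: it assumes an administrator has already enabled MIG mode and created instances with `nvidia-smi`, and the device identifier below is a placeholder rather than a real UUID:

```python
import os

# Pin this process to a single MIG slice. Must be set before any CUDA
# context is created. Real identifiers are listed by `nvidia-smi -L`;
# this one is a placeholder.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/1/0"

import torch

# The process now sees exactly one device: its MIG instance.
print(torch.cuda.device_count())  # -> 1
```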

The Nvidia A100’s new efficiency technology exploits the inherent sparsity of AI math, doubling performance on models optimized for sparsity.
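
Concretely, the pattern involved is 2:4 structured sparsity: in every group of four weights, two are zero, so the sparse Tensor Cores can skip half of the math. The following NumPy sketch illustrates the pruning pattern only; it is not NVIDIA’s actual tooling:

```python
import numpy as np

def prune_2_of_4(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude entries in each group of 4 weights."""
    groups = w.reshape(-1, 4)
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest per group
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

w = np.random.randn(8, 8).astype(np.float32)
w_sparse = prune_2_of_4(w)
# Every group of 4 now contains at least 2 zeros -- the pattern the
# A100's sparse Tensor Cores exploit for a 2x throughput gain.
assert ((w_sparse.reshape(-1, 4) == 0).sum(axis=1) >= 2).all()
```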

Nvidia summarizes the Nvidia A100’s features as follows:

1. More than 54 billion transistors, making it the world’s largest 7-nanometer processor;

2. Third-generation Tensor Core with TF32, a new numerical format that accelerates single-precision AI training out of the box. Nvidia’s widely used Tensor Core is now more flexible, faster, and easier to use;

3. Structural sparsity acceleration, which is a new and efficient technology that takes advantage of the inherent sparsity of AI mathematics to achieve higher performance;

4. Multi-instance GPUs (or MIGs) that allow an A100 to be split into up to seven independent GPUs, each of which has its own resources;

5. The third generation of NVLink technology doubles the high-speed connection capacity between GPUs, so that multiple A100 servers can act as a giant GPU.

“The breakthrough design of the Ampere architecture gives Nvidia’s eighth-generation GPU its biggest performance leap to date, unifying AI training and inference and delivering up to 20 times the performance of its predecessor,” Huang Renxun said. “For the first time, scale-up and scale-out workloads can be accelerated on a single platform. The A100 will increase throughput while driving down data center costs.”

The Nvidia A100, the first GPU built on the Ampere architecture, also serves data analytics, scientific computing, and cloud graphics workloads. It is in full production and shipping to customers worldwide.

Eighteen of the world’s leading service providers and system builders are integrating the Nvidia A100 into their services and products, including Alibaba Cloud, AWS, Baidu Cloud, Cisco, Dell Technologies, Google Cloud, HPE, Microsoft Azure, and Oracle.

5 PFLOPS of AI computing power in a single node: 140 DGX A100 systems make up a DGX SuperPOD

Huang Renxun also introduced the DGX A100, the third generation of Nvidia’s DGX AI system, built around the A100. The DGX A100 is the world’s first single-node AI server to reach 5 PFLOPS. It integrates eight Nvidia A100 GPUs, each supporting 12 NVLink interconnect links, and a single system can be partitioned into as many as 56 independently running instances.

Compared with high-end CPU servers, the DGX A100 delivers 150 times the AI computing performance, 40 times the memory bandwidth, and 40 times the I/O bandwidth.
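
Those headline figures are consistent with the A100’s published peaks. A quick back-of-the-envelope check, assuming 624 TFLOPS per GPU (the FP16 Tensor Core peak with sparsity enabled) and 7 MIG instances per GPU:

```python
GPUS_PER_NODE = 8
TFLOPS_PER_A100_SPARSE = 624      # FP16 Tensor Core peak with 2:4 sparsity
MIG_INSTANCES_PER_GPU = 7

node_pflops = GPUS_PER_NODE * TFLOPS_PER_A100_SPARSE / 1000
print(node_pflops)                             # ~5.0 PFLOPS per DGX A100
print(GPUS_PER_NODE * MIG_INSTANCES_PER_GPU)   # 56 independent instances
print(140 * node_pflops)                       # ~700 PFLOPS for a 140-node SuperPOD
```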

“AI has spread to cloud computing, automotive, retail, healthcare and many other fields, and AI models are becoming more and more complex and diverse,” Huang Renxun said. “The computing demand for models such as ResNet has grown 3,000-fold since 2016, and we need a better solution.”

The powerful DGX A100 doesn’t come cheap, either: it carries a price tag of $199,000 (about 1.41 million yuan).

Huang Renxun also pointed to Nvidia’s next-generation DGX SuperPOD, a cluster of 140 DGX A100 systems that delivers 700 petaflops of AI computing power, equivalent to the performance of thousands of servers.

The first DGX SuperPODs will be deployed at the U.S. Department of Energy’s Argonne National Laboratory for research related to the novel coronavirus.

Five software and hardware announcements, and autonomous-driving platform partnerships finalized

In addition to those two blockbuster products, Huang Renxun also announced Nvidia Merlin, an end-to-end framework for building next-generation recommender systems, which are fast becoming the engine of a more personalized Internet. Merlin cuts the time needed to build a recommender system on a 100-terabyte dataset from four days to 20 minutes.

Nvidia’s other launches in the AI space include the Mellanox ConnectX-6 Lx SmartNIC for Ethernet, the EGX Edge AI platform, and a number of software updates and extensions:

1. Mellanox ConnectX-6 Lx SmartNIC

The ConnectX-6 Lx is the industry’s first secure SmartNIC optimized for 25Gb/s, available with two 25Gb/s ports or a single 50Gb/s port.

2. EGX Edge AI platform

The EGX Edge AI platform is the first edge AI product based on the Nvidia Ampere architecture; it can receive up to 200 Gbps of data and send it directly to GPU memory for AI or 5G signal processing.

3. Spark 3.0

Nvidia also announced GPU acceleration for Apache Spark 3.0. Built on RAPIDS, it sets a new performance bar for extracting, transforming and loading (ETL) data, and has already helped Adobe Intelligent Services cut computing costs by 90%.
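
To picture what GPU-accelerated Spark looks like in practice, here is a hedged PySpark sketch. It assumes the RAPIDS Accelerator jars are already on the Spark classpath (the plugin class name is the real one used by the RAPIDS Accelerator), and the file paths are hypothetical:

```python
from pyspark.sql import SparkSession

# Enable the RAPIDS Accelerator plugin; supported SQL/DataFrame operators
# are then executed on the GPU transparently, with CPU fallback otherwise.
spark = (
    SparkSession.builder
    .appName("gpu-etl-sketch")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    .getOrCreate()
)

# An ordinary ETL job -- no query code changes are required.
df = spark.read.parquet("/data/events.parquet")        # hypothetical input
df.groupBy("user_id").count().write.parquet("/data/counts.parquet")
```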

4. NVIDIA Jarvis

At the event, Huang Renxun detailed Nvidia Jarvis, a new end-to-end platform that leverages the power of Nvidia’s AI platform to create real-time, multimodal conversational AI.

5. Misty AI interaction demo

In a live demonstration, an AI system called Misty showed how it could understand and answer a series of complex questions about the weather in real time.

For autonomous driving, Nvidia has also built the Ampere architecture into its new Nvidia Drive platform. Autonomous-driving companies including Pony.ai and Faraday Future have reportedly announced the adoption of the Nvidia Drive AGX computing platform.

Nvidia’s Isaac software-defined robotics platform will also be used in BMW Group plants. The Nvidia robotics ecosystem spans the distribution, retail, autonomous mobile robot, agriculture, services, logistics, manufacturing, and healthcare industries.

Nvidia lays out its AI ecosystem, while China’s AI-chip R&D gap with developed countries narrows

Nvidia’s launch, the first in the three years since Volta, was full of sincerity: the debut of the Ampere architecture was a surprise, and the Nvidia A100 GPU’s 20-fold performance improvement is a genuine leap.

Although the event was not broadcast live, it still landed with a bang. A single DGX A100 that can do the work of racks of CPU servers also bears out Huang Renxun’s famous saying that “the more you buy, the more you save.” Nvidia’s AI solutions now cover all walks of life, and a powerful AI ecosystem is taking shape.

Ni Guangnan, an academician of the Chinese Academy of Engineering, has said: “The barrier to entry in chip design is very high, and only a very few companies can afford to develop mid- to high-end chips, which also limits chip innovation.”

The Ampere architecture and the range of AI platforms built on it that Nvidia showed at GTC demonstrate the strength of an AI chip giant and once again set the performance benchmark.

Gartner forecasts that the global market for artificial-intelligence chips will soar over the next five years, from US$4.27 billion in 2018 to US$34.3 billion, an increase of more than sevenfold, which points to substantial room for growth in the AI chip market.

While China still trails the developed countries of the West in AI chip R&D, Chinese AI chip startups have raised hundreds of millions of dollars in funding over the past two years, and companies such as Huawei have produced impressive chip designs.

But the complexity of chip development, a shortage of talent in China, and the absence of Chinese companies from the top 15 semiconductor vendors by global sales all suggest that China still has significant ground to cover before it can match the US in semiconductors.