Abstract: The training of Pangu is based on the Ascend AI processor and, with the help of the CANN heterogeneous computing architecture, fully unleashes the hardware's computing power, greatly shortening the training time.

This article is shared from the Huawei Cloud community post "A magic weapon for training models with hundreds of billions of parameters: the Ascend CANN heterogeneous computing architecture", by Technology Torch Bearer.

In April 2021, the Huawei Cloud Pangu large models became a sensation in the field of artificial intelligence.

If you ask the classic Chinese word-play riddle: Mingming (明明) clearly (明明) knows that Baibai (白白) likes him, but he just won't say it, so whom does Baibai like?

Your companion may hesitate for another three seconds, but Pangu can easily answer: Mingming!

Rapid semantic recognition of such ambiguous Chinese expressions is only one of its minor skills.

With its advanced language understanding and generation capabilities, this instant Internet star was quickly labeled as having "the Chinese comprehension ability closest to humans" and being "the world's largest Chinese NLP pre-trained model".

These labels were not earned for nothing. In the field of AI, great intelligence means big models, and the "hundreds of billions of parameters" and "terabytes of model memory" behind Pangu are precisely its magic weapons for success!

Big models mean big data. Have you figured out how to train such a big model?

The training of Pangu is based on the Ascend AI processor; with the help of the CANN heterogeneous computing architecture, the hardware's computing power is fully unleashed and the training time is greatly shortened.

What is CANN?

To improve developers' efficiency and unleash the full computing power of Ascend AI processors, Huawei introduced CANN (Compute Architecture for Neural Networks), a heterogeneous computing architecture for AI scenarios. It supports the industry's mainstream AI frameworks on top, shields users from the hardware differences across the series of chips underneath, and meets users' demands for AI applications in all scenarios with a rich set of software-stack functions.

CANN has currently been released up to version 3.0. With a unified programming architecture, it supports both inference and training across device, edge, and cloud scenarios, and delivers three key capabilities.

Enabling all scenarios: Supports mainstream AI frameworks and 14+ mainstream operating systems, enabling flexible deployment of various hardware forms and operating environments in all scenarios.

Unified programming interface: Ascend Computing Language (AscendCL) shields developers from the differences of the underlying processors, so that with only one set of APIs developers can cover the whole series of Ascend chips and both inference and training scenarios; a minimal runtime sketch follows below.

Enabling ultimate performance: through software-hardware co-optimization, affinity-based graph compilation technology, and more than 1,200 high-performance operators, the surging computing power of Ascend chips is fully released.
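To make the unified AscendCL programming interface above a bit more concrete, here is a minimal sketch using its Python binding (pyACL, the acl module). It assumes CANN and pyACL are installed and that Ascend device 0 is available; the error handling is reduced to bare asserts.

```python
# Minimal AscendCL (pyACL) runtime sketch: initialize, bind a device,
# create a context, then tear everything down. Device 0 is an assumption.
import acl

DEVICE_ID = 0

ret = acl.init()                                  # initialize the AscendCL runtime
assert ret == 0
ret = acl.rt.set_device(DEVICE_ID)                # bind this process to device 0
assert ret == 0
context, ret = acl.rt.create_context(DEVICE_ID)   # explicit context for subsequent tasks
assert ret == 0

# ... load models, launch operators, or copy memory here ...

ret = acl.rt.destroy_context(context)
ret = acl.rt.reset_device(DEVICE_ID)
ret = acl.finalize()                              # release AscendCL resources
```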

CANN’s open capabilities

CANN provides developers with an end-to-end development experience covering operator development, model development, and application development, across all scenarios.

Operator development

  • DSL development interface: provides a set of development interfaces in which instruction mapping and scheduling on the processor are implemented automatically. Developers only need to focus on the mathematical logic of the operator, without needing to understand hardware details, to develop high-performance operators. According to statistics, this covers more than 60% of operator development needs (see the DSL sketch after this list).

  • TIK development interface: provides a relatively complete programming language based on the buffers visible inside the processor. Developers can decide how much data is moved in and out, so as to give full play to the chip's capabilities and improve operator performance (see the TIK sketch after this list).
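To illustrate the DSL path, here is a minimal sketch of an element-wise Add operator. The module layout follows the older te-based packages (te.lang.cce, topi.generic) and may differ in newer CANN releases; the shape, dtype, and kernel name are illustrative assumptions.

```python
# Minimal DSL-style Add operator: only the math is described, while tiling,
# instruction mapping, and scheduling are generated automatically.
from te import tvm
import te.lang.cce
from topi import generic


def add_custom(shape=(16, 128), dtype="float16", kernel_name="add_custom"):
    data_x = tvm.placeholder(shape, name="data_x", dtype=dtype)
    data_y = tvm.placeholder(shape, name="data_y", dtype=dtype)

    res = te.lang.cce.vadd(data_x, data_y)        # z = x + y, expressed in the DSL
    with tvm.target.cce():
        sch = generic.auto_schedule(res)          # schedule derived automatically

    config = {"name": kernel_name,
              "tensor_list": [data_x, data_y, res]}
    te.lang.cce.cce_build_code(sch, config)       # compile into an Ascend kernel
```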
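And here is a minimal sketch of the same Add written with the TIK interface, where the developer explicitly moves data between Global Memory and the on-chip Unified Buffer. The import path and the vec_add instruction name follow a recent TIK release (older releases import tik from te and use vadd with additional stride arguments); the tensor size and kernel name are illustrative assumptions.

```python
# Minimal TIK Add kernel: explicit GM <-> UB data movement plus one vector add.
from tbe import tik   # assumption: newer package layout; older releases: from te import tik

tik_instance = tik.Tik()   # some older releases require a tik.Dprofile() argument

# 128 float16 elements = 256 bytes = 8 blocks of 32 bytes
x_gm = tik_instance.Tensor("float16", (128,), name="x_gm", scope=tik.scope_gm)
y_gm = tik_instance.Tensor("float16", (128,), name="y_gm", scope=tik.scope_gm)
z_gm = tik_instance.Tensor("float16", (128,), name="z_gm", scope=tik.scope_gm)
x_ub = tik_instance.Tensor("float16", (128,), name="x_ub", scope=tik.scope_ubuf)
y_ub = tik_instance.Tensor("float16", (128,), name="y_ub", scope=tik.scope_ubuf)
z_ub = tik_instance.Tensor("float16", (128,), name="z_ub", scope=tik.scope_ubuf)

tik_instance.data_move(x_ub, x_gm, 0, 1, 8, 0, 0)        # GM -> UB, 8 blocks
tik_instance.data_move(y_ub, y_gm, 0, 1, 8, 0, 0)
tik_instance.vec_add(128, z_ub, x_ub, y_ub, 1, 8, 8, 8)  # one repeat over 128 fp16 lanes
tik_instance.data_move(z_gm, z_ub, 0, 1, 8, 0, 0)        # UB -> GM

tik_instance.BuildCCE(kernel_name="simple_add",
                      inputs=[x_gm, y_gm], outputs=[z_gm])
```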

Model development

  • Supports multiple model development frameworks: MindSpore, TensorFlow, PyTorch, ONNX, etc. (a minimal framework-side sketch follows after this list)

  • Supports direct mapping and model development through the standardized Ascend IR (Intermediate Representation) interface, isolating the differences between upper-layer frameworks
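As a small framework-side illustration, the sketch below runs a toy MindSpore network on the Ascend backend. It assumes a MindSpore build with Ascend support is installed; the network, shapes, and random data are illustrative only.

```python
# Minimal MindSpore sketch targeting the Ascend backend in graph mode.
import numpy as np
from mindspore import context, nn, Tensor

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

net = nn.Dense(128, 10)                                   # toy single-layer network
x = Tensor(np.random.randn(4, 128).astype(np.float32))
print(net(x).shape)                                       # (4, 10), executed on the Ascend device
```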

Application development

  • AscendCL provides a standard programming interface to improve application programming efficiency, as sketched below
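Below is a minimal application-development sketch with pyACL: it loads an offline model (an .om file converted with the ATC tool) and queries its input/output counts. The model path is a made-up example, the runtime is assumed to be initialized as in the earlier sketch, and input/output dataset preparation plus the actual execute call are omitted.

```python
# Minimal pyACL model-loading sketch (runtime init/teardown omitted).
import acl

MODEL_PATH = "./resnet50.om"                      # hypothetical offline model file

model_id, ret = acl.mdl.load_from_file(MODEL_PATH)
assert ret == 0

model_desc = acl.mdl.create_desc()                # describes the model's I/O layout
ret = acl.mdl.get_desc(model_desc, model_id)
assert ret == 0
print("inputs:", acl.mdl.get_num_inputs(model_desc),
      "outputs:", acl.mdl.get_num_outputs(model_desc))

ret = acl.mdl.destroy_desc(model_desc)
ret = acl.mdl.unload(model_id)                    # release the loaded model
```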

CANN’s hardcore technologies

  • High-performance operator library: provides 1,200+ operators supporting the TensorFlow, PyTorch, MindSpore, and ONNX frameworks, so developers can build models directly on top of the built-in operators.

  • Automatic fusion technology: supports automatic fusion at the operator, subgraph, and scope levels, as well as dynamic DSL fusion, which effectively reduces computing nodes, shortens computation time, and instantly accelerates the Ascend AI processor.

  • Heterogeneous deployment and scheduling framework: takes full advantage of the heterogeneous execution units of Ascend chips, assigning different computing tasks to the most suitable computing engines and efficiently coordinating asynchronous streams to improve the overall efficiency of computing tasks.

  • Efficient memory lifecycle management algorithm: takes full account of memory reuse and data exchange efficiency to strike a balance between resources and efficiency.

  • Preset library of mainstream industry models: the Huawei Ascend ModelZoo provides 100+ mainstream models with example code and corresponding tuning parameters as reference implementations for developers; for details, see www.hiascend.com/software/mo…

  • High-performance graph-sinking execution framework: sinks entire computations to the chip, reducing the interaction time between the host CPU and the chip and enabling high-performance training and inference.

  • High-performance dynamic graph scheduling: supports a single-operator execution framework based on asynchronous streams and flexible H2D/D2H (host-to-device / device-to-host) interaction, solving the problem of running dynamic graph mode with high performance under PyTorch and other frameworks (see the sketch after this list).

  • Industry-leading intelligent tuning: supports intelligent tuning algorithms based on reinforcement learning, genetic algorithms, cost models, and more, providing operator-level and graph-level tuning options and giving users an automated, extreme-performance tuning experience.
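As a small illustration of dynamic graph (eager) execution on an Ascend NPU, the sketch below assumes the Ascend PyTorch adapter plugin (distributed as torch_npu) is installed; device naming and module layout can vary across adapter versions, and the tensor shapes are illustrative.

```python
# Minimal eager-mode sketch on an Ascend NPU: each operator is dispatched
# one by one to the device (single-operator execution).
import torch
import torch_npu  # registers the "npu" device type with PyTorch

device = torch.device("npu:0")
x = torch.randn(64, 128, device=device)
w = torch.randn(128, 32, device=device, requires_grad=True)

loss = (x @ w).relu().sum()    # matmul, relu, and sum run on the NPU
loss.backward()                # gradients are computed on the device as well
print(w.grad.shape)            # torch.Size([128, 32])
```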

CANN 5.0 will bring you even more possibilities. For more information, please visit the Ascend community.

Click Follow to be the first to learn about Huawei Cloud's latest technologies~