Welcome to my personal blog: Alex Chiu’s learning space


Apex is an open-source NVIDIA module for mixed precision training in the PyTorch framework; it makes FP16 training easy to set up.

This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pytorch. Some of the code here will be included in upstream Pytorch eventually. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.

The API documentation is at nvidia.github.io/apex

Pitfalls during installation

I ran into several problems while compiling and installing Apex and resolved them by searching the GitHub issues.

A segmentation fault was encountered during use

GCC 5 can be installed with conda install -c psi4 gcc-5; see github.com/NVIDIA/apex…

GLIBCXX_3.4.20 not found

Try locating path_to_anaconda3/lib/libstdc++.so.6 and symlinking it into the library path that Apex references, or add your own lib directory to the library search path.
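Before linking anything, it can help to confirm that a given libstdc++.so.6 really contains the missing version tag. The following is a minimal sketch, assuming the version tags are stored as plain ASCII strings inside the shared library; the function names glibcxx_versions and provides_glibcxx are made up for this post and are not part of Apex or Anaconda.

```python
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version tags found in a libstdc++ binary, sorted numerically."""
    data = Path(lib_path).read_bytes()
    # Tags appear in the file as byte strings such as b"GLIBCXX_3.4.20".
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data))
    # Sort by numeric components so that 3.4.20 comes after 3.4.9.
    return sorted(
        (tag.decode() for tag in tags),
        key=lambda version: tuple(int(part) for part in version.split(".")),
    )


def provides_glibcxx(lib_path, wanted="3.4.20"):
    """True if the library at lib_path contains the wanted GLIBCXX version tag."""
    return wanted in glibcxx_versions(lib_path)
```

Calling provides_glibcxx on path_to_anaconda3/lib/libstdc++.so.6 and on the system copy shows which of the two is new enough to link against.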

If you encounter an error related to FusedLayerNorm

Apex was probably installed without the CUDA extensions. The advice from the issue is:

“Try a full pip uninstall apex, then cd apex_repo_dir; rm -rf build; python setup.py install --cuda_ext --cpp_ext and see if the segfault persists.”

Refer to https://github.com/huggingface/pytorch-pretrained-BERT/issues/284
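A quick way to tell whether the compiled extensions were built at all is to ask Python whether it can find them. This is only a sketch: the module names apex_C, amp_C and fused_layer_norm_cuda are my assumption about what the --cpp_ext and --cuda_ext builds produce and may differ between Apex versions, and missing_extensions is a hypothetical helper, not an Apex function.

```python
import importlib.util

# Assumed names of Apex's compiled extension modules; adjust for your Apex version.
ASSUMED_EXTENSIONS = ("apex_C", "amp_C", "fused_layer_norm_cuda")


def missing_extensions(names=ASSUMED_EXTENSIONS):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If missing_extensions() returns a non-empty list in the environment where training runs, a Python-only install is the likely cause and the reinstall described above is worth trying.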

Pitfalls during use

AttributeError: ‘NoneType’ object has no attribute ‘contiguous’

This happens when the model contains layers (weights) that are never used (example: github.com/FDecaYed/py…). The gradients of those weights are None, so the error above is reported; see https://github.com/NVIDIA/apex/issues/131

Solutions: 1. Modify the Apex source code to check whether the gradient is None. 2. Change the model to remove the unused weights. The second method is better; alternatively, wait for an Apex update.
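To find out which weights need to be removed, one option is to list the parameters whose gradient is still None after a backward pass. The sketch below relies only on the standard PyTorch attributes named_parameters(), requires_grad and grad; the helper name parameters_without_grad is made up for illustration, and this is not the patch discussed in the Apex issue.

```python
def parameters_without_grad(model):
    """Names of trainable parameters whose .grad is still None.

    Call this after loss.backward(); anything listed never took part in the
    forward pass that produced the loss.
    """
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]
```

Running it once after the first loss.backward() gives the names of the layers that can be deleted from the model.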

p.type().is_cuda() ASSERT FAILED at csrc/fused_adam_cuda.cpp:12

This error was my own mistake: model.cuda() must be called before FusedAdam is constructed, otherwise this error is reported.
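A small guard makes the ordering mistake visible before the optimizer is built. This is a minimal sketch, assuming the model exposes named_parameters() and that each parameter has the standard is_cuda attribute; cpu_parameter_names is a hypothetical helper name, and the training lines are shown as comments because they need a GPU and an Apex build with CUDA extensions.

```python
def cpu_parameter_names(model):
    """Names of parameters that are not on a CUDA device (empty list: all on GPU)."""
    return [name for name, param in model.named_parameters() if not param.is_cuda]


# Intended order (commented out: requires a GPU and Apex with CUDA extensions):
#   from apex.optimizers import FusedAdam
#   model = model.cuda()                       # move the weights to the GPU first
#   assert cpu_parameter_names(model) == []    # every parameter is now a CUDA tensor
#   optimizer = FusedAdam(model.parameters(), lr=1e-3)
```

If the list is not empty, the optimizer would be handed CPU tensors, which is the situation that triggers the assertion above.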

cuda runtime error (77) : an illegal memory access

My problem was solved by the method described in this issue: github.com/NVIDIA/apex… The cause is still unknown.