PyTorch compilation for 64-bit Windows

Guys, here I go again! This time I’m going to do something different and get PyTorch compiled on Windows.

First, let me give you a brief introduction to PyTorch. PyTorch is a tensor computation library, developed and maintained by Facebook, for building dynamic neural networks. Its code is concise and elegant, and it performs well. For example, to do vector operations in Theano or TensorFlow, we first define a symbolic tensor, then build calculations on that tensor, then define a function, and finally call the function with input values to get the output. Sample code:

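```
import theano
import theano.tensor as T

x = T.dmatrix('x')                    # symbolic input matrix
s = 1 / (1 + T.exp(-x))               # symbolic expression, nothing computed yet
logistic = theano.function([x], s)    # compile the graph into a callable
logistic([[0.1], [-1.2]])             # feed real values to get the output
```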

With PyTorch, we can write it like this:

```
import torch

x = torch.FloatTensor([[0.1], [-1.2]])
s = 1 / (1 + torch.exp(-x))           # computed immediately
```

You just define the variables and compute directly. Isn't that closer to the way we naturally think?
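To make the contrast concrete without installing either library, here is a toy sketch in plain Python. The Sym class and compile_fn function are hypothetical stand-ins, not Theano or PyTorch APIs: they imitate the define-then-run style, where building s only records the computation and nothing is evaluated until the compiled function is called. The last list comprehension computes the same values immediately, which is the define-by-run style PyTorch follows.

```python
import math


class Sym:
    """Hypothetical symbolic node: wraps a function from input value to output."""

    def __init__(self, fn):
        self.fn = fn

    @staticmethod
    def placeholder():
        # Like T.dmatrix('x'): a named input that has no value yet.
        return Sym(lambda v: v)

    def __neg__(self):
        return Sym(lambda v: -self.fn(v))

    def exp(self):
        return Sym(lambda v: math.exp(self.fn(v)))

    def __radd__(self, other):
        # Handles `1 + node` by recording the addition.
        return Sym(lambda v: other + self.fn(v))

    def __rtruediv__(self, other):
        # Handles `1 / node` by recording the division.
        return Sym(lambda v: other / self.fn(v))


def compile_fn(expr):
    """Hypothetical 'compile' step: turn the recorded graph into a callable."""
    return expr.fn


# Define-then-run: build the graph, compile it, then feed inputs.
x = Sym.placeholder()
s = 1 / (1 + (-x).exp())  # s is still a Sym; nothing has been computed
logistic = compile_fn(s)
deferred = [logistic(v) for v in (0.1, -1.2)]

# Define-by-run: every expression is evaluated as soon as it is written.
eager = [1 / (1 + math.exp(-v)) for v in (0.1, -1.2)]

print(deferred)
print(eager)
```

Both lists hold the same numbers; the only difference is when the work happens, which is why define-by-run code can mix ordinary Python control flow in between steps.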

Finally, one more quote to make the case:

Matlab is so 2012.

Caffe is so 2013.

Theano is so 2014.

Torch is so 2015.

TensorFlow is so 2016. 😀

– Andrej Karpathy

It's 2017 now.

Let’s get down to business and see how to install PyTorch under Windows.

A friendly reminder: if you don't want to fiddle with all this, Windows 10 users can run PyTorch under WSL, with the drawback that the GPU can't be used for acceleration there. Alternatively, you can wait for an official installation package. The procedure below is experimental and not guaranteed to succeed.

First, find the related issue in the official repository. One expert there has already done a great deal of the work for us and shared his code. You can also simply start from my final code and adjust it a little; my code passes all the CUDA unit tests, while his does not.

First, we need to prepare the tools required for installation, including:

Visual Studio 2015 Update 1 or later (not 2013 or 2017, for reasons I'll explain below)
CMake
A BLAS library such as OpenBLAS or Intel MKL
The PyTorch source code, from the repository mentioned above
CUDA 7.5 or later
cuDNN 5.1.10 or later
Anaconda3 (Python 3.5 or later)
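Before starting, it can save time to confirm that the tools are actually reachable. The sketch below is a hypothetical helper, not part of PyTorch's build scripts: it checks the running Python against the 3.5 minimum listed above and looks up command-line tools with shutil.which. The executable names cmake, msbuild, and nvcc are assumptions about what each installer puts on PATH; cuDNN, the BLAS library, and the exact Visual Studio update still need to be checked by hand.

```python
import shutil
import sys

# Assumed executable names for CMake, MSBuild, and the CUDA compiler.
REQUIRED_TOOLS = ("cmake", "msbuild", "nvcc")
MIN_PYTHON = (3, 5)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The which parameter is injectable so the check can be exercised without the tools installed.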

The installation steps are as follows:

Install VS, CUDA, cuDNN, CMake, and Anaconda. There's not much to say about this, except why VS 2015 Update 1 or later is required, a lesson I learned the hard way: VS 2013 has weak support for the C99 standard, the CUDA 8.0 compiler does not yet support VS 2017, and the original VS 2015 release (without updates) produced a puzzling link error. Anaconda3 (Python 3.5+) was chosen for the same C99 reason: on Windows it is built with VS 2015, whereas Python 2.7 extensions expect the much older VS 2008 compiler.
Add the CMake and MSBuild directories to the PATH environment variable. They are usually located here:
```
C:\Program Files\CMake\bin
C:\Program Files (x86)\MSBuild\14.0\Bin\amd64
```
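If you'd rather not edit the system settings, a build script can instead prepend these directories to PATH for its own process and the processes it launches (such as CMake). This is a hypothetical helper; the two directories are the typical locations quoted above and may differ on your machine. Windows paths are compared case-insensitively, so an entry that is already present is not added twice.

```python
import os

# Typical locations from the text above; adjust for your installation.
BUILD_TOOL_DIRS = [
    r"C:\Program Files\CMake\bin",
    r"C:\Program Files (x86)\MSBuild\14.0\Bin\amd64",
]


def _norm(entry):
    # Windows paths are case-insensitive; ignore trailing backslashes.
    return entry.strip().rstrip("\\").lower()


def prepend_to_path(path_value, dirs, sep=";"):
    """Return a PATH string with each dir in `dirs` prepended unless present."""
    entries = [e for e in path_value.split(sep) if e]
    existing = {_norm(e) for e in entries}
    new = [d for d in dirs if _norm(d) not in existing]
    return sep.join(new + entries)


if __name__ == "__main__" and os.name == "nt":
    # Affects only this process and the child processes it starts.
    os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), BUILD_TOOL_DIRS)
```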

In the PyTorch source tree, navigate to torch\lib, create a new directory tmp_install, create a lib directory inside it, and copy all the BLAS-related .lib files into it. Then open build_all.bat and go to the end of the file, where you'll find this code:

```
cmake ../../%~1 -G "Visual Studio 14 2015 Win64" ^
               -DCMAKE_MODULE_PATH=%BASE_DIR%/cmake/FindCUDA ^
               -DTorch_FOUND="1" ^
               -DCMAKE_INSTALL_PREFIX="%INSTALL_DIR%" ^
               -DCMAKE_C_FLAGS="%C_FLAGS%" ^
               -DCMAKE_SHARED_LINKER_FLAGS="%LINK_FLAGS%" ^
               -DCMAKE_CXX_FLAGS="%C_FLAGS% %CPP_FLAGS%" ^
               -DCUDA_NVCC_FLAGS="%BASIC_CUDA_FLAGS%" ^
               -DTH_INCLUDE_PATH="%INSTALL_DIR%/include" ^
               -DTH_LIB_PATH="%INSTALL_DIR%/lib" ^
               -DTH_LIBRARIES="%INSTALL_DIR%/lib/TH.lib" ^
               -DTHS_LIBRARIES="%INSTALL_DIR%/lib/THS.lib" ^
               -DTHC_LIBRARIES="%INSTALL_DIR%/lib/THC.lib" ^
               -DTHCS_LIBRARIES="%INSTALL_DIR%/lib/THCS.lib" ^
               -DTH_SO_VERSION=1 ^
               -DTHC_SO_VERSION=1 ^
               -DTHNN_SO_VERSION=1 ^
               -DTHCUNN_SO_VERSION=1 ^
               -DCMAKE_BUILD_TYPE=Release ^
               -DLAPACK_LIBRARIES="%INSTALL_DIR%/lib/mkl_rt.lib" -DLAPACK_FOUND=TRUE
```

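Adjust the last line to match your BLAS library; for OpenBLAS, for example, change mkl_rt.lib to openblas.lib. If you don't intend to use BLAS, remove the last line, together with the ^ continuation character at the end of the line before it.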
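The preparation in this step (creating tmp_install\lib, copying the BLAS libraries, and picking the right -DLAPACK_LIBRARIES value) can be scripted. The sketch below is hypothetical: stage_blas and lapack_flag are illustrative names, the directory holding your BLAS .lib files is an input you supply, and the only library names it knows are mkl_rt.lib and openblas.lib from the text above. The returned flag string still has to be pasted into build_all.bat by hand.

```python
import shutil
from pathlib import Path

# BLAS choice -> import library named on the -DLAPACK_LIBRARIES line.
LAPACK_LIBS = {"mkl": "mkl_rt.lib", "openblas": "openblas.lib"}


def stage_blas(torch_lib_dir, blas_lib_dir):
    """Copy every .lib in blas_lib_dir into torch\\lib\\tmp_install\\lib.

    Returns the names of the copied files.
    """
    dest = Path(torch_lib_dir) / "tmp_install" / "lib"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for lib in sorted(Path(blas_lib_dir).glob("*.lib")):
        shutil.copy2(lib, dest / lib.name)
        copied.append(lib.name)
    return copied


def lapack_flag(blas=None):
    """CMake arguments for the last line of build_all.bat ('' means no BLAS)."""
    if blas is None:
        return ""
    lib = LAPACK_LIBS[blas]
    return '-DLAPACK_LIBRARIES="%INSTALL_DIR%/lib/{}" -DLAPACK_FOUND=TRUE'.format(lib)
```

On Windows the *.lib pattern matches case-insensitively; elsewhere it is case-sensitive.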

Open a CMD window, navigate to the PyTorch source root, and run the following commands:
```
cd torch\lib
build_all.bat --with-cuda
```
Then go have some tea or watch a movie: the compilation takes a long time.
Check whether thpp.dll is present in torch\lib. If it isn't, the compilation failed; look through the earlier output to figure out what went wrong.
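That check is easy to automate. A minimal sketch, assuming it is run from the PyTorch source root; the helper names are hypothetical, and since it only looks for the single file mentioned above, a passing check does not prove every library built correctly.

```python
from pathlib import Path


def build_succeeded(torch_lib_dir="torch/lib"):
    """Return True if thpp.dll is present in torch\\lib."""
    return (Path(torch_lib_dir) / "thpp.dll").is_file()


def built_dlls(torch_lib_dir="torch/lib"):
    """List the DLLs that were produced, as a starting point for diagnosis."""
    return sorted(p.name for p in Path(torch_lib_dir).glob("*.dll"))
```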
If all went well, run the last two commands:
```
cd ..\..
python setup.py install
```

If there are no errors, congratulations: the installation succeeded. A few minor tweaks are still required, though. First, find the cudart and cuDNN DLLs (cudnn64_6.dll for cuDNN 6, cudnn64_5.dll for cuDNN 5), which are usually located here:

```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudart64_80.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudnn64_6.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudnn64_5.dll
```

Copy them to Lib\site-packages\torch\lib under your Anaconda3 installation directory.
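The copy can be done with a small script. A sketch under these assumptions: CUDA 8.0 in its default location, the DLL names listed above (only the cuDNN file matching your version needs to exist), and the target folder derived from the installed torch package. copy_cuda_dlls and installed_torch_lib are hypothetical helpers; the latter uses importlib.util.find_spec so it can locate the package without importing torch, which may fail before these DLLs are in place.

```python
import importlib.util
import shutil
from pathlib import Path

CUDA_BIN = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin")
DLLS = ["cudart64_80.dll", "cudnn64_6.dll", "cudnn64_5.dll"]


def copy_cuda_dlls(cuda_bin, torch_lib, names=DLLS):
    """Copy whichever of `names` exist in cuda_bin into torch_lib.

    Returns (copied, missing) lists of file names.
    """
    cuda_bin, torch_lib = Path(cuda_bin), Path(torch_lib)
    copied, missing = [], []
    for name in names:
        src = cuda_bin / name
        if src.is_file():
            shutil.copy2(src, torch_lib / name)
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing


def installed_torch_lib():
    """Locate <site-packages>\\torch\\lib without importing torch."""
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        raise RuntimeError("torch is not installed in this Python environment")
    return Path(spec.origin).parent / "lib"
```

Typical use, run with the Anaconda3 Python: copy_cuda_dlls(CUDA_BIN, installed_torch_lib()).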

If you are using cuDNN v5, open __init__.py under Lib\site-packages\torch\backends\cudnn in your Anaconda3 directory and change the _libcudnn function to:

```
def _libcudnn():
    global lib, __cudnn_version
    if lib is None:
        # Load the cuDNN v5 DLL (cudnn64_5.dll)
        lib = ctypes.cdll.LoadLibrary("cudnn64_5")
        if hasattr(lib, 'cudnnGetErrorString'):
            lib.cudnnGetErrorString.restype = ctypes.c_char_p
            __cudnn_version = lib.cudnnGetVersion()
        else:
            lib = None
    return lib
```
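If you'd rather not hard-code a single version, the same idea can be written as a loader that tries several cuDNN DLL names in order. This is a sketch, not what PyTorch ships: load_first is a hypothetical helper, the candidate names are the two cuDNN DLLs mentioned above, and the loader is a parameter so the fallback logic can be tried without cuDNN present.

```python
import ctypes

CUDNN_CANDIDATES = ("cudnn64_6", "cudnn64_5")


def load_first(names, loader=ctypes.cdll.LoadLibrary):
    """Return (name, handle) for the first library in `names` that loads."""
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:  # ctypes raises OSError when a DLL can't be loaded
            errors.append("{}: {}".format(name, exc))
    raise OSError("none of the libraries could be loaded:\n" + "\n".join(errors))
```

Inside _libcudnn, the hard-coded LoadLibrary call would then become _, lib = load_first(CUDNN_CANDIDATES).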

That completes the PyTorch installation on 64-bit Windows. Let's run MNIST to test it:

```
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from torch.backends import cudnn

# Written against the PyTorch 0.1.x-era API (Variable, volatile, loss.data[0]).
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                        help='learning rate (default: 0.01)')
    parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                        help='SGD momentum (default: 0.5)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10000, metavar='N',
                        help='how many batches to wait before logging training status')
    args = parser.parse_args()
    args.cuda = not args.no_cuda and torch.cuda.is_available()

    print('Using CUDA: ' + str(args.cuda))

    torch.manual_seed(args.seed)
    if args.cuda:
        torch.cuda.manual_seed(args.seed)

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
            self.conv2_drop = nn.Dropout2d()
            self.fc1 = nn.Linear(320, 50)
            self.fc2 = nn.Linear(50, 10)

        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))
            x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
            x = x.view(-1, 320)  # flatten 20 channels x 4 x 4
            x = F.relu(self.fc1(x))
            x = F.dropout(x, training=self.training)
            x = self.fc2(x)
            return F.log_softmax(x)

    model = Net()
    if args.cuda:
        model.cuda()

    # cudnn.enabled = False
    cudnn.benchmark = True  # let cuDNN pick the fastest convolution algorithms

    kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
    train_dataset = datasets.MNIST('../data', train=True, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))
    test_dataset = datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=args.test_batch_size, shuffle=False, **kwargs)

    optimizer = optim.SGD(model.parameters(), lr=args.lr,
                          momentum=args.momentum)

    def train(epoch):
        model.train()

        for batch_idx, (data, target) in enumerate(train_loader):
            if args.cuda:
                data, target = data.cuda(), target.cuda()
            data, target = Variable(data), Variable(target)
            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()

            if batch_idx % args.log_interval == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                    100. * batch_idx / len(train_loader), loss.data[0]))

    def test(epoch):
        model.eval()
        test_loss = 0
        correct = 0
        for data, target in test_loader:
            if args.cuda:
                data, target = data.cuda(), target.cuda()
            data, target = Variable(data, volatile=True), Variable(target)
            output = model(data)
            test_loss += F.nll_loss(output, target).data[0]
            # get the index of the max log-probability
            pred = output.data.max(1)[1]
            correct += pred.eq(target.data.view_as(pred)).cpu().sum()

        # nll_loss already averages over the batch, so divide by the number of batches
        test_loss /= len(test_loader)
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)))

    for epoch in range(1, args.epochs + 1):
        train(epoch)
        test(epoch)
```

Why does the code need the if __name__ == '__main__': guard at the outer level? It comes from how PyTorch's multiprocessing works on Windows: the DataLoader's worker processes (used when num_workers > 0) are started with the spawn method, which re-imports the main script in every worker. Without the guard, each worker would re-run the top-level training code, including trying to start its own DataLoader workers, and Python aborts with a RuntimeError. Apart from that, everything basically works normally. MNIST runs quite fast, as shown in the figure below.
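The effect of the guard can be seen without PyTorch at all. On Windows, multiprocessing starts workers with the spawn method, and each new process runs the main script again under the module name __mp_main__ instead of __main__. The sketch below imitates that with runpy.run_path, which is a simplification of what the spawn machinery actually does; the script contents and the train_mnist.py file name are made up for illustration.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# A made-up training script: one line at module level, one inside the guard.
SCRIPT = textwrap.dedent('''
    events = []
    events.append("module level")      # runs every time the file is executed
    if __name__ == "__main__":
        events.append("main block")    # e.g. build DataLoaders and train
''')


def run_as(run_name):
    """Execute SCRIPT as a file whose __name__ is `run_name`; return its events."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "train_mnist.py"
        path.write_text(SCRIPT)
        module_globals = runpy.run_path(str(path), run_name=run_name)
    return module_globals["events"]


parent_events = run_as("__main__")     # what `python train_mnist.py` does
worker_events = run_as("__mp_main__")  # roughly what a spawned worker does

print(parent_events)  # ['module level', 'main block']
print(worker_events)  # ['module level']
```

Everything outside the guard runs again in every worker, which is why the model, DataLoaders, and training loop all live inside it.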
