The Dataset class
PyTorch comes with its own built-in datasets:
- torchvision provides image datasets
- torchtext provides text datasets
import torch
import torchvision

# torchvision.datasets provides a way to download standard datasets; replace MNIST with
# another name to get a different dataset (MNIST is grayscale handwritten-digit images;
# IMDB, in torchtext, is movie-review text).
# torchvision.datasets.DatasetFolder can load other datasets from local folders.
train_set = torchvision.datasets.MNIST(root='./data', train=True, download=True)
test_set = torchvision.datasets.MNIST(root='./data', train=False, download=True)
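With the dataset downloaded, each element is an (image, label) pair. A minimal sketch of inspecting the first sample (the printed values are illustrative):

# Each item of the MNIST dataset is a (PIL.Image.Image, int) pair
image, label = train_set[0]
print(type(image))   # <class 'PIL.Image.Image'>
print(image.size)    # (28, 28) -- MNIST images are 28x28 grayscale
print(label)         # the digit class of the first training image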
Loading a local dataset
data_path = r".. /.. /.. /"# r table this sentence is a string # complete data classclass MyDataset(Dataset) :def __init__(slef) :self.lines = open(data_path).readlines()
def __getitem__(self, index):
return self.lines[index]
def __len__(self):
return len(self.lines)
my_dataset = MyDataset()
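A quick sketch of how this custom dataset behaves, assuming data_path points to a readable text file: indexing returns one line, and len() returns the number of lines.

# Hypothetical usage; data_path must point to an existing text file
print(len(my_dataset))   # number of lines in the file
print(my_dataset[0])     # the first line of the file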
Data loader
When training on a large dataset, the whole dataset is usually shuffled, split into batches, and preprocessed at the same time.
# Batch loading with shuffling
from torch.utils.data import DataLoader

batch_size = 128  # number of samples processed together; 128 samples are packed into one batch
# num_workers: number of worker processes used to load the data
train_set = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True, num_workers=2)
print('afterLoader_train_set len=', len(train_set))
print('afterLoader_train_set type=', type(train_set))
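A sketch of consuming the loader in a training loop. This assumes the MNIST dataset was built with a ToTensor transform (see the next section) so that the default collate function can stack the samples into batched tensors:

# Assumes the dataset was created with transform=torchvision.transforms.ToTensor();
# otherwise PIL images cannot be collated into a batch.
for images, labels in train_set:
    print(images.shape)   # torch.Size([128, 1, 28, 28]) for a full batch
    print(labels.shape)   # torch.Size([128])
    break                 # only inspect the first batch here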
Preparing the data
Image datasets return samples as PIL.Image.Image objects. Before they are fed through a model, torchvision.transforms is used to convert them into Tensors, and the data is then normalized/standardized.
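A minimal sketch of such a preprocessing pipeline. The MNIST mean 0.1307 and std 0.3081 are the commonly quoted statistics and are an assumption here; adjust them for other datasets.

import torchvision
from torchvision import transforms

# Convert PIL images to tensors in [0, 1], then normalize channel-wise
mnist_transform = transforms.Compose([
    transforms.ToTensor(),                                 # PIL.Image.Image -> FloatTensor [1, 28, 28]
    transforms.Normalize(mean=(0.1307,), std=(0.3081,)),   # commonly used MNIST statistics
])

train_set = torchvision.datasets.MNIST(root='./data', train=True,
                                       download=True, transform=mnist_transform)
image, label = train_set[0]
print(image.shape, image.dtype)   # torch.Size([1, 28, 28]) torch.float32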