PyTorch by deeplizard Part 2: Neural Networks and Deep Learning with PyTorch


These are my notes for Part 2. Using an image-classification example, they cover how to build a network with PyTorch and how to streamline the training code.

Section 1: Data and Data Processing

Data in Deep Learning (Important) - Fashion MNIST for Artificial Intelligence

Why study a dataset?

Data focused considerations:

  • Who created the dataset?
  • How was the dataset created?
  • What transformations were used?
  • What intent does the dataset have?
  • Possible unintentional consequences?
  • Is the dataset biased?
  • Are there ethical issues with the dataset?

What is Fashion-MNIST?

Fashion-MNIST was created as a drop-in replacement for MNIST (github link).

The Fashion-MNIST dataset has the following ten classes of fashion items:

Index Label
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

As we have seen in a previous post, a sample of the items looks like this:

img

We use torchvision to work with this dataset.

CNN Image Preparation Code Project - Learn to Extract, Transform, Load (ETL)

The project (Bird’s-eye view):

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model’s results

This section belongs to Prepare the data.

The ETL process

In this post, we’ll kick things off by preparing the data. To prepare our data, we’ll be following what is loosely known as an ETL process.

  • Extract data from a data source.
  • Transform data into a desirable format.
  • Load data into a suitable structure.

Imports

PyTorch imports

We begin by importing all of the necessary PyTorch libraries.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

This table describes each of these packages:

Package Description
torch The top-level PyTorch package and tensor library.
torch.nn A subpackage that contains modules and extensible classes for building neural networks.
torch.optim A subpackage that contains standard optimization operations like SGD and Adam.
torch.nn.functional A functional interface that contains typical operations used for building neural networks like loss functions and convolutions.
torchvision A package that provides access to popular datasets, model architectures, and image transformations for computer vision.
torchvision.transforms An interface that contains common transforms for image processing.

Other imports

The next imports are standard packages used for data science in Python:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.metrics import confusion_matrix
#from plotcm import plot_confusion_matrix

import pdb

torch.set_printoptions(linewidth=120)

Note that pdb is the Python debugger, the commented import is a local file that we'll introduce in future posts for plotting the confusion matrix, and the last line sets the print options for PyTorch print statements.

Preparing our data using PyTorch

Our ultimate goal when preparing our data is to do the following (ETL):

  1. Extract – Get the Fashion-MNIST image data from the source.
  2. Transform – Put our data into tensor form.
  3. Load – Put our data into an object to make it easily accessible.

For these purposes, PyTorch provides us with two classes:

Class Description
torch.utils.data.Dataset An abstract class for representing a dataset.
torch.utils.data.DataLoader Wraps a dataset and provides access to the underlying data.

An abstract class is a Python class that has methods we must implement, so we can create a custom dataset by creating a subclass that extends the functionality of the Dataset class.

To create a custom dataset using PyTorch, we extend the Dataset class by creating a subclass that implements these required methods. Upon doing this, our new subclass can then be passed to a PyTorch DataLoader object.

We will be using the fashion-MNIST dataset that comes built-in with the torchvision package, so we won’t have to do this for our project. Just know that the Fashion-MNIST built-in dataset class is doing this behind the scenes.

All subclasses of the Dataset class must override two methods: __len__, which returns the size of the dataset, and __getitem__, which supports integer indexing in the range from 0 to len(self) exclusive and returns the element at that index.
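As a minimal sketch (using a hypothetical toy dataset, not part of the course), implementing these two methods is all it takes to create a custom dataset:

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: pairs of (n, n squared)."""
    def __init__(self, size):
        self.size = size

    def __len__(self):
        # the size of the dataset
        return self.size

    def __getitem__(self, index):
        # supports integer indexing from 0 to len(self) - 1
        return torch.tensor(index), torch.tensor(index * index)

ds = SquaresDataset(5)
print(len(ds))   # 5
print(ds[3])     # (tensor(3), tensor(9))
```

Because SquaresDataset is a proper Dataset subclass, it can be passed straight to a DataLoader.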

PyTorch torchvision package

This is exactly what PyTorch does: the FashionMNIST dataset class simply extends the MNIST dataset class and overrides the urls.

Here is the class definition from PyTorch’s torchvision source code:

class FashionMNIST(MNIST):
    """`Fashion-MNIST <https://github.com/zalandoresearch/fashion-mnist>`_ Dataset.

    Args:
        root (string): Root directory of dataset where ``processed/training.pt``
            and  ``processed/test.pt`` exist.
        train (bool, optional): If True, creates dataset from ``training.pt``,
            otherwise from ``test.pt``.
        download (bool, optional): If true, downloads the dataset from the internet and
            puts it in root directory. If dataset is already downloaded, it is not
            downloaded again.
        transform (callable, optional): A function/transform that  takes in an PIL image
            and returns a transformed version. E.g, ``transforms.RandomCrop``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
    """
    urls = [
        'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz',
        'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz',
        'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz',
        'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz',
    ]

Let’s see now how we can take advantage of torchvision.

How to download Fashion-MNIST with torchvision:

import torch
import torchvision
import torchvision.transforms as transforms


# download the dataset
train_set = torchvision.datasets.FashionMNIST(
    root='./data'
    ,train=True
    ,download=True
    ,transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

# wrap it for mini-batch access
train_loader = torch.utils.data.DataLoader(train_set
    ,batch_size=1000
    ,shuffle=True
)

We specify the following arguments:

Parameter Description
root The location on disk where the data is located.
train Whether to load the training set.
download Whether the data should be downloaded if it is not already present.
transform A composition of transformations to perform on the dataset elements.

Since we want our images to be transformed into tensors, we use the built-in transforms.ToTensor() transformation, and since this dataset is going to be used for training, we’ll name the instance train_set.

To create a DataLoader wrapper for our training set, we do it like this:

train_loader = torch.utils.data.DataLoader(train_set
    ,batch_size=1000
    ,shuffle=True
)

We just pass train_set as an argument. Now, we can leverage the loader for tasks that would otherwise be pretty complicated to implement by hand:

  • batch_size (1000 in our case)
  • shuffle (True in our case)
  • num_workers (Default is 0 which means the main process will be used)
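To see these loader options in action without Fashion-MNIST, here is a small sketch using a stand-in TensorDataset (the data here is made up for the example):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# a tiny stand-in dataset: 10 samples, 1 feature each
data = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))

# shuffle=False so the batch contents are predictable
loader = DataLoader(data, batch_size=4, shuffle=False, num_workers=0)

for xb, yb in loader:
    print(xb.shape, yb.shape)
# torch.Size([4, 1]) torch.Size([4])
# torch.Size([4, 1]) torch.Size([4])
# torch.Size([2, 1]) torch.Size([2])
```

Note that the final batch is smaller because 10 is not divisible by 4; the loader handles that automatically.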

ETL summary

From an ETL perspective, we achieved the extract and transform steps using torchvision when we created the dataset:


  1. Extract – The raw data was extracted from the web.
  2. Transform – The raw image data was transformed into a tensor.
  3. Load – The train_set was wrapped by (loaded into) the data loader, giving us access to the underlying data.

Now, we should have a good understanding of the torchvision module that is provided by PyTorch, and how we can use Datasets and DataLoaders in the PyTorch torch.utils.data package to streamline ETL tasks.

PyTorch Datasets and DataLoaders - Training Set Exploration for Deep Learning and AI

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model’s results

This part also belongs to Prepare the data; it is covered in the notebooks.

Section 2: Neural Networks and PyTorch Design

Build PyTorch CNN - Object Oriented Neural Networks

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model’s results

Now we start building the model.

Prerequisites: OOP

For background, see https://docs.python.org/3/tutorial/classes.html

PyTorch’s torch.nn package

To create a network, we must extend the torch.nn.Module class.

Each layer in a neural network has two primary components:

  • A transformation (code)
  • A collection of weights (data)

PyTorch nn.Modules have a forward() method

We also need to implement the forward() method ourselves, because when we pass a tensor to our network as input, the tensor flows forward through each layer transformation until it reaches the output layer.

The goal of the overall transformation is to map the input to the correct prediction output class, and during the training process, the layer weights (data) are updated in such a way that the mapping adjusts to make the output closer to the correct prediction.

PyTorch’s nn.functional package

We use the nn.functional package when implementing the forward() method.

Building a neural network in PyTorch

The steps for building the network:

Short version:

  1. Extend the nn.Module base class.
  2. Define layers as class attributes.
  3. Implement the forward() method.

More detailed version:

  1. Create a neural network class that extends the nn.Module base class.
  2. In the class constructor, define the network’s layers as class attributes using pre-built layers from torch.nn.
  3. Use the network’s layer attributes as well as operations from the nn.functional API to define the network’s forward pass.

Extending PyTorch’s nn.Module class

Here is a plain Python version of the network:

class Network:
    def __init__(self):
        self.layer = None

    def forward(self, t):
        t = self.layer(t)
        return t

Two changes turn it into a PyTorch network:

class Network(nn.Module): # line 1
    def __init__(self):
        super().__init__() # line 3
        self.layer = None

    def forward(self, t):
        t = self.layer(t)
        return t

These changes transform our simple neural network into a PyTorch neural network because we are now extending PyTorch’s nn.Module base class.

With this, we are done! Now we have a Network class that has all of the functionality of the PyTorch nn.Module class.

Define the network’s layers as class attributes

In the implementation above, the layer attribute is a dummy layer; below we replace it with real layers from the nn package.

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

These layers are all defined in __init__(), so they are class attributes.

We have two convolutional layers, self.conv1 and self.conv2, and three linear layers, self.fc1, self.fc2, self.out.

We used the abbreviation fc in fc1 and fc2 because linear layers are also called fully connected layers. They also have a third name: dense. So linear, dense, and fully connected are all ways to refer to the same type of layer. PyTorch uses the word linear, hence the nn.Linear class name.

We used the name out for the last linear layer because the last layer in the network is the output layer.

CNN Layers - PyTorch Deep Neural Network Architecture

CNN Layer Parameters

Parameter vs Argument

First, let's distinguish the two. Parameters are used in function definitions as placeholders, while arguments are the actual values that are passed to the function. Parameters can be thought of as local variables that live inside a function.

In other words, a parameter is a placeholder, while an argument is the actual value passed in.
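A one-line illustration of the distinction (the names here are made up for the example):

```python
def greet(name):          # 'name' is a parameter (a placeholder)
    return 'Hello, ' + name

print(greet('PyTorch'))   # 'PyTorch' is an argument (the actual value)
# Hello, PyTorch
```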

Two types of parameters

  1. Hyperparameters
  2. Data dependent hyperparameters

Different layers take different parameters:

  • Convolutional layers
    • in_channels
    • out_channels
    • kernel_size
  • Linear layers
    • in_features
    • out_features

The hyperparameters:

For building our CNN layers, these are the parameters we choose manually.

  • kernel_size
  • out_channels
  • out_features

This means we simply choose the values for these parameters. In neural network programming, this is pretty common, and we usually test and tune these parameters to find values that work best.

Parameter Description
kernel_size Sets the filter size. The words kernel and filter are interchangeable.
out_channels Sets the number of filters. One filter produces one output channel.
out_features Sets the size of the output tensor.

The data dependent hyperparameters:

Data dependent hyperparameters are parameters whose values are dependent on data. The first two data dependent hyperparameters that stick out are the in_channels of the first convolutional layer, and the out_features of the output layer.

The bolded parameters are the ones that depend on the data:

img

Why is fc1's in_features 12*4*4? The 12 is the number of output channels from the previous layer, but what is the 4*4?

Summary of layer parameters

We’ll learn more about the inner workings of our network and how our tensors flow through our network when we implement our forward() function. For now, be sure to check out this table that describes each of the parameters, to make sure you can understand how each parameter value is determined.

self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
self.fc2 = nn.Linear(in_features=120, out_features=60)
self.out = nn.Linear(in_features=60, out_features=10)

out_channels is the number of filters, and kernel_size is the size of each filter.

Layer Param name Param value The param value is
conv1 in_channels 1 the number of color channels in the input image.
conv1 kernel_size 5 a hyperparameter.
conv1 out_channels 6 a hyperparameter.
conv2 in_channels 6 the number of out_channels in previous layer.
conv2 kernel_size 5 a hyperparameter.
conv2 out_channels 12 a hyperparameter (higher than previous conv layer).
fc1 in_features 12*4*4 the length of the flattened output from previous layer.
fc1 out_features 120 a hyperparameter.
fc2 in_features 120 the number of out_features of previous layer.
fc2 out_features 60 a hyperparameter (lower than previous linear layer).
out in_features 60 the number of out_features of previous layer.
out out_features 10 the number of prediction classes.

CNN Weights - Learnable Parameters in PyTorch Neural Networks

Hyperparameter values are chosen arbitrarily. Compared to the hand-picked hyperparameters, we care more about the learnable parameters.

Accessing the network

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

network = Network()
print(network)
Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)

Without extending nn.Module, we could not print the whole network like this. This episode also mentions the difference between __repr__ and __str__; I won't expand on that here and will write a separate post about it.

Accessing the Network’s Layers

network.conv1
Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

Accessing the Layer Weights

The weight attribute contains a Parameter; the Parameter class indicates that these values are learnable.

network.conv1.weight
Parameter containing:
tensor([[[[ 0.0616, -0.0828,  0.0560, -0.0340, -0.1634],
          [ 0.1900,  0.1003, -0.0214,  0.0958, -0.1765],
          [-0.1113,  0.1540, -0.1648,  0.0428, -0.0812],
          [-0.0361, -0.1382,  0.0210,  0.0763,  0.0703],
          [-0.0094,  0.0644, -0.0815,  0.0683,  0.0623]]],


        [[[ 0.0086, -0.1131, -0.0345,  0.1765, -0.0229],
          [-0.1940, -0.0665, -0.1150,  0.1063, -0.0778],
          [-0.0028, -0.1502, -0.1974, -0.1888, -0.1605],
          [ 0.1491, -0.0180,  0.0527,  0.0573, -0.1875],
          [ 0.0347,  0.0860,  0.0721, -0.0996,  0.1350]]],
......

Accessing the Network's Parameters

for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)

conv1.weight          torch.Size([6, 1, 5, 5])
conv1.bias          torch.Size([6])
conv2.weight          torch.Size([12, 6, 5, 5])
conv2.bias          torch.Size([12])
fc1.weight          torch.Size([120, 192])
fc1.bias          torch.Size([120])
fc2.weight          torch.Size([60, 120])
fc2.bias          torch.Size([60])
out.weight          torch.Size([10, 60])
out.bias          torch.Size([10])

Callable Neural Networks - Linear Layers in Depth

The explanation in this part is excellent: it uses the debugger to show in detail how __call__() invokes the forward() function.
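A minimal sketch of what the debugger session demonstrates: we never call forward() directly; calling the network instance goes through nn.Module.__call__, which dispatches to our forward() (the Tiny class below is a made-up example, not the course's network):

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, t):
        print('forward() was called')
        return self.fc(t)

net = Tiny()
t = torch.rand(1, 4)

# net(t) triggers nn.Module.__call__, which invokes forward()
out = net(t)
print(out.shape)  # torch.Size([1, 2])
```

Going through __call__ rather than forward() directly matters because __call__ also runs PyTorch's hooks around the forward pass.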

CNN Forward Method - PyTorch Deep Learning Implementation

Neural Network Programming Series (Recap)

So far in this series, we’ve prepared our data, and we’re now in the process of building our model.

We created our network by extending the nn.Module PyTorch base class, and then, in the class constructor, we defined the network’s layers as class attributes. Now, we need to implement our network’s forward() method, and then, finally, we’ll be ready to train our model.

  • Prepare the data
  • Build the model
    1. Create a neural network class that extends the nn.Module base class.
    2. In the class constructor, define the network’s layers as class attributes.
    3. Use the network’s layer attributes as well as nn.functional API operations to define the network’s forward pass.
  • Train the model
  • Analyze the model’s results

Summary

def forward(self, t):
    # (1) input layer
    t = t

    # (2) hidden conv layer
    t = self.conv1(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # (3) hidden conv layer
    t = self.conv2(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # (4) hidden linear layer
    t = t.reshape(-1, 12 * 4 * 4)
    t = self.fc1(t)
    t = F.relu(t)

    # (5) hidden linear layer
    t = self.fc2(t)
    t = F.relu(t)

    # (6) output layer
    t = self.out(t)
    #t = F.softmax(t, dim=1)

    return t

Implementing the forward() method

Input layer #1

This layer could be omitted, but we include it for completeness.

Hidden convolutional layers: Layers #2 and #3

Sometimes we may see pooling operations referred to as pooling layers. Sometimes we may even hear activation operations called activation layers.

However, what makes a layer distinct from an operation is that layers have weights. Since pooling operations and activation functions do not have weights, we will refer to them as operations and view them as being added to the collection of layer operations.

A layer contains weights, so we refer to max-pooling and activation as operations rather than layers.

Hidden linear layers: Layers #4 and #5

# (4) hidden linear layer
t = t.reshape(-1, 12 * 4 * 4)
t = self.fc1(t)
t = F.relu(t)

# (5) hidden linear layer
t = self.fc2(t)
t = F.relu(t)

In the earlier content we said the 12 means 12 output channels, but we did not explain what the 4*4 means. The 4*4 is actually the height and width: the height and width dimensions have been reduced from 28 x 28 to 4 x 4 by the convolution and pooling operations. How this reduction happens is explained in a later episode.

Output layer #6

# (6) output layer
t = self.out(t)
#t = F.softmax(t, dim=1)

We won't use softmax() because the loss function that we'll use, F.cross_entropy(), implicitly performs the softmax() operation on its input, so we just return the result of the last linear transformation. In other words, the categorical cross entropy applied after the last layer already includes the softmax.
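We can verify this claim directly: F.cross_entropy is equivalent to log_softmax followed by nll_loss, so adding our own softmax would apply it twice.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 0.5]])  # raw network output
target = torch.tensor([1])

a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(a, b))  # True
```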

CNN Image Prediction with PyTorch - Forward Propagation Explained

Build the model

  • Understanding forward pass transformations

When we don't need to update the weights, we can disable gradient computation, which reduces memory usage.

> torch.set_grad_enabled(False) 
<torch.autograd.grad_mode.set_grad_enabled at 0x17c4867dcc0>
# pytorch imports
import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # a more concise version of the forward pass from the previous section
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = F.relu(self.fc1(t.reshape(-1, 12 * 4 * 4)))
        t = F.relu(self.fc2(t))
        t = self.out(t)

        return t

# disable gradient computation when we are not updating weights, to save memory
torch.set_grad_enabled(False)

# load the dataset
train_set = torchvision.datasets.FashionMNIST(
    root='./data'
    ,train=True
    ,download=True
    ,transform=transforms.Compose([
        transforms.ToTensor()
    ])
)


# get a single sample
sample = next(iter(train_set))
image, label = sample
print(image.shape) # torch.Size([1, 28, 28])

# image.unsqueeze(0).shape, torch.Size([1, 1, 28, 28])


network = Network()
pred = network(image.unsqueeze(0))
print(pred)
# tensor([[ 0.0620, -0.0410, -0.0693,  0.0804,  0.1419, -0.0463,  0.1033,  0.1793, -0.1158, -0.1154]])

print(pred.argmax(dim=1))
# tensor([7])

# apply softmax to the last layer's output to turn it into probabilities
F.softmax(pred, dim=1)

关于iter()next(),可以看CNN Image Preparation Code Project - Learn to Extract, Transform, Load (ETL)。把数据变成iter()是为了防止占用太多内容,具体解释可以看这里(Do iterators save memory in Python?),解释的很清楚。变成iterator后,只能用next()来一个个访问元素。

Neural Network Batch Processing - Pass Image Batch to PyTorch CNN

Build the model

  • Understanding forward pass transformations

The previous section used train_set directly; in this section we use a DataLoader to turn train_set into batches.

data_loader = torch.utils.data.DataLoader(
     train_set, batch_size=10
)

batch = next(iter(data_loader))
images, labels = batch
print(images.shape, labels.shape)
# (torch.Size([10, 1, 28, 28]), torch.Size([10]))

preds = network(images)

print(preds.argmax(dim=1))
# tensor([7, 7, 7, 7, 7, 7, 7, 7, 7, 7])

# compare the predictions with the true labels
preds.argmax(dim=1).eq(labels)
# tensor([False, False, False, False, False, False,  True, False, False, False])


def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

print(get_num_correct(preds, labels))
# 1

CNN Output Size Formula - Bonus Neural Network Debugging Session

This section explains in detail how the size changes after passing through the convolutional and max pooling layers, and gives the formula; for this one it is best to read the post directly. The video also steps through the code in the debugger to show how t.shape changes, which is very helpful.

The image below shows how to specify the variables you want to inspect directly in the debugger's watch panel.

img
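The output size formula from the post, O = (n - f + 2p) / s + 1, can be walked through in a few lines to see how 28 x 28 becomes 4 x 4 (this sketch tracks only one spatial dimension, since the inputs and kernels are square):

```python
def out_size(n, f, p=0, s=1):
    # output size formula: O = (n - f + 2p) / s + 1
    return (n - f + 2 * p) // s + 1

size = 28
size = out_size(size, f=5)        # conv1, 5x5 kernel -> 24
size = out_size(size, f=2, s=2)   # max pool, 2x2, stride 2 -> 12
size = out_size(size, f=5)        # conv2, 5x5 kernel -> 8
size = out_size(size, f=2, s=2)   # max pool, 2x2, stride 2 -> 4

print(size)              # 4
print(12 * size * size)  # 192, the in_features of fc1
```

This is exactly why fc1 is declared with in_features=12 * 4 * 4.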

import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

# torch.set_grad_enabled(False)

class Network(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

    self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
    self.fc2 = nn.Linear(in_features=120, out_features=60)
    self.out = nn.Linear(in_features=60, out_features=10)

  def forward(self, t):
    # a more concise version of the forward pass from the previous section
    t = F.relu(self.conv1(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = F.relu(self.conv2(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = F.relu(self.fc1(t.reshape(-1, 12 * 4 * 4)))
    t = F.relu(self.fc2(t))
    t = self.out(t)

    return t


network = Network()

train_set = torchvision.datasets.FashionMNIST(
  root='./data',
  train=True,
  download=True,
  transform=transforms.Compose([
    transforms.ToTensor()
  ])
)

sample = next(iter(train_set))
image, label = sample

output = network(image.unsqueeze(0))
print(output)

Section 3: Training Neural Networks

CNN Training with Code Example - Neural Network Programming Course

We are now ready to begin the training process.

  • Prepare the data
  • Build the model
  • Train the model
    • Calculate the loss, the gradient, and update the weights
  • Analyze the model’s results

Training: What we do after the forward pass

During the entire training process, we do as many epochs as necessary to reach our desired level of accuracy. With this, we have the following steps:

  1. Get batch from the training set.
  2. Pass batch to network.
  3. Calculate the loss (difference between the predicted values and the true values).
  4. Calculate the gradient of the loss function w.r.t the network’s weights.
  5. Update the weights using the gradients to reduce the loss.
  6. Repeat steps 1-5 until one epoch is completed.
  7. Repeat steps 1-6 for as many epochs required to reach the minimum loss.

We already know exactly how to do steps 1 and 2. If you’ve already covered the deep learning fundamentals series, then you know that we use a loss function to perform step 3, and you know that we use backpropagation and an optimization algorithm to perform steps 4 and 5. Steps 6 and 7 are just standard Python loops (the training loop). Let’s see how this is done in code.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms


class Network(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

    self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
    self.fc2 = nn.Linear(in_features=120, out_features=60)
    self.out = nn.Linear(in_features=60, out_features=10)

  def forward(self, t):
    # a more concise version of the forward pass from the previous section
    t = F.relu(self.conv1(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = F.relu(self.conv2(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = F.relu(self.fc1(t.reshape(-1, 12 * 4 * 4)))
    t = F.relu(self.fc2(t))
    t = self.out(t)

    return t

train_set = torchvision.datasets.FashionMNIST(
  root='./data',
  train=True,
  download=True,
  transform=transforms.Compose([
    transforms.ToTensor()
  ])
)

#### the processing steps for a single batch ####
network = Network()
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)
optimizer = optim.Adam(network.parameters(), lr=0.01)

batch = next(iter(train_loader)) # Get Batch
images, labels = batch 

preds = network(images) # Pass Batch, Forward 
loss = F.cross_entropy(preds, labels) # Calculating Loss

loss.backward() # Calculating Gradients
optimizer.step() # Update Weights

#----------------------------
print('loss1:', loss.item())
preds = network(images)
loss = F.cross_entropy(preds, labels)
print('loss2:', loss.item())

CNN Training Loop Explained - Neural Network Code Project

In the last episode, we learned that the training process is an iterative process, and to train a neural network, we build what is called the training loop.

  • Prepare the data
  • Build the model
  • Train the model
    • Build the training loop
  • Analyze the model’s results

img

The debugger walkthrough of zero_grad() in this part is very intuitive.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms


def get_num_correct(preds, labels):
  return preds.argmax(dim=1).eq(labels).sum().item()

class Network(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

    self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
    self.fc2 = nn.Linear(in_features=120, out_features=60)
    self.out = nn.Linear(in_features=60, out_features=10)

  def forward(self, t):
    # a more concise version of the forward pass from the previous section
    t = F.relu(self.conv1(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = F.relu(self.conv2(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = F.relu(self.fc1(t.reshape(-1, 12 * 4 * 4)))
    t = F.relu(self.fc2(t))
    t = self.out(t)

    return t

train_set = torchvision.datasets.FashionMNIST(
  root='./data',
  train=True,
  download=True,
  transform=transforms.Compose([
    transforms.ToTensor()
  ])
)

# put the network, optimizer, and the train_loader out of the training loop cell.
network = Network()
optimizer = optim.Adam(network.parameters(), lr=0.01)
train_loader = torch.utils.data.DataLoader(
  train_set, 
  batch_size=100,
  shuffle=True)

# training loop:
for epoch in range(10):

  total_loss = 0
  total_correct = 0

  for batch in train_loader: # Get Batch
    images, labels = batch

    preds = network(images) # Pass Batch, Forward 
    loss = F.cross_entropy(preds, labels) # Calculating Loss

    optimizer.zero_grad()
    loss.backward() # Calculating Gradients
    optimizer.step() # Update Weights

    total_loss += loss.item()
    total_correct += get_num_correct(preds, labels)

  print(
    'epoch', epoch,
    'total_correct:', total_correct,
    'loss:', total_loss
  )

CNN Confusion Matrix with PyTorch - Neural Network Programming

Then, we’ll see how we can take this prediction tensor, along with the labels for each sample, to create a confusion matrix. This confusion matrix will allow us to see which categories our network is confusing with one another. Without further ado, let’s get started.

Where we are now in the course.

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model’s results
    • Building, plotting, and interpreting a confusion matrix

# Building a Function to get Predictions for ALL Samples
@torch.no_grad()
def get_all_preds(model, loader):
    all_preds = torch.tensor([])
    for batch in loader:
        images, labels = batch

        preds = model(images)
        all_preds = torch.cat(
            (all_preds, preds), 
            dim=0
        )
    return all_preds

# Locally Disabling PyTorch Gradient Tracking
with torch.no_grad():
    prediction_loader = torch.utils.data.DataLoader(train_set, batch_size=10000)
    train_preds = get_all_preds(network, prediction_loader)

# Using the Predictions Tensor
preds_correct = get_num_correct(train_preds, train_set.targets)
print('total correct:', preds_correct)
print('accuracy:', preds_correct / len(train_set))

# now we have pairs of true labels and predictions
stacked = torch.stack(
    (train_set.targets, train_preds.argmax(dim=1)),
    dim=1
)
# then build the confusion matrix
cmt = torch.zeros(10, 10, dtype=torch.int64)

for p in stacked:
    tl, pl = p.tolist()
    cmt[tl, pl] = cmt[tl, pl] + 1

Plotting the Confusion Matrix

This part uses sklearn's confusion_matrix function instead of the hand-written construction above.

# the confusion matrix can also be created with sklearn
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from resources.plotcm import plot_confusion_matrix
%matplotlib inline

# to plot it, it must be converted to an ndarray
cm = confusion_matrix(train_set.targets, train_preds.argmax(dim=1))
print(type(cm))

names = (
    'T-shirt/top'
    ,'Trouser'
    ,'Pullover'
    ,'Dress'
    ,'Coat'
    ,'Sandal'
    ,'Shirt'
    ,'Sneaker'
    ,'Bag'
    ,'Ankle boot'
)

plt.figure(figsize=(10,10))
plot_confusion_matrix(cm, names)

img

Stack vs Concat in PyTorch, TensorFlow & NumPy - Deep Learning Tensor Ops

Existing vs New Axes

Concatenating joins a sequence of tensors along an existing axis, and stacking joins a sequence of tensors along a new axis.

The distinction is simple: concatenate operates along an existing axis, while stack operates along a new axis.

import torch
t1 = torch.tensor([1,1,1])

> print(t1.shape)
> print(t1.unsqueeze(dim=0).shape)
> print(t1.unsqueeze(dim=1).shape)
torch.Size([3])
torch.Size([1, 3])
torch.Size([3, 1])

Now, thinking back about concatenating versus stacking: when we concat, we are joining a sequence of tensors along an existing axis. This means that we are extending the length of an existing axis.

When we stack, we are creating a new axis that didn’t exist before and this happens across all the tensors in our sequence, and then we concat along this new sequence.

Stack vs Cat in PyTorch

stack is really just unsqueeze followed by cat: stack(dim=0) = unsqueeze(dim=0) + cat(dim=0)

import torch

t1 = torch.tensor([1,1,1])
t2 = torch.tensor([2,2,2])
t3 = torch.tensor([3,3,3])


> torch.cat(
    (
         t1.unsqueeze(0)
        ,t2.unsqueeze(0)
        ,t3.unsqueeze(0)
    )
    ,dim=0
)
tensor([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]])


# written with stack, the same thing is much simpler
> torch.stack(
    (t1,t2,t3)
    ,dim=0
)
tensor([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]])

Note that the operations above all work on dim=0.
Below are the same operations on dim=1; note how 1, 2, 3 end up arranged in the result:

> torch.cat(
    (
         t1.unsqueeze(1)
        ,t2.unsqueeze(1)
        ,t3.unsqueeze(1)
    )
    ,dim=1
)
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])

> torch.stack(
    (t1,t2,t3)
    ,dim=1
)
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])

Stack vs Concatenate in NumPy

Library      Concat          Add a new axis   Stack
PyTorch      cat()           unsqueeze()      stack()
TensorFlow   concat()        expand_dims()    stack()
NumPy        concatenate()   expand_dims()    stack()

stack() has the same name in all three libraries.

NumPy's expand_dims() corresponds to PyTorch's unsqueeze(); run ??np.expand_dims to view the corresponding documentation.

> np.concatenate(
    (
         np.expand_dims(t1, 0)
        ,np.expand_dims(t2, 0)
        ,np.expand_dims(t3, 0)
    )
    ,axis=0
)
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

Stack or Concat: Real-Life Examples

Here are three concrete examples we might encounter in real life. Let's decide when we need to stack and when we need to concat.

Joining Images into a Single Batch

Question: combine three 3-d image tensors into a single batch.

Answer: use stack; the newly created axis is the batch axis.

import torch
t1 = torch.zeros(3,28,28)
t2 = torch.zeros(3,28,28)
t3 = torch.zeros(3,28,28)

torch.stack(
    (t1,t2,t3)
    ,dim=0
).shape

## output ##
torch.Size([3, 3, 28, 28])

Joining Batches into a Single Batch

Question: combine three 4-d tensors (single-image batches) into a single batch.

Answer: use concat.

import torch
t1 = torch.zeros(1,3,28,28)
t2 = torch.zeros(1,3,28,28)
t3 = torch.zeros(1,3,28,28)
torch.cat(
    (t1,t2,t3)
    ,dim=0
).shape

## output ##
torch.Size([3, 3, 28, 28])

Joining Images with an Existing Batch

Question: add three 3-d image tensors to an existing batch.

Answer: stack first, then concat.

import torch
batch = torch.zeros(3,3,28,28)
t1 = torch.zeros(3,28,28)
t2 = torch.zeros(3,28,28)
t3 = torch.zeros(3,28,28)

torch.cat(
    (
        batch
        ,torch.stack(
            (t1,t2,t3)
            ,dim=0
        )
    )
    ,dim=0
).shape

## output ##
torch.Size([6, 3, 28, 28])

Or use unsqueeze:

import torch
batch = torch.zeros(3,3,28,28)
t1 = torch.zeros(3,28,28)
t2 = torch.zeros(3,28,28)
t3 = torch.zeros(3,28,28)

torch.cat(
    (
        batch
        ,t1.unsqueeze(0)
        ,t2.unsqueeze(0)
        ,t3.unsqueeze(0)
    )
    ,dim=0
).shape

## output ##
torch.Size([6, 3, 28, 28])

TensorBoard with PyTorch - Visualize Deep Learning Metrics

Demo code for TensorBoard.

Be careful about version compatibility; see my article on Medium for details.

img

Bird’s eye view of where we are in the training process.

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model’s results
    • Using TensorBoard for this

Hyperparameter Tuning and Experimenting - Training Deep Neural Networks

Without further ado, let’s get started.

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model’s results
    • Hyperparameter Experimentation

This section continues the previous one, covering how to use TensorBoard to tune hyperparameters.

Naming the Training Runs for TensorBoard

Give each run a name so it is easy to identify.

tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')

Calculate Loss with Different Batch Sizes

Before and after the change:

# https://github.com/BrambleXu/deeplizard-pytorch-course/blob/master/scripts/6-tensorboard_2_summarywriter.py#L93

# https://github.com/BrambleXu/deeplizard-pytorch-course/blob/master/scripts/6-tensorboard_3_hyper_nested_iteration.py#L99

total_loss += loss.item() # 1
# update to
total_loss += loss.item() * batch_size # 2
# update to
total_loss += loss.item() * images.shape[0] # 3
  1. loss = F.cross_entropy(preds, labels) returns the loss averaged over the batch
  2. so we multiply by batch_size to recover the total loss for the batch
  3. but if the training set size is not evenly divisible by the batch size, the last batch is smaller than batch_size, so we multiply by images.shape[0], the actual number of samples
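A quick sanity check of point 1 above (a standalone sketch with random tensors, not part of the training script): F.cross_entropy returns the per-batch mean by default, so multiplying it by the actual batch size recovers the summed loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
preds = torch.randn(8, 10)            # a batch of 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))

mean_loss = F.cross_entropy(preds, labels)                   # default reduction='mean'
sum_loss = F.cross_entropy(preds, labels, reduction='sum')   # summed over the batch

# multiplying the mean by the actual batch size recovers the sum
print(torch.isclose(mean_loss * preds.shape[0], sum_loss))   # tensor(True)
```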

Training script

"""
使用pytorch 1.2的话,tensorboard无法显示graph:
  https://github.com/pytorch/pytorch/issues/24157

我更新到了1.4,解决了这个问题

pip install torch==1.4.0 torchvision==0.5.0 tensorboard==2.1.0

使用方法
  tensorboard --version
  tensorboard --logdir=runs    
"""

from itertools import product

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

torch.set_printoptions(linewidth=120) # Display options for output
torch.set_grad_enabled(True) # Already on by default

from torch.utils.tensorboard import SummaryWriter # new 

print(torch.__version__)
print(torchvision.__version__)

def get_num_correct(preds, labels):
  return preds.argmax(dim=1).eq(labels).sum().item()

class Network(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

    self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
    self.fc2 = nn.Linear(in_features=120, out_features=60)
    self.out = nn.Linear(in_features=60, out_features=10)

  def forward(self, t):
    t = F.relu(self.conv1(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = F.relu(self.conv2(t))
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    t = t.flatten(start_dim=1) # t = t.reshape(-1, 12 * 4 * 4)
    t = F.relu(self.fc1(t))
    t = F.relu(self.fc2(t))
    t = self.out(t)

    return t

train_set = torchvision.datasets.FashionMNIST(
  root='./data',
  train=True,
  download=True,
  transform=transforms.Compose([
    transforms.ToTensor()
  ])
)

# Parameter dict
parameters = dict(
    lr = [.01, .001]
    ,batch_size = [100, 1000]
    ,shuffle = [True, False]
)
param_values = [v for v in parameters.values()] # [[0.01, 0.001], [100, 1000], [True, False]]


# Parameter Iteration
for lr, batch_size, shuffle in product(*param_values): 
  network = Network()

  train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=shuffle)
  optimizer = optim.Adam(network.parameters(), lr=lr)

  images, labels = next(iter(train_loader))
  grid = torchvision.utils.make_grid(images)

  # Tensorboard lines
  comment = f' batch_size={batch_size} lr={lr} shuffle={shuffle}'
  tb = SummaryWriter(comment=comment)
  tb.add_image('images', grid)
  tb.add_graph(network, images)

  for epoch in range(10):
    total_loss = 0
    total_correct = 0
    for batch in train_loader: 
      images, labels = batch # Get Batch
      preds = network(images) # Pass Batch 
      loss = F.cross_entropy(preds, labels) # Calculating Loss
      optimizer.zero_grad() # Zero Gradients
      loss.backward() # Calculating Gradients
      optimizer.step() # Update Weights

      # total_loss += loss.item() * batch_size # slightly imprecise: the last batch may be smaller than batch_size
      total_loss += loss.item() * images.shape[0]
      total_correct += get_num_correct(preds, labels)

    # add tb lines for each epoch
    tb.add_scalar('Loss', total_loss, epoch)
    tb.add_scalar('Number Correct', total_correct, epoch)
    tb.add_scalar('Accuracy', total_correct / len(train_set), epoch)

    # tb.add_histogram('conv1.bias', network.conv1.bias, epoch)
    # tb.add_histogram('conv1.weight', network.conv1.weight, epoch)
    # tb.add_histogram('conv1.weight.grad' ,network.conv1.weight.grad, epoch)
    for name, param in network.named_parameters():
      tb.add_histogram(name, param, epoch)
      tb.add_histogram(f'{name}.grad', param.grad, epoch)

    print('epoch', epoch, 'total_correct:', total_correct, 'loss:', total_loss)

  tb.close()

After training, run the following command:

tensorboard --logdir=runs 

Read hyperparameter graph

img

Enter different regexes in the panel on the left to compare how the curves change under different parameters. Comparing the three parameters shows that shuffle=False, batch_size=100, lr=0.01 performs best.

Section 4: Extra Cool Coding Stuff

Training Loop Run Builder - Neural Network Experimentation Code

Use the RunBuilder class to run experiments efficiently.

Using the RunBuilder Class

Here we mainly use namedtuple to simplify creating an immutable class.

from collections import OrderedDict
from collections import namedtuple
from itertools import product

class RunBuilder():
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())
        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))
        return runs

params = OrderedDict(
            lr = [0.01, 0.001],
            batch_size = [1000, 10000])

runs = RunBuilder.get_runs(params)

runs holds the parameters for each run, which can be inspected via attribute access:

> runs
[
    Run(lr=0.01, batch_size=1000),
    Run(lr=0.01, batch_size=10000),
    Run(lr=0.001, batch_size=1000),
    Run(lr=0.001, batch_size=10000)
]

> run = runs[0]
> run
Run(lr=0.01, batch_size=1000)

> print(run.lr, run.batch_size)
0.01 1000

# iteration also becomes more concise
for run in runs:
    print(run, run.lr, run.batch_size)

Run(lr=0.01, batch_size=1000) 0.01 1000
Run(lr=0.01, batch_size=10000) 0.01 10000
Run(lr=0.001, batch_size=1000) 0.001 1000
Run(lr=0.001, batch_size=10000) 0.001 10000

Breaking down the params part:

params = OrderedDict(
            lr = [0.01, 0.001],
            batch_size = [1000, 10000])

In [22]: params.keys()
Out[22]: odict_keys(['lr', 'batch_size'])

In [23]: params.values()
Out[23]: odict_values([[0.01, 0.001], [1000, 10000]])

In [25]: list(product(*params.values()))
Out[25]: [(0.01, 1000), (0.01, 10000), (0.001, 1000), (0.001, 10000)]

Coding the RunBuilder Class

Before:

for lr, batch_size, shuffle in product(*param_values):
    comment = f' batch_size={batch_size} lr={lr} shuffle={shuffle}'

    # Training process given the set of parameters

After:

for run in RunBuilder.get_runs(params):
    comment = f'-{run}'

    # Training process given the set of parameters

CNN Training Loop Refactoring - Simultaneous Hyperparameter Testing

This section is easier to follow in the video; the post alone is harder to understand.

Cleaning Up the Training Loop and Extracting Classes

Below is the rather complex training code written over the previous two sections:

img

In this section, we use RunBuilder and add a new RunManager class to refactor the earlier training loop into the cleaner version below:

img

Set the parameters to test in params at the beginning, then manage the training process with RunManager() and RunBuilder(). RunManager() is where code such as the TensorBoard handling can live.

The marked places below are where RunManager() records things:

img

The epoch-related part of RunManager():

img

As an aside, use a prefix to make the parameters more self-explanatory:

img

This tree view is really nice:

img

The two run-related functions in RunManager():

img

The two epoch-related functions in RunManager():

img

The end_epoch() part is responsible for recording and saving the corresponding training results and converting them into a pandas DataFrame for easy inspection.

img

These two functions come last and record the loss:

img

One function counts how many predictions were correct; save() writes the results to CSV and JSON:

img

For this section it is best to read the code directly: 7-CNN Training Loop Refactoring-RunManager.py
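Since the full class is only shown in screenshots here, below is a stripped-down sketch of what a RunManager might look like: stdlib only, no TensorBoard or pandas. The method names mirror the course's class, but the bookkeeping details are simplified assumptions, not the course's exact implementation.

```python
import time
from collections import OrderedDict

class RunManager:
    """Minimal run/epoch bookkeeping for a training loop (simplified sketch)."""

    def __init__(self):
        self.run_data = []      # one OrderedDict of results per finished epoch
        self.run_count = 0
        self.epoch_count = 0

    def begin_run(self, run):
        # 'run' is a RunBuilder namedtuple of hyperparameters
        self.run = run
        self.run_count += 1
        self.epoch_count = 0

    def end_run(self):
        pass  # the real class closes its SummaryWriter here

    def begin_epoch(self):
        self.epoch_count += 1
        self.epoch_start = time.time()
        self.total_loss = 0
        self.total_correct = 0

    def track(self, loss, num_correct, batch_size):
        # loss is the batch mean, so weight it by the actual batch size
        self.total_loss += loss * batch_size
        self.total_correct += num_correct

    def end_epoch(self, dataset_size):
        results = OrderedDict()
        results['run'] = self.run_count
        results['epoch'] = self.epoch_count
        results['loss'] = self.total_loss / dataset_size
        results['accuracy'] = self.total_correct / dataset_size
        results['duration'] = time.time() - self.epoch_start
        self.run_data.append(results)
        return results

# usage sketch: track one epoch of fake numbers
m = RunManager()
m.begin_run('demo')
m.begin_epoch()
m.track(loss=0.5, num_correct=90, batch_size=100)
results = m.end_epoch(dataset_size=100)
m.end_run()
print(results['accuracy'])  # 0.9
```

In the real class, begin_run also creates a SummaryWriter and a DataLoader reference, end_epoch writes scalars and histograms to TensorBoard and displays self.run_data as a DataFrame, and a save() method dumps self.run_data to CSV and JSON.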

PyTorch DataLoader num_workers - Deep Learning Speed Limit Increase

img

loader = torch.utils.data.DataLoader(train_set, batch_size=run.batch_size, num_workers=run.num_workers)

When using a DataLoader, num_workers can speed up training. num_workers defaults to 0, which means the main process loads each batch. Setting it to 1 adds a subprocess: while the main process handles the current batch, the subprocess has already loaded the next batch into memory, which speeds up data loading.

Adding more workers, however, does not guarantee ever-faster training, because the batches sit in a single queue; no matter how many workers there are, batches are still consumed from the queue one at a time. The practical recommendation given here is to simply set num_workers to 1.

Subscribe to my blog: RSS feed
Zhihu: 赤乐君
Blog: BrambleXu
GitHub: BrambleXu
Medium: BrambleXu


Author: BrambleXu
Copyright: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit BrambleXu when reposting!