Lecture 09 Softmax Classifier


PyTorch Tutorial

09. Softmax Classifier

Revision: Diabetes dataset

[Diagram: the diabetes classifier from the previous lecture. Eight input features x1 … x8 pass through alternating Linear and Sigmoid layers that narrow from 8 to 6, 4, and finally 1 unit, producing a single output ŷ.]
Revision: MNIST Dataset

There are 10 labels in the MNIST dataset.

How to design the neural network?
Design 10 outputs using Sigmoid?
[Diagram: Input Layer → Linear Layer → Sigmoid Layer, giving ten independent outputs o1 … o10 mapped to ŷ1 … ŷ10.]

What is wrong? Ten independent Sigmoid outputs treat each digit as a separate binary decision. We want the outputs to compete with one another: the network should output a distribution over the ten classes.
Output a Distribution of Predictions with Softmax

[Diagram: Input Layer → Linear Layer → Softmax Layer, mapping the ten outputs o1 … o10 to the class probabilities P(y = 0), …, P(y = 9)]

such that

$$P(y = i) \ge 0, \qquad \sum_{i=0}^{9} P(y = i) = 1$$
Softmax Layer

Suppose $Z^{l} \in \mathbb{R}^{K}$ is the output of the last linear layer; the Softmax function is

$$P(y = i) = \frac{e^{z_i}}{\sum_{j=0}^{K-1} e^{z_j}}, \qquad i \in \{0, \dots, K-1\}$$
Softmax Layer - Example

Start from the linear outputs $z = (0.2, 0.1, -0.1)$:

Exponent: $e^{0.2} \approx 1.22$, $e^{0.1} \approx 1.11$, $e^{-0.1} \approx 0.90$

Sum: $1.22 + 1.11 + 0.90 = 3.23$

Divide: $1.22 / 3.23 \approx 0.38$, $1.11 / 3.23 \approx 0.34$, $0.90 / 3.23 \approx 0.28$

So $\mathrm{Softmax}(0.2, 0.1, -0.1) \approx (0.38, 0.34, 0.28)$: nonnegative values that sum to 1.
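For reference, a minimal NumPy sketch of the same computation (the max-subtraction is a standard numerical-stability trick, not shown on the slide):

import numpy as np

def softmax(z):
    # subtracting the maximum before exponentiating avoids overflow for large logits
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([0.2, 0.1, -0.1])))  # ~[0.38 0.34 0.28]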
Loss function - Cross Entropy

[Diagram: the linear outputs z = (0.2, 0.1, -0.1) go through Softmax to give the predicted distribution Ŷ = (0.38, 0.34, 0.28); the label is one-hot encoded as Y = (1, 0, 0); the loss compares Ŷ with Y.]

$$\mathrm{Loss}(\hat{Y}, Y) = -Y \log \hat{Y}$$

The $-Y \log \hat{Y}$ step, applied to the log of the Softmax output, is exactly PyTorch's NLLLoss (Negative Log Likelihood Loss).
Cross Entropy in Numpy

import numpy as np

y = np.array([1, 0, 0])               # one-hot encoded target (class 0)
z = np.array([0.2, 0.1, -0.1])        # outputs of the last linear layer
y_pred = np.exp(z) / np.exp(z).sum()  # softmax
loss = (- y * np.log(y_pred)).sum()   # cross entropy: -Y log Y_hat
print(loss)                           # ~0.97
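Because Y is one-hot, the sum keeps only a single term: the loss is simply the negative log of the probability assigned to the true class. A one-line check, reusing the arrays above:

print(-np.log(y_pred[0]))  # same value as the loss printed above (~0.97)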
Cross Entropy in PyTorch


torch.nn.CrossEntropyLoss() takes the raw outputs of the last linear layer (the logits) together with the target class index; it applies LogSoftmax and NLLLoss internally, so the model needs no Softmax layer and the target needs no one-hot encoding:

import torch

y = torch.LongTensor([0])             # target class index, not one-hot
z = torch.Tensor([[0.2, 0.1, -0.1]])  # logits, shape (batch_size, num_classes)
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(z, y)
print(loss)                           # tensor(0.9729), same value as the NumPy version
Mini-Batch: batch_size=3

import torch

criterion = torch.nn.CrossEntropyLoss()
Y = torch.LongTensor([2, 0, 1])           # target class indices for a batch of 3

Y_pred1 = torch.Tensor([[0.1, 0.2, 0.9],  # largest logit matches the target in every row
                        [1.1, 0.1, 0.2],
                        [0.2, 2.1, 0.1]])
Y_pred2 = torch.Tensor([[0.8, 0.2, 0.3],  # largest logit misses the target in every row
                        [0.2, 0.3, 0.5],
                        [0.2, 0.2, 0.5]])

l1 = criterion(Y_pred1, Y)
l2 = criterion(Y_pred2, Y)
print("Batch Loss1 = ", l1.data, "\nBatch Loss2 = ", l2.data)

Output:

Batch Loss1 = tensor(0.4966)
Batch Loss2 = tensor(1.2389)
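By default CrossEntropyLoss returns the mean loss over the mini-batch. A small sketch, reusing the tensors above, that one could use to inspect the per-sample losses via the reduction='none' option:

per_sample = torch.nn.CrossEntropyLoss(reduction='none')
print(per_sample(Y_pred1, Y))  # three individual losses; their mean is Batch Loss1
print(per_sample(Y_pred2, Y))  # three individual losses; their mean is Batch Loss2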
Exercise 9-1: CrossEntropyLoss vs NLLLoss

• What are the differences?
• Read the documentation:
  • https://pytorch.org/docs/stable/nn.html#crossentropyloss
  • https://pytorch.org/docs/stable/nn.html#nllloss
• Try to understand why:
  • CrossEntropyLoss <==> LogSoftmax + NLLLoss (a small sketch for checking this follows below)
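A minimal sketch, not part of the original slides, that one could run to check the claimed equivalence on a random batch:

import torch

z = torch.randn(4, 10)              # random logits for a batch of 4
y = torch.LongTensor([1, 0, 7, 3])  # arbitrary target classes

ce = torch.nn.CrossEntropyLoss()(z, y)
nll = torch.nn.NLLLoss()(torch.nn.LogSoftmax(dim=1)(z), y)
print(ce, nll)                      # the two values should be identical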
Back to MNIST Dataset

There are 10 labels in the MNIST dataset.

How to design the neural network?
MNIST Dataset

[Figure: a single MNIST digit shown as its 28 × 28 grid of grayscale pixel values in [0, 1].]

28 × 28 = 784 input values per image.
Implementation of a classifier for the MNIST dataset

1. Prepare dataset: Dataset and DataLoader
2. Design model using Class: inherit from nn.Module
3. Construct loss and optimizer: using the PyTorch API
4. Training cycle + Test: forward, backward, update
Implementation – 0. Import Package

import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader  # for constructing the DataLoader
import torch.nn.functional as F          # for the function relu()
import torch.optim as optim              # for constructing the optimizer
Implementation – 1. Prepare Dataset

batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),                        # convert the PIL Image to a Tensor
    transforms.Normalize((0.1307, ), (0.3081, ))  # normalize with the MNIST mean and std
])

train_dataset = datasets.MNIST(root='../dataset/mnist/',
                               train=True,
                               download=True,
                               transform=transform)
train_loader = DataLoader(train_dataset,
                          shuffle=True,
                          batch_size=batch_size)

test_dataset = datasets.MNIST(root='../dataset/mnist/',
                              train=False,
                              download=True,
                              transform=transform)
test_loader = DataLoader(test_dataset,
                         shuffle=False,
                         batch_size=batch_size)

ToTensor converts the PIL Image, an integer array in $\mathbb{Z}^{28 \times 28}$ with pixels in $\{0, \dots, 255\}$, into a PyTorch Tensor in $\mathbb{R}^{1 \times 28 \times 28}$ with pixels in $[0, 1]$.

The parameters of Normalize are the mean and the std respectively; it applies

$$\mathrm{Pixel}_{\mathrm{norm}} = \frac{\mathrm{Pixel}_{\mathrm{origin}} - \mathrm{mean}}{\mathrm{std}}$$
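Not on the original slides, but a quick sanity check of the tensor shapes coming out of the loader defined above:

inputs, target = next(iter(train_loader))
print(inputs.shape)  # torch.Size([64, 1, 28, 28]) - a mini-batch of normalized images
print(target.shape)  # torch.Size([64])            - the matching digit labels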
Implementation – 2. Design Model

Each (N, 1, 28, 28) batch of images is flattened with view into an (N, 784) matrix and passed through fully connected layers 784 → 512 → 256 → 128 → 64 → 10 with ReLU activations in between. The last layer returns raw scores without an activation, because CrossEntropyLoss applies the Softmax itself.

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)     # (N, 1, 28, 28) -> (N, 784)
        x = F.relu(self.l1(x))  # (N, 784) -> (N, 512)
        x = F.relu(self.l2(x))  # (N, 512) -> (N, 256)
        x = F.relu(self.l3(x))  # (N, 256) -> (N, 128)
        x = F.relu(self.l4(x))  # (N, 128) -> (N, 64)
        return self.l5(x)       # (N, 64)  -> (N, 10), raw logits

model = Net()
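As an aside (not on the slides), the same stack could be written with nn.Sequential; it is shown under a different name so it does not clash with the Net instance above, and it assumes a PyTorch version that provides nn.Flatten:

seq_model = torch.nn.Sequential(
    torch.nn.Flatten(),                         # (N, 1, 28, 28) -> (N, 784)
    torch.nn.Linear(784, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 10),                    # raw logits, no Softmax
)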
Implementation – 3. Construct Loss and Optimizer

criterion = torch.nn.CrossEntropyLoss()                           # LogSoftmax + NLLLoss in one step
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)  # SGD with momentum

As before, the loss takes the raw network outputs and the target class indices; the Softmax, the log, and the one-hot comparison all happen inside CrossEntropyLoss().
Implementation – 4. Train and Test

def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 300 == 299:  # report the average loss every 300 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0
def test():
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # index of the largest logit = predicted class
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))
if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()

Typical output:

[1, 300] loss: 0.335
[1, 600] loss: 0.154
[1, 900] loss: 0.067
Accuracy on test set: 90 %
[2, 300] loss: 0.048
[2, 600] loss: 0.040
[2, 900] loss: 0.035
Accuracy on test set: 93 %
...
[9, 300] loss: 0.005
[9, 600] loss: 0.006
[9, 900] loss: 0.007
Accuracy on test set: 97 %
[10, 300] loss: 0.005
[10, 600] loss: 0.005
[10, 900] loss: 0.005
Accuracy on test set: 97 %
Softmax and CrossEntropyLoss

To summarize: torch.nn.CrossEntropyLoss() takes the raw linear outputs and the integer class labels; internally it applies Softmax (as LogSoftmax), matches the result against the one-hot encoded target, and computes $-Y \log \hat{Y}$. The network itself therefore ends with a plain Linear layer and contains no Softmax.
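If actual class probabilities are wanted at inference time (the model only outputs logits), a small sketch, assuming the model and test_loader defined earlier:

with torch.no_grad():
    images, labels = next(iter(test_loader))
    probs = F.softmax(model(images), dim=1)  # turn logits into a distribution per image
    print(probs[0], probs[0].sum())          # probabilities for the first image; they sum to 1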
Exercise 9-2: Classifier Implementation

• Try to implement a classifier for:
  • Otto Group Product Classification Challenge
  • Dataset: https://www.kaggle.com/c/otto-group-product-classification-challenge/data
  • A possible starting point for reading the data is sketched below.
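Not part of the slides: a hedged starting-point sketch for the exercise, assuming the challenge's usual train.csv layout (an id column, feature columns feat_1 … feat_93, and a target column with labels Class_1 … Class_9); adjust the column handling to whatever the downloaded file actually contains:

import numpy as np
import torch
from torch.utils.data import Dataset

class OttoDataset(Dataset):
    def __init__(self, filepath):
        # skip the header row; drop the id column; the last column holds the class label
        raw = np.loadtxt(filepath, delimiter=',', skiprows=1, dtype=str)
        self.x_data = torch.from_numpy(raw[:, 1:-1].astype(np.float32))
        # map 'Class_1' ... 'Class_9' to integer indices 0 ... 8 for CrossEntropyLoss
        self.y_data = torch.LongTensor([int(label.split('_')[1]) - 1 for label in raw[:, -1]])

    def __len__(self):
        return self.x_data.shape[0]

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]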