2024.06.02 PyTorch Essential Training
Status: Finished
Use Google Colab: https://colab.research.google.com/
https://colab.research.google.com/drive/1VHaPSHXGrLlJ5dzC628OVogfY4f8w2mZ
Tensors
Introduction to Tensors
We can think of a tensor as a generalization of scalars, vectors, and matrices to any number of dimensions
Tensor vs ndarray
Advantages of Tensors
- Tensor operations are performed significantly faster using GPUs
- Tensors can be stored and manipulated at scale using distributed processing on multiple CPUs and GPUs and across multiple servers
- Tensors keep track of the graph of computations that created them
Creating a tensor CPU example
import torch
output:
tensor([[13, 12, 14, 13],
sub_tens = first_tens - second_tens
output:
tensor([[11, 8, 8, 5],
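The code cells above survive only partially in this export, so here is a minimal sketch of the CPU example. The values of first_tens and second_tens are assumptions, chosen so that the first rows match the outputs shown above.

```python
import torch

# two tensors created on the CPU (assumed example values)
first_tens = torch.tensor([[12, 10, 11, 9], [13, 15, 16, 18]])
second_tens = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])

# element-wise addition and subtraction
add_tens = first_tens + second_tens
sub_tens = first_tens - second_tens

print(add_tens)  # first row: [13, 12, 14, 13]
print(sub_tens)  # first row: [11, 8, 8, 5]
```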
Creating tensors GPU example
import torch
tens_a = torch.tensor([[10, 11, 12, 13], [14, 15, 16, 17]], device=device)
output:
The result is also allocated on the GPU. cuda:0 means the first GPU was used; when the machine contains multiple GPUs, this is how we control which GPU is being used.
tensor([[180, 209, 240, 273],
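Again the original cell is truncated; a minimal sketch that reproduces the first row of the output, assuming a second operand tens_b and an element-wise multiplication:

```python
import torch

# use the first GPU if available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tens_a = torch.tensor([[10, 11, 12, 13], [14, 15, 16, 17]], device=device)
# assumed second operand: 10*18=180, 11*19=209, 12*20=240, 13*21=273
tens_b = torch.tensor([[18, 19, 20, 21], [22, 23, 24, 25]], device=device)

mul_tens = tens_a * tens_b
print(mul_tens)         # tensor([[180, 209, 240, 273], ...])
print(mul_tens.device)  # cuda:0 when a GPU is available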
Moving Tensor between GPUs and CPUs
By default, all data is on the CPU
When training a neural network, which can be huge, we prefer to use a GPU for faster training
Transfer the data from the CPU to the GPU
After the training, the output tensors are produced on the GPU
The output data requires preprocessing
Some preprocessing libraries don't support tensors and expect a NumPy array
NumPy supports only data on the CPU, so we need to move the data from the GPU back to the CPU
Moving Tensors from CPU to GPU
# 1st way
Tensor.cuda()
# 2nd way
Tensor.to("cuda")
# 3rd way
Tensor.to("cuda:0")
Moving Tensors from GPU to CPU
# 1st case: a tensor with requires_grad = False
Tensor.cpu()
# 2nd case: a tensor with requires_grad = True
Tensor.detach().cpu()
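The snippets above use a placeholder Tensor; a small runnable sketch of the round trip (CPU to GPU, then back for NumPy) could look like this:

```python
import torch

t = torch.rand(2, 3)                 # created on the CPU by default
print(t.device)                      # cpu

if torch.cuda.is_available():
    t_gpu = t.to("cuda")             # move to the first GPU
    print(t_gpu.device)              # cuda:0

    # NumPy only works with CPU data, so move back before converting
    t_back = t_gpu.detach().cpu()
    print(t_back.numpy().shape)      # (2, 3)
```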
Creating Tensors
Different ways to create tensors
http://pytorch.org/docs/stable/torch.html
import torch
# initialize a tensor from a ndarray
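The cell is truncated; a minimal sketch of two common ways to create a tensor, with assumed example values:

```python
import torch
import numpy as np

# initialize a tensor directly from a Python list
tensor_from_list = torch.tensor([[1, 2, 3], [4, 5, 6]])

# initialize a tensor from a NumPy ndarray
ndarray = np.array([[1, 2, 3], [4, 5, 6]])
tensor_from_ndarray = torch.from_numpy(ndarray)

print(tensor_from_list)
print(tensor_from_ndarray)
```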
Different functions for creating tensors
torch.empty(), torch.ones(), torch.zeros()
tensor_emp = torch.empty(3, 4)
print("tensor_emp :", tensor_emp)
tensor_zeros = torch.zeros(3, 4)
print("tensor_zeros :", tensor_zeros)
tensor_ones = torch.ones(3, 4)
print("tensor_ones :", tensor_ones)
torch.rand(), torch.randn(), torch.randint()
- torch.rand(): uniform distribution
- torch.randn(): normal distribution
# tensors initialized by size with random values
# returns a tensor filled with random numbers from a uniform distribution
tensor_rand_un = torch.rand(4, 5)
print("tensor_rand_un :", tensor_rand_un)
# returns a tensor filled with random numbers from a normal distribution
tensor_rand_norm = torch.randn(4, 5)
print("tensor_rand_norm :", tensor_rand_norm)
# returns a tensor filled with random integers generated uniformly (from 5 inclusive to 10 exclusive)
tensor_rand_int = torch.randint(5, 10, (4, 5))
print("tensor_rand_int :", tensor_rand_int)
output:
tensor_rand_un : tensor([[0.8624, 0.2577, 0.8981, 0.7393, 0.1189],
        [0.1564, 0.9084, 0.1446, 0.2822, 0.2021],
        [0.7456, 0.3061, 0.0126, 0.9152, 0.3011],
        [0.1059, 0.9894, 0.9812, 0.8815, 0.9442]])
tensor_rand_norm : tensor([[-1.1702,  1.5030, -1.2549, -0.1946,  0.9323],
        [ 0.3549, -0.2362,  0.2905,  0.6290, -0.4099],
        [-1.1625,  1.6882,  0.6824, -0.3181,  0.8423],
        [-0.8305, -0.5503,  0.0125,  1.0829, -0.5804]])
tensor_rand_int : tensor([[7, 6, 5, 7, 9],
        [7, 8, 9, 7, 6],
        [7, 5, 6, 6, 5],
        [9, 9, 7, 6, 8]])
# initialize a tensor of ones
tensor_ones = torch.ones_like(tensor_rand_int)
print(tensor_ones)
output:
tensor([[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]])
Tensor attributes
Knowing a tensor's device location, data type, shape, and rank is very important
import torch
torch.device
indicates the tensor's device location
first_tensor.device
torch.dtype
indicates the tensor's data type
first_tensor.dtype
Tensor.shape
shows the tensor's dimensions
first_tensor.shape
Tensor.ndim
identifies the number of a tensor's dimensions, or rank
first_tensor.ndim
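The attribute cells above were truncated; a minimal sketch that prints all four attributes (the values of first_tensor are an assumption):

```python
import torch

first_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(first_tensor.device)  # cpu
print(first_tensor.dtype)   # torch.int64
print(first_tensor.shape)   # torch.Size([2, 3])
print(first_tensor.ndim)    # 2
```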
Tensor data types
Integer data type tensor
#@title Integer data type tensor
Float data type tensor
#@title Float data type tensor
Short data type tensor
#@title Short data type tensor
Casting a tensor to a new data type (1st way)
#@title Casting a tensor to a new data type (1st way)
Casting a tensor to a new data type (2nd way)
#@title Casting a tensor to a new data type (2nd way)
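The five cells above lost their bodies in the export; a minimal sketch of the data types and both casting approaches, with assumed example values:

```python
import torch

# integer, float, and short (16-bit integer) tensors via the dtype argument
int_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int32)
float_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)
short_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int16)
print(int_tensor.dtype, float_tensor.dtype, short_tensor.dtype)

# casting to a new data type, 1st way: type-named methods
casted_1 = int_tensor.float()

# casting to a new data type, 2nd way: the to() method
casted_2 = int_tensor.to(dtype=torch.float32)
print(casted_1.dtype, casted_2.dtype)
```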
Creating tensors from random samples
torch.manual_seed(111)  # fixed seed
Creating tensors like other tensors
torch.zeros_like(), torch.ones_like(), torch.rand_like()
torch.full((4, 5), 5)  # a 4x5 tensor filled with 5
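A minimal sketch tying these together (the shapes are assumptions):

```python
import torch

torch.manual_seed(111)  # fixed seed so the random values are reproducible

base = torch.rand(4, 5)

# tensors with the same shape and dtype as an existing tensor
zeros_tensor = torch.zeros_like(base)
ones_tensor = torch.ones_like(base)
rand_tensor = torch.rand_like(base)

# a 4x5 tensor where every element is 5
full_tensor = torch.full((4, 5), 5)

print(zeros_tensor.shape, ones_tensor.shape, rand_tensor.shape)
print(full_tensor)
```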
Manipulate Tensors
Tensor operations
Indexing and slicing of tensors works the same way as in NumPy
#@title Indexing 1-dim tensor example
#@title Slicing 1-dim tensor example
#@title Indexing 2-dim tensor example
#@title Slicing 2-dim tensor example
#@title Use indexing to extract the data that meets some criteria
#@title Combining tensors (adds a new dimension)
#@title Concatenation (keeps the number of dimensions, grows the shape)
#@title Splitting tensors
#@title Splitting 2-dim tensor
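The notebook cells above are truncated; a compact sketch of indexing, slicing, boolean selection, combining, concatenating, and splitting, using an assumed example tensor:

```python
import torch

t = torch.tensor([[10, 11, 12, 13], [14, 15, 16, 17]])

# indexing and slicing work the same way as in NumPy
print(t[0, 1])      # tensor(11)
print(t[:, 1:3])    # columns 1 and 2 of every row

# use indexing to extract the data that meets some criteria
print(t[t > 13])    # tensor([14, 15, 16, 17])

# combining with torch.stack adds a new dimension ...
stacked = torch.stack((t, t))
print(stacked.shape)       # torch.Size([2, 2, 4])

# ... while torch.cat keeps the number of dimensions and grows the shape
concatenated = torch.cat((t, t), dim=0)
print(concatenated.shape)  # torch.Size([4, 4])

# splitting into parts of a given size
parts = torch.split(concatenated, 2, dim=0)
print([p.shape for p in parts])  # two tensors of shape [2, 4]
```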
Mathematical functions
Built-In Math Functions
- Pointwise operations
- Reduction functions
- Comparison functions
- Linear algebra operations
- Spectral and other math computations
Pointwise Operations
Perform an operation on each point in the tensor individually and return a new tensor
- Basic math functions: add(), mul(), div(), neg(), and true_divide()
- Functions for truncation: ceil(), clamp(), floor(), etc.
- Logical functions
- Trigonometry functions
Reduction Operations
Reduce numbers down to a single number or a smaller set of numbers
- Results in reducing the dimensionality or rank of the tensor
- Include statistical functions such as mean, median, mode, etc.
Comparison Functions
- Compare all the values within a tensor or compare values of two different tensors
- Functions to find the minimum or maximum value, sort tensor values, test tensor status or condition, and similar
Linear Algebra Functions
torch.mm(), torch.matmul(), torch.bmm()
- Enable matrix operations and are essential for deep-learning computations
- Functions for matrix computations and tensor computations
Spectral Operations
Useful for data transformations or analysis
#@title Basic math function
#@title Reduction functions
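The two cells above are truncated; a minimal sketch of a few pointwise, reduction, and comparison functions on an assumed tensor:

```python
import torch

t = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# pointwise operations return a new tensor of the same shape
print(t.add(10))        # add 10 to every element
print(t.mul(2))         # multiply every element by 2
print(t.clamp(2, 5))    # truncate values to the range [2, 5]

# reduction operations collapse the tensor to fewer values
print(t.mean())         # a single scalar
print(t.sum(dim=0))     # column sums, shape [3]

# comparison functions
print(t.max())          # the largest value
print(torch.eq(t, t))   # element-wise equality of two tensors
```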
Linear algebra operations
http://pytorch.org/docs/stable/linalg.html
PyTorch has a module called torch.linalg that contains a set of built-in linear algebra functions that are based on the BLAS and LAPACK standardized libraries
#@title Compute the dot product (scalar) of two 1-D tensors
#@title Compute the matrix-matrix product (2-D tensor) of two 2-D tensors
torch.mm(), unlike torch.matmul(), doesn't support broadcasting.
#@title Compute the matrix product of 5 2-D tensors (chained multiplication)
#@title Computing eigenvalues and eigenvectors
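The linear algebra cells above are truncated; a minimal sketch of each operation with assumed tensors (torch.linalg.multi_dot is used here for the chained product and torch.linalg.eig for the eigendecomposition):

```python
import torch

# dot product (a scalar) of two 1-D tensors
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(torch.dot(a, b))          # tensor(32.)

# matrix-matrix product of two 2-D tensors; torch.mm does not broadcast
m1 = torch.rand(2, 3)
m2 = torch.rand(3, 4)
print(torch.mm(m1, m2).shape)   # torch.Size([2, 4])

# chained matrix product of several 2-D tensors
chain = torch.linalg.multi_dot(
    [torch.rand(2, 3), torch.rand(3, 3), torch.rand(3, 2)]
)
print(chain.shape)              # torch.Size([2, 2])

# eigenvalues and eigenvectors of a square matrix
square = torch.rand(3, 3)
eigenvalues, eigenvectors = torch.linalg.eig(square)
print(eigenvalues.shape, eigenvectors.shape)
```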
Automatic differentiation (Autograd)
- After we compute the loss, we calculate the derivative of the loss function with respect to the parameters
- We iteratively update the weight parameters accordingly so that the loss function returns the smallest possible loss
- This step is called iterative optimization, as we use an optimizer to perform the update of parameters
- This process is called gradient-based optimization
Automatic differentiation is a set of techniques that allows us to compute gradients of arbitrarily complex loss functions efficiently.
Numerical Differentiation
- Follows the definition of derivative
- A derivative of y with respect to x defines the rate of change of y with respect to x
$$
\frac{\partial y}{\partial x} \approx \frac{f(x+\Delta x)-f(x)}{\Delta x}
$$
Cons of Numerical Differentiation:
- The computational costs, which increase as we increase the number of parameters in the loss function
- The truncation errors
- The round-off errors
Symbolic Differentiation
- Used in calculus
- Using a set of rules, meaning a set of formulas that we can apply to the loss function to get the gradients
- The derivative of a function $f(x) = 3x^2-4x+5$
- When we apply the symbolic rules, we get $f'(x) = 6x-4$
Cons of Symbolic Differentiation:
- Is limited to the already defined symbolic differentiation rules
- It can’t be used for differentiating a given computational procedure
- The computational costs, as it can lead to an explosion of symbolic terms
Automatic Differentiation
refer: computation graph
- Every complex function can be expressed as a composition of elementary functions
- For those elementary functions, we could apply symbolic differentiation, which would mean storing and manipulating symbolic forms of derivatives
- By using automatic differentiation, we don’t have to go through the process of simplifying the expressions
- Instead, evaluate a given set of values
- Another benefit of automatic differentiation is that our function can contain if-else statements, for loops, or recursion.
#@title Define tensors
Compute gradients
#@title Compute gradients
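The two autograd cells are truncated; a minimal sketch of defining tensors with requires_grad and computing gradients (the function y and its input values are assumptions):

```python
import torch

# define tensors that require gradients
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

# a simple computation; autograd records the graph of operations that created y
y = w * x**2 + b

# compute gradients of y with respect to every tensor with requires_grad=True
y.backward()

print(x.grad)  # dy/dx = 2*w*x = 12
print(w.grad)  # dy/dw = x**2  = 4
print(b.grad)  # dy/db = 1
```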
Split tensors to form new tensor
The split function enables you to split a tensor given the size of each part
The chunk function enables you to split a tensor into a given number of parts
Tensor.chunk(chunks=4, dim=0)
Tensor.chunk(chunks=4)
Tensor.split([5, 3], dim=0)
Tensor.split([4, 6, 6])
Developing a Deep Learning Model
Introduction to the DL training
Data preparation
- The first step in developing a deep learning model
- Consists of loading the data, applying transforms, and batching the data using PyTorch’s built-in capabilities
- Use the Python library called Torchvision; it has classes that support computer vision
- The torchvision.datasets module provides several subclasses to load image data from standard datasets such as the CIFAR-10 dataset used here
Data loading
#@title Import libraries and dataset
#@title Load dataset
#@title Display images
#@title Examine the training dataset
#@title Check the class labels
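The data loading cells above lost their bodies; a minimal sketch of loading CIFAR-10 with torchvision.datasets (the root path and the inspection steps are assumptions):

```python
import torchvision
from torchvision import transforms

# download the training split of CIFAR-10 and convert images to tensors
train_data = torchvision.datasets.CIFAR10(
    root="./data",
    train=True,
    download=True,
    transform=transforms.ToTensor(),
)

# examine the training dataset and check the class labels
print(len(train_data))      # 50000 training images
print(train_data.classes)   # the 10 class labels
image, label = train_data[0]
print(image.shape, label)   # torch.Size([3, 32, 32]) and an integer class index
```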
Data transforms
#@title Import and transform for training data set
#@title Training data of first image
(tensor([[[-1.2854, -1.3629, -1.5180, ..., 0.4981, 0.4593, 0.4399],
#@title Defining transform for testing data set
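The transform cells are truncated; a minimal sketch of typical training and testing transforms (the augmentations and normalization statistics below are common CIFAR-10 choices, assumed rather than taken from the course):

```python
from torchvision import transforms

# training transform: augmentation + tensor conversion + normalization
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# the testing transform usually skips the augmentation steps
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
```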
Data batching
- A data loader feeds data from the dataset into the neural network
- At the core of PyTorch's data loading utility is the torch.utils.data.DataLoader class
- It represents a Python iterable over a dataset, with support for:
- Map-style and iterable-style datasets
- Customizing data loading order
- Automatic batching
- Single and multi-process data loading
- The neural network trains best with batches of data
- Instead of using the complete dataset in one training pass, we use mini batches, usually 64 or 128 samples
- Smaller batches require less memory than the entire dataset, resulting in more efficient and accelerated training
- DataLoader has, by default, a batch_size of 1
- batch_size represents the number of images that go through the network before we update its weights
from torchvision import transforms
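The cell above is truncated; a minimal sketch of batching the dataset with DataLoader (the batch size and worker count are assumptions):

```python
import torch
import torchvision
from torchvision import transforms

train_data = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor()
)

# wrap the dataset in a DataLoader to get shuffled mini-batches
train_loader = torch.utils.data.DataLoader(
    train_data,
    batch_size=64,   # 64 images per mini-batch instead of the default of 1
    shuffle=True,    # customize the loading order
    num_workers=2,   # multi-process data loading
)

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 3, 32, 32])
print(labels.shape)  # torch.Size([64])
```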
Model development and training
#@title Import libraries and dataset
#@title Define neural network, init and forward functions
#@title Instantiate the Model
#@title Load and transform the data
#@title Train the network
With validation
#@title Train the network
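The model and training cells above are all truncated, so below is a minimal end-to-end sketch of what defining, instantiating, and training a small CIFAR-10 classifier with a validation pass could look like. The architecture, hyperparameters, and variable names are assumptions, not the course's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import transforms

# a small convolutional network for 32x32 CIFAR-10 images (assumed architecture)
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> [N, 16, 16, 16]
        x = self.pool(F.relu(self.conv2(x)))   # -> [N, 32, 8, 8]
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Net().to(device)

transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
val_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, shuffle=False)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(2):  # a couple of epochs just to illustrate the loop
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()          # autograd computes the gradients
        optimizer.step()         # the optimizer updates the parameters

    # validation pass: no gradients, just measure accuracy
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: validation accuracy {correct / total:.3f}")
```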