Convolutional Neural Network

Digit Recognizer | Kaggle

I've recently finished catching up on the fundamentals of convolution, so here I use MNIST as a hands-on consolidation exercise. torch's abstractions really are well designed: once the network is defined, the rest is mostly hyperparameter tuning. It hides a lot of detail from the user, but that lets a developer focus on analyzing and solving the actual problem (a goal worth working toward!)

I'll use this experiment as a running example to record and analyze the key CNN concepts and the related code in detail.

While browsing references I even came across CNNs implemented in pure math, which is wild.


As before, the CNN is built with PyTorch; the score is 0.99035.

Accuracy could be pushed further with data augmentation and similar techniques.


Notebook

The basic libraries, including torch, numpy, pandas, matplotlib, and sklearn, are all required.

TensorDataset and DataLoader are used to standardize the training loop further, and also to avoid exhausting GPU memory (without batching, the final prediction cell raises a CUDA out-of-memory error).

Finally, device checks whether a GPU is available. On Kaggle's P100 GPU, training time with convolutional layers differs dramatically from CPU.

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

import torchinfo

import warnings
warnings.filterwarnings("ignore")

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda', index=0)


data loading

Training and test data are loaded with pd.read_csv.

Training set: 42,000 images, each row holding 785 values (1 + 1 * 28 * 28 = 785): one label plus a single-channel 28*28-pixel image.

Test set: 28,000 images, each row holding 784 values: a single-channel 28*28-pixel image.

train_data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
test_data = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')
print('train_data.shape:', train_data.shape)
print('test_data.shape:', test_data.shape)

train_data.shape: (42000, 785)
test_data.shape: (28000, 784)


data processing

Separate the input data from the labels.

X = train_data.iloc[:, train_data.columns != 'label']
y = train_data.label.values
X
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41995 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
41996 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
41997 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
41998 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
41999 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

42000 rows × 784 columns


Plot a sample image for inspection (the image itself is omitted here).

plt.figure(figsize=(3, 3))
plt.title(y[0])
img = X.values[0].reshape(28,28)
plt.axis('off')
plt.imshow(img, cmap='gray')

Split a validation set off the training data so that accuracy (acc) can be measured during training.

In train_test_split, test_size=0.2 is the split ratio.

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2)
# 0-255 -> 0-1
X_train = X_train.values / 255
X_valid = X_valid.values / 255
X_train

array([[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
…,
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.]])

# numpy -> tensor
X_train = torch.FloatTensor(X_train)
y_train = torch.LongTensor(y_train)

X_valid = torch.FloatTensor(X_valid)
y_valid = torch.LongTensor(y_valid)

print('X_train.dtype:', X_train.dtype)
print('y_train.dtype:', y_train.dtype)

X_train.dtype: torch.float32
y_train.dtype: torch.int64


Build a Dataset and a DataLoader to work cleanly with batches.

shuffle controls whether the data is shuffled.

train_ds = TensorDataset(X_train, y_train)
valid_ds = TensorDataset(X_valid, y_valid)

train_loader = DataLoader(train_ds, batch_size=100, shuffle=True)
valid_loader = DataLoader(valid_ds, batch_size=100, shuffle=False)

Modeling

Unlike the Titanic survival prediction task, images don't come with a feature table; they provide only raw pixels.

For image classification:

The convolutional layers perform automatic feature extraction while preserving the spatial structure of the data, gradually abstracting higher-level concepts.

The fully connected layers can be understood as classifying on the features the convolutions extracted (analogous to logistic regression).

Because of how convolutional and linear layers connect, a Flatten step is needed in between to collapse the multi-dimensional feature maps into the 1-D vector a linear layer accepts. Its size has to be filled in manually; it can be derived directly, or read off from torchinfo.summary() once the model is defined.


Channels

Take a 28*28 RGB image as an example: each image has shape (3, 28, 28).

Each input channel has its own kernel (so RGB needs a group of 3 kernels). After one pass with a default 5*5 kernel, the three channel results are summed together into shape (1, 24, 24).

There are several such kernel groups, which is why the output channel count grows: with 64 groups, (3, 28, 28) -> (64, 24, 24). Inspecting the kernels or the intermediate feature maps suggests that each group is trying to extract a different feature of the image.
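A toy NumPy sketch of this idea (the input, kernels, and loop structure are made up purely for illustration; real code would just call nn.Conv2d): each of the 64 kernel groups multiplies every channel by its own 5*5 kernel and sums across channels, producing one output channel.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 28, 28))        # one RGB image, shape (3, 28, 28)
kernels = rng.standard_normal((64, 3, 5, 5))  # 64 groups of 3 kernels (5x5 each)

out = np.zeros((64, 24, 24))                  # 24 = 28 - 5 + 1
for g in range(64):                           # each kernel group -> one output channel
    for i in range(24):
        for j in range(24):
            # per-channel products, then a sum over all 3 channels
            out[g, i, j] = (img[:, i:i+5, j:j+5] * kernels[g]).sum()

print(out.shape)  # (64, 24, 24)
```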


maxpool: pooling by taking the maximum, also a nonlinear operation (generally better than averaging), used mainly for fast downsampling.

dropout: randomly drops nodes in the computation graph with probability p, preventing the overfitting that comes from learning noise in the data (a simple and effective remedy for overfitting).
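A quick NumPy illustration of 2*2 max pooling (the feature-map values are made up): reshaping into 2*2 blocks and taking the block-wise maximum halves each spatial dimension.

```python
import numpy as np

# a hypothetical 4x4 feature map
fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 3, 2],
               [2, 6, 1, 4]])

# 2x2 max pooling with stride 2: split into 2x2 blocks, take the max of each
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[4 5]
               #  [6 4]]
```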


The model is defined as follows (a 5*5 kernel arguably captures more context per layer than a 3*3):

class Digit_Recognizer_Model(nn.Module):
    """
    Architecture summary:
    - Layer 1: Convolution 1 > Activation (ReLU) > Pooling 1 > Dropout 1
    - Layer 2: Convolution 2 > Activation (ReLU) > Pooling 2 > Dropout 2 > Flatten
    - Layer 3: Linear 1 > Activation (ReLU) > Dropout 3
    - Layer 4: Linear 2 > Activation (ReLU) > Dropout 4
    - Layer 5: Output > Activation (Softmax)
    """
    def __init__(self):
        super().__init__()

        # layer 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, stride=1, padding=0)
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2))
        self.drop1 = nn.Dropout(p=0.3)

        # layer 2
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=0)
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))
        self.drop2 = nn.Dropout(p=0.4)

        # layer 3
        self.fc1 = nn.Linear(in_features=128 * 4 * 4, out_features=64)
        self.drop3 = nn.Dropout(p=0.4)

        # layer 4
        self.fc2 = nn.Linear(in_features=64, out_features=32)
        self.drop4 = nn.Dropout(p=0.4)

        # layer 5
        self.fc3 = nn.Linear(in_features=32, out_features=10)

    def forward(self, x):
        x = self.drop1(self.pool1(F.relu(self.conv1(x))))  # layer 1
        x = self.drop2(self.pool2(F.relu(self.conv2(x))))  # layer 2
        x = x.view(-1, 128 * 4 * 4)                        # flatten
        x = self.drop3(F.relu(self.fc1(x)))
        x = self.drop4(F.relu(self.fc2(x)))
        x = self.fc3(x)
        return x
model = Digit_Recognizer_Model().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
torchinfo.summary(model, (1,28,28), col_names=('input_size', 'output_size', 'num_params', 'kernel_size'))
============================================================================================================================================
Layer (type:depth-idx)                   Input Shape               Output Shape              Param #                   Kernel Shape
============================================================================================================================================
Digit_Recognizer_Model                   [1, 28, 28]               [1, 10]                   --                        --
├─Conv2d: 1-1                            [1, 28, 28]               [64, 24, 24]              1,664                     [5, 5]
├─MaxPool2d: 1-2                         [64, 24, 24]              [64, 12, 12]              --                        [2, 2]
├─Dropout: 1-3                           [64, 12, 12]              [64, 12, 12]              --                        --
├─Conv2d: 1-4                            [64, 12, 12]              [128, 8, 8]               204,928                   [5, 5]
├─MaxPool2d: 1-5                         [128, 8, 8]               [128, 4, 4]               --                        [2, 2]
├─Dropout: 1-6                           [128, 4, 4]               [128, 4, 4]               --                        --
├─Linear: 1-7                            [1, 2048]                 [1, 64]                   131,136                   --
├─Dropout: 1-8                           [1, 64]                   [1, 64]                   --                        --
├─Linear: 1-9                            [1, 64]                   [1, 32]                   2,080                     --
├─Dropout: 1-10                          [1, 32]                   [1, 32]                   --                        --
├─Linear: 1-11                           [1, 32]                   [1, 10]                   330                       --
============================================================================================================================================
Total params: 340,138
Trainable params: 340,138
Non-trainable params: 0
Total mult-adds (M): 212.54
============================================================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.36
Params size (MB): 1.36
Estimated Total Size (MB): 1.73
============================================================================================================================================

The shape derivation goes as follows:

(1,28,28)  --- 5*5 conv kernel ---> (64,24,24)  # 24 = 28-5+1
(64,24,24) --- 2*2 maxpool     ---> (64,12,12)
(64,12,12) --- 5*5 conv kernel ---> (128,8,8)   # 8 = 12-5+1
(128,8,8)  --- 2*2 maxpool     ---> (128,4,4)
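The same arithmetic can be checked with a tiny helper (the function conv2d_out is my own, implementing the standard formula (in - kernel + 2*padding) // stride + 1, which also covers max pooling):

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (same formula works for pooling)."""
    return (size - kernel + 2 * padding) // stride + 1

s = 28
s = conv2d_out(s, kernel=5)            # conv1: 28 -> 24
s = conv2d_out(s, kernel=2, stride=2)  # pool1: 24 -> 12
s = conv2d_out(s, kernel=5)            # conv2: 12 -> 8
s = conv2d_out(s, kernel=2, stride=2)  # pool2: 8 -> 4
print(s, 128 * s * s)  # 4 2048 -> matches fc1's in_features of 128 * 4 * 4
```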

Train

Every epoch evaluates accuracy on the validation set; ideally the train and eval phases would each be factored into their own function.

Pay special attention to model.train() and model.eval(): in eval mode the model disables dropout (all nodes are kept), so inference behaves deterministically instead of randomly dropping activations as during training.

A note on argmax(1): it returns the index of the maximum value in each row. The final fully connected layer outputs a vector of 10 scores per image; softmax would convert them into probabilities summing to 1, but since softmax is monotonic, taking argmax on the raw scores picks the same most-likely digit, so no explicit softmax is needed here.
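A small NumPy check (the logits are made up): softmax is strictly increasing, so it never changes which index is largest.

```python
import numpy as np

logits = np.array([1.2, -0.3, 4.1, 0.0, 2.5])  # hypothetical raw scores for one sample

# softmax: shift by the max for numerical stability, exponentiate, normalize
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# same winning index before and after softmax
print(logits.argmax(), probs.argmax())  # 2 2
```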

# --------- train ----------
epochs = 15
size = len(train_loader.dataset)

for epoch in range(epochs):
    print(f'epoch:{epoch+1}\n---------------------------')

    # ------ train -------
    for batch, (X, y) in enumerate(train_loader):
        model.train()
        X = X.view(-1, 1, 28, 28).to(device)
        y = y.to(device)
        # forward
        output = model(X)
        train_loss = criterion(output, y)

        # backward
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            print(f'loss: {train_loss.item():>6f} [{(batch+1)*len(X):>5d}/{size}]')

    # ------ test -------
    model.eval()
    valid_loss, acc = 0, 0
    with torch.no_grad():
        for X, y in valid_loader:
            X = X.view(-1, 1, 28, 28).to(device)
            y = y.to(device)

            pred = model(X)
            valid_loss += criterion(pred, y).item()
            acc += (pred.argmax(1) == y).type(torch.float).sum().item()

    valid_loss /= len(valid_loader.dataset) / 100
    acc /= len(valid_loader.dataset)
    print(f'acc: {acc*100:>0.2f}%, average_loss: {valid_loss:>8f}')
epoch:1
---------------------------
loss: 2.310378 [  100/33600]
loss: 0.830918 [10100/33600]
loss: 0.440694 [20100/33600]
loss: 0.341579 [30100/33600]
acc: 96.49%, average_loss: 0.135642
epoch:2
---------------------------
loss: 0.354336 [  100/33600]
loss: 0.339530 [10100/33600]
loss: 0.192688 [20100/33600]
loss: 0.228617 [30100/33600]
acc: 97.63%, average_loss: 0.083003
epoch:3
---------------------------
loss: 0.310060 [  100/33600]
loss: 0.095043 [10100/33600]
loss: 0.387724 [20100/33600]
loss: 0.102543 [30100/33600]
acc: 97.93%, average_loss: 0.068741
epoch:4
---------------------------
loss: 0.141973 [  100/33600]
loss: 0.352170 [10100/33600]
loss: 0.187784 [20100/33600]
loss: 0.144804 [30100/33600]
acc: 98.38%, average_loss: 0.056450
epoch:5
---------------------------
loss: 0.122953 [  100/33600]
loss: 0.209291 [10100/33600]
loss: 0.116453 [20100/33600]
loss: 0.143049 [30100/33600]
acc: 98.62%, average_loss: 0.048829
epoch:6
---------------------------
loss: 0.142570 [  100/33600]
loss: 0.154392 [10100/33600]
loss: 0.080456 [20100/33600]
loss: 0.071681 [30100/33600]
acc: 98.48%, average_loss: 0.054392
epoch:7
---------------------------
loss: 0.049880 [  100/33600]
loss: 0.145781 [10100/33600]
loss: 0.151001 [20100/33600]
loss: 0.096750 [30100/33600]
acc: 98.67%, average_loss: 0.043555
epoch:8
---------------------------
loss: 0.047690 [  100/33600]
loss: 0.049740 [10100/33600]
loss: 0.089974 [20100/33600]
loss: 0.074760 [30100/33600]
acc: 98.92%, average_loss: 0.039260
epoch:9
---------------------------
loss: 0.051311 [  100/33600]
loss: 0.116427 [10100/33600]
loss: 0.159870 [20100/33600]
loss: 0.162403 [30100/33600]
acc: 98.89%, average_loss: 0.042807
epoch:10
---------------------------
loss: 0.085799 [  100/33600]
loss: 0.131933 [10100/33600]
loss: 0.095897 [20100/33600]
loss: 0.053006 [30100/33600]
acc: 98.93%, average_loss: 0.042807
epoch:11
---------------------------
loss: 0.034143 [  100/33600]
loss: 0.031655 [10100/33600]
loss: 0.090933 [20100/33600]
loss: 0.075254 [30100/33600]
acc: 98.86%, average_loss: 0.042070
epoch:12
---------------------------
loss: 0.071062 [  100/33600]
loss: 0.058759 [10100/33600]
loss: 0.090834 [20100/33600]
loss: 0.025837 [30100/33600]
acc: 99.01%, average_loss: 0.041503
epoch:13
---------------------------
loss: 0.069484 [  100/33600]
loss: 0.068897 [10100/33600]
loss: 0.033803 [20100/33600]
loss: 0.023075 [30100/33600]
acc: 99.14%, average_loss: 0.039555
epoch:14
---------------------------
loss: 0.024676 [  100/33600]
loss: 0.042386 [10100/33600]
loss: 0.051634 [20100/33600]
loss: 0.084135 [30100/33600]
acc: 98.98%, average_loss: 0.040245
epoch:15
---------------------------
loss: 0.099463 [  100/33600]
loss: 0.039321 [10100/33600]
loss: 0.058406 [20100/33600]
loss: 0.113745 [30100/33600]
acc: 99.30%, average_loss: 0.034384

Training on a GPU really is fast; a CPU is only adequate for plain linear layers.


Submission

Build the submission data with pandas.

submission_rows = [['ImageId', 'Label']]
X_test = test_data.values / 255
X_test = torch.FloatTensor(X_test)
y_test = np.zeros(X_test.shape)
y_test = torch.LongTensor(y_test)
test_ds = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_ds, batch_size=100, shuffle=False)

with torch.no_grad():
    image_id = 1
    for X, _ in test_loader:
        model.eval()
        X = X.view(-1, 1, 28, 28).to(device)
        pred = model(X).argmax(1)
        for i in pred:
            submission_rows.append([image_id, i.item()])
            image_id += 1

submission = pd.DataFrame(submission_rows)
submission.columns = submission.iloc[0]
submission = submission.drop(0, axis=0)
submission.to_csv('submission.csv', index=False)
submission
ImageId Label
1 1 2
2 2 0
3 3 9
4 4 9
5 5 3
... ... ...
27996 27996 9
27997 27997 7
27998 27998 3
27999 27999 9
28000 28000 2

28000 rows × 2 columns