MNIST: About Convolution
Convolutional Neural Network
I've recently finished filling in most of the basics of convolution, so here I use MNIST to practice and consolidate them. torch's abstractions really are well done: once the network is defined it mostly comes down to tuning hyperparameters, with the details hidden away, so a developer can spend their energy on analyzing and solving the actual problem (that's the goal, anyway!)
I plan to use this experiment as a running example and carefully record and analyze the key CNN concepts and the code that goes with them.
While looking for references I even came across CNNs implemented purely with math, no framework at all, which is wild.
This notebook likewise builds the CNN with PyTorch; the score is 0.99035.
Accuracy could be improved further with data augmentation and similar techniques.
Notebook
The basic libraries, including torch, numpy, pandas, matplotlib and sklearn, are all required.
TensorDataset and DataLoader are used to standardize the training flow and also to prevent GPU memory overflow (the final prediction cell throws an out-of-memory error if it is not batched).
Finally, device checks whether a GPU is available. This runs on Kaggle's P100 GPU; with convolutional layers in the model, training time on CPU and GPU differs enormously.
```python
# This Python 3 environment comes with many helpful analytics libraries installed
import matplotlib.pyplot as plt
```
device(type='cuda', index=0)
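Only the first line of each code cell survives in this export, so here is a sketch of the imports-and-device setup described above (a reconstruction, not the original cell contents):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchinfo
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split

# Use the GPU (Kaggle P100) when available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device
```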
data loading
pd.read_csv loads the training and test data.
Training set: 42,000 images, 785 values per row (1 + 1 × 28 × 28 = 785), i.e. one label plus one single-channel 28×28-pixel image.
Test set: 28,000 images, 784 values per row, i.e. one single-channel 28×28-pixel image.
```python
train_data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
```
train_data.shape: (42000, 785)
test_data.shape: (28000, 784)
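The test set is read the same way; the cell is truncated here, so this is the likely counterpart (the path is the standard one for this competition):

```python
test_data = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')
```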
data processing
Separate the input data from the labels.

```python
X = train_data.iloc[:, train_data.columns != 'label']
```
 | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41996 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41997 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41998 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41999 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
42000 rows × 784 columns
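Only the X line of that cell is visible above; the label column is extracted alongside it, roughly as follows:

```python
# Features: every column except 'label'; labels: the 'label' column
X = train_data.iloc[:, train_data.columns != 'label']
y = train_data['label']
```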
Plot a sample image to get a feel for the data (the image itself is not included here).
```python
plt.figure(figsize=(3, 3))
```
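The plotting cell is truncated in this export; a minimal sketch of what it might look like (reshaping one 784-value row back into a 28×28 image):

```python
sample = X.iloc[0].to_numpy().reshape(28, 28)   # one row of pixels back to 2-D

plt.figure(figsize=(3, 3))
plt.imshow(sample, cmap='gray')
plt.title(f'label: {y.iloc[0]}')
plt.show()
```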
Split a validation set off the training data so accuracy (acc) can be checked during training. train_test_split does the split; test_size=0.2 is the fraction that goes to the validation set.
```python
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2)
```

```python
# 0-255 -> 0-1
```
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
```python
# numpy -> tensor
```
X_train.dtype: torch.float32
y_train.dtype: torch.int64
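The scaling and conversion cells are truncated above; here is a sketch of steps consistent with the dtypes shown. The reshape to (N, 1, 28, 28) is my assumption: the model expects single-channel 28×28 inputs, so it has to happen somewhere before the data reaches the network.

```python
# 0-255 -> 0-1, and reshape each 784-value row into a (1, 28, 28) image
X_train = X_train.to_numpy().reshape(-1, 1, 28, 28) / 255.0
X_valid = X_valid.to_numpy().reshape(-1, 1, 28, 28) / 255.0

# numpy -> tensor: float32 for the pixels, int64 for the labels
X_train = torch.tensor(X_train, dtype=torch.float32)
X_valid = torch.tensor(X_valid, dtype=torch.float32)
y_train = torch.tensor(y_train.to_numpy(), dtype=torch.int64)
y_valid = torch.tensor(y_valid.to_numpy(), dtype=torch.int64)
```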
Build a Dataset and DataLoader to work with batches more cleanly; shuffle controls whether the data is reshuffled.
```python
train_ds = TensorDataset(X_train, y_train)
```
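A sketch of the rest of that cell. The batch size of 100 is inferred from the training log below, where the loss is printed at samples 100, 10100, 20100, ... out of 33600:

```python
valid_ds = TensorDataset(X_valid, y_valid)

# shuffle=True reshuffles the training data every epoch; the validation loader keeps its order
train_dl = DataLoader(train_ds, batch_size=100, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=100, shuffle=False)
```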
Modeling
Unlike Titanic survivor prediction, there is no ready-made feature table for images; an image only provides pixels.
For image classification:
The convolutional layers perform automatic feature extraction while preserving the spatial structure of the data, gradually abstracting higher-level concepts.
The fully connected layers can be understood as classifying on the features the convolutional layers extract (essentially logistic regression).
Because of how convolutional and linear layers work, a Flatten operation is needed in between to collapse the multi-dimensional feature maps into the one-dimensional input a linear layer accepts. That input size has to be filled in by hand; it can be derived directly, or read off torchinfo.summary() when defining the model.
Channels
Take a 28×28 RGB image as an example: each image has shape (3, 28, 28).
Each input channel has its own kernel (so RGB needs 3 kernels, which together form one group); after one pass of the default 5×5 kernel the results of the three channels are summed, giving shape (1, 24, 24).
There are several such groups of kernels, which is why the number of output channels grows: with 64 groups, (3, 28, 28) -> (64, 24, 24).
Analyzing the kernels or the intermediate feature maps, each group of kernels can be seen as trying to extract a different feature of the image; the short check below illustrates the shape bookkeeping.
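A standalone check (not part of the original notebook) of the channel and shape behaviour just described:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)                       # one RGB image: (batch, channels, height, width)
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5)
print(conv(x).shape)                                # torch.Size([1, 64, 24, 24]); 24 = 28 - 5 + 1
```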
maxpool
: pooling that keeps the maximum value; it is also a nonlinear operation (works better than taking the mean) and is mainly used to shrink the feature maps quickly.
dropout
: drops nodes of the computation graph with probability p, preventing the overfitting that comes from learning the noise in the data (a very effective remedy for overfitting).
The model is defined as follows (a 5×5 kernel summarizes a larger neighbourhood per step than a 3×3 one):
```python
class Digit_Recognizer_Model(nn.Module):
    ...  # the full definition is truncated in this export; a sketch follows the summary below

model = Digit_Recognizer_Model().to(device)

torchinfo.summary(model, (1,28,28), col_names=('input_size', 'output_size', 'num_params', 'kernel_size'))
```
============================================================================================================================================
Layer (type:depth-idx) Input Shape Output Shape Param # Kernel Shape
============================================================================================================================================
Digit_Recognizer_Model [1, 28, 28] [1, 10] -- --
├─Conv2d: 1-1 [1, 28, 28] [64, 24, 24] 1,664 [5, 5]
├─MaxPool2d: 1-2 [64, 24, 24] [64, 12, 12] -- [2, 2]
├─Dropout: 1-3 [64, 12, 12] [64, 12, 12] -- --
├─Conv2d: 1-4 [64, 12, 12] [128, 8, 8] 204,928 [5, 5]
├─MaxPool2d: 1-5 [128, 8, 8] [128, 4, 4] -- [2, 2]
├─Dropout: 1-6 [128, 4, 4] [128, 4, 4] -- --
├─Linear: 1-7 [1, 2048] [1, 64] 131,136 --
├─Dropout: 1-8 [1, 64] [1, 64] -- --
├─Linear: 1-9 [1, 64] [1, 32] 2,080 --
├─Dropout: 1-10 [1, 32] [1, 32] -- --
├─Linear: 1-11 [1, 32] [1, 10] 330 --
============================================================================================================================================
Total params: 340,138
Trainable params: 340,138
Non-trainable params: 0
Total mult-adds (M): 212.54
============================================================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.36
Params size (MB): 1.36
Estimated Total Size (MB): 1.73
============================================================================================================================================
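Since the class definition cell above is truncated, here is a sketch of a model consistent with the summary. The layer types and sizes are read off the table; the ReLU activations (applied functionally, so they would not show up in the summary) and the dropout probabilities are my assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class Digit_Recognizer_Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=5)    # (1, 28, 28) -> (64, 24, 24)
        self.pool1 = nn.MaxPool2d(2)                    # (64, 24, 24) -> (64, 12, 12)
        self.drop1 = nn.Dropout(0.25)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=5)  # (64, 12, 12) -> (128, 8, 8)
        self.pool2 = nn.MaxPool2d(2)                    # (128, 8, 8) -> (128, 4, 4)
        self.drop2 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(128 * 4 * 4, 64)           # 2048 -> 64
        self.drop3 = nn.Dropout(0.25)
        self.fc2 = nn.Linear(64, 32)
        self.drop4 = nn.Dropout(0.25)
        self.fc3 = nn.Linear(32, 10)                    # one score per digit

    def forward(self, x):
        x = self.drop1(self.pool1(F.relu(self.conv1(x))))
        x = self.drop2(self.pool2(F.relu(self.conv2(x))))
        x = x.view(-1, 128 * 4 * 4)                     # the Flatten step: feature maps -> 2048-value vectors
        x = self.drop3(F.relu(self.fc1(x)))
        x = self.drop4(F.relu(self.fc2(x)))
        return self.fc3(x)                              # raw logits; CrossEntropyLoss applies softmax internally
```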
The shape changes can be derived as follows:

```
(1,28,28) --- 5*5 conv-kernel ---> (64,24,24)   # 24 = 28-5+1
```
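The remaining steps, read off the summary above, follow the same rules:

```
(64,24,24) --- 2*2 maxpool ------> (64,12,12)   # 12 = 24 / 2
(64,12,12) --- 5*5 conv-kernel --> (128,8,8)    # 8 = 12 - 5 + 1
(128,8,8)  --- 2*2 maxpool ------> (128,4,4)    # 4 = 8 / 2
(128,4,4)  --- flatten ----------> (2048,)      # 128 * 4 * 4 = 2048
(2048,) -> (64,) -> (32,) -> (10,)              # the three linear layers
```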
Train
Each epoch the validation set is used to measure accuracy; it is cleanest to wrap training and validation in separate functions.
Pay special attention to model.train() and model.eval(): in eval mode dropout is disabled, so no nodes are dropped and evaluation is deterministic.
A note on argmax(1): it returns the index of the largest value in each row. The model's final fully connected layer outputs a vector of 10 scores per image; softmax would turn them into probabilities summing to 1, and since softmax is monotonic, taking argmax of the raw outputs already picks the most likely digit.
```python
# --------- train ----------
```
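The training cell is truncated in this export. Below is a minimal sketch of train/validation loops that would produce log lines in the format shown; CrossEntropyLoss matches the initial loss of roughly ln 10 ≈ 2.30, while the optimizer choice and the helper names are my assumptions:

```python
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())      # optimizer choice is an assumption

def train_loop(dataloader):
    model.train()                                     # enable dropout
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch % 100 == 0:                          # print every 100 batches
            current = batch * dataloader.batch_size + len(X)
            print(f'loss: {loss.item():f} [{current:>5d}/{size:>5d}]')

def valid_loop(dataloader):
    model.eval()                                      # disable dropout
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    total_loss, correct = 0.0, 0
    with torch.no_grad():                             # no gradients needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            total_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).sum().item()
    print(f'acc: {100 * correct / size:.2f}%, average_loss: {total_loss / num_batches:.6f}')

for epoch in range(15):
    print(f'epoch:{epoch + 1}\n---------------------------')
    train_loop(train_dl)
    valid_loop(valid_dl)
```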
epoch:1
---------------------------
loss: 2.310378 [ 100/33600]
loss: 0.830918 [10100/33600]
loss: 0.440694 [20100/33600]
loss: 0.341579 [30100/33600]
acc: 96.49%, average_loss: 0.135642
epoch:2
---------------------------
loss: 0.354336 [ 100/33600]
loss: 0.339530 [10100/33600]
loss: 0.192688 [20100/33600]
loss: 0.228617 [30100/33600]
acc: 97.63%, average_loss: 0.083003
epoch:3
---------------------------
loss: 0.310060 [ 100/33600]
loss: 0.095043 [10100/33600]
loss: 0.387724 [20100/33600]
loss: 0.102543 [30100/33600]
acc: 97.93%, average_loss: 0.068741
epoch:4
---------------------------
loss: 0.141973 [ 100/33600]
loss: 0.352170 [10100/33600]
loss: 0.187784 [20100/33600]
loss: 0.144804 [30100/33600]
acc: 98.38%, average_loss: 0.056450
epoch:5
---------------------------
loss: 0.122953 [ 100/33600]
loss: 0.209291 [10100/33600]
loss: 0.116453 [20100/33600]
loss: 0.143049 [30100/33600]
acc: 98.62%, average_loss: 0.048829
epoch:6
---------------------------
loss: 0.142570 [ 100/33600]
loss: 0.154392 [10100/33600]
loss: 0.080456 [20100/33600]
loss: 0.071681 [30100/33600]
acc: 98.48%, average_loss: 0.054392
epoch:7
---------------------------
loss: 0.049880 [ 100/33600]
loss: 0.145781 [10100/33600]
loss: 0.151001 [20100/33600]
loss: 0.096750 [30100/33600]
acc: 98.67%, average_loss: 0.043555
epoch:8
---------------------------
loss: 0.047690 [ 100/33600]
loss: 0.049740 [10100/33600]
loss: 0.089974 [20100/33600]
loss: 0.074760 [30100/33600]
acc: 98.92%, average_loss: 0.039260
epoch:9
---------------------------
loss: 0.051311 [ 100/33600]
loss: 0.116427 [10100/33600]
loss: 0.159870 [20100/33600]
loss: 0.162403 [30100/33600]
acc: 98.89%, average_loss: 0.042807
epoch:10
---------------------------
loss: 0.085799 [ 100/33600]
loss: 0.131933 [10100/33600]
loss: 0.095897 [20100/33600]
loss: 0.053006 [30100/33600]
acc: 98.93%, average_loss: 0.042807
epoch:11
---------------------------
loss: 0.034143 [ 100/33600]
loss: 0.031655 [10100/33600]
loss: 0.090933 [20100/33600]
loss: 0.075254 [30100/33600]
acc: 98.86%, average_loss: 0.042070
epoch:12
---------------------------
loss: 0.071062 [ 100/33600]
loss: 0.058759 [10100/33600]
loss: 0.090834 [20100/33600]
loss: 0.025837 [30100/33600]
acc: 99.01%, average_loss: 0.041503
epoch:13
---------------------------
loss: 0.069484 [ 100/33600]
loss: 0.068897 [10100/33600]
loss: 0.033803 [20100/33600]
loss: 0.023075 [30100/33600]
acc: 99.14%, average_loss: 0.039555
epoch:14
---------------------------
loss: 0.024676 [ 100/33600]
loss: 0.042386 [10100/33600]
loss: 0.051634 [20100/33600]
loss: 0.084135 [30100/33600]
acc: 98.98%, average_loss: 0.040245
epoch:15
---------------------------
loss: 0.099463 [ 100/33600]
loss: 0.039321 [10100/33600]
loss: 0.058406 [20100/33600]
loss: 0.113745 [30100/33600]
acc: 99.30%, average_loss: 0.034384
Training on a GPU really is fast; a CPU is only adequate when the model is all linear layers.
Submission
Build the submission data with pandas.

```python
submission_rows = [['ImageId', 'Label']]
```
 | ImageId | Label
---|---|---
1 | 1 | 2 |
2 | 2 | 0 |
3 | 3 | 9 |
4 | 4 | 9 |
5 | 5 | 3 |
... | ... | ... |
27996 | 27996 | 9 |
27997 | 27997 | 7 |
27998 | 27998 | 3 |
27999 | 27999 | 9 |
28000 | 28000 | 2 |
28000 rows × 2 columns
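For completeness, a sketch of the batched prediction and submission assembly; batching is what avoids the CUDA out-of-memory error mentioned at the top. The preprocessing mirrors the training data, and the names are my own:

```python
# Preprocess the test set exactly like the training data
X_test = test_data.to_numpy().reshape(-1, 1, 28, 28) / 255.0
X_test = torch.tensor(X_test, dtype=torch.float32)
test_dl = DataLoader(TensorDataset(X_test), batch_size=100, shuffle=False)

model.eval()
predictions = []
with torch.no_grad():
    for (X,) in test_dl:                              # predict batch by batch to avoid OOM
        pred = model(X.to(device))
        predictions.extend(pred.argmax(1).cpu().tolist())

# Build the submission table; ImageId is 1-based
submission_rows = [['ImageId', 'Label']]
for i, label in enumerate(predictions, start=1):
    submission_rows.append([i, label])

submission = pd.DataFrame(submission_rows[1:], columns=submission_rows[0])
submission.to_csv('submission.csv', index=False)
```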