MNIST: A Classic Neural Network Exercise
Date: 2023-07-15

MNIST handwritten digit recognition is a classic introductory example for neural networks, often called the "Hello World" task of deep learning.

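This post reproduces it in Python on the TensorFlow framework, with detailed comments, in the hope that it serves as a useful reference. The code targets the TensorFlow 1.x API (placeholders, sessions, tf.contrib).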

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Raw string so that the backslashes in the Windows path are not treated as escape sequences
mnist = input_data.read_data_sets(r"D:\ClassStudy\ImageProcessing\MNIST_DATA", one_hot=True)

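batch_size = 100 # Batch size of 100; with 55000 training samples, one pass over the data takes 550 batches
learning_rate = 0.8
learning_rate_decay = 0.999
max_steps = 30000 # Maximum number of training steps

training_step = tf.Variable(0,trainable=False) # Counter for the number of training steps, marked non-trainable as usual; processing one batch counts as one step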

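def hidden_layer(input_tensor,weights1,biases1,weights2,biases2,layer_name):
    '''
    Forward pass: a hidden layer with the relu() activation, followed by a linear output layer that returns the logits.
    layer_name is accepted only as a label and is not used in the computation.
    '''
    layer1=tf.nn.relu(tf.matmul(input_tensor,weights1)+biases1)
    return tf.matmul(layer1,weights2)+biases2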

x = tf.placeholder(tf.float32,[None,784],name='x-input')
y_ = tf.placeholder(tf.float32,[None,10],name='y-output')

# Hidden-layer weights: a 784*500 array, 392000 parameters in total. The value 500 is an empirical choice; other sizes work too
weights1 = tf.Variable(tf.truncated_normal([784,500],stddev=0.1))
biases1 = tf.Variable(tf.constant(0.1,shape=[500]))

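# Output-layer weights: a 500*10 array, 5000 parameters in total. The 500 matches the number of columns in the hidden layer's output, and the 10 columns correspond to the ten classes, the digits 0-9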
weights2 = tf.Variable(tf.truncated_normal([500,10],stddev=0.1))
biases2 = tf.Variable(tf.constant(0.1,shape=[10]))

# Compute y by a forward pass through the network; y is a matrix with 10 columns
y = hidden_layer(x,weights1,biases1,weights2,biases2,'y')

'''
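To improve the final model's performance on test data when training a neural network with stochastic gradient descent, TensorFlow provides a way to apply moving averages to variables, usually called the moving-average model
'''
# tf.train.ExponentialMovingAverage() creates a moving-average object; it takes a decay rate that controls how fast the averaged model is updated.
# The algorithm maintains a shadow variable for every variable, initialized to that variable's initial value. Whenever the variable changes, the shadow variable is updated according to a fixed rule.
# The decay rate is usually set close to 1; the larger it is, the more stable the averaged model.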
averages_class = tf.train.ExponentialMovingAverage(0.99,training_step)
# The apply() method of the moving-average object takes the variables whose moving averages should be maintained
averages_op = averages_class.apply(tf.trainable_variables())
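# average() is a method of the moving-average object that returns the shadow variable of the variable passed to it; the shadow values themselves are updated when averages_op runs.
# y is computed a second time here from the averaged parameters. Keep in mind that these averages are shadow variables, separate from the trained parameters.
average_y = hidden_layer(x,averages_class.average(weights1),
                         averages_class.average(biases1),
                         averages_class.average(weights2),
                         averages_class.average(biases2),'average_y')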

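# Compute the cross-entropy loss. This function applies when each sample belongs to exactly one class, which fits this task well. It expects class indices as labels, hence tf.argmax(y_,1) on the one-hot labels.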
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_,1))

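# With the cross-entropy available, compute the L2 regularization of the weights and combine the regularization loss with the cross-entropy loss into the total loss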
regularizer = tf.contrib.layers.l2_regularizer(0.0001)
regularization = regularizer(weights1)+regularizer(weights2)
# Total loss
loss = tf.reduce_mean(cross_entropy)+regularization

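# With the total loss defined, an optimizer is still needed. Plain stochastic gradient descent, the simplest in principle, is used here with an exponentially decaying learning rate; the optimizer's minimize() method specifies the objective to minimize.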
learning_rate = tf.train.exponential_decay(learning_rate,training_step,mnist.train.num_examples/batch_size,learning_rate_decay)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=training_step)

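# At every training step, the network parameters must be updated by backpropagation and the moving average of each parameter must be updated as well; control_dependencies() groups both operations so that running a single op performs them together
with tf.control_dependencies([train_step,averages_op]):
    train_op = tf.no_op(name='train')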

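# Check whether the forward-pass predictions of the moving-average model are correct
# equal() compares two tensors element by element
# and returns a boolean tensor: True where the elements match, False otherwise
correct_prediction = tf.equal(tf.argmax(average_y,1),tf.argmax(y_,1))

# cast(x, dtype, name=None) is used here to convert the boolean values to float32
# The mean of the resulting float32 values is the model's accuracy on this set of data
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))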

'''
With everything above in place, create a session and start training
'''
with tf.Session() as sess:
    # Initialize all variables
    tf.global_variables_initializer().run()
    # Validation data
    validate_feed = {x:mnist.validation.images,y_:mnist.validation.labels}
    # Test data
    test_feed = {x:mnist.test.images,y_:mnist.test.labels}
    # Training loop, up to max_steps steps; one batch is processed per step
    for i in range(max_steps):
        if i % 1000 == 0:
            # Evaluate the moving-average model on the validation data
            # validate_accuracy is multiplied by 100 to print it as a percentage
            validate_accuracy = sess.run(accuracy, feed_dict=validate_feed)
            print('After %d training step(s), validation accuracy '
                  'using average model is %g%%' % (i,validate_accuracy*100))
        # train.next_batch() reads a small part of the training data as one batch; its batch_size argument sets the size
        xs,ys = mnist.train.next_batch(batch_size)
        sess.run(train_op,feed_dict={x:xs,y_:ys})
    # Final accuracy check on the test set, again multiplied by 100 to print a percentage
    test_accuracy = sess.run(accuracy,feed_dict=test_feed)
    print('After %d training step(s), test accuracy using average '
          'model is %g%%' % (max_steps,test_accuracy*100))
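
The shadow-variable update that the script's comments describe for tf.train.ExponentialMovingAverage can be illustrated in plain Python. This is a minimal sketch, assuming the documented rule shadow = decay * shadow + (1 - decay) * variable and, when a step counter is passed as in the script, an effective decay of min(decay, (1 + num_updates) / (10 + num_updates)). The function name ema_update and the toy values are hypothetical and not part of the original script.

```python
def ema_update(shadow, value, decay, num_updates=None):
    """Return the new shadow value after one moving-average update."""
    if num_updates is not None:
        # With a step counter, the effective decay is smaller early in training,
        # so the shadow value follows the real variable more quickly at first.
        decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
    return decay * shadow + (1.0 - decay) * value


# Toy example (made-up numbers): the variable moves from 0 to 5 at step 0.
shadow = ema_update(0.0, 5.0, decay=0.99, num_updates=0)
print(shadow)  # effective decay is min(0.99, 1/10) = 0.1, so this prints 4.5

# Without a step counter the nominal decay applies and the shadow barely moves.
print(ema_update(0.0, 5.0, decay=0.99))
```

Passing the step counter mainly matters during the first steps; once (1 + num_updates) / (10 + num_updates) exceeds 0.99, the nominal decay takes over.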

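The exponentially decaying learning rate used in the script can be reproduced by hand. The following is a minimal sketch, assuming the default non-staircase behaviour of tf.train.exponential_decay, that is learning_rate * decay_rate ** (global_step / decay_steps). The helper name decayed_learning_rate is hypothetical; the numeric values are the ones from the script.

```python
def decayed_learning_rate(base_lr, global_step, decay_steps, decay_rate):
    """Continuous (non-staircase) exponential decay of the learning rate."""
    return base_lr * decay_rate ** (global_step / decay_steps)


# 55000 training samples with a batch size of 100 give 550 steps
# per pass over the training data.
decay_steps = 55000 / 100

for step in (0, 550, 30000):
    lr = decayed_learning_rate(0.8, step, decay_steps, 0.999)
    print(step, round(lr, 4))
```

With these settings the rate is multiplied by 0.999 once per pass over the training data, so it decays only slowly over the 30000 steps.
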
Output:

After 0 training step(s), validation accuracy using average model is 7.4%
After 1000 training step(s), validation accuracy using average model is 97.82%
After 2000 training step(s), validation accuracy using average model is 98.1%
After 3000 training step(s), validation accuracy using average model is 98.36%
After 4000 training step(s), validation accuracy using average model is 98.38%
After 5000 training step(s), validation accuracy using average model is 98.48%
After 6000 training step(s), validation accuracy using average model is 98.36%
After 7000 training step(s), validation accuracy using average model is 98.5%
After 8000 training step(s), validation accuracy using average model is 98.4%
After 9000 training step(s), validation accuracy using average model is 98.52%
After 10000 training step(s), validation accuracy using average model is 98.5%
After 11000 training step(s), validation accuracy using average model is 98.6%
After 12000 training step(s), validation accuracy using average model is 98.48%
After 13000 training step(s), validation accuracy using average model is 98.56%
After 14000 training step(s), validation accuracy using average model is 98.54%
After 15000 training step(s), validation accuracy using average model is 98.6%
After 16000 training step(s), validation accuracy using average model is 98.6%
After 17000 training step(s), validation accuracy using average model is 98.62%
After 18000 training step(s), validation accuracy using average model is 98.56%
After 19000 training step(s), validation accuracy using average model is 98.66%
After 20000 training step(s), validation accuracy using average model is 98.6%
After 21000 training step(s), validation accuracy using average model is 98.7%
After 22000 training step(s), validation accuracy using average model is 98.6%
After 23000 training step(s), validation accuracy using average model is 98.54%
After 24000 training step(s), validation accuracy using average model is 98.6%
After 25000 training step(s), validation accuracy using average model is 98.64%
After 26000 training step(s), validation accuracy using average model is 98.64%
After 27000 training step(s), validation accuracy using average model is 98.6%
After 28000 training step(s), validation accuracy using average model is 98.56%
After 29000 training step(s), validation accuracy using average model is 98.52%
After 30000 training step(s), test accuracy using average model is 98.4%
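
The percentages above come from the argmax, equal, cast and mean steps in the script. As a rough illustration of that computation outside TensorFlow, here is a minimal plain-Python sketch; the helper names argmax and accuracy and the toy scores are hypothetical and only mirror the idea, not the TensorFlow implementation.

```python
def argmax(row):
    """Index of the largest entry in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, one_hot_labels):
    """Fraction of rows whose predicted class equals the labelled class."""
    # Compare predicted and true class per sample, then cast the booleans to floats.
    hits = [float(argmax(p) == argmax(t)) for p, t in zip(logits, one_hot_labels)]
    # The mean of the 0/1 values is the accuracy.
    return sum(hits) / len(hits)


# Toy batch with 3 classes and 4 samples (made-up numbers).
logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.9], [0.0, 1.5, 0.2], [1.0, 0.8, 0.1]]
labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]
print('accuracy is %g%%' % (accuracy(logits, labels) * 100))  # 3 of 4 correct: accuracy is 75%
```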