I was trying to use Batch Normalization to train my neural networks with TensorFlow, but it was unclear to me how to use the official layer implementation of Batch Normalization (note this is different from the one in the API).
After some painful digging through their GitHub issues, it seems one needs a tf.cond to use it correctly, and also a 'reuse=True' flag so that the BN shift and scale variables are properly reused. After figuring that out, I provided a small description of what I believe is the correct way to use it here.
I have now written a short script to test it (a single layer and a ReLU; it is hard to make it smaller than that). However, I am not 100% sure how to test it. Right now my code runs with no error messages, but unexpectedly returns NaNs. That lowers my confidence that the code I provided in the other post is correct. Or maybe the network I have is just odd. Either way, does anyone know what is wrong? Here is the code:
import tensorflow as tf
# download and install the MNIST data automatically
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    # is_training=True creates the variables and uses batch statistics;
    # reuse=True makes the inference branch share the same shift/scale variables
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          is_training=True,
                          reuse=None,  # is this right?
                          trainable=True, scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              is_training=False,
                              reuse=True,
                              trainable=True, scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z

def get_NN_layer(x, input_dim, output_dim, scope, train_phase):
    with tf.name_scope(scope+'vars'):
        W = tf.Variable(tf.truncated_normal(shape=[input_dim, output_dim], mean=0.0, stddev=0.1))
        b = tf.Variable(tf.constant(0.1, shape=[output_dim]))
    with tf.name_scope(scope+'Z'):
        z = tf.matmul(x, W) + b
    with tf.name_scope(scope+'BN'):
        if train_phase is not None:
            z = batch_norm_layer(z, train_phase, scope+'BN_unit')
    with tf.name_scope(scope+'A'):
        a = tf.nn.relu(z)  # (M x D1) = (M x D) * (D x D1)
    return a

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# placeholder for data
x = tf.placeholder(tf.float32, [None, 784])
# placeholder that turns BN on during training and off during inference
train_phase = tf.placeholder(tf.bool, name='phase_train')
# variables for parameters
hidden_units = 25
layer1 = get_NN_layer(x, input_dim=784, output_dim=hidden_units, scope='layer1', train_phase=train_phase)
# create model
W_final = tf.Variable(tf.truncated_normal(shape=[hidden_units, 10], stddev=0.1))
b_final = tf.Variable(tf.constant(0.1, shape=[10]))
y = tf.nn.softmax(tf.matmul(layer1, W_final) + b_final)

### training
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    steps = 3000
    for iter_step in xrange(steps):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        # collect model statistics
        if iter_step % 1000 == 0:
            batch_xtrain, batch_ytrain = batch_xs, batch_ys        # simulates train data
            batch_xcv, batch_ycv = mnist.test.next_batch(5000)     # simulates CV data
            batch_xtest, batch_ytest = mnist.test.next_batch(5000) # simulates test data
            # do inference
            train_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xs, y_: batch_ys, train_phase: False})
            cv_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xcv, y_: batch_ycv, train_phase: False})
            test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_: batch_ytest, train_phase: False})

            def do_stuff_with_errors(*args):
                print args
            do_stuff_with_errors(train_error, cv_error, test_error)
        # run train step
        sess.run(fetches=train_step, feed_dict={x: batch_xs, y_: batch_ys, train_phase: True})
    # list of booleans indicating correct predictions
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    # accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, train_phase: False}))
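One likely source of the NaNs (a general observation, not something from the linked post): the loss computes tf.log(y) on a softmax output, which returns NaN as soon as any predicted probability underflows to exactly 0. A minimal sketch of a numerically safer loss, assuming the final layer is kept as raw logits:

# sketch: keep the last layer as raw logits instead of applying tf.nn.softmax yourself
logits = tf.matmul(layer1, W_final) + b_final
# the fused op applies softmax and log together and never evaluates log(0)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
# probabilities are still available for predictions if needed
y = tf.nn.softmax(logits)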
When I run it, I get:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(2.3474066, 2.3498712, 2.3461707)
(0.49414295, 0.88536006, 0.91152304)
(0.51632041, 0.393666, nan)
0.9296
It used to be that all of the last values were NaN, and now it is only a few of them. Is everything fine, or am I just being paranoid?
Solution
I am not sure whether this will solve your problem. The documentation for BatchNorm is not exactly easy to use/informative, so here is a short recap on how to use simple BatchNorm:
First of all, you define your BatchNorm layer. If you want to use it right after an affine/fully-connected layer, you do this (just an example; the order can be different, as desired):
...
inputs = tf.matmul(inputs, W) + b
inputs = tf.layers.batch_normalization(inputs, training=is_training)
inputs = tf.nn.relu(inputs)
...
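Expanded into a self-contained sketch (the placeholder name is_training and the layer sizes here are illustrative assumptions, not part of the answer):

import tensorflow as tf

# illustrative shapes and names, just to make the snippet runnable
x = tf.placeholder(tf.float32, [None, 784])
is_training = tf.placeholder(tf.bool, name='is_training')
W = tf.Variable(tf.truncated_normal([784, 25], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[25]))

h = tf.matmul(x, W) + b
# training=True uses batch statistics; training=False uses the learned moving averages
h = tf.layers.batch_normalization(h, training=is_training)
h = tf.nn.relu(h)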
The function tf.layers.batch_normalization creates internal variables (the moving mean and variance), and the ops that update them are placed in a special collection, tf.GraphKeys.UPDATE_OPS. As such, you have to call your optimizer function as follows (after all layers have been defined!):
...
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    trainer = tf.train.AdamOptimizer()
    updateModel = trainer.minimize(loss, global_step=global_step)
...
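At run time you then just feed the training flag; a usage sketch with the assumed names from the snippets above (x, y_, is_training, loss and updateModel stand in for whatever your graph defines):

# training step: uses batch statistics and, via the control dependency,
# also runs the UPDATE_OPS that refresh the moving averages
sess.run(updateModel, feed_dict={x: batch_xs, y_: batch_ys, is_training: True})

# evaluation: uses the accumulated moving averages instead of batch statistics
val_loss = sess.run(loss, feed_dict={x: batch_xval, y_: batch_yval, is_training: False})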
You can read more about it here. I know it is a bit late to answer your question, but it might help other people running into BatchNorm problems in TensorFlow!