VGGNet is a deep convolutional neural network developed jointly by the Visual Geometry Group at the University of Oxford and Google DeepMind. By repeatedly stacking 3x3 convolution kernels and 2x2 max-pooling layers, VGGNet successfully builds convolutional networks 16 to 19 layers deep. Compared with the previous state-of-the-art architectures its error rate drops sharply, and it took 2nd place in the classification track and 1st place in the localization track of ILSVRC 2014. It also extends well, generalizing nicely when transferred to other image data. The structure is simple, using the same convolution kernel size and max-pooling size throughout the whole network. The trained model parameters have been officially open-sourced, so they provide good initial weights when retraining for domain-specific image classification tasks.
The configurations from the VGG paper are listed below. Each row is one network; the input is always a 224x224 RGB image, every convolutional block is followed by a maxpool layer, and every configuration ends with FC-4096, FC-4096, FC-1000 and a soft-max layer.

A     (11 weight layers): conv3-64       | conv3-128      | conv3-256 x2            | conv3-512 x2            | conv3-512 x2
A-LRN (11 weight layers): conv3-64, LRN  | conv3-128      | conv3-256 x2            | conv3-512 x2            | conv3-512 x2
B     (13 weight layers): conv3-64 x2    | conv3-128 x2   | conv3-256 x2            | conv3-512 x2            | conv3-512 x2
C     (16 weight layers): conv3-64 x2    | conv3-128 x2   | conv3-256 x2, conv1-256 | conv3-512 x2, conv1-512 | conv3-512 x2, conv1-512
D     (16 weight layers): conv3-64 x2    | conv3-128 x2   | conv3-256 x3            | conv3-512 x3            | conv3-512 x3
E     (19 weight layers): conv3-64 x2    | conv3-128 x2   | conv3-256 x4            | conv3-512 x4            | conv3-512 x4

Number of parameters (millions): A and A-LRN 133, B 133, C 134, D 138, E 144.
The convolutional layers hold relatively few parameters; most of them sit in the last three fully connected layers. Training time, however, is spent mostly in the convolutions, because that is where the bulk of the computation happens. Configuration D is VGGNet-16 and E is VGGNet-19. C has three more 1x1 convolutional layers than B; a 1x1 convolution is a linear transformation of the channels, and since the number of input and output channels stays the same here, it performs no dimensionality reduction.
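A rough back-of-the-envelope count (a sketch that ignores biases) shows how lopsided the distribution is; the layer names refer to the full listing at the end of this post:

fc6_weights = 7 * 7 * 512 * 4096     # ~102.8M, the single largest layer
fc7_weights = 4096 * 4096            # ~16.8M
fc8_weights = 4096 * 1000            # ~4.1M
conv5_3_weights = 3 * 3 * 512 * 512  # ~2.4M, one of the biggest conv layers
# the three FC layers alone hold ~123.6M of VGGNet-16's ~138M parameters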
VGGNet consists of five convolutional segments, each with 2 to 3 convolutional layers, and each segment is followed by a max-pooling layer that shrinks the feature map. Within a segment every layer uses the same number of kernels, and later segments use more: 64-128-256-512-512. The core idea is stacking multiple 3x3 convolutional layers: two 3x3 layers in series cover the same receptive field as one 5x5 layer, and three 3x3 layers in series match one 7x7 layer, but with fewer parameters and more non-linear transformations, which strengthens the network's ability to learn features.
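A worked comparison (ignoring biases) for a layer with C input and C output channels, taking C = 512 as an example:

C = 512
one_7x7 = 7 * 7 * C * C          # 12,845,056 parameters for a single 7x7 layer
three_3x3 = 3 * (3 * 3 * C * C)  # 7,077,888 parameters for three stacked 3x3 layers
# 27/49 ~= 0.55: roughly half the parameters, with three ReLUs instead of one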
Training first fits the simple configuration A, then reuses A's weights to initialize the more complex models, which makes training converge faster. For prediction, Multi-Scale evaluation is used: the image is rescaled to a size Q and fed through the convolutional network; on the last convolutional layer a sliding window produces classification predictions, the results of the different windows are averaged, and the averages over different sizes Q are averaged again to get the final result. This makes better use of the image data and improves prediction accuracy. During training, Multi-Scale data augmentation is applied: the original image is rescaled to a varying size S and a 224x224 patch is randomly cropped from it, which increases the amount of training data and helps prevent overfitting.
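The training code is not part of the benchmark script below, but a minimal sketch of the Multi-Scale augmentation idea might look like this, assuming a single HxWx3 image tensor and the TF 1.x API (the function name multi_scale_crop is hypothetical, and S is drawn from [256, 512] as in the original paper):

def multi_scale_crop(image):
    # randomly choose the shorter-side scale S in [256, 512]
    s = tf.random_uniform([], minval=256, maxval=513, dtype=tf.int32)
    shape = tf.shape(image)
    h = tf.cast(shape[0], tf.float32)
    w = tf.cast(shape[1], tf.float32)
    scale = tf.cast(s, tf.float32) / tf.minimum(h, w)
    new_size = tf.cast(tf.stack([h * scale, w * scale]), tf.int32)
    resized = tf.image.resize_images(image, new_size)
    # randomly crop a 224x224 patch from the rescaled image
    return tf.random_crop(resized, [224, 224, 3])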
The main findings: LRN layers do not help much, deeper networks perform better, and 1x1 convolutions are effective, although larger kernels can learn larger spatial features.
Load the system libraries and TensorFlow.
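Only a few imports are needed, as in the full listing at the end of the post:

from datetime import datetime
import math
import time
import tensorflow as tf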
The conv_op function creates a convolutional layer and stores its parameters in the parameter list. Its arguments are: input_op, the input tensor; name, the layer name; kh (kernel height) and kw (kernel width), the kernel size; n_out, the number of kernels, i.e. the number of output channels; dh and dw, the stride height and width; and p, the parameter list. get_shape()[-1].value retrieves the channel count of input_op. tf.name_scope(name) sets the scope. tf.get_variable creates the kernel with shape [kh, kw, n_in, n_out], i.e. kernel height and width plus input and output channel counts, and tf.contrib.layers.xavier_initializer_conv2d() initializes it.
tf.nn.conv2d convolves input_op with the kernel, using strides dh x dw and padding mode SAME. tf.constant initializes the biases to 0 and tf.Variable turns them into trainable parameters. tf.nn.bias_add adds the convolution result conv and the bias, and tf.nn.relu applies the non-linearity to produce activation. Once the layer is created, its parameters kernel and biases are appended to the parameter list p and the layer output activation is returned.
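For reference, this is conv_op as it appears in the complete listing at the end of the post:

def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w",
                                 shape=[kh, kw, n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer_conv2d())
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding='SAME')
        bias_init_val = tf.constant(0.0, shape=[n_out], dtype=tf.float32)
        biases = tf.Variable(bias_init_val, trainable=True, name='b')
        z = tf.nn.bias_add(conv, biases)
        activation = tf.nn.relu(z, name=scope)
        p += [kernel, biases]
        return activation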
fc_op creates a fully connected layer. It first gets the channel count of input_op, then tf.get_variable creates the layer's weights with the first dimension equal to the input channel count n_in and the second equal to the output channel count n_out, initialized with xavier_initializer. The biases are initialized to 0.1 to avoid dead neurons. tf.nn.relu_layer multiplies input_op by kernel, adds biases, and applies the ReLU non-linearity to obtain activation. The layer's parameters kernel and biases are appended to the parameter list p and activation is returned.
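And fc_op from the same listing:

def fc_op(input_op, name, n_out, p):
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w",
                                 shape=[n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer())
        biases = tf.Variable(tf.constant(0.1, shape=[n_out], dtype=tf.float32), name='b')
        activation = tf.nn.relu_layer(input_op, kernel, biases, name=scope)
        p += [kernel, biases]
        return activation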
mpool_op defines the max-pooling layer creation function: tf.nn.max_pool takes the input input_op, a pooling size of kh x kw, strides of dh x dw, and padding mode SAME.
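mpool_op from the listing:

def mpool_op(input_op, name, kh, kw, dh, dw):
    return tf.nn.max_pool(input_op, ksize=[1, kh, kw, 1],
                          strides=[1, dh, dw, 1], padding='SAME', name=name)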
The VGGNet-16 structure has six parts: the first five are convolutional segments and the last is the fully connected network. inference_op is the function that builds the network; its inputs are input_op and keep_prob (a placeholder that controls the dropout rate). It first initializes the parameter list p.
The first convolutional segment has two convolutional layers (conv_op) and one max-pooling layer (mpool_op). The kernels are 3x3 with 64 kernels (output channels) and a 1x1 stride, so every pixel is scanned. The first convolutional layer takes the 224x224x3 input_op and outputs 224x224x64; the second layer's input and output are both 224x224x64. The 2x2 max-pooling layer outputs 112x112x64.
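In the listing, the first segment looks like this:

conv1_1 = conv_op(input_op, name="conv1_1", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
conv1_2 = conv_op(conv1_1, name="conv1_2", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
pool1 = mpool_op(conv1_2, name="pool1", kh=2, kw=2, dh=2, dw=2)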
The second segment has 2 convolutional layers and 1 max-pooling layer; the convolutions have 128 output channels and the segment outputs 56x56x128.
The third segment has 3 convolutional layers and 1 max-pooling layer; the convolutions have 256 output channels and the segment outputs 28x28x256.
The fourth segment has 3 convolutional layers and 1 max-pooling layer; the convolutions have 512 output channels and the segment outputs 14x14x512.
The fifth segment has 3 convolutional layers and 1 max-pooling layer; the convolutions have 512 output channels and the segment outputs 7x7x512. The output for each sample is then flattened with tf.reshape into a one-dimensional vector of length 7x7x512 = 25088.
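The flattening step from the listing:

shp = pool5.get_shape()
flattened_shape = shp[1].value * shp[2].value * shp[3].value
resh1 = tf.reshape(pool5, [-1, flattened_shape], name="resh1")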
Next comes a fully connected layer with 4096 hidden units and a ReLU activation, followed by a Dropout layer whose keep rate is 0.5 during training and 1.0 during prediction.
Then another fully connected layer and Dropout layer of the same shape.
Finally a fully connected layer with 1000 hidden units is attached, and Softmax turns its output into class probabilities. tf.argmax outputs the class with the highest probability. The function returns fc8, softmax, predictions and the parameter list p.
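The fully connected part of inference_op, from the listing:

fc6 = fc_op(resh1, name="fc6", n_out=4096, p=p)
fc6_drop = tf.nn.dropout(fc6, keep_prob, name="fc6_drop")
fc7 = fc_op(fc6_drop, name="fc7", n_out=4096, p=p)
fc7_drop = tf.nn.dropout(fc7, keep_prob, name="fc7_drop")
fc8 = fc_op(fc7_drop, name="fc8", n_out=1000, p=p)
softmax = tf.nn.softmax(fc8)
predictions = tf.argmax(softmax, 1)
return predictions, softmax, fc8, p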
This completes the construction of the VGGNet-16 network.
The evaluation function time_tensorflow_run: the session.run() call takes a feed_dict, which makes it easy to pass in keep_prob to control the Dropout keep rate.
The main evaluation function run_benchmark measures forward (inference) and backward (training) performance. It generates random 224x224 images with tf.random_normal, drawing from a normal distribution with a standard deviation of 0.1.
It creates the keep_prob placeholder and calls inference_op to build the VGGNet-16 network, obtaining predictions, softmax, fc8 and the parameter list p.
It then creates a Session and initializes the global parameters. With keep_prob set to 1.0 for prediction, time_tensorflow_run measures the forward pass.
For the backward pass, it computes the l2 loss of fc8, the output of VGGNet-16's last fully connected layer, and uses tf.gradients to take the gradient of the loss with respect to all model parameters. time_tensorflow_run then measures the backward pass, with the gradient operation grad as the target and keep_prob set to 0.5. batch_size is set to 32.
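These are the corresponding lines in run_benchmark:

time_tensorflow_run(sess, predictions, {keep_prob: 1.0}, "Forward")
objective = tf.nn.l2_loss(fc8)
grad = tf.gradients(objective, p)
time_tensorflow_run(sess, grad, {keep_prob: 0.5}, "Forward-backward")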
Running the main function run_benchmark() tests the forward and backward cost of VGGNet-16 in TensorFlow: the forward pass takes 0.152 s per batch on average, and the backward pass, which computes the gradients, averages 0.617 s per batch.
VGGNet reaches a 7.3% error rate: a deeper network with smaller convolution kernels, which acts as a form of implicit regularization.
from datetime import datetime
import math
import time
import tensorflow as tf


def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w",
                                 shape=[kh, kw, n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer_conv2d())
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding='SAME')
        bias_init_val = tf.constant(0.0, shape=[n_out], dtype=tf.float32)
        biases = tf.Variable(bias_init_val, trainable=True, name='b')
        z = tf.nn.bias_add(conv, biases)
        activation = tf.nn.relu(z, name=scope)
        p += [kernel, biases]
        return activation


def fc_op(input_op, name, n_out, p):
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w",
                                 shape=[n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer())
        biases = tf.Variable(tf.constant(0.1, shape=[n_out], dtype=tf.float32), name='b')
        activation = tf.nn.relu_layer(input_op, kernel, biases, name=scope)
        p += [kernel, biases]
        return activation


def mpool_op(input_op, name, kh, kw, dh, dw):
    return tf.nn.max_pool(input_op,
                          ksize=[1, kh, kw, 1],
                          strides=[1, dh, dw, 1],
                          padding='SAME',
                          name=name)


def inference_op(input_op, keep_prob):
    p = []
    # assume input_op shape is 224x224x3

    # block 1 -- outputs 112x112x64
    conv1_1 = conv_op(input_op, name="conv1_1", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    conv1_2 = conv_op(conv1_1, name="conv1_2", kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    pool1 = mpool_op(conv1_2, name="pool1", kh=2, kw=2, dw=2, dh=2)

    # block 2 -- outputs 56x56x128
    conv2_1 = conv_op(pool1, name="conv2_1", kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    conv2_2 = conv_op(conv2_1, name="conv2_2", kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    pool2 = mpool_op(conv2_2, name="pool2", kh=2, kw=2, dh=2, dw=2)

    # block 3 -- outputs 28x28x256
    conv3_1 = conv_op(pool2, name="conv3_1", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_2 = conv_op(conv3_1, name="conv3_2", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_3 = conv_op(conv3_2, name="conv3_3", kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    pool3 = mpool_op(conv3_3, name="pool3", kh=2, kw=2, dh=2, dw=2)

    # block 4 -- outputs 14x14x512
    conv4_1 = conv_op(pool3, name="conv4_1", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_2 = conv_op(conv4_1, name="conv4_2", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_3 = conv_op(conv4_2, name="conv4_3", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool4 = mpool_op(conv4_3, name="pool4", kh=2, kw=2, dh=2, dw=2)

    # block 5 -- outputs 7x7x512
    conv5_1 = conv_op(pool4, name="conv5_1", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_2 = conv_op(conv5_1, name="conv5_2", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_3 = conv_op(conv5_2, name="conv5_3", kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool5 = mpool_op(conv5_3, name="pool5", kh=2, kw=2, dw=2, dh=2)

    # flatten
    shp = pool5.get_shape()
    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name="resh1")

    # fully connected
    fc6 = fc_op(resh1, name="fc6", n_out=4096, p=p)
    fc6_drop = tf.nn.dropout(fc6, keep_prob, name="fc6_drop")

    fc7 = fc_op(fc6_drop, name="fc7", n_out=4096, p=p)
    fc7_drop = tf.nn.dropout(fc7, keep_prob, name="fc7_drop")

    fc8 = fc_op(fc7_drop, name="fc8", n_out=1000, p=p)
    softmax = tf.nn.softmax(fc8)
    predictions = tf.argmax(softmax, 1)
    return predictions, softmax, fc8, p


def time_tensorflow_run(session, target, feed, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target, feed_dict=feed)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, num_batches, mn, sd))


def run_benchmark():
    with tf.Graph().as_default():
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3],
                                              dtype=tf.float32, stddev=1e-1))
        keep_prob = tf.placeholder(tf.float32)
        predictions, softmax, fc8, p = inference_op(images, keep_prob)

        init = tf.global_variables_initializer()
        config = tf.ConfigProto()
        config.gpu_options.allocator_type = 'BFC'
        sess = tf.Session(config=config)
        sess.run(init)

        time_tensorflow_run(sess, predictions, {keep_prob: 1.0}, "Forward")

        objective = tf.nn.l2_loss(fc8)
        grad = tf.gradients(objective, p)
        time_tensorflow_run(sess, grad, {keep_prob: 0.5}, "Forward-backward")


batch_size = 32
num_batches = 100
run_benchmark()
References:
《TensorFlow實踐》