
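1. Convolutional Neural Networks

1.1 Multilayer Feedforward Neural Networks

A multilayer feedforward neural network is a multilayer network in which every neuron in one layer is connected to every neuron in the next layer. There are no connections between neurons in the same layer and no connections that skip a layer, as shown in Figure 1-1.

Figure 1-1

The output of the j-th neuron in the hidden layer of the figure above can be expressed as: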

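$$y_j = f\left(\sum_{i} w_{ij}\, x_i + b_j\right) \tag{1}$$

where x_i is the i-th input, w_ij is the weight from input i to hidden neuron j, f is the activation function, and b_j is the bias of the j-th neuron.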
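The output of a hidden neuron described above can be tried out in a few lines of plain Python. This is only a sketch: the logistic activation, the choice of 3 hidden neurons and all the numbers are assumptions made for illustration, and `dense_forward` is a hypothetical helper, not a TensorFlow function.

```python
import math


def sigmoid(z):
    """Logistic activation, used here only as one possible choice of f."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(x, weights, biases, f=sigmoid):
    """Return the hidden outputs f(sum_i weights[i][j] * x[i] + biases[j])."""
    return [
        f(sum(weights[i][j] * x[i] for i in range(len(x))) + biases[j])
        for j in range(len(biases))
    ]


# 4 inputs fully connected to 3 hidden neurons (illustrative numbers).
x = [1.0, 2.0, 3.0, 4.0]
weights = [[0.1, 0.2, 0.3] for _ in x]   # weights[i][j]: input i -> hidden j
biases = [0.0, 0.1, -0.1]                # one bias per hidden neuron
hidden = dense_forward(x, weights, biases)
print(hidden)
```

Every hidden neuron reads all 4 inputs and has its own column of weights and its own bias, which is what "fully connected" means here.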

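1.2 Convolutional Neural Networks

1.2.1 Network Structure

A convolutional neural network differs in structure from a multilayer feedforward network: the neurons of one layer are not fully connected to the next layer but only partially connected, i.e. each neuron in the next layer receives input from only some of the neurons in the previous layer. As in a multilayer feedforward network, however, there are no same-layer connections and no cross-layer connections. See Figure 1-2.

Figure 1-2

The input layer in Figure 1-2 has 4 neurons, but each hidden neuron has at most 3 inputs, whereas in the multilayer feedforward network of Figure 1-1 each hidden neuron receives input from all 4 input neurons.

The group of input neurons that feeds one hidden neuron is called a local receptive field. In Figure 1-2, (x1,x2,x3), (x2,x3,x4) and (x3,x4) are all local receptive fields.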
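The receptive fields listed above can be generated mechanically. The sketch below assumes that hidden neuron j starts reading at input j and that a window running past the end of the input is simply shorter; `receptive_fields` is a hypothetical helper introduced only for this illustration.

```python
def receptive_fields(inputs, width, count):
    """Return `count` windows of at most `width` items, one per hidden neuron.

    Window j starts at input j. A window that runs past the end of the
    input is shorter; no padding values are invented here.
    """
    return [tuple(inputs[j:j + width]) for j in range(count)]


fields = receptive_fields(["x1", "x2", "x3", "x4"], width=3, count=3)
print(fields)
# [('x1', 'x2', 'x3'), ('x2', 'x3', 'x4'), ('x3', 'x4')]
```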

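1.2.2 Convolution

A convolutional neural network differs from a feedforward network in one more respect: all the neurons in a hidden layer share one set of input weights and a single bias. The output of the j-th hidden neuron in Figure 1-2 can therefore be expressed as: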

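$$y_j = f\left(\sum_{i=0}^{2} w_i\, x_{j+i} + b\right) \tag{2}$$

where i ranges over [0,2] (one index per input in the 3-element receptive field), f is the activation function, and b is the bias shared by all the neurons.

The set of weights and the single bias that are shared between the input layer and the hidden layer are called the shared weights and the shared bias.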
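A minimal sketch of the shared-weight computation follows. It assumes that hidden neuron j reads the inputs starting at position j, that inputs beyond the end of the vector count as 0 (mirroring the shorter receptive field (x3,x4)), and that ReLU is the activation; the weight and bias values are made up, and `conv1d_shared` is a hypothetical name.

```python
def relu(z):
    """Rectified linear activation, one possible choice of f."""
    return max(0.0, z)


def conv1d_shared(x, w, b, n_hidden, f=relu):
    """Compute f(sum_i w[i] * x[j + i] + b) for each hidden neuron j.

    The same weights w and the same bias b are used for every j.
    Inputs past the end of x are treated as 0 (an assumption).
    """
    out = []
    for j in range(n_hidden):
        total = b
        for i, w_i in enumerate(w):
            if j + i < len(x):
                total += w_i * x[j + i]
        out.append(f(total))
    return out


x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, -1.0, 0.5]   # shared weights (illustrative values)
b = 1.0                # shared bias
y = conv1d_shared(x, w, b, n_hidden=3)
print(y)  # [1.0, 1.0, 0.0]
```

Only 3 weights and 1 bias are learned here, however many hidden neurons there are; a fully connected layer of the same size would need 12 weights and 3 biases.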

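1.2.3 Pooling

The hidden layer is not fully connected to the output layer either: as Figure 1-2 shows, each output neuron is connected to only some of the hidden neurons. The output can be computed from the hidden layer in several ways. With the commonly used max-pooling method, each output neuron takes the largest value among the hidden neurons connected to it. For example, if y1, y2 and y3 in Figure 1-2 are 1, 2 and 3, then o1 is 2 and o2 is 3. Other pooling methods exist as well, such as L2 pooling.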
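The worked example above can be reproduced with a short function. The window size of 2 and the step of 1 are assumptions chosen because they give the stated values o1 = 2 and o2 = 3; the figure itself may wire the neurons differently.

```python
def max_pool_1d(values, size, stride):
    """Slide a window of `size` over `values` in steps of `stride`, keeping each maximum."""
    return [
        max(values[start:start + size])
        for start in range(0, len(values) - size + 1, stride)
    ]


y = [1, 2, 3]
o = max_pool_1d(y, size=2, stride=1)
print(o)  # [2, 3]
```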

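1.2.4 Feature Maps

In Figure 1-2 the input layer is mapped to a single hidden layer through one set of weights and one bias. A convolutional neural network can in fact use several sets of weights and several biases to map the input layer to several hidden layers, which sit in parallel. Each of these hidden layers is a feature map, and the number of feature maps is chosen by the user according to the needs of the computation. In Figure 1-3, the first convolution maps one input layer to 4 hidden (convolutional) layers; in the second convolution, each input (pooling) layer is mapped to 3 hidden layers, so after the second convolution there are 12 convolutional layers in total.

Figure 1-3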

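1.2.5 Application to Images

The convolutional structures described so far are one-dimensional: the input, hidden and output layers are all vectors. An image, however, is two-dimensional, i.e. a matrix. Applying a convolutional neural network to image recognition therefore means moving the data from one dimension to two.

As shown in Figure 1-4, a 28*28 image (the input layer) is convolved with a 5*5 local receptive field. The first pixel of the hidden layer is computed from all the pixels in the first 5*5 rectangle of the input layer, that is,

where i=0, j=0. Rewritten in one-dimensional form, equation (3) becomes:

Figure 1-4

The second pixel of the hidden layer is computed in the same way, as shown in Figure 1-5, which can be written as

where i=0, j=0, and the weights w_k,j and the bias b are the same in equation (3) and equation (5).

Figure 1-5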
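The two-dimensional case can be sketched in plain Python as follows. The index convention (output pixel (i, j) reads the 5*5 window whose top-left corner is input pixel (i, j)) is an assumption, the activation function is left out so the arithmetic stays visible, and the image and kernel values are made up; `conv2d_valid` is a hypothetical helper, not the TensorFlow function.

```python
def conv2d_valid(image, kernel, bias):
    """Slide `kernel` over `image` one pixel at a time, without padding.

    Every output pixel uses the same kernel weights and the same bias.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            bias + sum(
                kernel[k][l] * image[i + k][j + l]
                for k in range(k_h)
                for l in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 28*28 "image" whose pixel value is its column index, and a 5*5 kernel of ones.
image = [[float(col) for col in range(28)] for _ in range(28)]
kernel = [[1.0] * 5 for _ in range(5)]
hidden = conv2d_valid(image, kernel, bias=0.0)

print(len(hidden), len(hidden[0]))   # 24 24
print(hidden[0][0], hidden[0][1])    # 50.0 75.0
```

The first hidden pixel sums columns 0 to 4 and the second sums columns 1 to 5, using the same 25 weights and the same bias. Without padding, a 28*28 input gives a 24*24 hidden layer.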

Max pooling is then applied to each 2*2 rectangle of the hidden layer to form an output layer, as shown in Figure 1-6.

Figure 1-6
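A small sketch of 2*2 max pooling on a two-dimensional layer is given below. The 4*4 values are made up for illustration and `max_pool_2d` is a hypothetical helper; the windows do not overlap because the step equals the window size.

```python
def max_pool_2d(layer, size=2, stride=2):
    """Keep the maximum of each size*size window, moving `stride` pixels at a time."""
    rows = range(0, len(layer) - size + 1, stride)
    cols = range(0, len(layer[0]) - size + 1, stride)
    return [
        [
            max(layer[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in cols
        ]
        for r in rows
    ]


layer = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2d(layer)
print(pooled)  # [[4, 2], [2, 8]]
```

Applied to a 24*24 hidden layer, the same function returns a 12*12 output layer.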

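With 3 sets of feature maps, one input layer is mapped to 3 hidden layers, and each hidden layer is then pooled into one output layer, which gives the structure shown in Figure 1-7.

Figure 1-7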

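2. TensorFlow Implementation

2.1 API Overview

As described above, a convolutional neural network has two main computation steps: convolution and pooling. TensorFlow provides a number of APIs for these computations.

2.1.1 Reshaping

In TensorFlow the dimensions of the input data often do not match the dimensions expected by an API parameter. For example, the input may be a vector (array) of shape [784] while the API requires a matrix of shape [28,28]. The one-dimensional (1-d) vector then has to be converted to a two-dimensional (2-d) matrix without changing the data. TensorFlow's reshape function does this.

def reshape(tensor, shape, name=None):

The main parameters of reshape are:

tensor: the tensor to be reshaped; it can be 1-d, 2-d or n-d;
shape: the shape the tensor should be converted to; for the example above it would be [28,28].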

# tensor 't' is [1, 2, 3, 4, 5, 6, 7, 8, 9]
# tensor 't' has shape [9]
reshape(t, [3, 3]) ==> [[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]

# tensor 't' is [[[1, 1], [2, 2]],
#                [[3, 3], [4, 4]]]
# tensor 't' has shape [2, 2, 2]
reshape(t, [2, 4]) ==> [[1, 1, 2, 2],
                        [3, 3, 4, 4]]

# tensor 't' is [[[1, 1, 1],
#                 [2, 2, 2]],
#                [[3, 3, 3],
#                 [4, 4, 4]],
#                [[5, 5, 5],
#                 [6, 6, 6]]]
# tensor 't' has shape [3, 2, 3]
# pass '[-1]' to flatten 't'
reshape(t, [-1]) ==> [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]

# -1 can also be used to infer the shape

# -1 is inferred to be 9:
reshape(t, [2, -1]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 2:
reshape(t, [-1, 9]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 3:
reshape(t, [ 2, -1, 3]) ==> [[[1, 1, 1],
                              [2, 2, 2],
                              [3, 3, 3]],
                             [[4, 4, 4],
                              [5, 5, 5],
                              [6, 6, 6]]]

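That is, if one dimension of shape is given as -1, reshape infers the size of that dimension from the total number of elements and the sizes of the other dimensions. At most one dimension may be -1.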
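How a -1 dimension is resolved can be shown with plain arithmetic. The function below is only an illustration of the rule, not TensorFlow's implementation, and `infer_shape` is a hypothetical name.

```python
def infer_shape(num_elements, shape):
    """Replace a single -1 in `shape` with the size implied by `num_elements`."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("the elements do not divide evenly into this shape")
    return [num_elements // known if dim == -1 else dim for dim in shape]


print(infer_shape(18, [2, -1]))                  # [2, 9]
print(infer_shape(18, [-1, 9]))                  # [2, 9]
print(infer_shape(18, [2, -1, 3]))               # [2, 3, 3]
print(infer_shape(50 * 784, [-1, 28, 28, 1]))    # [50, 28, 28, 1]
```

The last call corresponds to a batch of 50 flattened 28*28 images being reshaped for a convolution.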

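2.1.2 Convolution

TensorFlow provides the conv2d function to perform the convolution step of a neural network. It is defined as follows:

def tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None,
                 data_format=None, name=None)

The main parameters are:

1) input: the input layer to be convolved, a tensor of shape [batch, in_height, in_width, in_channels]:

- batch: the number of items in the batch; for images, the number of images;
- in_height: the height of each image;
- in_width: the width of each image;
- in_channels: the number of feature maps (channels) in the input.

2) filter: describes the hidden layer produced by the convolution; it can also be understood as the weights used in the computation. It is a tensor of shape [filter_height, filter_width, in_channels, out_channels]:

- filter_height: the height of the local receptive field (also called the kernel);
- filter_width: the width of the local receptive field (also called the kernel);
- in_channels: the number of feature maps in the input layer;
- out_channels: the number of feature maps in the output layer;

3) strides: the step by which the local receptive field moves each time, usually 1 pixel. It is passed as a list of 4 integers, one per dimension of input, e.g. [1, 1, 1, 1];

4) padding: indicates how the edges of the input are handled, and is either 'SAME' or 'VALID'. Because strides sets the horizontal and vertical step of the local receptive field, the field eventually reaches a position where it no longer fits inside the input. For example, when a [28,28] image is convolved with a [5,5] local receptive field and a step of [1,1], the field can move only 23 units to the right (or 23 units down) before it reaches the right (or bottom) edge of the image, which gives 24*24 positions. With padding 'VALID' the convolution stops there. With padding 'SAME' the image is padded with zeros around its border so that the field can keep moving and the output stays 28*28.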
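The effect of padding and strides on the output size can be checked with a small calculation. The function below follows the size rules documented for TensorFlow's 'SAME' and 'VALID' padding; `output_size` itself is a hypothetical helper written for this illustration.

```python
import math


def output_size(in_size, window, stride, padding):
    """Output length along one dimension for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - window + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(output_size(28, 5, 1, "VALID"))  # 24
print(output_size(28, 5, 1, "SAME"))   # 28
print(output_size(28, 2, 2, "SAME"))   # 14
```

The first two lines match the 28*28 image and 5*5 receptive field discussed above; the third is the 2*2 pooling with a step of 2 used later in the article.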

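The convolution in Figure 1-4 can be written as follows:

x_image = tf.reshape(x, [-1, 28, 28, 1])
initial_w = tf.truncated_normal([5, 5, 1, 3], stddev=0.1)
w = tf.Variable(initial_w)
initial_d = tf.constant(0.1, shape=[3])
d = tf.Variable(initial_d)
# strides and padding are required arguments of tf.nn.conv2d
y = tf.nn.relu(tf.nn.conv2d(x_image, w, strides=[1, 1, 1, 1], padding='SAME') + d)

where:

x is a 2-dimensional (2-d) image input, i.e. a batch of images of shape [None,784]. To match the input expected by conv2d it is converted to a 4-dimensional (4-d) structure. Because the given shape is [-1, 28, 28, 1], the first dimension takes the (unknown) number of images, the 784 values of each image become 28*28, and the last dimension has a single element.
Because the local receptive field is 5*5, each input image has only one feature map, and the hidden layer is to have 3 feature maps, the weights w have shape [5, 5, 1, 3].
Because each feature map has only one bias and the hidden layer is to have 3 feature maps, the bias d has shape [3].
relu is the rectified linear unit (ReLU) activation function provided by TensorFlow. Its result y is a tensor of shape [None,28,28,3]: the first dimension is None because the number of images is not known in advance, and because padding is 'SAME' with a step of 1 the height and width are unchanged, so each feature map is still 28*28. Note that conv2d has no default for strides or padding; both must be passed explicitly.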

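2.1.3 Pooling

TensorFlow provides several functions for the pooling step of a neural network. Since their parameters have similar meanings, only max_pool is described here. It is defined as follows:

def tf.nn.max_pool(value, ksize, strides, padding, data_format="NHWC", name=None):

The main parameters are:

1) value: the layer to be pooled, a tensor of shape [batch, height, width, channels]:

- batch: the number of items to be pooled, i.e. the number of images;
- height: the height of each image;
- width: the width of each image;
- channels: the number of feature maps in the layer to be pooled.

2) ksize: the size of the rectangular pooling window, given as the list [1, pool_height, pool_width, 1]; the 1st and 4th elements are both 1:

- pool_height: the height of the pooling window;
- pool_width: the width of the pooling window;

3) strides: the step of the pooling window in the vertical and horizontal directions, in pixels, given as the list [1, strides_height, strides_width, 1]; the 1st and 4th elements are both 1:

- strides_height: the step when moving down;
- strides_width: the step when moving right;

4) padding: indicates how the edges of the layer being pooled are handled. With 'SAME' the layer is padded so that windows overlapping the edge are kept, and the output size is the input size divided by the step, rounded up; with 'VALID' windows that do not fit entirely inside the layer are dropped.

The 2*2 max pooling described earlier can be performed in TensorFlow as follows:

tf.nn.max_pool(y, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

y is the convolved data from above, a tensor of shape [None,28,28,3];
because the hidden layer is pooled one 2*2 rectangle at a time, ksize is [1, 2, 2, 1];
unlike the convolution, the pooling windows here do not overlap: the window moves 2 pixels to the right and 2 pixels down, so strides is [1, 2, 2, 1].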

2.2 Multilayer Convolutional Network

2.2.1 Helper Functions

To simplify the later code, the following four functions are defined:

def conv2d(x, W):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    """max_pool_2x2 downsamples a feature map by 2X."""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

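2.2.2 First Convolution and Pooling

The MNIST images are supplied as a tensor of shape [N,784], one flattened 28*28 image per row (N is 55000 for mnist.train.images as loaded by input_data.read_data_sets). To use TensorFlow's tf.nn.conv2d function, the input data must first be reshaped.

# Reshape to use within a convolutional neural net.
# Last dimension is for "features" - there is only one here, since images are
# grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
x_image = tf.reshape(x, [-1, 28, 28, 1])

# First convolutional layer - maps one grayscale image to 32 feature maps.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

# Pooling layer - downsamples by 2X.
h_pool1 = max_pool_2x2(h_conv1)

Each input image has the form [28,28]. After the convolution it becomes [28,28,32], where 32 is the number of feature maps. After pooling it becomes [14,14,32]: pooling halves the height and width but leaves the number of feature maps unchanged. See Figure 2-1.

Figure 2-1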

2.2.3 Second Convolution and Pooling

The h_pool1 object produced by the first convolution and pooling is a tensor of shape [None,14,14,32], and it serves as the input tensor of the second convolution. The second convolution and pooling are as follows:

# Second convolutional layer -- maps 32 feature maps to 64.
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

# Second pooling layer.
h_pool2 = max_pool_2x2(h_conv2)

Figure 2-2

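2.2.4 Feedforward Network

After the two rounds of convolution and pooling, a conventional feedforward network is used for training. The h_pool2 object produced by the second pooling is a tensor of shape [None,7,7,64]; in other words, each input image has been turned into 7*7*64 values by the two rounds of convolution and pooling.

Because the input and output of a conventional feedforward network are one-dimensional (1-d), h_pool2 has to be reshaped.

# Fully connected layer 1 -- after 2 round of downsampling, our 28x28 image
# is down to 7x7x64 feature maps -- maps this to 1024 features.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

The reshaping and the fully connected structure are shown in Figure 2-3:

Figure 2-3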
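The shapes produced by the two rounds of convolution and pooling can be traced without running TensorFlow. The sketch below assumes 'SAME' padding throughout, a step of 1 for the convolutions and 2 for the pooling, as in the helper functions of section 2.2.1; `trace_shapes` is a hypothetical helper, and the batch dimension is left out.

```python
import math


def same_out(size, stride):
    """Output length along one dimension with 'SAME' padding."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28):
    """Return (name, shape) pairs for one image passing through the network."""
    h, w, c = height, width, 1
    shapes = [("x_image", (h, w, c))]
    layers = [("h_conv1", "h_pool1", 32), ("h_conv2", "h_pool2", 64)]
    for conv_name, pool_name, out_channels in layers:
        # Convolution with step 1: size unchanged, channel count replaced.
        h, w, c = same_out(h, 1), same_out(w, 1), out_channels
        shapes.append((conv_name, (h, w, c)))
        # 2*2 pooling with step 2: height and width halved, channels unchanged.
        h, w = same_out(h, 2), same_out(w, 2)
        shapes.append((pool_name, (h, w, c)))
    shapes.append(("h_pool2_flat", (h * w * c,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

The last line printed is `h_pool2_flat (3136,)`, which is the 7 * 7 * 64 used as the first dimension of W_fc1.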

2.2.5 Handling Overfitting

# Dropout - controls the complexity of the model, prevents co-adaptation of
# features.
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
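What dropout does to a layer's activations can be sketched in plain Python: each value is kept with probability keep_prob and scaled by 1/keep_prob, and the rest are set to 0, so the expected sum is unchanged. This is an illustration of the idea only, with a hypothetical `dropout` function and a seeded random generator chosen for repeatability.

```python
import random


def dropout(values, keep_prob, rng=None):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; zero the rest."""
    rng = rng or random.Random()
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.5, rng=rng)

kept = sum(1 for v in dropped if v != 0.0)
mean = sum(dropped) / len(dropped)
print(kept, mean)   # roughly half are kept, and the mean stays close to 1.0
```

With keep_prob set to 1.0 every value passes through unchanged, which is why the evaluation steps later in the article feed keep_prob: 1.0 while training feeds 0.5.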

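2.2.6 Producing the Output Labels

A multilayer feedforward network has one input layer, one output layer and one or more hidden layers. Only one hidden layer is implemented here, so the hidden layer is mapped directly to the output layer, as in the following code:

# Map the 1024 features to 10 classes, one for each digit
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

Adding the output layer to the input and hidden layers of Figure 2-3 gives the complete feedforward network shown in Figure 2-4.

Figure 2-4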

2.2.7 Training the Model

After each image (a [784] vector) has passed through the network, the output is a [10] vector. The softmax function turns this vector into 10 class probabilities that indicate which digit the image shows. In the code below, softmax_cross_entropy_with_logits applies the softmax itself, so y_conv is passed to it as raw logits. The training procedure is as follows:

# Create the optimizer, which adjusts W, b and the other parameters
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# The convolutional and feedforward layers produce the prediction y_conv,
# which is compared with y_ to measure accuracy.
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
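The loss and accuracy used above can be written out in plain Python to show what they measure. This is a sketch of the mathematics for a single image and a tiny batch, not the TensorFlow implementation; all function names and numbers below are illustrative.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, logits):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(labels, batch_logits):
    """Fraction of examples whose highest-scoring class matches the label."""
    hits = [argmax(t) == argmax(z) for t, z in zip(labels, batch_logits)]
    return sum(hits) / len(hits)


labels = [[1, 0], [0, 1]]
batch_logits = [[2.0, 1.0], [3.0, 0.5]]        # second example is misclassified
print(cross_entropy(labels[0], batch_logits[0]))
print(accuracy(labels, batch_logits))          # 0.5
```

The cross-entropy shrinks as the logit of the true class grows relative to the others, which is what the optimizer pushes towards during training.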

4. Appendix

The program in this appendix comes from \tensorflow\examples\tutorials\mnist\mnist_deep.py. The MNIST data is stored locally under the path '/tmp/MNIST_data/'.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys

from tensorflow.examples.tutorials.mnist import input_data

import tensorflow as tf

FLAGS = None


def deepnn(x):
    """deepnn builds the graph for a deep net for classifying digits.

    Args:
      x: an input tensor with the dimensions (N_examples, 784), where 784 is the
      number of pixels in a standard MNIST image.

    Returns:
      A tuple (y, keep_prob). y is a tensor of shape (N_examples, 10), with values
      equal to the logits of classifying the digit into one of 10 classes (the
      digits 0-9). keep_prob is a scalar placeholder for the probability of
      dropout.
    """
    # Reshape to use within a convolutional neural net.
    # Last dimension is for "features" - there is only one here, since images are
    # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # First convolutional layer - maps one grayscale image to 32 feature maps.
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

    # Pooling layer - downsamples by 2X.
    h_pool1 = max_pool_2x2(h_conv1)

    # Second convolutional layer -- maps 32 feature maps to 64.
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

    # Second pooling layer.
    h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer 1 -- after 2 round of downsampling, our 28x28 image
    # is down to 7x7x64 feature maps -- maps this to 1024 features.
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Dropout - controls the complexity of the model, prevents co-adaptation of
    # features.
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Map the 1024 features to 10 classes, one for each digit
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])

    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    return y_conv, keep_prob


def conv2d(x, W):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    """max_pool_2x2 downsamples a feature map by 2X."""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def main(_):
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

    # Create the model
    x = tf.placeholder(tf.float32, [None, 784])

    # Define loss and optimizer
    y_ = tf.placeholder(tf.float32, [None, 10])

    # Build the graph for the deep net
    y_conv, keep_prob = deepnn(x)

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(20000):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

        print('test accuracy %g' % accuracy.eval(feed_dict={
            x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str,
                        default='/tmp/MNIST_data/',
                        help='Directory for storing input data')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

TensorFlow Framework (4): Convolutional Neural Networks (CNN) Explained
