Question
I am trying to load a pretrained network and I get the following error:
F1101 23:03:41.857909 73 net.cpp:757] Cannot copy param 0 weights
from layer 'fc4'; shape mismatch. Source param shape is 512 4096
(2097152); target param shape is 512 256 4 4 (2097152). To learn this
layer's parameters from scratch rather than copying from a saved net,
rename the layer.
I noticed that 512 x 256 x 4 x 4 == 512 x 4096, so it seems the layer weights were somehow flattened at some point between saving and reloading the network.
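For reference, the shapes actually stored in the caffemodel can be inspected with Caffe's protobuf bindings. The sketch below is one way to check (an assumption: newer models keep parameters under layer with a shape field, older ones under layers with num/channels/height/width fields):

import caffe.proto.caffe_pb2 as caffe_pb2

# Parse the raw caffemodel and print the parameter shapes of 'fc4'.
params = caffe_pb2.NetParameter()
with open('D-CNN.caffemodel', 'rb') as f:
    params.ParseFromString(f.read())

for layer in params.layer:  # may be params.layers for old (V1) models
    if layer.name == 'fc4':
        for blob in layer.blobs:
            print('fc4 param shape:', list(blob.shape.dim))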
How can I get around this error?
To reproduce
I am trying to use the D-CNN pretrained network from this GitHub repository.
I load the network with
import caffe
net = caffe.Net('deploy_D-CNN.prototxt', 'D-CNN.caffemodel', caffe.TEST)
The prototxt file is
name: "D-CNN"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 259
input_dim: 259
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 64 kernel_size: 5 stride: 2 }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param { num_output: 128 pad: 1 kernel_size: 3 }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param { num_output: 256 pad: 1 kernel_size: 3 stride: 1 }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "fc4"
  type: "Convolution"
  bottom: "conv3"
  top: "fc4"
  convolution_param { num_output: 512 pad: 0 kernel_size: 4 }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "fc4"
  top: "fc4"
}
layer {
  name: "drop4"
  type: "Dropout"
  bottom: "fc4"
  top: "fc4"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "pool5_spm3"
  type: "Pooling"
  bottom: "fc4"
  top: "pool5_spm3"
  pooling_param { pool: MAX kernel_size: 10 stride: 10 }
}
layer {
  name: "pool5_spm3_flatten"
  type: "Flatten"
  bottom: "pool5_spm3"
  top: "pool5_spm3_flatten"
}
layer {
  name: "pool5_spm2"
  type: "Pooling"
  bottom: "fc4"
  top: "pool5_spm2"
  pooling_param { pool: MAX kernel_size: 14 stride: 14 }
}
layer {
  name: "pool5_spm2_flatten"
  type: "Flatten"
  bottom: "pool5_spm2"
  top: "pool5_spm2_flatten"
}
layer {
  name: "pool5_spm1"
  type: "Pooling"
  bottom: "fc4"
  top: "pool5_spm1"
  pooling_param { pool: MAX kernel_size: 29 stride: 29 }
}
layer {
  name: "pool5_spm1_flatten"
  type: "Flatten"
  bottom: "pool5_spm1"
  top: "pool5_spm1_flatten"
}
layer {
  name: "pool5_spm"
  type: "Concat"
  bottom: "pool5_spm1_flatten"
  bottom: "pool5_spm2_flatten"
  bottom: "pool5_spm3_flatten"
  top: "pool5_spm"
  concat_param { concat_dim: 1 }
}
layer {
  name: "fc4_2"
  type: "InnerProduct"
  bottom: "pool5_spm"
  top: "fc4_2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 512
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "fc4_2"
  top: "fc4_2"
}
layer {
  name: "drop4"
  type: "Dropout"
  bottom: "fc4_2"
  top: "fc4_2"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc5"
  type: "InnerProduct"
  bottom: "fc4_2"
  top: "fc5"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 19
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc5"
  top: "prob"
}
Best answer
It looks like you are using a pretrained net in which the "fc4" layer was originally a fully-connected layer (i.e., a type: "InnerProduct" layer) that was "reshaped" into a convolutional layer.
Since an inner-product layer and a convolution layer perform roughly the same linear operation on their input, this change can be made under certain assumptions (see, e.g., here).
As you correctly identified, the weights of the original pretrained fully-connected layer were saved "flattened", i.e., in a different shape than the one the convolution layer expects.
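To see why the two layouts hold the same numbers, here is a minimal numpy sketch (not Caffe code, just an illustration) showing that a convolution whose 4x4 kernel covers a 256x4x4 input exactly once computes the same linear map as an inner product on the flattened input:

import numpy as np

x = np.random.randn(256, 4, 4)               # a single input patch
W_conv = np.random.randn(512, 256, 4, 4)     # convolution weights
W_ip = W_conv.reshape(512, 256 * 4 * 4)      # the same weights, flattened

# Convolution restricted to one spatial position == matrix-vector product.
out_conv = np.tensordot(W_conv, x, axes=([1, 2, 3], [0, 1, 2]))
out_ip = W_ip @ x.ravel()
print(np.allclose(out_conv, out_ip))         # True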
I think the way around this problem is to use share_mode: PERMISSIVE:
layer {
  name: "fc4"
  type: "Convolution"
  bottom: "conv3"
  top: "fc4"
  convolution_param { num_output: 512 pad: 0 kernel_size: 4 }
  param {
    lr_mult: 1
    decay_mult: 1
    share_mode: PERMISSIVE  # should help caffe overcome the shape mismatch
  }
  param {
    lr_mult: 2
    decay_mult: 0
    share_mode: PERMISSIVE
  }
}
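If the permissive sharing does the trick, loading should no longer abort, and the copied parameters should come out in the 4-D convolution shape. A quick sanity check (assuming the edited prototxt is saved back to deploy_D-CNN.prototxt):

import caffe

# With share_mode: PERMISSIVE the weight copy should now succeed.
net = caffe.Net('deploy_D-CNN.prototxt', 'D-CNN.caffemodel', caffe.TEST)
print(net.params['fc4'][0].data.shape)  # expected: (512, 256, 4, 4)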