[Windows] mask r-cnn 填坑笔记

注:这里介绍的问题,是在Windows环境下可能出现的错误,在其他环境暂不清楚。因为之前在Mac OS下是没出现类似的问题,可能是因为Mac OS安装的是没有GPU加速的tensorflow

mask r-cnn项目的GitHub地址

1. 显存不足导致的各种问题

1.1 failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

image.png
2019-02-24 20:34:03.563275: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-02-24 20:34:03.563666: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-02-24 20:34:04.486945: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-02-24 20:34:04.487252: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:389] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-02-24 20:34:04.487533: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2019-02-24 20:34:04.487769: F C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 

解决方案

# 全局设置
os.environ['KMP_DUPLICATE_LIB_OK']='True'

如:


image.png

1.2 OOM when allocating tensor with shape

2019-02-24 20:41:58.028372: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.08GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2019-02-24 20:42:12.927144: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 256.00MiB.  Current allocation summary follows.
2019-02-24 20:42:12.927448: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:627] Bin (256):  Total Chunks: 253, Chunks in use: 245. 63.3KiB allocated for chunks. 61.3KiB in use in bin. 11.1KiB client-requested in use in bin.
2019-02-24 20:42:12.927720: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:627] Bin (512):  Total Chunks: 44, Chunks in use: 42. 22.5KiB allocated for chunks. 21.0KiB in use in bin. 21.0KiB client-requested in use in bin.
... 省略一大坨
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2,256,256,512]
     [[Node: training/SGD/gradients/rpn_model/rpn_bbox_pred/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, _class=["loc:@rpn_model/rpn_bbox_pred/convolution"], data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/rpn_model/rpn_bbox_pred/convolution_grad/ShapeN, rpn_bbox_pred/kernel/read, training/SGD/gradients/rpn_model/lambda_3/Reshape_grad/Reshape)]]
     [[Node: training/SGD/gradients/mrcnn_mask_conv1/convolution_grad/Conv2DBackpropInput/_4653 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7160_training/SGD/gradients/mrcnn_mask_conv1/convolution_grad/Conv2DBackpropInput", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

ran out of memory

OOM when allocating tensor with shape

报这个错其实是因为显卡内存(显存)不足导致的,解决的办法有:

  • 降低每个GPU处理的图片数量
  • 重置输入图片尺寸,即通过减小图片的大小来减少对显存的消耗

解决方案

class ShapeConfig(Config):
    """Configuration for training on the toy  dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "shape"

    # We use a GPU with 12GB memory, which can fit two images.
    # Adjust down if you use a smaller GPU.
    IMAGES_PER_GPU = 1

    # Input image resizing
    IMAGE_MIN_DIM = IMAGE_MAX_DIM = 128
    ...

降低其中的某一个配置,一般都可以达到效果,当然也可以两个都设置。

下面为这两个配置的说明:

    # Number of images to train with on each GPU. A 12GB GPU can typically
    # handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 2

    # Input image resizing
    # Generally, use the "square" resizing mode for training and predicting
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none:   No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    #         of size [max_dim, max_dim].
    # pad64:  Pads width and height with zeros to make them multiples of 64.
    #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
    #         The multiple of 64 is needed to ensure smooth scaling of feature
    #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
    # crop:   Picks random crops from the image. First, scales the image based
    #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    #         IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024

参考:
mask_rcnn代码解析config.py
OOM when allocating tensor

1.3 另一种 failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

因暂时没办法重现,但真的遇到这个错误,所以没有相应的截图,只拷贝了别人的日志打印,类似如下:

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support 
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

该错误同样是显存不足导致的。

解决方案:


image.png

参考:
https://github.com/tensorflow/tensorflow/issues/7072#issuecomment-422488354

2. unexpected keyword argument 'keep_dims'

tensorflow配置Mask-RCNN报错:tf.reduce_mean got an unexpected keyword argument 'keep_dims'

3. Tensor is not an element in the graph

当把mask r-cnn`封装成服务,多次调用detect()``来识别物体时,可能会出现类似如下错误:

ValueError: Tensor Tensor("mrcnn_detection/Reshape_1:0", shape=(1, 100, 6), dtype=float32) is not an element of this graph
Tensor is not an element in the graph

解决思路

https://github.com/matterport/Mask_RCNN/issues/588
https://github.com/matterport/Mask_RCNN/issues/600
https://github.com/keras-team/keras/issues/6462#issuecomment-319232504
https://github.com/keras-team/keras/issues/2397

解决方法

在加载weights后,即调用方法model.load_weights()后,还需要调用model.keras_model._make_predict_function()。可以参考:https://github.com/chohaku84/np_detect/blob/28874d61f3301b138e80f3daacb1dace3a74f516/api/flask_app.py#L85

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 212,294评论 6 493
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 90,493评论 3 385
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 157,790评论 0 348
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 56,595评论 1 284
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 65,718评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 49,906评论 1 290
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,053评论 3 410
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 37,797评论 0 268
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,250评论 1 303
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,570评论 2 327
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,711评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,388评论 4 332
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,018评论 3 316
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,796评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,023评论 1 266
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,461评论 2 360
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,595评论 2 350