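A Write Latency Problem on PMC RAID Controllers

Symptoms

A user reported high write latency on disks behind PMC RAID controllers on multiple machines. The machines run ESXi.

Analysis

Log analysis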

2023-11-13T17:34:53.663Z cpu13:2098447)WARNING: ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has deteriorated. I/O latency increased from average value of 25206 microseconds to 757505 microseconds.
2023-11-13T17:34:54.980Z cpu0:2098449)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 757505 microseconds to 147922 microseconds.
2023-11-13T17:35:02.615Z cpu62:2098460)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 147922 microseconds to 49903 microseconds.
2023-11-13T17:52:47.544Z cpu30:2098457)WARNING: ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has deteriorated. I/O latency increased from average value of 25277 microseconds to 763337 microseconds.
2023-11-13T17:52:48.769Z cpu32:2098446)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 763337 microseconds to 149346 microseconds.
2023-11-13T17:53:06.086Z cpu17:2098445)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 149346 microseconds to 50366 microseconds.
2023-11-14T16:01:46.047Z cpu31:2098445)WARNING: ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has deteriorated. I/O latency increased from average value of 25397 microseconds to 780078 microseconds.
2023-11-14T16:01:46.618Z cpu0:2098451)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 780078 microseconds to 154412 microseconds.

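The logs confirm the high I/O latency: the device averages about 25 ms, but spikes to roughly 760 to 780 ms before recovering.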
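To see how large the spikes are across many hosts, the deterioration warnings can be extracted with a small script. This is a minimal sketch, assuming the log lines follow exactly the format shown above; the function name `latency_spikes` and the `sample` list are illustrative, not part of any ESXi tooling.

```python
import re

# Matches the ESXi "performance has deteriorated" warning in the format shown above.
DETERIORATED = re.compile(
    r"^(?P<ts>\S+) .*Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<peak>\d+) microseconds"
)


def latency_spikes(lines):
    """Return (timestamp, device, average_ms, peak_ms) for each deterioration warning."""
    spikes = []
    for line in lines:
        m = DETERIORATED.search(line)
        if m:
            # The log reports microseconds; convert to milliseconds for readability.
            spikes.append(
                (m["ts"], m["device"], int(m["avg"]) / 1000, int(m["peak"]) / 1000)
            )
    return spikes


# Two lines copied from the log above: one warning and one recovery message.
sample = [
    "2023-11-13T17:34:53.663Z cpu13:2098447)WARNING: ScsiDeviceIO: 1513: Device "
    "naa.600508b1001c55603c71cb71e23b8301 performance has deteriorated. I/O latency "
    "increased from average value of 25206 microseconds to 757505 microseconds.",
    "2023-11-13T17:34:54.980Z cpu0:2098449)ScsiDeviceIO: 1513: Device "
    "naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency "
    "reduced from 757505 microseconds to 147922 microseconds.",
]

for ts, device, avg_ms, peak_ms in latency_spikes(sample):
    print(f"{ts} {device}: {avg_ms:.1f} ms -> {peak_ms:.1f} ms")
```

Only the warning lines are matched; the "performance has improved" messages are skipped on purpose, since the spike size is already contained in the warning.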

Next, check the disk latency in vCenter

The result: the high latency is entirely write latency, and it is very high. Read latency is normal.



The alarms in vCenter are also all write-latency alarms.


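Conclusion of the analysis:
At every level the problem is write latency, and the values are high. Since the controller's write cache normally absorbs writes, we suspect a problem at the cache layer.

Continue by checking the cache

The cache policy of the logical drive is as follows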

Logical Device number 1
   Logical Device name                        : VD_1
   Disk Name                                  : Not Applicable
   Block Size of member drives                : 512 Bytes
   Array                                      : 1
   RAID level                                 : 6
   Status of Logical Device                   : Optimal
   Parity Initialization Status               : Completed
   Size                                       : 122076928 MB
   Stripe-unit size                           : 256 KB
   Full Stripe Size                           : 2048 KB
   Interface Type                             : Serial Attached SCSI
   Device Type                                : Data
   Boot Type                                  : None
   Heads                                      : 255
   Sectors Per Track                          : 32
   Cylinders                                  : 65535
   Caching                                    : Enabled
   Mount Points                               : Not Applicable
   LD Acceleration Method                     : Controller Cache

The logical drive already has a cache policy set. The documentation describes the field as follows:
LD Acceleration Method : Setting of the LD acceleration method. Controller cache or SSD I/O Bypass or maxCache.

The controller cache policy is as follows

   Cache Properties
   --------------------------------------------------------
   Cache Status                               : Ok
   Cache Serial Number                        : Not Applicable
   Cache memory                               : 3856 MB
   Read Cache Percentage                      : 100 percent
   Write Cache Percentage                     : 0 percent
   No-Battery Write Cache                     : Disabled
   Wait for Cache Room                        : Disabled
   Write Cache Bypass Threshold Size          : 1040 KB
   --------------------------------------------------------

The output above shows that the controller allocates 100% of its cache to reads and none to writes, which is abnormal.
The documentation describes this parameter as follows:
Cache Ratio : The controller cache ratio setting determines the controller ability to adjust the amount of memory for read-ahead cache versus write cache.
Cache Ratio (Read) : Sets the ratio of controller cache memory used for read-ahead cache versus write cache. Cache ratio values range from 0-100, in increments of 5.
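The same check can be automated when many machines are affected. The sketch below assumes the controller output has been saved as text in the `key : value` layout shown above; `parse_properties` and `write_cache_percent` are hypothetical helper names, and the sample text is a subset of the Cache Properties block.

```python
# A subset of the "Cache Properties" output shown above.
CACHE_PROPERTIES = """\
   Cache Status                               : Ok
   Cache memory                               : 3856 MB
   Read Cache Percentage                      : 100 percent
   Write Cache Percentage                     : 0 percent
   No-Battery Write Cache                     : Disabled
"""


def parse_properties(text):
    """Parse 'key : value' lines into a dict; lines without ' : ' are skipped."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            props[key.strip()] = value.strip()
    return props


def write_cache_percent(props):
    """Return the write cache share as an integer percentage."""
    return int(props["Write Cache Percentage"].split()[0])


props = parse_properties(CACHE_PROPERTIES)
if write_cache_percent(props) == 0:
    print("Write cache share is 0%: writes get no controller cache")
```

Separator lines and headings contain no ` : ` and are ignored, so the full controller output can be passed in unchanged.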

Solution

Set the read cache share to 10% and the write cache share to 90%.
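Before applying a new split, it can be checked against the documented constraints (values from 0 to 100, in increments of 5). The sketch below also assumes that the read and write shares must add up to 100, which the quoted text implies but does not state; `validate_cache_ratio` is a hypothetical helper, not a vendor API.

```python
def validate_cache_ratio(read_percent, write_percent):
    """Check a read/write cache split and return it unchanged if it is valid."""
    for name, value in (("read", read_percent), ("write", write_percent)):
        # Documented range: 0-100.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} share {value} is outside 0-100")
        # Documented step: increments of 5.
        if value % 5 != 0:
            raise ValueError(f"{name} share {value} is not a multiple of 5")
    # Assumption: the two shares together cover the whole cache.
    if read_percent + write_percent != 100:
        raise ValueError("read and write shares must add up to 100")
    return read_percent, write_percent


# The split used in this case: 10% read, 90% write.
print(validate_cache_ratio(10, 90))
```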



The adjustment command, for reference, is in the following screenshot:

image.png