Linux: structure needs cleaning

I recently ran into a similar failure: Bug 624293 - XFS internal error / mount: Structure needs cleaning

structure needs cleaning

The container engine failed to start, and /home/robot/docker reported "structure needs cleaning". The Linux system log showed the same error.

First, ask the basic questions: why does "structure needs cleaning" appear? When does it appear? And how do we recover the environment?

First, try to repair:

[root@scheat tmp]# xfs_check /dev/vdb

xfs_check: cannot init perag data (117)

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_check. If you are unable to mount the filesystem, then use the xfs_repair -L option to destroy the log and attempt a repair.

Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
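Before doing anything destructive, the order the error message itself suggests is worth following: mount first so the kernel can replay the dirty log, and only fall back to -L if mounting is impossible. A minimal sketch (the device /dev/vdb is from this box, the mount point /mnt is a placeholder):

mount /dev/vdb /mnt && umount /mnt   # let the kernel replay the log, then unmount (non-destructive)
xfs_repair /dev/vdb                  # normal repair pass once the log is clean
xfs_repair -L /dev/vdb               # last resort: zero the log, accepting the loss of in-flight transactions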

[root@scheat tmp]# xfs_repair /dev/vdb

Phase 1 - find and verify superblock...

Phase 2 - using internal log

        - zero log...

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.

Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

[root@scheat tmp]#

xfs_metadump -g /dev/vdb ./dev-vdb.dump

xfs_metadump: cannot init perag data (117)

Copying log                                               

[root@scheat tmp]
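A side note of my own, not from the original thread: if xfs_metadump can be made to complete (here it tripped over the same perag error), it gives you a metadata-only image you can experiment on instead of the live device before any destructive repair:

xfs_metadump -g /dev/vdb /tmp/vdb.metadump    # metadata-only dump, no file contents
xfs_mdrestore /tmp/vdb.metadump /tmp/vdb.img  # rebuild a sparse filesystem image from the dump
xfs_repair -n -f /tmp/vdb.img                 # dry-run repair against the image file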

Nothing helped.

Going forward with the -L repair:

xfs_repair -L /dev/vdb

Lots of errors!
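After a forced -L repair I would not trust the filesystem until a second pass comes back clean, which is exactly the check the XFS developers ask about further down. A rough sequence, with /mnt as a placeholder mount point:

xfs_repair -L /dev/vdb   # zeroes the dirty log and fixes what it can
xfs_repair /dev/vdb      # second pass: should report no further errors
mount /dev/vdb /mnt
ls /mnt/lost+found       # disconnected files recovered by xfs_repair land here, named by inode number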

Timeline of the Problem:

- Everything went fine while I was installing a new virtual file server.

- The host has a 3ware controller in it:

I have a 3ware 9690SA-8I controller with 4 x 2TB disks (RAID 10 for data) and 2 x 320GB (for the OS).

Then I rebooted to clean up the system and check that everything was OK. At that point one disk disappeared from the RAID 10, most likely because I had not fixed its link speed at 1.5 Gbps. I then rebuilt the array, but I couldn't mount it because of metadata problems!

I also saw the message:

Aug 15 20:30:05 scheat kernel: Filesystem "vdb": Disabling barriers, trial barrier write failed

Does this filesystem problem happen only because of the disappeared disk and the wrong link speed, or do I need to change something else?

Thanks for the help.

The array controller should be taking care of any data integrity problems.

The theory

Q: What is the problem with the write cache on journaled filesystems?

https://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

Many drives use a write back cache in order to speed up the performance of writes. However, there are conditions such as power failure when the write cache memory is never flushed to the actual disk. Further, the drive can destage data from the write cache to the platters in any order that it chooses. This causes problems for XFS and journaled filesystems in general, because they rely on knowing when a write has completed to the disk. They need to know that the log information has made it to disk before allowing metadata to go to disk. When the metadata makes it to disk, the transaction can effectively be deleted from the log, resulting in movement of the tail of the log and thus freeing up some log space. So if the writes never make it to the physical disk, then the ordering is violated and the log and metadata can be lost, resulting in filesystem corruption.

With hard disk cache sizes of currently (Jan 2009) up to 32MB, that can be a lot of valuable information. In a RAID with 8 such disks this adds up to 256MB, and the chance of having filesystem metadata in the cache is so high that you have a very high chance of big data losses on a power outage. In one sentence: the bigger the drive cache, the bigger the potential data loss.

With a single hard disk and barriers turned on (on=default), the drive write cache is flushed before and after a barrier is issued. A power failure "only" loses data in the cache, but no essential ordering is violated, and corruption will not occur.

With a RAID controller with battery backed controller cache and cache in write back mode, you should turn off barriers - they are unnecessary in this case, and if the controller honors the cache flushes, it will be harmful to performance. But then you *must* disable the individual hard disk write cache in order to ensure the filesystem stays intact after a power failure. The method for doing this is different for each RAID controller. See the section about RAID controllers below.
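In practice the FAQ advice comes down to two knobs: the barrier mount option and the per-drive write cache. A minimal sketch of what that looked like on kernels of that era (the nobarrier option has since been removed from modern kernels; /dev/sdX and /data are placeholders):

# Single disk with a volatile write cache: leave barriers on (the default)
mount -o barrier /dev/sdX /data

# BBU-backed RAID controller in write-back mode: barriers off,
# but then the individual drive caches must be switched off
mount -o nobarrier /dev/sdX /data

# Inspect / disable the write cache of a directly attached SATA disk
hdparm -W /dev/sdX    # report the current write-caching setting
hdparm -W0 /dev/sdX   # turn the drive write cache off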

The problem is clear now

That's clear; I already mentioned that the controller may have triggered the problem.

But that night I got another XFS internal error during an rsync job:

----Once again, that is not directory block data that is being dumped there. It looks like a partial path name ("/Pm.Reduzieren/S"), which tends to indicate that the directory read has returned uninitialised data.

Did the filesystem repair cleanly? If you run xfs_repair a second time, did it find more errors or was it clean? I.e. is this still corruption left over from the original incident, or is it new corruption?

----The filesystem repair did work fine; everything was OK. The second error was a new problem.

LSI / 3ware are now replacing the controller, the BBU board, and also the battery, because they don't know what happened.


There were no problems on the host.

I have now disabled the write cache according to the FAQ: /cX/uX set cache=off

tw_cli /c6/u1 show all

tw_cli /c6/u1 set cache=off

But I'm not sure how to disable the individual hard disk caches.
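For what it's worth (my own note, not from the thread): on disks attached directly to a plain HBA the generic tools would be hdparm for ATA drives and sdparm for SCSI/SAS drives; whether they can reach disks hidden behind this 3ware controller is doubtful, so treat the following only as a sketch with /dev/sdX as a placeholder:

hdparm -W0 /dev/sdX           # ATA: disable the drive's volatile write cache
sdparm --clear=WCE /dev/sdX   # SCSI/SAS: clear the Write Cache Enable bit
sdparm --get=WCE /dev/sdX     # verify the current setting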

Lifting the final veil

File system errors can be a little tricky to narrow down. In some of the more rare cases a drive might be writing out bad data. However, per the logs I didn't see any indication of a drive problem, and not one drive has reallocated a sector. I see that all four are running at the 1.5Gb/s link speed now.

Sometimes the problem can be traced back to the controller and/or the BBU. I did notice something pretty interesting in the driver message log and the controller's advanced diagnostics.

According to the driver message log, the last health check [capacity test] was done on Aug 10th:

Aug 10 21:40:35 enif kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x0051): Battery health check started:.

However, the controller's advanced log shows this:

/c6/bbu Last Capacity Test        = 10-Jul-2010

There is an issue between the controller and the BBU, and we need to understand which component is at issue. If this is a live server you may want to replace both components. Or, if you can perform some troubleshooting, power the system down and remove the BBU and its daughter PCB from the RAID controller. Then ensure the write cache setting remains enabled and see if there's a reoccurrence. If so, the controller is bad. If not, it's the BBU that we need to replace.
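The mismatch between the driver log and the controller's own records is something you can keep an eye on from the host. A small sketch, reusing the controller and unit numbers from above (and assuming the 3w-9xxx driver logs to /var/log/messages as on this box):

tw_cli /c6/bbu show all                      # BBU status, including the last capacity test date
tw_cli /c6 show                              # summary of units, ports and BBU on controller 6
grep 3w-9xxx /var/log/messages | tail -n 50  # recent messages from the 3ware driver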

Just for information: the problem was a bug in the virtio driver with disks over 2 TB!

Bug 605757 - 2tb virtio disk gets massively corrupted filesystems

*** This bug has been marked as a duplicate of bug 605757 ***
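As a closing sanity check (my addition, not part of the bug report): before blaming the filesystem, it is worth confirming whether a virtio disk actually crosses the 2 TB boundary and therefore falls into the range affected by bug 605757:

blockdev --getsize64 /dev/vdb   # size in bytes; anything above ~2 TB is in the affected range
lsblk -b /dev/vdb               # the same information, byte-exact, per device and partition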
