Monday, March 15, 2010

XFS B-Tree error and Building a New DMAPI kernel

In Linux kernel releases up to and including 2.6.29, creating and deleting a large number of files in a single directory exposes a bug that makes the XFS file system unavailable. The evidence is the following error message in the system log (/var/log/messages):
infinidiscdev kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 3327 of file fs/xfs/xfs_btree.c. Caller 0xc032b241
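
To check whether a system has already hit this, the log can be scanned for that message. The following is a minimal Python sketch, not something from our tooling: the function name find_xfs_errors and the regular expression are illustrative assumptions modeled on the single log line quoted above, and other kernels may format the message differently.

```python
import re

# Pattern modeled on the one log line quoted in this post (an assumption);
# adjust it if your kernel prints the message in a different layout.
XFS_ERROR = re.compile(
    r"XFS internal error (?P<code>\w+) "
    r"at line (?P<line>\d+) of file (?P<file>\S+?)\.\s+"
    r"Caller (?P<caller>0x[0-9a-fA-F]+)"
)


def find_xfs_errors(lines):
    """Return one dict per log line that contains an XFS internal error."""
    hits = []
    for text in lines:
        match = XFS_ERROR.search(text)
        if match:
            hits.append(match.groupdict())
    return hits
```

It accepts any iterable of lines, so it can be fed an open file handle for /var/log/messages (reading that file usually requires root) or a list of strings captured elsewhere.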

This occurs only on 32-bit InfiniDisc systems, not on any installed OSVault systems. So far the problem has been seen only in lab testing, using the BackupPC software to back up a large number of client systems to an InfiniDisc system.

Because very few of our OSVault archive customers delete files in large numbers, this problem would be hard to find in that environment.

The fix is to update to a newer kernel version, such as 2.6.31 or 2.6.33. Our InfiniDisc systems will update to 2.6.33, which we get from XFS.ORG and then patch in a couple of places. The patches are needed because we have legacy systems built with gcc 3.4, and the baseline 2.6.33 kernel will not compile with that compiler. For our more current InfiniDisc appliances built on CentOS 5.4, no changes to the standard kernel source are required.
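
A quick way to see whether a given machine falls in the affected range is to compare its kernel release string against 2.6.29. The sketch below is only a rough indicator under stated assumptions: the helper names are hypothetical, it looks at the leading version numbers only, and a distribution kernel may carry backported fixes that the version number does not reveal.

```python
import re

# Last release in the affected range, per this post.
LAST_AFFECTED = (2, 6, 29)


def parse_kernel_release(release):
    """Turn a release string such as '2.6.18-164.el5' into (2, 6, 18)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError("unrecognized kernel release: %r" % release)
    # A missing third component (e.g. '3.10') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_in_affected_range(release):
    """True if the release is 2.6.29 or older, judging by version number alone."""
    return parse_kernel_release(release) <= LAST_AFFECTED
```

On a running system the release string can be taken from `uname -r`, or from platform.release() in Python, and passed to is_in_affected_range.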

The primary change we make to 2.6.33 is in the mptsas driver code, where we move the mptsas_set_rphy routine so that it appears before the mptsas_port_delete routine.