Wednesday, March 9, 2011

Fixing the 1TByte inode problem in XFS file systems

If you have an XFS file system that you fill completely full, then add more hard disk space to it, you could run into the situation where the file system reports "no space available", but a "df" command shows plenty of space available. This is caused by the inability to allocate new inodes in that XFS filesystem.

Now, XFS dynamically allocates inodes, so you might be wondering how this could happen. The reason is that unless you say otherwise, inodes are limited to 32-bit values, which means they must fit in the first 1TByte of storage in the file system. But you completely filled that 1TByte previously, so XFS can't allocate more inodes now.
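Roughly speaking, the 1TByte figure falls out of the arithmetic: with the default 256-byte inodes (isize=256 in xfs_info output), a 32-bit inode number can only address inodes located in the first 2^32 × 256 bytes of the filesystem. A quick sanity check:

```shell
# A 32-bit inode number with the default 256-byte inode size can only
# reach inodes in the first 2^32 * 256 bytes of the device.
bytes=$(( (1 << 32) * 256 ))
tib=$(( 1024 * 1024 * 1024 * 1024 ))
echo "$bytes"             # 1099511627776
echo $(( bytes / tib ))   # 1 -- exactly 1 TByte
```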

You could just switch to the "inode64" mount option and continue on, but that risks compatibility problems with NFS clients and with DMAPI (for example, MySQL won't store a 64-bit inode number properly in an "int" variable).

Getting around this problem is actually not too difficult IF you know how. If you don't, it can be a very difficult time.

To fix the situation, you need to move files that occupy some of the storage in the first 1TByte of the file system. To do that, try the following:

1. Run xfs_info on your XFS mount point. For example:
[root@osvault ~]# xfs_info /cache/
meta-data=/dev/CACHE/CACHE       isize=256    agcount=375, agsize=64469728 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=24157093888, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

Notice that in this case, the "agsize" is about 64 million blocks and the "bsize" is 4KBytes, so the Allocation Groups are roughly 256GBytes each. That means the first 1TByte of storage lies in roughly the first 4 Allocation Groups, and you want to find the largest files in that first 1TByte. If your xfs_info output shows different values, multiply agsize by bsize to get the allocation group size in bytes, then divide 1TByte by that to get the number of allocation groups to examine.
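If you'd rather not do the division in your head, the count of affected allocation groups can be computed directly from agsize and bsize (the numbers below are the ones from the xfs_info output above; substitute your own):

```shell
agsize=64469728                     # agsize in blocks, from xfs_info
bsize=4096                          # block size in bytes, from xfs_info
tb=$(( 1024 * 1024 * 1024 * 1024 ))          # 1 TByte
ag_bytes=$(( agsize * bsize ))               # bytes per allocation group
# Round up, so a partially-covered AG is still counted.
n_ags=$(( (tb + ag_bytes - 1) / ag_bytes ))
echo "$n_ags"                                # 5 for these values
```

Note that the exact answer for these values is 5, not 4: each AG here is about 246GBytes, slightly under 256GBytes, so the first 1TByte spills a little into AG 4. Scanning an extra allocation group or two, as the check loop in step 2 does, costs nothing.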

2. Figure out if those first allocation groups are full by running:

for ag in `seq 0 1 5`; do echo "freespace in AG $ag"; xfs_db -r -c "freesp -s -a $ag" /dev/CACHE/CACHE | grep "total free"; done

If the total free blocks in each Allocation Group (AG) is less than about 40, then you can't create inodes in that allocation group. So now you want to find some files in that allocation group and move them out of the file system and then back in again. It's important that you "mv" the file rather than "cp" it, so that the original is actually deleted from the XFS file system.
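The move-out-and-back dance can be sketched like this. The demo below runs in temporary directories, since it's the mv-not-cp semantics that matter; in real use the first path would be on your XFS file system and the scratch path on some other file system with enough room:

```shell
src=$(mktemp -d)    # stands in for the XFS mount point
dst=$(mktemp -d)    # stands in for scratch space on another filesystem
echo "payload" > "$src/large-file.dat"

# "mv" across filesystems copies the data and then DELETES the original,
# freeing its blocks (and inode) -- a plain "cp" would leave the
# original in place and free nothing.
mv "$src/large-file.dat" "$dst/large-file.dat"
# Moving it back allocates fresh space, which XFS will place in
# whatever allocation group has room.
mv "$dst/large-file.dat" "$src/large-file.dat"
```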

3. Now run "xfs_bmap -v filename" on all of the files in your filesystem. Yes, it's tedious, so you probably want to script it. Just run an "ls /mountpoint", send the output to a temporary file, then edit that temporary file and add the command to the beginning of each line. Beware of spaces, quotes and parentheses in your filenames.
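A filename-safe way to script this is "find -print0", which survives spaces, quotes, and parentheses without any hand editing. The demo below just prints the command it would run, since xfs_bmap needs a real XFS file system; in real use you'd replace the printf with `xfs_bmap -v "$f"` and redirect the whole loop to a temporary file:

```shell
demo=$(mktemp -d)       # stand-in for your XFS mount point
touch "$demo/plain.dat" "$demo/with space.dat" "$demo/odd (1) name.dat"

# NUL-delimited names pass through the pipe untouched, no matter what
# characters they contain.
find "$demo" -type f -print0 |
while IFS= read -r -d '' f; do
    # Real use:  xfs_bmap -v "$f" >> /tmp/bmap.out
    printf 'would run: xfs_bmap -v %s\n' "$f"
done
```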

4. Examine the output from all those xfs_bmap runs and search for lines with " 0 " (that's a space, then a numeral zero, then a space). That will find the files with extents in Allocation Group 0. Repeat this for Allocation Groups 1, 2, and 3 (in this example). Every file in those first few allocation groups is a candidate. All you have to do is move the file to a temporary location on another file system, then move it back into the XFS file system. The new instance of the file will be placed in other allocation groups, and your XFS file system will then be able to allocate inodes for new files.
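Grepping for " 0 " works but can be fooled, since other columns of xfs_bmap -v output can also contain a lone zero. A slightly more robust sketch is to test the AG column (the 4th field of each extent line) numerically. The sample data below is made up to show the format, with a hypothetical "FILE:" marker line before each file's map; real input would be your collected xfs_bmap -v output:

```shell
# Made-up sample in xfs_bmap -v style; real data comes from step 3.
cat > /tmp/bmap-sample.txt <<'EOF'
FILE: /cache/a.dat
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          592..599            0 (592..599)           8
FILE: /cache/b.dat
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          64469800..64469807  1 (72..79)             8
FILE: /cache/c.dat
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          9999999999..9999999999 37 (0..7)           1
EOF

# Print each file that has at least one extent in AGs 0 through 3.
awk '/^FILE: / { file = substr($0, 7); next }
     $4 ~ /^[0-9]+$/ && $4 + 0 < 4 && !seen[file]++ { print file }' \
    /tmp/bmap-sample.txt
# prints /cache/a.dat and /cache/b.dat, but not /cache/c.dat (AG 37)
```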

3 comments:

Unknown said...

Dude...this blog post saved my ASS. thank you sir.

martinitime1975 said...

This did the trick. As soon as I moved the largest file that met the requirements, everything was fixed. Copied the file back and all was good!

martinitime1975 said...
This comment has been removed by the author.