Thursday, September 19, 2013

Recovering From A Failed Tape Cartridge

If you have been storing files on a tape cartridge and it starts to exhibit serious I/O errors, you need to act quickly to make sure you don't lose any files.  Luckily, tape cartridges rarely fail, and when they do, it is usually when you first start writing data to them.

You will need to use the Command Line Interface to run most of these steps:

First Step - Run "unmigrate -s /cache -v LABEL -r", where LABEL is the barcode label of the tape that is getting errors.  This makes sure that all files on the tape that are still on disk are marked as not migrated, so that they will be written to another tape at a later time.  The "-r" option means "don't recall" the files, so the tape will not be loaded to get files back.
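As a minimal sketch, the first step can be wrapped in a small shell function so that the barcode label is typed only once.  The function name "unmigrate_cmd" and the label "ABC123L5" are made-up placeholders, not part of the product; the function only prints the command so that you can check it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): prints the Step 1
# command for a given tape label so it can be reviewed before use.
unmigrate_cmd() {
    # $1 = barcode label of the tape that is getting errors
    printf 'unmigrate -s /cache -v %s -r\n' "$1"
}

# "ABC123L5" is a placeholder barcode; substitute your own label.
unmigrate_cmd "ABC123L5"
```

Once the printed line looks right, paste it into the Command Line Interface to run it.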

Second Step - After running the "unmigrate", go to the web GUI, click on the tape label in question, then select "Display Current Tape Contents".  The popup window that appears will show all the files that are still on the suspect tape.  Hopefully, this display shows no files.  If it shows no files, you are done and should eject that tape from your tape library and destroy it.

Third Step - If you are at this step, then you have files on the tape that are not fully on disk storage and you need to recover those.  If you have been making second copies ("copy2" tapes), then you can just switch to the second copy for those files and restore them.  If you do not have a second copy, go to the "Try to Recover From Bad Tape" step below.

To switch to the second copy, run "switchmig -s /cache -v LABEL".  This program takes every file whose primary copy is on the tape with barcode LABEL and makes its primary copy wherever the second copy was stored.  After "switchmig" completes, run "check_db" to make sure that the database has the new locations.
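The two commands above can be chained so that "check_db" only runs after "switchmig" finishes.  The following is a sketch under assumptions: the "run" wrapper, the DRY_RUN variable, and the label "ABC123L5" are hypothetical, and it assumes "switchmig" returns a non-zero exit status on failure, which you should confirm on your own system.  By default it only prints the commands.

```shell
#!/bin/sh
# Placeholder barcode; substitute the label of the failing tape.
LABEL="ABC123L5"

# Hypothetical wrapper: prints the command when DRY_RUN is 1 (the default),
# and actually executes it when DRY_RUN is set to 0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_db runs only if switchmig reports success
# (assumes switchmig exits non-zero on failure).
run switchmig -s /cache -v "$LABEL" && run check_db
```

Set DRY_RUN=0 in the environment to execute the commands instead of printing them.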

Fourth Step -  Now that you can recover the files from the second copy, you can elect to just leave things that way, or you can unmigrate all those files that were originally on the bad tape, so that there are two new copies on other tapes.  If you wish to just use the previous second copy as the primary copy now, just stop at this point and the next second copy migration will make another copy of the file.

Fifth Step - How to Unmigrate the files from second copy media back to disk. This step is more involved and requires you to use a LinkFile created in /var/tmp.   If you look at a directory listing of /var/tmp, you will see the linkfiles.  Select the linkfile from yesterday,

Try to Recover From Bad Tape Step
The procedure to follow at this point depends on whether the bad tape is a TAR tape or an LTFS tape.  For LTFS tapes, you should follow the LTFS recovery processes detailed by IBM.  For TAR format tapes, you should run "unmigrate -s /cache -w".   This will create a restore process that will recover all files in order, until a serious I/O error occurs.