[Techtalk] Filesystem corruption?

Tue Nov 20 10:10:37 EST 2001

Hello again,

I was a bit fried when I wrote my email last night, so I thought I'd 
follow up with more information (if it helps).  I apologize for the long 
email, but there's lots o' data [;-)] .

The machine in question originally had two physical SCSI drives: sda and 
sdb.  sda was the OS drive, partitioned so that several standard 
directories had their own filesystems (/tmp, /var, /etc).  sdb had only 
one partition, for the users' home directories.  None of the 
filesystems on /dev/sda experienced any corruption.  Running badblocks 
turned up nothing.  Also, I haven't seen any read/write/seek errors on 
the physical devices themselves (that is, the old user data drive or the 
new one) as I would expect to with failing drives.

Here's a more complete log sample:

Nov 18 04:12:51 kernel: EXT2-fs warning (device sd(8,17)): 
ext2_free_blocks: bit already cleared for block 6022795
Nov 18 04:12:51 kernel: EXT2-fs warning (device sd(8,17)): 
ext2_free_inode: bit already cleared for inode 1505329
Nov 18 04:12:51 kernel: find_dentry_by_ino: getting root dentry for 08:11
Nov 18 04:12:51 kernel: lookup_by_inode: ino 1505329 not found in /
Nov 18 04:12:51 kernel: find_fh_dentry: 08:11/1505329 dir/2 not found!

These messages are happening pretty frequently.  It's always the same 
device (8,17), the same block, and the same inode, although a different 
block and inode than on the previous drive.  The affected files aren't 
the same either: on the old drive, the corruption consistently caused the 
same set of files to go bunk.  On the new drive that's still true, but 
it's a different set of files.
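
For anyone matching the two device notations in the log: sd(8,17) is 
major 8, minor 17 in decimal, and 08:11 appears to be the same pair 
printed in hex.  Here's a minimal Python sketch of the decoding, 
assuming the classic SCSI disk numbering (major 8, 16 minors per disk, 
minor 0 of each group being the whole disk).  The function names are 
made up for illustration.

```python
def decode_scsi_dev(major, minor):
    """Map a classic SCSI disk device number to its /dev name.

    Assumption: major 8 with 16 minors per disk (sda..sdp), where the
    first minor of each group is the whole disk and the rest are
    partitions 1..15.
    """
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    if not 0 <= minor <= 255:
        raise ValueError("minor must be in the range 0..255")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else f"{name}{part}"


def parse_hex_dev(text):
    """Split a 'MM:mm' hex pair such as '08:11' into (major, minor)."""
    major, minor = (int(field, 16) for field in text.split(":"))
    return major, minor


print(decode_scsi_dev(8, 17))                     # from sd(8,17)
print(decode_scsi_dev(*parse_hex_dev("08:11")))   # from 08:11
print(decode_scsi_dev(8, 33))                     # from sd(8,33)
```

Under that assumption, (8,17) and 08:11 both come out as sdb1, and the 
(8,33) / 08:21 pair from the earlier message comes out as sdc1.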

Running e2fsck generates something like the following every time the 
system reboots (obviously, all the stuff in <CAPS> is my commentary):

e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
/dev/sdb1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 1114181: 6022795
Duplicate/bad block(s) in inode 1114210: 6022795
Duplicate/bad block(s) in inode 1114231: 6022795
Duplicate/bad block(s) in inode 1114237: 6022795

<LOTS OF DUP/BAD BLOCK ERRORS IN VARIOUS INODES>

Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 93 inodes containing duplicate/bad blocks.)

File <BADFILEONE> (inode #1503745, mod time Thu Nov 15 19:20:04 2001)
   has 1 duplicate block(s), shared with 92 file(s):

     <LIST OF OTHER FILES>

This repeats for each of the other 92 files, and each time e2fsck asks 
whether I want to clone the duplicate blocks.  I said yes to each.

Then the following comes up:

Pass 2: Checking directory structure
Directory inode 1292343, block 8, offset 0: directory corrupted
Salvage? yes

Directory inode 1114181, block 2, offset 0: directory corrupted
Salvage? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 1292859
Connect to /lost+found? yes

Inode 1292859 ref count is 2, should be 1.  Fix? yes

<BUNCH MORE REF COUNT ERRORS>

Pass 5: Checking group summary information
Block bitmap differences:  +6022795 -6162448 -6162449 -6162454 -6162455 
-6162456 -6162457 -6162473 -6162486 -6162503 -6162511 -6171307 -6172685 
-6180052 -6180053 -6180055 -6180060 -6180062 -6$Fix? yes

Free blocks count wrong for group #80 (358, counted=266).
Fix? yes

<BUNCH MORE OF THESE GROUP PROBLEMS>

/dev/sdb1: ***** FILE SYSTEM WAS MODIFIED *****

   239205 inodes used (10%)
    45333 non-contiguous inodes (19.0%)
          # of inodes with ind/dind/tind blocks: 56553/3330/2
  8289304 blocks used (92%)
        0 bad blocks

   230477 regular files
     8316 directories
        0 character device files
        0 block device files
        0 fifos
      231 links
      403 symbolic links (397 fast symbolic links)
        0 sockets
--------
   239422 files

My plan at this point is to try upgrading the kernel, NFS, and the various 
filesystem utilities, or indeed any other packages that might affect the 
filesystem (although this machine is pretty up-to-date aside from its 
kernel).  After that, perhaps swapping out the motherboard and processors. 
Failing that, I may move the home directories to another machine and 
share them via NFS to the server, which will share them to its clients. 
I imagine the performance hit will be huge, but at least the data 
might be safe.

Then, if all else fails, it's time for a complete rebuild.  Hope that 
doesn't happen :( .

Thanks in advance for any and all advice...

-Brian

Brian Sweeney wrote:

> Hey all,
> 
> Having some weird filesystem corruption with a linux 2.2.5-22 ext2 box. 
>  Here's the deal:
> 
> Users started complaining about files getting corrupted.  Log shows 
> errors like the following:
> 
> kernel: EXT2-fs warning (device sd(8,33)): ext2_free_inode: bit already 
> cleared for inode 372903
> kernel: find_dentry_by_ino: getting root dentry for 08:21 failed
> kernel: lookup_by_inode: ino 372903 not found in /
> 
> and similar errors.  I tried:
> 
> 1) Swapping out memory
> 2) Swapping out hard drive
> 3) Swapping out SCSI controller
> 4) Swapping out SCSI cable
> 5) Swapping out NIC card (just for fun ;-))
> 
> Still get the error.  I'm about to recompile the kernel, but I don't 
> think that's gonna do it.
> 
> Online I found issues relating to old 2.0 kernels with ext2fs 
> corruption, and some other people swearing it had to be memory or hard 
> drive, but that doesn't seem to have helped.
> 
> Any suggestions?
> 
> -Brian