[Techtalk] Filesystem corruption?
Brian Sweeney
bsweeney at physics.ucsb.edu
Tue Nov 20 10:10:37 EST 2001
Hello again,
I was a bit fried when I wrote my email last night, so I thought I'd
follow up with more information (if it helps). I apologize for the long
email, but there's lots o' data ;-).
The machine in question originally had two physical SCSI drives, sda and
sdb. sda was the OS drive, partitioned so that several standard
directories had their own filesystems (/tmp, /var, /etc). sdb had only
one partition, which held the users' home directories. None of the
filesystems on /dev/sda experienced any corruption. Running badblocks
turned up nothing. Also, I haven't seen any read/write/seek errors on
the physical devices themselves (that is, the old user data drive or the
new one) as I would expect to with failing drives.
Here's a more complete log sample:
Nov 18 04:12:51 kernel: EXT2-fs warning (device sd(8,17)):
ext2_free_blocks: bit already cleared for block 6022795
Nov 18 04:12:51 kernel: EXT2-fs warning (device sd(8,17)):
ext2_free_inode: bit already cleared for inode 1505329
Nov 18 04:12:51 kernel: find_dentry_by_ino: getting root dentry for 08:11
Nov 18 04:12:51 kernel: lookup_by_inode: ino 1505329 not found in /
Nov 18 04:12:51 kernel: find_fh_dentry: 08:11/1505329 dir/2 not found!
These messages are happening pretty frequently. It's always the same
device (8,17), the same block and the same inode, although a different
block and inode than on the previous drive. They're not the same files
either; on the old drive, the corruption consistently caused the same
set of files to go bunk. On the new drive, this is also true, but they're a
different set of files.
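For anyone reading the logs, the device in these messages can be decoded by hand: sd(8,17) is major 8, minor 17, and 08:11 is the same pair written in hex. Here is a minimal Python sketch of that decoding, assuming the conventional Linux numbering for SCSI disks on major 8 (16 minors per disk). The function names are made up for illustration.

```python
import string

def decode_sd(major, minor):
    """Map a (major, minor) pair to a /dev/sdXN name.

    Assumption: SCSI disk major 8, where each disk owns 16 minors, so
    minor // 16 is the disk index and minor % 16 is the partition
    (0 means the whole disk).
    """
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    if not 0 <= minor <= 255:
        raise ValueError("minor out of range for major 8")
    disk, part = divmod(minor, 16)
    name = "/dev/sd" + string.ascii_lowercase[disk]
    return name + (str(part) if part else "")

def decode_hex_pair(pair):
    """Decode the 'MM:mm' hex form the kernel prints, e.g. '08:11'."""
    major, minor = (int(field, 16) for field in pair.split(":"))
    return decode_sd(major, minor)

print(decode_sd(8, 17))          # device from the EXT2-fs warnings
print(decode_hex_pair("08:11"))  # same device, hex form
print(decode_hex_pair("08:21"))  # device from the earlier, quoted email
```

Under that assumption, (8,17) and 08:11 both come out as /dev/sdb1, which matches the device e2fsck reports, while the (8,33) / 08:21 pair from the earlier email comes out as /dev/sdc1.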
Running e2fsck generates something like the following every time the
system reboots (obviously, everything in <CAPS> is my own comment):
e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
/dev/sdb1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 1114181: 6022795
Duplicate/bad block(s) in inode 1114210: 6022795
Duplicate/bad block(s) in inode 1114231: 6022795
Duplicate/bad block(s) in inode 1114237: 6022795
<LOTS OF DUP/BAD BLOCK ERRORS IN VARIOUS INODES>
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 93 inodes containing duplicate/bad blocks.)
File <BADFILEONE> (inode #1503745, mod time Thu Nov 15 19:20:04 2001)
has 1 duplicate block(s), shared with 92 file(s):
<LIST OF OTHER FILES>
This repeats for each of the 92 files, and asks me if I want to clone
the duplicate blocks. I said yes for each.
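To see at a glance which inodes claim which block, the Pass 1B lines can be grouped with a few lines of Python. This is only a rough sketch: it assumes the "Duplicate/bad block(s) in inode N: B ..." line format shown above, and the helper name and sample text are just for illustration (the sample is the four lines pasted earlier, not the full output).

```python
import re
from collections import defaultdict

# Matches e.g. "Duplicate/bad block(s) in inode 1114181: 6022795"
# and allows several block numbers after the colon.
DUP_LINE = re.compile(r"Duplicate/bad block\(s\) in inode (\d+):((?: \d+)+)")

def inodes_by_block(fsck_output):
    """Return {block number: set of inodes that claim that block}."""
    shared = defaultdict(set)
    for line in fsck_output.splitlines():
        match = DUP_LINE.search(line)
        if match:
            inode = int(match.group(1))
            for block in match.group(2).split():
                shared[int(block)].add(inode)
    return dict(shared)

sample = """\
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 1114181: 6022795
Duplicate/bad block(s) in inode 1114210: 6022795
Duplicate/bad block(s) in inode 1114231: 6022795
Duplicate/bad block(s) in inode 1114237: 6022795
"""

for block, inodes in sorted(inodes_by_block(sample).items()):
    print(block, len(inodes), sorted(inodes))
```

On the sample this reports a single block, 6022795, claimed by all four inodes, which is the same block named in the kernel warnings.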
Then the following comes up:
Pass 2: Checking directory structure
Directory inode 1292343, block 8, offset 0: directory corrupted
Salvage? yes
Directory inode 1114181, block 2, offset 0: directory corrupted
Salvage? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 1292859
Connect to /lost+found? yes
Inode 1292859 ref count is 2, should be 1. Fix? yes
<BUNCH MORE REF COUNT ERRORS>
Pass 5: Checking group summary information
Block bitmap differences: +6022795 -6162448 -6162449 -6162454 -6162455
-6162456 -6162457 -6162473 -6162486 -6162503 -6162511 -6171307 -6172685
-6180052 -6180053 -6180055 -6180060 -6180062 -6$Fix? yes
Free blocks count wrong for group #80 (358, counted=266).
Fix? yes
<BUNCH MORE OF THESE GROUP PROBLEMS>
/dev/sdb1: ***** FILE SYSTEM WAS MODIFIED *****
239205 inodes used (10%)
45333 non-contiguous inodes (19.0%)
# of inodes with ind/dind/tind blocks: 56553/3330/2
8289304 blocks used (92%)
0 bad blocks
230477 regular files
8316 directories
0 character device files
0 block device files
0 fifos
231 links
403 symbolic links (397 fast symbolic links)
0 sockets
--------
239422 files
My plan at this point is to try upgrading the kernel, NFS, and various
filesystem utilities, or indeed any other packages that might affect the
filesystem (although this machine is pretty up-to-date aside from its
kernel). Then perhaps I'll swap out the motherboard and processors.
Failing that, I may move the home directories to another machine and
share them via NFS to the server, which will share them to its clients.
I imagine the performance hit will be huge, but at least the data
might be safe.
Then, if all else fails, it's time for a complete rebuild. Hope that
doesn't happen :( .
Thanks in advance for any and all advice...
-Brian
Brian Sweeney wrote:
> Hey all,
>
> Having some weird filesystem corruption with a linux 2.2.5-22 ext2 box.
> Here's the deal:
>
> Users started complaining about files getting corrupted. Log shows
> errors like the following:
>
> kernel: EXT2-fs warning (device sd(8,33)): ext2_free_inode: bit already
> cleared for inode 372903
> kernel: find_dentry_by_ino: getting root dentry for 08:21 failed
> kernel: lookup_by_inode: ino 372903 not found in /
>
> and similar errors. I tried:
>
> 1) Swapping out memory
> 2) Swapping out hard drive
> 3) Swapping out SCSI controller
> 4) Swapping out SCSI cable
> 5) Swapping out NIC card (just for fun ;-))
>
> Still get the error. I'm about to recompile the kernel, but I don't
> think that's gonna do it.
>
> Online I found issues relating to old 2.0 kernels with ext2fs
> corruption, and some other people swearing it had to be memory or hard
> drive, but that doesn't seem to have helped.
>
> Any suggestions?
>
> -Brian