kernel BUG at ll_rw_blk.c:946, 2.4.21-rc6-ac2

Post by lk » Sat, 30 Aug 2003 03:10:08

ksymoops output is below. I don't know if the root of the problem is in
the ext3 code (which hasn't changed between this kernel and 2.4.23-rc1),
or if it's actually in the elevator code, which definitely has changed.

The system uptime was almost 60 days before the crash, so I thought this
crash report might still be of some use.

compiled with: gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
Machine config: SMP 2.4G Xeon, 2G memory

ksymoops 2.4.9 on i686 2.4.21-rc6-ac2. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-rc6-ac2/ (default)
-m /usr/src/linux-2.4.21-rc6-ac2/ (specified)

kernel BUG at ll_rw_blk.c:946!
invalid operand: 0000
CPU: 2
EIP: 0010:[<c01a6cd2>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000800 ebx: 00000001 ecx: ebf1b200 edx: 00002000
esi: 00000008 edi: 000f133e ebp: 00000008 esp: e5745b8c
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 22786, stackpage=e5745000)
Stack: f88661b7 c62cac80 c62cac80 00001000 f7034294 f7034200 c62cac80 f7034294
00002000 f71ade40 d247c460 00000000 00000008 00000008 e5745c10 ebf1b200
00000008 000f133e 00000008 c01a73ba f71ade18 00000001 ebf1b200 f8866863
Call Trace: [<f88661b7>] [<c01a73ba>] [<f8866863>] [<c01a746b>] [<c01a7586>]
[<f885530c>] [<f8853110>] [<f8852f31>] [<f884f70c>] [<f89adddc>] [<c020cbc6>]
[<c0201200>] [<c021dc32>] [<c021c758>] [<c021db70>] [<f8870c60>] [<f88569b0>]
[<f88560bc>] [<f8852bad>] [<f884e25a>] [<f884e37b>] [<f884e445>] [<c0149ba9>]
[<f884fc4c>] [<f886c061>] [<f8861294>] [<f88647b0>] [<f8870ae0>] [<f8870ae0>]
[<f8a3ee40>] [<f8861210>] [<f8a3f501>] [<f8870ae0>] [<f8a17856>] [<f8a46711>]
[<f8a478e9>] [<f8a4d634>] [<f8a3a62e>] [<f8a4cc78>] [<f8a3a560>] [<f8a174bf>]
[<f8a4d634>] [<f8a4cc98>] [<f8a3a3ff>] [<c01074fe>] [<f8a3a1e0>]
Code: 0f 0b b2 03 e4 d0 27 c0 8b 54 24 58 8b 35 90 a5 35 c0 8b 4c

Trace; f88661b7 <[ext3]ext3_do_update_inode+177/3e0>
Trace; c01a73ba <generic_make_request+da/130>
Trace; f8866863 <[ext3]ext3_mark_iloc_dirty+43/70>
Trace; c01a746b <submit_bh+5b/80>
Trace; c01a7586 <ll_rw_block+f6/1d0>
Trace; f885530c <[jbd]journal_update_superblock+5c/a0>
Trace; f8853110 <[jbd]cleanup_journal_tail+a0/120>
Trace; f8852f31 <[jbd]log_do_checkpoint+21/160>
Trace; f884f70c <[jbd]journal_dirty_metadata+15c/1f0>
Trace; f89adddc <[e1000]e1000_xmit_frame+12c/260>
Trace; c020cbc6 <qdisc_restart+16/1a0>
Trace; c0201200 <dev_queue_xmit+290/350>
Trace; c021dc32 <ip_finish_output2+a2/110>
Trace; c021c758 <ip_output+68/b0>
Trace; c021db70 <output_maybe_reroute+0/20>
Trace; f8870c60 <[ext3]ext3_sops+0/50>
Trace; f88569b0 <[jbd]__ksymtab_journal_abort+0/8>
Trace; f88560bc <[jbd]__jbd_kmalloc+2c/80>
Trace; f8852bad <[jbd]log_wait_for_space+8d/a0>
Trace; f884e25a <[jbd]start_this_handle+ba/190>
Trace; f884e37b <[jbd]new_handle+4b/70>
Trace; f884e445 <[jbd]journal_start+a5/c0>
Trace; c0149ba9 <f
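
A note on reading the "Code:" line above: on i386 2.4 kernels built with verbose BUG() reporting, the BUG() macro is, as far as I know, emitted as a ud2 instruction (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name string. The sketch below decodes the bytes from this oops under that assumption. The helper name decode_bug_trap is made up for illustration and is not part of ksymoops or the kernel.

```python
import struct

def decode_bug_trap(code_hex):
    """Decode the leading bytes of an oops 'Code:' line as a BUG() trap.

    Assumed layout (i386, 2.4-era verbose BUG()): ud2 opcode (0f 0b),
    then a little-endian 16-bit line number, then a little-endian
    32-bit pointer to the source file name string.
    """
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code bytes do not start with ud2 (0f 0b)")
    # Skip the 2-byte opcode, then unpack the line number and file pointer.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr

# Bytes copied from the "Code:" line of the oops above.
code = "0f 0b b2 03 e4 d0 27 c0 8b 54 24 58 8b 35 90 a5 35 c0 8b 4c"
line, file_ptr = decode_bug_trap(code)
print(line, hex(file_ptr))  # prints: 946 0xc027d0e4
```

The decoded line number matches the "ll_rw_blk.c:946" in the BUG message, which suggests the EIP really is sitting on that BUG() call and not on some unrelated invalid opcode.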

I have a quad-CPU server running RHEL AS 3 (Taroon) that acts as an NFS
server for some other machines.

It had an uptime of 390 days when the problem occurred.

Essentially, the directory paths it serves out stopped working. Attempting
to mount the directories on two client boxes results in:

box1# mount rhelas3_box:/share1 /mnt/share1
mount: RPC: Program not registered

box2# mount rhelas3_box:/share2 /mnt/share2
mount: RPC: Program not registered

My guess is that portmap died on the server, so RPC connections could no
longer be established between the server and the clients.

The options used for the mount are as follows:
rhelas3_box:/share1 /mnt/share1 nfs
rhelas3_box:/share2 /mnt/share2 nfs

On the server in /etc/exports, the shares are below:

Is this more of a kernel issue, or specifically a problem with the portmap
daemon? I have not previously seen NFS mounts randomly die, disappear,
or fail to work like this.

Other than checking the logs during the time of the failure, what is the
best way to begin troubleshooting this issue?
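
One possible first step, offered as a sketch and not a diagnosis: "RPC: Program not registered" usually means the portmapper answered but had no entry for the requested program, so it may be worth checking which services are still registered on the server. The helper below parses the text printed by "rpcinfo -p rhelas3_box" and lists the expected services that are absent. The function name missing_rpc_services and the default service list are assumptions made for illustration.

```python
def missing_rpc_services(rpcinfo_output, required=("portmapper", "mountd", "nfs")):
    """Return the names in `required` that do not appear in `rpcinfo -p` output.

    Assumes the usual column layout: program, version, protocol, port,
    service name. The header row and any malformed rows are skipped.
    """
    registered = set()
    for row in rpcinfo_output.splitlines():
        fields = row.split()
        # Data rows start with a numeric program number; the header does not.
        if len(fields) >= 5 and fields[0].isdigit():
            registered.add(fields[4])
    return [name for name in required if name not in registered]
```

To use it, capture the output of "rpcinfo -p rhelas3_box" on a client (for example with subprocess.run) and pass the text in. If mountd or nfs is reported missing while portmapper is present, the portmap daemon itself is probably alive and the registrations were lost, for instance because portmap was restarted without restarting the NFS services.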



