Sigh, Supermicro board is Oops’ing kernel

So far, MLDonkey has always been the unlucky process that triggers it. It happens with both 2.6.7 and 2.6.8.1. It seems likely to be a hardware issue, although a partial memtest86 run has revealed nothing. Next I am going to reseat the CPUs and RAM, since the board appears to have been dropped during shipping. (The HSFs had fallen off the CPUs.) After that, I’ll try each CPU by itself with a UP kernel. Finally, I’ll remove the board from the case and try running it on an anti-static bag. It would help a lot if I could find something less intensive that triggers the Oops, though, because that last test might not be possible: connecting all the cards to their hard disks will be quite a stretch with the mainboard out of the case.

I can still downgrade to my old L7S7A2, but I’d seriously rather not, given the performance increase from the Supermicro Super 370DLE.

It’s always CPU 0. Maybe that means something:

nebula:~# egrep 'CPU:    [0-9]' /var/log/syslog.0
Sep  9 09:31:38 nebula kernel: CPU:    0
Sep  9 17:26:35 nebula kernel: CPU:    0
Sep  9 19:16:42 nebula kernel: CPU:    0
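
If the log grows, the same check could be done with a short script instead of eyeballing the grep output. This is only a sketch: the function name `tally_oops_cpus` is made up for illustration, and the pattern assumes the `CPU:    N` line format shown above.

```python
import re
from collections import Counter

# Matches the "CPU:    N" line from an Oops report as it appears in syslog.
# Assumption: the line format is exactly what the grep output above shows.
CPU_LINE = re.compile(r"kernel: CPU:\s+(\d+)")


def tally_oops_cpus(lines):
    """Count how many Oops reports named each CPU number."""
    counts = Counter()
    for line in lines:
        match = CPU_LINE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# The three lines from syslog.0 quoted above.
sample = [
    "Sep  9 09:31:38 nebula kernel: CPU:    0",
    "Sep  9 17:26:35 nebula kernel: CPU:    0",
    "Sep  9 19:16:42 nebula kernel: CPU:    0",
]

for cpu, hits in sorted(tally_oops_cpus(sample).items()):
    print(f"CPU {cpu}: {hits} Oops report(s)")
```

To run it against the real log, pass an open file instead of `sample`, for example `tally_oops_cpus(open("/var/log/syslog.0"))`. With only three reports, all on CPU 0, the tally is a hint rather than proof.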

The full trace, in all its glory:

nebula kernel: mlnet: page allocation failure. order:5, mode:0xd0
nebula kernel:  [__alloc_pages+761/880] __alloc_pages+0x2f9/0x370
nebula kernel:  [__get_free_pages+31/64] __get_free_pages+0x1f/0x40
nebula kernel:  [kmem_getpages+32/224] kmem_getpages+0x20/0xe0
nebula kernel:  [cache_grow+160/528] cache_grow+0xa0/0x210
nebula kernel:  [cache_alloc_refill+381/560] cache_alloc_refill+0x17d/0x230
nebula kernel:  [__kmalloc+136/160] __kmalloc+0x88/0xa0
nebula kernel:  [__crc_generic_read_dir+1429139/7856069] xfs_iread_extents+0x70/0x180 [xfs]
nebula kernel:  [__crc_generic_read_dir+1260038/7856069] xfs_bmapi+0x213/0x1480 [xfs]
nebula kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
nebula kernel:  [mempool_alloc+115/320] mempool_alloc+0x73/0x140
nebula kernel:  [as_update_arq+46/128] as_update_arq+0x2e/0x80
nebula kernel:  [as_add_request+434/544] as_add_request+0x1b2/0x220
nebula kernel:  [__crc_generic_read_dir+1446652/7856069] xfs_imap_to_bmap+0x39/0x240 [xfs]
nebula kernel:  [__crc_generic_read_dir+1447581/7856069] xfs_iomap+0x19a/0x540 [xfs]
nebula kernel:  [as_update_arq+46/128] as_update_arq+0x2e/0x80
nebula kernel:  [__crc_generic_read_dir+1588792/7856069] linvfs_get_block_core+0xa5/0x2c0 [xfs]
nebula kernel:  [__make_request+712/1408] __make_request+0x2c8/0x580
nebula kernel:  [__crc_generic_read_dir+1589391/7856069] linvfs_get_block+0x3c/0x40 [xfs]
nebula kernel:  [do_mpage_readpage+943/960] do_mpage_readpage+0x3af/0x3c0
nebula kernel:  [radix_tree_node_alloc+31/112] radix_tree_node_alloc+0x1f/0x70
nebula kernel:  [radix_tree_insert+237/272] radix_tree_insert+0xed/0x110
nebula kernel:  [add_to_page_cache+96/192] add_to_page_cache+0x60/0xc0
nebula kernel:  [mpage_readpages+315/368] mpage_readpages+0x13b/0x170
nebula kernel:  [__crc_generic_read_dir+1589331/7856069] linvfs_get_block+0x0/0x40 [xfs]
nebula kernel:  [read_pages+312/336] read_pages+0x138/0x150
nebula kernel:  [__crc_generic_read_dir+1589331/7856069] linvfs_get_block+0x0/0x40 [xfs]
nebula kernel:  [__alloc_pages+784/880] __alloc_pages+0x310/0x370
nebula kernel:  [do_page_cache_readahead+275/400] do_page_cache_readahead+0x113/0x190
nebula kernel:  [page_cache_readahead+256/512] page_cache_readahead+0x100/0x200
nebula kernel:  [do_generic_mapping_read+266/1120] do_generic_mapping_read+0x10a/0x460
nebula kernel:  [__generic_file_aio_read+523/576] __generic_file_aio_read+0x20b/0x240
nebula kernel:  [file_read_actor+0/240] file_read_actor+0x0/0xf0
nebula kernel:  [__crc_generic_read_dir+1616642/7856069] xfs_read+0x18f/0x2c0 [xfs]
nebula kernel:  [__crc_generic_read_dir+1600798/7856069] linvfs_read+0x8b/0xa0 [xfs]
nebula kernel:  [do_sync_read+137/192] do_sync_read+0x89/0xc0
nebula kernel:  [link_path_walk+1550/2304] link_path_walk+0x60e/0x900
nebula kernel:  [permission+47/80] permission+0x2f/0x50
nebula kernel:  [__crc_generic_read_dir+1602973/7856069] linvfs_open+0x8a/0x90 [xfs]
nebula kernel:  [dentry_open+206/400] dentry_open+0xce/0x190
nebula kernel:  [filp_open+104/112] filp_open+0x68/0x70
nebula kernel:  [vfs_read+184/304] vfs_read+0xb8/0x130
nebula kernel:  [sys_read+66/112] sys_read+0x42/0x70
nebula kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
nebula kernel:
nebula kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
nebula kernel:  printing eip:
nebula kernel: d0bdb945
nebula kernel: *pde = 00000000
nebula kernel: Oops: 0002 [#1]
nebula kernel: SMP
nebula kernel: Modules linked in: nfsd exportfs lockd sunrpc 8250 serial_core xfs reiserfs lp parport_pc parport e100 mii rtc unix ext3 jbd 3w_xxxx megaraid sd_mod scsi_mod ide_disk serverworks ide_core
nebula kernel: CPU:    0
nebula kernel: EIP:    0060:[__crc_generic_read_dir+1259032/7856069]    Not tainted
nebula kernel: EFLAGS: 00010202   (2.6.7)
nebula kernel: EIP is at xfs_bmap_read_extents+0x2f5/0x4d0 [xfs]
nebula kernel: eax: dc010000   ebx: 00000001   ecx: 00000000   edx: 0300603a
nebula kernel: esi: c308c018   edi: 00000000   ebp: 00000000   esp: ce7168c0
nebula kernel: ds: 007b   es: 007b   ss: 0068
nebula kernel: Process mlnet (pid: 731, threadinfo=ce716000 task=ce734cf0)
nebula kernel: Stack: cec63400 005015e0 00000001 00000000 ce716914 00000002 00000000 839c0b00
nebula kernel:        000000fe 000000fe 000000fe 005015e0 00000000 c15815a0 00001d69 c2dd7808
nebula kernel:        cec63400 00000000 c4d13360 00000001 c308c000 c15815a0 c4d13360 0001d690
nebula kernel: Call Trace:
nebula kernel:  [__crc_generic_read_dir+1429178/7856069] xfs_iread_extents+0x97/0x180 [xfs]
nebula kernel:  [__crc_generic_read_dir+1260038/7856069] xfs_bmapi+0x213/0x1480 [xfs]
nebula kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
nebula kernel:  [mempool_alloc+115/320] mempool_alloc+0x73/0x140
nebula kernel:  [as_update_arq+46/128] as_update_arq+0x2e/0x80
nebula kernel:  [as_add_request+434/544] as_add_request+0x1b2/0x220
nebula kernel:  [__crc_generic_read_dir+1446652/7856069] xfs_imap_to_bmap+0x39/0x240 [xfs]
nebula kernel:  [__crc_generic_read_dir+1447581/7856069] xfs_iomap+0x19a/0x540 [xfs]
nebula kernel:  [as_update_arq+46/128] as_update_arq+0x2e/0x80
nebula kernel:  [__crc_generic_read_dir+1588792/7856069] linvfs_get_block_core+0xa5/0x2c0 [xfs]
nebula kernel:  [__make_request+712/1408] __make_request+0x2c8/0x580
nebula kernel:  [__crc_generic_read_dir+1589391/7856069] linvfs_get_block+0x3c/0x40 [xfs]
nebula kernel:  [do_mpage_readpage+943/960] do_mpage_readpage+0x3af/0x3c0
nebula kernel:  [radix_tree_node_alloc+31/112] radix_tree_node_alloc+0x1f/0x70
nebula kernel:  [radix_tree_insert+237/272] radix_tree_insert+0xed/0x110
nebula kernel:  [add_to_page_cache+96/192] add_to_page_cache+0x60/0xc0
nebula kernel:  [mpage_readpages+315/368] mpage_readpages+0x13b/0x170
nebula kernel:  [__crc_generic_read_dir+1589331/7856069] linvfs_get_block+0x0/0x40 [xfs]
nebula kernel:  [read_pages+312/336] read_pages+0x138/0x150
nebula kernel:  [__crc_generic_read_dir+1589331/7856069] linvfs_get_block+0x0/0x40 [xfs]
nebula kernel:  [__alloc_pages+784/880] __alloc_pages+0x310/0x370
nebula kernel:  [do_page_cache_readahead+275/400] do_page_cache_readahead+0x113/0x190
nebula kernel:  [page_cache_readahead+256/512] page_cache_readahead+0x100/0x200
nebula kernel:  [do_generic_mapping_read+266/1120] do_generic_mapping_read+0x10a/0x460
nebula kernel:  [__generic_file_aio_read+523/576] __generic_file_aio_read+0x20b/0x240
nebula kernel:  [file_read_actor+0/240] file_read_actor+0x0/0xf0
nebula kernel:  [__crc_generic_read_dir+1616642/7856069] xfs_read+0x18f/0x2c0 [xfs]
nebula kernel:  [__crc_generic_read_dir+1600798/7856069] linvfs_read+0x8b/0xa0 [xfs]
nebula kernel:  [do_sync_read+137/192] do_sync_read+0x89/0xc0
nebula kernel:  [link_path_walk+1550/2304] link_path_walk+0x60e/0x900
nebula kernel:  [permission+47/80] permission+0x2f/0x50
nebula kernel:  [__crc_generic_read_dir+1602973/7856069] linvfs_open+0x8a/0x90 [xfs]
nebula kernel:  [dentry_open+206/400] dentry_open+0xce/0x190
nebula kernel:  [filp_open+104/112] filp_open+0x68/0x70
nebula kernel:  [vfs_read+184/304] vfs_read+0xb8/0x130
nebula kernel:  [sys_read+66/112] sys_read+0x42/0x70
nebula kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
nebula kernel:
nebula kernel: Code: 89 4d 00 83 c6 10 89 c1 89 7d 04 89 d7 0f c9 0f cf 87 cf 89
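
Two fields in the trace are easier to read once decoded: `order:5` on the allocation failure and the `0002` after `Oops:`. The sketch below is my reading of them, not a diagnosis. It assumes 4 KiB pages (the i386 default) and the usual i386 page-fault error-code bits (bit 0: protection fault vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The helper names are made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the i386 default


def allocation_bytes(order, page_size=PAGE_SIZE):
    """An order-N request asks for 2**N physically contiguous pages."""
    return (1 << order) * page_size


def decode_oops_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


# The two relevant lines, copied from the trace above.
log = [
    "nebula kernel: mlnet: page allocation failure. order:5, mode:0xd0",
    "nebula kernel: Oops: 0002 [#1]",
]

for line in log:
    order = re.search(r"order:(\d+)", line)
    if order:
        size = allocation_bytes(int(order.group(1)))
        print(f"allocation request: {size} bytes ({size // 1024} KiB) contiguous")
    oops = re.search(r"Oops: ([0-9a-fA-F]{4})", line)
    if oops:
        # The kernel prints the error code in hexadecimal.
        print("oops code:", decode_oops_code(int(oops.group(1), 16)))
```

Read this way, the first message is a failed request for 128 KiB of contiguous memory, and the Oops is a kernel-mode write to an unmapped page, which matches the "NULL pointer dereference at virtual address 00000000" line. Both reports pass through xfs_iread_extents, which may be worth keeping in mind while testing the hardware.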