Bugzilla – Bug 1141881
"BUG: unable to handle kernel NULL pointer dereference at 0000000000000029" on OBS job
Last modified: 2019-10-31 11:58:30 UTC
Created attachment 810755
full log file of failed OBS job with the kernel bug stack trace
[ 1844s] + rm -rf /home/abuild/rpmbuild/BUILDROOT/openQA-4.6.1563206570.e00d3964-220.2.x86_64/DB
[ 1844s] [ 1832.460286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[ 1844s] [ 1832.461661] #PF error: [normal kernel read fault]
[ 1844s] [ 1832.461848] PGD 0 P4D 0
[ 1844s] [ 1832.461848] Oops: 0000 [#1] SMP NOPTI
[ 1844s] [ 1832.461848] CPU: 7 PID: 7939 Comm: rm Not tainted 5.1.16-1-default #1 openSUSE Tumbleweed (unreleased)
[ 1844s] [ 1832.461848] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
[ 1844s] [ 1832.461848] RIP: 0010:vfs_unlink+0xb3/0x1c0
[ 1844s] [ 1832.461848] Code: bc f0 ff ff ff e8 2d da e4 ff 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 f0 83 44 24 fc 00 49 8b 86 68 01 00 00 48 85 c0 74 49 <48> 8b 50 28 48 8d 48 28 48 39 ca 0f 84 de 00 00 00 ba 04 00 00 00
[ 1844s] [ 1832.461848] RSP: 0018:ffffa50c40e1be98 EFLAGS: 00010202
[ 1844s] [ 1832.461848] RAX: 0000000000000001 RBX: ffffa50c40e1bee0 RCX: 000000000000018f
[ 1844s] [ 1832.461848] RDX: 0000000000000000 RSI: ffff978410ebc300 RDI: ffff97842fc6fa88
[ 1844s] [ 1832.461848] RBP: ffff978410ebc300 R08: 0000000000000020 R09: ffffa50c40e1be90
[ 1844s] [ 1832.461848] R10: 0000000000000005 R11: 00007fffffffffff R12: 0000000000000000
[ 1844s] [ 1832.461848] R13: ffff97842fc6fa88 R14: ffff978410ec31a8 R15: ffff978410ec3248
[ 1844s] [ 1832.461848] FS: 00007fa066ffe580(0000) GS:ffff9784b7bc0000(0000) knlGS:0000000000000000
[ 1844s] [ 1832.461848] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1844s] [ 1832.461848] CR2: 0000000000000029 CR3: 0000000231c72000 CR4: 00000000000406e0
[ 1844s] [ 1832.461848] Call Trace:
[ 1844s] [ 1832.461848] do_unlinkat+0x18b/0x2c0
[ 1844s] [ 1832.461848] do_syscall_64+0x60/0x120
[ 1844s] [ 1832.461848] entry_SYSCALL_64_after_hwframe+0x49/0xbe
The kernel oops above occurred in an OBS job while rm -rf was removing the build root.
inode->i_flctx is 1 (the value in RAX) in this test in break_lease():
if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
return __break_lease(inode, mode, FL_LEASE);
The code then reads 0x28(%rax) to fetch flc_lease, i.e. address 0x1 + 0x28 = 0x29, which matches CR2 and is where it crashes.
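To make the address arithmetic concrete, here is a minimal userspace sketch. The struct below is a hypothetical stand-in, not the kernel's definition of struct file_lock_context: its layout is only assumed, chosen so that flc_lease lands at offset 0x28 as the 0x28(%rax) operand in the disassembly implies. The helper name lease_field_address is likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct list_head, using fixed 64-bit fields
 * so the layout does not depend on the host pointer size. */
struct mock_list_head {
    uint64_t next;
    uint64_t prev;
};

/* Hypothetical stand-in for the lock context that inode->i_flctx points to.
 * ASSUMPTION: the field order and sizes are picked only to reproduce the
 * 0x28 offset seen in the oops; this is not copied from kernel headers. */
struct mock_flctx {
    uint32_t flc_lock;               /* placeholder for the lock word */
    uint32_t pad;                    /* explicit padding up to offset 0x08 */
    struct mock_list_head flc_flock; /* offset 0x08 */
    struct mock_list_head flc_posix; /* offset 0x18 */
    struct mock_list_head flc_lease; /* offset 0x28 */
};

/* Fail the build if the mock layout does not match the disassembly. */
static_assert(offsetof(struct mock_flctx, flc_lease) == 0x28,
              "mock layout must place flc_lease at 0x28");

/* Address the CPU would read when loading flc_lease.next through a
 * context "pointer" holding the raw value i_flctx. */
uint64_t lease_field_address(uint64_t i_flctx)
{
    return i_flctx + offsetof(struct mock_flctx, flc_lease);
}
```

With the RAX value from the oops, lease_field_address(0x1) evaluates to 0x29, the faulting address reported in CR2.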
Goldwyn, any idea how this could happen, or who else might know? I don't see any fixes after 5.1 in this area, so I suppose this has not been fixed upstream yet.
inode->i_flctx should clearly be a pointer. The only way it can hold 0x1 is memory corruption. Without a dump this may be difficult to diagnose.
Adding Neil to check if he has seen something similar before.
> Adding Neil to check if he has seen something similar before.
No, I haven't seen anything like this. I agree with your analysis.
(In reply to Goldwyn Rodrigues from comment #2)
> Without a dump this may be difficult to diagnose.
Could you set up kdump on OBS workers somehow?
No, sorry, I can't do that myself: I don't have that kind of access to OBS workers. I suggest you contact the administrators of the OBS instance, or close the bug as "WORKSFORME", because I have not seen this again, so it may not be easily reproducible anyway.
If you see it again, feel free to reopen.