Bug 1099012 - BUG: unable to handle kernel paging request at 000000100000004c in put_css_set_locked
Status: RESOLVED WORKSFORME
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None
Severity: Normal
Assigned To: E-mail List
Reported: 2018-06-25 14:17 UTC by Jiri Slaby
Modified: 2019-06-13 08:35 UTC

Description Jiri Slaby 2018-06-25 14:17:06 UTC
While powering off a virtual machine running Tumbleweed, the kernel crashed:
> [  OK  ] Stopped Session c1 of user root.
> BUG: unable to handle kernel paging request at 000000100000004c
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_generic virtio_gpu ttm snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm drm_kms_helper drm ppdev snd_timer snd fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore joydev pcspkr virtio_input virtio_balloon virtio_net parport_pc i2c_piix4 parport qemu_fw_cfg button ata_generic uhci_hcd ehci_pci ehci_hcd ata_piix usbcore virtio_scsi serio_raw floppy sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua virtio_rng
> CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.17.2-5.g33a2d86-default #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:put_css_set_locked+0x7f/0x270
> RSP: 0000:ffffae67806dfda8 EFLAGS: 00010006
> RAX: 0000000fffffffe0 RBX: ffffa03a76ba9cc8 RCX: 0000000000000040
> RDX: ffffa03b015610c8 RSI: 0000000000000000 RDI: ffffa03a76ba9c00
> RBP: ffffa03a76ba9c08 R08: ffffa03b001b47e0 R09: 0000000000000100
> R10: ffffa03b001b4540 R11: ffffa03b001b42a0 R12: ffffa03a76ba9c00
> R13: ffffa03a76ba9d88 R14: dead000000000200 R15: dead000000000100
> FS:  0000000000000000(0000) GS:ffffa03b06880000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000100000004c CR3: 000000004f554000 CR4: 00000000000006e0

Unfortunately, the rest of the output was overwritten.

The disassembly around the faulting instruction is:
ffffffff8112d366:       48 8b 43 08             mov    0x8(%rbx),%rax
ffffffff8112d36a:       48 8b 13                mov    (%rbx),%rdx
ffffffff8112d36d:       48 89 42 08             mov    %rax,0x8(%rdx)
ffffffff8112d371:       48 89 10                mov    %rdx,(%rax)
ffffffff8112d374:       4c 89 3b                mov    %r15,(%rbx)
ffffffff8112d377:       4c 89 73 08             mov    %r14,0x8(%rbx)
ffffffff8112d37b:       48 8b 45 00             mov    0x0(%rbp),%rax
ffffffff8112d37f:       f6 40 6c 01             testb  $0x1,0x6c(%rax)

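The crash is on the last line (the testb at put_css_set_locked+0x7f). That is
  if (!(css->flags & CSS_NO_REF))
from css_put, inlined into put_css_set_locked. I.e. css is garbage (RAX = 0x0000000fffffffe0), and adding the 0x6c displacement of the testb gives the faulting address in CR2, 0x000000100000004c. css_put is called from put_css_set_locked as: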
        for_each_subsys(ss, ssid) {
                list_del(&cset->e_cset_node[ssid]);
                css_put(cset->subsys[ssid]);
        }


According to RDI (which holds cset->subsys) and RBP (the iterator, cset->subsys + ssid), the two differ by 8 bytes, i.e. one pointer, so ssid is 1. If I understand correctly, 1 stands for cpu_cgrp_id, so cset->subsys[cpu_cgrp_id] is the garbage pointer.
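
The register arithmetic above can be re-checked mechanically. The following is a minimal userspace sketch, not kernel code: the helper names effective_address and pointer_index are made up for illustration, and the 8-byte pointer size assumes x86_64. The inputs are taken from the first oops (RAX, RDI, RBP) and from the 0x6c displacement of the testb instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86_64, so each entry of cset->subsys[] is an 8-byte pointer. */
#define POINTER_SIZE 8u

/* Address accessed by an operand of the form disp(%reg). */
uint64_t effective_address(uint64_t base, int64_t disp)
{
    return base + (uint64_t)disp;
}

/* Index of `cursor` in an array of pointers that starts at `array_start`. */
uint64_t pointer_index(uint64_t array_start, uint64_t cursor)
{
    assert(cursor >= array_start);
    assert((cursor - array_start) % POINTER_SIZE == 0);
    return (cursor - array_start) / POINTER_SIZE;
}
```

With the values from the oops, effective_address(0x0000000fffffffe0, 0x6c) yields 0x000000100000004c, which is the CR2 value, and pointer_index(0xffffa03a76ba9c00, 0xffffa03a76ba9c08) yields 1, which is the ssid.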
Comment 1 Jiri Slaby 2018-06-25 14:19:42 UTC
It seems to be reproducible, so here is the full output:
> BUG: unable to handle kernel paging request at 000000100000004c
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_generic virtio_gpu snd_hda_intel ttm snd_hda_codec snd_hda_core drm_kms_helper snd_hwdep drm snd_pcm snd_timer fb_sys_fops ppdev syscopyarea sysfillrect snd joydev sysimgblt virtio_input pcspkr virtio_balloon virtio_net soundcore parport_pc i2c_piix4 parport qemu_fw_cfg button ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix virtio_scsi serio_raw floppy sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua virtio_rng
> CPU: 3 PID: 30 Comm: ksoftirqd/3 Not tainted 4.17.2-5.g33a2d86-default #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:put_css_set_locked+0x7f/0x270
> RSP: 0000:ffffbda08073fda8 EFLAGS: 00010006
> RAX: 0000000fffffffe0 RBX: ffff9c6d36636cc8 RCX: 0000000000000040
> RDX: ffff9c6d0f20b0c8 RSI: 0000000000000000 RDI: ffff9c6d36636c00
> RBP: ffff9c6d36636c08 R08: ffff9c6dbf1e5900 R09: 0000000000000100
> R10: ffff9c6dbf1e5300 R11: ffff9c6dbf1e5e40 R12: ffff9c6d36636c00
> R13: ffff9c6d36636d88 R14: dead000000000200 R15: dead000000000100
> FS:  0000000000000000(0000) GS:ffff9c6dc6980000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000100000004c CR3: 00000000776ba000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  cgroup_free+0x90/0xb0
>  __put_task_struct+0x3d/0x140
>  rcu_process_callbacks+0x245/0x560
>  __do_softirq+0x110/0x36b
>  ? sort_range+0x20/0x20
>  run_ksoftirqd+0x2e/0x40
>  smpboot_thread_fn+0xf1/0x1e0
>  kthread+0x112/0x130
>  ? kthread_create_worker_on_cpu+0x40/0x40
>  ret_from_fork+0x3a/0x50
> Code: 00 ad de eb 0d 48 83 c3 10 48 83 c5 08 49 39 dd 74 52 48 8b 43 08 48 8b 13 48 89 42 08 48 89 10 4c 89 3b 4c 89 73 08 48 8b 45 00 <f6> 40 6c 01 75 d4 65 ff 05 04 88 ee 76 48 8b 50 18 f6 c2 03 0f
> RIP: put_css_set_locked+0x7f/0x270 RSP: ffffbda08073fda8
> CR2: 000000100000004c
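
A side observation on the dump above: R15 and R14 hold dead000000000100 and dead000000000200, which look like the x86_64 list poison values (LIST_POISON1 and LIST_POISON2, assuming the 0xdead000000000000 poison delta used by kernels of this era). The six mov instructions before the faulting testb in the disassembly match an inlined list_del of cset->e_cset_node[ssid], so the list_del apparently completed and only the css_put after it faulted. Below is a simplified userspace model of that sequence. The *_model names are hypothetical, and the instruction mapping in the comments is an interpretation of the disassembly, not compiler output. It assumes a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Assumed x86_64 poison values for 4.17/4.18 kernels. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000200ULL)

void list_init_model(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

void list_add_tail_model(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

/* Unlink `entry` and poison its pointers, as the kernel's list_del does. */
void list_del_model(struct list_head *entry)
{
    struct list_head *prev = entry->prev;  /* mov 0x8(%rbx),%rax */
    struct list_head *next = entry->next;  /* mov (%rbx),%rdx    */

    next->prev = prev;                     /* mov %rax,0x8(%rdx) */
    prev->next = next;                     /* mov %rdx,(%rax)    */
    entry->next = LIST_POISON1;            /* mov %r15,(%rbx)    */
    entry->prev = LIST_POISON2;            /* mov %r14,0x8(%rbx) */
}
```

If this reading is right, RAX is then reloaded from 0x0(%rbp), i.e. cset->subsys[ssid], and it is that loaded value which is bogus.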
Comment 3 Jiri Slaby 2018-06-25 15:09:51 UTC
Note that I saw it twice in a row and then no more.
Comment 4 Jiri Slaby 2018-08-22 11:40:52 UTC
FWIW, it happens with 4.18 too:
> BUG: unable to handle kernel paging request at 000000100000005c
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.4-3.gbf5cd4a-default #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:put_css_set_locked+0x7f/0x270
> Code: ad de eb 0d 48 83 c3 10 48 83 c5 08 49 39 dd 74 52 48 8b 43 08 48 8b 13 48 89 42 08 48 89 10 4c 89 3b 4c 89 73 08 48 8b 45 00 <f6> 40 7c 01 75 d4 65 ff 05 24 50 ee 6b 48 8b 50 18 f6 c2 03 0f 85
> RSP: 0018:ffffb2fb8067fda8 EFLAGS: 00010006
> RAX: 0000000fffffffe0 RBX: ffff9172a1d470c8 RCX: 0000000000000040
> RDX: ffff9173424e6cc8 RSI: 0000000000000000 RDI: ffff9172a1d47000
> RBP: ffff9172a1d47008 R08: ffff9173401fdd20 R09: 0000000000000100
> R10: ffff9173401fd7e0 R11: ffff9173401ffd20 R12: ffff9172a1d47000
> R13: ffff9172a1d47188 R14: dead000000000200 R15: dead000000000100
> FS:  0000000000000000(0000) GS:ffff917346800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000100000005c CR3: 0000000022b98000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  cgroup_free+0x93/0xb0
>  __put_task_struct+0x3d/0x140
>  rcu_process_callbacks+0x270/0x580
>  __do_softirq+0x111/0x370
>  ? sort_range+0x20/0x20
>  run_ksoftirqd+0x30/0x40
>  smpboot_thread_fn+0xf1/0x1e0
>  kthread+0x112/0x130
>  ? kthread_create_worker_on_cpu+0x40/0x40
>  ret_from_fork+0x3a/0x50
> Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_generic snd_hda_intel virtio_gpu snd_hda_codec ttm snd_hda_core snd_hwdep drm_kms_helper snd_pcm snd_timer snd ppdev drm virtio_net fb_sys_fops syscopyarea sysfillrect parport_pc sysimgblt pcspkr soundcore net_failover joydev failover virtio_balloon parport virtio_input i2c_piix4 qemu_fw_cfg button ata_generic ehci_pci uhci_hcd ehci_hcd ata_piix floppy serio_raw usbcore virtio_scsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua virtio_rng
> CR2: 000000100000005c
> ---[ end trace 918d9c21d8ad57cd ]---
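
The 4.17 and 4.18 oopses can be compared through the bytes starting at the <f6> marker in their Code: lines. In both, f6 40 xx 01 decodes as testb $0x1,xx(%rax), with xx being 0x6c on 4.17 and 0x7c on 4.18, and CR2 minus RAX gives the same number in each case. That suggests the same garbage pointer value (0x0000000fffffffe0) in both kernels, with only the offset of flags having moved. A small sketch of that cross-check follows. The function names are hypothetical and the decoder handles only this one encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode "testb $imm8, disp8(%rax)", encoded as f6 40 <disp8> <imm8>.
 * On success, stores the sign-extended displacement in *disp and returns 0.
 * Returns -1 if the bytes are not exactly this encoding.
 */
int decode_testb_rax_disp8(const uint8_t *code, size_t len, int64_t *disp)
{
    if (len < 4 || code[0] != 0xf6 || code[1] != 0x40)
        return -1;
    *disp = (int8_t)code[2];
    return 0;
}

/* Displacement implied by the fault address (CR2) and the base register. */
int64_t implied_displacement(uint64_t cr2, uint64_t base)
{
    return (int64_t)(cr2 - base);
}
```

For both kernels the decoded displacement and the one implied by CR2 and RAX agree: 0x6c for 4.17.2 and 0x7c for 4.18.4.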
Comment 5 Jiri Slaby 2019-06-13 08:35:43 UTC
I have not seen it for quite some time.