Bug 1075178 - ARM midway: kernel oops with running VMs
ARM midway: kernel oops with running VMs
Status: NEW
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
armv7 openSUSE Factory
: P4 - Low : Normal (vote)
: ---
Assigned To: Andre Przywara
E-mail List
Depends on:
  Show dependency treegraph
Reported: 2018-01-09 11:53 UTC by Ruediger Oertel
Modified: 2019-04-12 15:07 UTC (History)
9 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Note You need to log in before you can comment on or make changes to this bug.
Description Ruediger Oertel 2018-01-09 11:53:26 UTC
current kernels (4.14, 4.15pre) crash oopsing after only a minute once the obs workers are active.


armbuild16 login: [  102.656417] Internal error: Oops: 80000207 [#1] PREEMPT SMP ARM
[  102.662336] Modules linked in: loop af_packet ipmi_si ipmi_devintf ipmi_msghandler xgmac highbank_mc_edac uio_pdrv_genirq uio sata_highbank xhci_hcd ohci_hcd ehci_hcd usbcore sg dm_multipath dm_mod dax scsi_dh_rdac scsi_dh_emc scsi_dh_alua
[  102.683610] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.15.0-rc7-1.gb07c570-lpae #1
[  102.691265] Hardware name: Highbank
[  102.694757] PC is at 0x401e54
[  102.697728] LR is at dbg_cpu_pm_notify+0x3c/0x48
[  102.702345] pc : [<00401e54>]    lr : [<c044516c>]    psr: 800301d3
[  102.708610] sp : ed55fe40  ip : ed55fe68  fp : ed55fe64
[  102.713834] r10: e93d7bc8  r9 : 800301d3  r8 : c1717434
[  102.719057] r7 : 00000000  r6 : 00448698  r5 : 0000001a  r4 : 0000001a
[  102.725581] r3 : 00000004  r2 : 00001000  r1 : ffffffff  r0 : 00000000
[  102.732108] Flags: Nzcv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment user
[  102.739415] Control: 30c5387d  Table: 28183700  DAC: 55555555
[  102.745163] Process swapper/2 (pid: 0, stack limit = 0x98bb84e1)
[  102.751168] Stack: (0xed55fe40 to 0xed560000)
[  102.755525] fe40: fffffffc 00000000 00000000 00000002 c160cd68 00000017 ed55fe74 ed55fe68
[  102.763705] fe60: c044516c c0444dd0 ed55fe9c ed55fe78 c04a5c0c c044513c c17aeb70 00000002
[  102.771883] fe80: 00000000 00000000 ffffffff 00000017 ed55fecc ed55fea0 c04a5ef8 c04a5ba0
[  102.780059] fea0: 00000000 c16ddf24 ed55fecc 00000002 ffffffff 00000000 c16ddf24 e698db20
[  102.788237] fec0: ed55fef4 ed55fed0 c0596ee4 c04a5ebc 00000000 c16dded4 c16ddf24 00000001
[  102.796416] fee0: 00000001 c16dded4 ed55ff04 ed55fef8 c0596f78 c0596eac ed55ff1c ed55ff08
[  102.804595] ff00: c0b6d200 c0596f60 c16ddf34 00000001 ed55ff5c ed55ff20 c0b69dcc c0b6d1dc
[  102.812772] ff20: c0b6bf8c c0b6bbd4 00000000 e93d7e18 0033338b 00000001 e93d7bc8 c16dded4
[  102.820951] ff40: 00000004 e93d7bc8 c16dded4 c1605e44 ed55ff7c ed55ff60 c0b6a1e8 c0b69cb4
[  102.829128] ff60: ffffe000 c1605dd4 c1605e38 00000004 ed55ff8c ed55ff80 c04cd83c c0b6a1b4
[  102.837302] ff80: ed55ffcc ed55ff90 c04cdb94 c04cd818 00000085 c108bea0 c16f21fd c1513d08
[  102.845478] ffa0: 00307000 00000085 00000002 30c0387d c17173d8 00307000 413fc0f2 00000000
[  102.853653] ffc0: ed55ffdc ed55ffd0 c04cdec8 c04cd9b8 ed55fff4 ed55ffe0 c043fe74 c04cdeac
[  102.861825] ffe0: 2d4f7cc0 00000000 00000000 ed55fff8 00401eac c043fcfc 00000000 00000000
[  102.870004] Code: bad PC value
[  102.873067] ---[ end trace 755fb07cdd72e990 ]---
[  102.877679] Kernel panic - not syncing: Attempted to kill the idle task!
[  102.884381] CPU1: stopping
[  102.887089] CPU: 1 PID: 39 Comm: kworker/1:1 Tainted: G      D          4.15.0-rc7-1.gb07c570-lpae #1
[  102.896300] Hardware name: Highbank
[  102.899789] Workqueue: events wait_rcu_exp_gp
[  102.904152] [<c0443eb0>] (unwind_backtrace) from [<c043ca28>] (show_stack+0x20/0x28)
[  102.911893] [<c043ca28>] (show_stack) from [<c0d0c8b8>] (dump_stack+0xb8/0xe4)
[  102.919114] [<c0d0c8b8>] (dump_stack) from [<c04404e8>] (handle_IPI+0x3bc/0x3cc)
[  102.926507] [<c04404e8>] (handle_IPI) from [<c0401890>] (gic_handle_irq+0x8c/0x90)
[  102.934072] [<c0401890>] (gic_handle_irq) from [<c0d2c63c>] (__irq_svc+0x5c/0x94)
[  102.941548] Exception stack(0xed69de00 to 0xed69de48)
[  102.946593] de00: c1605dd4 e93c74c0 c162ac40 00000001 ed69de60 00000003 c04fd13c 00000000
[  102.954764] de20: 00000003 00000002 c162ac40 ed69de9c 00000001 ed69de50 00000020 c052084c
[  102.962933] de40: 200f0013 ffffffff
[  102.966420] [<c0d2c63c>] (__irq_svc) from [<c052084c>] (smp_call_function_single+0xbc/0x1b8)
[  102.974854] [<c052084c>] (smp_call_function_single) from [<c04fbfd0>] (sync_rcu_exp_select_cpus+0x288/0x460)
[  102.984677] [<c04fbfd0>] (sync_rcu_exp_select_cpus) from [<c04fcac0>] (wait_rcu_exp_gp+0x20/0x34)
[  102.993547] [<c04fcac0>] (wait_rcu_exp_gp) from [<c049d6f0>] (process_one_work+0x208/0x584)
[  103.001895] [<c049d6f0>] (process_one_work) from [<c049e740>] (worker_thread+0x64/0x5c0)
[  103.009983] [<c049e740>] (worker_thread) from [<c04a3f78>] (kthread+0x16c/0x174)
[  103.017373] [<c04a3f78>] (kthread) from [<c0437cec>] (ret_from_fork+0x14/0x28)
[  103.024589] CPU0: stopping
[  103.027295] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D          4.15.0-rc7-1.gb07c570-lpae #1
[  103.036247] Hardware name: Highbank
[  103.039737] [<c0443eb0>] (unwind_backtrace) from [<c043ca28>] (show_stack+0x20/0x28)
[  103.047477] [<c043ca28>] (show_stack) from [<c0d0c8b8>] (dump_stack+0xb8/0xe4)
[  103.054697] [<c0d0c8b8>] (dump_stack) from [<c04404e8>] (handle_IPI+0x3bc/0x3cc)
[  103.062090] [<c04404e8>] (handle_IPI) from [<c0401890>] (gic_handle_irq+0x8c/0x90)
[  103.069656] [<c0401890>] (gic_handle_irq) from [<c0d2c63c>] (__irq_svc+0x5c/0x94)
[  103.077131] Exception stack(0xc1601e78 to 0xc1601ec0)
[  103.082176] 1e60:                                                       00000000 00000000
[  103.090347] 1e80: 27ea0000 200e0093 c16ddf34 00000001 c16dded4 c16ddf24 f44dad14 00000017
[  103.098519] 1ea0: e93b3bc8 c1601f04 c1601ec8 c1601ec8 c0b6a0b8 c0b6a0bc 200e0013 ffffffff
[  103.106700] [<c0d2c63c>] (__irq_svc) from [<c0b6a0bc>] (cpuidle_enter_state+0x414/0x4d0)
[  103.114790] [<c0b6a0bc>] (cpuidle_enter_state) from [<c0b6a1e8>] (cpuidle_enter+0x40/0x54)
[  103.123052] [<c0b6a1e8>] (cpuidle_enter) from [<c04cd83c>] (call_cpuidle+0x30/0x54)
[  103.130705] [<c04cd83c>] (call_cpuidle) from [<c04cdb94>] (do_idle+0x1e8/0x240)
[  103.138011] [<c04cdb94>] (do_idle) from [<c04cdec8>] (cpu_startup_entry+0x28/0x2c)
[  103.145578] [<c04cdec8>] (cpu_startup_entry) from [<c0d23fd0>] (rest_init+0xe0/0xe4)
[  103.153323] [<c0d23fd0>] (rest_init) from [<c1400ea0>] (start_kernel+0x454/0x460)
[  104.038862] SMP: failed to stop secondary CPUs
[  104.055120] Rebooting in 10 seconds..
[  115.212551] SMP: failed to stop secondary CPUs
Comment 1 Alexander Graf 2018-01-09 19:10:40 UTC
Let's ask Andre - he's the only other person I know of who has a Midway system.
Comment 2 Andre Przywara 2018-01-09 22:13:28 UTC
4.15-rc1 introduced an issue with the DT that is on the SPI flash, since that only advertises a 4KB GIC CPU interface. So KVM now refuses the VGIC functionality to guests, with QEMU reverting to its userland GIC emulation (hello RPi2!). Up until -rc4 the box would actually crash on booting due to two other bugs, but this is now fixed with -rc5 (f384dcfe4d918 and its parent).
Could you dump the boot messages, to see if the VGIC is usable? There are some explicit warnings if the DT is not correct.
So if this is the case, guests now no longer use the in-kernel VGIC, which is actually not well tested. In the long run we should debug and fix this, though it is easier (and much faster!) to actually fix the DT and get the VGIC back.
A quick workaround would consist in patching the DT loaded into DRAM in U-Boot:
> fdt addr $fdt_addr
> fdt set /soc/interrupt-controller reg <0xfff11000 0x1000 0xfff12000 0x2000 0xfff14000 0x2000 0xfff16000 0x2000>
One could put those two lines into the boot.scr, for instance.

Also I can provide instructions on how to update the firmware flash with the fixed DT. This is pretty low-risk, as we would just change the DT partition, which does not influence the rest of the firmware and thus has no potential to brick the box.

Let me know if that is the actual issue. I ran some easy testing with (fixed) -rc3 and lately -rc6 (with the VGIC), and didn't spot any obvious issues. Would be good to know your reproducer, if this problem still persists.
Comment 3 Ruediger Oertel 2018-01-11 16:23:07 UTC
hm booting 4.15.0-rc7 gives me:
[    0.000000] GIC: GICv2 detected, but range too small and irqchip.gicv2_force_probe not set
[    0.000000] GIC physical location is 0xfff11000

tried then "irqchip.gicv2_force_probe=1"

[    0.000000] GIC: GICv2 at 0x00000000fff12000, but range is too small (broken DT?), assuming 8kB
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] GIC physical location is 0xfff11000
[    2.390469] kvm [1]: vgic-v2@fff14000
[    2.394252] kvm [1]: vgic interrupt IRQ16

this time the cpu stalls when trying to boot a VM.

trying to put the two lines into boot.script and recreated boot.scr ...

# dmesg | grep -i gic
[    0.000000] Kernel command line: root=/dev/disk/by-id/ata-Hitachi_HTS727550A9E364_J3300080HDT93B-part2 loader=uboot disk=/dev/disk/by-id/ata-Hitachi_HTS727550A9E364_J3300080HDT93B resume=/dev/disk/by-id/ata-Hitachi_HTS727550A9E364_J3300080HDT93B-part3 plymouth.enable=0 console=ttyAMA0,115200n8 irqchip.gicv2_force_probe=1 panic=10
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] GIC physical location is 0xfff11000
[    2.381624] kvm [1]: vgic-v2@fff14000
[    2.385397] kvm [1]: vgic interrupt IRQ16

dropping the kernel param again:
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] GIC physical location is 0xfff11000
[    2.378315] kvm [1]: vgic-v2@fff14000
[    2.382097] kvm [1]: vgic interrupt IRQ16
Comment 4 Ruediger Oertel 2018-01-11 16:43:27 UTC
but it still never gets past starting a VM, machine gets stuck without any message
Comment 5 Andre Przywara 2018-01-11 16:59:12 UTC
Mmh, and this worked before? I tested VMs with a host 4.15-rc7, even with some early Spectre var2 workarounds, and didn't spot any issues.
Can you pinpoint to a kernel version that worked? 4.13?
Can you provide your QEMU command line and guest kernel information, so I could try this here?

And yes: irqchip.gicv2_force_probe=1 should have the same effect as the DT patching. I wouldn't recommend running 4.15 without any of these two workarounds, as using the userland GIC (and userland timer!) affects VM performance.
Comment 6 Ruediger Oertel 2018-01-12 09:14:27 UTC
yes it works, the host kernel we are currently running there is 4.11.1-1-lpae
most of the ones in between 4.12-4.14 simply crashed on boot. 4.15-rc is the
first one that I can get to boot again but as described hangs when starting
VMs (note that the guest kernels vary anyway, these are build jobs and the
guest kernel is basically the kernel of the distro it's building for).

example qemu cmdline:
/usr/bin/qemu-system-arm -nodefaults -no-reboot -nographic -vga none -enable-kvm -M virt -cpu host -object rng-random,filename=/dev/random,id=rng0 -device virtio-rng-device,rng=rng0 -runas qemu -mem-prealloc -mem-path /dev/hugepages -net none -kernel /var/cache/obs/worker/root_2/.mount/boot/kernel -initrd /var/cache/obs/worker/root_2/.mount/boot/initrd -append root=/dev/disk/by-id/virtio-0 rootfstype=ext4 rootflags=noatime panic=1 quiet no-kvmclock nmi_watchdog=0 rw rd.driver.pre=binfmt_misc elevator=noop console=ttyAMA0 init=/.build/build -m 1020 -drive file=/var/cache/obs/worker/root_2/root,format=raw,if=none,id=disk,serial=0,cache=unsafe -device virtio-blk-device,drive=disk -drive file=/var/cache/obs/worker/root_2/swap,format=raw,if=none,id=swap,serial=1,cache=unsafe -device virtio-blk-device,drive=swap -serial stdio -smp 1
Comment 7 Ruediger Oertel 2019-04-12 15:07:10 UTC
tried a different approach:
got fdt from http://ftp.informatik.uni-frankfurt.de/fedora/releases/28/Everything/armhfp/os/images/pxeboot/dtb/ecx-2000.dtb

setenv kernel 'zImage'
setenv initrd 'initrd'
setenv flavor 'midway'
setenv fdtfile 'ecx-2000.dtb'
setenv target 'boot'
setenv load_kernel 'ext2load ${bootdev} ${bootpart} ${kerneladdr} ${kernel}'
setenv load_initrd 'ext2load ${bootdev} ${bootpart} ${ramdiskaddr} ${initrd}; setenv rd_filesize ${sizeprefix}${filesize}'
setenv loadfdt 'ext2load ${bootdev} ${bootpart} ${fdtaddr} ${fdtfile}'
setenv fixfdt 'fdt addr $fdtaddr; fdt set /soc/interrupt-controller reg <0xfff11000 0x1000 0xfff12000 0x2000 0xfff14000 0x2000 0xfff16000 0x2000>'
setenv boot_kernel '${boottype} ${kerneladdr} ${ramdiskaddr}:${rd_filesize} ${fdtaddr}'
setenv doit 'if run load_kernel; then; run load_initrd; run loadfdt; run fixfdt; run boot_kernel; fi'
setenv bootcmd 'for bootdev in ${bootdevs}; do for unit in ${units}; do setenv bootpart; for bootpart in ${bootparts}; do setenv fix_bootpart "setenv bootpart $bootpart"; run fix_bootpart; run setdev; run doit; done; done; done'

tried using kernel 5.0.7-1-lpae ... but in the end I either see the same
crashes as initially or the machine locks up for a bit and then:
Apr 12 15:03:30 armbuild10 kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Apr 12 15:03:30 armbuild10 kernel: rcu:         1-...0: (217 ticks this GP) idle=95a/0/0x1 softirq=10854/10856 fqs=5981
Apr 12 15:03:30 armbuild10 kernel: rcu:         (detected by 3, t=12002 jiffies, g=13069, q=31)
Apr 12 15:03:30 armbuild10 kernel: Sending NMI from CPU 3 to CPUs 1:
Apr 12 15:03:30 armbuild10 kernel: rcu: rcu_preempt kthread starved for 1999 jiffies! g13069 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=2
Apr 12 15:03:30 armbuild10 kernel: rcu: RCU grace-period kthread stack dump:
Apr 12 15:03:30 armbuild10 kernel: rcu_preempt     R  running task        0    10      2 0x00000000
Apr 12 15:03:30 armbuild10 kernel: [<c0e2c124>] (__schedule) from [<c180a570>] (__per_cpu_offset+0x0/0x40)

okay, back to 4.11.1 that works ...