Bug 1172886 - Kernel 4.12.14-lp151.28.52-default is unable to start on some hardware
Status: RESOLVED WONTFIX
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.1
Hardware: x86-64 Other
Priority: P1 - Urgent
Severity: Critical (5 votes)
Assigned To: Borislav Petkov
Reported: 2020-06-13 11:22 UTC by Andreas Ernst
Modified: 2020-06-15 10:20 UTC

Attachments
Screenshot of Kernelpanic (116.45 KB, image/png), 2020-06-13 11:22 UTC, Andreas Ernst
Screenshot of Kernelpanic with BootOptions (73.18 KB, image/png), 2020-06-14 08:02 UTC, Andreas Ernst
Screenshot of BootOptions (39.07 KB, image/png), 2020-06-14 08:02 UTC, Andreas Ernst
dmesg (47.56 KB, text/plain), 2020-06-14 10:30 UTC, Andreas Ernst
cpuid (9.71 KB, text/plain), 2020-06-14 10:31 UTC, Andreas Ernst
hwinfo of a hardware setup that does not start with kernel 4.12.14-lp151.28.52-default (916.74 KB, text/plain), 2020-06-14 21:20 UTC, Simon Wood

Description Andreas Ernst 2020-06-13 11:22:41 UTC
Created attachment 838763 [details]
Screenshot of Kernelpanic

Kernel panic with Linux mail 4.12.14-lp151.28.52-default #1 SMP Wed Jun 10 15:32:08 UTC 2020 (464fb5f) x86_64 x86_64 x86_64 GNU/Linux

Kernel Linux mail 4.12.14-lp151.28.48-default #1 SMP Fri Apr 17 05:38:36 UTC 2020 (18849d1) x86_64 x86_64 x86_64 GNU/Linux works fine.

I can only provide a screenshot from the root server.
Comment 1 Takashi Iwai 2020-06-14 07:35:49 UTC
This might rather be caused by the latest ucode-intel firmware update.
Could you try booting with the dis_ucode_ldr boot option?
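Whether the option actually reached the kernel can be checked from a booted system. A minimal sketch, assuming a Linux system where /proc/cmdline exposes the running kernel's command line; the helper name has_boot_option is made up for illustration:

```python
from pathlib import Path


def has_boot_option(cmdline: str, option: str) -> bool:
    """Return True if `option` is one of the whitespace-separated parameters.

    Works for bare flags such as dis_ucode_ldr and for key=value parameters
    when the full "key=value" string is passed in.
    """
    return option in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        present = has_boot_option(proc_cmdline.read_text(), "dis_ucode_ldr")
        print("dis_ucode_ldr:", "present" if present else "absent")
```

The comparison is on whole parameters, so a longer parameter that merely contains the string does not count as a match.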
Comment 2 Andreas Ernst 2020-06-14 08:01:58 UTC
I hope I did it right.
Comment 3 Andreas Ernst 2020-06-14 08:02:29 UTC
Created attachment 838765 [details]
Screenshot of Kernelpanic with BootOptions
Comment 4 Andreas Ernst 2020-06-14 08:02:51 UTC
Created attachment 838766 [details]
Screenshot of BootOptions
Comment 5 Takashi Iwai 2020-06-14 10:00:46 UTC
Thanks for the quick test.
Then it's not the firmware upgrade but indeed a regression in the kernel.

Could you boot again with the old kernel, run hwinfo and attach the output to Bugzilla?  The bug is very likely specific to the hardware, and we need details.

Reassigned to Boris.
Comment 6 Andreas Ernst 2020-06-14 10:17:16 UTC
It's too long for pasting here:

https://pastebin.com/Xf8uyScU

These are VServers with dedicated hard disks. I have never had such an issue with these servers.
Comment 7 Borislav Petkov 2020-06-14 10:23:25 UTC
Interesting:

  2b:*	0f 01 c9             	mwait  %eax,%ecx		<-- trapping instruction

provided I've typed the Code: line right.
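Retyping the Code: line by hand is error-prone, so the same check can be scripted. A rough sketch, assuming the usual oops layout in which the trapping byte is wrapped in angle brackets; the function name and the sample line in the usage note are hypothetical, not taken from the screenshot. The opcode bytes 0f 01 c8 (MONITOR) and 0f 01 c9 (MWAIT) are the encodings given in the Intel SDM:

```python
import re

# Three-byte encodings of the two instructions of interest.
KNOWN_OPCODES = {
    (0x0F, 0x01, 0xC8): "monitor",
    (0x0F, 0x01, 0xC9): "mwait",
}


def trapping_instruction(code_line: str):
    """Return (byte_offset, name) for the byte marked <xx> on a "Code:" line.

    name is None when the bytes at the marker are not in KNOWN_OPCODES.
    Returns None when the line carries no <xx> marker at all.
    """
    code_bytes = []
    trap_offset = None
    for token in code_line.replace("Code:", "").split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            trap_offset = len(code_bytes)
            code_bytes.append(int(marked.group(1), 16))
        elif re.fullmatch(r"[0-9a-fA-F]{2}", token):
            code_bytes.append(int(token, 16))
    if trap_offset is None:
        return None
    opcode = tuple(code_bytes[trap_offset:trap_offset + 3])
    return trap_offset, KNOWN_OPCODES.get(opcode)
```

For a made-up line such as "Code: 31 d2 48 89 c8 <0f> 01 c9 5d c3", the function returns (5, "mwait"). A full disassembly still needs the kernel's scripts/decodecode or a similar tool.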

Ok, questions:

* this box worked before with previous kernels?

* if so, pls boot it in a working kernel and upload full dmesg by doing

# dmesg > dmesg.log

* also, get the cpuid[1] tool (leap should have it) and do

# cpuid -r > cpuid.log

and upload that log too pls.

That should be for now.

Thx.

[1]: https://software.opensuse.org/package/cpuid
Comment 8 Andreas Ernst 2020-06-14 10:30:43 UTC
Created attachment 838767 [details]
dmesg
Comment 9 Andreas Ernst 2020-06-14 10:31:02 UTC
Created attachment 838768 [details]
cpuid
Comment 10 Borislav Petkov 2020-06-14 10:55:22 UTC
Wait a minute. Is that kernel running as a guest on some Parallels hypervisor which says it is KVM?!?

[    0.000000] DMI: Parallels Software International Inc. Parallels Virtual Platform/Parallels Virtual Platform, BIOS 6.12.26096.1233688 08/07/2019
[    0.000000] Hypervisor detected: KVM

In any case, try booting with "idle=nomwait".
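The mismatch between the DMI vendor and the detected hypervisor can be pulled out of a saved dmesg.log automatically. This is only a sketch: the function names are invented here, and the Parallels-specific check is a heuristic modelled on the two log lines quoted above, not a general detection method:

```python
import re


def platform_info(dmesg_text: str) -> dict:
    """Extract the first "DMI:" and "Hypervisor detected:" messages from dmesg."""
    info = {"dmi": None, "hypervisor": None}
    for line in dmesg_text.splitlines():
        # Drop a leading "[    0.000000] " style timestamp if there is one.
        msg = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line)
        if msg.startswith("DMI:") and info["dmi"] is None:
            info["dmi"] = msg[len("DMI:"):].strip()
        elif msg.startswith("Hypervisor detected:") and info["hypervisor"] is None:
            info["hypervisor"] = msg[len("Hypervisor detected:"):].strip()
    return info


def parallels_reported_as_other(info: dict) -> bool:
    """True when DMI names Parallels but the kernel detected another hypervisor."""
    dmi = (info["dmi"] or "").lower()
    hypervisor = (info["hypervisor"] or "").lower()
    return "parallels" in dmi and hypervisor != "" and "parallels" not in hypervisor
```

Run against the two lines quoted in this comment, platform_info() reports the hypervisor as KVM and parallels_reported_as_other() returns True.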
Comment 11 Simon Wood 2020-06-14 19:38:43 UTC
After updating to the *.52 I was also no longer able to boot. The previous kernel (*.48) wouldn't boot anymore either until I added the dis_ucode_ldr boot option, as recommended by Takashi Iwai.
I don't get any error messages when the boot fails. The system just hangs. Is there a boot option I should specify, that would produce log messages that could be helpful here?
Comment 12 Borislav Petkov 2020-06-14 21:17:00 UTC
(In reply to Simon Wood from comment #11)
> After updating to the *.52 I was also no longer able to boot.

Please open a separate bug and upload dmesg from a booting kernel there.
Comment 13 Simon Wood 2020-06-14 21:20:41 UTC
Created attachment 838778 [details]
hwinfo of a hardware setup that does not start with kernel 4.12.14-lp151.28.52-default

This is the output of hwinfo. After downgrading intel-ucode to version 20191115-lp151.2.21.1 I can boot kernel 4.12.14-lp151.28.52-default.
Comment 14 Borislav Petkov 2020-06-15 06:28:51 UTC
(In reply to Simon Wood from comment #13)
> Created attachment 838778 [details]
> hwinfo of a hardware setup that does not start with kernel
> 4.12.14-lp151.28.52-default

I asked you to open a *separate* bug instead of hijacking this one.

> This is the output of hwinfo. After downgrading intel-ucode to version
> 20191115-lp151.2.21.1 I can boot kernel 4.12.14-lp151.28.52-default.

And yes, you're the next one affected by faulty microcode. There's nothing we can do about that.
Comment 15 Andreas Ernst 2020-06-15 07:01:23 UTC
Ok, booting with idle=nomwait solved the issue:

Linux dyndns 4.12.14-lp151.28.52-default #1 SMP Wed Jun 10 15:32:08 UTC 2020 (464fb5f) x86_64 x86_64 x86_64 GNU/Linux

Should I always keep this option, or can I remove it once a new kernel is available?
Comment 16 Borislav Petkov 2020-06-15 10:20:19 UTC
(In reply to Andreas Ernst from comment #15)
> Ok, this solved the issue:
> 
> Linux dyndns 4.12.14-lp151.28.52-default #1 SMP Wed Jun 10 15:32:08 UTC 2020
> (464fb5f) x86_64 x86_64 x86_64 GNU/Linux
> 
> Should i keep this Option always? Or can i remove this Option, if a new
> Kernel is available?

Keep it for as long as you're using Parallels and they haven't fixed it. It looks like they're reporting CPUID(5).ECX=0x3, and the kernel tries to use it, but the instruction #GPs because, well, virtualization. And KVM is probably fine...
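For reference, the two low bits of CPUID leaf 5 ECX can be decoded as sketched below. Per the Intel SDM, bit 0 means the MONITOR/MWAIT extensions are enumerated and bit 1 means interrupts are treated as break events for MWAIT even when interrupts are disabled. The constant names follow the Linux kernel's asm/mwait.h; the function names are hypothetical, and the line layout expected from `cpuid -r` is an assumption about that tool's raw output:

```python
import re

CPUID5_ECX_EXTENSIONS_SUPPORTED = 0x1  # bit 0: MONITOR/MWAIT extensions enumerated
CPUID5_ECX_INTERRUPT_BREAK = 0x2       # bit 1: interrupts break MWAIT even if masked


def decode_leaf5_ecx(ecx: int) -> dict:
    """Decode the two architecturally defined low bits of CPUID(5).ECX."""
    return {
        "extensions_enumerated": bool(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED),
        "interrupt_break_event": bool(ecx & CPUID5_ECX_INTERRUPT_BREAK),
    }


def leaf5_ecx_from_raw(raw_dump: str):
    """Return ECX of leaf 0x00000005 from `cpuid -r` style text, or None.

    Assumes lines shaped like
    "   0x00000005 0x00: eax=0x... ebx=0x... ecx=0x... edx=0x...".
    """
    pattern = re.compile(
        r"^\s*0x00000005\s+0x00:.*?\becx=0x([0-9a-fA-F]{8})", re.MULTILINE
    )
    match = pattern.search(raw_dump)
    return int(match.group(1), 16) if match else None
```

With ECX=0x3 both bits are set, which is the combination mentioned above. Only the first matching leaf 5 line is read, so on a multi-CPU dump this reflects the first CPU listed.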

Closing.

(Clear stale NEEDINFO.)