Bug 1142926 - Cannot boot Tumbleweed on HP Envy x360 15m-ds0011dx Kernel oops
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 Other
Priority: P5 - None
Severity: Critical
Assigned To: E-mail List
Reported: 2019-07-26 03:06 UTC by Gary Greene
Modified: 2019-08-12 10:06 UTC (History)
Attachments
photo of kernel oops message (3.89 MB, image/jpeg), 2019-07-26 03:06 UTC, Gary Greene
RTC CMOS clock hanging (4.06 MB, image/jpeg), 2019-07-26 23:44 UTC, Gary Greene
dmesg output (62.86 KB, text/plain), 2019-08-09 23:57 UTC, Gary Greene
dmesg output (79.72 KB, text/plain), 2019-08-10 05:12 UTC, Gary Greene

Description Gary Greene 2019-07-26 03:06:07 UTC
Created attachment 811678 [details]
photo of kernel oops message

When booting an HP Envy x360 15m-ds0011dx with Tumbleweed snapshot 20190721, the machine fails to boot: the kernel panics during startup.

The machine has an AMD Ryzen 5 3500U APU with Vega 10 graphics, 4 cores with hyperthreading, and 8 GB of RAM.

The kernel command line I need to use in order to boot is:
acpi=off idle=nomwait mitigations=off

Disabling ACPI, however, disables all but one CPU core and disables hyperthreading.

Attached is a photo of the oops.
Comment 1 Takashi Iwai 2019-07-26 10:05:35 UTC
Is this a regression from previous releases?
Comment 2 Takashi Iwai 2019-07-26 10:09:17 UTC
Also, try booting with the noapictimer option.  This should at least skip the APIC timer initialization, which is where the RIP in your kernel oops points.
Comment 3 Gary Greene 2019-07-26 23:36:53 UTC
This is a new machine, so I am unaware if openSUSE ever worked on this model of machine.

As requested, I've attempted to boot the machine with the noapictimer flag, and it _does_ get farther than before; however, it hangs (no oops) when attempting to set the system clock:

[ 1.931226] rtc_cmos 00:01: setting system clock to 2019-07-26T10:24:06 UTC (1564169046)

The flags passed on the linuxefi line are:

splash=0 verbose idle=nomwait mitigations=off noapictimer rescue=1

Note, I do see all cores trying to load up in this mode.

New photo capture will be attached shortly.
Comment 4 Gary Greene 2019-07-26 23:44:32 UTC
Created attachment 811802 [details]
RTC CMOS clock hanging
Comment 5 Takashi Iwai 2019-07-27 06:42:03 UTC
OK.  Then you'd better report it upstream.

But before that, check whether the issue has already been addressed in the latest upstream 5.3-rc kernel, available in the OBS Kernel:HEAD repo:
  http://download.opensuse.org/repositories/Kernel:/HEAD/standard/

If you'd like to try some older kernels, you can find them in TW history repos,
  http://download.opensuse.org/history/

and I have a collection of old kernels in my OBS repos, e.g.
  http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.1/standard/
  http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.0/standard/
etc.

When you deal with multiple kernels, I recommend editing /etc/zypp/zypp.conf to allow more kernels to be installed in parallel before installing the test kernels.  Check the multiversion.kernels line and add more entries there.
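
A minimal Python sketch of that zypp.conf change, for illustration only. The function name widen_multiversion_kernels, the extra entries latest-2 and latest-3, and the sample line are assumptions, not taken from this report; check the current value in your own /etc/zypp/zypp.conf first. The sketch works on a string, so nothing is written to disk.

```python
import re

# Matches an active (not commented-out) "multiversion.kernels = ..." line.
_KERNELS_LINE = re.compile(
    r"^([ \t]*multiversion\.kernels[ \t]*=[ \t]*)(.*)$", re.MULTILINE
)


def widen_multiversion_kernels(conf_text, extra=("latest-2", "latest-3")):
    """Return conf_text with the extra entries appended to multiversion.kernels.

    Entries that are already present are not duplicated.
    """

    def add_entries(match):
        entries = [e.strip() for e in match.group(2).split(",") if e.strip()]
        for entry in extra:
            if entry not in entries:
                entries.append(entry)
        return match.group(1) + ",".join(entries)

    return _KERNELS_LINE.sub(add_entries, conf_text)


# Sample input; the value shown is an assumed starting point.
sample = "multiversion.kernels = latest,latest-1,running\n"
print(widen_multiversion_kernels(sample), end="")
```

To apply it for real, read /etc/zypp/zypp.conf as root, pass the text through the function, and write the result back after keeping a backup copy.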
Comment 6 Gary Greene 2019-08-07 01:41:00 UTC
I've tested with 5.3-rc3 (5.3.0-rc3-1.g571863b-default) and am still getting the same errors when trying to boot with ACPI enabled.

At present, I'm able to get it booting using the following boot flags:

acpi=off pci=biosirq irq=nomwait mitigations=off

In this mode, the touchpad doesn't load, the touchscreen does not respond to touch events, and the CPU still only enables one core.

I've checked on the Kernel's bugzilla, however, I don't know which area would be the most appropriate to post the bug to with how split up the subsystems are on there. A pointer as to which would be the most appropriate would be appreciated.

Now that I'm booted into the system (even with degraded performance), are there any specific items you'd like me to collect for debugging purposes?
Comment 7 Jiri Slaby 2019-08-08 13:47:07 UTC
(In reply to Gary Greene from comment #4)
> Created attachment 811802 [details]
> RTC CMOS clock hanging

It might help to boot with:
initcall_debug
It should print the last called initialization function -- the one which never returns...
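
A rough Python sketch of how a captured log could be scanned for that function. It assumes the usual initcall_debug line shapes ("calling  <symbol> @ <pid>" and "initcall <symbol> returned <code> after <n> usecs"); the helper name unfinished_initcalls and the symbols in the sample log are hypothetical.

```python
import re

# Line shapes assumed from typical initcall_debug output.
CALLING = re.compile(r"calling\s+(\S+)(?:\s+\[\S+\])?\s+@\s+\d+")
RETURNED = re.compile(r"initcall\s+(\S+)(?:\s+\[\S+\])?\s+returned\s+-?\d+")


def unfinished_initcalls(log_text):
    """Return initcalls that were started but never logged a return, in order."""
    pending = []
    for line in log_text.splitlines():
        started = CALLING.search(line)
        if started:
            pending.append(started.group(1))
            continue
        finished = RETURNED.search(line)
        if finished and finished.group(1) in pending:
            pending.remove(finished.group(1))
    return pending


# Hypothetical log excerpt: foo_init returns, bar_init never does.
sample_log = """\
[    1.900000] calling  foo_init+0x0/0x20 @ 1
[    1.900100] initcall foo_init+0x0/0x20 returned 0 after 97 usecs
[    1.931000] calling  bar_init+0x0/0x40 @ 1
"""
print(unfinished_initcalls(sample_log))
```

On a hung boot the log has to come from a photo transcription or a serial/netconsole capture, since dmesg cannot be run on the affected system.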

Also, capturing the whole APIC timer init oops would be most helpful (using a higher resolution, a smaller font, or some other console).
Comment 8 Jiri Slaby 2019-08-08 14:42:26 UTC
(In reply to Gary Greene from comment #6)
> I've checked on the Kernel's bugzilla, however, I don't know which area
> would be the most appropriate to post the bug to with how split up the
> subsystems are on there. A pointer as to which would be the most appropriate
> would be appreciated.

I would perhaps go for platform / x86-64.

Anyway, global_clock_event is NULL on your platform. That means no usable timers  were found (no HPET nor PIT). That is very weird and full dmesg would definitely help.
Comment 9 Jiri Slaby 2019-08-09 13:16:18 UTC
(In reply to Jiri Slaby from comment #8)
> Anyway, global_clock_event is NULL on your platform. That means no usable
> timers  were found (no HPET nor PIT). That is very weird and full dmesg
> would definitely help.

A kernel fixing the apic crash is building here:
https://build.opensuse.org/project/monitor/home:jirislaby:bnc1142926

Could you try it once it finishes building?
Comment 10 Gary Greene 2019-08-09 23:56:30 UTC
Attached is my dmesg while booted with the following boot flags (without them, it plain won't boot):

BOOT_IMAGE=/boot/vmlinuz-5.3.0-rc3-1.g571863b-default root=UUID=46f08410-723b-43b5-8cd5-bab436e640ae splash=0 acpi=off pci=biosirq irq=nomwait mitigations=off
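
For reference, a small Python sketch that splits such a command line into parameters, which can help confirm which workaround flags were actually in effect for a given dmesg. The helper name parse_cmdline is hypothetical, and the splitting is deliberately simplified (it does not handle quoted values containing spaces).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags without "=" map to None. Only the first "=" separates the
    name from the value, so root=UUID=... keeps its full value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# The command line quoted in this comment.
cmdline = (
    "BOOT_IMAGE=/boot/vmlinuz-5.3.0-rc3-1.g571863b-default "
    "root=UUID=46f08410-723b-43b5-8cd5-bab436e640ae splash=0 "
    "acpi=off pci=biosirq irq=nomwait mitigations=off"
)
params = parse_cmdline(cmdline)
for name in ("acpi", "pci", "irq", "mitigations"):
    print(name, "=", params.get(name))
```

On a running system the same function can be fed the contents of /proc/cmdline.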

I'll try the linked kernel tomorrow after I get some stuff I need to get done for work completed.
Comment 11 Gary Greene 2019-08-09 23:57:22 UTC
Created attachment 813566 [details]
dmesg output
Comment 12 Gary Greene 2019-08-10 05:10:59 UTC
After installing that build of the kernel from your OBS home project, that did more than just fix the issue with the APIC timer screwing up. I now have all 4 cores/8 threads available.
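
One way to double-check the online CPU count from user space, sketched in Python. It assumes the standard sysfs file /sys/devices/system/cpu/online, whose content is a list such as "0-7" or "0,2-3"; the helper name parse_cpu_list is hypothetical.

```python
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-7" or "0,2-3" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        start, sep, end = part.partition("-")
        last = int(end) if sep else int(start)
        cpus.extend(range(int(start), last + 1))
    return cpus


# Illustrative inputs: all 8 threads online versus a single CPU.
print(len(parse_cpu_list("0-7")))
print(len(parse_cpu_list("0")))

# On a Linux system, read the real value if the sysfs file is present.
ONLINE_PATH = "/sys/devices/system/cpu/online"
if os.path.exists(ONLINE_PATH):
    with open(ONLINE_PATH) as handle:
        print("online CPUs:", len(parse_cpu_list(handle.read())))
```

With acpi=off the file would be expected to list a single CPU, and eight logical CPUs once all 4 cores/8 threads are up.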

I do see some errors from the ACPI layer that do indicate that there are some areas of the BIOS from HP that are buggy, but at this time, the machine seems to be working without issue.

I'll upload the current dmesg for comparison to the previous.
Comment 13 Gary Greene 2019-08-10 05:12:57 UTC
Created attachment 813577 [details]
dmesg output
Comment 14 Jiri Slaby 2019-08-12 10:06:48 UTC
Pushed to stable.