Bug 1104777 - Xen Dom0 Boot crashes with unhandled invalid opcode fault - fix may exist in kernel 4.15.5
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Distribution
Component: Xen
Version: Leap 15.0
Hardware: x86-64 Other
Importance: P5 - None : Critical
Assigned To: Jürgen Groß
Depends on:
Blocks:
Reported: 2018-08-14 08:33 UTC by Stephen Halpin
Modified: 2022-03-04 20:51 UTC (History)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
This is the serial port output after adding "-- console com1" and entering <F10> to boot (9.88 KB, text/plain)
2018-08-14 08:35 UTC, Stephen Halpin
hwinfo output (351.52 KB, text/plain)
2018-08-14 08:36 UTC, Stephen Halpin

Description Stephen Halpin 2018-08-14 08:33:41 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Build Identifier: 

Clean installs of Leap 15.0 with the latest patches applied always crash when attempting to boot into the Xen Dom0 world on Dell R210II servers, but always succeed in booting the non-virtualized kernel.

Some Googling based on the serial log output led me to this post about a fix applied to the Xen code, which another part of the thread said ended up in 4.15.5:

    https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00158.html

The first link near the bottom of the post describes the problem as the compiler referencing the %gs register before it's initialized, and the second link points to the fix.  Later parts of the thread indicate the fix went into 4.15.5.

It should be noted that the issue the fix author (Juergen) had was a page fault, whereas the original poster (Ajay) had an unhandled invalid opcode fault like the one I'm seeing.  This post, which includes dialog from both Ajay and Juergen, suggests this patch may have solved the invalid opcode issue:

    https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00294.html
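
Since the thread places the upstream fix in 4.15.5, a quick way to compare a kernel release string against that version might look like the sketch below. The function names are made up for illustration, and a distribution kernel that carries backported patches cannot be judged by its version number alone, so a False result here does not prove the fix is missing.

```python
import re

# Upstream release that, according to the thread, carries the fix.
FIX_VERSION = (4, 15, 5)


def parse_kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a string such as `uname -r` output."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. "4.16") is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def has_upstream_fix(release: str) -> bool:
    """True if the release is at or beyond the upstream version with the fix."""
    return parse_kernel_version(release) >= FIX_VERSION
```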

As an aside, you'll see a comment in the console log about missing "iommu_inclusive_mapping=1".  I've added that every way I could think of, then tried a second round with "iommu=1" added as well.  I later found that these are supposed to be the defaults in Xen 4.10, so this may be a red herring:

     http://xenbits.xen.org/docs/4.10-testing/misc/xen-command-line.html
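
To double-check which of these options are actually present in a given option string, the string can be split and inspected. This is a minimal sketch, assuming the hypervisor options are available as a plain space-separated string; the helper name is hypothetical, and the real Xen parser handles more forms than this does.

```python
def parse_xen_options(cmdline: str) -> dict:
    """Split a space-separated option string into {name: value}.

    Options given without '=' map to None.
    """
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options


# The two options tried in this report.
opts = parse_xen_options("iommu=1 iommu_inclusive_mapping=1")
print(opts.get("iommu_inclusive_mapping"))
```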

Reproducible: Always

Steps to Reproduce:
1. Install Leap 15.0 on a 250GB SSD with a 512MB /boot/efi partition and the rest as an LVM partition.  A 20GB root (/) and 3GB swap are created in the LVM partition - NO OTHER partitions or DomU configuration is used.
2. Choose the KDE Desktop option.
3. Change the Software selections to add "Console Tools," "Xen Virtual Machine/Host Server," "Help and Support Documentation," "Documentation" and "Network Administration," and to remove "Multimedia," "Office Software," "Graphics," and "Games."  Also add the individual RPMs "vlan," "bridge-utils" and "patch."
4. Use YaST Online Update to bring in any updates.  Note that to use YaST for the updates the machine was booted into the non-VM kernel after the installation.
5. Disable the firewall and enable SSH.
6. Boot into the Xen kernel.  It's this boot (and any successive boot) into the Xen kernel that dies.  I can always go back to the non-VM kernel with no problems.
Actual Results:  
The machine crashes.  The instructions below suggest I should add my captured logs as attachments after the bug is created so that's where the detail will appear.

Expected Results:  
A boot to at least a console prompt if not a full KDE login window.

I'll attach the console log I pulled off the serial port by adding "-- console com1" to the "chainloader" line and a hwinfo log.
Comment 1 Stephen Halpin 2018-08-14 08:35:28 UTC
Created attachment 779669 [details]
This is the serial port output after adding "-- console com1" and entering <F10> to boot
Comment 2 Stephen Halpin 2018-08-14 08:36:18 UTC
Created attachment 779670 [details]
hwinfo output
Comment 3 Stephen Halpin 2018-08-14 08:49:13 UTC
BTW, to make sure the BIOS and associated microcode weren't the issue I upgraded to Dell's latest BIOS released on July 16, 2018.  The attachments were generated with the latest BIOS.

Also, the root partition is ext4, as the delays in cleanup in btrfs on small partitions during installation result in transient "disk full" errors.
Comment 4 Stephen Halpin 2018-08-14 08:53:13 UTC
Apparently I edited out the fact this is in UEFI mode.  Sorry.
Comment 5 Jürgen Groß 2018-08-14 08:55:55 UTC
Right, the backport of the mentioned patch is missing. Doing that right now.
Comment 6 Jürgen Groß 2018-08-14 09:23:53 UTC
I have added the patch to the kernel.
Comment 7 Swamp Workflow Management 2018-08-14 19:03:35 UTC
This is an autogenerated message for OBS integration:
This bug (1104777) was mentioned in
https://build.opensuse.org/request/show/629278 15.0 / kernel-source
Comment 9 Swamp Workflow Management 2018-08-17 10:45:56 UTC
openSUSE-SU-2018:2407-1: An update that solves 12 vulnerabilities and has 60 fixes is now available.

Category: security (important)
Bug References: 1065600,1081917,1083647,1086288,1086314,1086315,1086317,1086327,1086331,1086906,1087081,1087092,1089343,1090888,1097104,1097577,1097808,1099811,1099813,1099844,1099845,1099846,1099849,1099863,1099864,1100132,1101116,1101828,1101832,1101833,1101837,1101839,1101841,1101843,1101844,1101845,1101847,1101852,1101853,1101867,1101872,1101874,1101875,1101882,1101883,1101885,1101887,1101890,1101891,1101893,1101895,1101896,1101900,1101902,1101903,1102340,1103097,1103269,1103277,1103363,1103445,1103886,1104066,1104211,1104319,1104353,1104365,1104427,1104494,1104495,1104708,1104777
CVE References: CVE-2018-10853,CVE-2018-10876,CVE-2018-10877,CVE-2018-10878,CVE-2018-10879,CVE-2018-10880,CVE-2018-10881,CVE-2018-10882,CVE-2018-10883,CVE-2018-3620,CVE-2018-3646,CVE-2018-5391
Sources used:
openSUSE Leap 15.0 (src):    kernel-debug-4.12.14-lp150.12.16.1, kernel-default-4.12.14-lp150.12.16.1, kernel-docs-4.12.14-lp150.12.16.1, kernel-kvmsmall-4.12.14-lp150.12.16.1, kernel-obs-build-4.12.14-lp150.12.16.1, kernel-obs-qa-4.12.14-lp150.12.16.1, kernel-source-4.12.14-lp150.12.16.1, kernel-syms-4.12.14-lp150.12.16.1, kernel-vanilla-4.12.14-lp150.12.16.1
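
To check whether an installed kernel package is at least the release listed above, the version-release string can be compared component by component. This is a rough sketch, assuming Leap 15.0 strings of the form `4.12.14-lp150.12.16.1`; the helper names are hypothetical, and this is not a full RPM version comparison.

```python
import re

# Version-release of kernel-default taken from the update notice above.
FIXED_RELEASE = "4.12.14-lp150.12.16.1"


def numeric_key(version_release: str) -> tuple:
    """Turn '4.12.14-lp150.12.16.1' into a tuple of its integer components."""
    return tuple(int(n) for n in re.findall(r"\d+", version_release))


def includes_update(installed: str) -> bool:
    """True if the installed version-release is at or beyond FIXED_RELEASE."""
    return numeric_key(installed) >= numeric_key(FIXED_RELEASE)
```

Components are compared as integers rather than as strings so that a release component such as 16 sorts after 7.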
Comment 10 Swamp Workflow Management 2018-08-20 13:25:27 UTC
SUSE-SU-2018:2450-1: An update that solves 12 vulnerabilities and has 88 fixes is now available.

Category: security (important)
Bug References: 1051510,1051979,1065600,1066110,1077761,1081917,1083647,1086274,1086288,1086314,1086315,1086317,1086327,1086331,1086906,1087081,1087092,1089343,1090888,1097104,1097577,1097808,1099811,1099813,1099844,1099845,1099846,1099849,1099858,1099863,1099864,1100132,1101116,1101331,1101669,1101822,1101828,1101832,1101833,1101837,1101839,1101841,1101843,1101844,1101845,1101847,1101852,1101853,1101867,1101872,1101874,1101875,1101882,1101883,1101885,1101887,1101890,1101891,1101893,1101895,1101896,1101900,1101902,1101903,1102633,1102658,1103097,1103269,1103277,1103356,1103363,1103421,1103445,1103517,1103723,1103724,1103725,1103726,1103727,1103728,1103729,1103730,1103886,1103917,1103920,1103948,1103949,1104066,1104111,1104174,1104211,1104319,1104353,1104365,1104427,1104494,1104495,1104708,1104777,1104897
CVE References: CVE-2018-10853,CVE-2018-10876,CVE-2018-10877,CVE-2018-10878,CVE-2018-10879,CVE-2018-10880,CVE-2018-10881,CVE-2018-10882,CVE-2018-10883,CVE-2018-3620,CVE-2018-3646,CVE-2018-5391
Sources used:
SUSE Linux Enterprise Module for Public Cloud 15 (src):    kernel-azure-4.12.14-5.13.1, kernel-source-azure-4.12.14-5.13.1, kernel-syms-azure-4.12.14-5.13.1
Comment 12 Swamp Workflow Management 2018-08-28 16:17:02 UTC
SUSE-SU-2018:2538-1: An update that solves four vulnerabilities and has 52 fixes is now available.

Category: security (important)
Bug References: 1046305,1046306,1046307,1051510,1065600,1081917,1083647,1086288,1086315,1086317,1086327,1086331,1086906,1087092,1090888,1097104,1097577,1097583,1097584,1097585,1097586,1097587,1097588,1097808,1100132,1101480,1101669,1101822,1102517,1102715,1103269,1103277,1103363,1103445,1103886,1104353,1104365,1104427,1104482,1104494,1104495,1104683,1104708,1104777,1104890,1104897,1105292,1105296,1105322,1105355,1105378,1105396,1105467,1105731,802154,971975
CVE References: CVE-2018-10853,CVE-2018-10902,CVE-2018-15572,CVE-2018-9363
Sources used:
SUSE Linux Enterprise Module for Live Patching 15 (src):    kernel-default-4.12.14-25.16.1, kernel-livepatch-SLE15_Update_4-1-1.3.1
Comment 13 Swamp Workflow Management 2018-08-28 16:25:56 UTC
SUSE-SU-2018:2539-1: An update that solves four vulnerabilities and has 52 fixes is now available.

Category: security (important)
Bug References: 1046305,1046306,1046307,1051510,1065600,1081917,1083647,1086288,1086315,1086317,1086327,1086331,1086906,1087092,1090888,1097104,1097577,1097583,1097584,1097585,1097586,1097587,1097588,1097808,1100132,1101480,1101669,1101822,1102517,1102715,1103269,1103277,1103363,1103445,1103886,1104353,1104365,1104427,1104482,1104494,1104495,1104683,1104708,1104777,1104890,1104897,1105292,1105296,1105322,1105355,1105378,1105396,1105467,1105731,802154,971975
CVE References: CVE-2018-10853,CVE-2018-10902,CVE-2018-15572,CVE-2018-9363
Sources used:
SUSE Linux Enterprise Workstation Extension 15 (src):    kernel-default-4.12.14-25.16.1
SUSE Linux Enterprise Module for Legacy Software 15 (src):    kernel-default-4.12.14-25.16.1
SUSE Linux Enterprise Module for Development Tools 15 (src):    kernel-docs-4.12.14-25.16.1, kernel-obs-build-4.12.14-25.16.1, kernel-source-4.12.14-25.16.1, kernel-syms-4.12.14-25.16.1, kernel-vanilla-4.12.14-25.16.1, lttng-modules-2.10.0-5.6.1
SUSE Linux Enterprise Module for Basesystem 15 (src):    kernel-default-4.12.14-25.16.1, kernel-source-4.12.14-25.16.1, kernel-zfcpdump-4.12.14-25.16.1
SUSE Linux Enterprise High Availability 15 (src):    kernel-default-4.12.14-25.16.1
Comment 14 Jürgen Groß 2018-09-19 06:20:31 UTC
Closing the bug, patch is in the kernel.