Bug 1168645 - Kernel panic with kernel 5.6.0 while booting
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None
Severity: Normal
Assigned To: E-mail List
Reported: 2020-04-05 02:38 UTC by Neil Rickert
Modified: 2020-04-20 08:03 UTC (History)

Attachments
Screenshot showing kernel error message (24.35 KB, image/png)
2020-04-05 02:39 UTC, Neil Rickert
Another screenshot with kernel message (23.49 KB, image/png)
2020-04-05 02:40 UTC, Neil Rickert

Description Neil Rickert 2020-04-05 02:38:34 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Build Identifier: 

Note that this is only happening with 32-bit UEFI (ia32) firmware.  Kernel 5.6.0 is doing fine with 64-bit UEFI firmware and with a legacy BIOS.  I will upload a couple of screenshots showing the problem.

This is likely to be related to bug 1167933 (which is with a 5.5 kernel).

I see this in KVM with the OVMF firmware "/usr/share/qemu/ovmf-ia32-code.bin".  I suppose it could be a bug in the OVMF firmware, but it seems strange that a firmware bug could cause a kernel panic.  I do not have any physical machines with 32-bit UEFI, so I can only test this with virtualization.

Perhaps it is not important.  I'll leave that for others to judge.


Reproducible: Always
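
For anyone who wants to try this outside libvirt, here is a minimal sketch of a QEMU command line that boots a 64-bit guest with the 32-bit OVMF image named above. Only the firmware path comes from this report. The disk image name, the memory size, and the helper name build_qemu_argv are assumptions for illustration, and a writable UEFI variable store is left out, so it may need to be added.

```python
import shlex


def build_qemu_argv(firmware, disk, memory_mb=2048):
    """Return a QEMU argv for a 64-bit guest booted by the given UEFI firmware image."""
    return [
        "qemu-system-x86_64",  # 64-bit CPU; with ia32 firmware this is mixed mode
        "-enable-kvm",
        "-m", str(memory_mb),
        # The firmware code image is attached as a read-only pflash device.
        "-drive", f"if=pflash,format=raw,readonly=on,file={firmware}",
        # Hypothetical guest disk; replace with an installed Tumbleweed image.
        "-drive", f"file={disk},format=qcow2,if=virtio",
    ]


# The disk name below is a placeholder, not something taken from the report.
argv = build_qemu_argv("/usr/share/qemu/ovmf-ia32-code.bin", "tumbleweed.qcow2")
print(shlex.join(argv))
```

The script only prints the command, so it can be reviewed and adjusted before anything is run.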
Comment 1 Neil Rickert 2020-04-05 02:39:34 UTC
Created attachment 834882 [details]
Screenshot showing kernel error message
Comment 2 Neil Rickert 2020-04-05 02:40:48 UTC
Created attachment 834883 [details]
Another screenshot with kernel message
Comment 3 Neil Rickert 2020-04-05 16:19:33 UTC
As a comparison, I booted Fedora 32 Beta
 Fedora-Workstation-Live-x86_64-32_Beta-1.2.iso

That boots just fine in the same VM, with a 5.6.0 kernel (and ia32 efi booting).
Comment 4 Jiri Slaby 2020-04-06 08:15:03 UTC
That bug is:
  BUG_ON(mm != current->active_mm);
in exit_mm. That is really weird.
Comment 5 Borislav Petkov 2020-04-06 08:23:26 UTC
Possible, if EFI runtime calls are done in the context of an mm now. I believe there was work in that direction, see struct mm_struct efi_mm...
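
To make that hypothesis concrete, here is a toy Python model (not kernel code) of the invariant checked in exit_mm. It assumes, as suggested above, that an EFI runtime call temporarily switches the task's active_mm to efi_mm, and that a fault inside the call kills the task before the switch is undone. The class and function names are invented for illustration.

```python
class Task:
    """Toy stand-in for task_struct with only the two fields of interest."""

    def __init__(self, mm):
        self.mm = mm
        self.active_mm = mm


EFI_MM = "efi_mm"  # stands in for the kernel's struct mm_struct efi_mm


def exit_mm(task):
    """Mimic the check BUG_ON(mm != current->active_mm)."""
    if task.mm != task.active_mm:
        raise RuntimeError("BUG: mm != current->active_mm")


def efi_call(task, firmware_faults):
    """Run a pretend EFI runtime call under efi_mm."""
    task.active_mm = EFI_MM       # switch to the EFI address space
    if firmware_faults:
        exit_mm(task)             # the task dies before switching back
    task.active_mm = task.mm      # normal path: switch back


t = Task(mm="user_mm")
efi_call(t, firmware_faults=False)
exit_mm(t)                        # passes: active_mm was restored

try:
    efi_call(Task(mm="user_mm"), firmware_faults=True)
except RuntimeError as err:
    print(err)
```

In this model the check can only fail when the task exits in the middle of the call, which would make the BUG_ON a follow-on symptom of some earlier failure rather than the root cause.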
Comment 6 Jiri Slaby 2020-04-06 08:34:38 UTC
(In reply to Neil Rickert from comment #3)
> As a comparison, I booted Fedora 32 Beta
>  Fedora-Workstation-Live-x86_64-32_Beta-1.2.iso
> 
> That boots just fine in the same VM, with a 5.6.0 kernel (and ia32 efi
> booting).

Perhaps the live dvd does not run systemd-hibernate.service -- which is the process killing the kernel at its exit()...
Comment 7 Neil Rickert 2020-04-06 18:03:35 UTC
>Perhaps the live dvd does not run systemd-hibernate.service

Actually, I created another VM for testing bug 1167933 (using Tumbleweed snapshot 20200331, and virt-install to select the ia32 efi firmware).  I updated that to kernel 5.6.0.  And the same crash occurs there.

I then cloned that VM, and installed Fedora 32 to that cloned VM.  It is booting normally with kernel 5.6.0.  I get this output:

# systemctl status systemd-hibernate
● systemd-hibernate.service - Hibernate
     Loaded: loaded (/usr/lib/systemd/system/systemd-hibernate.service; static;>
     Active: inactive (dead)
       Docs: man:systemd-suspend.service(8)

I think that means that the installed Fedora 32 does have systemd-hibernate.service (the unit is loaded, though currently inactive).

Note that "efibootmgr -v" shows that it is booting with "\EFI\fedora\shimia32.efi".
Comment 8 Jiri Slaby 2020-04-08 07:44:01 UTC
The preceding error is more interesting:

> BUG: unable to handle page fault for address: 000000001557ee88
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0003) - permissions violation
> PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1 
> Oops: 0003 [#1] SMP PTI
> CPU: 1 PID: 191 Comm: systemd-escape Not tainted 5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> RIP: 0008:0x3d2eed95
> Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03 be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
> RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
> R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  00007f36ee9dc940(0000) GS:ffff9b903d700000(0000) knlGS:0000000000000000
> CS:  0008 DS: 0018 ES: 0018 CR0: 0000000080050033
> CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> Modules linked in: efivarfs
> CR2: 000000001557ee88

The BUG at exit.c is only a result of that page fault.
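
As a reading aid for the trace above, here is a small sketch that decodes the page-fault error code and the flag bits of the PMD entry. The bit meanings are the architectural x86 ones; the helper names are made up for this example, and only the low nine flag bits of the entry are examined.

```python
# x86 page-fault error code bits: (meaning if set, meaning if clear).
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write access", "read access"),
    2: ("user mode", "supervisor mode"),
    3: ("reserved bit set", None),
    4: ("instruction fetch", None),
}

# Low flag bits of an x86-64 page-table entry.
PTE_FLAGS = {0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
             5: "ACCESSED", 6: "DIRTY", 7: "PSE", 8: "GLOBAL"}


def decode_pf_error(code):
    """Return a list of human-readable properties for a #PF error code."""
    out = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        text = if_set if code & (1 << bit) else if_clear
        if text:
            out.append(text)
    return out


def decode_pte_flags(entry):
    """Return the names of the low flag bits set in a page-table entry."""
    return [name for bit, name in PTE_FLAGS.items() if entry & (1 << bit)]


print(decode_pf_error(0x0003))       # error_code from the oops
print(decode_pte_flags(0x154000E1))  # PMD value from the oops
```

For the values in the trace this yields "protection violation", "write access", "supervisor mode" for the error code, and PRESENT, ACCESSED, DIRTY, PSE for the PMD. RW is absent, so the faulting address sits in a present 2 MiB mapping that is not writable, which agrees with the "#PF" lines.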
Comment 9 Joey Lee 2020-04-08 07:49:36 UTC
(In reply to Neil Rickert from comment #0)
> User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
> Firefox/68.0
> Build Identifier: 
> 
> Note that this is only happening with 32-bit UEFI (ia32) firmware.  Kernel
> 5.6.0 is doing fine with 64-bit UEFI firmware and with a legacy BIOS.  I
> will upload a couple of screenshots showing the problem.
> 
> This is likely to be related to bug 1167933 (which is with a 5.5 kernel).
> 
> I see this in KVM with the OVMF firmware
> "/usr/share/qemu/ovmf-ia32-code.bin".  I suppose it could be a bug on OVMF
> firmware, but it seems strange that a firmware bug could cause a kernel
> panic.  I do not have any physical machines with 32-bit UEFI, so I can only
> test this with virtualization.
> 
> Perhaps it is not important.  I'll leave that for others to judge.
> 
> 
> Reproducible: Always

We maintain 32-bit OVMF on openSUSE TW because TW still releases an i586 build. But we do not support mixed mode (32-bit OVMF + 64-bit openSUSE).
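
For reference, a running system can be checked for mixed mode roughly as follows. This is a sketch that relies on the sysfs file /sys/firmware/efi/fw_platform_size, which reports the firmware word size on kernels that provide it; the function names are hypothetical.

```python
import platform
from pathlib import Path

FW_SIZE = Path("/sys/firmware/efi/fw_platform_size")


def is_mixed_mode(fw_bits, machine):
    """True when 32-bit UEFI firmware booted a 64-bit x86 kernel."""
    return fw_bits == 32 and machine == "x86_64"


def read_fw_bits(path=FW_SIZE):
    """Return 32 or 64, or None when the file is missing (e.g. legacy BIOS boot)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


bits = read_fw_bits()
print("firmware bits:", bits,
      "mixed mode:", is_mixed_mode(bits, platform.machine()))
```

On a machine booted through a legacy BIOS the file is absent, so the sketch reports None and no mixed mode.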
Comment 10 Jiri Slaby 2020-04-08 10:37:06 UTC
Bisected to:

commit d9e3d2c4f103200d87f2c243a84c1fd3b3bfea8c (refs/bisect/bad)
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Mon Jan 13 18:22:37 2020 +0100

    efi/x86: Don't map the entire kernel text RW for mixed mode
Comment 11 Jiri Slaby 2020-04-08 10:54:06 UTC
Reported to upstream:
https://lore.kernel.org/linux-efi/63b125a4-6c62-fcdf-de22-d3bebe2dcbf5@suse.cz/
Comment 12 Neil Rickert 2020-04-08 22:10:46 UTC
Replying to Joey Lee:

>We maintain 32-bit OVMF on openSUSE TW because TW still releases an i586 build.

I hope this is wrong.  I hope the real reason is to allow people to experiment with 32-bit OVMF in their virtual machines, perhaps with distros such as Debian or Fedora that do support mixed mode.
Comment 13 Neil Rickert 2020-04-19 04:25:15 UTC
This problem appears to be solved with the 5.6.4 kernel that just arrived in Tumbleweed.

Thanks to all for the work you have done to solve this.
Comment 14 Jiri Slaby 2020-04-20 08:03:56 UTC
good