Bug 1177009

Summary: Leap 15.2 stopped working in KVM with ovmf-ia32 firmware
Product: [openSUSE] openSUSE Distribution
Reporter: Neil Rickert <nwr10cst-oslnx>
Component: Kernel
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: NEW ---
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: fvogt, glin, jlee, kernel-bugs, mchang, nwr10cst-oslnx, tiwai
Version: Leap 15.2
Flags: mchang: needinfo? (kernel-bugs)
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Leap 15.2
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Output of "last" command
Software updates on KVM host since 9/21

Description Neil Rickert 2020-09-26 19:20:06 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Build Identifier: 

I'm reporting as a kernel bug, but it might be a bug elsewhere.  And yes, I know that this is not a supported configuration.

Situation:  I have both Leap 15.2 and Tumbleweed installed side by side in a KVM virtual machine, using 32-bit efi for booting.  Both Leap and Tumbleweed are 64-bit.  I originally set up this VM under Leap 15.0 (as KVM host machine).

Everything had been working well.  However, I recently tried to boot the Leap 15.2 system in that VM, and it failed to boot.  I see the kernel and initrd being loaded.  And then there is a reset and I get back to the boot screen.  However, Tumbleweed continues to boot without a problem on that virtual machine.

I am booting with "grub2-i386-efi", which is installed for both Tumbleweed and for Leap 15.2.  This does not appear to be a grub2 problem, because the kernel and initrd are being loaded.

Before today, I most recently booted Leap 15.2 on Sept 22, and I updated it at that time.  I then rebooted (successfully) to check after the update.  I have not been able to boot since, until my workaround today.

The KVM host is also running Leap 15.2, and I did apply some updates since Sept 22.  My guess is that one of those updates affected how the virtualization is working.

I have a similar install of Leap 15.2 on an external USB drive, and I have that set up so that it can boot with legacy BIOS, with 64-bit efi and with 32-bit efi.  Testing with that, it also fails to boot the same virtual machine with 32-bit efi, but it successfully boots with either legacy BIOS or with 64-bit efi.  So it looks as if whatever is going wrong has to do with communication between the kernel and the 32-bit efi firmware.
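
For readers who want to confirm which of these boot modes a running system actually used, the following is a minimal sketch. It assumes the kernel exposes the firmware word size in /sys/firmware/efi/fw_platform_size on EFI boots; the function name and its two override arguments are hypothetical and exist only so the check can be exercised on a machine without EFI.

```shell
#!/bin/sh
# Sketch: classify the current boot as legacy, native EFI, or mixed mode
# (64-bit kernel on 32-bit UEFI firmware, the configuration in this report).
# efi_boot_mode [FW_SIZE_FILE] [MACHINE]  (both arguments are optional overrides)
efi_boot_mode() {
    fw_file="${1:-/sys/firmware/efi/fw_platform_size}"
    machine="${2:-$(uname -m)}"

    if [ ! -r "$fw_file" ]; then
        echo "non-efi"            # legacy BIOS boot, or the sysfs entry is absent
        return 0
    fi

    fw_bits=$(cat "$fw_file")
    case "$machine:$fw_bits" in
        x86_64:32) echo "mixed" ;;      # 64-bit kernel, 32-bit firmware
        x86_64:64) echo "native64" ;;
        i?86:32)   echo "native32" ;;
        *)         echo "unknown" ;;
    esac
}
```

Calling `efi_boot_mode` with no arguments inside the guest reports the mode of the boot that is currently running.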

Booting the virtual machine to Tumbleweed, then setting up for rescue/chroot, I have installed kernel 5.8.11-1.gf4bb27a-default from the stable kernels repo.  And now Leap 15.2 does boot successfully with that kernel.  It does not boot with any of the 5.3.x kernels for Leap 15.2 that I have tested.



Reproducible: Always
Comment 1 Neil Rickert 2020-09-26 19:23:15 UTC
Created attachment 841980 [details]
Output of "last" command

Here's output of the "last" command.  Entries for today (Sept 26) are all with my workaround of using the 5.8.11 kernel.  Before today, you can see that previous kernels worked.  On Sept 22, you can see that I booted, then rebooted.  The reboot was after an update.
Comment 2 Neil Rickert 2020-09-26 19:51:35 UTC
Created attachment 841981 [details]
Software updates on KVM host since 9/21

This is selected from the output of

grep '^2020-09-2.*install' /var/log/history

on the host system (where libvirt is running for the virtualization).  I deleted lines with dates on 9/21 or earlier.
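
The same selection can be done in one step instead of grepping and then deleting lines by hand. This is only a sketch: it assumes the history file uses pipe-separated fields in the order timestamp, action, package name, version, and the function name is hypothetical. The file path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Sketch: list packages installed on or after a given date.
# Assumed line layout: "YYYY-MM-DD hh:mm:ss|install|name|version|..."
# list_installs_since HISTORY_FILE ISO_DATE
list_installs_since() {
    history_file="$1"
    since="$2"                    # ISO date, e.g. 2020-09-22
    awk -F'|' -v since="$since" '
        # ISO dates compare correctly as plain strings
        $2 == "install" && substr($1, 1, 10) >= since {
            print substr($1, 1, 10), $3, $4
        }' "$history_file"
}
```

For example, `list_installs_since /var/log/history 2020-09-22` would print one "date name version" line per install from Sept 22 onward.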
Comment 4 Neil Rickert 2020-09-30 05:12:14 UTC
>Per your description, the issue happened after kernel upgrade in the VM.

No, that's not quite right.

The system booted fine with all kernels, up through Sept 22.  This included booting with 5.3.18-lp152.41.  However, something changed (I'm not sure what), and now neither 5.3.18-lp152.41 nor 5.3.18-lp152.36 will boot.  The kernel from the release iso also will not boot, but it did at one time.

I originally installed Leap 15.2 into this VM on Aug 30, 2019 when it was an alpha release.  I'm checking that date from the date of "/lost+found".  And it has worked well ever since, until a few days ago.

To check the possibility that the NVRAM might be corrupt, I recreated the VM (with a different name) using "virt-install".  And it still would not boot those kernels.  So I don't think it is an NVRAM corruption problem.
Comment 5 Gary Ching-Pang Lin 2020-09-30 05:49:07 UTC
The new grub2 package (2.04-lp152.7.15.1) was released on Sept. 22. I compared 2.04-lp152.7.15.1 with 2.04-lp152.7.12.1, and only one patch was added:

https://build.opensuse.org/package/view_file/openSUSE:Leap:15.2:Update/grub2/0001-efi-linux-provide-linux-command.patch?expand=1

The patch forces the "linux" command to try "linuxefi" first and then fall back to the original "linux". "linuxefi" was mostly used on x86_64 for Secure Boot, so it may be worth downgrading grub2 to 2.04-lp152.7.12.1 to see how it behaves.
Comment 6 Neil Rickert 2020-09-30 17:32:01 UTC
Thank you for that suggestion.

So grub2-i386-efi version 2.04-lp152.7.15.1 is broken.  But everything works correctly with version 2.04-lp152.7.12.1

My tests:
I downgraded all grub2 to the version in the main repo.  I then ran grub2-install (this is not automatic for i386-efi).  On reboot, all kernels work.

I then upgraded again to the latest version, but I did not run "grub2-install".  Again, everything worked.  Then I ran "grub2-install", and after that I could not boot any of the leap kernels, so I booted to kernel 5.8.11.

Finally, I downgraded only grub2-i386-efi to version 2.04-lp152.7.12.1, and ran "grub2-install".  And everything works correctly (all kernels boot).
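
The final step above can be written down as a short plan. This is a sketch only: the function name is hypothetical, it merely prints the commands so they can be reviewed before being run as root, and it assumes the older package version is still available in a configured repository.

```shell
#!/bin/sh
# Sketch: print the commands for pinning one grub2 platform package to an
# older version and reinstalling its boot code. Nothing is executed here.
# downgrade_plan PACKAGE VERSION TARGET
downgrade_plan() {
    pkg="$1"
    version="$2"
    target="$3"
    # --oldpackage lets zypper install a version older than the installed one
    echo "zypper install --oldpackage ${pkg}=${version}"
    # the boot files are not refreshed automatically for i386-efi
    echo "grub2-install --target=${target}"
}
```

For the case in this comment: `downgrade_plan grub2-i386-efi 2.04-lp152.7.12.1 i386-efi`.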
Comment 7 Neil Rickert 2020-10-04 22:27:33 UTC
I am adding Michael Chang to the CC list, because I think he maintains grub.

Current situation:

grub2-i386-efi version 2.04-lp152.7.15.1 is broken.  The fix is to install the previous version (2.04-lp152.7.12.1).

In Tumbleweed, version 2.04-17.1 is broken.  To fix this, I installed 2.04-16.1, which I found in the history archives (I think for 20200915).

In both cases, the broken version of grub2-i386-efi will boot the Tumbleweed 5.8 kernels, but fails to boot the Leap 5.3 kernels.

In response, I am changing my testing procedure.

Previous testing procedure:
 When I saw that this package was updated, I ran:
  grub2-install --target=i386-efi
 and then I checked whether it would boot the system.

New testing procedure:
 When the package has been updated:
   cd /boot/grub2
   rm -rf i386-efi.old
   mv i386-efi  i386-efi.old
   grub2-install --target=i386-efi
 and then check that it can boot both Tumbleweed and Leap 15.2

Moving the directory is to allow easier recovery in case of problems.
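
The new procedure can be wrapped in a small function so the backup step is never skipped. This is a sketch under a few assumptions: the function name and the GRUB_INSTALL override variable are hypothetical (the override exists only so the logic can be tried without touching a real boot setup), and the boot test against both Tumbleweed and Leap 15.2 still has to be done by hand afterwards.

```shell
#!/bin/sh
# Sketch: keep the previous platform directory as <target>.old, then
# reinstall the boot code for that target.
# reinstall_with_backup [GRUB_DIR] [TARGET]
reinstall_with_backup() {
    grub_dir="${1:-/boot/grub2}"
    target="${2:-i386-efi}"
    installer="${GRUB_INSTALL:-grub2-install}"   # hypothetical override hook

    if [ -d "$grub_dir/$target" ]; then
        rm -rf "$grub_dir/$target.old"           # drop the older backup
        mv "$grub_dir/$target" "$grub_dir/$target.old" || return 1
    fi
    "$installer" "--target=$target"
}
```

If the new boot code fails, recovery from a rescue system amounts to moving the `.old` directory back into place.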

Note that all of my testing is in a KVM virtual machine.  I do not have a physical machine with this firmware, but such machines do exist.

I'm guessing that I'm probably the only person routinely testing this package when it is updated.
Comment 8 Michael Chang 2020-10-05 02:42:34 UTC
Hi Neil,

Are you using the `linux` command in 32-bit efi to boot the kernel? If so, could you please revert to the old "working" grub and try "linuxefi" there, to see whether the same problem can be reproduced? The latest modification in grub changes the linux command to use the efi handover protocol on efi platforms to boot the kernel, and the ia32 kernel probably has some issues dealing with that.
TIA.
Comment 9 Neil Rickert 2020-10-05 03:55:06 UTC
Yes, I am using "linux".

I did just try with "linuxefi" (and "initrdefi") a few minutes ago, using the grub2-i386-2.04-lp152.7.15.1 and it does not work.  It gives the same problem as using "linux".

This is Leap 15.2.  The kernel is 64-bit (x64, not ia32).  If I understand it correctly, "linuxefi" would try to load the kernel as an ia32 efi binary.  But it is an x64 efi binary, so that should not work.

I don't know why the grub code does not recognize this and drop back to using "linux".  That seems to work with the 5.8 kernel for Tumbleweed (also 64-bit), but not with the 5.3 kernel for Leap 15.2
Comment 10 Michael Chang 2020-10-05 04:12:35 UTC
(In reply to Neil Rickert from comment #9)

> I don't know why the grub code does not recognize this and drop back to
> using "linux".  That seems to work with the 5.8 kernel for Tumbleweed (also
> 64-bit), but not with the 5.3 kernel for Leap 15.2

Because of "Secure Boot", it simply cannot provide any way to bypass the efistub loader.
Comment 11 Michael Chang 2020-10-05 04:53:26 UTC
If there's no secure boot support required for ia32, and given that we used to come across some absurd x64 machines shipped with 32-bit efi firmware, maybe we should revert the last change made to i386-efi to make sure existing hardware can continue to work.

I am not sure whether this has any implications for kiwi again, since they might have different needs and prefer a different "default".

@Fabian,

What do you think? Thanks in advance.
Comment 12 Michael Chang 2020-10-06 05:31:33 UTC
I have tested ovmf-ia32, and openSUSE Tumbleweed (64-bit) booted fine with the latest grub2-i386-efi (2.04-17.1). The kernel version is 5.8.12-1-default.

Hi Neil

Just to make sure we're on the same page.

It is not clear to me. In comment#7 you described grub version 2.04-17.1 as broken, but you also mentioned that "In both cases, the broken version of grub2-i386-efi will boot the Tumbleweed 5.8 kernels, but fails to boot the Leap 5.3 kernels."

If some kernels work and others don't, maybe we have to include the kernel team to help, or we just revert the change made to i386-efi (comment#11), given that the 32-bit efi handover entry and 64/32 mixed mode are a bit too chaotic and we probably need to do more testing before switching.
Comment 13 Fabian Vogt 2020-10-06 08:39:50 UTC
(In reply to Michael Chang from comment #11)
> If there's no secure boot support required for ia32 and we used to come
> acorss some absurd x64 shipped with 32-bit efi firmware, maybe we should
> revert the last change made to i386-efi to make sure existing hardware can
> continue to work.
> 
> I am not sure whether this has any implication to kiwi again, since they
> might see different needs and prefer different "default"..
> 
> @Fabian,
> 
> What do you think ? Thanks in advanced.

So linuxefi on grub2-i386-efi is not able to boot the x86_64 Leap kernel?

Kiwi does this:

set linux=linux
set initrd=initrd
if [ "$${grub_cpu}" = "x86_64" -o "$${grub_cpu}" = "i386" ];then
    if [ "$${grub_platform}" = "efi" ]; then
        set linux=linuxefi
        set initrd=initrdefi
    fi
fi

Which is practically the same logic as in grub itself now with that patch.
If there's no reason to use the kernel efi stub on i386 (with an 32bit kernel), I'm also in favor of just not using that there. This would need a change in kiwi as well to remove the '-o "$${grub_cpu}" = "i386"', but I'm not aware of any kiwi image with grub2-i386-efi currently, so it's not urgent.
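
The narrower condition discussed here (use the efi stub commands only on x86_64) can be modelled in plain shell. This is an illustration only: it is not grub script or kiwi template syntax, the function name is hypothetical, and the two arguments stand in for grub's grub_cpu and grub_platform variables.

```shell
#!/bin/sh
# Sketch: choose the loader commands with the i386 clause removed, so that
# i386-efi keeps using the plain "linux"/"initrd" pair.
# choose_loader GRUB_CPU GRUB_PLATFORM
choose_loader() {
    grub_cpu="$1"
    grub_platform="$2"
    if [ "$grub_cpu" = "x86_64" ] && [ "$grub_platform" = "efi" ]; then
        echo "linuxefi initrdefi"
    else
        echo "linux initrd"
    fi
}
```

With this logic, `choose_loader i386 efi` selects "linux initrd", whereas the snippet quoted above would select the efi variants for that combination.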

(In reply to Michael Chang from comment #12)
> I have tested ovmf-ia32 and openSUSE Tumbleweed (64-bit)  booted fine with
> latest grub2-i386-efi (2.04-17.1) . The kernel version is 5.8.12-1-default.
> 
> Hi Neil
> 
> Just to make sure we're on the same page.
> 
> It is not clear to me. In comment#7 you described grub version 2.04-17.1 is
> broken, but you also mentioned that "In both cases, the broken version of
> grub2-i386-efi will boot the Tumbleweed 5.8 kernels, but fails to boot the
> Leap 5.3 kernels."
> 
> If some kernel works and others don't, maybe we have to include kernel team
> to help, or we just revert the change made to i386-efi (comment#11) given
> the 32-bit efi handover entry and 64/32 mixed mode is a bit too chaos that
> we probably need to do more test before switching.

Both Leap and TW kernels have CONFIG_EFI_MIXED=y, so it should work in theory.
It looks like there is some work on the 32->64 startup code in the TW kernel,
including https://github.com/torvalds/linux/commit/17054f492dfd4d91e093ebb87013807812ec42a4#diff-9dda850e3f9dc2892fa064d7067c86f7, but AFAICT it should've worked previously with efi handover as well.
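
Whether a given installed kernel was built with mixed-mode support can be checked against its config file. A minimal sketch, assuming the distribution installs the build configuration as /boot/config-<release>; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if OPTION is set to "y" in a kernel config file.
# kernel_has_option OPTION [CONFIG_FILE]
kernel_has_option() {
    option="$1"
    config="${2:-/boot/config-$(uname -r)}"
    grep -q "^${option}=y\$" "$config"
}
```

For example, `kernel_has_option CONFIG_EFI_MIXED && echo "mixed mode enabled"` checks the running kernel, and a second argument selects the config of another installed kernel.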
Comment 14 Neil Rickert 2020-10-06 16:44:33 UTC
Responding to Michael at comment #12

Note that I originally reported this as a kernel bug, so Takashi is on the CC list for this bug.

Let me give a few more details that I may have missed.

I have both Tumbleweed and Leap 15.2 installed in the same VM.  Currently, Leap 15.2 controls the booting.  But it is easy to switch to have Tumbleweed control the booting.  Whichever controls the booting, the grub menu includes an entry to boot Tumbleweed and an entry to boot Leap 15.2.

When I said that the Tumbleweed grub would not boot the 5.3 kernel, I was referring to booting Leap 15.2 from the Tumbleweed grub menu.  I could try installing that kernel in Tumbleweed, but it wouldn't make any difference since the problem seems to be a failure to load the kernel.

I did install a 5.8 kernel into Leap 15.2, and that was able to boot.

Using the working grub2-i386-efi (the previous version), I did try booting using "linuxefi" rather than "linux".  And that successfully boots Tumbleweed (5.8.x kernel) but fails to boot Leap 15.2 (5.3.x kernel).

I have noticed, from another bug report, that Takashi has a repo with old Tumbleweed kernels back to 5.4.x.  I may try some of those to see where the kernel behavior changes with "linuxefi".  I have already tested kernel 5.7.11, which I still have in the Tumbleweed system.  And that boots fine with "linuxefi".
Comment 15 Neil Rickert 2020-10-06 19:39:48 UTC
A followup to my last comment.

In the final paragraph, I mentioned that Takashi has older kernels.  I found that information in bug 1175908.

I have tried several of those older kernels in Tumbleweed with ia32 efi booting.  I used "linuxefi" in my attempts to boot.

kernel-default-5.6.15-1.1.gbfa465b.x86_64.rpm
kernel-default-5.5.13-1.1.g0af205d.x86_64.rpm
kernel-default-5.4.14-1.1.gfc4ea7a.x86_64.rpm

Those all boot without a problem.  However:

kernel-default-5.3.12-1.1.g60a2268.x86_64.rpm

will not boot with "linuxefi".

It looks as if there was a change between 5.3 kernels and 5.4 kernels.
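
To sort a larger set of test packages against that boundary, something like the following could be used. It is a sketch that encodes only the observation in this comment (unpatched kernels before 5.4 failed with "linuxefi", 5.4 and later booted); both function names are hypothetical, and only the major and minor version numbers are compared.

```shell
#!/bin/sh
# Sketch: pull the upstream version out of a kernel-default rpm file name
# and test it against a major.minor threshold.
kernel_version_from_rpm() {
    printf '%s\n' "$1" | sed -n 's/^kernel-default-\([0-9][0-9.]*\)-.*/\1/p'
}

# version_at_least VERSION MINIMUM  (compares major.minor only)
version_at_least() {
    v_major=${1%%.*}; rest=${1#*.}; v_minor=${rest%%.*}
    m_major=${2%%.*}; rest=${2#*.}; m_minor=${rest%%.*}
    [ "$v_major" -gt "$m_major" ] ||
        { [ "$v_major" -eq "$m_major" ] && [ "$v_minor" -ge "$m_minor" ]; }
}
```

For example, `version_at_least "$(kernel_version_from_rpm "$rpm")" 5.4` succeeds for the three packages that booted and fails for the 5.3.12 package.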
Comment 16 Michael Chang 2020-10-07 04:11:05 UTC
Hi Neil and Fabian,

Thanks a lot for all the valuable and informative feedback, especially to Neil, who took the time to test different kernel versions; we can definitely use the results to pin down the cause.

Now setting needinfo to the kernel team, as this is about changes between the 5.3 and 5.4 kernels. Let's see what they come up with.
Comment 17 Michael Chang 2020-10-09 02:45:59 UTC
Hi Neil,

I have verified that this kernel commit fixes the problem for me:

https://github.com/torvalds/linux/commit/4911ee401b7ceff8f38e0ac597cbf503d71e690c 

The test package for kernel-5.3.12 with above patch backported can be downloaded here:

https://download.opensuse.org/repositories/home:/michael-chang:/kernel/standard/x86_64/

Would you please help to verify whether the test package also fixes the problem for you? Thanks in advance.
Comment 18 Neil Rickert 2020-10-09 04:43:31 UTC
Yes, kernel-default-5.3.18-1.1.ge31647a from your repo is working fine in Leap 15.2, with the current grub2-i386-efi (version 2.04-lp152.7.15.1).

Thanks.
Comment 19 Michael Chang 2020-10-12 09:04:23 UTC
Hi Neil,

Many thanks for taking the time to verify the patch; it is now clear what is missing in the old kernel. Let's wait for the response from the kernel team on whether they will cherry-pick the patch for a maintenance update.
Comment 20 Neil Rickert 2020-10-18 23:14:33 UTC
It turns out that this might also be an issue with x86_64.

I have Tumbleweed 32-bit installed in an external drive.  And I have it set up so that it can boot with legacy mbr booting, with 32-bit efi booting and with 64-bit efi booting.

Yesterday, I connected the external drive to a system with 64-bit efi booting, but with secure-boot disabled.  And the Tumbleweed 32-bit system would not boot.  It used to boot.  So I reverted grub2-x86_64-efi back to version 2.04-16.1 (the version just before the update around Sept 15th).  And after going back to the old version, I can now boot it again with 64-bit efi.

By the way, to install x86_64-efi boot support, I use:

grub2-install --target=x86_64-efi --removable --no-nvram

For further testing, I have set up a VM with Leap 15.2 and Tumbleweed 32-bit installed side by side.  In the Leap install, I used version 2.04-lp152.7.12.1 of grub2-x86_64-efi so that it can boot Tumbleweed.  Or I can directly use the grub2 boot code installed from Tumbleweed (version 2.04-16.1) to boot either.  Creating that VM was tricky, because of bug 1177849.
Comment 21 Michael Chang 2020-10-19 03:16:16 UTC
(In reply to Neil Rickert from comment #20)
> It turns out that this might be a issue also with x86_64.

[snip]

> Yesterday, I connected the external drive to a system with 64-bit efi
> booting, but with secure-boot disabled.  And the Tumbleweed 32-bit system
> would not boot.  It used to boot.  So I reverted grub2-x86_64-efi back to
> version 2.04-16.1 (the version just before the update around Sept 15th). 
> And after going back to the old version, I can now boot it again with 64-bit
> efi.

I'm confused about why you would have to revert grub2-x86_64-efi in order to get 32-bit efi to boot. I suppose there was a typo in "And the Tumbleweed 32-bit system would not boot", where 32-bit should be replaced by 64-bit?

Is the Tumbleweed system fully updated? If the problem is reproducible only with "secure-boot disabled", does it have anything to do with bsc#1165773?

Thanks.
Comment 22 Neil Rickert 2020-10-21 03:31:12 UTC
Sorry.  I guess that was confusing.

This was about a different system, that does not fit the title of this bug report.  But it seems to be the same problem, except for grub2-x86_64-efi.

I was previously able to boot this system using the grub "linux" command.  I doubt that it would ever boot using "linuxefi".  With the latest grub2 changes, it no longer boots.  Reverting to an earlier version of grub2-x86_64-efi allows it to boot again.

The system itself is running 32-bit Tumbleweed, and is installed on an external drive.  I installed with the drive plugged into a system using BIOS/MBR booting.  And then I configured it so that it could also boot on a UEFI (X64) system.  And that last part is what stopped working with the latest grub2 updates.
Comment 23 Neil Rickert 2021-06-16 01:59:51 UTC
With the latest kernel in Leap 15.3 (kernel-default-5.3.18-59.5.2) I am now able to boot using the latest grub2-i386-efi.

It still won't work with Leap 15.2 unless I use an older "grub2".  But, since it does work now with Leap 15.3, feel free to close this bug.