Bug 1202438 - grub2 can no longer boot from an ISO
Status: NEW
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Other
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assigned To: Michael Chang
Reported: 2022-08-16 12:33 UTC by Hansi Meir
Modified: 2022-10-07 07:25 UTC (History)
mchang: needinfo? (Hansi.Meir)


Description Hansi Meir 2022-08-16 12:33:23 UTC
A long time ago, I built a menu item in 40_custom that boots Rescuezilla 2.3 from an ISO file.

After the update from 2.06-25.2 to 2.06-26.1, however, grub2 no longer finds the file and the entry is unusable:
---------------------------------------------------------------------------------
#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.

menuentry 'Rescuezilla' {
#  load_video
#  insmod loopback
#  insmod iso9660
#  insmod part_gpt
#  insmod part_msdos
#  insmod gzio
#  insmod ntfs
  insmod ext2
  ISO_FILE="/boot/iso/rescuezilla.iso"
  locale_dir=/boot/iso
  search --no-floppy --file $ISO_FILE --set
      loopback loop $ISO_FILE
      echo '... loading linux for rescuezilla'
      linux (loop)/casper/vmlinuz boot=casper isofrom_device=/dev/nvme1n1p5 iso-scan/filename=$ISO_FILE \
       xforcevesa nomodeset vga=791 fsck.mode=skip noprompt edd=on ${locale_opts} --
      echo '... loading initial ramdisk'
      initrd (loop)/casper/initrd.lz
}

menuentry 'System ausschalten' {
echo "System shutdown initiated ..."
halt
}
---------------------------------------------------------------------------------

I tried changing the search (via the UUID) and setting root via the device, but without success.

I restored the old version 2.06-25.2 and now I can boot this ISO file again.

Maybe there is a bug in grub2 2.06-26.1?
Or is there a mistake in my code above?
Comment 1 Hansi Meir 2022-08-21 09:23:08 UTC
Meanwhile grub2 version 2.06-27.1 is offered so I tried again.
Unfortunately the problem persists.

Maybe the important messages are helpful:

>error: ../../grub-core/commands/search.c:296:no such device: /boot/iso/rescuezilla.iso.
>error: ../../grub-core/fs/fshelp.c:257:file '/boot/iso/rescuezilla.iso'not found.

Hint: the /boot/iso directory in the error message exists on a second NVMe, but the system with grub and openSUSE is installed on the first NVMe of a Lenovo Thinkbook 16p Gen2.


Since there hasn't been any reaction so far: is anyone looking into this?
Comment 2 Michael Chang 2022-08-23 02:26:21 UTC
(In reply to Hansi Meir from comment #1)
> Meanwhile grub2 version 2.06-27.1 is offered so I tried again.
> Unfortunately the problem persists.
> 
> Maybe the important messages are helpful:
> 
> >error: ../../grub-core/commands/search.c:296:no such device: /boot/iso/rescuezilla.iso.
> >error: ../../grub-core/fs/fshelp.c:257:file '/boot/iso/rescuezilla.iso'not found.
> 
> Hint: the /boot/iso directory in the error message exists on a second NVMe,
> but the system with grub and openSUSE is installed on the first NVMe of a
> Lenovo Thinkbook 16p Gen2.

It seems the second NVMe is not presented by the firmware as one of the boot disks accessible to grub. Could you please run the 'ls' command in the grub command shell to get an overview of which devices grub is managing?

grub> ls

Then walk through them to identify the 'second' NVMe.

grub> ls (hd0)/boot/iso/rescuezilla.iso
grub> ls (hd1)/boot/iso/rescuezilla.iso

First we must make sure the disk is enrolled in grub's device list; otherwise search won't work either. You could also try going through the firmware boot menu, so that you see a list of bootable devices before selecting one. This additional step makes the firmware initialize all attached disks in order to list them in the menu, so none gets skipped in the course of an unattended fast boot.

> 
> 
> Since there hasn't been any reaction so far: is anyone looking into this?
Comment 3 Hansi Meir 2022-08-23 08:08:58 UTC
Please remember my first statement: it works up to version 25 and fails from version 26 on. So of course I currently have version 25 installed.


To your hints:

1. In the UEFI BIOS I can see both NVMe drives in the menu where I can select the boot order. So both NVMe drives should be bootable.

2. I entered grub command line from boot menu with version 25:

2.1 
grub> ls shows both NVMe devices as (hd0) and (hd1) and all partitions on both.
The rescuezilla.iso is in (hd1,gpt5). So I started with

grub> ls (hd1,gpt5)
     which shows detailed technical data: fs, mod time, UUID, start point and capacity.

Then I continued with 

grub> ls (hd1,gpt5)/boot/iso/rescuezilla.iso 
     with this result:
     
rescuezilla.iso
grub>


3. Now I installed grub version 27 and repeated the procedure

3.1  same result as 2.1 with
grub> ls 
and
grub> ls (hd1,gpt5) 

But
grub> ls (hd1,gpt5)/boot/iso/rescuezilla.iso 
      now shows an error:

error: ../../grub-core/kern/mm.c:372:out of memory


That confirms my assumption:
something has changed since version 26, and it's not a problem with my notebook.


Meanwhile version 25 is active again and everything works fine.
Comment 4 Michael Chang 2022-08-23 11:15:43 UTC
The major change from 25 to 26 was adding TPM support to the EFI grub image; there was no change to the existing ext2 module. Maybe the out of memory error was caused by higher memory consumption at run time, due to the additional memory allocation for measurement? Anyway, could you please check whether your UEFI BIOS has TPM enabled? If so, does disabling it help? Thanks.
Comment 5 Hansi Meir 2022-08-23 11:49:22 UTC
Yes, TPM is enabled, in the form of the AMD counterpart.
Disabling it does not help, but

grub> ls (hd1,gpt5)/boot/iso/rescuezilla.iso 
      now shows this error:

error: ../../grub-core/fs/fshelp.c:257:file '/boot/iso/rescuezilla.iso' not found


btw: because the notebook is used with dual boot (Windows 11 and openSUSE Tumbleweed), maybe it is not a good idea in principle to disable TPM?
Comment 6 Michael Chang 2022-08-23 12:07:56 UTC
(In reply to Hansi Meir from comment #5)
> Yes, there is tpm enabled but as the AMD counterpart.
> Disabling is not helpful, but
> 
> grub> ls (hd1,gpt5)/boot/iso/rescuezilla.iso 
>       now shows this error:
> 
> error: ../../grub-core/fs/fshelp.c:257:file '/boot/iso/rescuezilla.iso' not
> found

It is getting stranger. Is there any other file on your second NVMe disk, besides the ISO, that can be used to test file access? Or can the file be listed by

  grub> ls (hd1,gpt5)/boot/iso

or even
  grub> ls (hd1,gpt5)/

> 
> 
> btw: because the notebook is used with dual boot (Windows 11 and openSUSE
> Tumbleweed), maybe it is not a good idea in principle to disable TPM?

Certainly not a good idea. It is just for debugging/troubleshooting.
Comment 7 Hansi Meir 2022-08-23 13:13:16 UTC
So sorry, I made a mistake on the last try: there was a USB stick I forgot to remove. So hd1 was not the same as before ...

I did it once again without the USB stick (and with TPM disabled) and then the old 
error: ../../grub-core/kern/mm.c:372:out of memory
occurred again. 

Sorry for that ... facepalm ....

Then I tried a few files from NVMe2. 
One from my /home on gpt3, a 2.6 MB jpg in the download path: grub found it.

Then I tried the other ISO files in gpt5/boot/iso; grub only found one of them. It was SuperGrub2.iso, and this file is only 15.6 MB in size. All the other files are much bigger, from around 700 MB up to 1.1 GB (the rescuezilla.iso).

I also tried another small file, gpt5/boot/iso/ventoy-1.0.79/README, and grub found it.

Maybe the file size causes the out of memory error?
Comment 8 Hansi Meir 2022-08-23 13:15:08 UTC
addendum: ALL FILES could be listed with the ls command.
Comment 9 Michael Chang 2022-08-24 09:16:51 UTC
(In reply to Hansi Meir from comment #7)
[snip]
> Maybe the file size causes the out of memory error?

I guess so. To perform the measurement, each file must be read into memory in a single chunk so that its hash can be calculated and extended into the TPM PCRs before the file can be used.

What I don't understand is why the problem didn't go away when TPM was disabled. I will have a look.
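
The size correlation reported in comment 7 can be checked from the running Linux system before rebooting into the grub shell. A minimal sketch, assuming GNU find and a purely hypothetical threshold of 100 MiB (the thread does not state the actual heap limit); `list_large_files` is an illustrative helper name, not part of grub:

```sh
#!/bin/sh
# List regular files under a directory that are larger than a given
# number of MiB. Such files are candidates for the out-of-memory error,
# because the measurement reads each file into memory as a whole.
# NOTE: the threshold is an assumption; the real limit is not known.
list_large_files() {
    dir=$1
    limit_mib=${2:-100}   # hypothetical default threshold
    # GNU find: "-size +NM" matches files larger than N MiB
    find "$dir" -type f -size +"${limit_mib}"M | sort
}

# Usage (path taken from the report):
#   list_large_files /boot/iso 100
```

Files reported by this helper are the ones worth testing first with `ls (hdX,gptY)/path` in the grub shell.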
Comment 10 Michael Chang 2022-08-26 02:47:41 UTC
Hi Hansi,

The `loopback' module is not part of the signed grub.efi delivered by the SUSE rpm. This raises a question for me: did you build and use your own grub.efi, signed with your own key, to make loopback work? Or do you have to keep Secure Boot disabled during the whole process of booting Rescuezilla?
Thanks.
Comment 11 Hansi Meir 2022-08-26 08:40:52 UTC
I never touched anything like grub.efi and I only used openSUSE grub from openSUSE repository. 

I found the loopback statements via Google on multiple websites. I don't really know how this works, so I only modified filenames and devices.

I just tried grub2 version 2.06-27.1 on my desktop PC (self-built in 2015, TPM 1.2 module present but not activated) and got exactly the same problem and the same messages. 

And grub> ls shows all devices, and small files on the "rescuezilla.iso device" are found, but not rescuezilla.iso itself. Same memory error.

And of course on this PC too: I never touched anything like grub.efi and I only used openSUSE grub from the openSUSE repository. 

Maybe there is a difference between openSUSE and SUSE?


About Secure Boot:

- On my PC, the ASUS BIOS is configured as follows:
-- Secure Boot state:       Enabled
-- Platform Key (PK) state: Unloaded
-- OS Type:                 Other OS

- On my notebook it is very strange:
-- Secure Boot: Enabled
but the first page has an informative overview of all current settings, and it shows Secure Boot as disabled. I don't know why. I guess it is a bug in the Lenovo BIOS, but which of the two is correct?
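
When the firmware setup screens disagree, the state can also be read from the running system. A minimal sketch, assuming a Linux system booted in UEFI mode with efivarfs mounted at /sys/firmware/efi/efivars; `secure_boot_state` is an illustrative helper name. The SecureBoot variable under the EFI global variable GUID holds four attribute bytes followed by one data byte (1 = enabled, 0 = disabled):

```sh
#!/bin/sh
# Default location of the UEFI SecureBoot variable in efivarfs.
SB_VAR=/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c

secure_boot_state() {
    var_file=${1:-$SB_VAR}
    if [ ! -r "$var_file" ]; then
        echo "unknown"
        return 1
    fi
    # Skip the 4 attribute bytes, print the single data byte as a number.
    value=$(od -An -t u1 -j 4 -N 1 "$var_file" | tr -d ' \n')
    case $value in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Usage:
#   secure_boot_state
```

Where mokutil is installed, `mokutil --sb-state` reports the same information.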
Comment 12 Richard Brown 2022-08-26 09:11:50 UTC
I'm seeing the out of memory issue too, on multiple Dell UEFI machines (namely the XPS 13 and Inspiron 5510).

Disabling Secure Boot/TPM in the BIOS setup doesn't work around the issue.

In essence this means a number of machines out there can no longer boot Tumbleweed or MicroOS installation media.

After discussion with Michael, the decision is to revert the grub tpm changes in Factory (at least temporarily).

The revert is happening right now; I will use this bug reference to keep track of it.
Comment 13 Hansi Meir 2022-08-27 12:27:44 UTC
I guess that's why I found 2.06-28.1 today.
It works fine, thanks.
Comment 14 Richard Brown 2022-08-31 10:15:11 UTC
Copy/paste of my debugging info for Michael as we dug deeper into this

Image based with MicroOS 0829 using his 1202438 patched grub2

TPM 2.0 On + Enabled in the BIOS, Dell 5510 (so, default settings). Boot fails, tpm-hello below

Hello!
TPM 2.0
HashAlgorithmBitmap 0x3
TPMPresentFlag 0x1
NumberOfPcrBanks 0x2
ActivePcrBanks 0x3

TPM 2.0 Off  in the BIOS, Dell 5510. Boot WORKS! (YAY!), tpm-hello below

Hello!
No TPM Handle found!


TPM 2.0 On but Disabled, Dell 5510, Boot FAILS (meh) tpm-hello below

Hello!
TPM 2.0
HashAlgorithmBitmap 0x3
TPMPresentFlag 0x1
NumberOfPcrBanks 0x2
ActivePcrBanks 0x3

So, it seems there's no difference between On+Enabled and On+Disabled, but turning the TPM off does work with your patch
Comment 15 Richard Brown 2022-08-31 10:28:43 UTC
The Dell XPS 13 shows subtly different info when debugging the TPM status (below), but the same behaviour: it boots with your patch only with TPM Off, and runs out of memory with TPM On (+ Enabled or Disabled)

Hello!
TPM 2.0
HashAlgorithmBitmap 0x7
TPMPresentFlag 0x1
NumberOfPcrBanks 0x3
ActivePcrBanks 0x3
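
The bitmaps in the tpm-hello output can be decoded by hand. A minimal sketch, assuming the bit values of the EFI_TCG2_BOOT_HASH_ALG_* constants from the TCG EFI Protocol Specification (SHA1 = 0x1, SHA256 = 0x2, SHA384 = 0x4, SHA512 = 0x8, SM3_256 = 0x10); `decode_hash_alg_bitmap` is an illustrative helper name, not part of tpm-hello:

```sh
#!/bin/sh
# Translate a TCG2 hash algorithm bitmap (e.g. 0x3) into algorithm names.
decode_hash_alg_bitmap() {
    bitmap=$(( $1 ))      # accepts decimal or 0x-prefixed hex
    names=""
    # bit:name pairs, assumed from the EFI_TCG2_BOOT_HASH_ALG_* constants
    for pair in 1:SHA1 2:SHA256 4:SHA384 8:SHA512 16:SM3_256; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( bitmap & bit )) -ne 0 ]; then
            # append with a separating space, but no leading space
            names="${names:+$names }$name"
        fi
    done
    echo "$names"
}

# Usage:
#   decode_hash_alg_bitmap 0x3
#   decode_hash_alg_bitmap 0x7
```

For the values above, `decode_hash_alg_bitmap 0x3` prints `SHA1 SHA256` (Dell 5510) and `decode_hash_alg_bitmap 0x7` prints `SHA1 SHA256 SHA384` (XPS 13), which is consistent with NumberOfPcrBanks 0x2 and 0x3 respectively.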
Comment 16 Michael Chang 2022-09-01 07:07:01 UTC
Hi Richard,

Many thanks for your help and the information!

This seems to imply that enabling/disabling TPM in the firmware may only control whether it is used for measured boot (i.e. SRTM), not whether the device itself is enabled.

Hi Hansi,

Would it be possible for you to double-check the result on your side? Just drop me a line if that is OK for you.

JFYI. The test package is available at: 

> https://download.opensuse.org/repositories/home:/michael-chang:/bsc:/1202438/openSUSE_Tumbleweed/

The test grub.efi can be extracted directly from the rpm and then manually copied over the one on the ESP.

> wget https://download.opensuse.org/repositories/home:/michael-chang:/bsc:/1202438/openSUSE_Tumbleweed/noarch/grub2-x86_64-efi-2.06-36.5.noarch.rpm
> unrpm grub2-x86_64-efi-2.06-36.5.noarch.rpm
> cp /boot/efi/EFI/opensuse/grub.efi /boot/efi/EFI/opensuse/grub.efi.backup
> cp $PWD/usr/share/grub2/x86_64-efi/grub.efi /boot/efi/EFI/opensuse/grub.efi

Please note that you have to disable Secure Boot to test; otherwise grub.efi will be rejected, because it was built in my home project and is not signed with the SUSE key.
Thanks.
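
On systems without `unrpm`, the same steps can be done with `rpm2cpio` and `cpio`. A minimal sketch, assuming both tools are installed and the commands are run as root; the helper names `extract_rpm` and `install_test_grub` are illustrative, and the paths are the ones from the commands above:

```sh
#!/bin/sh
# Unpack an rpm into a directory without installing it.
extract_rpm() {
    rpm_file=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # rpm2cpio writes the package payload as a cpio archive to stdout;
    # cpio -i extracts it, -d creates directories, -m keeps mtimes.
    rpm2cpio "$rpm_file" | ( cd "$dest_dir" && cpio -idm )
}

# Replace grub.efi on the ESP, keeping a one-time backup of the original.
install_test_grub() {
    new_efi=$1
    target=$2
    [ -f "$new_efi" ] || { echo "missing: $new_efi" >&2; return 1; }
    # Never overwrite an existing backup, so the original stays recoverable.
    if [ -f "$target" ] && [ ! -e "$target.backup" ]; then
        cp "$target" "$target.backup" || return 1
    fi
    cp "$new_efi" "$target"
}

# Usage:
#   extract_rpm grub2-x86_64-efi-2.06-36.5.noarch.rpm ./unpacked
#   install_test_grub ./unpacked/usr/share/grub2/x86_64-efi/grub.efi \
#       /boot/efi/EFI/opensuse/grub.efi
```

Copying grub.efi.backup back over grub.efi restores the original image.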
Comment 17 Hansi Meir 2022-09-01 10:43:11 UTC
I don't know unrpm and neither does my Tumbleweed; sorry, I'm no expert with things like rpm packages.

I tried with ark and extracted grub.efi only. Hope this was your intention.

After replacing it and disabling secure boot I tried with

- TPM enabled
-- rescuezilla.iso not found, it looks like original problem


- TPM disabled
-- rescuezilla.iso was booted, looks good
Comment 18 Michael Chang 2022-09-02 06:03:38 UTC
(In reply to Hansi Meir from comment #17)

[snip]

> I tried with ark and extracted grub.efi only. Hope this was your intention.
> 
> After replacing it and disabling secure boot I tried with
> 
> - TPM enabled
> -- rescuezilla.iso not found, it looks like original problem
> 
> 
> - TPM disabled
> -- rescuezilla.iso was booted, looks good

Yes this is what I want. Good job! :)
Thanks a lot.
Comment 19 Gary Ching-Pang Lin 2022-09-02 06:15:10 UTC
Hi Richard and Hansi,

Could you try my grub2 package?

https://download.opensuse.org/repositories/home:/gary_lin:/branches:/Base:/System/standard/noarch/grub2-x86_64-efi-2.06-37.1.noarch.rpm

I backported the upstream memory management patches plus Michael's tpm patch. It dynamically allocates memory in case the initial heap space is used up, so the OOM situation should be largely mitigated unless a huge file is loaded.

Please also watch for any noticeable delay with the test grub2. The upstream patchset chooses a small default heap size. In my tests, it introduces a noticeable delay (2~3) when loading the grub2 menu with the openSUSE theme. The default heap size is increased in the test grub2 and it works for my VM. I hope it's sufficient for other cases.
Comment 20 Hansi Meir 2022-09-02 09:23:43 UTC
Hi Gary,

I started tests with Secure Boot disabled.

- TPM disabled
-- works well (rescuezilla.iso boots successfully) after a black screen delay of 66 seconds

- TPM enabled 
-- same result

- Secure Boot and TPM enabled
-- same result

If this delay disappeared, it would be perfect.
If that is not possible, a message like "be patient..." should appear.
Comment 21 Gary Ching-Pang Lin 2022-09-02 12:23:17 UTC
I can bake another patch to use the old heap size, and it may help. Anyway, the delay caused by frequent memory allocation has been raised on the upstream mailing list. Let's see how it develops.
Comment 22 Gary Ching-Pang Lin 2022-09-05 07:58:14 UTC
I added a patch to allocate the old heap size for the first memory region. Please try the testing grub2 and see if the delay is improved.

https://download.opensuse.org/repositories/home:/gary_lin:/branches:/Base:/System/standard/noarch/grub2-x86_64-efi-2.06-38.1.noarch.rpm
Comment 23 Hansi Meir 2022-09-05 09:08:54 UTC
Hi Gary,


When TPM is enabled, rescuezilla.iso boots successfully, but with the black screen delay of 66 seconds.

When TPM is disabled there is no delay, state of Secure Boot doesn't matter.
Comment 24 Gary Ching-Pang Lin 2022-09-06 07:00:27 UTC
Okay, at least the loading time is improved when TPM is disabled. As for the delay with TPM, I don't have a good solution now. The grub2 verifier is designed to load the whole file into memory and then measure the file content with TPM, and this would take a while for the additional memory allocation and file reading. Maybe we can do the check in verifier's open() and show some messages to notify the user about the possible delay.
Comment 25 Gary Ching-Pang Lin 2022-09-07 03:21:52 UTC
Hi Hansi,

I updated the testing grub2 to allocate at least 1MB for the additional heap allocations. Maybe this could improve the loading time.

https://download.opensuse.org/repositories/home:/gary_lin:/branches:/Base:/System/standard/noarch/grub2-x86_64-efi-2.06-39.1.noarch.rpm
Comment 26 Hansi Meir 2022-09-07 10:23:02 UTC
Hi Gary,

The delay is unfortunately unchanged at 66s.
Comment 27 Gary Ching-Pang Lin 2022-09-08 07:14:56 UTC
Thanks for testing.

So the delay is hard to eliminate :(
A notification is probably the only thing we can do...
Comment 28 Hansi Meir 2022-09-08 11:20:13 UTC
And is it quite certain that the memory allocation is the cause?
Since the changes there have no effect: couldn't something completely different be the cause?
Comment 29 Gary Ching-Pang Lin 2022-09-12 01:39:40 UTC
(In reply to Hansi Meir from comment #28)
> And is it quite certain that the memory allocation is the cause?
> Since the changes there have no effect: couldn't something completely
> different be the cause?

There are 3 possible causes in your case.

1) memory allocation
2) big file reading
3) TPM measurement on a big file

The memory allocation could be very inefficient when the heap is used up following lots of small chunk allocations. This may be improved by setting a minimal allocation size. However, it doesn't help your case, so it seems to me that the delay is mainly caused by 2) and 3). It's unfortunate that those two are required for the effective TPM measurement, and I don't know if there is any way to accelerate the process...
Comment 30 Gary Ching-Pang Lin 2022-09-13 07:27:06 UTC
Hi Hansi,

An upstream developer spotted a possible performance improvement by deferring the disk cache invalidation. Since your case involves the disk reading, it may be worthwhile testing the patch. I built the testing package and it's available in my OBS branch:

https://download.opensuse.org/repositories/home:/gary_lin:/branches:/Base:/System/standard/noarch/grub2-x86_64-efi-2.06-41.1.noarch.rpm
Comment 31 Hansi Meir 2022-09-13 11:03:01 UTC
Hi Gary,

no idea what that means ... two PCIe3 NVMe are installed here.

- grub2 & system is on first: Sandisk WD Black SN750 
- rescuezilla.iso on second: Micron/Crucial P2 

Test result: the delay is unfortunately unchanged at 66s.
Comment 32 Gary Ching-Pang Lin 2022-09-14 03:21:27 UTC
It's sad that the patch doesn't work for you :(
So your case only involves the contiguous read operations and the disk cache doesn't help much.
Comment 33 OBSbugzilla Bot 2022-10-07 07:25:06 UTC
This is an autogenerated message for OBS integration:
This bug (1202438) was mentioned in
https://build.opensuse.org/request/show/1008674 Factory / grub2