Bug 1098550 - Current Tumbleweed Pine64 image fails to boot
Status: RESOLVED NORESPONSE
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Basesystem
Version: Current
Hardware: aarch64 openSUSE Factory
Priority: P5 - None
Severity: Critical
Target Milestone: ---
Assigned To: Guillaume GARDET
Reported: 2018-06-21 08:10 UTC by Paul Gonin
Modified: 2019-11-18 10:49 UTC


Attachments
Broken boot with GPT errors (6.53 KB, text/plain)
2018-06-21 09:08 UTC, Guillaume GARDET
Broken boot with invalid GPT error (4.92 KB, text/plain)
2018-06-21 10:17 UTC, Paul Gonin
Broken boot without GPT error (4.72 KB, text/plain)
2018-06-21 10:25 UTC, Guillaume GARDET

Description Paul Gonin 2018-06-21 08:10:30 UTC
Broken at U-Boot stage
Comment 1 Guillaume GARDET 2018-06-21 08:54:32 UTC
(In reply to Paul Gonin from comment #0)
> Broken at U-Boot stage

Paul, could you attach the full bootlog, please?

U-Boot is no longer able to read the GPT partitions, whereas a desktop computer is able to mount and read the 2 partitions.

Switching from GPT to a legacy MBR partition table fixes the boot (at least on the Kubic image).
Comment 2 Guillaume GARDET 2018-06-21 09:08:32 UTC
Created attachment 774808 [details]
Broken boot with GPT errors
Comment 3 Paul Gonin 2018-06-21 09:11:42 UTC
I tried again today with image openSUSE-Tumbleweed-ARM-JeOS-pine64.aarch64-2018.06.20-Build9.5.raw.xz

The error is different from yesterday, the log you attached is from today with this image
Comment 4 Andreas Färber 2018-06-21 09:12:31 UTC
First boot can probably be fixed by running wipefs /dev/sdX before dd.
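
The suggestion above can be sketched as a small helper that only builds the command lines and does not run them. Note that wipefs on its own merely lists signatures, so this sketch assumes the erasing form (--all) was meant. The function name, the dd options and the use of xzcat to decompress the image are assumptions for illustration, and /dev/sdX remains a placeholder for the real SD card device.

```python
import shlex
from typing import List


def prepare_card_commands(image: str, device: str) -> List[str]:
    """Return the shell commands as strings; nothing is executed here.

    Hypothetical helper: wipe stale partition-table signatures first,
    then write the compressed image to the card.
    """
    img = shlex.quote(image)
    dev = shlex.quote(device)
    return [
        # Erase old signatures (for example a leftover GPT backup header).
        f"wipefs --all {dev}",
        # Decompress on the fly and write the raw image to the card.
        f"xzcat {img} | dd of={dev} bs=4M iflag=fullblock oflag=direct",
        # Flush caches before removing the card.
        "sync",
    ]


if __name__ == "__main__":
    image_name = (
        "openSUSE-Tumbleweed-ARM-JeOS-pine64.aarch64-2018.06.20-Build9.5.raw.xz"
    )
    for command in prepare_card_commands(image_name, "/dev/sdX"):
        print(command)
```

Printing the commands rather than running them keeps the destructive step under manual control; the device name has to be checked by hand before anything is executed.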

The problem I've been reporting is that after a successful boot, the reboot breaks in this way. U-Boot was unreliable in reading the partitions. Doing an "mmc rescan" might lead to some "ls mmc 0:2" operation succeeding, only for reads to fail again before all files were loaded. Or it would successfully get into GRUB, and then GRUB (using some U-Boot drivers) would fail to load the kernel.

We used to have a similar partitioning problem in the past and were able to fix it by including gptfdisk with bootinclude="true" for Kiwi:


Tue Oct 18 18:28:43 UTC 2016 - afaerber@suse.de

- Ensure that gdisk is available for firstboot on GPT (Marcus)
* Drop gptfdisk from EXTRA_PACKAGES for any USE_EFI images and
  enforce adding it in packagelist.inc
* chromebook: Switch from PKG_TAG to PKG_BOOT_TAG


I am guessing this is now a side-effect of switching from Kiwi's own partition resizing to the new dracut+cloud-init approach?

Note there's also an open bug #1072900 for RPi3 on 42.3.
Comment 5 Guillaume GARDET 2018-06-21 09:43:06 UTC
(In reply to Paul Gonin from comment #3)
> I tried again today with image
> openSUSE-Tumbleweed-ARM-JeOS-pine64.aarch64-2018.06.20-Build9.5.raw.xz
> 
> The error is different from yesterday, the log you attached is from today
> with this image

No, this is the log from yesterday. 
Apparently, updating U-Boot from v2018.07-rc1 to -rc2 clears the GPT errors.


(In reply to Andreas Färber from comment #4)
> We used to have a similar partitioning problem in the past and were able to
> fix it by including gptfdisk with bootinclude="true" for Kiwi

This is still in place.

> 
> I am guessing this is now a side-effect of switching from Kiwi's own
> partition resizing to the new dracut+cloud-init approach?

We do not use cloud-init, but dracut-kiwi-oem-repart in Tumbleweed.


Paul will do more tests and we will update the information here.
Comment 6 Paul Gonin 2018-06-21 10:17:53 UTC
Created attachment 774838 [details]
Broken boot with invalid GPT error
Comment 7 Guillaume GARDET 2018-06-21 10:25:59 UTC
Created attachment 774842 [details]
Broken boot without GPT error
Comment 8 Alexander Graf 2018-06-21 11:38:29 UTC
In both cases grub fails to read the initrd:

> error: failure reading sector 0x1c780 from `hd0'.

and

> error: failure reading sector 0x21300 from `hd0'.

Those sector numbers are pretty low though. Are they always the same when you boot up? What if you manually try to read from those sectors with the "mmc read" command in U-Boot?
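
As a rough aid for the manual check suggested above, the following sketch converts the two reported sector numbers into byte offsets and prints matching "mmc read" command lines (the U-Boot syntax is mmc read addr blk# cnt). It assumes that GRUB reports 512-byte sectors and that 0x42000000 is a free RAM address on the board; both are assumptions, not facts taken from this bug, and the helper names are hypothetical.

```python
SECTOR_SIZE = 512  # assumption: GRUB reports sectors of 512 bytes
LOAD_ADDR = 0x42000000  # assumption: scratch RAM address, adjust per board


def sector_offset_mib(sector: int) -> float:
    """Byte offset of a sector from the start of the card, in MiB."""
    return sector * SECTOR_SIZE / (1024 * 1024)


def mmc_read_command(sector: int, count: int = 1, load_addr: int = LOAD_ADDR) -> str:
    """Build a U-Boot 'mmc read <addr> <blk#> <cnt>' line (values in hex)."""
    return f"mmc read {load_addr:#x} {sector:#x} {count:#x}"


if __name__ == "__main__":
    # Sector numbers quoted from the two GRUB error messages above.
    for sector in (0x1C780, 0x21300):
        print(f"{sector:#x} is at {sector_offset_mib(sector):.1f} MiB")
        print("  " + mmc_read_command(sector))
```

Both sectors lie within the first 70 MiB of the card under the 512-byte assumption, which matches the remark that the numbers are low. On the board, "mmc dev 0" would typically be needed first to select the card.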
Comment 9 Paul Gonin 2018-06-21 12:11:13 UTC
boot#2
error: failure reading sector 0x25b80 from `hd0'.

boot#3
error: failure reading sector 0x12780 from `hd0'.

boot#4
error: failure reading sector 0x1dd80 from `hd0'.
Comment 10 Alexander Graf 2018-06-21 12:16:35 UTC
Ok, this is clearly very weird. Can you add a line

> #define DEBUG

at the top of lib/efi_loader/efi_disk.c in U-Boot, recompile U-Boot, reflash it on the SD card and try a few boots again? That should give us a few hints on what grub is trying to access when.
Comment 13 Guillaume GARDET 2018-06-26 13:50:48 UTC
Switching from GPT to MBR works around this problem, as tested by Paul.

For now, we can switch to MBR for the Tumbleweed JeOS-pine64 image until upstream U-Boot gets fixed.
Comment 14 Swamp Workflow Management 2018-06-26 14:00:05 UTC
This is an autogenerated message for OBS integration:
This bug (1098550) was mentioned in
https://build.opensuse.org/request/show/619174 Factory:ARM:Live / JeOS
Comment 15 Alexander Graf 2018-06-26 14:04:33 UTC
I looked at Andreas' Pine64 and it looks to be a U-Boot issue. SD reads randomly start to fail. My assumption is that the SD card is clocked too fast.

There are a couple of discussions on the U-Boot ML right now on failing eMMC adapters, which are probably related.
Comment 16 Stefan Brüns 2018-06-27 01:03:39 UTC
Andre just sent two patches to the U-Boot ML.
Comment 17 Andre Przywara 2018-07-06 16:58:31 UTC
The two patches have been merged and will be in the next U-Boot release (due in a few days).
And while Andreas reported that he still sees occasional failures, the situation should be much better now.
Andreas, can you give a hint on how to reproduce it now? It worked for me with a "ls mmc 0" loop for a long time.
Comment 18 Guillaume GARDET 2018-07-16 07:45:27 UTC
(In reply to Andre Przywara from comment #17)
> The two patches have been merged and will be in the next U-Boot release (due
> in a few days).

The fix is in U-Boot v2018.07 which is already used for Pine64 image.
See: http://git.denx.de/?p=u-boot.git;a=commit;h=be0d217952222b2bd3ed071de9bb0c66d8cc80d9
So, closing this bug.

> And while Andreas reported that he still sees occasional failures, the 
> situation should be much better now.
> Andreas, can you give a hint on how to reproduce it now? It worked for me 
> with a "ls mmc 0" loop for a long time.

@Andreas, if you still have problems and a way to reproduce, feel free to reopen.
Comment 19 Andreas Färber 2018-08-20 10:05:34 UTC
Last week I have seen issues reoccur repeatedly at varying points in distro boot or GRUB. Same patched package version still. I needed between 2 and 5 retries of powering off to boot successfully.
Comment 21 Guillaume GARDET 2019-01-15 13:38:07 UTC
(In reply to Andreas Färber from comment #19)
> Last week I have seen issues reoccur repeatedly at varying points in distro
> boot or GRUB. Same patched package version still. I needed between 2 and 5
> retries of powering off to boot successfully.

@Andreas, what is the current status of this bug? Are you able to reproduce it with the latest U-Boot?
Comment 22 Andreas Färber 2019-01-15 14:00:22 UTC
It still sometimes occurs - last tested u-boot-pine64plus v2019.01-rc3, by now with upstream TF-A v2.0.
Comment 23 Andre Przywara 2019-01-15 14:13:03 UTC
So there was this report that the arch timer bug workaround is not sufficient all of the time. There is some hac^Wpatch posted to the U-Boot ML:
https://lists.denx.de/pipermail/u-boot/2019-January/353661.html
You might want to try this. It shouldn't hurt to have it in (just extending the range of suspicious arch timer values by shrinking the mask), but I am somewhat suspicious of the patch since it sounds like papering over something else.
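
To illustrate why shrinking the mask extends the range of suspicious values, here is a minimal sketch. It assumes the workaround treats a counter reading as suspect when its low bits are all ones or all zeros and then re-reads it; the exact predicate and the bit widths (10 versus 9) are assumptions for illustration, not taken from this thread or checked against the linked patch.

```python
def is_suspect(counter: int, mask_bits: int) -> bool:
    """True if the low `mask_bits` bits are all ones or all zeros.

    Assumed form of the check: adding 1 maps 'all ones' to 0 and
    'all zeros' to 1 within the masked range.
    """
    mask = (1 << mask_bits) - 1
    return ((counter + 1) & mask) <= 1


def count_suspect(mask_bits: int, span: int = 1 << 12) -> int:
    """Count readings in [0, span) that would be re-read."""
    return sum(is_suspect(value, mask_bits) for value in range(span))


if __name__ == "__main__":
    for bits in (10, 9):
        print(f"{bits}-bit mask: {count_suspect(bits)} suspect values in 4096")
```

With a narrower mask, the all-ones and all-zeros patterns occur twice as often, so twice as many readings are rejected and re-read. That makes the workaround more conservative at the cost of a few extra counter reads.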

Andreas, can you reproduce it to the point where you can judge whether the fix helps or not?
Comment 24 Swamp Workflow Management 2019-05-06 15:20:20 UTC
This is an autogenerated message for OBS integration:
This bug (1098550) was mentioned in
https://build.opensuse.org/request/show/701121 Factory / openSUSE-MicroOS
Comment 25 Guillaume GARDET 2019-07-02 12:32:42 UTC
@Andreas, do we still have boot problems on Pine64 with the latest Tumbleweed?
Comment 26 Matthias Brugger 2019-11-18 10:49:03 UTC
The latest Pine64 image I tested worked fine. Closing, as there has been no response on the bug.