Bug 966255 - No passphrase requested on boot if plymouth is active
Status: CONFIRMED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Bootloader
Version: Current
Hardware: 64bit SUSE Other
Priority: P3 - Medium
Severity: Critical
Assigned To: Cliff Zhao
QA Contact: Jiri Srain
Reported: 2016-02-11 14:18 UTC by Robert Schweikert
Modified: 2020-06-30 13:08 UTC (History)


Attachments
cryptsetup boot password prompt goes into background as a job (1.63 MB, image/jpeg)
2016-02-29 20:22 UTC, Jacob W

Description Robert Schweikert 2016-02-11 14:18:22 UTC
kernel 4.4.0.3

The system has an encrypted user partition. Boot hangs at

Ignoring BGRT: Invalid version 0 (expected 1)


This appears to happen right before the point where I would expect to enter my passphrase.
Comment 1 Takashi Iwai 2016-02-12 14:18:25 UTC
Is this a regression with the recent kernel?
Comment 2 Robert Schweikert 2016-02-12 14:37:47 UTC
Appears to be. Fabian Vogt reports a similar issue on the factory ML (Thread: TW has fallen and cannot get back up) that started around the time of the inclusion of the 4.4 kernel.

The workaround is to add plymouth.enable=0 to the kernel command line.

Having done so, I can now also confirm that my initial suspicion of it being related to the request for the passphrase is correct. After disabling plymouth, the passphrase is requested right after the message "Ignoring BGRT: Invalid version 0 (expected 1)".
Comment 3 Robert Schweikert 2016-02-12 14:38:47 UTC
For completeness, for those who run into the issue, here is the step-by-step workaround:

1.) Boot into a rescue system
2.) mount the partition that is usually your root partition
  (mount /dev/sdaX /mnt) X is a placeholder for a number
3.) mount your boot partition
  (mount /dev/sdaY /mnt/boot/efi) Y is a placeholder for a number, assuming EFI boot setup
4.) bind mount /proc, /sys, /dev and /run
   (mount --bind /proc /mnt/proc, and likewise for /sys, /dev and /run)
5.) chroot to /mnt
   (chroot /mnt)
6.) vi /etc/default/grub
  (add plymouth.enable=0 to the string for GRUB_CMDLINE_LINUX_DEFAULT)
7.) grub2-mkconfig -o /boot/grub2/grub.cfg
8.) exit
9.) shutdown -r now


Setting up the chroot is not strictly necessary, but it makes dealing with grub2-mkconfig easier and thus makes things less error-prone.
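
Step 6 above can also be scripted instead of editing the file in vi. The following is a minimal sketch, assuming GNU sed and a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; the function name add_kernel_param is made up for illustration and is not part of any grub tooling.

```sh
#!/bin/sh
# Hypothetical helper (not part of grub2): append PARAM to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of FILE unless it is already there.
add_kernel_param() {
    file=$1
    param=$2
    # Skip the edit when the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert " PARAM" just before the closing double quote (GNU sed -i).
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}
```

Inside the chroot this would be called as "add_kernel_param /etc/default/grub plymouth.enable=0", followed by the grub2-mkconfig call from step 7.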
Comment 4 Takashi Iwai 2016-02-12 14:45:17 UTC
OK, so I suppose that the BGRT message is just a red herring, and it appeared in the past, too?  In any case, it'd be helpful to know whether this is caused by the kernel update or by something else...

Ismail, is there any recent change in plymouth that may trigger such a problem?
Comment 5 Takashi Iwai 2016-02-12 14:47:16 UTC
To confirm whether it's a kernel regression: I have the old kernel packages, e.g. 4.3.x, in the OBS home:tiwai:kernel:4.3 repo.  Install this kernel and retest with plymouth enabled.
Comment 6 Robert Schweikert 2016-02-12 15:02:44 UTC
(In reply to Takashi Iwai from comment #4)
> OK, so I suppose that the BGRT message is just a red herring, it appeared in
> the past, too?  In any case, it'd be helpful to know whether this is caused by
> the kernel update or by something else...

I just added the message to provide an indication where in the boot process things are going wrong. I did not think it was an indicator for the problem. Sorry if that was not clear.
Comment 7 Robert Schweikert 2016-02-12 16:21:52 UTC
Reported by Paul Gonin on ML:

Same issue here with Tumbleweed, updated to the 4.4 kernel, and a LUKS
encrypted partition (but not root, fortunately)

Symptoms are similar but not exactly the same.
I was able to boot when insisting... but the protected partition is not mounted.
Looking at the systemd journal I could see plymouth-related issues:
Feb 12 09:30:25 systemd[1]: plymouth-start.service: Main process exited, code=dumped, status=11/SEGV
Feb 12 09:30:25 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termi
Feb 12 09:30:25 systemd[1]: plymouth-start.service: Unit entered failed state.
Feb 12 09:30:25 systemd[1]: plymouth-start.service: Failed with result 'core-dump'.

I have had issues with plymouth and LUKS for some time, actually:
https://bugzilla.opensuse.org/show_bug.cgi?id=948975
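
To check whether another system shows the same plymouth crash as in the journal excerpt above, the relevant lines can be filtered out of the journal. This is a minimal sketch; the function name plymouth_start_failures is hypothetical, and the patterns are taken only from the messages quoted in this comment.

```sh
#!/bin/sh
# Hypothetical helper: read journal text on stdin and print only the lines
# that report a plymouth-start.service failure like the ones quoted above.
plymouth_start_failures() {
    grep -E 'plymouth-start\.service: (Main process exited|Unit entered failed state|Failed with result)'
}
```

A possible use is "journalctl -b | plymouth_start_failures"; no output would suggest that plymouthd did not crash during that boot.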
Comment 8 Robert Schweikert 2016-02-12 17:00:11 UTC
Experiencing the same symptom with kernel-default-4.3.3-1.1.gda39cbd.x86_64.rpm

Please don't make me go back in time through more kernel versions.
Comment 9 Takashi Iwai 2016-02-12 17:17:20 UTC
(In reply to Robert Schweikert from comment #8)
> Experiencing the same symptom with
> kernel-default-4.3.3-1.1.gda39cbd.x86_64.rpm
> 
> Please don't make me go back in time through more kernel versions.

If you can conclude with certainty that this is not a kernel regression, I'm happy to hear it.  If not, further tests would be helpful; there are other kernel versions in my OBS home:tiwai:kernel:$VERSION repos :)  (4.1 is found in Leap.)
Comment 10 Robert Schweikert 2016-02-12 17:48:26 UTC
I did spend some time looking at the log and while I do not find any evidence of a crash as indicated by Paul, see comment #7, I do think it is a matter of plymouth just not displaying the request for the passphrase.

From the log:

Feb 12 07:53:31 rush systemd[1]: Started Forward Password Requests to Plymouth.
Feb 12 07:53:31 rush audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journal-flush comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

After this the log continues to accumulate messages and then eventually...

Feb 12 07:53:47 rush systemd[1]: Received SIGINT.

Which is basically when I hit Ctrl-Alt-Delete to restart the system, as I simply cannot enter the passphrase for the encrypted partition.

I would say it's a plymouth problem.
Comment 11 Fabian Vogt 2016-02-12 19:07:10 UTC
Considering that I've been working with plymouth in the past weeks and I can easily reproduce it, I'll take this one.
Comment 12 Ismail Dönmez 2016-02-15 12:52:33 UTC
I set up a new VM to try to reproduce this bug, but sorry:

1. I got a graphical dialog asking the disk password
2. systemctl status plymouth-start shows that it exited successfully.

I don't get a login window, but it looks like X.org crashes with a reference to vboxvideo_drv.so, which seems to be a VirtualBox bug.

So, if any of you can get a coredump with debuginfo, it would be most appreciated. I'll be testing on a laptop as a second resort.
Comment 13 Fabian Vogt 2016-02-15 13:43:54 UTC
I tried to debug it, but the issue is that plymouth works fine if started manually, even in the initrd.
As a better workaround, I added plymouth to the list of modules to omit from the initrd, so that it uses the plymouth on / and not the one inside the initrd and it works fine. I guess it's some kind of timing issue...
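
Assuming the initrd is built with dracut, omitting a module is usually done with an omit_dracutmodules entry in a drop-in under /etc/dracut.conf.d, followed by regenerating the initrd. The following is a minimal sketch; the file name 99-omit-plymouth.conf and the function name are assumptions, and the thread does not say how the module was actually omitted.

```sh
#!/bin/sh
# Hypothetical helper: write a dracut drop-in that omits the plymouth module.
# The directory defaults to /etc/dracut.conf.d; the file name is an
# arbitrary choice made for this sketch.
write_omit_plymouth_conf() {
    conf_dir=${1:-/etc/dracut.conf.d}
    printf 'omit_dracutmodules+=" plymouth "\n' > "$conf_dir/99-omit-plymouth.conf"
}
```

After calling write_omit_plymouth_conf as root, running "dracut --force" would rebuild the initrd for the running kernel so that the change takes effect on the next boot.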
Comment 14 Ismail Dönmez 2016-02-15 13:51:03 UTC
(In reply to Fabian Vogt from comment #13)
> I tried to debug it, but the issue is that plymouth works fine if started
> manually, even in the initrd.
> As a better workaround, I added plymouth to the list of modules to omit from
> the initrd, so that it uses the plymouth on / and not the one inside the
> initrd and it works fine. I guess it's some kind of timing issue...

Do we really need plymouth in initrd? That really makes it harder to debug it.
Comment 15 Fabian Vogt 2016-02-15 13:59:31 UTC
For / on LUKS it's required to have plymouth in the initrd to get a graphical password prompt.

So it's not *absolutely* required, but you could argue that plymouth isn't required anywhere then. The initrd is about 15% of the total boot time.
Comment 16 Jan Kara 2016-02-17 20:12:35 UTC
Is this by any chance related to bug 804607 but now it just happens for more people?
Comment 17 Pascal DELROT 2016-02-17 22:39:59 UTC
Thanks for the hint, will surely work on many other distributions.

An alternative, easier for those who are not fond of the command line:
1) boot the system on the hard drive from the openSUSE Tumbleweed install USB key
2) log in as root
3) Use yast2 in "Bootloader configuration" to add "plymouth.enable=0" to the kernel command line.
As expected, the passphrase prompt is now shown in text mode.

Pascal
Comment 18 Daniel Noga 2016-02-19 22:52:51 UTC
Is this the same bug as the one I have (https://bugzilla.opensuse.org/show_bug.cgi?id=962520), or is it a different issue?
Comment 19 Tomáš Chvátal 2016-02-24 07:55:50 UTC
I was just bitten by this on Leap after the kernel update to 4.1.15; the old initrd with 4.1.13 works fine. I do not dare to regenerate it using mkinitrd :P

Maybe for the sake of debugging we should really not have plymouth in the initrd and simply live with the fact that anyone using an encrypted / gets an ugly request for the password. I would strongly support that.
Comment 20 Ismail Dönmez 2016-02-24 08:20:18 UTC
For Leap, it looks like bsc-939204.patch caused this regression; I'll submit an update for it. But that patch never made it to Tumbleweed, so we likely have a different problem there.
Comment 23 Fabian Vogt 2016-02-29 18:00:14 UTC
I *believe* I found the problem.

/usr/lib/systemd/system/plymouth-start.service:

> [Unit]
> Description=Show Plymouth Boot Screen
> DefaultDependencies=no
> Wants=systemd-ask-password-plymouth.path systemd-vconsole-setup.service
> After=systemd-vconsole-setup.service systemd-udev-trigger.service systemd-udevd.service
> Before=systemd-ask-password-plymouth.service
> ConditionKernelCommandLine=!plymouth.enable=0
> 
> [Service]
> ExecStart=/usr/sbin/plymouthd --mode=boot --pid-file=/run/plymouth/pid --attach-to-session
> ExecStartPost=-/usr/bin/plymouth show-splash
> Type=forking
> KillMode=none
> SendSIGKILL=no

By replacing ExecStart and ExecStartPost with
> ExecStart=/bin/sh -c "(/usr/sbin/plymouthd --mode=boot --pid-file=/run/plymouth/pid --attach-to-session &); sleep 5; /usr/bin/plymouth show-splash; sleep 3"

it starts to work.

My theory: "plymouth show-splash" is invoked too early, so it fails.
This is ignored due to the "-" in the ExecStartPost line and all requests end up ignored.

Now:
1. Why did it work before? (Before systemd v228, maybe?)
2. What's the best solution here? My idea is to make it a "Type=notify" service.
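
If the theory above holds, the fixed "sleep 5" in the experiment could be replaced by polling until the daemon answers. The following is a minimal sketch of such a retry loop; the function name retry is hypothetical, and using "plymouth --ping" as the readiness check is an assumption that was not tested in this thread.

```sh
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most TRIES times,
# sleeping DELAY seconds after each failed attempt.
# Returns 0 on the first success and 1 if every attempt failed.
retry() {
    tries=$1
    delay=$2
    shift 2
    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}
```

A possible use would be "retry 10 1 plymouth --ping && plymouth show-splash". Unlike a fixed sleep, this waits only as long as needed, but it is still a workaround for the ordering problem rather than a fix such as the Type=notify idea.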
Comment 24 Jacob W 2016-02-29 20:21:37 UTC
I can confirm that Fabian's workaround from comment 23 does NOT work for me. It changes nothing. cryptsetup, with or without plymouth, still behaves as if the password prompt is NOT required (it backgrounds the job indefinitely). So when plymouth is enabled, it crashes the boot process. The only solution is what I have proposed in bug 942940#37, which is:

Remove kernel boot parameter "splash=silent" but keep "quiet". Without "quiet" the job is ignored and put into the background.
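
This change can be made permanent in /etc/default/grub rather than by editing the boot entry on every boot. The following is a minimal sketch, assuming GNU sed and a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; remove_kernel_param is a hypothetical helper name, not an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: remove PARAM from the GRUB_CMDLINE_LINUX_DEFAULT="..."
# line of FILE. PARAM is used as a sed pattern, so a "." in it matches any
# character. Other lines of FILE are left untouched.
remove_kernel_param() {
    file=$1
    param=$2
    # 1st expression: PARAM at the start or in the middle of the value.
    # 2nd expression: PARAM at the end of the value.
    # 3rd expression: PARAM as the only value.
    sed -i -e "/^GRUB_CMDLINE_LINUX_DEFAULT=/ { s|\([\" ]\)$param |\1|; s| $param\"|\"|; s|\"$param\"|\"\"|; }" "$file"
}
```

It would be called as "remove_kernel_param /etc/default/grub splash=silent", followed by "grub2-mkconfig -o /boot/grub2/grub.cfg" as in comment 3.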


Now I don't know if my system is suffering from this bug or bug 942940 or possibly something else.


I am cross-posting my issue from bug 942940 because it may help:

I am now experiencing a similar issue since TW update Build20160205, or maybe the one before; I am not 100% sure. Before that, the prompt worked great in TW and in 13.2.

I only have /home encrypted. LUKS without LVM. Both fstab and crypttab show the correct disk id.

Password prompt is not shown and plymouth theme never shows up, instead it freezes the system where VT switching does not work.

Forcing the system to boot without plymouth (press e on grub's kernel list) shows what looks like cryptsetup trying to prompt for the password, but instead instantly being backgrounded as a job. After more processes go through their boot procedure, the boot process stops and looks frozen. If I press a button on the keyboard, it forces the password prompt to re-appear. Now I can enter my password and everything works from there.

I tried to add/remove nofail option (all 4 combinations) in both fstab and crypttab, does not help.

I tried plymouth-set-default-theme -R <theme> and yast2 bootloader. Does not help.

Adding x-systemd.device-timeout=15 to /etc/fstab from #963526 DOES NOT help or fix this problem. Tried both plymouth and raw boot, nothing changed.


I'm attaching a photo (boot-screen-password-prompt-job-problem.jpg) I made of the boot screen right before submitting password. Please note that the last line, like mentioned above, only appears after hitting a key on the keyboard. In other words, I have to force it to the foreground.
Comment 25 Jacob W 2016-02-29 20:22:28 UTC
Created attachment 667207 [details]
cryptsetup boot password prompt goes into background as a job
Comment 26 Fabian Vogt 2016-02-29 20:43:29 UTC
(In reply to Jacob W from comment #25)
> Created attachment 667207 [details]
> cryptsetup boot password prompt goes into background as a job

That's not the plymouth prompt, so a totally different issue.
It didn't even try to use plymouth for password asking.

> Password prompt is not shown and plymouth theme never shows up, instead it freezes the system where VT switching does not work.

If even the cursor stops blinking, the kernel either froze or panicked -> kernel bug.
To confirm, run on a booted system as root on a TTY:

> rcxdm stop
> plymouthd --no-daemon --no-boot-log --tty=/dev/tty0 --debug --mode=boot

and on a different tty also as root

> plymouth show-splash

and you should see the splash screen (on TTY7).

> Forcing the system to boot without plymouth (press e on grub's kernel list) shows what looks like cryptsetup trying to prompt for the password, but instead instantly being backgrounded as a job. After more processes go through their boot procedure, the boot process stops and looks frozen. If I press a button on the keyboard, it forces the password prompt to re-appear. Now I can enter my password and everything works from there.

It's not backgrounded, it's the current TTY so systemd uses it, although it's also being used by something else... -> systemd bug(?)
Comment 27 Jacob W 2016-02-29 22:10:22 UTC
Thank you Fabian for your quick replies.

(In reply to Fabian Vogt from comment #26)
> (In reply to Jacob W from comment #25)
> > Created attachment 667207 [details]
> > cryptsetup boot password prompt goes into background as a job
> 
> That's not the plymouth prompt, so a totally different issue.
> It didn't even try to use plymouth for password asking.

This is when booting without kernel options "splash=silent" and "quiet". So of course plymouth is not started, because it's not supposed to. The point of this screenshot is to show that even without plymouth, the password prompting is incorrect.

If I do have the above kernel options enabled, I do not get a splash screen. All I get is a blank black screen. VT switching does not work. This blank black screen shows up where previously, prior to this bug, I would get the plymouth password prompt.

> 
> > Password prompt is not shown and plymouth theme never shows up, instead it freezes the system where VT switching does not work.
> 
> If even the cursor stops blinking, the kernel either froze or panicked ->
> kernel bug.

Logs show no mention of a kernel panic. The kernel loads fine from what I can tell. I have not had a kernel panic since I started having this cryptsetup issue. I've checked both /var/log/messages and the systemd journal.

> To confirm, run on a booted system as root on a TTY:
> 
> > rcxdm stop
> > plymouthd --no-daemon --no-boot-log --tty=/dev/tty0 --debug --mode=boot
> 
> and on a different tty also as root
> 
> > plymouth show-splash
> 
> and you should see the splash screen (on TTY7).

I do not see the splash screen. All I see is the regular output of booting:
[ OK ] ....
[ OK ] ....

No errors, nothing unusual, no interruptions, etc. None of the above commands showed errors from what I could tell.

> 
> > Forcing the system to boot without plymouth (press e on grub's kernel list) shows what looks like cryptsetup trying to prompt for the password, but instead instantly being backgrounded as a job. After more processes go through their boot procedure, the boot process stops and looks frozen. If I press a button on the keyboard, it forces the password prompt to re-appear. Now I can enter my password and everything works from there.
> 
> It's not backgrounded, it's the current TTY so systemd uses it, although
> it's also being used by something else... -> systemd bug(?)

Exactly! The password prompt should halt all further processes until the correct password is entered, but this is not the case. The password prompt is suppressed and other things just keep loading. I call this backgrounding, because that's exactly what it looks like: the job (it's literally called that) of the cryptsetup password prompt is being suppressed. Hitting a button later "reveals" the job (to me, that's like foregrounding it). Maybe that's bad use of terminology.

Yes, this looks like a systemd bug. I've been trying to say that from the beginning. Isn't this bug report a systemd issue? I'm confused. 

Like I mentioned before, I do not think plymouth is to blame, but systemd because even without plymouth, there is an issue with the password prompt (see previously attached screenshot).

See also (like you already have) my comments on bug 942940

To recap:
1) without kernel options "splash=silent" and "quiet", it's clear cryptsetup password prompt is not being correctly initiated (see previously attached screenshot).
2) with kernel option "quiet" but NOT "splash=silent" (or with "splash=verbose"), the password prompt is correctly shown and halts everything until a correct password is provided. Once provided, everything continues to load. This is how it's supposed to work.
3) with both "splash=silent" and "quiet", at the moment plymouth is supposed to show the password prompt, what I assume is the graphical system freezes and all I get is a blank black screen. I cannot switch VT.

From the above, I am guessing that this is a systemd issue. When using plymouth, it is getting two conflicting "signals": 1) the password prompt is needed and 2) don't show the password prompt. This is exactly what happens without kernel options "splash=silent" and "quiet" (see previously attached screenshot): 1) cryptsetup is asked to show the password prompt and 2) the password prompt is not shown / hidden (I call this the job being backgrounded). But because there is no plymouth being loaded, the system does not freeze and you wait until other things load, press a button on the keyboard, and the password prompt is then foregrounded.
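
To report which of the three recap cases a given boot falls into, the kernel command line can be inspected. The following is a minimal sketch; the function names has_kernel_param and boot_case are hypothetical, and the case numbers simply follow the recap in this comment.

```sh
#!/bin/sh
# Hypothetical helper: succeed if PARAM appears as a whole word in CMDLINE.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the recap case (1, 2 or 3) that a kernel command line matches,
# or "other" for "splash=silent" without "quiet", which the recap omits.
boot_case() {
    if has_kernel_param "$1" quiet && has_kernel_param "$1" splash=silent; then
        echo 3
    elif has_kernel_param "$1" quiet; then
        echo 2
    elif ! has_kernel_param "$1" splash=silent; then
        echo 1
    else
        echo other
    fi
}
```

On a booted system this could be run as boot_case "$(cat /proc/cmdline)", which makes it easier to state in a report exactly which combination was tested.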
Comment 28 Fabian Vogt 2016-03-01 08:06:13 UTC
I just noticed that that also seems to solve some cases of plymouth showing a text "splash" instead of the graphical one...
Comment 29 shashank gaur 2017-07-05 04:01:19 UTC
I am also affected by this bug.
Strangely, plymouth sometimes works fine, but otherwise
it is the same black screen and no prompt for LUKS.
The workaround by Jacob W works but removes plymouth;
I am using that for now.
Comment 30 John Chufar 2017-08-12 14:02:36 UTC
1. Tumbleweed 20170808: tried the kernel parameter plymouth.enable=0; that did not work.
2. The prompt never appears, and pressing Esc at the splash screen showed a terminal displaying a repeating message about the attempt to mount the crypt device.
3. Works OK in Leap 42.3
Comment 31 John Chufar 2017-08-12 16:18:53 UTC
I can confirm the only thing that works for me is the method Jacob W listed.
Remove kernel boot parameter "splash=silent" but keep "quiet"

I then enter my pw after the point the system starts prompting in the message display during boot. It still spools other output while I am typing sometimes, but works.
Comment 32 Jacob W 2017-08-12 19:24:22 UTC
I'm glad that people have found my workaround helpful.

Just like Shashank and John, this bug still affects me after all this time and I still use the workaround I gave 1.5 years ago myself. 

That a confirmed CRITICAL bootloader bug is *still* unsolved after 1.5 years is total bullshit and shows the incompetence of the person responsible. No attempt at solving it the past 1.5 years.
Comment 33 Fabian Vogt 2017-08-12 20:02:22 UTC
(In reply to Jacob W from comment #32)
> I'm glad that people have found my workaround helpful.
> 
> Just like Shashank and John, this bug still affects me after all this time
> and I still use the workaround I gave 1.5 years ago myself. 

It's not easy to debug, as it only happens reproducibly on very specific setups. For instance, it no longer shows up on my system, where I originally debugged the issue for a while. I've also removed plymouth from all systems I have access to in the meantime. I recommend everyone do the same; it's an extremely annoying and unreliable piece of software.

There's currently work ongoing to drop plymouth from openSUSE and replace it with a different kernel-based bootsplash that is free from race conditions.

Still, I adjusted the title and reassigned it to the plymouth maintainer.
Comment 34 Jacob W 2017-08-12 21:23:44 UTC
(In reply to Fabian Vogt from comment #33)
> I adjusted the title and reassigned it to the plymouth maintainer.

Great, I'm sure Zhao Qiang will do a great job fixing this critical bug!