Bug 1155687 - Bad install when using Tumbleweed live iso (20191030)
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Installation
Version: Current
Hardware: Other Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assigned To: Fabian Vogt
QA Contact: Jiri Srain
URL: https://trello.com/c/QfwcLZBb
Depends on:
Blocks:
Reported: 2019-11-02 13:56 UTC by Neil Rickert
Modified: 2020-01-29 23:40 UTC
CC List: 6 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Yast logs for the install (2.57 MB, application/x-xz)
2019-11-02 13:58 UTC, Neil Rickert
compressed tar with "/etc/group" and "/etc/passwd" (881 bytes, application/x-compressed-tar)
2019-11-02 14:01 UTC, Neil Rickert

Description Neil Rickert 2019-11-02 13:56:24 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Build Identifier: 

I was alerted to this via a forum post:
https://forums.opensuse.org/showthread.php/538068-You-can-t-install-from-the-live-isos-Better-use-the-installation-iso

So I downloaded the KDE live iso to give it a try.

I mostly went with defaults for installing KDE, except that I used "ext4" for root and "/home".  This was in a KVM virtual machine.

The install itself seemed to go fine. The one peculiarity was that, at the end, it said it would reboot. But this was an install from a live system, so of course it just dropped back to the live system instead of rebooting.

The real problem came when I attempted to boot the newly installed system. I finished up with a black screen. Retrying with " 3" at the end of the kernel boot line, I did successfully boot to a command line. There were several failure messages during boot. They seemed to indicate that some required system users and system groups were missing.

I will attach Yast logs for the install, and a copy of the installed "/etc/passwd" and "/etc/group".

Reproducible: Always
Comment 1 Neil Rickert 2019-11-02 13:58:23 UTC
Created attachment 823079 [details]
Yast logs for the install
Comment 2 Neil Rickert 2019-11-02 14:01:23 UTC
Created attachment 823080 [details]
compressed tar with "/etc/group" and "/etc/passwd"

Boot messages indicating missing users "sddm", "polkitd" and several others.  Also some missing system groups.  Some services failed to start because of this.  Attachment includes group and passwd files soon after install.
Comment 3 Andrei Borzenkov 2019-11-02 20:19:00 UTC
(In reply to Neil Rickert from comment #2)
> Boot messages indicating missing users "sddm", "polkitd" and several others.

Bug is actually rather interesting. Following happens

1. Live system has most users in its /etc/passwd.
2. Scripts used by various system-user-* packages basically do

getent user || useradd user

3. During installation /run from Live system is bind-mounted on target /mnt/run (where /mnt is root of system being installed). This makes nscd socket available to scripts run by rpm which are chrooted to /mnt.

4. So during installation getent contacts nscd *in Live system*, and gets positive answer. As result, user creation is skipped.
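Steps 3 and 4 hinge on whether an nscd socket is reachable from inside the target root. A minimal sketch of such a check follows. It assumes nscd listens on `/run/nscd/socket` (glibc's `/var/run/nscd/socket`, with `/var/run` pointing to `/run` as on openSUSE); the function name `nscd_socket_visible` is hypothetical and not part of YaST or any package.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a UNIX socket exists at the nscd path
# under the given root. Assumption: nscd uses /run/nscd/socket (reached via
# /var/run -> /run). With the Live /run bind-mounted onto /mnt/run, this
# socket belongs to the Live system's nscd, so getent run chrooted to /mnt
# receives the Live system's answers.
nscd_socket_visible() {
    # $1: root of the system being installed, e.g. /mnt
    [ -S "$1/run/nscd/socket" ]
}

# Example use before package scriptlets run in the target:
# nscd_socket_visible /mnt && echo "warning: Live nscd reachable from /mnt" >&2
```

The `-S` test only matches an actual socket, so a stale regular file at that path is not reported.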

Possible fixes are

1. Do not use getent, parse /etc/passwd directly
2. Stop nscd during installation in Live
3. Do not bind mount /run onto /mnt/run, just explicitly mount tmpfs on /mnt/run
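Fix 1 amounts to replacing the `getent` test with a lookup in the target's own passwd file, so that NSS (and therefore any reachable nscd) is never consulted. A minimal sketch, assuming a hypothetical `ensure_user` wrapper; `PASSWD_FILE` and `USERADD` are hypothetical overrides included only so the logic can be exercised without root. Comment #4 objects to this approach for regular installations.

```shell
#!/bin/sh
# Sketch of fix 1: decide whether to create a user by reading the local
# passwd file instead of asking NSS via getent.
# PASSWD_FILE and USERADD are hypothetical overrides (e.g. for testing).

user_in_passwd_file() {
    # $1: user name; matched exactly against the first field of each entry
    cut -d: -f1 "${PASSWD_FILE:-/etc/passwd}" 2>/dev/null | grep -Fxq -- "$1"
}

ensure_user() {
    # $1: user name; remaining arguments are passed through to useradd
    name=$1
    shift
    if user_in_passwd_file "$name"; then
        return 0
    fi
    "${USERADD:-useradd}" "$@" "$name"
}

# Example, mirroring the useradd call for lp in this comment:
# ensure_user lp -r -s /sbin/nologin -c "Printing daemon" -U -d /var/spool/lpd
```

Matching with `grep -Fx` on the name field avoids false positives such as `lp` matching `lpadmin`.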

I personally favor 3, I do not see why rpm would need access to "parent" /run during installation. Also there are packages that use getent || useradd directly.

To illustrate:

linux@10:~> cat /mnt/etc/passwd
root:x:0:0:root:/root:/bin/bash
linux@10:~> sudo rpm --root /mnt --dbpath /var/lib/rpm -U --percent --noglob --force --nodeps -- system-user-lp-20170617-8.2.noarch.rpm
%% 0.000000
%% 0.000000
useradd -r -s /sbin/nologin -c "Printing daemon" -g lp -d /var/spool/lpd lp
%% 0.000000
%% 46.610172
warning: user lp does not exist - using root
warning: group lp does not exist - using root
%% 73.728813
%% 100.000000
linux@10:~> cat /mnt/etc/passwd
root:x:0:0:root:/root:/bin/bash
linux@10:~> sudo umount /mnt/run
linux@10:~> sudo rpm --root /mnt --dbpath /var/lib/rpm -U --percent --noglob --force --nodeps -- system-user-lp-20170617-8.2.noarch.rpm
%% 0.000000
%% 0.000000
useradd -r -s /sbin/nologin -c "Printing daemon" -U -d /var/spool/lpd lp
%% 0.000000
%% 46.610172
%% 73.728813
%% 100.000000
linux@10:~> cat /mnt/etc/passwd
root:x:0:0:root:/root:/bin/bash
lp:x:499:499:Printing daemon:/var/spool/lpd:/sbin/nologin
linux@10:~> exit

Cc maintainers of yast2-installation and sysuser-tools.
Comment 4 Thorsten Kukuk 2019-11-02 20:29:48 UTC
(In reply to Andrei Borzenkov from comment #3)
> (In reply to Neil Rickert from comment #2)
> > Boot messages indicating missing users "sddm", "polkitd" and several others.
> 
> Bug is actually rather interesting. Following happens
> 
> 1. Live system has most users in its /etc/passwd.
> 2. Scripts used by various system-user-* packages basically do
> 
> getent user || useradd user

All packages creating users are doing this.

> 3. During installation /run from Live system is bind-mounted on target
> /mnt/run (where /mnt is root of system being installed). This makes nscd
> socket available to scripts run by rpm which are chrooted to /mnt.

Why do we run nscd in such a case? It doesn't make any sense.

> 4. So during installation getent contacts nscd *in Live system*, and gets
> positive answer. As result, user creation is skipped.
> 
> Possible fixes are
> 
> 1. Do not use getent, parse /etc/passwd directly

This "fixes" only your very special use case and breaks all default installations. Beside that this would mean touching all packages creating users.
Comment 5 Neil Rickert 2019-11-02 22:04:24 UTC
Thanks for the analysis.

I would guess that one benefit from bind mounting "/run" is that it gives access to "/etc/resolv.conf" (via a symlink).

I have since noticed one additional problem with the install. I scrolled through "/etc/shadow". The entry added for root looks okay. But the entry added for the created user has only a 13-char encrypted password. It presumably uses the old DES crypt algorithm.
Comment 6 Neil Rickert 2019-11-06 21:49:32 UTC
I tried a new install with
 openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20191104-Media.iso
As before, this resulted in failures because required users/groups did not exist.

Following the suggestion in comment #3, I then repeated the install.  But this time I stopped "nscd" before the install
  systemctl stop nscd.service

This gave a mostly good install. I got a couple of errors near the end of the install about a failure to access repos. This happened late in the install and was probably related to setting up the repos for the installed system. I ignored the errors, and the installed system booted without problems.
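The manual workaround from this comment can be wrapped so it is safe to run whether or not nscd is active. A minimal sketch; the function name and the `SYSTEMCTL` override (used only to test the logic without touching a real service) are hypothetical, and this is not the live-net-installer change mentioned in comment #10.

```shell
#!/bin/sh
# Sketch: stop nscd before starting the installer from the Live system,
# so getent in the target root cannot get answers from the Live nscd.
# SYSTEMCTL is a hypothetical override, e.g. for a test stub.
stop_nscd_if_running() {
    systemctl_cmd=${SYSTEMCTL:-systemctl}
    # "is-active --quiet" reports the unit state only via the exit status
    if "$systemctl_cmd" is-active --quiet nscd.service; then
        "$systemctl_cmd" stop nscd.service
    fi
}

# Example (as root on the Live system, before launching the installer):
# stop_nscd_if_running
```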
Comment 7 Steffen Winterfeldt 2019-11-14 09:51:31 UTC
> But the entry added for the created user has only a 13-char encrypted password.
> It presumably uses the old DES crypt algorithm

Good catch. See bug 1156552.
Comment 8 Lukas Ocilka 2019-11-14 09:56:57 UTC
(In reply to Steffen Winterfeldt from comment #7)
> Good catch. See bug 1156552.

Or maybe even bug 1155735
Comment 9 Steffen Winterfeldt 2019-11-14 12:10:53 UTC
I would leave out the password encryption issue here as that's tracked elsewhere (see last two comments) and keep this bug only for the nscd issue.
Comment 10 Fabian Vogt 2019-12-05 21:07:56 UTC
This was not caught by openQA, because the graphical session still got up and worked as expected, but systemctl status was indeed quite red.

live-net-installer stops nscd.service now before starting YaST: https://build.opensuse.org/request/show/754515

I'll keep this open for YaST though, as it's something that YaST should be (explicitly) aware of - more fallout of the /run change.
Comment 11 Lukas Ocilka 2019-12-06 12:19:50 UTC
Fixed by Fabian (comment #10).
If there is anything else, please report another bug.
Comment 12 Swamp Workflow Management 2020-01-22 15:50:25 UTC
This is an autogenerated message for OBS integration:
This bug (1155687) was mentioned in
https://build.opensuse.org/request/show/766355 15.1 / live-net-installer
Comment 13 Swamp Workflow Management 2020-01-28 14:15:36 UTC
openSUSE-RU-2020:0118-1: An update that has one recommended fix can now be installed.

Category: recommended (moderate)
Bug References: 1155687
CVE References: 
Sources used:
openSUSE Leap 15.1 (src):    live-net-installer-1.0-lp151.8.3.1
Comment 14 Swamp Workflow Management 2020-01-29 20:43:58 UTC
openSUSE-RU-2020:0127-1: An update that has one recommended fix can now be installed.

Category: recommended (moderate)
Bug References: 1155687
CVE References: 
Sources used:
openSUSE Backports SLE-15-SP1 (src):    live-net-installer-1.0-bp151.2.3.6