Bug 1125760 - Multiple graphics stack restarts during GDM startup on NVidia driver
Status: IN_PROGRESS
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: GNOME
Version: Current
Hardware: x86-64 Other
Priority: P3 - Medium
Severity: Normal
Target Milestone: ---
Assigned To: E-mail List
Reported: 2019-02-18 10:50 UTC by Hadrien Grasland
Modified: 2020-06-04 13:04 UTC (History)

Attachments
"journalctl --boot=-0" log of a boot where the problem occurred (982.05 KB, text/plain)
2019-02-18 10:50 UTC, Hadrien Grasland
Xorg.0.log (26.81 KB, text/plain)
2019-02-18 12:55 UTC, Hadrien Grasland
Xorg.0.log.old (25.52 KB, text/plain)
2019-02-18 12:56 UTC, Hadrien Grasland

Description Hadrien Grasland 2019-02-18 10:50:06 UTC
Created attachment 797058 [details]
"journalctl --boot=-0" log of a boot where the problem occurred

Using the current packages from Tumbleweed and the nvidia-tumbleweed repo (snapshot 20190214, kernel 4.20.7-1-default, "G05" nvidia driver stack v410.93-20.1 on a Quadro K620), the graphics stack can end up restarting multiple times during GDM startup. After GDM manages to start, everything works normally.

Depending on luck, GDM can show up right away, or the graphics stack can crash a dozen times before reaching a stable state.

This is not a new problem: it probably dates back to the 4.19 or 4.20 kernel upgrade, if memory serves me well. At the time, I also had to switch from G04 to the newly released G05 due to issues with G04 (IIRC it wouldn't build), and this is when the problem appeared.

I apologize for not finding the time to report this problem earlier, and for the somewhat imprecise reporting that ensues.

A journalctl log is attached. You can see that there is a large number of systemd errors during the X11 start loop, but I'm not knowledgeable enough about either systemd or the graphics stack to tell whether they are the cause of the problem or a symptom of it.
Comment 1 Stefan Dirsch 2019-02-18 11:18:58 UTC
Hmm. Nothing obvious I could spot. Could you also attach the files

/var/lib/gdm/.local/share/xorg/Xorg.0.log
/var/lib/gdm/.local/share/xorg/Xorg.0.log.old
Comment 2 Hadrien Grasland 2019-02-18 12:55:40 UTC
Created attachment 797065 [details]
Xorg.0.log
Comment 3 Hadrien Grasland 2019-02-18 12:56:04 UTC
Created attachment 797066 [details]
Xorg.0.log.old
Comment 4 Hadrien Grasland 2019-02-18 12:56:22 UTC
There you go :)
Comment 5 Stefan Dirsch 2019-02-18 14:47:01 UTC
Hmm. The X logfile looks fine. I see a regular termination of the X server. By the way, the build issues for G04 with kernel 4.20 have been fixed in the meantime. The issue might be related to the display manager gdm and may not occur with xdm and/or other display managers like sddm and lightdm. Feel free to give it a try via

  update-alternatives --config default-displaymanager
Comment 6 Hadrien Grasland 2019-02-18 16:42:49 UTC
(In reply to Stefan Dirsch from comment #5)
> Hmm. The X logfile looks fine. I see a regular termination of the X server.
> By the way, the build issues for G04 with kernel 4.20 have been fixed in the
> meantime. The issue might be related to the display manager gdm and may not
> occur with xdm and/or other display managers like sddm and lightdm. Feel free
> to give it a try via
> 
>   update-alternatives --config default-displaymanager

Thanks for the tips. I will experiment with other display managers and G04 as soon as I can close my graphical session and report back any findings.

Checking the original journalctl log again, I noticed that there is a gdm crash report before the first X initialization log (gdm is started at line 1446 and crashes due to not finding a display at line 1525, whereas the first X11 initialization log starts at line 1576). Maybe there is some kind of X11 / gdm startup race going on there? Or maybe the X11 initialization logs are just sent to the journal at a later time...
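
For anyone wanting to repeat this check on another boot, here is a minimal sketch of how the ordering could be looked up mechanically in a saved journal dump. The function names are made up for illustration, and the "X.Org X Server" banner is an assumption about how the X server announces itself in the journal; only the gnome-shell message is taken from the attached log.

```python
import re

# The gnome-shell message is the one seen in the attached journal.
# The X server banner text is an assumption, adjust it to match your log.
GDM_FAILURE = re.compile(r"gnome-shell\[\d+\]: Failed to create backend")
X_BANNER = re.compile(r"X\.Org X Server")


def first_match_line(lines, pattern):
    """Return the 1-based number of the first line matching pattern, or None."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            return number
    return None


def failure_precedes_x_start(lines):
    """Return True if the first backend failure is logged before the first X banner.

    Returns None when either marker is missing, since no comparison is possible.
    """
    lines = list(lines)  # allow iterators such as open file objects
    failure = first_match_line(lines, GDM_FAILURE)
    x_start = first_match_line(lines, X_BANNER)
    if failure is None or x_start is None:
        return None
    return failure < x_start
```

This only compares positions in the dump. It cannot tell a real startup race apart from X log lines that simply reach the journal late.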
Comment 7 Hadrien Grasland 2019-02-18 17:22:35 UTC
Indeed, there is no X11 restart loop with lightdm (at least across my 3 reboot attempts). Will now try the G04 + gdm combination...
Comment 8 Hadrien Grasland 2019-02-18 17:52:58 UTC
...and gdm goes into a restart loop every bit as effectively on G04. So there is no G04 / G05 difference as I expected from my previous observations.

I guess the trigger must have been another update from around that time, then. Tumbleweed snapshots were quite feature-packed at the end of last year, and this is my work machine so it is not always updated on a per-snapshot basis but may leap across several snapshots from time to time.
Comment 9 Stefan Dirsch 2019-02-19 08:24:19 UTC
(In reply to Hadrien Grasland from comment #6)
> Checking the original journalctl log again, I noticed that there is a gdm
> crash report before the first X initialization log (gdm is started at line
> 1446 and crashes due to not finding a display at line 1525, whereas the
> first X11 initialization log starts at line 1576). Maybe there is some kind
> of X11 / gdm startup race going on there? Or maybe the X11 initialization
> logs are just sent to the journal at a later time...

Thanks for taking a closer look at it yourself!

Line 1525
févr. 18 11:16:58 linux-2ak3 gnome-shell[1572]: Failed to create backend: No GPUs with outputs found

My current guess is that gdm is trying to figure out here whether your GPU is capable of supporting Wayland or not. Indeed, detecting no GPU looks fatal. Anyway, our GNOME developers need to look at this, since other display managers like lightdm are not affected.
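
To get a rough idea of how many greeter start attempts hit this failure during one boot, the distinct gnome-shell PIDs that logged the message can be counted. The following is a minimal sketch, assuming each failed attempt runs as a separate gnome-shell process and that the journal was saved in the default short format shown above. The function name and the regular expression are illustrative only.

```python
import re

# Matches the "process[pid]: message" tail of a short-format journal line.
# Timestamp and host are left unparsed because the month name is localized
# (for example "févr." in the line quoted above).
ENTRY = re.compile(r"(?P<process>[^\s\[\]]+)\[(?P<pid>\d+)\]: (?P<message>.*)$")
FAILURE_TEXT = "Failed to create backend: No GPUs with outputs found"


def failed_shell_pids(lines):
    """Return PIDs of gnome-shell instances that logged the backend failure.

    PIDs are listed once each, in order of first appearance.
    """
    pids = []
    for line in lines:
        match = ENTRY.search(line)
        if match is None:
            continue  # not a "process[pid]: message" line
        if match["process"] == "gnome-shell" and FAILURE_TEXT in match["message"]:
            pid = int(match["pid"])
            if pid not in pids:
                pids.append(pid)
    return pids
```

The length of the returned list then approximates the number of failed attempts, which could be compared between a boot where GDM shows up right away and one where it loops.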