Bug 1083860 - nvidia 390.42: Switching to VT and back restarts/kills X applications
Status: RESOLVED NORESPONSE
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: X11 3rd Party Driver
Version: Current
Hardware: Other Other
Priority: P3 - Medium
Severity: Normal
Assigned To: Stefan Dirsch
Reported: 2018-03-04 10:15 UTC by Peter Sütterlin
Modified: 2018-07-20 10:04 UTC (History)
sndirsch: needinfo? (P.Suetterlin)


Attachments
requested logfile from nvidia-bugreport (412.02 KB, application/gzip)
2018-03-05 09:50 UTC, Peter Sütterlin

Description Peter Sütterlin 2018-03-04 10:15:13 UTC
I'm running TW on a desktop with nvidia GTX 1060 and proprietary driver from the nvidia repo.  I run standard plasma5 desktop.

If I switch to a VT (using CTRL-ALT-F<n>) and then back to the X session (ALT-F7), I get a message that 'Desktop effects have been restarted due to a graphics reset'.  Other applications, like my browser vivaldi, fall completely dead with a black window and 100% CPU usage.

Not sure exactly when it started; it *might* have been with the update to 390.25 some time ago.  Probably similar to boo#1027696?
Comment 1 Peter Sütterlin 2018-03-04 10:31:48 UTC
Some more observations:

The 100% CPU was actually systemd compressing the coredump.  Vivaldi crashed (and restarted afterwards).  I had started it from a terminal, and got these messages there:

[18396:18396:0304/111944.173400:ERROR:gles2_cmd_decoder.cc(5299)] GLES2DecoderImpl::ResizeOffscreenFramebuffer failed because offscreen FBO was incomplete.
[18396:18396:0304/111944.173693:ERROR:gles2_cmd_decoder.cc(3679)] ContextResult::kFatalFailure: Could not allocate offscreen buffer storage.
[1:1:0304/111944.371854:ERROR:command_buffer_proxy_impl.cc(134)] ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.

Full coredump is 20MB, so I won't attach unless asked to (can also make it available via ftp).

(This was an empty session, just the start screen - the previous had many open tabs, and was 200MB compressed...)
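
For anyone comparing such terminal output across driver versions, the three messages above can be split into fields with a short script. This is only a sketch: the prefix layout `[pid:tid:MMDD/HHMMSS.micros:LEVEL:file(line)]` is inferred from the lines quoted in this comment, and the names `LOG_LINE` and `parse_log_line` are made up for illustration.

```python
import re

# Assumed prefix layout, inferred from the quoted lines:
# [pid:tid:MMDD/HHMMSS.micros:LEVEL:file(line)] message
LOG_LINE = re.compile(
    r"^\[(?P<pid>\d+):(?P<tid>\d+):(?P<date>\d{4})/(?P<time>\d{6}\.\d+):"
    r"(?P<level>[A-Z]+):(?P<file>[^()]+)\((?P<line>\d+)\)\]\s*(?P<message>.*)$"
)


def parse_log_line(text):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the purely numeric fields so they can be compared and sorted.
    for key in ("pid", "tid", "line"):
        fields[key] = int(fields[key])
    return fields


# The three lines quoted in this comment.
SAMPLE = [
    "[18396:18396:0304/111944.173400:ERROR:gles2_cmd_decoder.cc(5299)] "
    "GLES2DecoderImpl::ResizeOffscreenFramebuffer failed because offscreen FBO was incomplete.",
    "[18396:18396:0304/111944.173693:ERROR:gles2_cmd_decoder.cc(3679)] "
    "ContextResult::kFatalFailure: Could not allocate offscreen buffer storage.",
    "[1:1:0304/111944.371854:ERROR:command_buffer_proxy_impl.cc(134)] "
    "ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.",
]

if __name__ == "__main__":
    for raw in SAMPLE:
        entry = parse_log_line(raw)
        print(entry["pid"], entry["file"], entry["line"], entry["message"])
```

The first two lines come from the same process (pid 18396) and the same source file, which suggests the failure originates in the process doing the GL work, with the third line being a follow-on error elsewhere.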
Comment 2 Felix Miata 2018-03-04 15:16:44 UTC
For around 28 months, which was when I switched from 13.1 to 42.1, going to VT and back has been putting SeaMonkey into offline mode on leave and out of offline mode on return. I know this because on return to KDE3 Chatzilla has to reconnect, while KSirc just keeps on as if nothing happened, and if I spent any considerable time in VT and I went straight to mail, I would find it having just started a fetch. I'm not sure how I could find out if Firefox goes offline and back. I've never been able to find anyone experiencing similar behavior until the mailing list thread that led me here. I'm using Intel gfx with the modesetting X driver, but was probably still using xf86-video-intel at the 13.1 to 42.1 switch when this was first observed. It's very annoying having the CZ reconnects interrupting IRC conversations whenever I want to use a VT.
Comment 3 Peter Sütterlin 2018-03-04 15:34:48 UTC
TBH, I don't think that is related.  It rather looks like a (private) network config via NM and/or session management.  E.g., if I am playing music in audacious and switch to a VT music stops....
Comment 4 Felix Miata 2018-03-04 15:51:01 UTC
"It" what?

# zypper se -si ager
...
S | Name                   | Type    | Version        | Arch   | Repository
--+------------------------+---------+----------------+--------+-----------
i | vcdimager              | package | 0.7.24cvs-16.3 | x86_64 | OSS
i | yast2-packager         | package | 3.2.25-3.1     | x86_64 | Update
i | yast2-services-manager | package | 3.2.1-1.4      | noarch | OSS

When I switch to a VT, VLC continues (audio).
Comment 5 Stefan Dirsch 2018-03-04 21:22:41 UTC
Please provide results of running nvidia-bugreport.sh.
Comment 6 Peter Sütterlin 2018-03-05 09:50:19 UTC
Created attachment 762648
requested logfile from nvidia-bugreport

I hope this is fine - I ran it remotely (as root), as I won't have physical access to the machine until tonight....
Comment 7 Michal Srb 2018-03-05 10:19:06 UTC
The message "Desktop effects have been restarted due to a graphics reset" is KWin telling you that it handled the graphics reset correctly. KWin is quite robust and can deal with driver peculiarities, but not all applications can deal with them the same way. KWin uses the NV_robustness_video_memory_purge [1] OpenGL extension to detect that the GPU context was lost.

The overview of the extension describes what is happening:

> The NVIDIA OpenGL driver architecture on Linux has a limitation:
> resources located in video memory are not persistent across certain
> events. VT switches, suspend/resume events, and mode switching
> events may erase the contents of video memory. Any resource that
> is located exclusively in video memory, such as framebuffer objects
> (FBOs), will be lost. As the OpenGL specification makes no mention
> of events where the video memory is allowed to be cleared, the
> driver attempts to hide this fact from the application, but cannot
> do it for all resources.
>
> This extension provides a way for applications to discover when video
> memory content has been lost, so that the application can re-populate
> the video memory content as necessary.
>
> This extension will have a limited lifespan, as planned architectural
> evolutions in the NVIDIA Linux driver stack will allow
> video memory to be persistent. Any driver that exposes this
> extension is a driver that considers video memory to be
> volatile. Once the driver stack has been improved, the extension
> will no longer be exposed.

The Vivaldi error message "failed because offscreen FBO was incomplete" matches that - the FBO was lost, now rendering into it fails and apparently that puts it into some kind of infinite loop. It is possible that Vivaldi is at fault here - maybe it created a context claiming that it can handle resets, but then it actually does not handle them. (Just a wild guess.) Are there any other applications that break after a VT switch? Ideally some that we distribute in openSUSE.

It may also be that the reset is no longer supposed to happen with the current NVIDIA drivers and this is a bug; NVIDIA could answer that.

[1] https://www.khronos.org/registry/OpenGL/extensions/NV/NV_robustness_video_memory_purge.txt
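
To make the handling described above concrete, here is a minimal sketch of the pattern, written as a pure Python simulation so it runs without a GPU. It is not KWin's or Vivaldi's code. In a real application the status would come from glGetGraphicsResetStatus() on a context created with the extension's reset-on-purge attribute; here a fake driver callback stands in for that call. The names `Renderer` and `make_fake_driver` are hypothetical, and the numeric value of GL_PURGED_CONTEXT_RESET_NV is quoted from memory of the extension text, so check it against glext.h before relying on it.

```python
GL_NO_ERROR = 0x0000
# Assumed value, as listed in the NV_robustness_video_memory_purge spec;
# verify against glext.h.
GL_PURGED_CONTEXT_RESET_NV = 0x92BB


class Renderer:
    """Hypothetical application-side renderer that survives a memory purge."""

    def __init__(self, get_reset_status):
        # get_reset_status stands in for glGetGraphicsResetStatus().
        self._get_reset_status = get_reset_status
        self.rebuild_count = 0
        self.frames_drawn = 0
        self._create_resources()

    def _create_resources(self):
        # A real application would (re)create FBOs, renderbuffers and
        # re-upload textures here.
        self.fbo_valid = True

    def render_frame(self):
        status = self._get_reset_status()
        if status == GL_PURGED_CONTEXT_RESET_NV:
            # Video memory was purged (e.g. by a VT switch). The context is
            # still usable, but its video-memory contents must be rebuilt.
            self._create_resources()
            self.rebuild_count += 1
        elif status != GL_NO_ERROR:
            # Any other reset status means the context itself is lost and
            # has to be destroyed and recreated, which this sketch omits.
            raise RuntimeError("context lost, status 0x%04X" % status)
        self.frames_drawn += 1


def make_fake_driver(statuses):
    """Return a callable yielding the given statuses, then GL_NO_ERROR."""
    iterator = iter(statuses)
    return lambda: next(iterator, GL_NO_ERROR)


if __name__ == "__main__":
    # Frame 2 simulates the first frame after switching back from a VT.
    driver = make_fake_driver([GL_NO_ERROR, GL_PURGED_CONTEXT_RESET_NV, GL_NO_ERROR])
    renderer = Renderer(driver)
    for _ in range(3):
        renderer.render_frame()
    print("frames:", renderer.frames_drawn, "rebuilds:", renderer.rebuild_count)
```

An application that skips the status check keeps rendering into a framebuffer whose storage is gone, which is consistent with the "offscreen FBO was incomplete" error above.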
Comment 8 Stefan Dirsch 2018-03-05 10:29:54 UTC
Ok. Better add NVIDIA at this point. Possibly Daniel can shed some light on this.
Comment 9 Peter Sütterlin 2018-03-05 10:44:37 UTC
@Michal:  Thanks for this detailed explanation!

(In reply to Michal Srb from comment #7)

> The Vivaldi error message "failed because offscreen FBO was incomplete"
> matches that - FBO was lost, now rendering into it fails and apparently that
> puts it into some kind of infinite loop. It is possible that Vivaldi is at
> fault here - maybe it created context claiming that it can handle resets,
> but then it actually does not handle them. (Just a wild guess.) Are there
> any other applications that break after VT-switch? Ideally some that we
> distribute in OpenSUSE.

I'll try some other applications tonight and report back.
And will also send a bug report to the vivaldi team.

> It may also be that the reset is no longer supposed to happen with the
> current nvidia drivers and it is a bug, nvidia could answer that.

If I find the time I might try installing an older version of the driver manually, as I cannot remember having seen it earlier.
Comment 10 Stefan Dirsch 2018-03-27 02:03:59 UTC
Meanwhile NVIDIA has released driver version 390.42 ...

This should be tried first.
Comment 11 Peter Sütterlin 2018-03-27 08:02:20 UTC
Yes, running it here (with TW 20180320).

Symptoms are unchanged: Desktop Effects still reports a restart, and Vivaldi coredumps :(
Comment 12 Michael Hirmke 2018-03-27 09:47:58 UTC
I also still get the message "Desktop effects were restarted due to a graphics reset" from KWin after switching to a console window and back to the X window.

Latest Tumbleweed snapshot, latest nvidia driver from https://download.nvidia.com/opensuse/tumbleweed.
Graphics adapter is:
NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2)
Comment 13 Michal Srb 2018-03-27 10:44:41 UTC
The kwin message may be annoying and, as far as I can tell, useless to the user. We could suggest upstream to remove it.

But if Vivaldi is the only 3D application that crashes, then it seems like a bug in Vivaldi which we cannot do anything about. It should be reported to them.
Comment 14 Peter Sütterlin 2018-03-27 10:55:15 UTC
(In reply to Michal Srb from comment #13)
> The kwin message may be annoying and, as far as I can tell, useless to the
> user. We could suggest upstream to remove it.

It's a timed notification, and vanishes after a few seconds.  I don't find it that annoying TBH.  

> But if Vivaldi is the only 3D application that crashes, then it seems like a
> bug in Vivaldi which we can not do anything about. It should be reported to
> them.

As I mentioned, the bug has been reported (but the vivaldi bugtracker is not public, so I couldn't link to it).

The question is rather why this reset is necessary now, when it wasn't before 390.25(?).  That's a question for the nvidia developers though....
Comment 15 Stefan Dirsch 2018-04-09 10:02:58 UTC
Meanwhile 390.48 has been released. But this issue isn't mentioned in the changelog as having been fixed. So likely this update won't help. :-(
Comment 16 Stefan Dirsch 2018-06-05 13:28:11 UTC
Meanwhile 390.59 is available.
Comment 17 Stefan Dirsch 2018-07-06 13:25:50 UTC
And meanwhile we're at 390.67. Please retest.
Comment 18 Peter Sütterlin 2018-07-06 13:34:08 UTC
Sorry, I cannot.  The affected machine with TW and NVidia card is in my Stockholm apartment and offline, as I work in Spain during summer/autumn.  The machine I have here runs Leap 42.3 (because it sits behind a very slow ADSL line), which does not produce the failure....
So I'm currently out of options :(
Comment 19 Stefan Dirsch 2018-07-20 10:04:58 UTC
Ok, please reopen once you can provide feedback. In case things got fixed by a later driver please let us know as well. Thanks a lot!