Bug 1113533 - VLC crashes with Mesa 18.2.3
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: X.Org
Version: Current
Hardware: Other Other
Priority: P3 - Medium
Severity: Normal
Assigned To: Michal Srb
Reported: 2018-10-26 13:21 UTC by Fabian Vogt
Modified: 2018-12-12 12:40 UTC (History)


Attachments
Workaround for applications that create two GL contexts, each using different loader. (1.55 KB, patch)
2018-11-15 12:34 UTC, Michal Srb

Description Fabian Vogt 2018-10-26 13:21:03 UTC
When playing a video:

Thread 22 "vlc" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc4168700 (LWP 3287)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fffcd5a5489 in put_image_shm (stride=4096, height=640, width=1024, y=<optimized out>,
    x=<optimized out>, offset=<optimized out>, shmaddr=<optimized out>, shmid=<optimized out>,
    dPriv=<optimized out>) at drisw.c:89
#2  drisw_put_image_shm (drawable=<optimized out>, shmid=<optimized out>, shmaddr=<optimized out>,
    offset=<optimized out>, x=<optimized out>, y=<optimized out>, width=1024, height=640,
    stride=4096) at drisw.c:188
#3  0x00007fffcd6d9dc5 in dri_sw_displaytarget_display (ws=0x555555647dc0, dt=<optimized out>,
    context_private=0x555555670e90, box=0x0) at dri_sw_winsys.c:266
#4  0x00007fffcd5a550d in drisw_present_texture (sub_box=0x0, ptex=0x5555558e1300,
    dPriv=0x555555645360) at drisw.c:201
#5  drisw_copy_to_front (ptex=0x5555558e1300, dPriv=0x555555645360) at drisw.c:218
#6  drisw_swap_buffers (dPriv=0x555555645360) at drisw.c:245
#7  0x00007fffccdb4b2e in ?? () from /usr/lib64/libEGL_mesa.so.0
#8  0x00007fffccda7d76 in eglSwapBuffers () from /usr/lib64/libEGL_mesa.so.0
#9  0x00007fffb7503b16 in ?? () from /usr/lib64/vlc/plugins/video_output/libgl_plugin.so
#10 0x00007fffb7507f6d in ?? () from /usr/lib64/vlc/plugins/video_output/libgl_plugin.so
#11 0x00007ffff7d1e786 in ?? () from /usr/lib64/libvlccore.so.9
#12 0x00007ffff7d205b6 in ?? () from /usr/lib64/libvlccore.so.9
#13 0x00007ffff7f79ed4 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff7ea6cff in clone () from /lib64/libc.so.6

This seems to be a regression introduced by https://github.com/mesa3d/mesa/commit/cf54bd5e8381dba18d52fe438acda20cc1685bf3

ws->lf->put_image_shm points to drisw_put_image_shm, but that is itself just a wrapper around DRIswrastLoaderExtension::swrast_loader.
In this case it's NULL (not implemented on DRI2 with EGL), so it crashes.

AFAICT the `if (ws->lf->put_image_shm)` check in line 134 is incorrect or at least not sufficient.
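
To illustrate the failure mode described above, here is a minimal sketch in C. It is not Mesa's code: `loader_ext`, `loader_funcs`, `wrapper_put_image_shm`, `count_call` and `present` are hypothetical stand-ins, and the extra loader check in `present` is only an assumption about what a "sufficient" check could look like. The point is that a non-NULL wrapper can still forward to a NULL callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loader extension; the callback may be NULL. */
struct loader_ext {
    int version;
    void (*put_image_shm)(int width, int height, int *calls);
};

/* Hypothetical stand-in for the winsys function table (ws->lf). */
struct loader_funcs {
    void (*put_image_shm)(const struct loader_ext *loader,
                          int width, int height, int *calls);
};

/* Sample callback that only counts how often it was invoked. */
static void count_call(int width, int height, int *calls)
{
    (void)width;
    (void)height;
    (*calls)++;
}

/* The wrapper itself is always a valid function, but it forwards to a
 * pointer that may be NULL; calling it then jumps to address 0. */
static void wrapper_put_image_shm(const struct loader_ext *loader,
                                  int width, int height, int *calls)
{
    loader->put_image_shm(width, height, calls);
}

/* Returns 1 if the SHM path was taken, 0 if the caller should fall back.
 * Testing lf->put_image_shm alone only proves that the wrapper exists;
 * the additional loader checks are the assumption made in this sketch. */
static int present(const struct loader_funcs *lf,
                   const struct loader_ext *loader,
                   int width, int height, int *calls)
{
    assert(lf != NULL && loader != NULL);
    if (lf->put_image_shm != NULL &&
        loader->version >= 4 && loader->put_image_shm != NULL) {
        lf->put_image_shm(loader, width, height, calls);
        return 1;
    }
    return 0;
}
```

With a version 4 loader that provides the callback, `present` takes the SHM path; with a version 1 loader whose callback is NULL, it reports that a fallback is needed instead of calling through the wrapper.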
Comment 1 Fabian Vogt 2018-10-26 13:29:09 UTC
(In reply to Fabian Vogt from comment #0)
> In this case it's NULL (not implemented on DRI2 with EGL), so it crashes.

Removing the EGL plugin makes VLC fall back to GLX, so rm /usr/lib64/vlc/plugins/video_output/libegl_x11_plugin.so works around the bug on X11.
Comment 3 Stefan Dirsch 2018-10-26 13:44:18 UTC
No, this fix is already in ...
Comment 4 Michal Srb 2018-10-26 15:39:18 UTC
The check for `(ws->lf->put_image_shm)` should be enough, because `drisw_init_screen` sets it if and only if the loader is version >= 4 and `loader->putImageShm` is non-null.

I was able to reproduce it. It seems that `drisw_init_screen` is called twice. The first call originates from Qt with loader version 4, which has putImageShm, so put_image_shm is set. The second call originates from libegl_x11_plugin with loader version 1, so put_image_shm is not set; but because it is a global variable, it remains set from before. Later put_image_shm is called with the second loader and the crash happens.
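
The sequence above can be sketched in a few lines of C. This is a simplified model, not the Mesa source: `struct loader`, `global_lf`, `init_screen`, `would_use_shm` and `shm_call_is_safe` are hypothetical names, and only the "set if version >= 4 and callback non-null, otherwise leave untouched" behaviour is taken from the analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the loader a client hands to the driver. */
struct loader {
    int version;
    void (*putImageShm)(void);
};

typedef void (*put_image_shm_fn)(const struct loader *);

/* One function table shared by every screen (the global variable). */
static struct {
    put_image_shm_fn put_image_shm;
} global_lf;

/* Sample callback for a loader that does support SHM. */
static void noop_put_image_shm(void)
{
}

/* Wrapper that forwards to the loader callback without checking it. */
static void wrapper_put_image_shm(const struct loader *l)
{
    l->putImageShm();
}

/* Sets the global entry when the loader supports SHM. There is no else
 * branch, so a value left by an earlier call survives later calls. */
static void init_screen(const struct loader *l)
{
    assert(l != NULL);
    if (l->version >= 4 && l->putImageShm != NULL)
        global_lf.put_image_shm = wrapper_put_image_shm;
}

/* What the presentation path sees when it tests the global table. */
static bool would_use_shm(void)
{
    return global_lf.put_image_shm != NULL;
}

/* Whether calling the wrapper with this particular loader is safe. */
static bool shm_call_is_safe(const struct loader *l)
{
    return l->version >= 4 && l->putImageShm != NULL;
}
```

After `init_screen` with a version 4 loader followed by `init_screen` with a version 1 loader, `would_use_shm()` is still true while `shm_call_is_safe()` is false for the second loader, which is the mismatch that leads to the call through a NULL pointer.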

I don't see any obvious fix for this upstream. If we need a quick fix, we could disable the SHM for swrast.
Comment 5 Fabian Vogt 2018-10-26 18:11:20 UTC
Funny - while I debugged this in gdb I suggested that VLC should default to GLX on X11 and only use EGL on Wayland, as Qt uses GLX anyway and EGL on X11 is known to be buggy. With the root cause known, there is now a compelling reason to do this sooner rather than later.

Is it supported by the API to have both an EGL and GLX context open?
Comment 6 Dominique Leuenberger 2018-11-06 14:03:06 UTC
(In reply to Michal Srb from comment #4)
> I don't see any obvious fix for this upstream. If we need a quick fix, we
> could disable the SHM for swrast.

That might be an option for short-term. Would that be a regression compared to Mesa 18.1 which we currently ship in TW?
Comment 7 Michal Srb 2018-11-13 08:21:10 UTC
(In reply to Dominique Leuenberger from comment #6)
> (In reply to Michal Srb from comment #4)
> > I don't see any obvious fix for this upstream. If we need a quick fix, we
> > could disable the SHM for swrast.
> 
> That might be an option for short-term. Would that be a regression compared
> to Mesa 18.1 which we currently ship in TW?

Sorry for the late reply. It would not be a regression. The use of SHM in drisw was introduced by commit 63c427fa, which is not yet present in Mesa 18.1.7, which we have in Tumbleweed.
Comment 8 Dominique Leuenberger 2018-11-15 12:14:13 UTC
(In reply to Michal Srb from comment #7)
> Sorry for the late reply. It would not be a regression. The use of SHM in
> drisw was introduced by commit 63c427fa, which is not yet present in Mesa
> 18.1.7, which we have in Tumbleweed.

Then, in the spirit of being able to move forward, I'd disable shm for drisw for the time being - which would at least allow us to move forward with Mesa 18.2.x
Comment 9 Michal Srb 2018-11-15 12:32:11 UTC
(In reply to Fabian Vogt from comment #5)
> Is it supported by the API to have both an EGL and GLX context open?

I could not find any documentation saying either yes or no. Both EGL and GLX create the OpenGL context and bind it to the current thread, and both have a way to rebind it to a different thread. Both allow different contexts to be active in different threads at the same time.

My expectation is that one should be able to create a context using EGL in one thread and a different context using GLX in another thread without them breaking each other. VLC does create them in different threads.

That would mean that using the global variable is a bug in Mesa. It did not matter until recently because the values were always the same for all contexts. That is no longer true, so the variable should be per-context instead of global. But I will need more time to verify my claims and implement the change.
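
As a rough illustration of "per-context instead of global", here is a minimal C sketch. It is not the patch that was later sent to mesa-dev (its contents are not shown in this report); `struct loader`, `struct screen_state`, `init_screen` and `present` are hypothetical names. Each screen keeps its own function pointer, so a second initialization with a different loader cannot leak state into the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the loader a client hands to the driver. */
struct loader {
    int version;
    void (*putImageShm)(void);
};

/* Per-screen state: every screen records its own loader and its own
 * decision about whether the SHM path is available. */
struct screen_state {
    const struct loader *loader;
    void (*put_image_shm)(const struct loader *);
};

/* Counter used by the sample callback below. */
static int shm_calls;

static void counting_put_image_shm(void)
{
    shm_calls++;
}

/* Wrapper that forwards to the loader callback. */
static void wrapper_put_image_shm(const struct loader *l)
{
    l->putImageShm();
}

/* Always writes the field, so nothing from another screen can remain. */
static void init_screen(struct screen_state *s, const struct loader *l)
{
    assert(s != NULL && l != NULL);
    s->loader = l;
    s->put_image_shm = (l->version >= 4 && l->putImageShm != NULL)
                           ? wrapper_put_image_shm
                           : NULL;
}

/* Returns true if the SHM path was used, false if a fallback is needed. */
static bool present(const struct screen_state *s)
{
    if (s->put_image_shm != NULL) {
        s->put_image_shm(s->loader);
        return true;
    }
    return false;
}
```

With one screen initialized from a version 4 loader and another from a version 1 loader, `present` uses SHM only for the first; the second reports a fallback and never calls through a NULL pointer.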

(In reply to Dominique Leuenberger from comment #8)
> Then, in the spirit of being able to move forward, I'd disable shm for drisw
> for the time being - which would at least allow us to move forward with Mesa
> 18.2.x

Ok, I have already prepared a patch that disables it and verified that VLC works with it. I will submit it to Mesa.
Comment 10 Michal Srb 2018-11-15 12:34:16 UTC
Created attachment 789834 [details]
Workaround for applications that create two GL contexts, each using different loader.
Comment 12 Swamp Workflow Management 2018-11-15 17:50:10 UTC
This is an autogenerated message for OBS integration:
This bug (1113533) was mentioned in
https://build.opensuse.org/request/show/649332 Factory / Mesa
Comment 14 Swamp Workflow Management 2018-11-17 23:10:13 UTC
This is an autogenerated message for OBS integration:
This bug (1113533) was mentioned in
https://build.opensuse.org/request/show/649970 Factory / Mesa
Comment 17 Michal Srb 2018-11-23 15:07:16 UTC
I made a proper (I hope) fix for the issue and sent it to mesa-dev:
https://lists.freedesktop.org/archives/mesa-dev/2018-November/210290.html
https://lists.freedesktop.org/archives/mesa-dev/2018-November/210291.html

Let's see what the Mesa developers think about it. If it's accepted, we can replace the workaround with it.
Comment 19 Michal Srb 2018-11-27 11:29:56 UTC
I have now replaced the workaround with the two new patches. Closing the bug.
Comment 20 Swamp Workflow Management 2018-11-27 11:40:10 UTC
This is an autogenerated message for OBS integration:
This bug (1113533) was mentioned in
https://build.opensuse.org/request/show/652142 Factory / Mesa
Comment 23 Swamp Workflow Management 2018-11-29 14:00:12 UTC
This is an autogenerated message for OBS integration:
This bug (1113533) was mentioned in
https://build.opensuse.org/request/show/652620 Factory / Mesa
Comment 24 Swamp Workflow Management 2018-12-07 14:50:10 UTC
This is an autogenerated message for OBS integration:
This bug (1113533) was mentioned in
https://build.opensuse.org/request/show/656073 Factory / Mesa
Comment 26 Swamp Workflow Management 2018-12-11 14:30:12 UTC
This is an autogenerated message for OBS integration:
This bug (1113533) was mentioned in
https://build.opensuse.org/request/show/657204 Factory / Mesa
Comment 28 Swamp Workflow Management 2018-12-12 12:40:12 UTC
This is an autogenerated message for OBS integration:
This bug (1113533) was mentioned in
https://build.opensuse.org/request/show/657521 Factory / Mesa