Bug 1099974

Summary: using Optimus secondary card with optirun fails on Xorg 1.20
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Peter Sütterlin <P.Suetterlin>
Component: X.Org
Assignee: E-mail List <xorg-maintainer-bugs>
Status: RESOLVED WORKSFORME
QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Normal    
Priority: P5 - None
CC: msrb, P.Suetterlin, w01dnick
Version: Current   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Attachments: requested Xorg.log from optirun
Xorg.log from 1.19.6

Description Peter Sütterlin 2018-07-03 11:16:58 UTC
HW: Laptop Lenovo T460p with Optimus graphics (HD530/940MX).
I run the system on intel using the modesetting driver, and use bumblebee/optirun for specific applications.

This works fine with the standard Tumbleweed Xorg 1.19.6

In context of Bug #1099812 I installed xorg-x11-server-1.20 from the X11:Xorg repo (xorg-x11-server-1.20.0-545.1.x86_64).  With that one, optirun fails:

The second server is started properly, there are no errors logged in /var/log/Xorg.8.log (I use 'optirun glxspheres' as quick test).
The program starts, as I can see from the output on the console:
 woodstock:~% optirun glxspheres
 Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
 Visual ID of window: 0x13c
 Context is Direct

I can briefly see the outline of a window popping up, but it immediately vanishes again, and the command terminates without further errors.

After downgrading back to 1.19.6 the window opens again as expected.
Comment 1 Stefan Dirsch 2018-07-03 12:13:21 UTC
Well, you may need to update other components of X11:XOrg as well. Could you attach this /var/log/Xorg.8.log, please?
Comment 2 Peter Sütterlin 2018-07-03 12:30:27 UTC
Created attachment 775985 [details]
requested Xorg.log from optirun

Installing the server pulled in some more dependencies
(xf86-video-vesa xf86-video-nv xf86-video-nouveau xf86-video-fbdev xf86-video-amdgpu) that were also installed.  At least zypper was fine with (only) those.

Attached the requested X log with 1.20.
Comment 3 Stefan Dirsch 2018-07-03 12:56:47 UTC
Ok. I'm just wondering why there is only an NVIDIA GPU and no Intel GPU mentioned in the logfile !?!

[ 15337.860] (EE) /dev/dri/card1: failed to set DRM interface version 1.4: Permission denied
[ 15337.868] (--) PCI:*(2@0:0:0) 10de:134d:17aa:5050 rev 162, Mem @ 0xf3000000/16777216, 0xe0000000/268435456, 0xf0000000/33554432, I/O @ 0x0000d000/128
[ 15337.868] (II) LoadModule: "glx"

Not sure if the issue with /dev/dri/card1 is normal. I don't know much about Bumblebee/optirun.
Comment 4 Peter Sütterlin 2018-07-03 13:11:58 UTC
Created attachment 775994 [details]
Xorg.log from 1.19.6

This is normal.  Bumblebee starts a second X server and copies the stuff over somehow.  The attached log is from the previous 1.19.6 where things work.  It's (AFAICS) identical except for cosmetic differences...
Comment 5 Stefan Dirsch 2018-07-03 13:34:29 UTC
The only possibly relevant difference might be

--- Xorg.8.log.formatted        2018-07-03 15:28:04.802211000 +0200
+++ Xorg.8.log.old.formatted    2018-07-03 15:28:12.999221000 +0200
[...]
+ (--) Depth 24 pixmap format is 32 bpp

Have you tried simpler examples like

- glxinfo
- glxinfo

(Mesa-demo-x package)
Comment 6 Stefan Dirsch 2018-07-03 13:36:00 UTC
I meant

- glxinfo
- glxgears

...
Comment 7 Peter Sütterlin 2018-07-03 14:18:26 UTC
Oh, I didn't spot that one :(

But no difference; glxgears also dies.  glxinfo does work, as it doesn't open a window.

But I got a step further - silly me hadn't looked at syslog so far :o

I get 
Jul 03 14:51:14 woodstock.pitnet kernel: glxgears[30477]: segfault at 74 ip 00007f19ddf06cbf sp 00007f19da988b30 error 4 in i965_dri.so[7f19ddaa0000+896000]

ditto for the other programs:
Jul 03 14:56:50 woodstock.pitnet kernel: glxspheres[30964]: segfault at 74 ip 00007f555f115cbf sp 00007f555b74bb30 error 4 in i965_dri.so[7f555ecaf000+896000]

and I tried google-earth-pro which provided a backtrace

Major Version 7
Minor Version 3
Build Number 0002
Build Date Jun 22 2018
Build Time 03:19:19
OS Type 3
OS Major Version 4
OS Minor Version 17
OS Build Version 3
OS Patch Version 0
Crash Signal 11
Crash Time 1530626929
Up Time 2.6509

Stacktrace from glibc:
/opt/google/earth/pro/libgoogleearth_pro.so(+0x1bdfed)[0x7f135a723fed]
/opt/google/earth/pro/libgoogleearth_pro.so(+0x1f32ac)[0x7f135a7592ac]
/lib64/libc.so.6(+0x382d0)[0x7f135a1de2d0]
/usr/lib64/dri/i965_dri.so(+0x466cbf)[0x7f12b058ccbf]
/usr/lib64/dri/i965_dri.so(+0xbbbec)[0x7f12b01e1bec]
/usr/lib64/primus/libGL.so.1(+0x2a635)[0x7f13541c3635]
/lib64/libpthread.so.0(+0x7554)[0x7f1359f8e554]
/lib64/libc.so.6(clone+0x3f)[0x7f135a2a0fdf]


i965_dri.so is from Mesa-dri.  X11:Xorg is at Mesa 18.1.3 already - maybe that one is needed for xorg 1.20?  I'll see if I can try to install those, too.
Comment 8 Stefan Dirsch 2018-07-03 14:27:20 UTC
Yeah. You probably want to update to the package versions from X11:XOrg. At least Mesa (+ subpackages) and libglvnd.
Comment 9 Peter Sütterlin 2018-07-03 14:52:37 UTC
So I updated all the (27) Mesa packages to 18.1.3
Still the same error (segfault of glxspheres).  Saw your post, installed libglvnd stuff, too.  No change.
Seeing libGL.so.1 from primus involved, I fetched the source RPM and recompiled it here.  Still no change :(
Comment 10 Stefan Dirsch 2018-07-03 14:57:10 UTC
I guess you need to install the debuginfo packages here ...
Comment 11 Peter Sütterlin 2018-07-03 14:58:38 UTC
Sorry for the mailflood :(

Final test for today: install xf86-video-intel (also from X11:Xorg).
With that one optirun works (but display is *much* slower)

So it looks like it really is the modesetting driver that is ***** up.
Comment 12 Peter Sütterlin 2018-07-24 12:59:03 UTC
So 1.20 just made it to TW.  Sorry for having forgotten about it - I had to switch back to 1.19, as I need a working system :o

The issue is still there with 1.20 after the update to 20180722.  

Any guidance on how to narrow it down?  I hopefully have some free time to test things this week...
Comment 13 Stefan Dirsch 2018-07-24 13:11:01 UTC
When searching for optirun and X.Org 1.20, some users claim that removing the xf86-input-mouse driver (package) fixed the issue for them?!? But I'm *very* sceptical about this ...
Comment 14 Peter Sütterlin 2018-07-24 13:17:58 UTC
Indeed, removing the package & restarting X didn't change anything :(
Still segfault in i965_dri.so
Comment 15 Stefan Dirsch 2018-07-24 13:33:44 UTC
Well, you could install the appropriate debug packages. This should be in the log, which packages are needed and how to install them via zypper ...
Comment 16 Stefan Dirsch 2018-07-24 13:51:34 UTC
Sorry, it is gdb that tells you this. And I'm afraid you need to force the creation of core files with

  ulimit -c unlimited

but probably not so easy to do this system wide ...
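
One way to make this system-wide on a systemd-based system like Tumbleweed would be a drop-in raising the default core limit for everything systemd starts (a sketch; the drop-in file name is arbitrary, and a daemon-reexec plus re-login is needed for it to take effect):

```ini
# /etc/systemd/system.conf.d/coredump.conf  (hypothetical drop-in)
[Manager]
# Raise RLIMIT_CORE for all units so core files / systemd-coredump
# captures are not truncated to the default of 0.
DefaultLimitCORE=infinity
```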

Then you could look into this by running 

  coredumpctl gdb

and so on to see where it crashes ...
Comment 17 Peter Sütterlin 2018-07-24 13:54:24 UTC
Yes, I had started with the Mesa-dri debug package.  But I have no idea how to make use of it.  I assume I'd have to start it in a debugger, but how do I do that with a program started via optirun?
I had tried googleearth again, but the crashlog didn't get more verbose :(

Ah, OK.  'optirun gdb glxspheres' seems to work :D

I then get

.....
[New Thread 0x7fffee1e3700 (LWP 10797)]
Visual ID of window: 0x13c
Context is Direct
OpenGL Renderer: GeForce 940MX/PCIe/SSE2
Missing separate debuginfo for /usr/lib64/libXcursor.so.1
Try: zypper install -C "debuginfo(build-id)=d8a4c63bc3f61b76bfe4afadaab51891cc25bdd5"
Missing separate debuginfo for /usr/lib64/libXrender.so.1
Try: zypper install -C "debuginfo(build-id)=113f0e0871cd5ca22351638a4ed04216d21d8532"
[New Thread 0x7fffecdd0700 (LWP 10798)]
[New Thread 0x7fffe7fff700 (LWP 10799)]
[New Thread 0x7fffe77fe700 (LWP 10800)]

Thread 3 "glxspheres" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecdd0700 (LWP 10798)]
0x00007ffff0b2be2f in do_blit_drawpixels (pixels=0x0, unpack=0x7fffe01551f8, type=5121, format=32993, height=1080, width=1920, y=0, x=0, ctx=0x7fffe014bc70) at intel_pixel_draw.c:83
83      intel_pixel_draw.c: No such file or directory.
(gdb) bt
#0  0x00007ffff0b2be2f in do_blit_drawpixels (pixels=0x0, unpack=0x7fffe01551f8, type=5121, format=32993, height=1080, width=1920, y=0, x=0, ctx=0x7fffe014bc70) at intel_pixel_draw.c:83
#1  intelDrawPixels (ctx=0x7fffe014bc70, x=0, y=0, width=1920, height=1080, format=32993, type=5121, unpack=0x7fffe01551f8, pixels=0x0) at intel_pixel_draw.c:167
#2  0x00007ffff0780bec in _mesa_DrawPixels (width=1920, height=1080, format=32993, type=5121, pixels=0x0) at main/drawpix.c:162
#3  0x00007ffff7bbc3c5 in test_drawpixels_fast (dconfig=<optimized out>, ctx=0x7fffe0026cc0, dpy=0x7fffe0000b20) at libglfork.cpp:362
#4  display_work (vd=<optimized out>) at libglfork.cpp:402
#5  0x00007ffff6c78554 in start_thread () from /lib64/libpthread.so.0
#6  0x00007ffff6f8accf in clone () from /lib64/libc.so.6

Is this useful?
Comment 18 Stefan Dirsch 2018-07-24 14:05:22 UTC
Mesa-drivers-debugsource package still missing?
Comment 19 Peter Sütterlin 2018-07-24 14:18:31 UTC
Indeed.  I should have said, I don't really know what I'm doing :D

With that:
Thread 3 "glxspheres" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecdd0700 (LWP 12199)]
0x00007ffff0b2be2f in do_blit_drawpixels (pixels=0x0, unpack=0x7fffe01551f8, type=5121, format=32993, height=1080, width=1920, y=0, x=0, ctx=0x7fffe014bc70) at intel_pixel_draw.c:83
83         src_format = _mesa_get_srgb_format_linear(src_format);

The backtrace is the same.
Comment 20 Peter Sütterlin 2018-07-24 14:31:37 UTC
Yikes.  A request on the factory list reminded me that I had changed the default bridge in /etc/bumblebee/bumblebee.conf to
Bridge=primus
as it gave much better performance.  I just switched that to 
Bridge=virtualgl
and with that I do *not* get a segfault.  So likely something in the changed/adapted libGL from primus?

(I had again tried just recompiling the primus package on the local machine, but that doesn't help)
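
For reference, this is the fragment in question (section name from memory of the stock openSUSE bumblebee.conf; treat it as a sketch):

```ini
# /etc/bumblebee/bumblebee.conf
[optirun]
# Bridge=primus     # faster bridge; segfaults in i965_dri.so with xorg-server 1.20 here
Bridge=virtualgl    # works with 1.20, but noticeably slower
```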
Comment 21 Mykola Krachkovsky 2018-07-24 16:30:32 UTC
The primus backend crashes when PRIMUS_UPLOAD is set to 0 (autodetect), which is the default. It works fine when PRIMUS_UPLOAD is explicitly set to 1 (texture) or 2 (PBO).

`PRIMUS_UPLOAD=2 optirun -b primus glxspheres`
works fine, while
`PRIMUS_UPLOAD=0 optirun -b primus glxspheres`
crashes.
Comment 22 Mykola Krachkovsky 2018-07-24 16:59:23 UTC
So, primus crashes in test_drawpixels_fast, libglfork.cpp:362:
```
    primus.dfns.glDrawPixels(width, height, GL_BGRA, GL_UNSIGNED_BYTE, NULL);
```
where data is set to NULL. Is that allowed?
Comment 23 Michal Srb 2018-07-25 14:48:16 UTC
(In reply to Mykola Krachkovsky from comment #22)
> So, primus crashes in test_drawpixels_fast, libglfork.cpp:362:
> ```
>     primus.dfns.glDrawPixels(width, height, GL_BGRA, GL_UNSIGNED_BYTE, NULL);
> ```
> where data is set to NULL. Is that allowed?

Yes. If a buffer is bound to GL_PIXEL_UNPACK_BUFFER, then the pixel data are taken from that buffer and the data parameter serves as an offset into it. So it can be NULL in that case.
Comment 24 Stefan Dirsch 2018-10-19 12:55:01 UTC
Possibly you may consider switching from Bumblebee to Prime instead. See boo#1103816.
Comment 25 Peter Sütterlin 2018-10-20 14:50:08 UTC
Oh, I completely forgot about this one, sorry :(
I had left Bridge set to virtualgl and never checked again.

So I can confirm that with my current config (TW 20181015) and xorg-x11-server-1.20.1 / Mesa-18.1.7-208.1 none of the combinations mentioned in this thread (bridge/PRIMUS_UPLOAD) crashes.  All work as expected.  

(As for prime:  I've never used that, but thought it is not/no longer supported in recent OS versions?  At least ISTR having read such comments)