Bug 1065464 - Don't allow multiple OpenCL ICD packages to be installed simultaneously
Summary: Don't allow multiple OpenCL ICD packages to be installed simultaneously
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Other
Version: Current
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Michal Srb
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-27 13:30 UTC by Mircea Kitsune
Modified: 2024-12-11 00:30 UTC
CC List: 6 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Mircea Kitsune 2017-10-27 13:30:03 UTC
I considered this important enough to warrant a bug report. The forum thread where I discovered and further explained the issue can be found here:

https://forums.opensuse.org/showthread.php/527821-Blender-crashes-at-startup-with-LLVM-error

When running the command 'zypper dup' without additional parameters, zypper wants to install the OpenCL ICD named pocl. However, Mesa already uses another ICD, which seems to come in the libOpenCL1 package. This is a problem because, from the looks of it, you may only have one ICD installed on your system at a time; more than that causes a conflict, and every OpenCL application crashes with the following error:

mesa: CommandLine Error: Option 'enable-value-profiling' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

To test and confirm this, you can install both "libOpenCL1" and "pocl" simultaneously, then attempt to run Blender 3D: the application should crash on startup with the console output quoted above. After removing the pocl package, however, Blender starts and works properly again.

Until this bug is solved by the X11 developers, it might be a good idea for the openSUSE repositories to be aware of it and mark multiple OpenCL ICD packages as incompatible, recommending that you install only one at a time. Otherwise, users may find that OpenCL applications have suddenly stopped working without knowing how to fix the problem. A workaround for the time being is to mark pocl as "taboo / never install".
Comment 1 Andreas Stieger 2017-10-27 14:14:20 UTC
This needs to be done by the packages themselves, not repo/YaST/libzypp.

Either via direct explicit conflict, or...

Provides: opencl-icd
Conflicts: otherproviders(opencl-icd)

Assign to maintainer of both packages.
Comment 2 Martin Hauke 2017-10-29 20:32:05 UTC
When multiple ICDs are installed, libopencl1 needs to dlopen() them all
to find out which one works on the available hardware.  If they are
dynamically linked, this leads to them sharing a libllvm, which has
enough global state that this is likely to error out.  (This is a known
LLVM bug, https://bugs.llvm.org/show_bug.cgi?id=22952 , but currently
has no real fix.)

I regularly use multiple ICDs (pocl, nvidia-binary, intel-binary) and have never had any issues, and I do not really like the idea of allowing only one ICD to be installed at once.

AFAIK we're actually shipping three ICDs in Tumbleweed that make use of libllvm:
  * beignet
  * pocl
  * mesa

A workaround that is used by the Debian packaging team is to statically link all these packages to avoid sharing a libllvm.

IMO we should do the same. Any objections?
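A minimal C sketch of the enumeration step described in comment 2 shows why every installed ICD ends up in the same process. It assumes the usual Linux layout, where each vendor installs a `*.icd` file in `/etc/OpenCL/vendors/` whose first line names the ICD library. The helpers `read_icd_library()` and `list_icd_libraries()` are hypothetical and not taken from any real loader; a real ICD loader would go on to dlopen() every listed library, which is where several LLVM-based ICDs start sharing one libLLVM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define ICD_NAME_MAX 256

/* Read the library name stored in one .icd file: the first line, with the
   trailing newline stripped. Returns 0 on success, -1 on failure. */
static int read_icd_library(const char *icd_path, char *lib, size_t size)
{
    assert(icd_path != NULL && lib != NULL && size > 1);
    FILE *f = fopen(icd_path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(lib, (int)size, f);
    fclose(f);
    if (line == NULL)
        return -1;
    lib[strcspn(lib, "\r\n")] = '\0';
    return lib[0] != '\0' ? 0 : -1;
}

/* Collect the library named by every "*.icd" file in `dir` into `libs`.
   Returns the number of entries stored (at most `max`), or -1 if `dir`
   cannot be opened. A real loader would dlopen() each of these libraries,
   so all installed ICDs end up loaded into the same process. */
static int list_icd_libraries(const char *dir, char libs[][ICD_NAME_MAX], int max)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while (count < max && (entry = readdir(d)) != NULL) {
        size_t len = strlen(entry->d_name);
        /* Skip anything that is not "<something>.icd". */
        if (len <= 4 || strcmp(entry->d_name + len - 4, ".icd") != 0)
            continue;

        char path[1024];
        snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        if (read_icd_library(path, libs[count], ICD_NAME_MAX) == 0)
            count++;
    }
    closedir(d);
    return count;
}
```

On a system like the one in this report, calling `list_icd_libraries("/etc/OpenCL/vendors", libs, 16)` would be expected to list both the Mesa and the pocl ICD, and a loader would load both of them.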
Comment 3 Mircea Kitsune 2017-10-29 22:45:02 UTC
(In reply to Martin Hauke from comment #2)

I don't have any knowledge of how the OpenCL ICD is supposed to work, so I don't know whether the problem I'm seeing is normal or the result of a bug. From what you're saying, it is in fact a bug, so my proposal here would be a temporary workaround, not the proper permanent solution.

Whether or not it's a good idea depends on how fast this bug is expected to be solved in LLVM: if it's going to take more than a year, as a lot of issues seem to nowadays, it makes sense to me that openSUSE recommends you don't install multiple ICDs. The report you linked was opened in 2015 and last modified in 2016... since 2017 is almost over, I'm definitely not getting my hopes up for a timely solution.
Comment 4 Mircea Kitsune 2017-10-29 22:48:03 UTC
Sorry, forgot to write the other part of my last comment: If statically linking them solves the crash, I definitely have no objections. This feels like the cleanest solution to my limited understanding.
Comment 5 Sergey Kondakov 2017-11-05 07:33:34 UTC
(In reply to Martin Hauke from comment #2)
> When multiple ICDs are installed, libopencl1 needs to dlopen() them all
> to find out which one works on the available hardware.  If they are
> dynamically linked, this leads to them sharing a libllvm, which has
> enough global state that this is likely to error out.  (This is a known
> LLVM bug, https://bugs.llvm.org/show_bug.cgi?id=22952 , but currently
> has no real fix.)
> 
> I regularly use multiple ICDs (pocl, nvidia-binary, intel-binary) and have
> never had any issues, and I do not really like the idea of allowing only one
> ICD to be installed at once.
> 
> AFAIK we're actually shipping three ICDs in Tumbleweed that make use of
> libllvm:
>   * beignet
>   * pocl
>   * mesa
> 
> A workaround that is used by the Debian packaging team is to statically link
> all these packages to avoid sharing a libllvm.
> 
> IMO we should do the same. Any objections?

Please use the static linking hack ASAP. The upstream report for this particular bug seems to be https://bugs.llvm.org/show_bug.cgi?id=30587
But this is not limited to OpenCL: recently, the openSUSE builds of RPCS3, the PS3 emulator, were broken because openSUSE dropped llvm-devel-static while RPCS3 dropped the option for shared linking, because they too don't want to deal with LLVM linking nonsense. The "shared" and "dynamic" options are each broken in different ways; openSUSE recently switched from "shared" to "dynamic" to avoid problems with the former, and now there is this.

Seeing how it has been known for almost 3 years, and looking at the attitude of that Eric Christopher guy from Google, it seems like it will never be fixed unless someone really qualified perseveres or gets paid.
Comment 6 Sergey Kondakov 2017-12-24 13:14:16 UTC
So, I've worked around that for myself by reverting SUSE's deletion of static LLVM, importing Debian's static linking patch for Beignet, and configuring Mesa/Clover, POCL and RPCS3 to link to LLVM statically. That's all the LLVM-linking software I know of.
I also patched xorg-x11-server and xdpyinfo to use the display's real DPI instead of the made-up 96 DPI bullshit that is in fashion now.

If you want to use those changes (see the diffs) or try to push them into SUSE, please be my guest. Personally, I'm not going to do it.
https://build.opensuse.org/package/show/home:X0F:HSF/llvm5
https://build.opensuse.org/package/show/home:X0F:HSF/Mesa
https://build.opensuse.org/package/show/home:X0F:HSF/beignet
https://build.opensuse.org/package/show/home:X0F:HSF/pocl
https://build.opensuse.org/package/show/home:X0F:branches:Emulators/rpcs3
https://build.opensuse.org/package/show/home:X0F:HSF/xorg-x11-server
https://build.opensuse.org/package/show/home:X0F:HSF/xdpyinfo
Comment 7 Michal Srb 2018-01-16 15:30:16 UTC
First of all, the whole purpose of ICDs is to give multiple drivers the ability to coexist on the same system and the applications a method of enumerating them and choosing one. So making the packages providing ICDs conflict with each other would defeat their purpose.

Secondly, making llvm static would have a big negative impact on the distribution. It is against our guidelines (https://en.opensuse.org/openSUSE:Packaging_guidelines#Static_Libraries), which exist for good reasons. Building it statically would mean that every package that uses it would have to be rebuilt and retested every time llvm changes. It would also increase the resources used for building llvm and all packages that use it. Building llvm already takes a lot of resources and puts stress on our build service. I have spent the last two months optimizing the llvm build to get it to at least acceptable levels.

Note that we have never distributed llvm static libraries. The recent changes in the package only changed how we get rid of them during build.

So I consider switching llvm to static as the last resort if there is no other solution.

I have reproduced this bug and analyzed it. The issue is not that libLLVM.so is loaded twice, but that Mesa loads libLLVM.so.5 while pocl loads libLLVM.so.4. That is the thing that recently changed - we have introduced llvm 5 and made it the default. Mesa was rebuilt against llvm 5, but pocl was not. It is because we have pocl version 0.14 that supports at most llvm 4.

I have updated pocl to version 1.0 (which supports llvm 5) and rebuilt it with llvm 5. It seems to solve the issue. I will double check it and if it is correct, submit the updated pocl.
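To check which libLLVM builds actually end up mapped into one process (for example libLLVM.so.5 pulled in by Mesa next to libLLVM.so.4 pulled in by pocl, as described above), one could scan /proc/self/maps. The C sketch below is a hypothetical diagnostic helper, not part of any package discussed in this bug; it assumes the standard /proc/<pid>/maps line format, where the pathname is the last field and the only one starting with '/'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LLVM_MAPPING_PATH_MAX 512

/* Collect the distinct mapped files whose name contains "libLLVM".
   `maps` is a stream in /proc/<pid>/maps format; returns how many distinct
   paths were stored in `out` (at most `max`). Lines longer than the
   internal buffer are not handled specially in this sketch. */
static int find_llvm_mappings(FILE *maps, char out[][LLVM_MAPPING_PATH_MAX], int max)
{
    char line[1024];
    int count = 0;

    assert(maps != NULL);
    while (fgets(line, sizeof line, maps) != NULL) {
        /* The pathname is the only field that starts with '/'. */
        char *path = strchr(line, '/');
        if (path == NULL)
            continue; /* anonymous mapping, [stack], [heap], ... */
        path[strcspn(path, "\n")] = '\0';

        const char *base = strrchr(path, '/') + 1;
        if (strstr(base, "libLLVM") == NULL)
            continue;

        /* A library appears once per mapped segment; keep one entry per path. */
        int seen = 0;
        for (int i = 0; i < count; i++) {
            if (strcmp(out[i], path) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen && count < max) {
            strncpy(out[count], path, LLVM_MAPPING_PATH_MAX - 1);
            out[count][LLVM_MAPPING_PATH_MAX - 1] = '\0';
            count++;
        }
    }
    return count;
}
```

Called on `fopen("/proc/self/maps", "r")` after the ICDs have been loaded, a result greater than 1 would point to the version mismatch described in this comment; a result of exactly 1 does not rule out the duplicate-registration problem analysed later in comment 13.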
Comment 8 Michal Srb 2018-01-16 15:40:12 UTC
Now I have noticed that Martin has already prepared an update to version 1.0 in the devel project, and it is on its way to Factory and eventually to Tumbleweed:
https://build.opensuse.org/package/show/science/pocl

Mircea, can you try to install it and test?
Comment 9 Mircea Kitsune 2018-01-16 15:46:33 UTC
(In reply to Michal Srb from comment #8)

What does testing imply? This is my desktop computer, thus I can't risk playing with the software repositories in a way that might mess anything up. However, I run openSUSE Tumbleweed: once the changes are in, I can safely remove the Taboo (Never Install) lock from the package and see how the system behaves after a new 'zypper dup'.
Comment 10 Michal Srb 2018-01-16 15:51:20 UTC
(In reply to Mircea Kitsune from comment #9)
> What does testing imply? This is my desktop computer, thus I can't risk
> playing with the software repositories in a way that might mess anything up.

If you do not want to mess with repositories, it is enough to download and install this single RPM:
https://download.opensuse.org/repositories/science/openSUSE_Tumbleweed/x86_64/pocl-1.0-36.9.x86_64.rpm

All its dependencies are in the Tumbleweed repository, and you most likely already have them installed.

Try to start Blender and observe whether it crashes or works.

You can uninstall the pocl package again after that.
Comment 11 Mircea Kitsune 2018-01-16 20:02:31 UTC
(In reply to Michal Srb from comment #10)

Thanks for the info. I tried that package and unfortunately, it still appears to induce the exact same crash as the one currently in Tumbleweed:

mircea@linux-qz0r:~> blender
Read prefs: /home/mircea/.config/blender/2.79/config/userpref.blend
mesa: CommandLine Error: Option 'enable-value-profiling' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Comment 12 Sergey Kondakov 2018-01-16 21:41:16 UTC
(In reply to Michal Srb from comment #7)
> I have reproduced this bug and analyzed it. The issue is not that libLLVM.so
> is loaded twice, but that Mesa loads libLLVM.so.5 while pocl loads
> libLLVM.so.4. That is the thing that recently changed - we have introduced
> llvm 5 and made it the default. Mesa was rebuilt against llvm 5, but pocl
> was not. It is because we have pocl version 0.14 that supports at most llvm
> 4.
> 
> I have updated pocl to version 1.0 (which supports llvm 5) and rebuilt it
> with llvm 5. It seems to solve the issue. I will double check it and if it
> is correct, submit the updated pocl.

You seem to have completely misinterpreted it while "analyzing". This issue is about
https://bugs.llvm.org/show_bug.cgi?id=30587 and
https://bugs.llvm.org/show_bug.cgi?id=22952
LLVM-linking libraries loaded into the same process mess with LLVM's global state and crash each other. Actually, it seems that LLVM just fails to initialize in order to avoid potential undefined behaviour later. And the only workarounds are to link each LLVM user against a different shared version of LLVM or to link them all statically. So much so that it is actually preferable to link every Mesa driver that way, not just the OpenCL part. They say on Mesa's mailing list that LTO may dramatically decrease the size of the resulting binaries, but it puts even more strain on RAM and storage while building.

> Secondly, making llvm static would have a big negative impact on the
> distribution. It is against our guidelines
> (https://en.opensuse.org/openSUSE:Packaging_guidelines#Static_Libraries),
> which exist for good reasons. Building it statically would mean that every
> package that uses it would have to be rebuilt and retested every time llvm
> changes. It would also increase the resources used for building llvm and all
> packages that use it. Building llvm already takes a lot of resources and puts
> stress on our build service. I have spent the last two months optimizing the
> llvm build to get it to at least acceptable levels.
> 
> Note that we have never distributed llvm static libraries. The recent
> changes in the package only changed how we get rid of them during build.
> 
> So I consider switching llvm to static as the last resort if there is no
> other solution.
> 

There is a reason why all other distributions do it that way (well, except for linking drivers statically; they skip that, and it probably doesn't horribly break notebooks with Intel+AMD graphics because the Intel driver doesn't use Gallium or something), and it's not because they like keeping static libraries or wasting RAM on copies. It's also why every upstream of LLVM-linking software ends the discussion with "just link it statically then". That "every package that uses it" means Mesa/Clover, Beignet, POCL and RPCS3, which is not exactly a disastrously large list. I'm not sure what "rebuilt and retested every time llvm changes" is supposed to mean, since OBS rebuilds every package every time for any reason anyway. The "increase in resources" for that is just RAM and storage space, plus several tens of megabytes in package size.

It is a stupendously wasteful process, but not as wasteful as the usual OBS tendency to rebuild everything despite not having a compiler cache.

PS: damn your bugzilla for losing the first version of this message!
Comment 13 Michal Srb 2018-01-18 15:18:42 UTC
OK, updating pocl to libLLVM5 did solve my initial issue, but that was a different issue from the one you are seeing. After installing both the pocl and beignet packages, I see the same error as you:
"Option 'enable-value-profiling' registered more than once!"

'enable-value-profiling' is an option from clang, specifically from its CodeGen component. It is registered in the constructor of a static variable in clang: that constructor calls an llvm function which stores the option in a map, and that map is itself a static variable in the llvm library.

The issue here is that both pocl and beignet link statically with libclangCodeGen.a and dynamically with libLLVM.so, so each of them has its own copy of clang's CodeGen. Both copies try to register the same option during initialization and end up saving it into the same map in the shared llvm library. The second attempt to register it fails.
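The registration failure described above can be modelled in plain C. The sketch below is a hypothetical stand-in, not LLVM code: `option_registry` plays the role of the single option map inside the shared libLLVM.so, and the two functions marked with the GCC/Clang `constructor` attribute play the roles of the two statically linked copies of clang's CodeGen (one in pocl, one in beignet), whose static initializers run at load time. LLVM's real registration goes through C++ static objects and then aborts with "inconsistency in registered CommandLine options"; this sketch only counts the duplicate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPTIONS 16

/* Stand-in for the one process-wide option map inside the shared libLLVM.so. */
static const char *option_registry[MAX_OPTIONS];
static int option_count = 0;
static int registration_errors = 0;

/* Stand-in for LLVM's option registration: duplicate names are rejected. */
static int register_option(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < option_count; i++) {
        if (strcmp(option_registry[i], name) == 0) {
            fprintf(stderr, "CommandLine Error: Option '%s' registered more than once!\n", name);
            registration_errors++;
            return -1;
        }
    }
    if (option_count == MAX_OPTIONS)
        return -1;
    option_registry[option_count++] = name;
    return 0;
}

/* Static initializer of the CodeGen copy linked into the first ICD (e.g. pocl). */
__attribute__((constructor)) static void first_icd_codegen_init(void)
{
    register_option("enable-value-profiling");
}

/* Static initializer of the separate CodeGen copy in the second ICD (e.g. beignet). */
__attribute__((constructor)) static void second_icd_codegen_init(void)
{
    register_option("enable-value-profiling");
}
```

Deleting one of the two constructors models the fix from comment 16: with clang itself linked as a shared library there is only one copy of CodeGen in the process, so the option is registered exactly once.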

Compiling all clang and llvm libraries as static would be a solution, true. Similarly, compiling them all as dynamic libraries would work too, but originally I thought we could not do that because BUILD_SHARED_LIBS=ON is unsupported and buggy, and LLVM_LINK_LLVM_DYLIB=ON does not work with libclang. I've checked what other distributions do, and I would like to try Fedora's approach: combine the two, building clang with BUILD_SHARED_LIBS=ON and everything else with LLVM_LINK_LLVM_DYLIB=ON.

That way we hopefully 1) solve this bug, 2) avoid the bugs that BUILD_SHARED_LIBS=ON was causing, and 3) avoid building a huge number of static libraries.

Martin, since pocl seems to be fine and this is an llvm issue, I am reassigning it to myself.
Comment 15 Swamp Workflow Management 2018-02-02 19:30:07 UTC
This is an autogenerated message for OBS integration:
This bug (1065464) was mentioned in
https://build.opensuse.org/request/show/572112 Factory / Mesa
Comment 16 Michal Srb 2018-02-06 09:55:38 UTC
The above-mentioned changes have finally bubbled up to Tumbleweed and I was able to test them. The issue is fixed on my machine. Pocl, Beignet and other clang users now link to the libclang and libLLVM libraries dynamically, so only one instance of each is loaded and the options are registered only once.

Closing the bug as fixed.
Comment 17 Mircea Kitsune 2018-02-06 14:26:59 UTC
Confirming that the issue is solved: I unblocked pocl and ran 'zypper dup', which reinstalled it. However, I can now open Blender without it crashing or printing any errors. Thank you!
Comment 18 Sergey Kondakov 2018-02-06 14:43:07 UTC
Thanks for such a clean solution. It's strange that upstream does not recommend it yet.
Comment 25 Swamp Workflow Management 2018-10-31 17:28:25 UTC
SUSE-SU-2018:3591-1: An update that solves 10 vulnerabilities and has 17 fixes is now available.

Category: security (important)
Bug References: 1012260,1021577,1026191,1041469,1041894,1049703,1061204,1064786,1065464,1066489,1073210,1078436,1091551,1092697,1094767,1096515,1107343,1108771,1108986,1109363,1109465,1110506,1110507,703591,839074,857131,893359
CVE References: CVE-2017-16541,CVE-2018-12376,CVE-2018-12377,CVE-2018-12378,CVE-2018-12379,CVE-2018-12381,CVE-2018-12383,CVE-2018-12385,CVE-2018-12386,CVE-2018-12387
Sources used:
SUSE OpenStack Cloud 7 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Software Development Kit 12-SP3 (src):    MozillaFirefox-60.2.2esr-109.46.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server for SAP 12-SP2 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server for SAP 12-SP1 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP3 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP2-LTSS (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP1-LTSS (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-LTSS (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Desktop 12-SP3 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Enterprise Storage 4 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE CaaS Platform ALL (src):    mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE CaaS Platform 3.0 (src):    mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
Comment 26 Swamp Workflow Management 2018-12-05 14:21:26 UTC
SUSE-SU-2018:3591-2: An update that solves 10 vulnerabilities and has 17 fixes is now available.

Category: security (important)
Bug References: 1012260,1021577,1026191,1041469,1041894,1049703,1061204,1064786,1065464,1066489,1073210,1078436,1091551,1092697,1094767,1096515,1107343,1108771,1108986,1109363,1109465,1110506,1110507,703591,839074,857131,893359
CVE References: CVE-2017-16541,CVE-2018-12376,CVE-2018-12377,CVE-2018-12378,CVE-2018-12379,CVE-2018-12381,CVE-2018-12383,CVE-2018-12385,CVE-2018-12386,CVE-2018-12387
Sources used:
SUSE Linux Enterprise Software Development Kit 12-SP4 (src):    MozillaFirefox-60.2.2esr-109.46.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP4 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Desktop 12-SP4 (src):    MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
Comment 30 OBSbugzilla Bot 2020-07-24 06:50:39 UTC
This is an autogenerated message for OBS integration:
This bug (1065464) was mentioned in
https://build.opensuse.org/request/show/822551 Factory / llvm10
Comment 31 OBSbugzilla Bot 2021-11-19 01:40:37 UTC
This is an autogenerated message for OBS integration:
This bug (1065464) was mentioned in
https://build.opensuse.org/request/show/932377 Backports:SLE-15-SP3 / llvm12
Comment 36 OBSbugzilla Bot 2023-05-25 08:35:10 UTC
This is an autogenerated message for OBS integration:
This bug (1065464) was mentioned in
https://build.opensuse.org/request/show/1088949 Backports:SLE-15-SP4 / llvm15
Comment 41 OBSbugzilla Bot 2024-03-12 09:55:06 UTC
This is an autogenerated message for OBS integration:
This bug (1065464) was mentioned in
https://build.opensuse.org/request/show/1157115 Backports:SLE-15-SP5 / llvm17