Bugzilla – Bug 1065464
Don't allow multiple OpenCL ICD packages to be installed simultaneously
Last modified: 2024-12-11 00:30:14 UTC
I considered this important enough to warrant a bug report. The forum thread where I discovered and further explained the issue can be found here: https://forums.opensuse.org/showthread.php/527821-Blender-crashes-at-startup-with-LLVM-error

When running 'zypper dup' without additional parameters, zypper wants to install the OpenCL ICD named pocl. However, Mesa already uses another ICD, which seems to come in the package libOpenCL1. This is a problem because, from the looks of it, you may only have one ICD installed on your system at a time; more than that will cause a conflict, and every OpenCL application will crash with the following error:

mesa: CommandLine Error: Option 'enable-value-profiling' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

To test and confirm this, install both "libOpenCL1" and "pocl" simultaneously, then try to run Blender 3D: the application should crash on startup with the console output quoted above. After removing the pocl package, however, Blender will start and work properly again.

Until this bug is solved by the X11 developers, it might be a good idea for the openSUSE repositories to be aware of it and mark multiple OpenCL ICD packages as incompatible, recommending that only one be installed at a time. Users may find that OpenCL applications have suddenly stopped working and not know what to do to fix the problem. A workaround for the time being is to mark pocl as "taboo / never install".
This needs to be done by the packages themselves, not by repo/YaST/libzypp. Either via a direct explicit conflict, or via:

Provides: opencl-icd
Conflicts: otherproviders(opencl-icd)

Assigning to the maintainer of both packages.
When multiple ICDs are installed, libOpenCL1 needs to dlopen() them all to find out which one works on the available hardware. If they are dynamically linked, this leads to them sharing a libllvm, which has enough global state that this is likely to error out. (This is a known LLVM bug that currently has no real fix: https://bugs.llvm.org/show_bug.cgi?id=22952)

I'm regularly using multiple ICDs (pocl, nvidia-binary, intel-binary) and have never had any issues, and I do not really like the idea of allowing only one ICD to be installed at once.

AFAIK we're actually shipping three ICDs in Tumbleweed that make use of libllvm:
* beignet
* pocl
* mesa

A workaround used by the Debian packaging team is to statically link all these packages to avoid sharing a libllvm. IMO we should do the same. Any objections?
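A minimal Python sketch of the loader behaviour described above, assuming the common Linux layout used by ICD loaders such as ocl-icd, where each *.icd file under /etc/OpenCL/vendors contains the name of one vendor driver library. The function names are hypothetical, and a real loader does more (environment overrides, querying each driver for its platforms); the sketch only shows that every registered driver ends up dlopen()ed into the same process.

```python
import ctypes
from pathlib import Path

# Assumption: vendor directory used by common Linux ICD loaders (e.g. ocl-icd).
DEFAULT_VENDOR_DIR = Path("/etc/OpenCL/vendors")


def list_icd_libraries(vendor_dir: Path = DEFAULT_VENDOR_DIR) -> list[str]:
    """Return the driver library named in each *.icd file, sorted by file name."""
    if not vendor_dir.is_dir():
        return []
    libraries = []
    for icd_file in sorted(vendor_dir.glob("*.icd")):
        name = icd_file.read_text().strip()
        if name:  # ignore empty vendor files
            libraries.append(name)
    return libraries


def load_all(libraries: list[str]) -> dict[str, object]:
    """dlopen() every registered driver into this process.

    Each value is either a ctypes.CDLL handle or the OSError raised while
    loading. Because all drivers live in one process, a library they all
    link dynamically (such as one libLLVM.so) is loaded once and shared.
    """
    handles: dict[str, object] = {}
    for lib in libraries:
        try:
            handles[lib] = ctypes.CDLL(lib)
        except OSError as err:  # driver missing or failed to load
            handles[lib] = err
    return handles
```

Calling load_all(list_icd_libraries()) on a system with pocl and Mesa's ICD registered would load both drivers, and therefore their shared dependencies, side by side.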
(In reply to Martin Hauke from comment #2)
I don't have any knowledge of how the OpenCL ICD is supposed to work, so I don't know whether the problem I'm seeing is normal or the result of a bug. From what you're saying, it is in fact a bug, so my proposal here would be a temporary workaround, not the proper permanent solution. Whether or not it's a good idea depends on how soon this bug is expected to be solved in LLVM: if it's going to take more than a year, like a lot of issues seem to nowadays, it makes sense to me that openSUSE recommends not installing multiple ICDs. The report you linked was opened in 2015 and last modified in 2016... since 2017 is almost over, I'm definitely not getting my hopes up for a timely solution.
Sorry, forgot to write the other part of my last comment: If statically linking them solves the crash, I definitely have no objections. This feels like the cleanest solution to my limited understanding.
(In reply to Martin Hauke from comment #2)
> When multiple ICDs are installed, libopencl1 needs to dlopen() them all
> to find out which one works on the available hardware. If they are
> dynamically linked, this leads to them sharing a libllvm, which has
> enough global state that this is likely to error out. (This is a known
> LLVM bug, https://bugs.llvm.org/show_bug.cgi?id=22952 , but currently
> has no real fix.)
>
> I'm regularly using multiple ICDs (pocl, nvidia-binary, intel-binary) and
> never had any issues and I do not really like the idea allowing only one ICD
> installed at once.
>
> Afaik we're actually shipping three ICDs in Tumbleweed that make use of
> libllvm:
> * beignet
> * pocl
> * mesa
>
> A workaround that is used by the debian packaging team is to statically link
> all these packages to avoid sharing a libllvm.
>
> Imo we should do the same. Any objections ?

Please use the static-linking hack ASAP. The upstream bug for this particular issue seems to be https://bugs.llvm.org/show_bug.cgi?id=30587

But this is not limited to OpenCL. Recently, openSUSE builds of RPCS3, the PS3 emulator, broke because openSUSE dropped llvm-devel-static, while RPCS3 dropped the option for shared linking because they too don't want to deal with LLVM linking nonsense. The shared and dynamic options are each broken in a different way; openSUSE recently switched from "shared" to "dynamic" to avoid problems with the former, and now there is this.

Seeing how it has been known for almost 3 years, and looking at the attitude of Google's Eric Christopher, it seems like it will never be fixed unless someone really qualified perseveres or gets paid.
So, I've worked around this for myself by reverting SUSE's deletion of static LLVM, importing Debian's static-linking patch for Beignet, and configuring Mesa/Clover, POCL and RPCS3 to link to LLVM statically. That's all the LLVM-linking software I know of. I also patched xorg-x11-server and xdpyinfo to use the display's real DPI instead of the made-up 96 DPI value that is fashionable now. If you want to use those changes (see diff) or try to push them into SUSE, please be my guest. Personally, I'm not going to do it.

https://build.opensuse.org/package/show/home:X0F:HSF/llvm5
https://build.opensuse.org/package/show/home:X0F:HSF/Mesa
https://build.opensuse.org/package/show/home:X0F:HSF/beignet
https://build.opensuse.org/package/show/home:X0F:HSF/pocl
https://build.opensuse.org/package/show/home:X0F:branches:Emulators/rpcs3
https://build.opensuse.org/package/show/home:X0F:HSF/xorg-x11-server
https://build.opensuse.org/package/show/home:X0F:HSF/xdpyinfo
First of all, the whole purpose of ICDs is to give multiple drivers the ability to coexist on the same system, and to give applications a way to enumerate them and choose one. So making the packages that provide ICDs conflict with each other would defeat their purpose.

Secondly, making llvm static would have a big negative impact on the distribution. It is against our guidelines (https://en.opensuse.org/openSUSE:Packaging_guidelines#Static_Libraries), which exist for good reasons. Building it statically would mean that every package that uses it would have to be rebuilt and retested every time llvm changes. It would also increase the resources used for building llvm and all packages that use it. Building llvm already takes a lot of resources and puts stress on our build service. I have spent the last two months optimizing the llvm build to get it to at least acceptable levels.

Note that we have never distributed llvm static libraries. The recent changes in the package only changed how we get rid of them during the build.

So I consider switching llvm to static as the last resort if there is no other solution.

I have reproduced this bug and analyzed it. The issue is not that libLLVM.so is loaded twice, but that Mesa loads libLLVM.so.5 while pocl loads libLLVM.so.4. That is what recently changed: we introduced llvm 5 and made it the default. Mesa was rebuilt against llvm 5, but pocl was not, because our pocl version, 0.14, supports at most llvm 4.

I have updated pocl to version 1.0 (which supports llvm 5) and rebuilt it with llvm 5. It seems to solve the issue. I will double-check it and, if it is correct, submit the updated pocl.
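One way to check for the situation described above, two different libLLVM versions mapped into the same process, is to inspect /proc/&lt;pid&gt;/maps on Linux. The Python sketch below is a hypothetical helper, not an openSUSE tool; it assumes that LLVM's shared libraries have file names starting with libLLVM and containing .so, as libLLVM.so.4 and libLLVM.so.5 do here.

```python
import re
from pathlib import Path

# Assumption: LLVM shared library file names start with "libLLVM" and
# contain ".so" (e.g. libLLVM.so.5 or libLLVM-5.0.so).
LLVM_LIB_NAME = re.compile(r"libLLVM[^/]*\.so[^/]*")


def llvm_libraries_in_maps(maps_text: str) -> set[str]:
    """Return the distinct libLLVM file names in a /proc/<pid>/maps dump."""
    found = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:  # anonymous mapping without a pathname
            continue
        path = fields[5].removesuffix(" (deleted)")
        file_name = path.rsplit("/", 1)[-1]
        if LLVM_LIB_NAME.fullmatch(file_name):
            found.add(file_name)
    return found


def has_mixed_llvm(maps_text: str) -> bool:
    """True if more than one distinct libLLVM library is mapped."""
    return len(llvm_libraries_in_maps(maps_text)) > 1


def read_maps(pid: str = "self") -> str:
    """Read the memory map of a running process (Linux only)."""
    return Path(f"/proc/{pid}/maps").read_text()
```

For a running OpenCL application, has_mixed_llvm(read_maps("1234")) (with the real PID in place of the placeholder 1234) would report whether, for example, both libLLVM.so.4 and libLLVM.so.5 are loaded. Note that this only detects the version mix; it would not detect the duplicate-registration problem with a single libLLVM.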
Now I have noticed that Martin has already prepared an update to version 1.0 in the devel project, and it is on its way to Factory and eventually to Tumbleweed: https://build.opensuse.org/package/show/science/pocl
Mircea, can you install it and test it?
(In reply to Michal Srb from comment #8) What does testing imply? This is my desktop computer, thus I can't risk playing with the software repositories in a way that might mess anything up. However I run openSUSE Tumbleweed: Once the changes are in, I can safely remove the Taboo (Never Install) lock from the package and see how the system behaves after a new 'zypper dup'.
(In reply to Mircea Kitsune from comment #9)
> What does testing imply? This is my desktop computer, thus I can't risk
> playing with the software repositories in a way that might mess anything up.

If you do not want to mess with repositories, it is enough to download and install this single RPM: https://download.opensuse.org/repositories/science/openSUSE_Tumbleweed/x86_64/pocl-1.0-36.9.x86_64.rpm
All its dependencies are in the Tumbleweed repository, and you most likely have them installed already. Try to start Blender and see whether it crashes or works. You can uninstall the pocl package again afterwards.
(In reply to Michal Srb from comment #10)
Thanks for the info. I tried that package and, unfortunately, it still appears to induce the exact same crash as the one currently in Tumbleweed:

mircea@linux-qz0r:~> blender
Read prefs: /home/mircea/.config/blender/2.79/config/userpref.blend
mesa: CommandLine Error: Option 'enable-value-profiling' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
(In reply to Michal Srb from comment #7)
> I have reproduced this bug and analyzed it. The issue is not that libLLVM.so
> is loaded twice, but that Mesa loads libLLVM.so.5 while pocl loads
> libLLVM.so.4. That is the thing that recently changed - we have introduced
> llvm 5 and made it the default. Mesa was rebuilt against llvm 5, but pocl
> was not. It is because we have pocl version 0.14 that supports at most llvm
> 4.
>
> I have updated pocl to version 1.0 (which supports llvm 5) and rebuilt it
> with llvm 5. It seems to solve the issue. I will double check it and if it
> is correct, submit the updated pocl.

You seem to have completely misinterpreted it while "analyzing". This issue is about https://bugs.llvm.org/show_bug.cgi?id=30587 and https://bugs.llvm.org/show_bug.cgi?id=22952 : pieces of LLVM-linking software loaded into the same process mess with LLVM's global state and crash each other. Actually, it seems that LLVM just fails to initialize in order to avoid potential undefined behaviour later. The only workarounds are to link each LLVM user against a different shared version of LLVM, or to link them all statically. So much so that it is actually preferable to link every Mesa driver that way, not just the OpenCL part. They say on Mesa's mailing list that LTO may dramatically decrease the size of the resulting binaries, but it puts even more strain on RAM and storage while building.

> Secondly, making llvm static would have big negative impact on the
> distribution. It is against our guidelines
> (https://en.opensuse.org/openSUSE:Packaging_guidelines#Static_Libraries),
> which exist for good reasons. Building it statically would mean that every
> package that uses it would have to be rebuild and retested every time llvm
> changes. It would also increase the resources used for building llvm and all
> packages that use it. Building llvm already takes lot of resources and puts
> stress on our build service. I have spent last two months optimizing the
> llvm build to get it to at least acceptable levels.
>
> Note that we have never distributed llvm static libraries. The recent
> changes in the package only changed how we get rid of them during build.
>
> So I consider switching llvm to static as the last resort if there is no
> other solution.
>

There is a reason why all other distributions do it that way (well, they don't link the drivers statically; they skip that, and it probably doesn't horribly break notebooks with Intel+AMD graphics because the Intel driver doesn't use Gallium or something), and it's not because they like keeping static libraries or wasting RAM on copies. It is also why the upstream of every piece of LLVM-linking software ends the discussion with "just link it statically then".

The "every package that uses it" list is Mesa/Clover, Beignet, POCL and RPCS3, which is not exactly a disastrously large list. I'm not sure what "rebuild and retested every time llvm changes" is supposed to mean, since OBS rebuilds every package every time for any reason anyway. The "increase in resources" for that is just RAM and storage space, plus several tens of megabytes in package size. It is a stupendously wasteful process, but not as wasteful as OBS's usual tendency to rebuild everything despite not having a compiler cache.

PS: damn your Bugzilla for losing the first version of this message!
Ok, updating pocl to libLLVM5 did solve my initial issue, but that was a different issue from the one you are seeing. After I installed both the pocl and beignet packages, I can see the same error as you: "Option 'enable-value-profiling' registered more than once!"

'enable-value-profiling' is an option from clang, specifically from the CodeGen component. It is registered in the constructor of a static variable in clang. It registers itself by calling an llvm function, which stores it into a map that is also stored in a static variable in the llvm library.

The issue here is that both pocl and beignet are linked with the static libclangCodeGen.a and both link dynamically with libLLVM.so. So each of them has its own copy of clang's CodeGen. They both try to register the same option during initialization and end up saving it into the same map in the shared llvm library. The second attempt to register it fails.

Compiling all clang and llvm libraries as static would be a solution, true. Similarly, compiling them all as dynamic libraries would work too, but originally I thought we could not do that, because BUILD_SHARED_LIBS=ON is unsupported and buggy, and LLVM_LINK_LLVM_DYLIB=ON does not work with libclang. I've checked what other distributions do, and I would like to try Fedora's approach, which combines the two: build clang with BUILD_SHARED_LIBS=ON and everything else with LLVM_LINK_LLVM_DYLIB=ON. That way we hopefully:
1) solve this bug,
2) avoid the bugs that BUILD_SHARED_LIBS=ON was causing,
3) avoid building a huge number of static libraries.

Martin, since pocl seems to be fine and this is an llvm issue, I am reassigning it to myself.
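The registration clash described above can be modelled in a few lines. This is a toy Python model, not LLVM or clang code: OptionRegistry stands in for the option map held in a static variable inside libLLVM.so, and register_codegen_options() stands in for the static constructor in clang's CodeGen. All names, including the shared-library name libclangCodeGen.so, are hypothetical. It contrasts two setups: every ICD carrying its own copy of the static libclangCodeGen.a, versus one shared clang library that the dynamic loader maps, and initializes, only once.

```python
class CommandLineError(Exception):
    """Raised when an option name is registered twice (toy model)."""


class OptionRegistry:
    """Stands in for the global option map stored in libLLVM.so."""

    def __init__(self) -> None:
        self.options: dict[str, str] = {}

    def register(self, name: str, owner: str) -> None:
        if name in self.options:
            raise CommandLineError(
                f"CommandLine Error: Option '{name}' registered more than once!"
            )
        self.options[name] = owner


def register_codegen_options(registry: OptionRegistry, owner: str) -> None:
    """Stands in for the static constructor in clang's CodeGen component."""
    registry.register("enable-value-profiling", owner)


def load_with_static_codegen(icds: list[str]) -> OptionRegistry:
    """Each ICD embeds its own copy of CodeGen, so every copy's constructor
    runs, yet all copies write into the one shared libLLVM registry."""
    registry = OptionRegistry()
    for icd in icds:
        register_codegen_options(registry, f"{icd} (static libclangCodeGen.a)")
    return registry


def load_with_shared_codegen(icds: list[str]) -> OptionRegistry:
    """Every ICD depends on the same shared CodeGen library. Like the dynamic
    loader, map that library only once, so its constructor runs only once."""
    registry = OptionRegistry()
    lib = "libclangCodeGen.so"  # hypothetical shared-library name
    loaded: set[str] = set()
    for _icd in icds:
        if lib not in loaded:
            loaded.add(lib)
            register_codegen_options(registry, lib)
    return registry
```

With a single ICD, the static setup works; load_with_static_codegen(["pocl", "beignet"]) raises the duplicate-registration error, while the shared setup registers the option exactly once. This mirrors why the crash only appears once a second ICD embedding its own copy of clang's CodeGen is installed.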
This is an autogenerated message for OBS integration: This bug (1065464) was mentioned in https://build.opensuse.org/request/show/572112 Factory / Mesa
The above-mentioned changes have finally reached Tumbleweed and I was able to test them. The issue is fixed on my machine. Pocl, Beignet and other clang users now link to the libclang and libLLVM libraries dynamically, so only one instance of each is loaded and the options are registered only once. Closing the bug as fixed.
Confirming that the issue is solved: I unblocked pocl and ran 'zypper dup', which reinstalled it. This time, however, I can open Blender without it crashing or printing any errors. Thank you!
Thanks for such a clean solution. Strange that upstream does not recommend it yet.
SUSE-SU-2018:3591-1: An update that solves 10 vulnerabilities and has 17 fixes is now available.

Category: security (important)
Bug References: 1012260,1021577,1026191,1041469,1041894,1049703,1061204,1064786,1065464,1066489,1073210,1078436,1091551,1092697,1094767,1096515,1107343,1108771,1108986,1109363,1109465,1110506,1110507,703591,839074,857131,893359
CVE References: CVE-2017-16541,CVE-2018-12376,CVE-2018-12377,CVE-2018-12378,CVE-2018-12379,CVE-2018-12381,CVE-2018-12383,CVE-2018-12385,CVE-2018-12386,CVE-2018-12387
Sources used:
SUSE OpenStack Cloud 7 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Software Development Kit 12-SP3 (src): MozillaFirefox-60.2.2esr-109.46.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server for SAP 12-SP2 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server for SAP 12-SP1 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP3 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP2-LTSS (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP1-LTSS (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-LTSS (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Desktop 12-SP3 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Enterprise Storage 4 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE CaaS Platform ALL (src): mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE CaaS Platform 3.0 (src): mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE-SU-2018:3591-2: An update that solves 10 vulnerabilities and has 17 fixes is now available.

Category: security (important)
Bug References: 1012260,1021577,1026191,1041469,1041894,1049703,1061204,1064786,1065464,1066489,1073210,1078436,1091551,1092697,1094767,1096515,1107343,1108771,1108986,1109363,1109465,1110506,1110507,703591,839074,857131,893359
CVE References: CVE-2017-16541,CVE-2018-12376,CVE-2018-12377,CVE-2018-12378,CVE-2018-12379,CVE-2018-12381,CVE-2018-12383,CVE-2018-12385,CVE-2018-12386,CVE-2018-12387
Sources used:
SUSE Linux Enterprise Software Development Kit 12-SP4 (src): MozillaFirefox-60.2.2esr-109.46.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Server 12-SP4 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, apache2-mod_nss-1.0.14-19.6.3, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
SUSE Linux Enterprise Desktop 12-SP4 (src): MozillaFirefox-60.2.2esr-109.46.1, MozillaFirefox-branding-SLE-60-32.3.1, mozilla-nspr-4.19-19.3.1, mozilla-nss-3.36.4-58.15.3
This is an autogenerated message for OBS integration: This bug (1065464) was mentioned in https://build.opensuse.org/request/show/822551 Factory / llvm10
This is an autogenerated message for OBS integration: This bug (1065464) was mentioned in https://build.opensuse.org/request/show/932377 Backports:SLE-15-SP3 / llvm12
This is an autogenerated message for OBS integration: This bug (1065464) was mentioned in https://build.opensuse.org/request/show/1088949 Backports:SLE-15-SP4 / llvm15
This is an autogenerated message for OBS integration: This bug (1065464) was mentioned in https://build.opensuse.org/request/show/1157115 Backports:SLE-15-SP5 / llvm17