Bugzilla – Bug 1101262
ceph's SPDK builds with march=native
Last modified: 2021-07-19 10:41:39 UTC
Code built with gcc's -march=native might use opcodes that are available on new build machines but that old target machines don't understand. That will cause hard-to-debug segfaults that depend on the build machine.

The param can be dropped with

    sed -i 's|-march=native||' `find -name CMakeLists.txt -o -name \*.mk`

but then the ceph/dpdk build fails in the same way as an unpatched ceph build in a VM started with qemu-kvm -cpu kvm64.

The build failure is:

    In file included from /usr/lib64/gcc/x86_64-suse-linux/8/include/x86intrin.h:39,
                     from /home/abuild/rpmbuild/BUILD/ceph-13.2.0-210-g0e990e900d/build/src/dpdk/include/rte_vect.h:56,
                     from /home/abuild/rpmbuild/BUILD/ceph-13.2.0-210-g0e990e900d/build/src/dpdk/include/rte_memcpy.h:46,
                     from /home/abuild/rpmbuild/BUILD/ceph-13.2.0-210-g0e990e900d/build/src/dpdk/include/rte_mempool.h:79,
                     from env.c:41:
    /usr/lib64/gcc/x86_64-suse-linux/8/include/tmmintrin.h:185:1: error: inlining failed in call to always_inline '_mm_alignr_epi8': target specific option mismatch
     _mm_alignr_epi8(__m128i __X, __m128i __Y, const int __N)
     ^~~~~~~~~~~~~~~
    In file included from /home/abuild/rpmbuild/BUILD/ceph-13.2.0-210-g0e990e900d/build/src/dpdk/include/rte_mempool.h:79,
                     from env.c:41:
    /home/abuild/rpmbuild/BUILD/ceph-13.2.0-210-g0e990e900d/build/src/dpdk/include/rte_memcpy.h:641:13: note: called from here
     _mm_storeu_si128((__m128i *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \
     ^~~~~~~~~~~~~

I guess that is because the code assumes availability of some features like SSE* without specifying the appropriate gcc flags. But OTOH our dpdk Factory package does not have this problem. Are these very different versions?

https://build.suse.de/request/show/119320 says sse3 is required for dpdk. I'll try adding a -msse3 param and see if that fixes the build.
This is probably something we ultimately want to address upstream.
The Ceph tarball is generated anew (from a git repo) for every build. The DPDK is brought in as a git submodule of SPDK, which is itself a git submodule of Ceph. I would not be surprised if the DPDK version in Ceph is very different from the Tumbleweed DPDK package. And I agree with Tim, any fix will need to be undertaken upstream via PR to https://github.com/ceph/ceph.git - I can help with that if need be.
I looked at a recent build log from filesystems:ceph and could not find any indication that "-march=native" is being triggered. I think this is because DPDK is disabled by default:

https://github.com/ceph/ceph/blob/master/CMakeLists.txt#L365

and our build does not turn it on:

    [ 91s] + cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib64 -DCMAKE_INSTALL_LIBEXECDIR=/usr/lib -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_MANDIR=/usr/share/man -DCMAKE_INSTALL_DOCDIR=/usr/share/doc/packages/ceph -DCMAKE_INSTALL_INCLUDEDIR=/usr/include -DWITH_EMBEDDED=OFF -DWITH_MANPAGE=ON -DWITH_PYTHON3=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF -DWITH_PYTHON2=OFF -DMGR_PYTHON_VERSION=3 -DWITH_TESTS=OFF -DWITH_LTTNG=ON -DWITH_BABELTRACE=ON -DWITH_OCF=ON -DWITH_BOOST_CONTEXT=ON -DBOOST_J=4

The only instance of "-march=native" I could find in the source code is in src/CMakeLists.txt inside a DPDK conditional block. Absent evidence to the contrary, I would say it does not get run in our builds.

@Bernhard - are you sure this is really a bug?
The build is non-verbose by default, so you cannot see the -march= param in build logs.

Also, there are more places that set -march=native:

https://github.com/ceph/rocksdb/blob/9090ae3ecfbf9b50a398a5d8b178f14b88dc047e/CMakeLists.txt#L207
https://github.com/ceph/spdk/blob/7d45ab345d7293c6679bedd89d5dc16310026ba9/mk/spdk.common.mk#L77

and there is ceph/src/spdk/dpdk/
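One way to make the flags visible is to tally every -march= value that shows up in a verbose build log. The following is only a sketch: the function name count_march_flags is made up for illustration, and it assumes the log contains the full compiler command lines (for example from a CMake-generated Makefile run with VERBOSE=1).

```shell
# Tally each distinct -march= value found on stdin, most frequent first.
# Input can be any text that contains the compiler command lines.
count_march_flags() {
    grep -o -e '-march=[A-Za-z0-9_.-]*' | sort | uniq -c | sort -rn
}

# Possible usage (assumes a CMake-generated Makefile in the current dir):
#   make VERBOSE=1 2>&1 | count_march_flags
```

Because the filter only looks for the flag text itself, it should also work on an strace log of execve calls, where each argument appears as a quoted string.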
Created attachment 777026
strace of execve calls during ceph build

As you can see in the strace output, even after patching CMakeLists, there were still cc calls with -march=native.

Also, I found this bug when working on reproducible builds for openSUSE. It showed that my local Leap-15.0 ceph build had variations in assembly.

    /usr/bin/ceph-kvstore-tool
    /usr/bin/ceph-bluestore-tool
    /usr/bin/ceph-objectstore-tool
    /usr/bin/ceph-osd

had diffs like:

    spdk_nvme_ctrlr_get_default_ctrlr_opts@@Base:
     push %r12
    - push %rbp
     mov %rsi,%rdx
    + push %rbp
    + mov %rdi,%rbp
     push %rbx
    [...plenty more...]
(In reply to Bernhard Wiedemann from comment #4)
> https://github.com/ceph/rocksdb/blob/9090ae3ecfbf9b50a398a5d8b178f14b88dc047e/CMakeLists.txt#L207

That one at least should be prevented by https://github.com/ceph/ceph/commit/652bc5e832
Yeah, I see now. WITH_SPDK is enabled by default. SPDK is built via cmake/modules/BuildSPDK.cmake, which triggers cmake/modules/BuildDPDK.cmake. And on x86_64 the DPDK build is triggered with the "native" machine template:

https://github.com/ceph/ceph/blob/master/cmake/modules/BuildDPDK.cmake#L10-L13

Regarding versions, it looks like Ceph 13.2.0 is using DPDK version 17.11, while openSUSE:Factory is currently on 18.02.2.

Looking at the dpdk.spec file in Factory, at first glance it looks like it's building using the "native" machine template, too:

    %ifarch x86_64
    %define machine native
    %define target x86_64-%{machine}-linuxapp-gcc
    %endif

However, in the %build section of dpdk.spec I see it is munging the DPDK config file to change "CONFIG_RTE_MACHINE=native" to "CONFIG_RTE_MACHINE=default". This is explained by the following comment in the spec file:

    # DPDK defaults to using builder-specific compiler flags. However,
    # the config has been changed by specifying CONFIG_RTE_MACHINE=default
    # in order to build for a more generic host. NOTE: It is possible that
    # the compiler flags used still won't work for all Fedora-supported
    # machines, but runtime checks in DPDK will catch those situations.
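The config munging described above could look roughly like the sketch below. This is not the actual code from dpdk.spec: the function name set_rte_machine_default is hypothetical, and the config file path is left to the caller because it depends on the build target. The sed expression accepts the value with or without double quotes, since the quoting used in the real config file is an assumption here.

```shell
# Rewrite CONFIG_RTE_MACHINE from "native" to "default" in the given
# DPDK config file, keeping whatever quoting the line already uses.
# Relies on GNU sed's in-place editing (-i).
set_rte_machine_default() {
    conf_file=$1
    sed -i 's/^\(CONFIG_RTE_MACHINE="\{0,1\}\)native/\1default/' "$conf_file"
}

# Possible usage (the path is a placeholder, not taken from the spec file):
#   set_rte_machine_default path/to/dpdk/build/.config
```

Only the CONFIG_RTE_MACHINE line is touched; all other settings in the file are left as they are.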
I am now trying to patch Ceph in OBS so it builds its DPDK submodule with "CONFIG_RTE_MACHINE=default".
I found that SPDK builds with -march=core2
(In reply to Bernhard Wiedemann from comment #9)
> I found that SPDK builds with -march=core2

It looks very probable that SPDK is building DPDK with -march=native.

The version of DPDK that is being built is
https://github.com/spdk/dpdk/tree/6ece49ad5a26f5e2f5c4af6c06c30376c0ddc387

In that tree, it looks to me that -march=native is getting specified by this file:
https://github.com/spdk/dpdk/tree/6ece49ad5a26f5e2f5c4af6c06c30376c0ddc387/mk/machine/native

In that file it says we need SSE4.2 support "to get everything to compile", and that the minimum target with SSE4.2 is "-march=corei7", but presumably there's no way we can assume Ceph will be running on such a machine.

So, it looks like we need to be building DPDK with "-march=x86-64" in order for our packages to run fine on all possible x86_64 machines.
(In reply to Bernhard Wiedemann from comment #9)
> I found that SPDK builds with -march=core2

Where are you seeing that? In the strace output you posted in Comment 5 I could see that all the .c files in SPDK are being compiled with -march=native, and indeed I think I found the smoking gun.

Upstream bug report and PR open:

http://tracker.ceph.com/issues/24948
https://github.com/ceph/ceph/pull/23076
Hrm, and that triggers the exact same build error(s) Bernhard quotes in the bug description.
(In reply to Bernhard Wiedemann from comment #9)
> I found that SPDK builds with -march=core2

Ah, I believe you mean SPDK builds with -march=core2 _in Tumbleweed_, while in Ceph it is building with -march=native.

I opened an upstream PR to address this: https://github.com/ceph/ceph/pull/23078

As I just found out, the Ceph build is patching DPDK to build with CONFIG_RTE_MACHINE=default:

https://github.com/ceph/ceph/blob/master/cmake/modules/BuildDPDK.cmake#L73-L76
https://github.com/ceph/ceph/blob/master/cmake/modules/patch-dpdk-conf.sh#L21

so DPDK is not the culprit here.
(In reply to Nathan Cutler from comment #13)
> (In reply to Bernhard Wiedemann from comment #9)
> > I found that SPDK builds with -march=core2
>
> Ah, I believe you mean SPDK builds with -march=core2 _in Tumbleweed_, while
> in Ceph it is building with -march=native.

I meant, it _can_ be built with -march=core2 to avoid the compile error. That still gets us decent hardware compatibility, performance and reproducible builds.

I saw some ceph parts were compiled with -march=corei7, so the change is probably not even decreasing hardware support.
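For context on why -march=core2 avoids the compile error: _mm_alignr_epi8 is declared in tmmintrin.h, the SSSE3 header, and gcc's core2 target enables SSSE3. Whether a given machine can run such code can be checked against the feature flags the Linux kernel reports. The sketch below is illustrative only: the function name cpu_has_flags is made up, and it assumes a Linux-style cpuinfo file with a "flags" line.

```shell
# Succeed only if every named feature appears in the first "flags" line
# of the given cpuinfo file (normally /proc/cpuinfo on Linux).
cpu_has_flags() {
    cpuinfo=$1
    shift
    flags=$(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2)
    for want in "$@"; do
        case " $flags " in
            *" $want "*) ;;        # feature present, check the next one
            *) return 1 ;;         # feature missing
        esac
    done
    return 0
}

# Possible usage:
#   cpu_has_flags /proc/cpuinfo sse3 ssse3 && echo "SSSE3 code should run here"
```

This checks only the features passed on the command line. It does not verify everything a given -march value implies.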
Update: https://github.com/ceph/ceph/pull/23078 has been approved and it looks like it will be merged. It could then be backported to mimic and reach our downstream builds. I'll come right out and say it, though - since this fix is based on a patch in upstream's fork of SPDK, there's a clear and present danger that it will disappear in a future update of the SPDK submodule. To prevent this, we would need to get a fix accepted by SPDK upstream. I have gotten patches accepted there before, so I can try. . . .
Upstream update:

* master PR: https://github.com/ceph/ceph/pull/23078 (merged)
* mimic PR: https://github.com/ceph/ceph/pull/23175 (pending review)

Once the mimic PR is merged, I'll make a Factory SR.
Flagged for upstream integration testing. Hoping to get it into the upcoming mimic point release (13.2.2).
Upstream mimic PR mentioned in Comment 16 has been merged. The fix will be in the upcoming 13.2.2 maintenance update.
This is an autogenerated message for OBS integration: This bug (1101262) was mentioned in https://build.opensuse.org/request/show/667784 15.0 / ceph
This is an autogenerated message for OBS integration: This bug (1101262) was mentioned in https://build.opensuse.org/request/show/683881 15.0 / ceph
SUSE-SU-2019:0586-1: An update that solves 5 vulnerabilities and has two fixes is now available.

Category: security (moderate)
Bug References: 1084645,1086613,1096748,1099162,1101262,1111177,1114567
CVE References: CVE-2018-10861,CVE-2018-1128,CVE-2018-1129,CVE-2018-14662,CVE-2018-16846
Sources used:
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15 (src): ceph-13.2.4.125+gad802694f5-3.7.2
SUSE Linux Enterprise Module for Basesystem 15 (src): ceph-13.2.4.125+gad802694f5-3.7.2
openSUSE-SU-2019:1284-1: An update that solves 5 vulnerabilities and has three fixes is now available.

Category: security (moderate)
Bug References: 1084645,1086613,1096748,1099162,1101262,1111177,1114567,1114710
CVE References: CVE-2018-10861,CVE-2018-1128,CVE-2018-1129,CVE-2018-14662,CVE-2018-16846
Sources used:
openSUSE Leap 15.0 (src): ceph-13.2.4.125+gad802694f5-lp150.2.3.1, ceph-test-13.2.4.125+gad802694f5-lp150.2.3.1