Bugzilla – Bug 1130438
meep fails 1 test on old CPU
Last modified: 2019-06-11 08:42:39 UTC
While working on reproducible builds for openSUSE, I found that possibly depending on build host load, the 2D_convergence test sometimes fails.

diff between a good and a bad run:

 Frequency difference with a of 15 is 0.0153091/15/15
-frequency for a=10 is 0.180252, 0 (shifted), 0.0901258 (mean)
+frequency for a=10 is 0.180252, 0.181218 (shifted), 0.180735 (mean)
 Unshifted freq error is 0.0307579/10/10
-Shifted freq error is -17.9944/10/10
-meep: Frequency doesn't converge properly with a.
-FAIL 2D_convergence (exit status: 1)
+Shifted freq error is 0.127421/10/10

please investigate, fix and/or coordinate with upstream devs
Seems to be related to the build machine's CPU type. On a newer DDR4-era machine I had to use

  osc build --noservice --vm-type=kvm --build-opt=--vm-custom-opt=-cpu\ qemu64

to trigger the test failure. On a DDR3-era machine from 2010 a plain osc build always failed.
Does this still apply with https://build.opensuse.org/request/show/686091 ? In my experience the 2D_convergence test fails sometimes; maybe I had a machine where it always fails. I'll try to find out which one it was. For the moment I'm fine with disabling the test and filing a bug upstream with all the information we gathered :)
Same test failure with your 1.8.0 package. More playing with the -cpu param showed that it needs all these CPU flags: +avx,+avx2,+fma,+xsave,+xsaveopt
(In reply to Bernhard Wiedemann from comment #3)
> More playing with the -cpu param showed that it needs all these CPU flags:
> +avx,+avx2,+fma,+xsave,+xsaveopt

So does the test fail with these parameters, or are they required to let the test pass? If they are required, we should conditionally disable the test when the requirements for a (stable) pass are not met.
The CPU flags are required atm to make the test pass.
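The conditional check proposed above could be sketched roughly as follows. This is only an illustration, not what the package does: the function name missing_cpu_flags is hypothetical, and it assumes that the flag names in /proc/cpuinfo match the QEMU -cpu names listed in this report and that this flag list really is the pass condition.

```python
# Flags reported in this bug as needed for 2D_convergence to pass.
# Assumption: /proc/cpuinfo uses the same names as the QEMU -cpu options.
REQUIRED_FLAGS = {"avx", "avx2", "fma", "xsave", "xsaveopt"}


def missing_cpu_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(required) - set(value.split())
    # No 'flags' line at all (e.g. a non-x86 host): treat everything as missing.
    return set(required)
```

A build script could read /proc/cpuinfo, pass its contents to missing_cpu_flags(), and skip 2D_convergence whenever the returned set is not empty.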
I already filed a bug upstream in February, as the 2D_convergence test failed on Tumbleweed but passed on Leap 15.0 on the same machine: https://github.com/NanoComp/meep/issues/727 So I assumed back then that the different build toolchain (gcc, glibc) produces different code for the test, which in the end causes the failure.
In the meantime they released meep 1.9.0. I packaged it here: home:jbrielmaier:branches:science/meep

I ran the tests on different machines.

My workstation, Intel i7-3770, avx, no avx2, no fma
  TW: PASS
  Leap 15.1: PASS

My laptop, Intel i7-6600U, avx, avx2, fma
  TW: FAIL
  Leap 15.1: PASS

Intel Xeon E5-2650Lv3, avx, avx2, fma
  TW: FAIL
  Leap 15.1: PASS

AMD Opteron 8218, no avx, no avx2, no fma
  TW: PASS
  Leap 15.1: PASS

As we didn't find the root cause of this test failure and upstream didn't come up with a solution, I just went ahead and disabled this test on Tumbleweed: https://build.opensuse.org/request/show/705648
Test was disabled in https://build.opensuse.org/request/show/707898 I'm marking it as NORESPONSE as we didn't get any help from upstream and I won't invest any more time here. Feel free to reopen the bug and find a proper solution :)