Bug 1142164 - LTO: petsc:gnu-openmpi3-hpc build hang
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Other
Version: Current
Hardware: x86-64 Other
Priority: P3 - Medium
Severity: Normal
Target Milestone: ---
Assigned To: Egbert Eich
QA Contact: E-mail List
Depends on:
Blocks: 1133084
Reported: 2019-07-19 08:58 UTC by Michel Normand
Modified: 2019-07-19 16:39 UTC (History)

Attachments
petsc_gnu_openmpi3_hpc_standard_x86_64_201907190841.log (88.44 KB, text/x-log)
2019-07-19 08:58 UTC, Michel Normand
petsc_gnu_openmpi3_hpc_standard_x86_64_201907191102.log (710.75 KB, text/x-log)
2019-07-19 09:04 UTC, Michel Normand

Description Michel Normand 2019-07-19 08:58:53 UTC
Created attachment 810999 [details]
petsc_gnu_openmpi3_hpc_standard_x86_64_201907190841.log

LTO: petsc:gnu-openmpi3-hpc build hang

As reported in
https://build.opensuse.org/project/show/openSUSE:Factory:Staging:adi:26

Extract from the first log:
===
[  120s] TESTING: checkMPICHorOpenMPI from config.packages.MPI(/home/abuild/rpmbuild/BUILD/petsc-3.8.3/config/BuildSystem/config/packages/MPI.py:431)
[  120s] TESTING: checkSharedLibrary from config.packages.MPI(/home/abuild/rpmbuild/BUILD/petsc-3.8.3/config/BuildSystem/config/packages/MPI.py:128)
[ 5530s] TESTING: configureMPIEXEC from config.packages.MPI(/home/abuild/rpmbuild/BUILD/petsc-3.8.3/config/BuildSystem/config/packages/MPI.py:141)qemu-system-x86_64: terminating on signal 15 from pid 12869 (<unknown process>)
===

If I disable LTO with "%define _lto_cflags %{nil}",
the build no longer fails, as shown in the second attached log.
Comment 1 Michel Normand 2019-07-19 09:04:28 UTC
Created attachment 811000 [details]
petsc_gnu_openmpi3_hpc_standard_x86_64_201907191102.log

This second log is from a build without LTO; that build completed.
Comment 2 Michel Normand 2019-07-19 10:51:09 UTC
While the petsc:gnu-openmpi3-hpc build passed for x86_64 and i586 in my own adi:26 branch (1), it failed for i586 in my own science branch (2),
both with LTO disabled :(
So I am not sure this workaround is sufficient.

(1) https://build.opensuse.org/package/show/home:michel_mno:branches:openSUSE:Factory:Staging:adi:26/petsc
(2) https://build.opensuse.org/package/show/home:michel_mno:branches:science/petsc
Comment 3 Egbert Eich 2019-07-19 16:15:49 UTC
Michel, we see these transient build failures quite often with HPC packages. Usually they are due to memory starvation: many OBS machines have little memory. The problem is amplified when more cores are available.
For trilinos we had to do some fun games with macros to limit the number of cores to 4.

The way you deal with this problem is using constraints:
https://openbuildservice.org/help/manuals/obs-reference-guide/cha.obs.build_job_constraints.html

One way to get numbers to put in there is to wait for a build to succeed and then download the binaries. The _statistics file contains numbers which should guide you.
I will up these to 6G for petsc.
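
A minimal sketch of what such a constraints file might look like, assuming the "6G" above refers to the memory constraint. The file actually committed for petsc is not shown in this report and may differ (for example, it might use physicalmemory instead, or also constrain disk size):

```xml
<!-- _constraints: illustrative only; the 6G memory value is taken from
     the comment above, everything else is an assumption. -->
<constraints>
  <hardware>
    <memory>
      <!-- Require build workers with at least 6 GB of memory (RAM plus swap). -->
      <size unit="G">6</size>
    </memory>
  </hardware>
</constraints>
```

The file is named _constraints and is kept in the package sources next to the spec file.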
Comment 4 Egbert Eich 2019-07-19 16:18:50 UTC
Downside of this approach: the higher the constraint, the fewer machines are available that meet it.
Comment 5 Egbert Eich 2019-07-19 16:39:07 UTC
OK, the openmpi3-hpc variant has now built in the devel project, which it had not done before.
Closing.