Bugzilla – Bug 1040589
bash/gcc/gzip/python differ between builds because of profiling
Last modified: 2023-04-14 12:11:44 UTC
In https://build.opensuse.org/project/prjconf/openSUSE:Factory we have

%do_profiling 1

and because of that, our bash.spec enables gcc's 'profile feedback directed optimizations'. But that causes jobs.o and the resulting bash binary to differ between builds, even when running on the same build host. And because of that, build-compare always thinks there is a change and triggers a re-publish and a rebuild of depending packages.

We also have such binary diffs in gcc6, and gcc6.spec calls a

make profiledbootstrap

The do_profiling macro is used in bash, gzip, hello, python3-base, python-base, sed and xz, and in

http://rb.zq1.de/compare.factory-20170523/bash-compare.out
http://rb.zq1.de/compare.factory-20170523/gcc6-compare.out
http://rb.zq1.de/compare.factory-20170523/gzip-compare.out
http://rb.zq1.de/compare.factory-20170523/python-base-compare.out
http://rb.zq1.de/compare.factory-20170523/python3-base-compare.out

we have strange diffs in the assembler that, until today, I could not trace down to other sources of non-determinism. The diffs did go away when building without profiling (it was harder to disable for gcc6 and bash, though).

Do the profiles just count invocations of functions, or do they depend on the type and speed of the system? In the first case, it should be possible to make the profiling runs deterministic, but for that it would be useful to be able to see the differences between runs. How could I diff gcc's .gcda files?
(In reply to Bernhard Wiedemann from comment #0)
> in our bash.spec we enable gcc's
> 'profile feedback directed optimizations'
> [...]

That is your problem, not mine ... do not touch the bash test suite!
(In reply to Bernhard Wiedemann from comment #0)
> we have strange diffs in the assembler that, until today, I could not
> trace down to other sources of non-determinism.

Is the buildroot 1:1 the same, and you still get different code?

> Do the profiles just count invocations of functions, or do they depend
> on the type and speed of the system?

Just counting. Inconsistencies can pop up when you run the instrumented binaries in parallel (gcov uses file locking, but we still see that happen).

> In the first case, it should be possible to make the profiling runs
> deterministic, but for that it would be useful to be able to see the
> differences between runs. How could I diff gcc's .gcda files?

There is gcov-dump (you might need to pick it up from the devel:gcc gcc packages).
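[For reference, a minimal sketch of comparing two profiling runs with gcov-dump, assuming two copies of the build tree (run1/ and run2/ are hypothetical names) kept after two separate runs:]

# Dump all counters in a stable order, then diff the two dumps.
for run in run1 run2; do
    find "$run" -name '*.gcda' | sort | while read -r f; do
        echo "== $f =="
        gcov-dump -l "$f"
    done > "/tmp/$run.dump"
done
diff -u /tmp/run1.dump /tmp/run2.dump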
Thanks for the hints, Richard. I experimented some more with gzip (because it is the fastest to build) and found that it differed even when running 'osc build' twice without cleanup on the same rootfs, and that I could make builds fully reproducible with https://build.opensuse.org/request/show/497997 because before, tar would include random timestamps, ordering and pax headers (containing the PID), and even variations in the filename of $tmpfile (for gzip without -n) caused slight differences in the built binary.
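[For the tar part, the usual technique for a deterministic archive with GNU tar looks roughly like this; this is a generic sketch of the approach, not the exact change from the request above, and it assumes SOURCE_DATE_EPOCH is set by the build environment:]

# Normalize file order, timestamps, ownership and pax headers so that
# the archive bytes depend only on the file contents.
tar --sort=name \
    --mtime="@${SOURCE_DATE_EPOCH:-0}" \
    --owner=0 --group=0 --numeric-owner \
    --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
    -cf source.tar src/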
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/498391 Factory / bash
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/499887 Factory / gzip
Apparently, .gcda file creation in gcc7 is racy; otherwise, why would those SRs help make the builds reproducible?

https://build.opensuse.org/request/show/596431 sed
https://build.opensuse.org/request/show/596586 pcre

gcc-4.8 from 42.3 seems to behave the same, so this is not a regression (unfortunately no older gccs are available anymore for Factory).
Honza, are you aware of any issue regarding reproducibility of the final .gcda files when merging the results of multiple invocations? IIRC we use file locking to serialize the update, but of course different parallel program invocations might result in a random order of merging. How can that ever be an issue? Maybe for counter overflows? Or some ordering of items, with rippling-down effects when the profile is read by -fprofile-use?
(In reply to Richard Biener from comment #7)
> Honza, are you aware of any issue regarding reproducibility of the final
> .gcda files when merging the results of multiple invocations?

Looking at libgcov-merge.c, there are several types of counters (mostly the value-profiling ones) that have merging routines which are sensitive to the order of merging. Some even look wrong to me (__gcov_merge_single, though maybe counters[1] is just under-specified). So it's not really "races", but for example when merging single-value counters in the written order { 1, 2, 10 }, { 2, 4, 10 }, { 1, 4, 10 }, we get { 2, 4, 30 }, and in any other order we'd get { 1, 6, 30 }.

Ok, not really; somehow we massage counter[1] to avoid this? Here we'd merge the 2nd into the first, giving { 2, 2, 20 }, and then merge in the third, giving { 1, 2, 30 }. When starting with the 2nd we'd go { 2, 2, 20 }, { 1, 2, 30 }, so here it seems to "work", but I'd have a hard time believing this simple "trick" makes the merging independent of order.

Using _add for GCOV_COUNTER_AVERAGE also looks odd: combining 200 1 and 20000 1 yields 20200 2 ...
So I did an analysis of 2 packages before the changes in Factory happened:

1) gzip - as mentioned, a temp file was used and zipped. The temp file name probably causes the divergence, as it's always different.

2) sed - it uses pthreads (but that is not passed via CFLAGS), thus I added -fprofile-update=atomic to the cflags; see the documentation here: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html

With that I got this diff between 2 runs:

./gnulib-tests/nanosleep.gcda: a3000000: 37:PROGRAM_SUMMARY checksum=0x99d0fa93
[snip]
./gnulib-tests/nanosleep.gcda: 01a10000: 14:COUNTERS arcs 7 counts
./gnulib-tests/nanosleep.gcda: 0 2 1 1 0 1 29 30
./gnulib-tests/nanosleep.gcda: a3000000: 37:PROGRAM_SUMMARY checksum=0xc3046d04
[snip]
./gnulib-tests/nanosleep.gcda: 0 2 1 1 0 1 30 31

The corresponding code is in ./lib/stat-time.h:

stat_time_normalize (int result, struct stat *st _GL_UNUSED)
{
#if defined __sun && defined STAT_TIMESPEC
  if (result == 0)
    {
      long int timespec_resolution = 1000000000;
      short int const ts_off[] = { offsetof (struct stat, st_atim),
                                   offsetof (struct stat, st_mtim),
                                   offsetof (struct stat, st_ctim) };
      int i;

      for (i = 0; i < sizeof ts_off / sizeof *ts_off; i++)
        {
          struct timespec *ts = (struct timespec *) ((char *) st + ts_off[i]);
          long int q = ts->tv_nsec / timespec_resolution;
          long int r = ts->tv_nsec % timespec_resolution;
          if (r < 0)
            {
              r += timespec_resolution;
              q--;
            }
          ts->tv_nsec = r;
          /* Overflow is possible, as Solaris 11 stat can yield
             tv_sec == TYPE_MINIMUM (time_t) && tv_nsec == -1000000000.
             INT_ADD_WRAPV is OK, since time_t is signed on Solaris.  */
          if (INT_ADD_WRAPV (q, ts->tv_sec, &ts->tv_sec))
            {
              errno = EOVERFLOW;
              return -1;
            }
        }
    }
#endif
  return result;
}

It loops based on how fast time flies. This can't be stable.
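[A minimal sketch of how -fprofile-update=atomic can be injected into a package's profiling stage; the spec macros are the usual openSUSE ones, and the exact hook obviously depends on the package:]

# %build section sketch: instrument with atomic counter updates,
# run the profiling workload, then rebuild using the profile.
export CFLAGS="%{optflags} -fprofile-generate -fprofile-update=atomic"
%configure
make %{?_smp_mflags}
make check        # the profiling run that writes the .gcda files
make clean
export CFLAGS="%{optflags} -fprofile-use"
%configure
make %{?_smp_mflags}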
Can't reproduce a binary diff for 10 runs of the 'xz' package. But I guess it's multi-threaded, thus -fprofile-update=atomic should be used.
I've done https://build.opensuse.org/request/show/597754

Then we can check the packages and eventually revert the changes that were done as a workaround.
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/597793 Factory / rpm
Now I built a reproducible gcc8 in https://build.opensuse.org/project/show/home:bmwiedemann:reproducible:test:gccnoprof and ran some benchmarks to see how big the difference in performance is on a 6-core Intel Haswell system.

# prep:
osc co openSUSE:Factory/pcre && cd $_
mnt=/mnt/3/
mkdir -p $mnt
mount -t tmpfs tmpfs $mnt
OSC_BUILD_ROOT=$mnt/build-prof/ osc build --alternative-project=devel:gcc
OSC_BUILD_ROOT=$mnt/build-noprof/ osc build \
    --alternative-project=home:bmwiedemann:reproducible:test:gccnoprof

# bench:
for i in $(seq 4) ; do time taskset 1 chroot $mnt/build-noprof bash -c "cd /home/abuild/rpmbuild/BUILD/*/ ; make clean all >/dev/null 2>&1" ; done

real 0m48.686s
real 0m48.720s
real 0m48.778s
real 0m48.715s
real 0m48.758s
user 0m47.079s
sys 0m1.667s

for i in $(seq 4) ; do time taskset 1 chroot $mnt/build-prof bash -c "cd /home/abuild/rpmbuild/BUILD/*/ ; make clean all >/dev/null 2>&1" ; done

real 0m55.178s
real 0m55.138s
real 0m55.127s
real 0m55.164s
real 0m55.207s
user 0m53.376s
sys 0m1.819s

So my gcc8 version was faster by 12%; that was a big surprise. Variance between measurements is low because no HDD, no network and no parallelism is involved.

# same prep, but now with apache2
# noprof:
real 1m19.673s
real 1m19.539s
real 1m19.673s
real 1m19.577s
user 1m14.167s
sys 0m5.287s
# prof:
real 1m24.323s
real 1m24.277s
real 1m24.289s
real 1m24.274s
user 1m18.195s
sys 0m5.955s

Also 5% faster with apache2.

# same with pcre on AMD A10-5800K
# noprof:
real 1m12,261s
real 1m12,644s
real 1m12,162s
real 1m12,280s
user 1m9,574s
sys 0m2,612s
# prof:
real 1m20,481s
real 1m20,475s
real 1m20,500s
real 1m20,191s
user 1m16,891s
sys 0m3,132s

Also 11% faster there.

Did I benchmark this wrong? Small things can make a difference, e.g. >/dev/null makes it 1% faster. So do you guys want to debug why profiledbootstrap makes things slower, or do we just disable it and everyone is happy?

One thing that confuses me is that the 'sys' value also goes up by some percent. Maybe the kernel cannot properly do the accounting there. Or is the profiled compiler really doing more/slower syscalls?

Also I re-tested profiling in sed and pcre and found that even with -fprofile-update=atomic from the rpm update, parallel builds are unreproducible, but non-parallel builds are reproducible across machines, so I guess it is caused by the non-commutative merging of .gcda files.
argh, found it. I had debuginfo disabled in the OBS meta only for the noprof side.

# pcre/Haswell noprof
real 0m59.597s
real 0m59.442s
real 0m59.597s
real 0m59.467s
user 0m57.562s
sys 0m1.894s

So profiling does make things 7-8% faster.
(In reply to Bernhard Wiedemann from comment #15)
> argh, found it. I had debuginfo disabled in the OBS meta only for the
> noprof side.
>
> So profiling does make things 7-8% faster.

That's more expected ;)

It's said that building with LTO yields another 2-3% (but only when combined with profile feedback). !SLES also has plugins enabled (exporting all dynamic symbols), which could limit the usefulness of LTO a bit (no dead-function removal).

I wonder if you can do timings (and verify whether the binaries are reproducible) when changing the build to use -fprofile-generate -fno-profile-values. It looks like you need to patch the sources to achieve that conveniently; the toplevel Makefile.in contains

STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate

where you'd need to add -fno-profile-values.
I built a gcc8 with -fprofile-generate -fno-profile-values and it generated slightly smaller .gcda files, but they still differed between builds (even in size). All 2136 of them.

I also built a whole gcc8 using make profiledbootstrap without -j to avoid parallelism, but that did not help either. So maybe we should start from the other end and get a reduced profiling run (e.g. compiling just one .c file) to deliver reproducible .gcda files, and grow it from there.

Looking at gcov-dump -l for libcpp/pch.gcda, I get a lengthy diff that starts with

data:magic `gcda':version `A81*'
-stamp 1012471907
- a3000000: 292:PROGRAM_SUMMARY checksum=0xdbd9b6da
- counts=863745, runs=1767, sum_all=46197241408, run_max=8930117, sum_max=642108707
+stamp 469824253
+ a3000000: 297:PROGRAM_SUMMARY checksum=0xb69e9fcb
+ counts=863745, runs=1767, sum_all=46197120574, run_max=8930117, sum_max=642108707
counter histogram:
- 0: num counts=748885, min counter=0, cum_counter=0
- 1: num counts=110, min counter=1, cum_counter=110
- 2: num counts=924, min counter=2, cum_counter=1848
- 3: num counts=120, min counter=3, cum_counter=360
- 4: num counts=1246, min counter=4, cum_counter=4984
- 5: num counts=7, min counter=5, cum_counter=35
- 6: num counts=403, min counter=6, cum_counter=2418
+ 0: num counts=748891, min counter=0, cum_counter=0
+ 1: num counts=105, min counter=1, cum_counter=105
+ 2: num counts=922, min counter=2, cum_counter=1844
+ 3: num counts=124, min counter=3, cum_counter=372
+ 4: num counts=1248, min counter=4, cum_counter=4992
+ 5: num counts=11, min counter=5, cum_counter=55
+ 6: num counts=392, min counter=6, cum_counter=2352
 7: num counts=977, min counter=7, cum_counter=6839
I was debugging this further in gcc8. I built it with a patch to use -fprofile-update=atomic -fprofile-generate -fno-profile-values. I found that xgcc would call cc1, introducing some unwanted parallelism, but even when calling cc1 -E directly, it still generated .gcda file variations 5-10% of the time.

The setup is thus:

osc build --debuginfo --noservice
osc chroot
cd /home/abuild/rpmbuild/BUILD/gcc-8.1.1+r261583
PATH=/home/abuild/rpmbuild/BUILD/gcc-8.1.1+r261583/obj-x86_64-suse-linux/stageprofile-gcc/:$PATH
echo "int main() {return 0;}" > test.c
find -name \*.gcda|xargs rm ; ltrace -s 4100 -o /tmp/l2 cc1 -E -quiet -I ./obj-x86_64-suse-linux/stageprofile-gcc/include/ -iprefix /home/abuild/rpmbuild/BUILD/gcc-8.1.1+r261583/obj-x86_64-suse-linux/stageprofile-gcc/../lib64/gcc/x86_64-suse-linux/8/ test.c -mtune=generic -march=x86-64 -O0 ; find -name \*.gcda|sort|xargs cat|md5sum

Diffing the ltrace outputs, I found that the .gcda variations correlated with random extra calloc calls in at least 2 places:

@@ -15334,6 +15334,7 @@
 realloc() = <void>
 free() = <void>
 calloc(1, 120) = 0x4xxxxxx
+calloc(4096, 8) = 0x4xxxxxx
 realloc() = <void>
 free() = <void>
 calloc(1, 56) = xxxxy
 memset() = <void>
 bindtextdomain("cpplib-8", "/usr/share/locale") = "/usr/share/locale"
@@ -20008,7 +20009,6 @@
 memcpy() = <void>
 free() = <void>
 calloc(1, 72) = xxxxy
-calloc(4096, 8) = xxxxy
 memcpy() = <void>
 memcpy() = <void>
 free() = <void>

The 'bindtextdomain' call makes it possible to locate this in, or shortly before, init_library in libcpp/init.c.

Disabling ASLR by prefixing the cc1 call with

setarch x86_64 -R

seems to avoid this non-determinism. It even makes an 'xgcc -c hello.c' create reproducible .gcda files, so I'm trying that combination now in home:bmwiedemann:reproducible:test:gccnoprofilevalues/gcc8
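[The double-run check used here can be condensed into a small script; this is a sketch assuming an instrumented xgcc in $PATH, with illustrative file names:]

# Run the instrumented compiler twice with ASLR disabled and compare
# the resulting profile data; matching checksums mean the profiling
# run is deterministic.
echo 'int main(void) { return 0; }' > hello.c
for run in 1 2; do
    find . -name '*.gcda' -delete
    setarch "$(arch)" -R xgcc -c hello.c
    find . -name '*.gcda' | sort | xargs cat | md5sum > /tmp/sum$run
done
diff /tmp/sum1 /tmp/sum2 && echo reproducible || echo "still varies"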
Created attachment 775386 [details] 3 ltraces from runs with different .gcda results
(In reply to Bernhard Wiedemann from comment #18)
> Disabling ASLR by prefixing the cc1 call with
> setarch x86_64 -R
> seems to avoid this non-determinism.
>
> It even makes an 'xgcc -c hello.c'
> create reproducible .gcda files

Interesting, so we have some code in the profiling area that depends on addresses. We should change this at some point.

> so I'm trying that combination now in
> home:bmwiedemann:reproducible:test:gccnoprofilevalues/gcc8
(In reply to Michael Matz from comment #20)
> Interesting, so we have some code in the profiling area that depends on
> addresses. We should change this at some point.

Actually, not necessarily in the profiling area; it can be anywhere in cc1. Maybe some hash table uses addresses as keys, or a memmove behaves differently depending on which pointer is larger, and if any of that results in a different if/else branch being taken, the .gcda files might differ.

https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/aslr has some examples of packages that were unreproducible with ASLR, and their patches.

The good news is that disabling ASLR during profiledbootstrap gave me a reproducible gcc8 build. Running the quick benchmark again showed it to be as fast as the original gcc8 (within +-0.1%).
I just noticed that testing with pcre is a lot faster. There I found that -fno-profile-values and disabling ASLR together allow for a parallelized profiling run:

-make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate}" V=1
-make CFLAGS="%{optflags} %{cflags_profile_generate}" check
+make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate} -fno-profile-values" V=1
+setarch `arch` -R make %{?_smp_mflags} ... check

This still leaves the pcre build reproducible. It doesn't make a speed difference there, though, since the make check only takes a few seconds.
The parallelized gcc profiledbootstrap still varies. build-compare assembler diffs show optimization variations in

_Z13c_finish_loopjP9tree_nodeS0_S0_S0_S0_b.cold.141
_Z22build_conditional_exprjP9tree_nodebS0_S0_jS0_S0_j.cold.158
_Z15build_binary_opj9tree_codeP9tree_nodeS1_b.cold.162
_ZL13string_lengthPKvjj.cold.107
_ZL13store_one_argP8arg_dataP7rtx_defiii.cold.146
_ZL22expand_one_stack_var_1P9tree_node
_Z20duplicate_insn_chainP8rtx_insnS0_.cold.130
_Z19cfg_layout_finalizev.cold.133
_ZN11symtab_node10unregisterEv.cold.121
_ZL18handle_alias_pairsv.cold.92
_ZL24maybe_record_trace_startP8rtx_insnS0_.cold.130
_Z29dwarf2out_switch_text_sectionv.cold.584
_ZL11compare_icsP10conversionS0_.cold.83

and that is just from 5% of the full cc1 asm diff, but maybe it gives a hint where the variations come from.
Thanks, Bernhard, for working on that. Note that GCC uses a garbage collector, so I guess edge counts depend on memory layout (heap). For instance, the GGC page-release code depends on which pages are free. Can you please paste a diff of the gcda files of a parallel GCC build?
Uploaded all 2136 .gcda files from 2 differing runs to /suse/bwiedemann/Export/temp/no-aslr-fno-profile-values.tar.xz

e.g.

filterdiff gcovdumpfilter ?/gcc-8.1.1+r261583/*/libcpp/pch.gcda

starts with

data:magic `gcda':version `A81*'
-stamp 1118532836
- a3000000: 282:PROGRAM_SUMMARY checksum=0x7c4f9ce1
+stamp 575964339
+ a3000000: 287:PROGRAM_SUMMARY checksum=0xda08eb83
 counts=863745, runs=1767, sum_all=46197068155, run_max=8930117, sum_max=642108707

using
https://github.com/bmwiedemann/reproducibleopensuse/blob/master/filterdiff
https://github.com/bmwiedemann/reproducibleopensuse/blob/master/gcovdumpfilter
(In reply to Bernhard Wiedemann from comment #25)
> Uploaded all 2136 .gcda files from 2 differing runs to
> /suse/bwiedemann/Export/temp/no-aslr-fno-profile-values.tar.xz
> [...]
> - a3000000: 282:PROGRAM_SUMMARY checksum=0x7c4f9ce1
> + a3000000: 287:PROGRAM_SUMMARY checksum=0xda08eb83

Thanks for it! The checksum change is strange; let me investigate that.
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/648389 Factory / grep
We had a team meeting and discussed the topic quite thoroughly. Here is an analysis of why the mentioned packages have non-reproducible builds:

1) GCC: it's using garbage-collected memory and similar mechanisms that depend on memory layout (ASLR, heap allocation). That makes it complicated to achieve reproducible builds.
2) Python: I haven't analyzed the package, but I would expect it's affected by a similar problem.
3) gzip: looks stable to me right now
4) xz: likewise
5) sed: analyzed here: https://bugzilla.suse.com/show_bug.cgi?id=1040589#c9
6) hello: the package looks stable
7) bash: I did a concat of gcov-dump output of all gcda and gcno files (by adding --coverage):

$ find /var/tmp/ -name '*.gcda' 2>/dev/null | sort | xargs -L1 gcov-dump -l > ..

and used a simple script: https://github.com/marxin/script-misc/blob/master/gcov-diff.py

$ ./gcov-diff.py /tmp/gcda1 /tmp/gcda2 /tmp/gcno1 | colordiff

Difference for fn 1039863413 (fmtumax fmtulong.c:84)
---
+++
@@ -1,6 +1,6 @@
 01a10000: 60:COUNTERS arcs 30 counts
 0: 0 0 275415 91 0 0 0 0
- 8: 198528 76887 12 184735 0 0 0 0
+ 8: 198528 76887 12 184812 0 0 0 0
 16: 0 0 0 0 0 0 0 0
 24: 0 0 0 0 0 91
 01a30000: 8:COUNTERS interval 4 counts

Difference for fn 1133735831 (mbschr mbschr.c:43)
---
+++
@@ -1,5 +1,5 @@
 01a10000: 24:COUNTERS arcs 12 counts
- 0: 1720761 1105721 1105721 669139 1665477 667538 667538 667533
- 8: 1035 6682747 668104 1987469
+ 0: 1720761 1105721 1105721 669139 1665499 667538 667538 667533
+ 8: 1035 6682747 668104 1987491
 01af0000: 2:COUNTERS time_profiler 1 counts
 0: 38

Difference for fn 1191969440 (timeval_to_cpu timeval.c:73)
---
+++
@@ -1,6 +1,6 @@
 01a10000: 16:COUNTERS arcs 8 counts
 0: 6 0 0 0 36 24 0 6
 01a70000: 6:COUNTERS single 3 counts
- 0: 9 4 6
+ 0: 11 2 6
 01af0000: 2:COUNTERS time_profiler 1 counts
 0: 265

Difference for fn 58810953 (make_word_flags make_cmd.c:102)
---
+++
@@ -1,5 +1,5 @@
 01a10000: 22:COUNTERS arcs 11 counts
- %2
[the post was somehow cut]

Difference for fn 58810953 (make_word_flags make_cmd.c:102)
---
+++
@@ -1,5 +1,5 @@
 01a10000: 22:COUNTERS arcs 11 counts
- 0: 36690 178 651 488230 0 10634 112 0
+ 0: 36690 178 651 488252 0 10634 112 0
 8: 0 32671 36690
 01a70000: 0:COUNTERS single 0 counts
 01ab0000: 0:COUNTERS average 0 counts

Difference for fn 2120680264 (unquoted_glob_pattern_p pathexp.c:63)
---
+++
@@ -1,6 +1,6 @@
 01a10000: 36:COUNTERS arcs 18 counts
- 0: 591965 5839 106 6638 47 6627 6504 39
- 8: 0 1067486 7 7694 6173 6180 814 700698
+ 0: 591966 5839 106 6638 47 6627 6504 39
+ 8: 0 1067509 7 7694 6173 6180 814 700698
 16: 700698 251221
 01a90000: 0:COUNTERS indirect_call 0 counts
 01af0000: 2:COUNTERS time_profiler 1 counts

Difference for fn 95503897 (dequote_string subst.c:4233)
---
+++
@@ -1,6 +1,6 @@
 01a10000: 44:COUNTERS arcs 22 counts
 0: 470437 0 468193 388300 347974 4556 4556 178067
- 8: 0 287077 178804 21545467 0 21120019 16 8842
+ 8: 0 287077 178804 21545468 0 21120020 16 8842
 16: 19233 2534 659422 0 680265 178804
 01a70000: 0:COUNTERS single 0 counts
 01a90000: 0:COUNTERS indirect_call 0 counts

Difference for fn 48042273 (quote_string subst.c:4199)
---
+++
@@ -1,5 +1,5 @@
 01a10000: 26:COUNTERS arcs 13 counts
- 0: 383 217052 383 217052 20812837 16 8836 19230
+ 0: 383 217052 383 217052 20812838 16 8836 19230
 8: 2530 625854 0 646694 217052
 01a70000: 0:COUNTERS single 0 counts
 01a90000: 0:COUNTERS indirect_call 0 counts

Difference for fn 151936190 (quote_escapes_internal subst.c:4037)
---
+++
@@ -1,7 +1,7 @@
 01a10000: 50:COUNTERS arcs 25 counts
 0: 102577 0 102206 138372 36 165889 138372 7
- 8: 102577 102577 757685 473 757360 155 1251 239
- 16: 701084 3 0 142 6 56779 0 56960
+
Function timeval_to_cpu should clearly be marked with the no_profile_instrument_function attribute, as its number of loop iterations depends on time. The other differences are in string functions; it's very hard to tell what the difference is. But as we know, any of these can potentially cause a code-generation difference.

The last topic we discussed are the -fprofile-values counters. These can really diverge when running a parallel test suite. It would be possible to resolve that in GCC 10.
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/683607 Factory / pcp
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/683832 Factory / pcp
Just for the record: the gcc9 package has been updated to provide more stable builds; the previous instability was caused by value-profiling differences: https://build.opensuse.org/request/show/709739
I tested devel:gcc/gcc9 and even with minimized non-determinism, it produced variations:

/usr/lib64/gcc/x86_64-suse-linux/9/gnat1 differs in assembler output
/usr/lib64/gcc/x86_64-suse-linux/9/cc1 differs in assembler output

e.g. in _ZN7ipa_icf18sem_item_optimizer28do_congruence_step_for_indexEPNS_16congruence_classEj.cold

A lot of register assignments changed, but also some code moved.
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/735118 Factory / MozillaFirefox
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/766786 15.2 / python-rjsmin
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/818103 Factory / grep
bison-3.7.3 started to vary because of PGO. For profiling it uses a pretty large 'make check' run with 650 substeps.
(In reply to Bernhard Wiedemann from comment #43)
> bison-3.7.3 started to vary because of PGO.

Can you help me understand whether this is a (new) problem in 3.7.3, and give the steps to compare the binaries? The last thing I found was: http://rb.zq1.de/compare.factory-20200430/bison-compare.out
I found my debugging note from/before 2020-06-29:

> PGO again via ASLR, filesys, date

https://build.opensuse.org/request/show/676711 made it reproducible back then, and looking closer at old results, 3.5.2 was the first version marked as differing: https://rb.zq1.de/compare.factory-20200303/bison-compare.out

Here is the general description of the problem of PGO with reproducibility: https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/pgo

Would disabling PGO for bison be an option? (See the double-build sketch below for the comparison steps.)
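[For the steps to compare binaries, the check boils down to a double build plus build-compare. A sketch under these assumptions: the default osc build root under /var/tmp/build-root, and pkg-diff.sh from the build-compare package being in $PATH:]

# Build twice in a clean chroot, keep the RPMs of each run,
# then let build-compare's pkg-diff.sh explain any difference.
osc co openSUSE:Factory/bison && cd openSUSE:Factory/bison
osc build --clean standard x86_64
cp -a /var/tmp/build-root/standard-x86_64/home/abuild/rpmbuild/RPMS run1
osc build --clean standard x86_64
cp -a /var/tmp/build-root/standard-x86_64/home/abuild/rpmbuild/RPMS run2
pkg-diff.sh run1/x86_64/bison-[0-9]*.rpm run2/x86_64/bison-[0-9]*.rpm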
(In reply to Bernhard Wiedemann from comment #45)
> Would disabling PGO for bison be an option?

No! I do not wish to damage the speed and/or optimized size of the binaries because of this.
@Bernhard Wiedemann: Can we close this issue, please?
I don't think this has been fully solved. gcc, bison and python all still suffer from PGO. I suspect bash-4.4 in 15.3 might also be affected. Maybe also MozillaFirefox.

One intermediate solution would be to use in these .spec files (see the sketch below):

%if 0%{?do_profiling} && !0%{?only_reproducible_profiling}
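[A sketch of how such a guard might look in a spec file. Note that only_reproducible_profiling is only the macro proposed above, not an existing one, and cflags_profile_feedback is assumed to be the counterpart of the cflags_profile_generate macro seen earlier in this bug:]

# Hypothetical: do the two-stage PGO build only when profiling is
# enabled project-wide and nobody asked for reproducible-only builds;
# otherwise fall back to a plain optimized build.
%if 0%{?do_profiling} && !0%{?only_reproducible_profiling}
make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate}"
make check
make clean
make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_feedback}"
%else
make %{?_smp_mflags} CFLAGS="%{optflags}"
%endif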
SUSE-RU-2021:2173-1: An update that has 5 recommended fixes and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589, 1047218, 1182604, 1185540, 1186049
CVE References:
JIRA References: SLE-17848

Sources used:
SUSE MicroOS 5.0 (src): pcre-8.41-6.4.2
SUSE Linux Enterprise Module for Development Tools 15-SP3 (src): brp-check-suse-84.87+git20181106.224b37d-3.18.2
SUSE Linux Enterprise Module for Development Tools 15-SP2 (src): brp-check-suse-84.87+git20181106.224b37d-3.18.2
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src): automake-1.15.1-4.10.2, pcre-8.41-6.4.2
SUSE Linux Enterprise Module for Basesystem 15-SP2 (src): automake-1.15.1-4.10.2, pcre-8.41-6.4.2

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
openSUSE-RU-2021:0963-1: An update that has 5 recommended fixes and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589, 1047218, 1182604, 1185540, 1186049
CVE References:
JIRA References: SLE-17848

Sources used:
openSUSE Leap 15.2 (src): automake-1.15.1-lp152.5.3.1, automake-testsuite-1.15.1-lp152.5.3.1, brp-check-suse-84.87+git20181106.224b37d-lp152.2.12.1, pcre-8.41-lp152.7.3.1
openSUSE-RU-2021:2173-1: An update that has 5 recommended fixes and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589, 1047218, 1182604, 1185540, 1186049
CVE References:
JIRA References: SLE-17848

Sources used:
openSUSE Leap 15.3 (src): automake-1.15.1-4.10.2, brp-check-suse-84.87+git20181106.224b37d-3.18.2, pcre-8.41-6.4.2
This is an autogenerated message for OBS integration: This bug (1040589) was mentioned in https://build.opensuse.org/request/show/962180 Factory / grep
SUSE-RU-2022:1887-1: An update that has one recommended fix and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589
CVE References:
JIRA References: SLE-24115

Sources used:
openSUSE Leap 15.4 (src): grep-3.1-150000.4.6.1
openSUSE Leap 15.3 (src): grep-3.1-150000.4.6.1
SUSE Linux Enterprise Realtime Extension 15-SP2 (src): grep-3.1-150000.4.6.1
SUSE Linux Enterprise Module for Basesystem 15-SP4 (src): grep-3.1-150000.4.6.1
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src): grep-3.1-150000.4.6.1
SUSE Linux Enterprise Micro 5.2 (src): grep-3.1-150000.4.6.1
SUSE Linux Enterprise Micro 5.1 (src): grep-3.1-150000.4.6.1
SUSE Linux Enterprise Micro 5.0 (src): grep-3.1-150000.4.6.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.