Bug 1040589 - bash/gcc/gzip/python differ between builds because of profiling
Status: IN_PROGRESS
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: Other openSUSE 13.2
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Martin Liška
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on: 1197575
Blocks: 1081754
Reported: 2017-05-24 12:27 UTC by Bernhard Wiedemann
Modified: 2023-04-14 12:11 UTC

See Also:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
3 ltraces from runs with different .gcda results (25.89 KB, application/x-xz)
2018-06-27 09:21 UTC, Bernhard Wiedemann

Description Bernhard Wiedemann 2017-05-24 12:27:48 UTC
In https://build.opensuse.org/project/prjconf/openSUSE:Factory
we have
%do_profiling 1

and because of that
in our bash.spec we enable gcc's
'profile feedback directed optimizations'

but that causes the jobs.o and resulting bash binary
to differ between builds, even when running on the same build host.

And because of that, build-compare always thinks there is a change
and triggers a re-publish
and rebuild of depending packages

We also have such binary diffs in gcc6
and gcc6.spec calls a
make profiledbootstrap


The do_profiling macro is used in
bash
gzip
hello
python3-base
python-base
sed
xz

and in
http://rb.zq1.de/compare.factory-20170523/bash-compare.out
http://rb.zq1.de/compare.factory-20170523/gcc6-compare.out
http://rb.zq1.de/compare.factory-20170523/gzip-compare.out
http://rb.zq1.de/compare.factory-20170523/python-base-compare.out
http://rb.zq1.de/compare.factory-20170523/python3-base-compare.out

we have strange diffs in assembler that I could not trace down
to other sources of non-determinism until today.
Diffs did go away when building without profiling
(it was harder to disable for gcc6 and bash though)


Do the profiles just count invocations of functions
or do they depend on the type and speed of the system?

In the first case, it should be possible to fix the profiling runs
to be deterministic, but for that it would be useful
to be able to see the differences between runs.
How could I diff gcc's .gcda files?
Comment 1 Dr. Werner Fink 2017-05-24 13:04:30 UTC
(In reply to Bernhard Wiedemann from comment #0)
> In https://build.opensuse.org/project/prjconf/openSUSE:Factory
> we have
> %do_profiling 1
> 
> and because of that
> in our bash.spec we enable gcc's
> 'profile feedback directed optimizations'
> 
> but that causes the jobs.o and resulting bash binary
> to differ between builds, even when running on the same build host.
> 
> And because of that, build-compare always thinks there is a change
> and triggers a re-publish
> and rebuild of depending packages
> 
> We also have such binary diffs in gcc6
> and gcc6.spec calls a
> make profiledbootstrap
> 
> 
> The do_profiling macro is used in
> bash
> gzip
> hello
> python3-base
> python-base
> sed
> xz
> 
> and in
> http://rb.zq1.de/compare.factory-20170523/bash-compare.out
> http://rb.zq1.de/compare.factory-20170523/gcc6-compare.out
> http://rb.zq1.de/compare.factory-20170523/gzip-compare.out
> http://rb.zq1.de/compare.factory-20170523/python-base-compare.out
> http://rb.zq1.de/compare.factory-20170523/python3-base-compare.out
> 
> we have strange diffs in assembler that I could not trace down
> to other sources of non-determinism until today.
> Diffs did go away when building without profiling
> (it was harder to disable for gcc6 and bash though)
> 
> 
> Do the profiles just count invocations of functions
> or do they depend on the type and speed of the system?
> 
> In the first case, it should be possible to fix the profiling runs
> to be deterministic, but for that it would be useful
> to be able to see the differences between runs.
> How could I diff gcc's .gcda files?

That is your problem, not mine ... do not touch the bash test suite!
Comment 2 Richard Biener 2017-05-24 13:29:40 UTC
(In reply to Bernhard Wiedemann from comment #0)
> In https://build.opensuse.org/project/prjconf/openSUSE:Factory
> we have
> %do_profiling 1
> 
> and because of that
> in our bash.spec we enable gcc's
> 'profile feedback directed optimizations'
> 
> but that causes the jobs.o and resulting bash binary
> to differ between builds, even when running on the same build host.
> 
> And because of that, build-compare always thinks there is a change
> and triggers a re-publish
> and rebuild of depending packages
> 
> We also have such binary diffs in gcc6
> and gcc6.spec calls a
> make profiledbootstrap
> 
> 
> The do_profiling macro is used in
> bash
> gzip
> hello
> python3-base
> python-base
> sed
> xz
> 
> and in
> http://rb.zq1.de/compare.factory-20170523/bash-compare.out
> http://rb.zq1.de/compare.factory-20170523/gcc6-compare.out
> http://rb.zq1.de/compare.factory-20170523/gzip-compare.out
> http://rb.zq1.de/compare.factory-20170523/python-base-compare.out
> http://rb.zq1.de/compare.factory-20170523/python3-base-compare.out
> 
> we have strange diffs in assembler that I could not trace down
> to other sources of non-determinism until today.

Is the buildroot 1:1 the same and then you get different code?

> Diffs did go away when building without profiling
> (it was harder to disable for gcc6 and bash though)
> 
> 
> Do the profiles just count invocations of functions
> or do they depend on the type and speed of the system?

Just counting.  Inconsistencies can pop up when you run the instrumented
binaries in parallel (gcov uses file locking but still we see that happen).

> In the first case, it should be possible to fix the profiling runs
> to be deterministic, but for that it would be useful
> to be able to see the differences between runs.
> How could I diff gcc's .gcda files?

There is gcov-dump (might need to pick it up from devel:gcc gcc packages).
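
A minimal sketch of how two .gcda files could be compared along these lines, in Python. It assumes gcov-dump is on PATH and comes from the same gcc that wrote the files; the helper names dump_gcda and diff_dumps are made up for illustration and are not part of any gcc tooling.

```python
import difflib
import subprocess


def dump_gcda(path):
    """Return the text that `gcov-dump -l` prints for one .gcda file."""
    # Assumption: gcov-dump is installed and matches the gcc that wrote the file.
    result = subprocess.run(
        ["gcov-dump", "-l", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def diff_dumps(dump_a, dump_b, name_a="run1", name_b="run2"):
    """Return unified-diff lines between two gcov-dump outputs."""
    return list(
        difflib.unified_diff(
            dump_a.splitlines(),
            dump_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

Calling diff_dumps(dump_gcda(a), dump_gcda(b)) on the same .gcda path from two build roots would then list only the counters that changed between the runs.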
Comment 3 Bernhard Wiedemann 2017-05-24 14:26:49 UTC
Thanks for the hints Richard.

I experimented some more with gzip (because it is the fastest to build)
and found that it differed even when running 'osc build' twice
without cleanup on the same rootfs
and that I could make builds fully reproducible with
https://build.opensuse.org/request/show/497997

Before that, tar would include varying timestamps, file ordering and
pax headers (PID), and even variations in the filename of $tmpfile
(for gzip without -n) caused slight differences in the built binary.
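
To illustrate which inputs have to be pinned down, here is a small Python sketch that packs a directory into a .tar.gz whose bytes depend only on file names and contents. It is not what the submit request does; the function names are hypothetical, and it assumes short file names (USTAR limits) and regular files only.

```python
import gzip
import io
import os
import tarfile


def _normalize(info):
    """Strip metadata that varies between runs or hosts."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def deterministic_targz(src_dir):
    """Pack src_dir into .tar.gz bytes that depend only on names and contents."""
    tar_buf = io.BytesIO()
    # USTAR format writes no pax extended headers.
    with tarfile.open(fileobj=tar_buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        paths = []
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                paths.append(os.path.join(root, name))
        # Sort explicitly so that directory listing order does not matter.
        for path in sorted(paths):
            arcname = os.path.relpath(path, src_dir)
            tar.add(path, arcname=arcname, filter=_normalize)
    gz_buf = io.BytesIO()
    # mtime=0 and an empty filename play the role of `gzip -n`.
    with gzip.GzipFile(filename="", mode="wb", fileobj=gz_buf, mtime=0) as gz:
        gz.write(tar_buf.getvalue())
    return gz_buf.getvalue()
```

Feeding the profiling run an input built this way removes timestamps, ordering and the embedded file name as sources of variation; file modes are still taken from disk.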
Comment 4 Bernhard Wiedemann 2017-05-26 08:01:11 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/498391 Factory / bash
Comment 5 Bernhard Wiedemann 2017-05-31 06:00:35 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/499887 Factory / gzip
Comment 6 Bernhard Wiedemann 2018-04-15 23:09:28 UTC
Apparently, .gcda file creation in gcc7 is racy.
Otherwise, why would these SRs help make the builds reproducible?
https://build.opensuse.org/request/show/596431 sed
https://build.opensuse.org/request/show/596586 pcre

gcc-4.8 from 42.3 seems to behave the same, so not a regression.
(unfortunately no older gccs are available anymore for Factory)
Comment 7 Richard Biener 2018-04-17 06:52:59 UTC
Honza, are you aware of any issue regarding reproducibility of the final .gcda files when merging results of multiple invocations?  IIRC we use file-locking to serialize the update but of course different parallel program invocations might result in random order of merging.  How can that ever be an issue?  Maybe for counter overflows?  Or some ordering of items and then rippling-down effects when the profile is read by -fprofile-use?
Comment 8 Richard Biener 2018-04-18 07:46:10 UTC
(In reply to Richard Biener from comment #7)
> Honza, are you aware of any issue regarding reproducibility of the final
> .gcda files when merging results of multiple invocations?  IIRC we use
> file-locking to serialize the update but of course different parallel
> program invocations might result in random order of merging.  How can that
> ever be an issue?  Maybe for counter overflows?  or some ordering of items
> and then rippling down effects when the profile is read by -fprofile-use?

Looking at libgcov-merge.c there are several types of counters (mostly value-profiling ones) that have merging routines that are sensitive to the order
of merging.  Some even look wrong to me (__gcov_merge_single - though
maybe counters[1] is just too little specified).

So it's not really "races" but for example when merging single-value
counters in the written order { 1, 2, 10 }, { 2, 4, 10 }, { 1, 4, 10 }
then we get { 2, 4, 30 } and in any other order we'd get { 1, 6, 30 }
Ok, not really, somehow we massage counter[1] to avoid this?  Here
we'd go merging the 2nd into the first { 2, 2, 20 } and then merging in
the third { 1, 2, 30 }.  When starting with the 2nd we'd go
{ 2, 2, 20 }, { 1, 2, 30 } so here it seems to "work" but I'd have a
hard time believing the simple "trick" makes the merging independent of
order.

Using _add for GCOV_COUNTER_AVERAGE also looks odd.  Combining
200 1 and 20000 1 yields 20200 2 ...
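
A small Python model of the single-value merge, written to match the trace above ({ value, counter, all } triples). The merge rule is an assumption reconstructed from that trace, not code copied from libgcov-merge.c, and the function names are made up. It reproduces the { 1, 2, 30 } result for both orders tried above and also shows an input where the order does change the outcome.

```python
def merge_single(acc, new):
    """Merge one { value, counter, all } triple into the accumulated one."""
    value, count, total = acc
    n_value, n_count, n_total = new
    if value == n_value:
        # Same candidate value: the counters simply add up.
        count += n_count
    elif n_count > count:
        # The incoming value wins and keeps only its surplus.
        value, count = n_value, n_count - count
    else:
        # The stored value survives but loses votes.
        count -= n_count
    return (value, count, total + n_total)


def merge_all(records):
    """Merge a list of triples in the given order."""
    acc = records[0]
    for rec in records[1:]:
        acc = merge_single(acc, rec)
    return acc


# The example from this comment gives the same result in both orders tried:
print(merge_all([(1, 2, 10), (2, 4, 10), (1, 4, 10)]))
print(merge_all([(2, 4, 10), (1, 2, 10), (1, 4, 10)]))
# A different input where swapping the last two records changes the result:
print(merge_all([(1, 3, 10), (2, 3, 10), (3, 1, 10)]))
print(merge_all([(1, 3, 10), (3, 1, 10), (2, 3, 10)]))
```

In the last pair the stored value ends up as 3 in one order and 2 in the other, so under this model the "trick" does not make the merge independent of order.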
Comment 9 Martin Liška 2018-04-18 10:34:31 UTC
So I analyzed 2 packages before the changes in Factory happened:

1) gzip - as mentioned, a temp file was used and zipped. The temp file name probably causes the divergence as it's always different.

2) sed - it uses pthreads (but not passed as CFLAGS), thus I added -fprofile-update=atomic to cflags. See documentation here:

https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html

and I got it; the diff between 2 runs:

./gnulib-tests/nanosleep.gcda: a3000000:  37:PROGRAM_SUMMARY checksum=0x99d0fa93
[snip]
./gnulib-tests/nanosleep.gcda:  01a10000:  14:COUNTERS arcs 7 counts
./gnulib-tests/nanosleep.gcda:		0 2 1 1 0 1 29 30

./gnulib-tests/nanosleep.gcda: a3000000:  37:PROGRAM_SUMMARY checksum=0xc3046d04
[snip]
./gnulib-tests/nanosleep.gcda:		0 2 1 1 0 1 30 31

./lib/stat-time.h:

stat_time_normalize (int result, struct stat *st _GL_UNUSED)
{
#if defined __sun && defined STAT_TIMESPEC
  if (result == 0)
    {
      long int timespec_resolution = 1000000000;
      short int const ts_off[] = { offsetof (struct stat, st_atim),
                                   offsetof (struct stat, st_mtim),
                                   offsetof (struct stat, st_ctim) };
      int i;
      for (i = 0; i < sizeof ts_off / sizeof *ts_off; i++)
        {
          struct timespec *ts = (struct timespec *) ((char *) st + ts_off[i]);
          long int q = ts->tv_nsec / timespec_resolution;
          long int r = ts->tv_nsec % timespec_resolution;
          if (r < 0)
            {
              r += timespec_resolution;
              q--;
            }
          ts->tv_nsec = r;
          /* Overflow is possible, as Solaris 11 stat can yield
             tv_sec == TYPE_MINIMUM (time_t) && tv_nsec == -1000000000.
             INT_ADD_WRAPV is OK, since time_t is signed on Solaris.  */
          if (INT_ADD_WRAPV (q, ts->tv_sec, &ts->tv_sec))
            {
              errno = EOVERFLOW;
              return -1;
            }
        }
    }
#endif
  return result;
}

It loops based on how fast time flies. This can't be stable.
Comment 10 Martin Liška 2018-04-18 11:17:02 UTC
Can't reproduce a binary diff in 10 runs of the 'xz' package.
But I guess it's multi-threaded, thus -fprofile-update=atomic should be used.
Comment 11 Martin Liška 2018-04-18 11:53:02 UTC
I've done https://build.opensuse.org/request/show/597754
Then we can check the packages and eventually revert the changes that were done as a workaround.
Comment 12 Swamp Workflow Management 2018-04-18 14:10:07 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/597793 Factory / rpm
Comment 14 Bernhard Wiedemann 2018-06-26 04:20:59 UTC
Now I built a reproducible gcc8 in
https://build.opensuse.org/project/show/home:bmwiedemann:reproducible:test:gccnoprof

and ran some benchmarks on a 6-core Intel Haswell system
to see how large the difference in performance is

# prep:
osc co openSUSE:Factory/pcre && cd $_
mnt=/mnt/3/
mkdir -p $mnt
mount -t tmpfs tmpfs $mnt
OSC_BUILD_ROOT=$mnt/build-prof/ osc build --alternative-project=devel:gcc
OSC_BUILD_ROOT=$mnt/build-noprof/ osc build \
  --alternative-project=home:bmwiedemann:reproducible:test:gccnoprof

# bench:
for i in $(seq 4) ; do time taskset 1 chroot $mnt/build-noprof bash -c "cd /home/abuild/rpmbuild/BUILD/*/ ; make clean all >/dev/null 2>&1" ; done 
real    0m48.686s
real    0m48.720s
real    0m48.778s
real    0m48.715s
real    0m48.758s
user    0m47.079s
sys     0m1.667s

for i in $(seq 4) ; do time taskset 1 chroot $mnt/build-prof bash -c "cd /home/abuild/rpmbuild/BUILD/*/ ; make clean all >/dev/null 2>&1" ; done
real    0m55.178s
real    0m55.138s
real    0m55.127s
real    0m55.164s
real    0m55.207s
user    0m53.376s
sys     0m1.819s


so my gcc8 version was faster by 12% - that was a big surprise.
Variance between measurements is low, 
because no HDD, no network and no parallelism is involved

# same prep but now with apache2
# noprof:
real    1m19.673s
real    1m19.539s
real    1m19.673s
real    1m19.577s
user    1m14.167s
sys     0m5.287s

# prof:
real    1m24.323s
real    1m24.277s
real    1m24.289s
real    1m24.274s
user    1m18.195s
sys     0m5.955s


also 5% faster with apache2

# same with pcre on AMD A10-5800K
# noprof:
real    1m12,261s
real    1m12,644s
real    1m12,162s
real    1m12,280s
user    1m9,574s
sys     0m2,612s

# prof:
real    1m20,481s
real    1m20,475s
real    1m20,500s
real    1m20,191s
user    1m16,891s
sys     0m3,132s

also 11% faster

Did I benchmark this wrong?
Small things can make a difference, e.g. >/dev/null makes it 1% faster


so do you guys want to debug why profiledbootstrap makes things slower
or do we just disable it and everyone is happy?

One thing that confuses me is that the 'sys' value also goes up by some %
Maybe the kernel cannot properly do accounting there.
Or is the profiled compiler really doing more/slower syscalls?


Also I re-tested profiling in sed and pcre and found that even with
-fprofile-update=atomic from the rpm update, parallel builds are unreproducible
but non-parallel builds are reproducible across machines
so I guess it is caused by the non-commutative merging of .gcda files
Comment 15 Bernhard Wiedemann 2018-06-26 05:14:16 UTC
argh. found it.
I had debuginfo disabled in OBS meta only for the noprof side
# pcre/Haswell noprof
real    0m59.597s
real    0m59.442s
real    0m59.597s
real    0m59.467s 
user    0m57.562s 
sys     0m1.894s

so profiling does make things 7-8% faster.
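
For the record, the arithmetic behind that figure can be checked with a short Python sketch. The timings are the pcre/Haswell "real" values from this comment (noprof) and comment 14 (prof); the helper names are made up for illustration.

```python
import re


def seconds(stamp):
    """Convert a `time` stamp such as '0m59.597s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:[.,]\d+)?)s", stamp)
    if m is None:
        raise ValueError("unexpected time format: " + stamp)
    # Accept both '.' and ',' as the decimal separator.
    return int(m.group(1)) * 60 + float(m.group(2).replace(",", "."))


def mean(stamps):
    """Average a list of `time` stamps, in seconds."""
    values = [seconds(s) for s in stamps]
    return sum(values) / len(values)


def time_saved(baseline, candidate):
    """Fraction of wall time the candidate saves relative to the baseline."""
    return 1.0 - mean(candidate) / mean(baseline)


NOPROF = ["0m59.597s", "0m59.442s", "0m59.597s", "0m59.467s"]
PROF = ["0m55.178s", "0m55.138s", "0m55.127s", "0m55.164s", "0m55.207s"]

print(f"profiled gcc saves {time_saved(NOPROF, PROF):.1%} wall time")
```

With these numbers the profiled compiler needs about 7.3% less wall time than the non-profiled one.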
Comment 16 Richard Biener 2018-06-26 07:11:05 UTC
(In reply to Bernhard Wiedemann from comment #15)
> argh. found it.
> I had debuginfo disabled in OBS meta only for the noprof side
> # pcre/Haswell noprof
> real    0m59.597s
> real    0m59.442s
> real    0m59.597s
> real    0m59.467s 
> user    0m57.562s 
> sys     0m1.894s
> 
> so profiling does make things 7-8% faster.

That's more expected ;)  It's said that building with LTO yields another 2-3%
(but only when combined with profile feedback).  !SLES also has plugins
enabled (exporting all dynamic symbols) that could limit the usefulness
of LTO a bit (no dead function removal).

I wonder if you can do timings (and verify whether binaries are reproducible)
when changing the build to use -fprofile-generate -fno-profile-values.  It
looks like you need to patch sources to achieve that conveniently,
the toplevel Makefile.in contains

STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate

where you'd need to add -fno-profile-values
Comment 17 Bernhard Wiedemann 2018-06-26 16:43:24 UTC
I built a gcc8 with -fprofile-generate -fno-profile-values
and it generated slightly smaller .gcda files
but they still differed between builds (even in size). All 2136 of them.

I also built a whole gcc8 using make profiledbootstrap without -j to avoid parallelism
but that did not help either, so maybe we should start from the other end
and get a reduced profiling run (e.g. compiling just one .c file) to deliver reproducible .gcda files and grow it from there.


looking at gcov-dump -l for libcpp/pch.gcda
I get a lengthy diff that starts with

 data:magic `gcda':version `A81*'
-stamp 1012471907
-  a3000000: 292:PROGRAM_SUMMARY checksum=0xdbd9b6da 
-                counts=863745, runs=1767, sum_all=46197241408, run_max=8930117, sum_max=642108707
+stamp 469824253
+  a3000000: 297:PROGRAM_SUMMARY checksum=0xb69e9fcb 
+                counts=863745, runs=1767, sum_all=46197120574, run_max=8930117, sum_max=642108707
                 counter histogram:
-                 0: num counts=748885, min counter=0, cum_counter=0
-                 1: num counts=110, min counter=1, cum_counter=110
-                 2: num counts=924, min counter=2, cum_counter=1848
-                 3: num counts=120, min counter=3, cum_counter=360
-                 4: num counts=1246, min counter=4, cum_counter=4984
-                 5: num counts=7, min counter=5, cum_counter=35
-                 6: num counts=403, min counter=6, cum_counter=2418
+                 0: num counts=748891, min counter=0, cum_counter=0
+                 1: num counts=105, min counter=1, cum_counter=105
+                 2: num counts=922, min counter=2, cum_counter=1844
+                 3: num counts=124, min counter=3, cum_counter=372
+                 4: num counts=1248, min counter=4, cum_counter=4992
+                 5: num counts=11, min counter=5, cum_counter=55
+                 6: num counts=392, min counter=6, cum_counter=2352
                  7: num counts=977, min counter=7, cum_counter=6839
Comment 18 Bernhard Wiedemann 2018-06-27 09:19:01 UTC
was debugging this further in gcc8

I built it with a patch to use
-fprofile-update=atomic -fprofile-generate -fno-profile-values

I found xgcc would call cc1 introducing some unwanted parallelism
but even when calling cc1 -E directly, it still generated .gcda file variations 5-10% of the time

Setup is thusly:

osc build --debuginfo --noservice
osc chroot
cd /home/abuild/rpmbuild/BUILD/gcc-8.1.1+r261583
PATH=/home/abuild/rpmbuild/BUILD/gcc-8.1.1+r261583/obj-x86_64-suse-linux/stageprofile-gcc/:$PATH
echo "int main() {return 0;}" > test.c
find -name \*.gcda|xargs rm ; ltrace -s 4100 -o /tmp/l2 cc1 -E -quiet -I ./obj-x86_64-suse-linux/stageprofile-gcc/include/ -iprefix /home/abuild/rpmbuild/BUILD/gcc-8.1.1+r261583/obj-x86_64-suse-linux/stageprofile-gcc/../lib64/gcc/x86_64-suse-linux/8/ test.c -mtune=generic -march=x86-64 -O0 ; find -name \*.gcda|sort|xargs cat|md5sum


Diffing the ltrace outputs I found that the .gcda variations correlated with random extra calloc calls in at least 2 places.

@@ -15334,6 +15334,7 @@
 realloc()                                        = <void>
 free()                                           = <void>
 calloc(1, 120)                                   = 0x4xxxxxx
+calloc(4096, 8)                                  = 0x4xxxxxx
 realloc()                                        = <void>
 free()                                           = <void>
 calloc(1, 56)                                    = xxxxy
 memset()                                         = <void>
 bindtextdomain("cpplib-8", "/usr/share/locale")  = "/usr/share/locale"
@@ -20008,7 +20009,6 @@
 memcpy()                                         = <void>
 free()                                           = <void>
 calloc(1, 72)                                    = xxxxy
-calloc(4096, 8)                                  = xxxxy
 memcpy()                                         = <void>
 memcpy()                                         = <void>
 free()                                           = <void>


and the 'bindtextdomain' call shows that this happens in or shortly before
libcpp/init.c init_library

disabling ASLR by prefixing the cc1 call with
setarch x86_64 -R
seems to avoid this nondeterminism

it even helps to make an xgcc -c hello.c
create reproducible .gcda files
so I'm trying that combination now in
home:bmwiedemann:reproducible:test:gccnoprofilevalues/gcc8
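
A minimal Python sketch of wrapping a profiling command this way. It assumes a Linux host with util-linux setarch installed and that setarch accepts the name returned by platform.machine(); the function names are hypothetical.

```python
import platform
import subprocess


def without_aslr(cmd):
    """Prefix cmd so that it runs with address randomization disabled."""
    # `setarch <arch> -R` sets ADDR_NO_RANDOMIZE for the child (Linux only).
    return ["setarch", platform.machine(), "-R", *cmd]


def run_without_aslr(cmd):
    """Run cmd under setarch and return its stdout as text."""
    result = subprocess.run(
        without_aslr(cmd),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On x86_64, without_aslr(["cc1", "-E", "-quiet", "test.c"]) builds the same prefix as the manual setarch x86_64 -R call above.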
Comment 19 Bernhard Wiedemann 2018-06-27 09:21:00 UTC
Created attachment 775386
3 ltraces from runs with different .gcda results
Comment 20 Michael Matz 2018-06-27 13:15:50 UTC
(In reply to Bernhard Wiedemann from comment #18)
> 
> disabling ASLR by prefixing the cc1 call with
> setarch x86_64 -R
> seems to avoid this indeterminism
> 
> it even helps to make an xgcc -c hello.c
> create reproducible .gcda files

Interesting, so we have some code in the profiling area that depends on
addresses.  We should change this at some point.

> so I'm trying that combination now in
> home:bmwiedemann:reproducible:test:gccnoprofilevalues/gcc8
Comment 21 Bernhard Wiedemann 2018-06-27 14:17:43 UTC
(In reply to Michael Matz from comment #20)
> Interesting, so we have some code in the profiling area that depends on
> addresses.  We should change this at some point.

Actually, not in the profiling area, but it can be anywhere in cc1.
Maybe some hash table that uses addrs as keys
or a memmove that behaves differently depending on which pointer is larger

and if any of that results in a different if/else branch taken,
the .gcda files might differ.

https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/aslr
has some examples of packages that were unreproducible with ASLR
and their patches

The good news is that disabling ASLR during profiledbootstrap
gave me a reproducible gcc8 build.

Running the quick benchmark again showed it to be as fast as the original gcc8 (within +-0.1%)
Comment 22 Bernhard Wiedemann 2018-06-27 19:31:21 UTC
just noticed that testing with pcre is a lot faster.
There I found that -fno-profile-values and disabling ASLR together
allow for a parallelized profile run

-  make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate}" V=1
-  make CFLAGS="%{optflags} %{cflags_profile_generate}" check
+  make %{?_smp_mflags} CFLAGS="%{optflags} %{cflags_profile_generate} -fno-profile-values" V=1
+  setarch `arch` -R make %{?_smp_mflags} ... check

so this still leaves pcre build reproducible.
It doesn't make a difference there though,
since the make check just takes a few seconds
Comment 23 Bernhard Wiedemann 2018-06-28 04:39:28 UTC
parallelized gcc profiledbootstrap still varies.

build-compare assembler diffs show optimization variations in

_Z13c_finish_loopjP9tree_nodeS0_S0_S0_S0_b.cold.141
_Z22build_conditional_exprjP9tree_nodebS0_S0_jS0_S0_j.cold.158
_Z15build_binary_opj9tree_codeP9tree_nodeS1_b.cold.162
_ZL13string_lengthPKvjj.cold.107
_ZL13store_one_argP8arg_dataP7rtx_defiii.cold.146
_ZL22expand_one_stack_var_1P9tree_node
_Z20duplicate_insn_chainP8rtx_insnS0_.cold.130
_Z19cfg_layout_finalizev.cold.133
_ZN11symtab_node10unregisterEv.cold.121
_ZL18handle_alias_pairsv.cold.92
_ZL24maybe_record_trace_startP8rtx_insnS0_.cold.130
_Z29dwarf2out_switch_text_sectionv.cold.584
_ZL11compare_icsP10conversionS0_.cold.83

and that is just from 5% of the full cc1 asm diff, but maybe it gives a hint where variations come from.
Comment 24 Martin Liška 2018-06-28 06:17:37 UTC
Thanks Bernhard for working on that. Note that GCC uses a garbage collector, thus I guess edge counts depend on memory layout (heap). For instance, the GGC page release code depends on which pages are free. Can you please paste a diff of the gcda files of a parallel GCC build?
Comment 25 Bernhard Wiedemann 2018-06-28 07:47:40 UTC
Uploaded all 2136 .gcda files from 2 differing runs to
/suse/bwiedemann/Export/temp/no-aslr-fno-profile-values.tar.xz

e.g.

filterdiff gcovdumpfilter ?/gcc-8.1.1+r261583/*/libcpp/pch.gcda

starts with

 data:magic `gcda':version `A81*'
-stamp 1118532836
-  a3000000: 282:PROGRAM_SUMMARY checksum=0x7c4f9ce1
+stamp 575964339
+  a3000000: 287:PROGRAM_SUMMARY checksum=0xda08eb83
                 counts=863745, runs=1767, sum_all=46197068155, run_max=8930117, sum_max=642108707


using
https://github.com/bmwiedemann/reproducibleopensuse/blob/master/filterdiff
https://github.com/bmwiedemann/reproducibleopensuse/blob/master/gcovdumpfilter
Comment 26 Martin Liška 2018-06-29 10:23:48 UTC
(In reply to Bernhard Wiedemann from comment #25)
> Uploaded all 2136 .gcda files from 2 differing runs to
> /suse/bwiedemann/Export/temp/no-aslr-fno-profile-values.tar.xz
> 
> e.g.
> 
> filterdiff gcovdumpfilter ?/gcc-8.1.1+r261583/*/libcpp/pch.gcda
> 
> starts with
> 
>  data:magic `gcda':version `A81*'
> -stamp 1118532836
> -  a3000000: 282:PROGRAM_SUMMARY checksum=0x7c4f9ce1
> +stamp 575964339
> +  a3000000: 287:PROGRAM_SUMMARY checksum=0xda08eb83
>                  counts=863745, runs=1767, sum_all=46197068155,
> run_max=8930117, sum_max=642108707

Thanks for it! The checksum change is strange, let me investigate that.

> 
> 
> using
> https://github.com/bmwiedemann/reproducibleopensuse/blob/master/filterdiff
> https://github.com/bmwiedemann/reproducibleopensuse/blob/master/
> gcovdumpfilter
Comment 27 Swamp Workflow Management 2018-11-12 08:50:06 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/648389 Factory / grep
Comment 28 Martin Liška 2019-02-07 08:47:26 UTC
We had a team meeting and discussed the topic quite thoroughly.
Here is an analysis of why the mentioned packages do not build reproducibly:

1) GCC: it's using garbage-collected memory and similar mechanisms that are dependent on memory layout (ASLR, heap allocation). That makes it complicated to achieve reproducible builds.

2) Python: I haven't analyzed the package, but I would expect it's affected by a similar problem.

3) gzip - looks stable to me right now
4) xz - likewise
5) sed - analyzed here: https://bugzilla.suse.com/show_bug.cgi?id=1040589#c9
6) hello - the package looks stable
7) bash:

I concatenated the gcov-dump output of all gcda and gcno files (by adding --coverage):
$ find /var/tmp/ -name '*.gcda' 2>/dev/null | sort | xargs -L1 gcov-dump -l > ..

and using a simple script:
https://github.com/marxin/script-misc/blob/master/gcov-diff.py

$ ./gcov-diff.py /tmp/gcda1 /tmp/gcda2 /tmp/gcno1 | colordiff
Difference for fn 1039863413 (fmtumax fmtulong.c:84)
--- 
+++ 
@@ -1,6 +1,6 @@
     01a10000:  60:COUNTERS arcs 30 counts
                    0: 0 0 275415 91 0 0 0 0
-                   8: 198528 76887 12 184735 0 0 0 0
+                   8: 198528 76887 12 184812 0 0 0 0
                   16: 0 0 0 0 0 0 0 0
                   24: 0 0 0 0 0 91
     01a30000:   8:COUNTERS interval 4 counts

Difference for fn 1133735831 (mbschr mbschr.c:43)
--- 
+++ 
@@ -1,5 +1,5 @@
     01a10000:  24:COUNTERS arcs 12 counts
-                   0: 1720761 1105721 1105721 669139 1665477 667538 667538 667533
-                   8: 1035 6682747 668104 1987469
+                   0: 1720761 1105721 1105721 669139 1665499 667538 667538 667533
+                   8: 1035 6682747 668104 1987491
     01af0000:   2:COUNTERS time_profiler 1 counts
                    0: 38

Difference for fn 1191969440 (timeval_to_cpu timeval.c:73)
--- 
+++ 
@@ -1,6 +1,6 @@
     01a10000:  16:COUNTERS arcs 8 counts
                    0: 6 0 0 0 36 24 0 6
     01a70000:   6:COUNTERS single 3 counts
-                   0: 9 4 6
+                   0: 11 2 6
     01af0000:   2:COUNTERS time_profiler 1 counts
                    0: 265

Difference for fn 58810953 (make_word_flags make_cmd.c:102)
--- 
+++ 
@@ -1,5 +1,5 @@
     01a10000:  22:COUNTERS arcs 11 counts
-          %2
Comment 29 Martin Liška 2019-02-07 08:50:20 UTC
[the post was somehow cut]

Difference for fn 58810953 (make_word_flags make_cmd.c:102)
--- 
+++ 
@@ -1,5 +1,5 @@
     01a10000:  22:COUNTERS arcs 11 counts
-                   0: 36690 178 651 488230 0 10634 112 0
+                   0: 36690 178 651 488252 0 10634 112 0
                    8: 0 32671 36690
     01a70000:   0:COUNTERS single 0 counts
     01ab0000:   0:COUNTERS average 0 counts

Difference for fn 2120680264 (unquoted_glob_pattern_p pathexp.c:63)
--- 
+++ 
@@ -1,6 +1,6 @@
     01a10000:  36:COUNTERS arcs 18 counts
-                   0: 591965 5839 106 6638 47 6627 6504 39
-                   8: 0 1067486 7 7694 6173 6180 814 700698
+                   0: 591966 5839 106 6638 47 6627 6504 39
+                   8: 0 1067509 7 7694 6173 6180 814 700698
                   16: 700698 251221
     01a90000:   0:COUNTERS indirect_call 0 counts
     01af0000:   2:COUNTERS time_profiler 1 counts

Difference for fn 95503897 (dequote_string subst.c:4233)
--- 
+++ 
@@ -1,6 +1,6 @@
     01a10000:  44:COUNTERS arcs 22 counts
                    0: 470437 0 468193 388300 347974 4556 4556 178067
-                   8: 0 287077 178804 21545467 0 21120019 16 8842
+                   8: 0 287077 178804 21545468 0 21120020 16 8842
                   16: 19233 2534 659422 0 680265 178804
     01a70000:   0:COUNTERS single 0 counts
     01a90000:   0:COUNTERS indirect_call 0 counts

Difference for fn 48042273 (quote_string subst.c:4199)
--- 
+++ 
@@ -1,5 +1,5 @@
     01a10000:  26:COUNTERS arcs 13 counts
-                   0: 383 217052 383 217052 20812837 16 8836 19230
+                   0: 383 217052 383 217052 20812838 16 8836 19230
                    8: 2530 625854 0 646694 217052
     01a70000:   0:COUNTERS single 0 counts
     01a90000:   0:COUNTERS indirect_call 0 counts

Difference for fn 151936190 (quote_escapes_internal subst.c:4037)
--- 
+++ 
@@ -1,7 +1,7 @@
     01a10000:  50:COUNTERS arcs 25 counts
                    0: 102577 0 102206 138372 36 165889 138372 7
-                   8: 102577 102577 757685 473 757360 155 1251 239
-                  16: 701084 3 0 142 6 56779 0 56960
+
Comment 30 Martin Liška 2019-02-07 08:50:45 UTC
The function timeval_to_cpu should clearly be marked with the no_profile_instrument_function attribute,
as the number of loop iterations depends on time.

The other differences are in string functions; it is very hard to tell where they come from.
But as we know, any of these can potentially cause a code generation difference.

The last topic we discussed was the -fprofile-values counters. These can really diverge when running a parallel
test suite. That could be resolved in GCC 10.
Comment 31 Swamp Workflow Management 2019-03-10 20:40:06 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/683607 Factory / pcp
Comment 34 Swamp Workflow Management 2019-03-11 12:40:06 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/683832 Factory / pcp
Comment 37 Martin Liška 2019-06-26 13:21:47 UTC
Just for the record: the gcc9 package has been updated to provide more stable builds by reducing the differences caused by value profiling:
https://build.opensuse.org/request/show/709739
Comment 38 Bernhard Wiedemann 2019-06-28 10:06:48 UTC
tested devel:gcc/gcc9 and even with minimized non-determinism,
it produced variations

/usr/lib64/gcc/x86_64-suse-linux/9/gnat1 differs in assembler output
/usr/lib64/gcc/x86_64-suse-linux/9/cc1 differs in assembler output
_ZN7ipa_icf18sem_item_optimizer28do_congruence_step_for_indexEPNS_16congruence_classEj.cold

A lot of register assignments changed, but also some code moved.
Comment 39 Swamp Workflow Management 2019-10-04 13:10:07 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/735118 Factory / MozillaFirefox
Comment 40 Swamp Workflow Management 2020-02-04 09:40:08 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/766786 15.2 / python-rjsmin
Comment 42 OBSbugzilla Bot 2020-07-01 12:50:07 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/818103 Factory / grep
Comment 43 Bernhard Wiedemann 2020-10-22 12:24:32 UTC
bison-3.7.3 started to vary from PGO.
For profiling it uses a pretty large 'make check' run with 650 substeps.
Comment 44 Andreas Stieger 2020-10-22 14:35:00 UTC
(In reply to Bernhard Wiedemann from comment #43)
> bison-3.7.3 started to vary from PGO.

Can you help me understand if this is a (new) problem in 3.7.3, and give the steps to compare the binaries? Last I found was:
http://rb.zq1.de/compare.factory-20200430/bison-compare.out
Comment 45 Bernhard Wiedemann 2020-10-22 15:27:07 UTC
I found my debugging note from/before 2020-06-29:
> PGO again via ASLR, filesys, date

https://build.opensuse.org/request/show/676711 made it reproducible back then
and looking closer at old results, 
3.5.2 was the first one marked to differ
https://rb.zq1.de/compare.factory-20200303/bison-compare.out

Here is the general description of the problem of PGO with reproducibility:
https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/pgo

Would disabling PGO for bison be an option?
Comment 46 Martin Pluskal 2020-10-22 17:10:35 UTC
(In reply to Bernhard Wiedemann from comment #45)
> I found my debugging note from/before 2020-06-29:
> > PGO again via ASLR, filesys, date
> 
> https://build.opensuse.org/request/show/676711 made it reproducible back then
> and looking closer at old results, 
> 3.5.2 was the first one marked to differ
> https://rb.zq1.de/compare.factory-20200303/bison-compare.out
> 
> Here is the general description of the problem of PGO with reproducibility:
> https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/pgo
> 
> Would disabling PGO for bison be an option?

No! I do not wish to damage speed and/or optimized size of binaries because of this.
Comment 47 Martin Liška 2021-03-02 14:07:49 UTC
@Bernhard Wiedemann: Can we close this issue, please?
Comment 48 Bernhard Wiedemann 2021-03-04 13:32:31 UTC
I don't think this has been fully solved.

gcc, bison, python all still suffer from PGO.
I suspect bash-4.4 in 15.3 might also be affected.
Maybe also MozillaFirefox.

One intermediate solution would be to use the following condition in these .spec files:
%if 0%{?do_profiling} && !0%{?only_reproducible_profiling}
Comment 50 Swamp Workflow Management 2021-06-28 16:18:31 UTC
SUSE-RU-2021:2173-1: An update that has 5 recommended fixes and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589,1047218,1182604,1185540,1186049
CVE References: 
JIRA References: SLE-17848
Sources used:
SUSE MicroOS 5.0 (src):    pcre-8.41-6.4.2
SUSE Linux Enterprise Module for Development Tools 15-SP3 (src):    brp-check-suse-84.87+git20181106.224b37d-3.18.2
SUSE Linux Enterprise Module for Development Tools 15-SP2 (src):    brp-check-suse-84.87+git20181106.224b37d-3.18.2
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src):    automake-1.15.1-4.10.2, pcre-8.41-6.4.2
SUSE Linux Enterprise Module for Basesystem 15-SP2 (src):    automake-1.15.1-4.10.2, pcre-8.41-6.4.2

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 51 Swamp Workflow Management 2021-07-03 01:16:13 UTC
openSUSE-RU-2021:0963-1: An update that has 5 recommended fixes and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589,1047218,1182604,1185540,1186049
CVE References: 
JIRA References: SLE-17848
Sources used:
openSUSE Leap 15.2 (src):    automake-1.15.1-lp152.5.3.1, automake-testsuite-1.15.1-lp152.5.3.1, brp-check-suse-84.87+git20181106.224b37d-lp152.2.12.1, pcre-8.41-lp152.7.3.1
Comment 52 Swamp Workflow Management 2021-07-11 10:24:47 UTC
openSUSE-RU-2021:2173-1: An update that has 5 recommended fixes and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589,1047218,1182604,1185540,1186049
CVE References: 
JIRA References: SLE-17848
Sources used:
openSUSE Leap 15.3 (src):    automake-1.15.1-4.10.2, brp-check-suse-84.87+git20181106.224b37d-3.18.2, pcre-8.41-6.4.2
Comment 57 OBSbugzilla Bot 2022-03-16 13:40:10 UTC
This is an autogenerated message for OBS integration:
This bug (1040589) was mentioned in
https://build.opensuse.org/request/show/962180 Factory / grep
Comment 59 Swamp Workflow Management 2022-05-31 13:18:21 UTC
SUSE-RU-2022:1887-1: An update that has one recommended fix and contains one feature can now be installed.

Category: recommended (moderate)
Bug References: 1040589
CVE References: 
JIRA References: SLE-24115
Sources used:
openSUSE Leap 15.4 (src):    grep-3.1-150000.4.6.1
openSUSE Leap 15.3 (src):    grep-3.1-150000.4.6.1
SUSE Linux Enterprise Realtime Extension 15-SP2 (src):    grep-3.1-150000.4.6.1
SUSE Linux Enterprise Module for Basesystem 15-SP4 (src):    grep-3.1-150000.4.6.1
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src):    grep-3.1-150000.4.6.1
SUSE Linux Enterprise Micro 5.2 (src):    grep-3.1-150000.4.6.1
SUSE Linux Enterprise Micro 5.1 (src):    grep-3.1-150000.4.6.1
SUSE Linux Enterprise Micro 5.0 (src):    grep-3.1-150000.4.6.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.