Bug 1155889 - LTO makes bluez unit test segfault
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Basesystem
Version: Current
Hardware: Other Other
Priority: P5 - None
Severity: Normal
Assigned To: Vladimir Botka
Blocks: 1133084
Reported: 2019-11-05 10:01 UTC by Stefan Seyfried
Modified: 2020-12-15 11:11 UTC (History)
Found By: Community User
Description Stefan Seyfried 2019-11-05 10:01:00 UTC
With the bluez-5.52 update, unit/test-mesh-crypto segfaults.

Disabling LTO makes it succeed.

See https://marc.info/?t=157293976600036&r=1&w=2 for details.

A package with LTO disabled is currently in home:seife:testing/bluez; I will submit it to Base:System after some testing.
Comment 1 Martin Liška 2019-11-05 10:27:25 UTC
I'm taking a look right now ..
Comment 2 Martin Liška 2019-11-05 11:08:24 UTC
As mentioned here:
https://marc.info/?l=linux-bluetooth&m=157294761620482&w=2

I see:

==4565== Invalid read of size 8
==4565==    at 0x10D389: l_queue_clear (queue.c:103)
==4565==    by 0x10D339: l_queue_destroy (queue.c:83)
==4565==    by 0x10D281: free_debug_sections (log.c:454)
==4565==    by 0x400FC12: _dl_fini (in /lib64/ld-2.30.so)
==4565==    by 0x4889876: __run_exit_handlers (in /lib64/libc-2.30.so)
==4565==    by 0x4889A2B: exit (in /lib64/libc-2.30.so)
==4565==    by 0x4871E11: (below main) (in /lib64/libc-2.30.so)
==4565==  Address 0x4a15040 is 0 bytes inside a block of size 24 free'd
==4565==    at 0x48389AB: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4565==    by 0x10B742: l_free (util.c:136)
==4565==    by 0x10D345: l_queue_destroy (queue.c:84)
==4565==    by 0x400FC12: _dl_fini (in /lib64/ld-2.30.so)
==4565==    by 0x4889876: __run_exit_handlers (in /lib64/libc-2.30.so)
==4565==    by 0x4889A2B: exit (in /lib64/libc-2.30.so)
==4565==    by 0x4871E11: (below main) (in /lib64/libc-2.30.so)
==4565==  Block was alloc'd at
==4565==    at 0x483777F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4565==    by 0x10B60E: l_malloc (util.c:62)
==4565==    by 0x10D2AD: l_queue_new (queue.c:63)
==4565==    by 0x10D06C: l_debug_add_section (log.c:376)
==4565==    by 0x10D25E: register_debug_section (log.c:449)
==4565==    by 0x12BF24: __libc_csu_init (elf-init.c:88)
==4565==    by 0x4871D99: (below main) (in /lib64/libc-2.30.so)

This happens during the destruction of global objects in __run_exit_handlers. So it seems to me that l_queue_destroy
is called twice for a single object.

Could you check whether compiling the tests with -O0 helps?
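
For illustration only, here is a minimal C sketch of the pattern the valgrind trace points to: a cleanup routine that runs twice on the same global pointer. All names below (struct queue, queue_destroy, free_debug_sections_guarded, run_cleanup_twice) are hypothetical stand-ins, not ell's real API. The guard shown is just one defensive option and is not the fix that was applied for this bug.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ell's l_queue; not the real structure. */
struct queue {
	int entries;
};

static struct queue *debug_sections; /* one global object, freed at exit */
static int destroy_calls;            /* counts real frees, for illustration */

/* Frees the queue; a NULL argument is ignored. */
void queue_destroy(struct queue *q)
{
	if (!q)
		return;
	destroy_calls++;
	free(q);
}

/* Cleanup that tolerates being run more than once. */
void free_debug_sections_guarded(void)
{
	queue_destroy(debug_sections);
	debug_sections = NULL; /* a second call sees NULL and does nothing */
}

/* Simulates the cleanup running twice and reports how often free() ran. */
int run_cleanup_twice(void)
{
	debug_sections = malloc(sizeof(*debug_sections));
	assert(debug_sections != NULL);
	debug_sections->entries = 0;
	destroy_calls = 0;

	free_debug_sections_guarded();
	free_debug_sections_guarded();
	return destroy_calls;
}
```

Without the NULL reset, the second call would pass an already freed pointer to queue_destroy, which is the kind of "Invalid read ... inside a block ... free'd" access reported above.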
Comment 3 Stefan Seyfried 2019-11-05 13:08:24 UTC
-O0 does not help, still segfaults.

valgrind output looks about the same.
Note that I need to disable the test for OBS in any case, because the missing kernel modules in the build VMs will make it fail.
Comment 4 Stefan Seyfried 2019-11-05 13:14:47 UTC
Note that it is easy to reproduce:

tar xf bluez-5.52.tar.xz
cd bluez-5.52
CFLAGS="-flto=auto" ./configure --enable-mesh
make check

...
  CC       ell/unit_test_mesh_crypto-dbus-name-cache.o
  CC       ell/unit_test_mesh_crypto-dbus-filter.o
  CC       ell/unit_test_mesh_crypto-gvariant-util.o
  CC       ell/unit_test_mesh_crypto-siphash.o
  CCLD     unit/test-mesh-crypto
./test-driver: line 107:  8946 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: unit/test-mesh-crypto
============================================================================
Testsuite summary for bluez 5.52
============================================================================
# TOTAL: 26
# PASS:  25
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
============================================================================
make[3]: *** [Makefile:10068: test-suite.log] Error 1
make[2]: *** [Makefile:10176: check-TESTS] Error 2
make[1]: *** [Makefile:10570: check-am] Error 2
make: *** [Makefile:10572: check] Error 2


seife@strolchi:~/buildservice/testing/bluez/bluez-5.52> gcc --version
gcc (SUSE Linux) 9.2.1 20190903 [gcc-9-branch revision 275330]
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Comment 5 Martin Liška 2019-11-05 13:36:58 UTC
(In reply to Stefan Seyfried from comment #4)
> note it is easy to reproduce:

Great, I can confirm that, let me debug that then..
Comment 6 Martin Liška 2019-11-05 13:51:10 UTC
I've got it. The problem is that the destructor
static void free_debug_sections
is called twice. This is very likely because the object (ell/log.o) is linked into the final test binary twice, via multiple *.a files:

libtool: link: gcc -flto=16 -g -o unit/test-mesh-crypto unit/test_mesh_crypto-test-mesh-crypto.o ell/unit_test_mesh_crypto-util.o ell/unit_test_mesh_crypto-log.o ell/unit_test_mesh_crypto-queue.o ell/unit_test_mesh_crypto-hashmap.o ell/unit_test_mesh_crypto-random.o ell/unit_test_mesh_crypto-signal.o ell/unit_test_mesh_crypto-timeout.o ell/unit_test_mesh_crypto-io.o ell/unit_test_mesh_crypto-idle.o ell/unit_test_mesh_crypto-main.o ell/unit_test_mesh_crypto-strv.o ell/unit_test_mesh_crypto-string.o ell/unit_test_mesh_crypto-cipher.o ell/unit_test_mesh_crypto-checksum.o ell/unit_test_mesh_crypto-utf8.o ell/unit_test_mesh_crypto-dbus.o ell/unit_test_mesh_crypto-dbus-message.o ell/unit_test_mesh_crypto-dbus-util.o ell/unit_test_mesh_crypto-dbus-service.o ell/unit_test_mesh_crypto-dbus-client.o ell/unit_test_mesh_crypto-dbus-name-cache.o ell/unit_test_mesh_crypto-dbus-filter.o ell/unit_test_mesh_crypto-gvariant-util.o ell/unit_test_mesh_crypto-siphash.o  src/.libs/libshared-ell.a ell/.libs/libell-internal.a

I bet it's in libell-internal.a and in one other object, which is why it's called twice.

One can fix it by making the two functions public (non-static):

diff --git a/ell/log.c b/ell/log.c
index f3f4b0c..4341e3a 100644
--- a/ell/log.c
+++ b/ell/log.c
@@ -441,7 +441,7 @@ LIB_EXPORT void l_debug_disable(void)
 	debug_pattern = NULL;
 }
 
-__attribute__((constructor)) static void register_debug_section()
+__attribute__((constructor)) void register_debug_section()
 {
 	extern struct l_debug_desc __start___ell_debug[];
 	extern struct l_debug_desc __stop___ell_debug[];
@@ -449,7 +449,7 @@ __attribute__((constructor)) static void register_debug_section()
 	l_debug_add_section(__start___ell_debug, __stop___ell_debug);
 }
 
-__attribute__((destructor(65535))) static void free_debug_sections()
+__attribute__((destructor(65535))) void free_debug_sections()
 {
 	l_queue_destroy(debug_sections, l_free);
 }
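
For readers unfamiliar with the attributes used in the patch above, here is a small self-contained C sketch of how GCC's constructor and destructor attributes behave. The names (sections, register_sections, release_sections, constructor_call_count) are hypothetical and only mirror the shape of ell/log.c; this is not the library's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; only the attribute usage mirrors ell/log.c. */
static int *sections;  /* allocated before main(), released at exit */
static int ctor_calls; /* how many times the constructor has run */

/* Runs automatically before main() is entered. */
__attribute__((constructor)) void register_sections(void)
{
	ctor_calls++;
	sections = calloc(4, sizeof(*sections));
	assert(sections != NULL);
}

/* Runs automatically during exit(), after main() has returned. */
__attribute__((destructor(65535))) void release_sections(void)
{
	free(sections);
	sections = NULL;
}

/* Lets a caller check that the constructor ran exactly once. */
int constructor_call_count(void)
{
	return ctor_calls;
}
```

In a normally linked program constructor_call_count() returns 1 by the time main() starts. If two copies of such a static constructor/destructor pair ended up in one binary, each would be registered separately, which is the situation suspected in this comment.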
Comment 7 Stefan Seyfried 2019-11-06 09:25:48 UTC
A Makefile fix avoiding the double linkage was posted on the devel list, https://marc.info/?l=linux-bluetooth&m=157299472708987&w=2

I guess we can close this bug, as it is not really an LTO issue but an error that was merely uncovered by LTO.

Thanks for helping to debug this, much appreciated.
Comment 8 Martin Liška 2019-11-06 09:45:33 UTC
(In reply to Stefan Seyfried from comment #7)
> A Makefile fix avoiding the double linkage was posted on the devel list,
> https://marc.info/?l=linux-bluetooth&m=157299472708987&w=2

Great you fixed that upstream!

> 
> I guess we can close this bug, as it is not really a LTO issue but an error
> that was just uncovered by LTO.
> 
> Thanks for helping to debug this, much appreciated.

Pleasure. You helped me a lot with a reproducer.
Comment 9 Stefan Seyfried 2019-11-06 09:55:38 UTC
(In reply to Martin Liška from comment #8)

> Great you fixed that upstream!

I was just the messenger in both directions ;-)
Comment 10 Martin Liška 2020-12-15 11:11:06 UTC
LTO is used now.