Bug 1092877 - vasprintf(&dest, "%s", str) doesn't always fill whole destination
vasprintf(&dest, "%s", str) doesn't always fill whole destination
Status: RESOLVED DUPLICATE of bug 1094154
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Current
Other Other
: P5 - None : Major (vote)
: ---
Assigned To: Andreas Schwab
E-mail List
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2018-05-11 07:21 UTC by Marco Varlese
Modified: 2018-12-17 00:05 UTC (History)
7 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Sample program; bug.cc (1.36 KB, text/plain)
2018-05-15 17:53 UTC, Michael Andres
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marco Varlese 2018-05-11 07:21:46 UTC
When running "zypper dup" the program core dumps due to a memory allocation failure.

The version installation taken from /etc/os-release below:
NAME="openSUSE Tumbleweed"
# VERSION="20180320"
ID=opensuse
ID_LIKE="suse"
VERSION_ID="20180320"
PRETTY_NAME="openSUSE Tumbleweed"

=====/=====

The full BT below:

Loading repository data...
Reading installed packages...
Computing distribution upgrade...
malloc(): memory corruption

Program received signal SIGABRT, Aborted.
0x00007ffff653d9eb in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff653d9eb in raise () from /lib64/libc.so.6
#1  0x00007ffff653eda1 in abort () from /lib64/libc.so.6
#2  0x00007ffff6580c87 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff658729a in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff658a30c in _int_malloc () from /lib64/libc.so.6
#5  0x00007ffff658badc in malloc () from /lib64/libc.so.6
#6  0x00007ffff65864a7 in __GI__IO_str_overflow () from /lib64/libc.so.6
#7  0x00007ffff6585504 in __GI__IO_default_xsputn () from /lib64/libc.so.6
#8  0x00007ffff6557a65 in vfprintf () from /lib64/libc.so.6
#9  0x00007ffff6611a38 in __vasprintf_chk () from /lib64/libc.so.6
#10 0x00007ffff7a5165d in zypp::str::form[abi:cxx11](char const*, ...) () from /usr/lib64/libzypp.so.1702
#11 0x00007ffff7a61417 in zypp::base::LogControl::LineFormater::format(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, zypp::base::logger::LogLevel, char const*, char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libzypp.so.1702
#12 0x00007ffff7a68641 in ?? () from /usr/lib64/libzypp.so.1702
#13 0x00007ffff6bf257a in std::ostream::put(char) () from /usr/lib64/libstdc++.so.6
#14 0x00007ffff6bf275f in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&) () from /usr/lib64/libstdc++.so.6
#15 0x00007ffff78b5d69 in zypp::solver::detail::SATResolver::solverInit(std::__cxx11::list<zypp::PoolItem, std::allocator<zypp::PoolItem> > const&) () from /usr/lib64/libzypp.so.1702
#16 0x00007ffff78b7f26 in zypp::solver::detail::SATResolver::resolvePool(std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&, std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&, std::__cxx11::list<zypp::PoolItem, std::allocator<zypp::PoolItem> > const&, std::set<zypp::Repository, std::less<zypp::Repository>, std::allocator<zypp::Repository> > const&) () from /usr/lib64/libzypp.so.1702
#17 0x0000555555670cf5 in solve_and_commit(Zypper&) ()
#18 0x00005555555ccfae in Zypper::doCommand() ()
#19 0x00005555555dedd1 in Zypper::safeDoCommand() ()
#20 0x000055555559e7a5 in Zypper::main(int, char**) ()
#21 0x000055555559dd89 in main ()
Comment 1 Michael Andres 2018-05-14 12:08:54 UTC
@Marco: We can't reproduce this. Could you please attach the core file?
Comment 2 Marco Varlese 2018-05-15 06:23:04 UTC
(In reply to Michael Andres from comment #1)
> @Marco: We can't reproduce this. Could you please attach the core file?

Info provided as per IRC chat
Comment 3 Michael Andres 2018-05-15 17:49:10 UTC
Thanks Marco, for access to your orthos machine (I don't need it anymore). 

After a few hours of debugging:

1) By now, the issue can not be reproduced on any other machine. The code and the amount of data however is the same on  any  TW system perfoming a 'zypper dup'. I'd expect more report like this, if this is a common issue.

2) The zypp logger is using `vasprintf` (man ASPRINTF(3)) to format it's log limes. The crash happens when writing out a large log line (> 200000 chars). I was finally able to create a small test program not using libzypp at all (will be attached as bug.cc). 

It basically creates std::strings of a given size and compares them with the result of 'vasprintf( "%s" string )'. The expected result are equal strings. On your machine however the strings returned by vasprintf are truncated if the input string exceeds ~170000 chars. Causing various memory allocation issues and crashes.


So this seems to be no libzypp issue, probably not even an issue in glibc/libstdc++ (as the tests behaved well on any TW,Leap,SLE system I checked.
Comment 4 Michael Andres 2018-05-15 17:53:06 UTC
Created attachment 770328 [details]
Sample program; bug.cc

Sample program: g++ -Wall -std=c++11 bug.cc -o bug
Usage: bug [STRINGLENGTH [NUMBER_OF_CHECKS]]
Default: bug 200000 10000

Prints a line whenver the size of the input string and the formated strings differ.
Comment 5 Michael Andres 2018-05-16 13:22:32 UTC
@Michael Matz: 
The `vasprintf` behavior on Marcos machine is really strange. May it be due to some obscure optimization which is not active on the other CPUs I tested? Are you interested in further investigations?
Comment 6 Michael Matz 2018-05-16 13:34:42 UTC
I'd like to investigate this a bit.  On which machine can I see the testcase
breaking?
Comment 8 Michael Matz 2018-05-17 13:47:37 UTC
Thanks for the initial analysis.  I agree that this is a bug in vasprintf(),
and indeed it only seems to happen on some hardware (or needs very specific
environments to occur).  I can reproduce the problem on rancid.arch.suse.de
with this testcase:

------------ snip ------------------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#define N 200000
int main()
{
        char *src, *dest;
        //for (int N = 0; N < 300000; N++) {
                src = malloc(N+1);
                memset (src, 'x', N);
                src[N] = 0;
                //dest = form ("%s", src);
                asprintf(&dest, "%s", src);
                if (strlen (dest) != strlen(src))
                        printf("%d: failed\n", N);
                /*else if (!(N & 63))
                        printf("%d: passed\n", N);*/
                free (dest);
                free (src);
        //}
        return 0;
}
---------------- snap ------------------

When the bug occurs (with 200000 chars on that machine always), then
the destination string is only 102300 characters long.  The _allocated_
length of the destination block is 200001, though, as one can see when
tracing the thing in gdb.  It's also the case that the destination buffer
holds more 'x' characters after some intermediate zeros:

(The last malloc done from within IO_str_overflow):
Breakpoint 2, 0x00007ffff7a9da50 in malloc () from /lib64/libc.so.6
(gdb) p $rdi
$19 = 204700

So it's now allocating a buffer of 204700 bytes (before that all mallocs
ultimately were too small to contain the input string).

Then breaking at the final realloc that supposedly sizes the returned buffer
to the exact length necessary, in vasprintf:

Breakpoint 3, 0x00007ffff7a9e1f0 in realloc () from /lib64/libc.so.6
(gdb) p $rdi
$31 = 140737353601040
(gdb) p/x $rdi
$32 = 0x7ffff7f7d010
(gdb) p $rsi
$33 = 200001

So, %rdi is the returned string, it's allocated size will be 200001, now looking
at content:

(gdb) p ((char*)$rdi)[102300]@100
$44 = "x", '\000' <repeats 83 times>, 'x' <repeats 16 times>

So, [102300] still has an 'x', then follows some zeros (ultimately making
the string length less than the allocated size), then more 'x' (in fact
until the end, though I haven't checked if there are more intermediate blocks
of zeros).

So something is going amiss when copying the string buffers I believe.
Comment 9 Andreas Schwab 2018-05-17 14:02:30 UTC
Which memcpy variant does this use?
Comment 10 Michael Matz 2018-05-17 14:10:55 UTC
On this Xeon PHI machine it's using the AVX512 variants, and that might indeed
be a good hint:

#0  0x00007ffff7b79bb0 in __memmove_avx512_no_vzeroupper () from /lib64/libc.so.6
#1  0x00007ffff7a984c6 in __GI__IO_str_overflow () from /lib64/libc.so.6
#2  0x00007ffff7a97504 in __GI__IO_default_xsputn () from /lib64/libc.so.6
#3  0x00007ffff7a69a65 in vfprintf () from /lib64/libc.so.6
#4  0x00007ffff7a91ac9 in vasprintf () from /lib64/libc.so.6
#5  0x00007ffff7a71104 in asprintf () from /lib64/libc.so.6
#6  0x00000000004006cb in main () at bug3.c:16
Comment 11 Michael Matz 2018-05-17 14:48:53 UTC
It's definitely the memcpy/memmove involved there.  From some gdb session:

Breakpoint 6, 0x00007ffff7b79b90 in __mempcpy_avx512_no_vzeroupper () from /lib64/libc.so.6
(gdb) p/x $rdi
$91 = 0x7ffff7f95fad
(gdb) p/x $rsi
$92 = 0x7ffff7fc7fad
(gdb) p $rdx
$93 = 97699
(gdb) p/x $rsi-102301
$94 = 0x7ffff7faf010
(gdb) p/x $rdi-102301
$95 = 0x7ffff7f7d010
(gdb) p ((char*)$94)[0]@200000
$97 = 'x' <repeats 200000 times>
(gdb) p ((char*)$95)[0]@200000
$98 = 'x' <repeats 102301 times>...

So, this is the last block copying involved.  Parts of the final destination
string are already constructed (the first 102300 bytes).  The full source string
(at $rsi-102301) correctly consists of 200000 'x'.  The destination string
already has 102301 'x'.  The copy destination ($rdi) points to dest+102301,
i.e. exactly after all the 'x' that are already there.  This copy operation
hence should copy a further 97699 bytes from source+102301, which are all 'x'.
So, finishing this copy operation, and looking at the destination string again:

(gdb) finish
Run till exit from #0  0x00007ffff7b79b90 in __mempcpy_avx512_no_vzeroupper () from /lib64/libc.so.6
0x00007ffff7a9753c in __GI__IO_default_xsputn () from /lib64/libc.so.6

(gdb) p ((char*)$95)[0]@200000
$99 = 'x' <repeats 102301 times>...
(gdb) p ((char*)$95)[102301]@200000
$100 = '\000' <repeats 83 times>, 'x' <repeats 97744 times>...

So, there are zeros where there should be 'x'.  And what's more, after the 83 zeros (that shouldn't be there), there are 97744 'x' characters.  Which is actually more than the amount that should have been copied (the above 97699
from the memcpy call).  So, not only did it insert zeros, it also wrote
more 'x'.  The number of 'x' is too large by 45, plus the number of zeros (83),
so that's overall 128 bytes too large, which is a nice round power of two, and
the width of two avx512 registers.

Probably, given the above numbers one can construct a memcpy testcase that
breaks similarly.  It probably would require using the same mis-alignment
of the input pointers and same size (or misalignment of that size), to trigger
the same code paths.
Comment 12 Andreas Schwab 2018-05-17 15:29:42 UTC
$ cat mempcpy.c
#define _GNU_SOURCE 1
#include <string.h>
#include <assert.h>

#define N 97699
char a[N];
char b[N+128];

int
main (void)
{
  memset (a, 'x', N);
  char *c = mempcpy (b, a, N);
  assert (*c == 0);
}
$ gcc -g mempcpy.c -o mempcpy -fno-builtin-mempcpy
$ ./mempcpy 
mempcpy: mempcpy.c:14: main: Assertion `*c == 0' failed.
Comment 16 Andreas Schwab 2018-05-22 09:14:57 UTC
Moving to security bug.

*** This bug has been marked as a duplicate of bug 1094154 ***
Comment 17 Swamp Workflow Management 2018-06-05 14:00:42 UTC
This is an autogenerated message for OBS integration:
This bug (1092877) was mentioned in
https://build.opensuse.org/request/show/614277 15.0 / glibc
Comment 18 Swamp Workflow Management 2018-06-09 13:13:47 UTC
openSUSE-SU-2018:1633-1: An update that solves one vulnerability and has one errata is now available.

Category: security (moderate)
Bug References: 1092877,1094154
CVE References: CVE-2018-11237
Sources used:
openSUSE Leap 15.0 (src):    glibc-2.26-lp150.11.3.2, glibc-testsuite-src-2.26-lp150.11.3.2, glibc-utils-src-2.26-lp150.11.3.2
Comment 19 Swamp Workflow Management 2018-07-19 13:12:13 UTC
SUSE-SU-2018:1991-1: An update that solves three vulnerabilities and has two fixes is now available.

Category: security (moderate)
Bug References: 1082318,1092877,1094150,1094154,1094161
CVE References: CVE-2017-18269,CVE-2018-11236,CVE-2018-11237
Sources used:
SUSE Linux Enterprise Module for Development Tools 15 (src):    glibc-2.26-13.3.1, glibc-utils-src-2.26-13.3.1
SUSE Linux Enterprise Module for Basesystem 15 (src):    glibc-2.26-13.3.1
Comment 20 Swamp Workflow Management 2018-08-01 16:08:33 UTC
openSUSE-SU-2018:2159-1: An update that solves three vulnerabilities and has two fixes is now available.

Category: security (moderate)
Bug References: 1082318,1092877,1094150,1094154,1094161
CVE References: CVE-2017-18269,CVE-2018-11236,CVE-2018-11237
Sources used:
openSUSE Leap 15.0 (src):    glibc-2.26-lp150.11.6.120, glibc-testsuite-src-2.26-lp150.11.6.120, glibc-utils-src-2.26-lp150.11.6.120