Bugzilla – Bug 1143192
open-iscsi-2.0.877-55.1.aarch64: segfault at startup
Last modified: 2020-12-15 11:12:53 UTC
Created attachment 811891 [details]
systemd coredump

Hi,

I am running open-iscsi-2.0.877-55.1.aarch64 on openSUSE Tumbleweed 20190724 and see the following issue:

# systemctl status iscsi
● iscsi.service - Login and scanning of iSCSI devices
   Loaded: loaded (/usr/lib/systemd/system/iscsi.service; enabled; vendor preset: enabled)
   Active: failed (Result: core-dump) since Tue 2019-05-28 12:58:52 MSK; 2 months 1 days ago
     Docs: man:iscsiadm(8)
           man:iscsid(8)
  Process: 1078 ExecStart=/sbin/iscsiadm -m node --loginall=automatic (code=dumped, signal=SEGV)
 Main PID: 1078 (code=dumped, signal=SEGV)

May 28 12:58:50 localhost systemd[1]: Starting Login and scanning of iSCSI devices...
May 28 12:58:52 localhost systemd[1]: iscsi.service: Main process exited, code=dumped, status=11/SEGV
May 28 12:58:52 localhost systemd[1]: iscsi.service: Failed with result 'core-dump'.
May 28 12:58:52 localhost systemd[1]: Failed to start Login and scanning of iSCSI devices.
Created attachment 811892 [details] relevant debuginfo package
Matwey, any chance you can run gdb on this core file and dump out the stack trace? I'm guessing it's something simple, but there is no way to know without reproducing it or looking at the core.

I have aarch64 systems in my remote lab. I will try to reproduce this there. What iscsi targets do you have, if any?
Please supply output of "iscsiadm -m node --op show" (as an attachment). I am running Tumbleweed 20190726 and do not see this on x86_64, so I'm guessing it has something to do with aarch64.
Setting NEEDSINFO. Please clear it when you've supplied the info.
The backtrace is not very informative to me:

(gdb) bt
#0  0x0000aaaae59cb04c in memset (__len=<optimized out>, __ch=<optimized out>, __dest=<optimized out>)
    at ../include/list.h:29
#1  main (argc=4, argv=0xffffc96b2e28) at iscsiadm.c:3557

I use an iscsi LIO target on an x86_64 server running openSUSE Leap 15.1.

Also, open-iscsi-2.0.876-lp150.9.13.2.aarch64 works fine on another aarch64 system running Leap 15.0 with the same x86_64 target.

Unfortunately, `iscsiadm -m node --op show' crashed without any output.
(In reply to Matwey Kornilov from comment #5)
> The backtrace is not quite informative for me:
>
> (gdb) bt
> #0  0x0000aaaae59cb04c in memset (__len=<optimized out>, __ch=<optimized out>, __dest=<optimized out>)
>     at ../include/list.h:29
> #1  main (argc=4, argv=0xffffc96b2e28) at iscsiadm.c:3557
>
> I use iscsi LIO target on x86_64 server running openSUSE Leap 15.1.
>
> Also, open-iscsi-2.0.876-lp150.9.13.2.aarch64 works good at other aarch64
> system running Leap 15.0 with the same x86_64 target.
>
> Unfortunately, `iscsiadm -m node --op show' crashed without any output.

The line that seems to be failing is in iscsiadm.c:

> ...
> 3531         int timeout = ISCSID_REQ_TIMEOUT;
> 3532         struct sigaction sa_old;
> 3533         struct sigaction sa_new;
> ...
> 3552
> 3553         INIT_LIST_HEAD(&params);
> 3554         INIT_LIST_HEAD(&ifaces);
> 3555         /* do not allow ctrl-c for now... */
> 3556         memset(&sa_old, 0, sizeof(struct sigaction));
> 3557         memset(&sa_new, 0, sizeof(struct sigaction));  <<== FAIL?
> 3558
> ...

The stack trace says the memset() is failing, which seems impossible, unless it is some sort of strange (new?) alignment error.

But the stack trace output also mentions list.h:29, which has this code:

> ...
> 18 struct list_head {
> 19         struct list_head *next, *prev;
> 20 };
> 21
> 22 #define LIST_HEAD_INIT(name) { &(name), &(name) }
> 23
> ...
> 27 static inline void INIT_LIST_HEAD(struct list_head *list)
> 28 {
> 29         list->next = list;  <<=== FAIL?
> 30         list->prev = list;
> 31 }

This spot also does not make sense, unless (again) it is related to some sort of alignment error.

I will build a test set of RPMs for you, with full debugging enabled and perhaps a few debug statements.
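For reference, the two code spots named in the backtrace can be pulled together into a small standalone C sketch. This is only an illustration: `startup_init_ok()` is a hypothetical helper that does not exist in open-iscsi, and the sketch is not expected to reproduce the crash on its own, since the failure described here appeared only in the distribution build. It simply shows that both operations write to local stack variables and nothing else.

```c
#define _POSIX_C_SOURCE 200809L  /* for struct sigaction under strict -std= modes */

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Same shape as struct list_head in open-iscsi's include/list.h. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev both point back at itself. */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/*
 * Hypothetical helper (not part of open-iscsi): repeats the sequence from
 * the start of main() in iscsiadm.c on stack variables and returns 1 if
 * every write had the expected effect.
 */
int startup_init_ok(void)
{
	struct list_head params, ifaces;
	struct sigaction sa_old, sa_new;

	INIT_LIST_HEAD(&params);
	INIT_LIST_HEAD(&ifaces);
	memset(&sa_old, 0, sizeof(struct sigaction));
	memset(&sa_new, 0, sizeof(struct sigaction));

	/* Both list heads must be self-referential after initialization. */
	assert(params.next == &params && params.prev == &params);
	assert(ifaces.next == &ifaces && ifaces.prev == &ifaces);

	/* sa_flags is a plain int member, so it is safe to compare with 0. */
	return sa_old.sa_flags == 0 && sa_new.sa_flags == 0;
}
```

Calling `startup_init_ok()` from a test `main()` and building it with and without `-flto` on the affected toolchain is one possible way to compare behavior, though whether such a tiny program triggers the same miscompilation is an open question.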
Can it be related to LTO enabled by default in Factory?
Created attachment 812330 [details]
TEST aarch64 open-iscsi Tumbleweed RPMs

Please try these one-off test RPMs. Put them in a newly created subdirectory, cd there, and run "zypper up *.rpm".

The only change from the standard build is adding the "-g" flag on compilation, so we can get a better stack trace. If this works, I will add some debug statements, but I'm at a loss as to why a memset of, or an assignment to, something on the stack would cause a core dump.
(In reply to Matwey Kornilov from comment #7)
> Can it be related to LTO enabled by default in Factory?

Perhaps, especially if it was recently enabled.
(In reply to Lee Duncan from comment #9)
> (In reply to Matwey Kornilov from comment #7)
> > Can it be related to LTO enabled by default in Factory?
>
> Perhaps, especially if it was recently enabled.

If you can try the supplied test RPMs, that would be great. You should just be able to run "iscsiadm -m node --op show" and see if it works or not, since that currently fails.

If this still fails, I can build a set of RPMs for you with LTO disabled.
`iscsiadm -m node --op show' is still failing as previously...
Created attachment 812370 [details]
V2 of TEST RPM tarball

I believe the last tarball did not add anything, but this one seems to be different. Please try it, and if it fails please try to get a stack trace with gdb, e.g. "gdb iscsiadm", then "run -m node --op show".
Created attachment 812492 [details]
/sbin/iscsiadm -m node --op show

open-iscsi-2.0.877-240.1.aarch64 works for me. The iscsi initiator service starts now.

Attached is the output of /sbin/iscsiadm -m node --op show
(In reply to Matwey Kornilov from comment #13)
> Created attachment 812492 [details]
> /sbin/iscsiadm -m node --op show
>
> open-iscsi-2.0.877-240.1.aarch64 works for me. iscsi initiator service is
> started now.
>
> Attached is the output for /sbin/iscsiadm -m node --op show

Excellent. (The output you attached doesn't really matter, since we have evidently found the problem, which is Link Time Optimization on aarch64, but thank you for validating that it's running now.) And I assume your iscsi service starts up correctly.

I will attach one more version of the RPM. I still need to make a production version of my changes. Your help in validating it would be most welcome, if you can do that.
(In reply to Lee Duncan from comment #14)
> (In reply to Matwey Kornilov from comment #13)
> > Created attachment 812492 [details]
> > /sbin/iscsiadm -m node --op show
> >
> > open-iscsi-2.0.877-240.1.aarch64 works for me. iscsi initiator service is
> > started now.
> >
> > Attached is the output for /sbin/iscsiadm -m node --op show
>
> Excellent. (The output you attached doesn't really matter, since we have
> evidently found the problem, which is Link Time Optimization on aarch64, but
> thank you for validating it's running now.) And I assume your iscsi service
> starts up correctly.

Yes, the block devices appeared too.

> I will attach one more version of the RPM. I still need to make a production
> version of my changes. Your help in validating it will be most helpful, if
> you can do that.

Sure. I can test more versions.
Created attachment 812503 [details]
V3 of test aarch64 TEST RPMs for Tumbleweed

Please test what I hope is the last set of RPMs. This one has debug turned back off but LTO disabled, this time only for aarch64.

Thank you for your help.
(In reply to Lee Duncan from comment #16)
> Created attachment 812503 [details]
> V3 of test aarch64 TEST RPMs for Tumbleweed
>
> Please test what I hope is the last set of RPMs. This one has debug turned
> back off but LTO disabled, this time only for aarch64
>
> Thank you for your help.

open-iscsi-2.0.877-241.1.aarch64 also works.
Submitted to Factory. See req#720704. I will leave this open until that is accepted.

Thank you, Matwey, for your help in testing!
I believe this is fixed now.
Upstream bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91386
Ok, we've got a patch candidate for GCC. I'll later backport that to our gcc9 package. I'll take this issue.
(In reply to Martin Liška from comment #21)
> Ok, we've got a patch candidate for GCC. I'll later backport that to our
> gcc9 package. I'll take this issue.

I can remove the "no lto" macro in open-iscsi when you fix gcc.
(In reply to Lee Duncan from comment #22)
> (In reply to Martin Liška from comment #21)
> > Ok, we've got a patch candidate for GCC. I'll later backport that to our
> > gcc9 package. I'll take this issue.
>
> I can remove the "no lto" macro in open-iscsi when you fix gcc.

Yes, I'll do it once we have the fix in our gcc9 package.
Currently, I still see the same behavior with open-iscsi-2.0.877-57.1.aarch64 from Factory.
(In reply to Matwey Kornilov from comment #24)
> Currently, I still see the same behavior with
> open-iscsi-2.0.877-57.1.aarch64 from Factory.

I found the source RPM for this package, and sure enough LTO has been enabled again.

Can you please re-try the V3 TEST RPM from this bug and re-validate that it works, and I'll re-disable LTO.

Evidently Martin re-enabled it too early? @Martin?
Sorry, I hadn't tested that properly. Anyway, gcc9 is now updated in Factory. Can you please test the rebuilt package?
LTO is now enabled again.