Bug 1143192 - open-iscsi-2.0.877-55.1.aarch64: segfault at startup
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Basesystem
Version: Current
Hardware: Other, OS: Other
Importance: P5 - None, Normal
Target Milestone: ---
Assigned To: Martin Liška
QA Contact: E-mail List
Depends on:
Blocks: 1133084
Reported: 2019-07-29 08:09 UTC by Matwey Kornilov
Modified: 2020-12-15 11:12 UTC
CC List: 4 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
lduncan: needinfo? (matwey.kornilov)


Attachments
systemd coredump (138.60 KB, application/octet-stream), 2019-07-29 08:09 UTC, Matwey Kornilov
relevant debuginfo package (980.20 KB, application/x-rpm), 2019-07-29 08:10 UTC, Matwey Kornilov
TEST aarch64 open-iscsi Tumbleweed RPMs (500.15 KB, application/x-compressed-tar), 2019-07-31 18:32 UTC, Lee Duncan
V2 of TEST RPM tarball (487.36 KB, application/x-compressed-tar), 2019-07-31 22:29 UTC, Lee Duncan
/sbin/iscsiadm -m node --op show (3.43 KB, text/plain), 2019-08-01 16:53 UTC, Matwey Kornilov
V3 of test aarch64 TEST RPMs for Tumbleweed (487.83 KB, application/x-compressed-tar), 2019-08-01 19:20 UTC, Lee Duncan

Description Matwey Kornilov 2019-07-29 08:09:05 UTC
Created attachment 811891
systemd coredump

Hi,

I am running open-iscsi-2.0.877-55.1.aarch64 on openSUSE Tumbleweed 20190724 and see the following issue:

# systemctl status iscsi
● iscsi.service - Login and scanning of iSCSI devices
   Loaded: loaded (/usr/lib/systemd/system/iscsi.service; enabled; vendor preset: enabled)
   Active: failed (Result: core-dump) since Tue 2019-05-28 12:58:52 MSK; 2 months 1 days ago
     Docs: man:iscsiadm(8)
           man:iscsid(8)
  Process: 1078 ExecStart=/sbin/iscsiadm -m node --loginall=automatic (code=dumped, signal=SEGV)
 Main PID: 1078 (code=dumped, signal=SEGV)

May 28 12:58:50 localhost systemd[1]: Starting Login and scanning of iSCSI devices...
May 28 12:58:52 localhost systemd[1]: iscsi.service: Main process exited, code=dumped, status=11/SEGV
May 28 12:58:52 localhost systemd[1]: iscsi.service: Failed with result 'core-dump'.
May 28 12:58:52 localhost systemd[1]: Failed to start Login and scanning of iSCSI devices.
Comment 1 Matwey Kornilov 2019-07-29 08:10:11 UTC
Created attachment 811892
relevant debuginfo package
Comment 2 Lee Duncan 2019-07-30 23:27:02 UTC
Matwey, any chance you can run gdb on this core file and dump out the stack trace? I'm guessing it's something simple, but there's no way to know without reproducing it or looking at the core.

I have aarch64 systems in my remote lab. I will try to reproduce this there.

What iscsi targets do you have if any?
Comment 3 Lee Duncan 2019-07-31 00:07:12 UTC
Please supply output of "iscsiadm -m node --op show" (as an attachment).

I am running Tumbleweed 20190726 and do not see this on x86_64, so I'm guessing it has something to do with aarch64.
Comment 4 Lee Duncan 2019-07-31 00:07:35 UTC
Setting NEEDSINFO. Please clear it when you've supplied the info.
Comment 5 Matwey Kornilov 2019-07-31 06:56:44 UTC
The backtrace is not very informative to me:

(gdb) bt
#0  0x0000aaaae59cb04c in memset (__len=<optimized out>, __ch=<optimized out>, __dest=<optimized out>)
    at ../include/list.h:29
#1  main (argc=4, argv=0xffffc96b2e28) at iscsiadm.c:3557

I use iscsi LIO target on x86_64 server running openSUSE Leap 15.1.

Also, open-iscsi-2.0.876-lp150.9.13.2.aarch64 works fine on another aarch64 system running Leap 15.0 with the same x86_64 target.

Unfortunately, `iscsiadm -m node --op show' crashed without any output.
Comment 6 Lee Duncan 2019-07-31 15:42:24 UTC
(In reply to Matwey Kornilov from comment #5)
> The backtrace is not quite informative for me:
> 
> (gdb) bt
> #0  0x0000aaaae59cb04c in memset (__len=<optimized out>, __ch=<optimized
> out>, __dest=<optimized out>)
>     at ../include/list.h:29
> #1  main (argc=4, argv=0xffffc96b2e28) at iscsiadm.c:3557
> 
> I use iscsi LIO target on x86_64 server running openSUSE Leap 15.1.
> 
> Also, open-iscsi-2.0.876-lp150.9.13.2.aarch64 works good at other aarch64
> system running Leap 15.0 with the same x86_64 target.
> 
> Unfortunately, `iscsiadm -m node --op show' crashed without any output.

The line that seems to be failing is:

> ...
> 3531         int timeout = ISCSID_REQ_TIMEOUT;
> 3532         struct sigaction sa_old;
> 3533         struct sigaction sa_new;
> ...
> 3552 
> 3553         INIT_LIST_HEAD(&params);
> 3554         INIT_LIST_HEAD(&ifaces);
> 3555         /* do not allow ctrl-c for now... */
> 3556         memset(&sa_old, 0, sizeof(struct sigaction));
> 3557         memset(&sa_new, 0, sizeof(struct sigaction)); <<== FAIL?
> 3558 
> ...

The stack trace says the memset() is failing, which seems impossible unless it is some sort of strange (new?) alignment error. But the stack trace output also mentions list.h:29, which has this code:

> ...
> 18 struct list_head {
> 19         struct list_head *next, *prev;
> 20 };  
> 21     
> 22 #define LIST_HEAD_INIT(name) { &(name), &(name) }
> 23
> ...    
> 27 static inline void INIT_LIST_HEAD(struct list_head *list)
> 28 {   
> 29         list->next = list;                           <<=== FAIL?
> 30         list->prev = list;
> 31 }   

This spot also does not make sense, unless (again) related to some sort of alignment error.
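To check that the quoted source is sound on its own, here is a minimal stand-alone sketch (not open-iscsi code) that repeats iscsiadm.c lines 3553-3557 together with the INIT_LIST_HEAD() helper from list.h, and adds explicit alignment checks for the alignment theory. The declarations of params and ifaces are elided in the excerpt, so declaring them as plain struct list_head locals is an assumption; check_stack_setup() and is_aligned() are hypothetical names. Built normally, every assertion holds and the function returns 0, which fits the suspicion that the crash comes from how the binary was built rather than from this code.

```c
#define _POSIX_C_SOURCE 200809L /* expose struct sigaction in <signal.h> */
#include <assert.h>
#include <signal.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Doubly linked list head, as in the list.h excerpt quoted above. */
struct list_head {
    struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* Hypothetical helper: nonzero if p is a multiple of align. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/*
 * Hypothetical reproduction of iscsiadm.c lines 3553-3557: initialise two
 * list heads and zero two struct sigaction objects, all stack locals.
 * Returns the number of objects found misaligned (0 in a correct build).
 */
int check_stack_setup(void)
{
    struct list_head params, ifaces; /* assumed declarations */
    struct sigaction sa_old, sa_new;
    int misaligned = 0;

    INIT_LIST_HEAD(&params);
    INIT_LIST_HEAD(&ifaces);
    memset(&sa_old, 0, sizeof(struct sigaction));
    memset(&sa_new, 0, sizeof(struct sigaction));

    /* An empty list points back at itself in both directions. */
    assert(params.next == &params && params.prev == &params);
    assert(ifaces.next == &ifaces && ifaces.prev == &ifaces);

    /* Test the alignment theory directly for each object. */
    misaligned += !is_aligned(&params, alignof(struct list_head));
    misaligned += !is_aligned(&ifaces, alignof(struct list_head));
    misaligned += !is_aligned(&sa_old, alignof(struct sigaction));
    misaligned += !is_aligned(&sa_new, alignof(struct sigaction));
    return misaligned;
}
```

Because the crash only appeared in the LTO build of the full program, a clean run of this small sketch shows only that the C code is valid; it is not expected to reproduce the segfault.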

I will build a test set of RPMs for you, with full debugging enabled and perhaps a few debug statements.
Comment 7 Matwey Kornilov 2019-07-31 17:53:37 UTC
Could it be related to LTO being enabled by default in Factory?
Comment 8 Lee Duncan 2019-07-31 18:32:22 UTC
Created attachment 812330
TEST aarch64 open-iscsi Tumbleweed RPMs

Please try these one-off test RPMs. Put them in a newly created subdirectory, cd there, and run "zypper up *.rpm".

The only change from standard build is adding the "-g" flag on compilation, so we can get a better stack trace.

If this works, I will add some debug statements, but I'm at a loss as to why a memset() of, or an assignment to, a variable on the stack would cause a core dump.
Comment 9 Lee Duncan 2019-07-31 18:46:03 UTC
(In reply to Matwey Kornilov from comment #7)
> Can it be related to LTO enabled by default in Factory?

Perhaps, especially if it was recently enabled.
Comment 10 Lee Duncan 2019-07-31 18:48:17 UTC
(In reply to Lee Duncan from comment #9)
> (In reply to Matwey Kornilov from comment #7)
> > Can it be related to LTO enabled by default in Factory?
> 
> Perhaps, especially if it was recently enabled.

If you can try the supplied test RPMs, that would be great. You should just be able to run "iscsiadm -m node --op show" and see whether it works, since that currently fails.

If this still fails, I can build a set of RPMs for you with LTO disabled.
Comment 11 Matwey Kornilov 2019-07-31 18:59:21 UTC
`iscsiadm -m node --op show' is still failing as previously...
Comment 12 Lee Duncan 2019-07-31 22:29:07 UTC
Created attachment 812370
V2 of TEST RPM tarball

I believe the previous tarball did not actually contain any changes, but this one appears to.

Please try it, and if it fails please try to get a stack trace with gdb, e.g. "gdb iscsiadm", then "run -m node --op show".
Comment 13 Matwey Kornilov 2019-08-01 16:53:26 UTC
Created attachment 812492
/sbin/iscsiadm -m node --op show

open-iscsi-2.0.877-240.1.aarch64 works for me. iscsi initiator service is started now.

Attached is the output for /sbin/iscsiadm -m node --op show
Comment 14 Lee Duncan 2019-08-01 18:43:55 UTC
(In reply to Matwey Kornilov from comment #13)
> Created attachment 812492 [details]
> /sbin/iscsiadm -m node --op show
> 
> open-iscsi-2.0.877-240.1.aarch64 works for me. iscsi initiator service is
> started now.
> 
> Attached is the output for /sbin/iscsiadm -m node --op show

Excellent. The output you attached doesn't really matter, since we have evidently found the problem, which is Link Time Optimization (LTO) on aarch64, but thank you for validating that it's running now. And I assume your iscsi service starts up correctly.

I will attach one more version of the RPMs. I still need to make a production version of my changes, and it would be very helpful if you could validate it.
Comment 15 Matwey Kornilov 2019-08-01 18:47:24 UTC
(In reply to Lee Duncan from comment #14)
> (In reply to Matwey Kornilov from comment #13)
> > Created attachment 812492 [details]
> > /sbin/iscsiadm -m node --op show
> > 
> > open-iscsi-2.0.877-240.1.aarch64 works for me. iscsi initiator service is
> > started now.
> > 
> > Attached is the output for /sbin/iscsiadm -m node --op show
> 
> Excellent. (he output you attached doesn't really matter, since we have
> evidently found the problem, which is Link Time Automation on aarch64, but
> thank you for validating it's running now. And I assume your iscsi service
> starts up correctly.
> 

Yes, block devices appeared too.

> I will attach one more version of the RPM. I still need to make a production
> version of my changes. Your help in validating it will be most helpful, if
> you can do that.

Sure. I can test more versions.
Comment 16 Lee Duncan 2019-08-01 19:20:51 UTC
Created attachment 812503
V3 of test aarch64 TEST RPMs for Tumbleweed

Please test what I hope is the last set of RPMs. This one has debugging turned back off and LTO disabled, this time only for aarch64.

Thank you for your help.
Comment 17 Matwey Kornilov 2019-08-01 20:01:04 UTC
(In reply to Lee Duncan from comment #16)
> Created attachment 812503 [details]
> V3 of test aarch64 TEST RPMs for Tumbleweed
> 
> Please test what I hope is the last set of RPMs. This one has debug turned
> back off but LTO disabled, this time only for aarch64
> 
> Thank you for your help.

open-iscsi-2.0.877-241.1.aarch64 also works.
Comment 18 Lee Duncan 2019-08-02 16:25:55 UTC
Submitted to Factory. See req#720704

I will leave this open until that is accepted.

Thank you Matwey for your help in testing!
Comment 19 Lee Duncan 2019-08-05 15:13:42 UTC
I believe this is fixed now.
Comment 20 Martin Liška 2019-08-07 12:12:03 UTC
Upstream bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91386
Comment 21 Martin Liška 2019-08-08 08:01:40 UTC
Ok, we've got a patch candidate for GCC. I'll later backport that to our gcc9 package. I'll take this issue.
Comment 22 Lee Duncan 2019-08-08 13:26:11 UTC
(In reply to Martin Liška from comment #21)
> Ok, we've got a patch candidate for GCC. I'll later backport that to our
> gcc9 package. I'll take this issue.

I can remove the "no lto" macro in open-iscsi when you fix gcc.
Comment 23 Martin Liška 2019-08-09 06:47:30 UTC
(In reply to Lee Duncan from comment #22)
> (In reply to Martin Liška from comment #21)
> > Ok, we've got a patch candidate for GCC. I'll later backport that to our
> > gcc9 package. I'll take this issue.
> 
> I can remove the "no lto" macro in open-iscsi when you fix gcc.

Yes, I'll do it once we have the fix in our gcc9 package.
Comment 24 Matwey Kornilov 2019-09-08 09:01:09 UTC
Currently, I still see the same behavior with open-iscsi-2.0.877-57.1.aarch64 from Factory.
Comment 25 Lee Duncan 2019-09-08 17:23:54 UTC
(In reply to Matwey Kornilov from comment #24)
> Currently, I still see the same behavior with
> open-iscsi-2.0.877-57.1.aarch64 from Factory.

I found the source RPM for this package, and sure enough LTO has been enabled again.

Can you please re-try the V3 TEST RPMs from this bug and confirm they still work? Then I'll disable LTO again.

Evidently Martin re-enabled it too early? @Martin?
Comment 27 Martin Liška 2019-09-09 07:46:18 UTC
Sorry, I didn't test that properly. Anyway, gcc9 is now updated in Factory. Can you please test the rebuilt package?
Comment 28 Martin Liška 2020-12-15 11:12:53 UTC
LTO is now used again.