Bugzilla – Bug 952621
alx ethernet driver crashes with fatal interrupt 0x4019607, resetting
Last modified: 2018-07-03 20:59:15 UTC
the alx ethernet driver (for example with the AR8161 chip) sometimes crashes with the following entry in dmesg:
    alx 0000:03:00.0 eth0: fatal interrupt 0x4019607, resetting
this happens on my openSUSE Tumbleweed x86_64 (previously also on 13.2) on an Asus G750J notebook.
it seems to be the same issue as the following:
    https://bugzilla.kernel.org/show_bug.cgi?id=70761
    https://lists.debian.org/debian-kernel/2015/06/msg00164.html
    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1264125
    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1330844
    https://bugs.archlinux.org/task/44315
    http://www.archivum.info/ubuntu-bugs@lists.ubuntu.com/2015-06/18667/%28Bug-1469628%29-%28NEW%29-alx-module-not-working-correctly.html
in https://bugzilla.kernel.org/show_bug.cgi?id=70761 a fix seems to have been found (3 patches are attached there which should fix this).
i tried to set up the build environment to patch, build, install and try the patched alx driver myself, but i failed.
OK, I created a quickfix KMP you can just install to either openSUSE 13.2, Leap or FACTORY. The packages are available at OBS home:tiwai:bnc952621 repo. (The build of Leap is still pending due to the kernel update.) Give it a try.
(In reply to Takashi Iwai from comment #1)
> OK, I created a quickfix KMP you can just install to either openSUSE 13.2,
> Leap or FACTORY. The packages are available at OBS home:tiwai:bnc952621
> repo.
> (The build of Leap is still pending due to the kernel update.)
>
> Give it a try.
great. thanks a lot.
as i have updated to latest stable kernel 4.2.5 from http://download.opensuse.org/repositories/Kernel:/stable/standard, will i be able to use this alx-quickfix-kmp-default-1.0_k4.2.3_1-2.1.x86_64.rpm ?
(In reply to Rainer Klier from comment #2)
> (In reply to Takashi Iwai from comment #1)
> > OK, I created a quickfix KMP you can just install to either openSUSE 13.2,
> > Leap or FACTORY. The packages are available at OBS home:tiwai:bnc952621
> > repo.
> > (The build of Leap is still pending due to the kernel update.)
> >
> > Give it a try.
>
> great.
>
> thanks a lot.
>
> as i have updated to latest stable kernel 4.2.5 from
> http://download.opensuse.org/repositories/Kernel:/stable/standard, will i be
> able to use this alx-quickfix-kmp-default-1.0_k4.2.3_1-2.1.x86_64.rpm ?
Maybe not. Try it with the latest FACTORY kernel at first.
Meanwhile I added kernel:stable and Kernel:HEAD builds to the project, so KMPs will be available later.
(In reply to Takashi Iwai from comment #3)
> (In reply to Rainer Klier from comment #2)
> > (In reply to Takashi Iwai from comment #1)
> Meanwhile I added kernel:stable and Kernel:HEAD builds to the project, so
> KMPs will be available later.
great. thanks.
can i use the source rpm (alx-quickfix-1.0-2.1.src.rpm) to build the module myself? if yes, HOW? what do i have to do to compile it for my running kernel?
i think i have installed most of the things to compile kernel modules, because i always install the nvidia proprietary driver on each kernel update from the driver package from nvidia. so i have installed:
    gcc-5-1.24.x86_64
    kernel-devel-4.2.5-1.1.g27d2719.noarch
    kernel-default-4.2.5-1.1.g27d2719.x86_64
    kernel-firmware-20150925git-35.1.noarch
    kernel-default-devel-4.2.5-1.1.g27d2719.x86_64
    kernel-source-4.2.5-1.1.g27d2719.noarch
    kernel-macros-4.2.5-1.1.g27d2719.noarch
I don't recommend it if you couldn't manage to build the kernel itself. KMP is tricky stuff. Just let OBS build the KMP.
(In reply to Takashi Iwai from comment #5)
> I don't recommend it if you couldn't manage to build the kernel itself. KMP
i didn't try to build the kernel itself lately. in the past i did, but that is years ago... ;-)
> is a tricky stuff. Just let OBS building the KMP.
ok. will your home:tiwai:bnc952621 repo update the packages when new kernels are out? so that i can install the next stable kernel when available, and then also update alx module from home:tiwai:bnc952621?
Yes.
(In reply to Takashi Iwai from comment #7)
> Yes.
thank you very much!!! if this works you helped me soo much! this error is so annoying....
pru, the author of the patches from https://bugzilla.kernel.org/show_bug.cgi?id=70761, just commented the following in https://bugzilla.kernel.org/show_bug.cgi?id=70761#c36:
> Note to the patch 0003 - it might be good to schedule rx refill on a timer instead immediate queue on underflow.
> I spent only more than one day on this so treat the patches as the proof of concept, even they do the job.
ok, i updated the kernel and installed the quickfix KMP for alx. so far everything works ok. we have to wait and see, because the bug appeared only from time to time, after some usage. i will update this ticket when i can report any news. for now, i can only say thank you very much!
(In reply to Rainer Klier from comment #10)
> ok, i updated kernel and installed the quickfix KMP for alx.
>
> we have to wait and see.
> because the bug appeared only from time to time, after some usage.
>
> i will update this ticket when i can report any news.
sadly, i had the crash again today:
    [ 287.796808] alx 0000:04:00.0 eth0: fatal interrupt 0x4019607, resetting
so it seems that it only takes longer to appear, but the changes do not solve it.
Just to be sure: you're still using the KMP driver, right? Check it via
    modinfo alx | grep filename
If it points to a directory containing "updates" or "weak-updates", it's fine, that's the KMP driver. If not, KMP isn't updated for the running kernel.
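The check described above can be sketched as a small shell helper. This is only an illustration: classify_module_path is a hypothetical name, and the labels "kmp" and "in-tree" are made up for this sketch. It only inspects the path string, following the rule that a KMP module lives under "updates" or "weak-updates".

```shell
#!/bin/sh
# Sketch: decide from a module file path whether it comes from a KMP.
# classify_module_path is a hypothetical helper, not part of any tool.
classify_module_path() {
    case "$1" in
        */weak-updates/*|*/updates/*) echo "kmp" ;;
        *) echo "in-tree" ;;
    esac
}

# On a live system the path can be taken from modinfo
# (-n prints only the module file name):
#   classify_module_path "$(modinfo -n alx)"
```

The modinfo call is left commented out so the helper can be tried on a machine without the alx module.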
(In reply to Takashi Iwai from comment #12)
> Just to be sure: you're still using the KMP driver, right? Check it via
> modinfo alx | grep filename
>
> If it points to a directory containing "updates" or "weak-updates", it's
> fine, that's the KMP driver. If not, KMP isn't updated for the running
> kernel.
    # modinfo alx | grep filename
    filename: /lib/modules/4.3.0-1.g7b374a4-default/updates/drivers/net/ethernet/atheros/alx/alx.ko
yes, it is the kmp driver.
about 2 weeks ago i had my first crash after using the kmp driver. but i wanted to be sure, and therefore i deleted the alx.ko file (and folder) from /lib/modules/4.3.0-1.g7b374a4-default/kernel/drivers/net so that IT MUST use the updated driver from the kmp package.
there is a new patch for this in https://bugzilla.kernel.org/show_bug.cgi?id=70761#c46 Takashi, could you please apply this new patch to the sources and create a new KMP module in your repo, so i can test it? thanks.
(In reply to Rainer Klier from comment #14)
> there is a new patch for this in
> https://bugzilla.kernel.org/show_bug.cgi?id=70761#c46
>
> Takashi, could you please apply this new patch to the sources and create a
> new KMP module in you repo, so i can test it?
>
> thanks.
OK, now updated. It's being rebuilt with the new patch. All old patches have been dropped there.
(In reply to Takashi Iwai from comment #15)
> OK, now updated. It's being rebuilt with the new patch. All old patches
> have been dropped there.
thanks.
i can see in http://download.opensuse.org/repositories/home:/tiwai:/bnc952621/ that the openSUSE_13.2, openSUSE_Factory and openSUSE_Leap_42.1 flavours were built on 2015/12/04, but Kernel_stable is still from 2015/12/02. i think/hope that Kernel_stable will follow.... ;-)
FYI, I merged the fix (improvement) patch from bugzilla.kernel.org that you tested via KMP into the SUSE kernel trees, as the patch finally got merged into the upstream tree. It'll be included in the next update kernels.
(In reply to Takashi Iwai from comment #17)
> FYI, I merged the fix (improvement) patch from bugzilla.kernel.org you
> tested via KMP into SUSE kernel trees, as the patch got finally merged to
> the upstream tree. It'll be included in the next update kernels.
great. thanks.
so, when installing next kernel 4.3.4 or 4.4.0 in the future, i don't need the extra kmp module any more until there comes up a new/another patch/fix which i will test via the extra kmp module?
(In reply to Rainer Klier from comment #18)
> (In reply to Takashi Iwai from comment #17)
> > FYI, I merged the fix (improvement) patch from bugzilla.kernel.org you
> > tested via KMP into SUSE kernel trees, as the patch got finally merged to
> > the upstream tree. It'll be included in the next update kernels.
>
> great.
> thanks.
>
> so, when installing next kernel 4.3.4 or 4.4.0 in the future, i don't need
> the extra kmp module any more until there comes up a new/another patch/fix
> which i will test via the extra kmp module?
Right.
Although this doesn't seem to fix the issue completely, I'd like to close this bug report for now in order to clean up the bug tracking. For further crashes, please open another bug report to track the remaining issue, and mention this bug number there. Thanks.
openSUSE-SU-2016:0280-1: An update that solves 10 vulnerabilities and has 18 fixes is now available.
Category: security (important)
Bug References: 865096,865259,913996,950178,950998,952621,954324,954532,954647,955422,956708,957152,957988,957990,958439,958463,958504,958510,958886,958951,959190,959399,960021,960710,961263,961509,962075,962597
CVE References: CVE-2015-7550,CVE-2015-8539,CVE-2015-8543,CVE-2015-8550,CVE-2015-8551,CVE-2015-8552,CVE-2015-8569,CVE-2015-8575,CVE-2015-8767,CVE-2016-0728
Sources used:
openSUSE Leap 42.1 (src): kernel-debug-4.1.15-8.1, kernel-default-4.1.15-8.1, kernel-docs-4.1.15-8.3, kernel-ec2-4.1.15-8.1, kernel-obs-build-4.1.15-8.2, kernel-obs-qa-4.1.15-8.1, kernel-obs-qa-xen-4.1.15-8.1, kernel-pae-4.1.15-8.1, kernel-pv-4.1.15-8.1, kernel-source-4.1.15-8.1, kernel-syms-4.1.15-8.1, kernel-vanilla-4.1.15-8.1, kernel-xen-4.1.15-8.1
sorry, i have to reopen this. the problem still persists. currently i use kernel 4.6.0 from the Kernel:stable_standard repo and i still have the crashes.
but there is some news in the corresponding issue on bugzilla.kernel.org: https://bugzilla.kernel.org/show_bug.cgi?id=70761#c66
Feng Tang from Intel created a new "new_skb_allocator". maybe this new patch helps.
could you please again create a patched version of alx with this patch, and place it in the same repo we used the last time (http://download.opensuse.org/repositories/home:/tiwai:/bnc952621/)? thanks a lot.
OK, updated the package now, and it's being rebuilt. Give it a try later.
(In reply to Takashi Iwai from comment #22)
> OK, updated the package now, and it's being rebuilt. Give it a try later.
thank you very much. i installed it, and will report ASAP if i can observe anything.
as i recently updated to kernel 4.6.0-1.gd9e67cc-default, i didn't update the whole kernel again to have exactly 4.6.0_1.gaf7ce24 like your module wants. i simply installed your package with
    rpm -Uhv --force --nodeps --notriggers --noscripts /home/rainer/Downloads/alx-quickfix-kmp-default-1.0_k4.6.0_1.gaf7ce24-4.1.x86_64.rpm
and then moved the new alx.ko file to /lib/modules/4.6.0-1.gd9e67cc-default, and, to be sure, i deleted/moved the original alx.ko to some backup folder, and then did a "depmod -a". i hope this is ok.
at least i have networking after reboot and dmesg reports:
    [ 6.137860] alx 0000:04:00.0 eth0: Qualcomm Atheros AR816x/AR817x Ethernet [ac:22:0b:b8:00:59]
    [ 10.977642] alx 0000:04:00.0 eth0: NIC Up: 1 Gbps Full
    [ 18.065415] alx 0000:04:00.0 eth0: NIC Up: 1 Gbps Full
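The version mismatch mentioned above (package built for 4.6.0_1.gaf7ce24, running kernel 4.6.0-1.gd9e67cc-default) can be detected before installing. The sketch below is only an illustration: kmp_kernel_version and kmp_matches_kernel are hypothetical helper names, and it assumes the naming seen in this thread, where the kernel version follows "_k" in the package file name with "-" written as "_". It compares strings only and does not check whether the module would actually load.

```shell
#!/bin/sh
# Sketch under the naming assumption described above.
# kmp_kernel_version: print the kernel version embedded in a KMP file name.
kmp_kernel_version() {
    printf '%s\n' "$1" | sed -n 's/.*_k\([^-]*\)-.*/\1/p' | tr '_' '-'
}

# kmp_matches_kernel PKG RELEASE: succeed if the package was built for RELEASE.
kmp_matches_kernel() {
    pkg_ver=$(kmp_kernel_version "$1")
    run_ver=${2%-*}    # drop the flavor suffix, e.g. "-default"
    [ -n "$pkg_ver" ] && [ "$pkg_ver" = "$run_ver" ]
}

pkg=alx-quickfix-kmp-default-1.0_k4.6.0_1.gaf7ce24-4.1.x86_64.rpm
if kmp_matches_kernel "$pkg" "4.6.0-1.gd9e67cc-default"; then
    echo "match"
else
    echo "mismatch: package built for $(kmp_kernel_version "$pkg")"
fi
```

On a live system the second argument would be "$(uname -r)" instead of the literal release string used here.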
hi Takashi,
(In reply to Takashi Iwai from comment #22)
> OK, updated the package now, and it's being rebuilt. Give it a try later.
the kernel.org ticket was updated: https://bugzilla.kernel.org/show_bug.cgi?id=70761#c75
there seems to be a new solution from Feng Tang and he is asking for testing-help. if you could integrate the new patch and build a new module i could test it.
OK, the OBS repo was updated again with the new patch. Give it a try.
(In reply to Takashi Iwai from comment #25)
> OK, the OBS repo was updated again with the new patch. Give it a try.
great! thanks! i will do this immediately!
hi Takashi, i installed the new module, and there are no problems yet. but as i have been using it for only a few hours, this does not really tell us much. but i have a question for you: which patches does the current module you made include? only the latest work_around_dma_issue.patch, or also all the previous patches? do the patches mutually exclude each other, or are they intended to be used together, or can they, but don't have to be used together?
Only work_around_dma_issue.patch. The new skb alloc isn't enabled in KMP.
sadly, the crash is still here... :-(
i tried to download http://gd.tuwien.ac.at/opsys/linux/opensuse/tumbleweed/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20160530-Media.iso and after about half the download was finished the crash happened:
    2016-06-01T12:42:41.588079+02:00 lap051-linux kernel: [14812.714591] alx 0000:04:00.0 eth0: fatal interrupt 0x4019607, resetting
    2016-06-01T12:42:41.588093+02:00 lap051-linux kernel: [14812.714609] alx 0000:04:00.0 eth0: fatal interrupt 0x4019607, resetting
    2016-06-01T12:42:41.588226+02:00 lap051-linux kernel: [14812.714784] alx 0000:04:00.0 eth0: fatal interrupt 0x4019607, resetting
maybe the new_skb_allocator patch which we tried a few days ago works better:
    https://bugzilla.opensuse.org/show_bug.cgi?id=952621#c21
    https://bugzilla.opensuse.org/show_bug.cgi?id=952621#c22
could we again try this patch?
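Since the crash only shows up from time to time, counting the reset messages in the kernel log makes it easier to compare driver builds. A minimal sketch, assuming the message format quoted in this report; count_alx_resets is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count alx "fatal interrupt ... resetting" lines read from stdin.
# count_alx_resets is a hypothetical helper name.
count_alx_resets() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c 'alx .*fatal interrupt 0x[0-9a-fA-F]*, resetting' || true
}

# Demo with the three log lines quoted in this report.
count_alx_resets <<'EOF'
2016-06-01T12:42:41.588079+02:00 lap051-linux kernel: [14812.714591] alx 0000:04:00.0 eth0: fatal interrupt 0x4019607, resetting
2016-06-01T12:42:41.588093+02:00 lap051-linux kernel: [14812.714609] alx 0000:04:00.0 eth0: fatal interrupt 0x4019607, resetting
2016-06-01T12:42:41.588226+02:00 lap051-linux kernel: [14812.714784] alx 0000:04:00.0 eth0: fatal interrupt 0x4019607, resetting
EOF
```

On a running system the input could come from "dmesg | count_alx_resets" or "journalctl -k | count_alx_resets".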
hi Takashi, as this bug still exists and is annoying, i again searched in bugzilla.kernel.org for it and found https://bugzilla.kernel.org/show_bug.cgi?id=102171 and this seems to be exactly the same problem. can you please check this bug, especially comments 6 and 7, and see if you can do anything with this information? maybe you can again create a test kernel module which i can test. thanks in advance.
hi Takashi, can you please look at https://bugzilla.kernel.org/show_bug.cgi?id=102171#c16 would it be possible for you to create a patched 4.9.0 kernel based on kernel from http://download.opensuse.org/repositories/Kernel:/stable/standard with the changes/patches/reverts from https://bugzilla.kernel.org/attachment.cgi?id=248301&action=diff ?
OK, I've now created a repo building the patched kernel in OBS home:tiwai:bnc952621-kernel. Give it a try later once the build finishes.
hi Takashi,
(In reply to Takashi Iwai from comment #32)
> OK, I created now a repo building the patched kernel in OBS
> home:tiwai:bnc952621-kernel. Give it a try later once when the build
> finishes.
thank you very much! i will try it and give feedback ASAP.
did you see https://bugzilla.kernel.org/show_bug.cgi?id=102171#c18 ? what does this mean? that the patch does not work?
hi Takashi, i tried the kernel you made. i thought it worked and the problem was solved, but yesterday it happened again. so this is not the fix. :-(
Does this still happen with 4.11?
(In reply to Jiri Slaby from comment #35)
> Does this still happen with 4.11?
hi, thanks for asking. yes, unfortunately it still does... :-( currently i am using 4.11.7.
This is handled in https://bugzilla.kernel.org/show_bug.cgi?id=102171, apparently.