Bug 1106833 - "NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out" panic effectively takes network interface down
Status: RESOLVED UPSTREAM
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other Other
Importance: P5 - None, Major
Target Milestone: ---
Assigned To: E-mail List
Reported: 2018-09-01 14:56 UTC by Andrei Dziahel
Modified: 2018-10-12 02:18 UTC (History)


Attachments
relevant dmesg log (213.91 KB, text/plain)
2018-09-03 23:06 UTC, Andrei Dziahel

Description Andrei Dziahel 2018-09-01 14:56:34 UTC
Steps to reproduce:

1. Boot, then run a torrent client that is actively downloading and uploading data.

Expected: no panic.

Instead: a kernel panic eventually happens, and after a while the device becomes inaccessible over the network.

Relevant snippet from journal:

вер 01 16:00:55 server kernel: ------------[ cut here ]------------
вер 01 16:00:55 server kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
вер 01 16:00:55 server kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x21a/0x220
вер 01 16:00:55 server kernel: Modules linked in: fuse rfcomm nf_log_ipv6 xt_comment nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft iscsi_boot_sysfs bnep msr ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 xt_pkttype xt_tcpudp iptable_filter bpfilter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic arc4 intel_rapl x86_pkg_temp_thermal snd_hda_intel intel_powerclamp snd_hda_codec iwldvm snd_hda_core snd_hwdep btusb snd_pcm coretemp mac80211 btrtl btbcm snd_timer gpio_ich btintel snd iwlwifi kvm soundcore irqbypass bluetooth iTCO_wdt r8169 mei_me iTCO_vendor_support
вер 01 16:00:55 server kernel:  i2c_i801 cfg80211 mei lpc_ich pcc_cpufreq crct10dif_pclmul pcspkr crc32_pclmul ecdh_generic mii rfkill ghash_clmulni_intel ie31200_edac thermal fan cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c crc32c_intel i915 sr_mod cdrom serio_raw ehci_pci ehci_hcd i2c_algo_bit drm_kms_helper xhci_pci xhci_hcd syscopyarea sysfillrect sysimgblt fb_sys_fops video drm button usbcore sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
вер 01 16:00:55 server kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.18.5-1-default #1 openSUSE Tumbleweed (unreleased)
вер 01 16:00:55 server kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77N-WIFI, BIOS F3 05/29/2013
вер 01 16:00:55 server kernel: RIP: 0010:dev_watchdog+0x21a/0x220
вер 01 16:00:55 server kernel: Code: 49 63 4c 24 e8 eb 8c 4c 89 ef c6 05 d0 97 c2 00 01 e8 4a fe fc ff 89 d9 4c 89 ee 48 c7 c7 80 10 11 a4 48 89 c2 e8 20 26 98 ff <0f> 0b eb be 66 90 0f 1f 44 00 00 41 57 45 89 cf 41 56 49 89 d6 41 
вер 01 16:00:55 server kernel: RSP: 0018:ffff99375f303e98 EFLAGS: 00010292
вер 01 16:00:55 server kernel: RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000000
вер 01 16:00:55 server kernel: RDX: 0000000000040400 RSI: 00000000000000f6 RDI: 0000000000000300
вер 01 16:00:55 server kernel: RBP: ffff9937534f245c R08: 0000000000000001 R09: 000000000000037b
вер 01 16:00:55 server kernel: R10: 0000000000000004 R11: 0000000000000000 R12: ffff9937534f2478
вер 01 16:00:55 server kernel: R13: ffff9937534f2000 R14: 0000000000000001 R15: ffff993751d48480
вер 01 16:00:55 server kernel: FS:  0000000000000000(0000) GS:ffff99375f300000(0000) knlGS:0000000000000000
вер 01 16:00:55 server kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
вер 01 16:00:55 server kernel: CR2: 00007fd329765000 CR3: 000000008620a006 CR4: 00000000001606e0
вер 01 16:00:55 server kernel: Call Trace:
вер 01 16:00:55 server kernel:  <IRQ>
вер 01 16:00:55 server kernel:  ? pfifo_fast_dequeue+0x160/0x160
вер 01 16:00:55 server kernel:  call_timer_fn+0x2b/0x150
вер 01 16:00:55 server kernel:  ? pfifo_fast_dequeue+0x160/0x160
вер 01 16:00:55 server kernel:  run_timer_softirq+0x3c3/0x3f0
вер 01 16:00:55 server kernel:  ? _raw_spin_lock_irq+0x15/0x40
вер 01 16:00:55 server kernel:  ? __hrtimer_run_queues+0x100/0x2a0
вер 01 16:00:55 server kernel:  ? recalibrate_cpu_khz+0x10/0x10
вер 01 16:00:55 server kernel:  ? ktime_get+0x36/0xa0
вер 01 16:00:55 server kernel:  __do_softirq+0x111/0x370
вер 01 16:00:55 server kernel:  irq_exit+0xca/0xd0
вер 01 16:00:55 server kernel:  smp_apic_timer_interrupt+0x74/0x160
вер 01 16:00:55 server kernel:  apic_timer_interrupt+0xf/0x20
вер 01 16:00:55 server kernel:  </IRQ>
вер 01 16:00:55 server kernel: RIP: 0010:cpuidle_enter_state+0xbc/0x2e0
вер 01 16:00:55 server kernel: Code: 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 00 02 00 00 31 ff e8 30 f3 a8 ff e8 db 10 af ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f 
вер 01 16:00:55 server kernel: RSP: 0018:ffffb1b7c0cebeb0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
вер 01 16:00:55 server kernel: RAX: 0000000000000001 RBX: 0000003ea2d92c8a RCX: 000000000000001f
вер 01 16:00:55 server kernel: RDX: 0000003ea2d92c8a RSI: 0000000000022980 RDI: 0000000000000000
вер 01 16:00:55 server kernel: RBP: 0000000000000004 R08: 000000df6ad921de R09: 0000000000000f48
вер 01 16:00:55 server kernel: R10: 0000000000002042 R11: ffff99375f3220a8 R12: ffffd1b7bfd10a30
вер 01 16:00:55 server kernel: R13: ffffffffa42dae98 R14: 0000003ea1c90aad R15: 0000000000000000
вер 01 16:00:55 server kernel:  do_idle+0x21d/0x270
вер 01 16:00:55 server kernel:  cpu_startup_entry+0x5f/0x70
вер 01 16:00:55 server kernel:  start_secondary+0x1a0/0x1e0
вер 01 16:00:55 server kernel:  secondary_startup_64+0xa5/0xb0
вер 01 16:00:55 server kernel: ---[ end trace 08f8f757effbd008 ]---
вер 01 16:00:55 server kernel: r8169 0000:02:00.0 enp2s0: link up
вер 01 16:01:58 server wickedd-dhcp4[1146]: enp2s0: Committed DHCPv4 lease with address 192.168.1.250 (lease time 600 sec, renew in 300 sec, rebind in 525 sec)
вер 01 16:01:58 server wickedd[1171]: route ipv4 0.0.0.0/0 via 192.168.1.1 dev enp2s0#2 type unicast table main scope universe protocol dhcp covered by a ipv4:dhcp lease
вер 01 16:02:20 server kernel: r8169 0000:02:00.0 enp2s0: link up
вер 01 16:02:39 server kernel: r8169 0000:02:00.0 enp2s0: link up
вер 01 16:05:47 server kernel: r8169 0000:02:00.0 enp2s0: link up
вер 01 16:05:53 server kernel: r8169 0000:02:00.0 enp2s0: link up

The issue is currently worked around by booting the unaffected kernel v4.17.15.
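
A minimal sketch of how the journal snippet above could be scanned automatically, assuming the messages keep the exact format shown in this report. The helper names `find_watchdog_timeouts` and `count_link_up` are hypothetical and exist only for illustration; the sample lines are shortened copies of the log lines above.

```python
import re

# Pattern for the watchdog message format seen in the journal snippet above.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return (interface, driver, queue) for every watchdog timeout line."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits


def count_link_up(lines, iface):
    """Count 'link up' messages for one interface (hypothetical helper)."""
    suffix = f"{iface}: link up"
    return sum(1 for line in lines if line.rstrip().endswith(suffix))


# Shortened copies of lines from the journal snippet in this report.
sample = [
    "kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out",
    "kernel: r8169 0000:02:00.0 enp2s0: link up",
    "kernel: r8169 0000:02:00.0 enp2s0: link up",
]

print(find_watchdog_timeouts(sample))
print(count_link_up(sample, "enp2s0"))
```

Repeated "link up" messages after a watchdog timeout, as in the log above, are what this sketch counts; it does not diagnose the cause.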
Comment 1 Takashi Iwai 2018-09-03 14:05:37 UTC
David, another r8169 bug report.
Comment 2 david chang 2018-09-03 15:17:30 UTC
(In reply to Takashi Iwai from comment #1)
> David, another r8169 bug report.

Ok, I'll take a look into it. Thanks! :)
Comment 3 david chang 2018-09-03 15:31:57 UTC
It's similar to bsc#1105573: after upgrading the kernel from v4.17 to v4.18, the network interface fails. But I'm not sure which Realtek chip version you have.
Could you please provide the full 'dmesg' kernel log? Thanks!

If your Realtek chip is "RTL8169/RTL8169s/RTL8110s", it's worth trying the KMP:
https://download.opensuse.org/repositories/home:/david_chang:/bsc1105573/openSUSE_Tumbleweed/x86_64/r8169-kmp-12d42c5-kmp-default-2.3_k4.18.5_1-1.1.x86_64.rpm
Comment 4 Andrei Dziahel 2018-09-03 23:06:32 UTC
Created attachment 781763 [details]
relevant dmesg log

@david.chang there
Comment 5 david chang 2018-09-05 09:26:19 UTC
(In reply to Andrei Dziahel from comment #4)
> Created attachment 781763 [details]
> relevant dmesg log
> 
> @david.chang there

Your chip is 'RTL8168evl/8111evl', so it might be a different problem.
I'll look into it later, thanks!
Comment 6 Andrei Dziahel 2018-09-05 09:47:48 UTC
(In reply to david chang from comment #5)
> (In reply to Andrei Dziahel from comment #4)
> > Created attachment 781763 [details]
> > relevant dmesg log
> > 
> > @david.chang there
> 
> Your chip is 'RTL8168evl/8111evl', so it might be a different problem.
> I'll look into it later, thanks!

Thanks mate, looking forward to it! I'm OK to test a custom kernel if it comes to that.
Comment 7 david chang 2018-09-06 08:00:25 UTC
Hi,

I got a laptop with the same chip.

[   17.388724] r8169 0000:01:00.0 eth0: RTL8168evl/8111evl at 0x00000000e223b803, 44:1e:a1:79:1a:a5, XID 0c900800 IRQ 39

Linux linux-7gzy 4.17.12-1-default 
Welcome to openSUSE Tumbleweed 20180808 - Kernel \r (\l).

I'll update the kernel to v4.18 later. How long does it take to reproduce this issue when transferring data? Is there an easier way to reproduce it? Thanks!
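
The chip identification used in this thread can be sketched as a small parser for the r8169 probe line quoted above. This assumes the dmesg line keeps the format shown in this comment; `parse_chip_line` and `old_kmp_applies` are hypothetical helpers, and the chip set is taken only from the list in comment 3.

```python
import re

# Format of the r8169 probe line as quoted in this comment.
PROBE_RE = re.compile(
    r"r8169 (?P<pci>\S+) (?P<iface>\S+): (?P<chip>\S+) at 0x[0-9a-f]+, "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), "
    r"XID (?P<xid>[0-9a-f]+) IRQ (?P<irq>\d+)"
)

# Chips named in comment 3 as candidates for the bsc#1105573 KMP (assumption:
# the names appear in dmesg exactly as written there).
OLD_KMP_CHIPS = {"RTL8169", "RTL8169s", "RTL8110s"}


def parse_chip_line(line):
    """Return a dict of fields from an r8169 probe line, or None if no match."""
    match = PROBE_RE.search(line)
    return match.groupdict() if match else None


def old_kmp_applies(chip):
    """True if the chip is one of those listed in comment 3."""
    return chip in OLD_KMP_CHIPS


line = (
    "[   17.388724] r8169 0000:01:00.0 eth0: RTL8168evl/8111evl at "
    "0x00000000e223b803, 44:1e:a1:79:1a:a5, XID 0c900800 IRQ 39"
)
info = parse_chip_line(line)
print(info["chip"], info["xid"], old_kmp_applies(info["chip"]))
```

For the line above, the chip is not in the comment 3 list, which matches the conclusion in comment 5 that this may be a different problem.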
Comment 8 Andrei Dziahel 2018-09-06 11:16:00 UTC
> I'll update the kernel to v4.18 later. How long does it take to
> reproduce this issue when transferring data? Is there an easier way
> to reproduce it? Thanks!

It took only a few minutes of downloading a torrent with thousands of active peers to bring the link down. I haven't tested it without network activity yet.
Comment 9 david chang 2018-10-08 08:45:23 UTC
Hi,

I still have not encountered this issue, but there is an upstream patch you might be interested in.
https://www.spinics.net/lists/netdev/msg526003.html

I created a KMP with the patch; would you please give it a try? Thanks!
https://build.opensuse.org/package/binary/download/home:david_chang:bsc1106833/r8169-backport-ad5f97f/openSUSE_Tumbleweed/x86_64/r8169-backport-ad5f97f-kmp-default-2.3_k4.18.9_1-1.1.x86_64.rpm
Comment 10 Andrei Dziahel 2018-10-08 11:27:43 UTC
(In reply to david chang from comment #9)
> Hi,
> 
> I still have not encountered this issue, but there is an upstream patch you
> might be interested in.
> https://www.spinics.net/lists/netdev/msg526003.html
> 
> I created a KMP with the patch; would you please give it a try? Thanks!
> https://build.opensuse.org/package/binary/download/home:david_chang:
> bsc1106833/r8169-backport-ad5f97f/openSUSE_Tumbleweed/x86_64/r8169-backport-
> ad5f97f-kmp-default-2.3_k4.18.9_1-1.1.x86_64.rpm

OK, kernel 4.18.9 with this KMP is not affected. Let me try removing the patched module.
Comment 11 Andrei Dziahel 2018-10-08 11:37:10 UTC
The patched KMP has been removed, and the panic popped up almost immediately.
Comment 12 Andrei Dziahel 2018-10-08 18:42:55 UTC
(In reply to david chang from comment #9)
> Hi,
> 
> I still have not encountered this issue, but there is an upstream patch you
> might be interested in.
> https://www.spinics.net/lists/netdev/msg526003.html
> 
> I created a KMP with the patch; would you please give it a try? Thanks!
> https://build.opensuse.org/package/binary/download/home:david_chang:
> bsc1106833/r8169-backport-ad5f97f/openSUSE_Tumbleweed/x86_64/r8169-backport-
> ad5f97f-kmp-default-2.3_k4.18.9_1-1.1.x86_64.rpm

OK, so the patch seems to fix the issue for me: I left a NAS with the patch applied for a couple of hours and it's doing great! Should I file a request to https://build.opensuse.org/package/show/Kernel:stable/kernel-source?

Thank you David!
Comment 13 david chang 2018-10-11 01:50:19 UTC
(In reply to Andrei Dziahel from comment #12)
> (In reply to david chang from comment #9)
> > Hi,
> > 
> > I still have not encountered this issue, but there is an upstream patch you
> > might be interested in.
> > https://www.spinics.net/lists/netdev/msg526003.html
> > 
> > I created a KMP with the patch; would you please give it a try? Thanks!
> > https://build.opensuse.org/package/binary/download/home:david_chang:
> > bsc1106833/r8169-backport-ad5f97f/openSUSE_Tumbleweed/x86_64/r8169-backport-
> > ad5f97f-kmp-default-2.3_k4.18.9_1-1.1.x86_64.rpm
> 
> > OK, so the patch seems to fix the issue for me: I left a NAS with the patch
> > applied for a couple of hours and it's doing great! Should I file a request to
> https://build.opensuse.org/package/show/Kernel:stable/kernel-source?

Thank you for testing and confirming.

No. The patch will be merged into the -stable tree, so it will reach the TW kernel tree soon.
FYI: https://www.spinics.net/lists/netdev/msg526492.html
In my opinion, you can keep using this KMP until TW's kernel is updated.
Comment 14 Andrei Dziahel 2018-10-11 09:52:20 UTC
(In reply to david chang from comment #13)
> 
> Thank you for testing and confirming.
> 
> No. The patch will be merged into the -stable tree, so it will reach the TW
> kernel tree soon.
> FYI: https://www.spinics.net/lists/netdev/msg526492.html
> In my opinion, you can keep using this KMP until TW's kernel is updated.

Yup, I'm perfectly fine with that. Which update would that be: 4.18.13+ or 4.19.x?
Comment 15 david chang 2018-10-12 02:18:57 UTC
(In reply to Andrei Dziahel from comment #14)
> Yup, I'm perfectly fine with that. Which update would that be: 4.18.13+ or
> 4.19.x?

It will be merged into the v4.18.13+ stable kernel.

Please feel free to reopen this report if needed. Thank you!
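
As a rough sketch, the "v4.18.13+" statement above can be turned into a version check on a kernel release string. The threshold comes only from this comment and is not verified against the stable tree here; `parse_release` and `likely_has_fix` are hypothetical names, and "4.18.13-1-default" is an illustrative release string that does not appear in this report.

```python
import re

# Threshold taken from the comment above; an assumption, not a verified fact.
FIXED_IN = (4, 18, 13)


def parse_release(release):
    """Parse the leading 'major.minor[.patch]' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def likely_has_fix(release):
    """Heuristic only: compares version numbers, not the actual patch set."""
    return parse_release(release) >= FIXED_IN


# The first three releases appear in this report; the last one is illustrative.
for rel in ("4.17.15", "4.18.5-1-default", "4.18.9-1-default", "4.18.13-1-default"):
    print(rel, likely_has_fix(rel))
```

On a running system, `platform.release()` from the Python standard library could supply the release string. Note that the check is purely numeric: it reports False for 4.17.15 even though that kernel was unaffected according to the description, and it says nothing about whether the patched KMP is installed.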