Bug 1119649 - Realtek r8169 receive performance regression in kernel 4.19
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64, OS: openSUSE Factory
Importance: P3 - Medium, Normal
Target Milestone: ---
Assigned To: david chang
QA Contact: E-mail List
Depends on:
Blocks:
Reported: 2018-12-16 10:13 UTC by Martti Laaksonen
Modified: 2019-05-06 03:23 UTC
CC List: 4 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
2 GB file transfer test over NFS with kernel 4.18 (5.19 KB, text/plain)
2018-12-16 10:13 UTC, Martti Laaksonen
2 GB file transfer test over NFS with kernel 4.19 (4.94 KB, text/plain)
2018-12-16 10:15 UTC, Martti Laaksonen
repeat 2 GB file transfer test with 4.19 kernel with NM connection auto-negotiation enabled (5.26 KB, text/plain)
2018-12-19 17:36 UTC, Martti Laaksonen
Test results with r8169-098b01a driver version (8.44 KB, text/plain)
2019-01-04 15:47 UTC, Martti Laaksonen
Test results with r8169-9675931 driver version (8.49 KB, text/plain)
2019-01-04 15:50 UTC, Martti Laaksonen
Rerun file transfer test with kernel 4.18.15 (9.76 KB, text/plain)
2019-01-09 18:55 UTC, Martti Laaksonen
Rerun file transfer test with kernel 4.19.11 (9.79 KB, text/plain)
2019-01-09 18:56 UTC, Martti Laaksonen
Rerun file transfer test with kernel 4.19.11 + r8169-098b01a (10.70 KB, text/plain)
2019-01-09 18:58 UTC, Martti Laaksonen
Rerun file transfer test with kernel 4.19.11 + r8169-9675931 (10.75 KB, text/plain)
2019-01-09 19:00 UTC, Martti Laaksonen
File transfer test with kernel 4.19.11 + r8169-a2965f1 (10.75 KB, text/plain)
2019-01-09 19:04 UTC, Martti Laaksonen
File transfer test with kernel 4.19.11 + r8169-6fcf9b1 (10.75 KB, text/plain)
2019-01-09 19:05 UTC, Martti Laaksonen
File transfer test with kernel 4.19.11 + r8169-a2965f1 (new build) (10.85 KB, text/plain)
2019-01-10 18:59 UTC, Martti Laaksonen
File transfer test with kernel 4.19.11 + r8169-6fcf9b1 (new build) (10.71 KB, text/plain)
2019-01-10 19:00 UTC, Martti Laaksonen
File transfer test with kernel 4.19.11 + proprietary r8168 (9.88 KB, text/plain)
2019-01-10 19:01 UTC, Martti Laaksonen
File transfer test with Manjaro 18.0, kernel 4.19.13, r8169 (10.59 KB, text/plain)
2019-01-10 19:02 UTC, Martti Laaksonen
File transfer test with Manjaro 18.0, kernel 4.19.13, r8168 (8.20 KB, text/plain)
2019-01-10 19:03 UTC, Martti Laaksonen
File transfer test with kernel 4.18.15 (11.68 KB, text/plain)
2019-01-31 18:04 UTC, Martti Laaksonen
File transfer test with kernel 4.18.15, pcie_aspm=off (11.87 KB, text/plain)
2019-01-31 18:06 UTC, Martti Laaksonen
File transfer test with kernel 4.18.15, pcie_aspm.policy=performance (12.15 KB, text/plain)
2019-01-31 18:09 UTC, Martti Laaksonen
File transfer test with kernel 4.19.11, pcie_aspm=off (11.90 KB, text/plain)
2019-01-31 18:10 UTC, Martti Laaksonen
File transfer test with kernel 4.19.11, pcie_aspm.policy=performance (11.99 KB, text/plain)
2019-01-31 18:11 UTC, Martti Laaksonen
iperf test with kernel 4.18.15 (16.98 KB, text/plain)
2019-02-02 07:11 UTC, Martti Laaksonen
iperf test with kernel 4.19.11 (16.73 KB, text/plain)
2019-02-02 07:12 UTC, Martti Laaksonen
File transfer test with kernel 4.20.6 + r8169-test-kmp (12.61 KB, text/plain)
2019-02-12 17:54 UTC, Martti Laaksonen
iperf test with kernel 4.20.6 + r8169-test-kmp (22.32 KB, text/plain)
2019-02-12 17:55 UTC, Martti Laaksonen
File transfer test with kernel 4.20.6 using stock r8169 (11.69 KB, text/plain)
2019-02-13 17:13 UTC, Martti Laaksonen
iperf test with kernel 4.20.6 using stock r8169 (21.40 KB, text/plain)
2019-02-13 17:14 UTC, Martti Laaksonen

Description Martti Laaksonen 2018-12-16 10:13:02 UTC
It looks like there's some sort of performance regression in the Realtek r8169 driver in kernel 4.19.

When receiving files over NFS on a 1 Gb Ethernet network, I am seeing receive speeds in the 10-12 MB/s range (~100 Mbps), whereas transmit speeds are around 80-90 MB/s (~700 Mbps). The ethtool device statistics also show quite a lot of rx_missed errors.

With kernel 4.18.15, both receive and transmit speeds are closer to what I would expect: receive speed is around 90-96 MB/s (~760 Mbps) and transmit speed is again around 80-90 MB/s (~700 Mbps) on average.
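The Mbps figures in parentheses are simply the MB/s figures multiplied by 8 bits per byte and rounded. A minimal Python sketch of that conversion, assuming decimal units (1 MB = 10^6 bytes); the helper name `mb_s_to_mbps` is hypothetical. It also relates each figure to the raw 1 Gbps line rate (125 MB/s before protocol overhead) for reference.

```python
BITS_PER_BYTE = 8
GIGABIT_LINE_RATE_MB_S = 1000 / BITS_PER_BYTE  # 1 Gbps = 125.0 MB/s (decimal units)


def mb_s_to_mbps(mb_per_s: float) -> float:
    """Convert throughput in MB/s (10**6 bytes/s) to Mbps (10**6 bits/s)."""
    return mb_per_s * BITS_PER_BYTE


# Upper ends of the ranges quoted in the report.
for label, rate in [("4.19 receive", 12), ("4.18 receive", 96), ("transmit", 90)]:
    share = rate / GIGABIT_LINE_RATE_MB_S
    print(f"{label}: {rate} MB/s = {mb_s_to_mbps(rate):.0f} Mbps ({share:.0%} of 1 Gbps)")
```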

I've included file transfer test results along with ethtool output with both kernel versions. I hope they're at least somewhat useful.
Comment 1 Martti Laaksonen 2018-12-16 10:13:57 UTC
Created attachment 792809 [details]
2 GB file transfer test over NFS with kernel 4.18
Comment 2 Martti Laaksonen 2018-12-16 10:15:58 UTC
Created attachment 792810 [details]
2 GB file transfer test over NFS with kernel 4.19
Comment 3 Takashi Iwai 2018-12-18 12:47:31 UTC
There have been a fair number of changes to the r8169 driver between 4.18 and 4.19.

David, any clue?
Comment 4 david chang 2018-12-19 07:30:31 UTC
(In reply to Takashi Iwai from comment #3)
> There have been a fair number of changes to the r8169 driver between 4.18
> and 4.19.
> 
> David, any clue?

I will look into it, thanks Takashi!
Comment 5 david chang 2018-12-19 09:03:47 UTC
I quickly compared these results and found some interesting differences.

-kernel version: 4.18.15-1-default
+kernel version: 4.19.7-1-default

-       Supported ports: [ TP MII ]
+       Supported ports: [ TP AUI BNC MII FIBRE ]
The supported port types have changed.

-       Advertised auto-negotiation: Yes
+       Advertised auto-negotiation: No
Did you disable auto-negotiation? If not, what happens if you turn it on?

-     tx_errors: 0
+     tx_errors: 11
-     rx_missed: 0
+     rx_missed: 63536
There is a slight increase in tx_errors but a dramatic increase in rx_missed. This is strange.
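The comparison above can be automated. A minimal Python sketch, assuming the attachments contain plain `ethtool -S enp1s0` output with one `name: value` counter per line; the function names are hypothetical, and the sample strings hold only the two counters quoted above, not the full attachments.

```python
def parse_ethtool_stats(text: str) -> dict:
    """Parse `ethtool -S <iface>` output into {counter_name: value}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value.strip())
    return stats


def changed_counters(before: str, after: str) -> dict:
    """Return {counter: (old, new)} for every counter whose value differs."""
    old, new = parse_ethtool_stats(before), parse_ethtool_stats(after)
    return {name: (old.get(name, 0), value)
            for name, value in new.items() if old.get(name, 0) != value}


stats_418 = """NIC statistics:
     tx_errors: 0
     rx_missed: 0
"""
stats_419 = """NIC statistics:
     tx_errors: 11
     rx_missed: 63536
"""
print(changed_counters(stats_418, stats_419))
# {'tx_errors': (0, 11), 'rx_missed': (0, 63536)}
```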

Could you provide the output of the following command? (It identifies the r8169 chip type.)
$ sudo journalctl -k|grep r8169

Thanks!
Comment 6 Martti Laaksonen 2018-12-19 17:35:07 UTC
(In reply to david chang from comment #5)
> I quickly compared these results and found some interesting differences.
> 
> -kernel version: 4.18.15-1-default
> +kernel version: 4.19.7-1-default
> 
> -       Supported ports: [ TP MII ]
> +       Supported ports: [ TP AUI BNC MII FIBRE ]
> The supported port types have changed.
> 
> -       Advertised auto-negotiation: Yes
> +       Advertised auto-negotiation: No
> Did you disable auto-negotiation? If not, what happens if you turn it on?

No, I didn't touch the network device configuration; I let NetworkManager handle it. But since you mentioned it, I checked the network connection configuration, and auto-negotiation was disabled there. I'm not sure why it would then show auto-negotiation as enabled with the 4.18 kernel, though. Basically, the only difference between the system configurations was (or should have been) the kernel version; nothing else was changed.

I enabled auto-negotiation in NetworkManager connection configuration and ran the same test set again, but I did not observe any noticeable improvement in receive speed. I will attach those results too.

> -     tx_errors: 0
> +     tx_errors: 11
> -     rx_missed: 0
> +     rx_missed: 63536
> There is a slight increase in tx_errors but a dramatic increase in
> rx_missed. This is strange.

From what I have observed, those tx_errors seem to appear quite early, perhaps when the link is brought up or right after.

> Could you provide the output of the following command? (It identifies the
> r8169 chip type.)
> $ sudo journalctl -k|grep r8169

Dec 19 18:41:40 darkangel kernel: libphy: r8169: probed
Dec 19 18:41:40 darkangel kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
Dec 19 18:41:40 darkangel kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Dec 19 18:41:40 darkangel kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
Dec 19 18:41:44 darkangel kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
Dec 19 18:41:44 darkangel kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:41:47 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Dec 19 18:41:47 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:41:50 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Dec 19 18:46:38 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:46:42 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Dec 19 18:47:21 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:47:38 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Dec 19 18:47:38 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:47:41 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Dec 19 18:48:55 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:49:14 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:49:17 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Half - flow control off
Dec 19 18:49:17 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
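The log above shows several link drops and one transient `1Gbps/Half` entry. A small Python sketch that tallies such events from `journalctl -k` output, assuming the message formats seen in this report (`Link is Up - <speed>/<duplex>` on 4.19, plain `link up`/`link down` on 4.18); `summarize_link_events` is a hypothetical name, and the sample is the last four lines of the log above.

```python
import re
from collections import Counter

# Matches "Link is Up - 1Gbps/Full - ..." (4.19) as well as "link up"/"link down" (4.18).
LINK_RE = re.compile(r"link (?:is )?(up|down)(?: - (\S+)/(\w+))?", re.IGNORECASE)


def summarize_link_events(log: str):
    """Count link up/down messages and collect 'up' events that are not full duplex."""
    counts = Counter()
    not_full_duplex = []
    for line in log.splitlines():
        match = LINK_RE.search(line)
        if not match:
            continue
        state, duplex = match.group(1).lower(), match.group(3)
        counts[state] += 1
        if state == "up" and duplex and duplex.lower() != "full":
            not_full_duplex.append(line.strip())
    return counts, not_full_duplex


sample = """\
Dec 19 18:48:55 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:49:14 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Dec 19 18:49:17 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Half - flow control off
Dec 19 18:49:17 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
"""
counts, suspicious = summarize_link_events(sample)
print(dict(counts), suspicious)
```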


Here's the kernel command line, in case that has any significance:
BOOT_IMAGE=/boot/vmlinuz-4.19.7-1-default root=UUID=d92fac4d-cc81-4af7-a706-e7dade059609 resume=/dev/sda3 splash=silent quiet showopts pci=noaer

The pci=noaer option is for the Wi-Fi interface; otherwise the kernel log is flooded with AER messages.
Comment 7 Martti Laaksonen 2018-12-19 17:36:42 UTC
Created attachment 793059 [details]
repeat 2 GB file transfer test with 4.19 kernel with NM connection auto-negotiation enabled
Comment 8 david chang 2018-12-26 09:26:20 UTC
(In reply to Martti Laaksonen from comment #6)
> Dec 19 18:41:40 darkangel kernel: libphy: r8169: probed
> Dec 19 18:41:40 darkangel kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h,
> ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128

Thank you for the information.

I found only one patch that may affect your network card:
f7ffa9ae2bb9 r8169: remove mii_if_info member from struct rtl8169_private (v4.19-rc1)
However, the changes in that patch look reasonable.

Could you also provide the output of the following command with the v4.18 kernel?
$ sudo journalctl -k|grep r8169

Thank you!
Comment 9 Martti Laaksonen 2018-12-26 18:20:49 UTC
(In reply to david chang from comment #8)
> (In reply to Martti Laaksonen from comment #6)
> > Dec 19 18:41:40 darkangel kernel: libphy: r8169: probed
> > Dec 19 18:41:40 darkangel kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h,
> > ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
> 
> Thank you for the information.
> 
> I found only one patch that may affect your network card:
> f7ffa9ae2bb9 r8169: remove mii_if_info member from struct rtl8169_private
> (v4.19-rc1)
> However, the changes in that patch look reasonable.
> 
> Could you also provide the output of the following command with the v4.18 kernel?
> $ sudo journalctl -k|grep r8169

Here you go:
Dec 26 20:12:35 darkangel kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Dec 26 20:12:35 darkangel kernel: r8169 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
Dec 26 20:12:35 darkangel kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
Dec 26 20:12:35 darkangel kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Dec 26 20:12:36 darkangel kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
Dec 26 20:12:38 darkangel kernel: r8169 0000:01:00.0 enp1s0: link down
Dec 26 20:12:38 darkangel kernel: r8169 0000:01:00.0 enp1s0: link down
Dec 26 20:12:42 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: link up
Dec 26 20:12:42 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: link down
Dec 26 20:12:45 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: link up
Comment 10 david chang 2019-01-04 09:29:48 UTC
Hi,

At first, I thought it might be an issue for all r8169 chip versions.
However, I cannot reproduce it locally with an RTL8168g/8111g chip on v4.19.11 (I did not have a v4.19.7 kernel), and I've checked that there are no r8169 changes between v4.19.7 and v4.19.11.

Since I don't yet know why performance decreased with the v4.19 kernel, I created kernel module packages for testing. I would appreciate it if you could give them a try.

The first is:
https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-098b01a/openSUSE_Tumbleweed/x86_64/r8169-098b01a-kmp-default-2.3_k4.19.11_1-2.1.x86_64.rpm
After this commit (098b01a) comes a change to the interrupt handler (05bbe5584ff9 r8169: simplify interrupt handler).

The second is:
https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-9675931/openSUSE_Tumbleweed/x86_64/r8169-9675931-kmp-default-2.3_k4.19.11_1-1.1.x86_64.rpm
After this upstream commit (9675931) comes a change to the NAPI handler related to the RX buffer (6b839b6cf9ea r8169: fix NAPI handling under high load).

Thank you!
Comment 11 Martti Laaksonen 2019-01-04 15:41:34 UTC
Hi,

(In reply to david chang from comment #10)
> Since I don't yet know why performance decreased with the v4.19 kernel, I
> created kernel module packages for testing. I would appreciate it if you
> could give them a try.
> 
> The first is:
> https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-098b01a/openSUSE_Tumbleweed/x86_64/r8169-098b01a-kmp-default-2.3_k4.19.11_1-2.1.x86_64.rpm
> After this commit (098b01a) comes a change to the interrupt handler
> (05bbe5584ff9 r8169: simplify interrupt handler).
> 
> The second is:
> https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-9675931/openSUSE_Tumbleweed/x86_64/r8169-9675931-kmp-default-2.3_k4.19.11_1-1.1.x86_64.rpm
> After this upstream commit (9675931) comes a change to the NAPI handler
> related to the RX buffer (6b839b6cf9ea r8169: fix NAPI handling under high
> load).

I tested both driver versions you provided but unfortunately I did not see any observable improvement in receive speed and ethtool still reported rx_missed errors.

I'll attach the test results from both driver versions.

Here's a general description of how I tested the drivers:
1. install driver package - rpm -ihv <driver rpm>
2. disconnect ethernet connection through NetworkManager
3. unload current driver module - modprobe -r r8169
4. load new driver - modprobe r8169
5. connect ethernet connection through NetworkManager
6. execute test script
7. disconnect ethernet connection through NetworkManager
8. unload driver module - modprobe -r r8169
9. uninstall driver package - rpm -e <driver rpm>
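The nine steps above can be scripted so each driver build is tested the same way. A minimal Python sketch, assuming `nmcli connection down/up` is an acceptable stand-in for disconnecting and reconnecting through the NetworkManager UI; the connection name and test script name are hypothetical. Note that `rpm -e` erases by installed package name rather than by .rpm file name, so the sketch takes both. It only prints the commands unless `dry_run=False` is passed (which needs root).

```python
import shlex
import subprocess


def build_test_plan(driver_rpm, package_name, connection, test_script):
    """Return the nine test steps as argv lists (nmcli stands in for the NM UI)."""
    return [
        ["rpm", "-ihv", driver_rpm],                   # 1. install driver package
        ["nmcli", "connection", "down", connection],   # 2. disconnect
        ["modprobe", "-r", "r8169"],                   # 3. unload current module
        ["modprobe", "r8169"],                         # 4. load new driver
        ["nmcli", "connection", "up", connection],     # 5. reconnect
        [test_script],                                 # 6. run the test script
        ["nmcli", "connection", "down", connection],   # 7. disconnect
        ["modprobe", "-r", "r8169"],                   # 8. unload driver module
        ["rpm", "-e", package_name],                   # 9. uninstall (by package name)
    ]


def run_plan(plan, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for argv in plan:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


plan = build_test_plan(
    "r8169-098b01a-kmp-default-2.3_k4.19.11_1-2.1.x86_64.rpm",
    "r8169-098b01a-kmp-default",  # assumed installed package name
    "Wired connection 1",         # hypothetical NetworkManager connection name
    "./nfs-transfer-test.sh",     # hypothetical test script
)
run_plan(plan)  # dry run: prints the commands only
```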
Comment 12 Martti Laaksonen 2019-01-04 15:47:57 UTC
Created attachment 793675 [details]
Test results with r8169-098b01a driver version


dmesg output:
Jan 04 16:23:45 darkangel kernel: libphy: r8169: probed
Jan 04 16:23:45 darkangel kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
Jan 04 16:23:45 darkangel kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Jan 04 16:23:45 darkangel kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
Jan 04 16:23:48 darkangel kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
Jan 04 16:23:48 darkangel kernel: r8169 0000:01:00.0 enp1s0: Link is Down
Jan 04 16:23:51 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Jan 04 16:23:51 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Jan 04 16:42:00 darkangel.localnet kernel: r8169: loading out-of-tree module taints kernel.
Jan 04 16:42:00 darkangel.localnet kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
Jan 04 16:42:00 darkangel.localnet kernel: libphy: r8169: probed
Jan 04 16:42:00 darkangel.localnet kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
Jan 04 16:42:00 darkangel.localnet kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Jan 04 16:42:00 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
Jan 04 16:42:00 darkangel.localnet kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
Jan 04 16:42:03 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Jan 04 16:45:15 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Comment 13 Martti Laaksonen 2019-01-04 15:50:13 UTC
Created attachment 793676 [details]
Test results with r8169-9675931 driver version

dmesg output:

Jan 04 17:03:16 darkangel.localnet kernel: libphy: r8169: probed
Jan 04 17:03:16 darkangel.localnet kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100800, IRQ 128
Jan 04 17:03:16 darkangel.localnet kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Jan 04 17:03:16 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
Jan 04 17:03:16 darkangel.localnet kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
Jan 04 17:03:20 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Jan 04 17:05:14 darkangel.localnet kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
Comment 14 david chang 2019-01-09 08:43:31 UTC
Hi,

Thank you for your testing!

There are two more kernel module packages for testing.
One is from before the commit (f7ffa9ae2bb9 r8169: remove mii_if_info member from struct rtl8169_private (v4.19-rc1)), which I mentioned in comment #8.
https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-a2965f1/openSUSE_Tumbleweed/x86_64/r8169-a2965f1-kmp-default-2.3_k4.19.11_1-2.1.x86_64.rpm

The other is from before phylib support was added to r8169.
https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-6fcf9b1/openSUSE_Tumbleweed/x86_64/r8169-6fcf9b1-kmp-default-2.3_k4.19.11_1-1.1.x86_64.rpm

Thanks!
Comment 15 Martti Laaksonen 2019-01-09 18:52:16 UTC
(In reply to david chang from comment #14)
> There are two more kernel module packages for testing.
> One is from before the commit (f7ffa9ae2bb9 r8169: remove mii_if_info member
> from struct rtl8169_private (v4.19-rc1)), which I mentioned in comment #8.
> https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-a2965f1/openSUSE_Tumbleweed/x86_64/r8169-a2965f1-kmp-default-2.3_k4.19.11_1-2.1.x86_64.rpm
> 
> The other is from before phylib support was added to r8169.
> https://build.opensuse.org/package/binary/download/home:david_chang:bsc1119649/r8169-6fcf9b1/openSUSE_Tumbleweed/x86_64/r8169-6fcf9b1-kmp-default-2.3_k4.19.11_1-1.1.x86_64.rpm

I tested both of these new driver versions but neither of them showed any improvement in receive speed and there were still a lot of rx_missed errors.

By the way, was that latter driver package built correctly? From your description I got the impression that it should not use phylib, but I noticed that loading the driver still pulls in libphy as a dependency.

This time I executed the test by first installing the driver package and then power cycling the machine before running my test script. And since I have modified the script to collect some extra information, I also reran the test with the previous driver packages you asked me to try, as well as with the default drivers that come with kernels 4.19.11 and 4.18.15. Test results attached.
Comment 16 Martti Laaksonen 2019-01-09 18:55:13 UTC
Created attachment 794009 [details]
Rerun file transfer test with kernel 4.18.15
Comment 17 Martti Laaksonen 2019-01-09 18:56:43 UTC
Created attachment 794010 [details]
Rerun file transfer test with kernel 4.19.11
Comment 18 Martti Laaksonen 2019-01-09 18:58:40 UTC
Created attachment 794011 [details]
Rerun file transfer test with kernel 4.19.11 + r8169-098b01a
Comment 19 Martti Laaksonen 2019-01-09 19:00:07 UTC
Created attachment 794012 [details]
Rerun file transfer test with kernel 4.19.11 + r8169-9675931
Comment 20 Martti Laaksonen 2019-01-09 19:04:32 UTC
Created attachment 794013 [details]
File transfer test with kernel 4.19.11 + r8169-a2965f1
Comment 21 Martti Laaksonen 2019-01-09 19:05:58 UTC
Created attachment 794014 [details]
File transfer test with kernel 4.19.11 + r8169-6fcf9b1
Comment 22 david chang 2019-01-10 07:11:32 UTC
(In reply to Martti Laaksonen from comment #15)
> By the way, was that latter driver package built correctly? From your
> description I got the impression that it should not use phylib, but I
> noticed that loading the driver still pulls in libphy as a dependency.

Oops! I copied the wrong source yesterday, sorry about that!
I just created new packages with commit a2965f1 and 6fcf9b1 respectively.

* Undoes the relevant change for RTL8168h/8111h (before commit f7ffa9ae2bb9 r8169: remove mii_if_info member from struct rtl8169_private (v4.19-rc1))
https://download.opensuse.org/repositories/home:/david_chang:/bsc1119649/openSUSE_Tumbleweed/x86_64/r8169-a2965f1-kmp-default-2.3_k4.19.11_1-3.1.x86_64.rpm

* Before phylib support. (before commit f1e911d5d0df r8169: add basic phylib support (v4.19-rc1))
https://download.opensuse.org/repositories/home:/david_chang:/bsc1119649/openSUSE_Tumbleweed/x86_64/r8169-6fcf9b1-kmp-default-2.3_k4.19.11_1-2.1.x86_64.rpm


> This time I executed the test by first installing the driver package and
> then power cycling the machine before running my test script. And since I
> have modified the script to collect some extra information, I also reran the
> test with the previous driver packages you asked me to try, as well as with
> the default drivers that come with kernels 4.19.11 and 4.18.15. Test results
> attached.

I really appreciate your help! Let me check this data first. Thank you!
Comment 23 Martti Laaksonen 2019-01-10 18:57:40 UTC
(In reply to david chang from comment #22)
> * Undoes the relevant change for RTL8168h/8111h (before commit f7ffa9ae2bb9
> r8169: remove mii_if_info member from struct rtl8169_private (v4.19-rc1))
> https://download.opensuse.org/repositories/home:/david_chang:/bsc1119649/openSUSE_Tumbleweed/x86_64/r8169-a2965f1-kmp-default-2.3_k4.19.11_1-3.1.x86_64.rpm
> 
> * Before phylib support. (before commit f1e911d5d0df r8169: add basic phylib
> support (v4.19-rc1))
> https://download.opensuse.org/repositories/home:/david_chang:/bsc1119649/openSUSE_Tumbleweed/x86_64/r8169-6fcf9b1-kmp-default-2.3_k4.19.11_1-2.1.x86_64.rpm

I tested both of these driver versions, but the receive speeds still hover in the 12-13 MiB/s range with occasional peaks to 40-50 MiB/s. No change in rx_missed errors either. I was almost certain that libphy was the cause of the bad receive speeds, but apparently that is not the case.

I also ran the test using the proprietary r8168 driver that's available in the Packman repo, but with that the receive speeds were even worse, at about 8 MiB/s.

I also ran the same test on a Manjaro 18.0 installation I have on the same machine, running kernel 4.19.13 with both the stock r8169 driver and the proprietary r8168 driver.
The stock r8169 performed pretty much the same as Tumbleweed's, but with the proprietary r8168 the receive speeds were back to the normal 80-90 MiB/s and there were no rx_missed errors either.

Results from all of the above attached.
Comment 24 Martti Laaksonen 2019-01-10 18:59:07 UTC
Created attachment 794114 [details]
File transfer test with kernel 4.19.11 + r8169-a2965f1 (new build)
Comment 25 Martti Laaksonen 2019-01-10 19:00:14 UTC
Created attachment 794115 [details]
File transfer test with kernel 4.19.11 + r8169-6fcf9b1 (new build)
Comment 26 Martti Laaksonen 2019-01-10 19:01:10 UTC
Created attachment 794116 [details]
File transfer test with kernel 4.19.11 + proprietary r8168
Comment 27 Martti Laaksonen 2019-01-10 19:02:32 UTC
Created attachment 794117 [details]
File transfer test with Manjaro 18.0, kernel 4.19.13, r8169
Comment 28 Martti Laaksonen 2019-01-10 19:03:21 UTC
Created attachment 794118 [details]
File transfer test with Manjaro 18.0, kernel 4.19.13, r8168
Comment 29 david chang 2019-01-31 02:39:22 UTC
Hi,

There is a similar issue upstream, but with a different chip.
https://lore.kernel.org/netdev/aead4da3-e1b0-ab6c-2842-634e175b33ab@gmail.com/

Could you try each of the following kernel parameters separately?
- pcie_aspm=off
- pcie_aspm.policy=performance

Thanks!
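To confirm that a parameter actually reached the running kernel, `/proc/cmdline` can be checked after rebooting. A minimal Python sketch; it splits on whitespace only, so it assumes no quoted parameter values containing spaces. The sample appends `pcie_aspm=off` to the command line from comment 6 purely for illustration, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

ASPM_PARAMS = ("pcie_aspm", "pcie_aspm.policy")


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")  # split at the first '=' only
        params[name] = value if sep else None
    return params


def aspm_overrides(cmdline: str) -> Dict[str, Optional[str]]:
    """Return only the PCIe ASPM parameters present on the command line."""
    return {k: v for k, v in parse_cmdline(cmdline).items() if k in ASPM_PARAMS}


def running_aspm_overrides(path: str = "/proc/cmdline") -> Dict[str, Optional[str]]:
    """Same check against the command line of the currently running kernel."""
    return aspm_overrides(Path(path).read_text())


sample = ("BOOT_IMAGE=/boot/vmlinuz-4.19.7-1-default "
          "root=UUID=d92fac4d-cc81-4af7-a706-e7dade059609 resume=/dev/sda3 "
          "splash=silent quiet showopts pci=noaer pcie_aspm=off")
print(aspm_overrides(sample))
```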
Comment 30 david chang 2019-01-31 07:49:51 UTC
(In reply to david chang from comment #29)
> Hi,
> 
> There is a similar issue upstream, but with a different chip.
> https://lore.kernel.org/netdev/aead4da3-e1b0-ab6c-2842-634e175b33ab@gmail.com/
> 
> Could you try each of the following kernel parameters separately?
> - pcie_aspm=off
> - pcie_aspm.policy=performance
> 
> Thanks!

Also, could you please get the register dump of the interface with the v4.18 and v4.19 kernels?
# ethtool -d enp1s0
Comment 31 Martti Laaksonen 2019-01-31 18:03:15 UTC
(In reply to david chang from comment #30)
> (In reply to david chang from comment #29)
> > Hi,
> > 
> > There is a similar issue upstream, but with a different chip.
> > https://lore.kernel.org/netdev/aead4da3-e1b0-ab6c-2842-634e175b33ab@gmail.com/
> > 
> > Could you try each of the following kernel parameters separately?
> > - pcie_aspm=off
> > - pcie_aspm.policy=performance
> > 
> > Thanks!
> 
> Also, could you please get the register dump of the interface with the
> v4.18 and v4.19 kernels?
> # ethtool -d enp1s0

Alright, I tried both kernel parameters with 4.18.15 and 4.19.11 kernels but did not see any change in receive performance, i.e. kernel 4.18.15 performs well whereas kernel 4.19.11 crawls at 12 MB/s when receiving.

That PCIe ASPM policy setting probably does nothing since the kernel reports this:
Jan 31 18:20:52 darkangel kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
Jan 31 18:20:52 darkangel kernel: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
Jan 31 18:20:52 darkangel kernel: acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
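The active ASPM policy can also be read back from sysfs: `/sys/module/pcie_aspm/parameters/policy` lists the available policies with the active one in square brackets. A minimal Python sketch of reading it; the function names are hypothetical. On a system like this one, where the FADT tells the kernel to leave ASPM to the BIOS, the reported policy may not reflect what the hardware is actually doing, which would be consistent with the observation above.

```python
import re
from pathlib import Path
from typing import Optional

POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text: str) -> Optional[str]:
    """Extract the bracketed (active) entry, e.g. '[default] performance ...' -> 'default'."""
    match = re.search(r"\[([\w-]+)\]", text)
    return match.group(1) if match else None


def read_active_policy(path: Path = POLICY_FILE) -> Optional[str]:
    """Return the active ASPM policy, or None if the file is missing or unreadable."""
    try:
        return active_policy(path.read_text())
    except OSError:
        return None


print(active_policy("default [performance] powersave powersupersave\n"))
print(read_active_policy())  # None on systems without this sysfs file
```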

Test results attached. The register dumps you asked are also included in the results.
Comment 32 Martti Laaksonen 2019-01-31 18:04:59 UTC
Created attachment 795734 [details]
File transfer test with kernel 4.18.15
Comment 33 Martti Laaksonen 2019-01-31 18:06:01 UTC
Created attachment 795735 [details]
File transfer test with kernel 4.18.15, pcie_aspm=off
Comment 34 Martti Laaksonen 2019-01-31 18:09:22 UTC
Created attachment 795736 [details]
File transfer test with kernel 4.18.15, pcie_aspm.policy=performance
Comment 35 Martti Laaksonen 2019-01-31 18:10:21 UTC
Created attachment 795737 [details]
File transfer test with kernel 4.19.11, pcie_aspm=off
Comment 36 Martti Laaksonen 2019-01-31 18:11:20 UTC
Created attachment 795738 [details]
File transfer test with kernel 4.19.11, pcie_aspm.policy=performance
Comment 37 david chang 2019-02-01 04:14:11 UTC
(In reply to Martti Laaksonen from comment #31)
> (In reply to david chang from comment #30)
> > (In reply to david chang from comment #29)
> > > Hi,
> > > 
> > > There is a similar issue upstream, but with a different chip.
> > > https://lore.kernel.org/netdev/aead4da3-e1b0-ab6c-2842-634e175b33ab@gmail.com/
> > > 
> > > Could you try each of the following kernel parameters separately?
> > > - pcie_aspm=off
> > > - pcie_aspm.policy=performance
> > > 
> > > Thanks!
> > 
> > Also, could you please get the register dump of the interface with the
> > v4.18 and v4.19 kernels?
> > # ethtool -d enp1s0
> 
> Alright, I tried both kernel parameters with 4.18.15 and 4.19.11 kernels but
> did not see any change in receive performance, i.e. kernel 4.18.15 performs
> well whereas kernel 4.19.11 crawls at 12 MB/s when receiving.
> 
> That PCIe ASPM policy setting probably does nothing since the kernel reports
> this:
> Jan 31 18:20:52 darkangel kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
> Jan 31 18:20:52 darkangel kernel: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
> Jan 31 18:20:52 darkangel kernel: acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
> 
> Test results attached. The register dumps you asked are also included in the
> results.

Thank you for the testing!

Have you ever run the performance test without NFS, e.g. a direct download or iperf?
Comment 38 Martti Laaksonen 2019-02-02 07:10:33 UTC
Hi,

(In reply to david chang from comment #37)
> Have you ever run the performance test without NFS, e.g. a direct download
> or iperf?

No, I hadn't, but since you mentioned it, I ran some iperf tests too.

With kernel 4.18.15 the receive and transmit speeds were pretty constant at 112 MB/s every time for the duration of the test (I did several runs).

With kernel 4.19.11 the results were a bit more interesting. Usually the receive speed would crawl at around 12-14 MB/s for several seconds and then suddenly jump up to 112 MB/s and stay there, sometimes until the end of the test; other times it would drop back to 12-14 MB/s after a couple of seconds.
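The stall-then-recover pattern is easiest to see per interval. A minimal Python sketch, assuming the test is run with iperf3 in reverse mode with JSON output (`iperf3 -c <server> -R -J`), whose `intervals[].sum` entries carry `start` and `bits_per_second`; the report does not say which iperf version or options were used. The sample data below is synthetic, shaped like the behaviour described above, and the 50 MB/s threshold is an arbitrary choice.

```python
import json
from typing import List, Tuple

LINE_RATE_MB_S = 1000 / 8  # 1 Gbps = 125 MB/s before protocol overhead


def interval_rates(report: dict) -> List[Tuple[float, float]]:
    """Return (interval start in seconds, throughput in MB/s) for each interval."""
    return [(iv["sum"]["start"], iv["sum"]["bits_per_second"] / 8 / 1e6)
            for iv in report["intervals"]]


def slow_intervals(report: dict, threshold_mb_s: float = 50.0) -> List[float]:
    """Start times of intervals whose throughput fell below the threshold."""
    return [start for start, rate in interval_rates(report) if rate < threshold_mb_s]


def load_report(path: str) -> dict:
    """Load a report saved from `iperf3 ... -J > path`."""
    with open(path) as f:
        return json.load(f)


# Synthetic report: two slow seconds (~12-13 MB/s), then two fast ones (112 MB/s).
sample = {"intervals": [
    {"sum": {"start": 0.0, "bits_per_second": 96e6}},
    {"sum": {"start": 1.0, "bits_per_second": 104e6}},
    {"sum": {"start": 2.0, "bits_per_second": 896e6}},
    {"sum": {"start": 3.0, "bits_per_second": 896e6}},
]}
for start, rate in interval_rates(sample):
    print(f"{start:4.1f}s {rate:6.1f} MB/s ({rate / LINE_RATE_MB_S:.0%} of line rate)")
print("slow intervals start at:", slow_intervals(sample))
```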

Results attached.
Comment 39 Martti Laaksonen 2019-02-02 07:11:48 UTC
Created attachment 795852 [details]
iperf test with kernel 4.18.15
Comment 40 Martti Laaksonen 2019-02-02 07:12:31 UTC
Created attachment 795853 [details]
iperf test with kernel 4.19.11
Comment 41 david chang 2019-02-12 06:32:40 UTC
Hi,

Sorry for the delay!

I've created a test KMP that disables ASPM, as Heiner suggested. Could you please give it a try?

https://download.opensuse.org/repositories/home:/david_chang:/bsc1119649/openSUSE_Tumbleweed/x86_64/r8169-test-kmp-default-2.3_k4.20.6_1-1.1.x86_64.rpm

Thank you!
Comment 42 Martti Laaksonen 2019-02-12 17:52:53 UTC
(In reply to david chang from comment #41)
> I've created a test KMP that disables ASPM, as Heiner suggested. Could you
> please give it a try?
> 
> https://download.opensuse.org/repositories/home:/david_chang:/bsc1119649/openSUSE_Tumbleweed/x86_64/r8169-test-kmp-default-2.3_k4.20.6_1-1.1.x86_64.rpm

With that test driver both receive and transmit performance seems to be more or less on par with kernel 4.18.15 performance again.

I ran the usual NFS transfer and iperf tests several times and the results were pretty stable on each test run.

Results attached.
Comment 43 Martti Laaksonen 2019-02-12 17:54:29 UTC
Created attachment 796627 [details]
File transfer test with kernel 4.20.6 + r8169-test-kmp
Comment 44 Martti Laaksonen 2019-02-12 17:55:55 UTC
Created attachment 796629 [details]
iperf test with kernel 4.20.6 + r8169-test-kmp
Comment 45 david chang 2019-02-13 02:10:46 UTC
(In reply to Martti Laaksonen from comment #42)
> (In reply to david chang from comment #41)
> > I've created a test KMP that disables ASPM, as Heiner suggested. Could you
> > please give it a try?
> > 
> > https://download.opensuse.org/repositories/home:/david_chang:/bsc1119649/openSUSE_Tumbleweed/x86_64/r8169-test-kmp-default-2.3_k4.20.6_1-1.1.x86_64.rpm
> 
> With that test driver, both receive and transmit performance seem to be
> more or less on par with kernel 4.18.15 again.
> 
> I ran the usual NFS transfer and iperf tests several times, and the
> results were pretty stable across runs.
> 
> Results attached.

Thank you for testing, and it's good to hear that!

I have one more question, since Tumbleweed is now using the v4.20.6 kernel: I want to confirm whether receive performance is still bad with the in-box r8169 driver of v4.20 (i.e., without the KMP). Thanks!
Comment 46 Martti Laaksonen 2019-02-13 17:12:06 UTC
(In reply to david chang from comment #45)
> I have one more question, since Tumbleweed is now using the v4.20.6 kernel:
> I want to confirm whether receive performance is still bad with the in-box
> r8169 driver of v4.20 (i.e., without the KMP). Thanks!

Yes, the receive speeds using the stock r8169 driver are still as bad as ever, so no change there.
However, when I looked at the test results, I noticed that libphy now also seems to pull in a separate realtek driver as a dependency. That wasn't the case when I ran the tests with kernel 4.19.11 or 4.20.0.

kernel 4.19.11 loaded modules:
lsmod | grep r8169:
r8169                  90112  0
libphy                 77824  2 r8169

kernel 4.20.0 loaded modules:
lsmod | grep r8169:
r8169                  94208  0
libphy                 86016  2 r8169

kernel 4.20.6 loaded modules:
lsmod | grep r8169:
r8169                  94208  0
libphy                 86016  3 r8169,realtek

The realtek module is also included with kernels 4.19.11 and 4.20.0, but for some reason it was not pulled in at that time.
It doesn't seem to make much of a difference whether it is loaded or not, though.
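
One way to check whether the realtek PHY module is in play on a given kernel is to look at the "Used by" column of libphy in the lsmod listings above. The helper below is a hypothetical sketch (`modules_using` is not part of any existing tool); it assumes the usual lsmod column layout of name, size, use count and an optional comma-separated list of users, and it only reports the named users, not the raw count.

```python
def modules_using(lsmod_output: str, module: str) -> list[str]:
    """Return the module names listed in the "Used by" column for `module`.

    Assumes lsmod-style lines: <name> <size> <use count> [<user1,user2,...>]
    """
    for line in lsmod_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == module:
            if len(fields) < 4:
                # A nonzero count with no names means non-module references only.
                return []
            # Drop empty entries in case the list ends with a trailing comma.
            return [name for name in fields[3].split(",") if name]
    return []


# Listings copied from this report.
lsmod_4_19_11 = """\
r8169                  90112  0
libphy                 77824  2 r8169
"""
lsmod_4_20_6 = """\
r8169                  94208  0
libphy                 86016  3 r8169,realtek
"""

for kernel, output in [("4.19.11", lsmod_4_19_11), ("4.20.6", lsmod_4_20_6)]:
    users = modules_using(output, "libphy")
    print(kernel, users, "realtek loaded" if "realtek" in users else "realtek not loaded")
```

For the 4.20.6 listing this reports `['r8169', 'realtek']`, while the 4.19.11 listing reports only `['r8169']`.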

Test results using the stock r8169 driver in kernel 4.20.6 are attached. You'll notice that in the iperf test the receive speed once again peaks at 112 MB/s for a few seconds, only to drop back to 12-13 MB/s.
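
To put those iperf numbers in link terms, the short conversion below turns MB/s into Mbit/s and compares them against a nominal 1 Gbit/s line rate. It is a sketch under two assumptions not stated in the report: the link is gigabit (which the 112 MB/s peak suggests), and the reported megabytes are decimal (1 MB = 10^6 bytes) rather than MiB.

```python
# Assumption: nominal gigabit Ethernet link (1000BASE-T).
GIGABIT_LINE_RATE_MBIT = 1000


def mbyte_to_mbit(rate_mbyte_s: float) -> float:
    """Convert MB/s to Mbit/s, assuming decimal megabytes and 8 bits per byte."""
    return rate_mbyte_s * 8


# Rates taken from the iperf observation above.
for label, rate in [("peak", 112), ("degraded, low", 12), ("degraded, high", 13)]:
    mbit = mbyte_to_mbit(rate)
    share = 100 * mbit / GIGABIT_LINE_RATE_MBIT
    print(f"{label}: {rate} MB/s ~ {mbit:.0f} Mbit/s ({share:.1f}% of 1 Gbit/s)")
```

Under these assumptions the peak is about 896 Mbit/s, close to gigabit line rate, while the degraded phase is roughly 96-104 Mbit/s, about a tenth of it.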
Comment 47 Martti Laaksonen 2019-02-13 17:13:25 UTC
Created attachment 796740 [details]
File transfer test with kernel 4.20.6 using stock r8169
Comment 48 Martti Laaksonen 2019-02-13 17:14:14 UTC
Created attachment 796741 [details]
iperf test with kernel 4.20.6 using stock r8169
Comment 49 david chang 2019-02-14 02:23:00 UTC
(In reply to Martti Laaksonen from comment #46)
> (In reply to david chang from comment #45)
> > I have one more question, since Tumbleweed is now using the v4.20.6 kernel:
> > I want to confirm whether receive performance is still bad with the in-box
> > r8169 driver of v4.20 (i.e., without the KMP). Thanks!
> 
> Yes, the receive speeds using the stock r8169 driver are still as bad as
> ever, so no change there.
> However, when I looked at the test results, I noticed that libphy now also
> seems to pull in a separate realtek driver as a dependency. That wasn't the
> case when I ran the tests with kernel 4.19.11 or 4.20.0.

Thank you for your confirmation! I will report it upstream soon.

> The realtek module is also included with kernels 4.19.11 and 4.20.0, but
> for some reason it was not pulled in at that time.
> It doesn't seem to make much of a difference whether it is loaded or not,
> though.

Good to know.

> 
> Test results using the stock r8169 driver in kernel 4.20.6 are attached.
> You'll notice that in the iperf test the receive speed once again peaks at
> 112 MB/s for a few seconds, only to drop back to 12-13 MB/s.

Yeah, it's interesting. Thank you!
Comment 50 david chang 2019-02-15 08:38:58 UTC
Hi,

According to the driver maintainer, this issue will not be fixed soon, since it might be a hardware issue. I would suggest using the latest test KMP to work around the performance issue until the driver gets a fix for it.

We can leave this report open to track the issue. Thanks!
Comment 51 Martti Laaksonen 2019-02-17 19:24:11 UTC
Hi,

(In reply to david chang from comment #50)
> According to the driver maintainer, this issue will not be fixed soon, since
> it might be a hardware issue. I would suggest using the latest test KMP to
> work around the performance issue until the driver gets a fix for it.

Well, that is a bit disappointing, as it implies that I would either be "stuck" using the 4.20.6 kernel or have to rebuild the driver for each new kernel version with the workaround the driver maintainer suggested.

By the way, if the receive speed problem is sort of fixed by commenting out / removing rtl_hw_aspm_clkreq_enable(tp, true) at the end of rtl_hw_start_8168h_1(), why doesn't setting pcie_aspm=off on the kernel command line have any effect?
Comment 52 david chang 2019-02-18 02:23:36 UTC
(In reply to Martti Laaksonen from comment #51) 
> Well, that is a bit disappointing, as it implies that I would either be
> "stuck" using the 4.20.6 kernel or have to rebuild the driver for each new
> kernel version with the workaround the driver maintainer suggested.

Not really; the KMP can handle kernel updates.

> By the way, if the receive speed problem is sort of fixed by commenting out
> / removing rtl_hw_aspm_clkreq_enable(tp, true) at the end of
> rtl_hw_start_8168h_1(), why doesn't setting pcie_aspm=off on the kernel
> command line have any effect?

That depends on the driver. Apparently the r8169 driver does not support this kernel parameter.
Comment 53 david chang 2019-05-06 03:23:01 UTC
Hi,

There is an upstream fix for this issue that covers most Realtek chips ("b75bb8a5b755 r8169: disable ASPM again"), and it was included in kernel version 5.0.8. So please remove the test KMP once you have updated to the latest kernel version.

I'm closing this report for now; please reopen it if you encounter any problems. Thanks!