Bug 1088382

Summary: Heavy performance loss since update mid-March
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Michael Zapf <Michael.Zapf>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED INVALID
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: jslaby, Michael.Zapf, tiwai
Version: Current
Flags: tiwai: needinfo? (Michael.Zapf)
Target Milestone: ---
Hardware: x86-64
OS: Other
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Michael Zapf 2018-04-05 21:00:57 UTC
On three different desktop PCs I have noticed a heavy performance loss since about mid-March. Strangely, although all three are affected, the slowdown appeared at a different kernel release on each, so the cause may not be the kernel but something else.

The performance loss became obvious while I was working on MAME, the emulation framework. When you start it with the option "-bench <n>", it deactivates the screen output (leaving just an empty window frame) and all timing delays, so the emulation runs as fast as possible.

I am getting this output when I boot the current kernel:

Linux capella.daheim.lan 4.15.13-1-default #1 SMP PREEMPT Sun Mar 25 08:34:58 UTC 2018 (950fc49) x86_64 x86_64 x86_64 GNU/Linux
Average speed: 302.08% (19 seconds)

However, when I boot the kernel 4.15.10, I am getting this:

Linux capella.daheim.lan 4.15.10-1-default #1 SMP PREEMPT Thu Mar 15 20:31:17 UTC 2018 (5e4329c) x86_64 x86_64 x86_64 GNU/Linux
Average speed: 896.21% (19 seconds)

which is the speed that I had expected. The CPU is an i7-6700K.
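
A minimal sketch for putting a number on the difference between the two results quoted above. The helper name `average_speed` is made up for illustration; it only assumes that the benchmark prints a line of the form "Average speed: N% (S seconds)", as shown in the output.

```python
import re


def average_speed(output: str) -> float:
    """Extract the percentage from a line like 'Average speed: 302.08% (19 seconds)'."""
    match = re.search(r"Average speed:\s*([\d.]+)%", output)
    if match is None:
        raise ValueError("no 'Average speed' line found")
    return float(match.group(1))


# Values copied from the two benchmark runs in this report.
slow = average_speed("Average speed: 302.08% (19 seconds)")  # kernel 4.15.13
fast = average_speed("Average speed: 896.21% (19 seconds)")  # kernel 4.15.10

slowdown = fast / slow
print(f"4.15.10 runs the benchmark at {slowdown:.2f}x the speed of 4.15.13")
```

With the figures above, the older kernel comes out at roughly three times the emulation speed of the newer one.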

This output is not an artifact; the slowdown is clearly noticeable while the emulation is running (e.g. chopped sound). Other programs suffer from a similar performance loss. I use the Leawo Bluray Player running in Wine; this is not possible anymore - the framerate drops to a crawl. Just by booting the older kernel (4.15.10), the performance is back again.

As said, I could verify this performance loss on three different PCs, but interestingly, not on two laptops that I also installed Tumbleweed on. On those three PCs, the critical point in time seems different. I can say for sure that by the end of February (also with the old style of Tumbleweed splash screen), all systems ran fast (I can reproduce that by rolling back), and that by 4.15.13, all systems (except the laptops) slowed down.

I will try a clean install on a spare SSD to see whether this can be reproduced on a newly installed system. I should add that I already tried to remove my AMD RX480 card to rule out amdgpu influence, so I temporarily used the onboard graphics. This had no effect on the observation.

Also, kernel boot parameters "nopti" and "nospec" had no effect.
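
Before concluding that a boot parameter has no effect, it can be worth confirming that the running kernel actually received it. The following is a rough sketch, assuming a Linux system where the boot command line is exposed in /proc/cmdline; the helper name `active_params` is hypothetical, and the parameter names are simply the ones mentioned above.

```python
from pathlib import Path


def active_params(cmdline: str, wanted: set[str]) -> set[str]:
    """Return the members of `wanted` that appear as whole tokens in `cmdline`."""
    return wanted & set(cmdline.split())


if __name__ == "__main__":
    path = Path("/proc/cmdline")  # boot command line of the running kernel
    if path.exists():
        found = active_params(path.read_text(), {"nopti", "nospec"})
        print("parameters seen by the kernel:", sorted(found))
```

An empty list would suggest that the parameter never reached the kernel, for example because the bootloader configuration was not regenerated.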
Comment 1 Takashi Iwai 2018-04-05 21:21:17 UTC
At least there is no difference in kernel config between the two kernel versions (git commits) you showed.  That is, the cause, if it is in the kernel at all, must be a change (possibly a regression) in a kernel patch.

But I can't judge more than that for now, and the best would be if you could bisect.  Since you have a decent machine and the range is relatively narrow, it shouldn't be too hard.

Though, maybe it would be better to check the 4.16 kernel before doing that.  It might have been addressed already.  So, could you try the kernel in the OBS Kernel:stable repo first?
Comment 2 Michael Zapf 2018-04-08 19:21:28 UTC
tl;dr: Same issue with new install

I just performed a clean install of Tumbleweed on a newly prepared SSD and ran the benchmark, but again with a result of 300%, whereas approx. 900% should be expected and was achieved until the end of February (with the same hardware).

First thing: Please try the benchmark on your installation. I packed everything that is required into a tar file. Download the file from http://www.mizapf.eu/files/mame_bench.tar.bz2 (6 MiB) and unpack it. Run the bench script and check the benchmark result, that is all. You may have to install libSDL2_ttf-2_0-0 first.

The benchmark pops up an empty window (no video output from the emulation), closes it after some time, and reports the emulation speed, which is the ratio of (unsynchronized) emulated time to real time.

In the meantime I will try to set up everything for building older kernels. I have good experience with git and with building software (MAME), but I have not built a kernel in the last 15 years (although Windows users keep believing that Linux users always build their kernels). I guess I should clone the kernel source from GitHub; is there a recent HOWTO concerning configuration and installing the results?


Comment 3 Michael Zapf 2018-04-27 17:36:55 UTC
It seems I found the problem. I can get the old fast speed, reproducibly, by adding the kernel boot parameter "spectre_v2=off".

I verified this on all three PCs that showed the issue.

All of them deliver this output without the parameter:

$ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Mitigation: Indirect Branch Restricted Speculation, IBPB, IBRS_FW
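
The same sysfs directory contains one file per known CPU vulnerability, so the full mitigation status can be collected in one go. A minimal sketch, assuming the directory quoted above exists on the system; the function name `read_mitigations` is made up for illustration.

```python
from pathlib import Path

# Directory taken from the command above; present on kernels that report mitigations.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(directory: Path = VULN_DIR) -> dict[str, str]:
    """Map each file name (e.g. 'spectre_v2') to its one-line status text."""
    if not directory.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    for name, status in read_mitigations().items():
        print(f"{name}: {status}")
```

Comparing this output with and without "spectre_v2=off" on the boot command line shows which mitigation the parameter actually changes.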


In sum, this mitigation effectively throws system speed back to pre-2011 levels for CPU-intensive applications on my Skylake and both Kaby Lake systems.

As I said, my 2012 laptop now outperforms each of them (Ivy Bridge).

I'm going to turn off the mitigation for the time being, until a) a better solution is found or b) my PC is old enough for yet another hardware upgrade.
Comment 4 Jiri Slaby 2018-06-15 11:27:28 UTC
I don't think there is anything we can do about it. (Except sending a thank-you letter to Intel.)