Bugzilla – Bug 1088902
amdgpu: Screen flickering every few seconds since update 2018-04-09
Last modified: 2018-04-18 08:56:12 UTC
Since the update to Tumbleweed yesterday (2018-04-09) my screen flickers every now and then. When this happens I get the following kernel call trace:
[33171.510555] amdgpu 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[33171.510557] amdgpu 0000:01:00.0: swiotlb: coherent allocation failed, size=2097152
[33171.510559] CPU: 8 PID: 2224 Comm: X Tainted: G O 4.16.0-1-default #1 openSUSE Tumbleweed (unreleased)
[33171.510560] Hardware name: System manufacturer System Product Name/P9X79 PRO, BIOS 4801 07/25/2014
[33171.510560] Call Trace:
[33171.510579] ttm_dma_pool_get_pages+0x1ed/0x5b0 [ttm]
[33171.510583] ttm_dma_populate+0x25e/0x350 [ttm]
[33171.510586] ttm_tt_bind+0x2c/0x60 [ttm]
[33171.510589] ttm_bo_handle_move_mem+0x577/0x5b0 [ttm]
[33171.510593] ttm_bo_validate+0x100/0x110 [ttm]
[33171.510609] ? drm_vma_offset_add+0x41/0x60 [drm]
[33171.510617] ? do_detailed_mode+0x51e/0x5a0 [drm]
[33171.510620] ttm_bo_init_reserved+0x382/0x430 [ttm]
[33171.510659] amdgpu_bo_do_create+0x1e9/0x480 [amdgpu]
[33171.510679] ? amdgpu_fill_buffer+0x2d0/0x2d0 [amdgpu]
[33171.510697] amdgpu_bo_create+0x3d/0x220 [amdgpu]
[33171.510717] amdgpu_gem_object_create+0x6a/0xf0 [amdgpu]
[33171.510737] ? amdgpu_gem_object_close+0x1b0/0x1b0 [amdgpu]
[33171.510755] amdgpu_gem_create_ioctl+0x1c3/0x240 [amdgpu]
[33171.510774] ? amdgpu_gem_object_close+0x1b0/0x1b0 [amdgpu]
[33171.510781] drm_ioctl_kernel+0x5b/0xb0 [drm]
[33171.510787] drm_ioctl+0x2ad/0x350 [drm]
[33171.510805] ? amdgpu_gem_object_close+0x1b0/0x1b0 [amdgpu]
[33171.510808] ? timerqueue_add+0x52/0x80
[33171.510825] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[33171.510831] ? __sys_recvmsg+0x5d/0x70
[33171.510834] ? __fget+0x6e/0xb0
[33171.510842] RIP: 0033:0x7fe9516f7967
[33171.510843] RSP: 002b:00007ffe3d16ae48 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[33171.510845] RAX: ffffffffffffffda RBX: 000055b6e55af6c0 RCX: 00007fe9516f7967
[33171.510845] RDX: 00007ffe3d16ae90 RSI: 00000000c0206440 RDI: 0000000000000018
[33171.510846] RBP: 00007ffe3d16ae90 R08: 000055b6e55af6c0 R09: 0000000000000004
[33171.510847] R10: 000055b6e4907010 R11: 0000000000003246 R12: 00000000c0206440
[33171.510847] R13: 0000000000000018 R14: 00007ffe3d16af18 R15: 000055b6e603ae70
Linux magrathea.fritz.box 4.16.0-1-default #1 SMP PREEMPT Wed Apr 4 13:35:56 UTC 2018 (e16f96d) x86_64 x86_64 x86_64 GNU/Linux
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] (rev cf) (prog-if 00 [VGA controller])
Subsystem: PC Partner Limited / Sapphire Technology Radeon RX 470/480
Flags: bus master, fast devsel, latency 0, IRQ 38
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=2M]
I/O ports at e000 [size=256]
Memory at fbe00000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities:  Vendor Specific Information: Len=08 <?>
Capabilities:  Power Management version 3
Capabilities:  Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities:  Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities:  Advanced Error Reporting
Capabilities:  #15
Capabilities:  #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities:  Latency Tolerance Reporting
Capabilities:  Alternative Routing-ID Interpretation (ARI)
Capabilities:  L1 PM Substates
Kernel driver in use: amdgpu
Kernel modules: amdgpu
Could you try the kernel in OBS home:tiwai:bnc1088658?
It's not published yet but the build of kernel-default already finished, so you can fetch binaries via "osc getbinaries home:tiwai:bnc1088658/kernel-default"
There is a known regression wrt swiotlb, and this looks like that.
... and it's published now:
I've installed the kernel you pointed me to and it looks like the issue is fixed. I don't see the call trace anymore.
I had a few flickers when starting Firefox, but they stopped. I think that the kernel fixes the issue!
Thanks Takashi for the quick reply!
I still have flickering, either it goes to white or black. But I dunno how I could debug that. It might be mesa or amdgpu ...
So we seem to have multiple issues.
Could you try amdgpu.dc=0 boot option?
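After rebooting with the option, it can help to confirm that it actually reached the kernel command line. The following is only a minimal sketch: `kernel_param` is a hypothetical helper, the sample command line is made up for illustration, and the parser assumes that the last occurrence of a parameter wins and that values contain no quoted spaces.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    A bare flag without '=' yields an empty string. Assumption: if the
    parameter appears more than once, the last occurrence is the one returned.
    """
    value = None
    for token in cmdline.split():
        key, _sep, val = token.partition("=")
        if key == name:
            value = val
    return value


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    sample = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda2 amdgpu.dc=0 quiet"
    print("amdgpu.dc =", kernel_param(sample, "amdgpu.dc"))
```

On a running system the string to pass in would be the content of /proc/cmdline.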
I've tried it and after unlocking my encrypted partition I ended up with a black screen, probably loading X. I needed a hardware reset to restart the machine.
I refreshed the repo home:tiwai:bnc1088658 with a couple of more patches to improve the memory allocations.
I'm not sure whether the symptom is related to it, but in any case, please give it a try later (it's being built now, so in an hour or so).
I've tested with kernel-default-4.16.1-2.1.g98a8438.x86_64.
After I type in my password in the desktop manager and it loads KDE (probably the compositor for OpenGL), the machine freezes completely.
Hrm so it's worse than before? That wasn't expected :-<
Well, I didn't remove the 4.16.1-2.g98a8438-default kernel. So this morning I booted into this kernel and it worked. Maybe the video card needed a complete power down.
The flickering is mostly gone. However, I still have flickering when playing a YouTube video in Firefox. When I move the mouse or switch to a different window (YouTube video still in sight) I get a completely white screen for a fraction of a second.
OK, it sometimes happens when switching windows too, but it isn't as bad with the kernel you gave me.
OK, the fixes included in the previous 4.16.1-2.g98a8438-default are now in the stable branch. Could you check whether the kernel in OBS Kernel:stable now works equivalently?
I also updated the home:tiwai:bnc1088658 repo again, now based on 4.16.2.
If you have time, please try it again to double-check whether the further DMA allocation change causes the problem.
And the remaining flickering has a different cause, I suppose. Do you see any kernel message when the flickering happens?
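One way to check for such messages is to save the `dmesg` output while the flicker occurs and filter it for the swiotlb failure seen in the original report. The sketch below is only an illustration: `find_swiotlb_failures` is a hypothetical helper, and the pattern is written against the exact log wording quoted in this bug, which may differ in other kernel versions.

```python
import re

# Matches lines such as:
# [33171.510555] amdgpu 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
SWIOTLB_FULL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<dev>\S+):\s+"
    r"swiotlb buffer is full \(sz: (?P<size>\d+) bytes\)"
)


def find_swiotlb_failures(log_text):
    """Return (timestamp, driver, device, size_bytes) per 'buffer is full' line."""
    hits = []
    for line in log_text.splitlines():
        m = SWIOTLB_FULL.search(line)
        if m:
            hits.append((float(m["ts"]), m["driver"], m["dev"], int(m["size"])))
    return hits


if __name__ == "__main__":
    # The two log lines from the original report.
    sample = (
        "[33171.510555] amdgpu 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)\n"
        "[33171.510557] amdgpu 0000:01:00.0: swiotlb: coherent allocation failed, size=2097152\n"
    )
    for ts, driver, dev, size in find_swiotlb_failures(sample):
        # 2097152 bytes is 2 MiB
        print(f"{ts:.6f} {driver} {dev}: {size / 2**20:.1f} MiB request failed")
```

An empty result means only that this particular message is absent; other causes of flickering would not show up here.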
I've installed 4.16.2-1.g4ef185f-default and it works fine.
I still have some flickering; I dunno where it is coming from or what is causing it. There are no relevant messages in dmesg when it happens.
There was another amdgpu DC regression report, and it pointed to the buggy commit to revert.
A test kernel is being built on OBS home:tiwai:bnc1089615 repo.
Please check it later.
Takashi, that fixed the issue! Well spotted ;-)
Good to hear! I'm going to merge the fix patch into the stable branch.
One last favor: could you try also the kernel in OBS home:tiwai:bnc1088902 repo?
I'd like to see whether the DMA32 fix patch really gives any bad result. Thanks.
Seems to work fine too.
Thanks, now I can submit the DMA32 patch to upstream, after confirming that it has no big side effect.
I think all issues are addressed, and the fix patch has now been merged into the stable branch, so let's close.
Thank you very much for your help addressing those issues!
*** Bug 1089998 has been marked as a duplicate of this bug. ***