Bug 1169174 - kernel shows trace with 5.6 kernel with AMD Ryzen 7 PRO 3700U
Status: RESOLVED NORESPONSE
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assigned To: Borislav Petkov
QA Contact: E-mail List
URL: https://bugzilla.kernel.org/show_bug....
Reported: 2020-04-10 08:47 UTC by Tomáš Chvátal
Modified: 2021-01-17 19:07 UTC

Attachments
dmesg.txt (95.41 KB, text/plain)
2020-04-11 18:10 UTC, Tomáš Chvátal

Description Tomáš Chvátal 2020-04-10 08:47:09 UTC
Linux letik 5.6.2-1-default #1 SMP Thu Apr 2 06:31:32 UTC 2020 (c8170d6) x86_64 x86_64 x86_64 GNU/Linux


[  669.093097] irq 7: nobody cared (try booting with the "irqpoll" option)
[  669.093105] CPU: 0 PID: 1678 Comm: kworker/u33:2 Not tainted 5.6.2-1-default #1 openSUSE Tumbleweed (unreleased)
[  669.093105] Hardware name: LENOVO 20NJS0DC00/20NJS0DC00, BIOS R12ET49W(1.19 ) 01/06/2020
[  669.093114] Workqueue:  0x0 (hci0)
[  669.093116] Call Trace:
[  669.093120]  <IRQ>
[  669.093128]  dump_stack+0x8f/0xd0
[  669.093133]  __report_bad_irq+0x38/0xad
[  669.093134]  note_interrupt.cold+0xb/0x6e
[  669.093136]  handle_irq_event_percpu+0x72/0x80
[  669.093137]  handle_irq_event+0x3c/0x5c
[  669.093138]  handle_fasteoi_irq+0xa3/0x160
[  669.093142]  do_IRQ+0x53/0xe0
[  669.093145]  common_interrupt+0xf/0xf
[  669.093146]  </IRQ>
[  669.093149] RIP: 0010:finish_task_switch+0x85/0x280
[  669.093150] Code: 01 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 45 38 00 00 00 00 4c 89 e7 c6 07 00 0f 1f 40 00 e8 c2 07 0e 00 fb 66 0f 1f 44 00 00 <65> 48 8b 04 25 c0 ab 01 00 0f 1f 44 00 00 4d 85 f6 74 21 65 48 8b
[  669.093151] RSP: 0018:ffffac7ac0f97e10 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc7
[  669.093152] RAX: 0000000080000000 RBX: ffffa0076ed3be00 RCX: 0000000000000000
[  669.093153] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa00770a2f280
[  669.093153] RBP: ffffac7ac0f97e38 R08: 0000000000000001 R09: 0000000000000000
[  669.093154] R10: 000000000000037c R11: ffffa00770a2e064 R12: ffffa00770a2f280
[  669.093154] R13: ffffffff96e14840 R14: 0000000000000000 R15: 0000000000000000
[  669.093158]  __schedule+0x2e5/0x770
[  669.093160]  schedule+0x4a/0xb0
[  669.093163]  worker_thread+0xd1/0x400
[  669.093165]  kthread+0xf9/0x130
[  669.093166]  ? process_one_work+0x3b0/0x3b0
[  669.093167]  ? kthread_park+0x90/0x90
[  669.093168]  ret_from_fork+0x27/0x50
[  669.093170] handlers:
[  669.093174] [<00000000e8247db6>] amd_gpio_irq_handler [pinctrl_amd]
[  669.093175] Disabling IRQ #7
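
Note: the report above has a recognizable shape: a "nobody cared" line naming the IRQ, a call trace, a "handlers:" list, and a final "Disabling IRQ" line. The following is a minimal sketch for pulling those fields out of a saved dmesg, assuming the log keeps exactly the line shapes shown in this bug. The names parse_unhandled_irq and SAMPLE are made up for illustration and are not part of any kernel tooling.

```python
import re

# Line shapes taken from the trace in this report; other kernels may differ.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)(?:\s+\[(\S+)\])?")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Shortened copy of the report above (call trace omitted).
SAMPLE = """\
[  669.093097] irq 7: nobody cared (try booting with the "irqpoll" option)
[  669.093116] Call Trace:
[  669.093170] handlers:
[  669.093174] [<00000000e8247db6>] amd_gpio_irq_handler [pinctrl_amd]
[  669.093175] Disabling IRQ #7
"""


def parse_unhandled_irq(log_text):
    """Return one dict per "nobody cared" report found in log_text."""
    reports = []
    current = None        # report currently being filled in
    in_handlers = False   # True once the "handlers:" line has been seen
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        match = NOBODY_CARED.search(line)
        if match:
            current = {"irq": int(match.group(1)), "handlers": [], "disabled": False}
            reports.append(current)
            in_handlers = False
            continue
        if current is None:
            continue
        if line.startswith("handlers:"):
            in_handlers = True
            continue
        match = DISABLED.search(line)
        if match:
            # The "Disabling IRQ" line closes the report.
            if int(match.group(1)) == current["irq"]:
                current["disabled"] = True
            current = None
            in_handlers = False
            continue
        if in_handlers:
            match = HANDLER.search(line)
            if match:
                # (handler function, owning module or None)
                current["handlers"].append((match.group(1), match.group(2)))
    return reports


if __name__ == "__main__":
    for report in parse_unhandled_irq(SAMPLE):
        print(report)
```

Run against the trace in this bug, the sketch should report IRQ 7 with amd_gpio_irq_handler from pinctrl_amd as the only registered handler, which is the module discussed in the comments.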
Comment 1 Borislav Petkov 2020-04-11 10:46:40 UTC
Yah, a known issue and a long story. :-(

Does the splat go away if you blacklist that module - pinctrl_amd?
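
Note: a module is usually blacklisted with a "blacklist <module>" line in a file under /etc/modprobe.d/. The sketch below checks the text of such a file for an entry. The helper names and the file name in EXAMPLE_CONF are hypothetical, and the parser only handles the simple "blacklist" form. It relies on modprobe treating '-' and '_' in module names as equivalent, so pinctrl-amd and pinctrl_amd refer to the same module.

```python
def normalize(name):
    """Map '-' to '_' so both spellings of a module name compare equal."""
    return name.strip().replace("-", "_")


def blacklisted_modules(conf_text):
    """Collect module names from 'blacklist <module>' lines of a modprobe.d file."""
    found = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "blacklist":
            found.add(normalize(parts[1]))
    return found


def is_blacklisted(module, conf_text):
    """True if conf_text contains a blacklist entry for module."""
    return normalize(module) in blacklisted_modules(conf_text)


# Hypothetical file content, e.g. /etc/modprobe.d/50-pinctrl-amd.conf
EXAMPLE_CONF = """\
# work around the spurious IRQ 7 trace
blacklist pinctrl_amd
"""

if __name__ == "__main__":
    print(is_blacklisted("pinctrl-amd", EXAMPLE_CONF))
```

Keep in mind that a blacklist entry only stops the module from being loaded automatically through its aliases; an explicit modprobe of the module name still loads it.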
Comment 2 Tomáš Chvátal 2020-04-11 18:08:31 UTC
Yep. Also adding the kernel.org bug to the URL field, which I forgot to do before.
Comment 3 Tomáš Chvátal 2020-04-11 18:10:57 UTC
Created attachment 835467
dmesg.txt

The dmesg from the new boot looks okay.

Incidentally, does anyone know what that spammy 'testing the buffer' message is? Google didn't really help.
Comment 4 Tomáš Chvátal 2020-04-11 18:13:30 UTC
Also should I boot it with "iommu=off" with respect to the error in the dmesg output?
Comment 5 Takashi Iwai 2020-04-12 07:44:06 UTC
(In reply to Tomáš Chvátal from comment #3)
> Incidentally, does anyone know what that spammy 'testing the buffer' message
> is? Google didn't really help.

It has already been addressed; see bug 1168664.
Comment 6 Borislav Petkov 2020-04-12 23:22:42 UTC
(In reply to Tomáš Chvátal from comment #4)
> Also should I boot it with "iommu=off" with respect to the error in the
> dmesg output?

You mean that or something else?

[    1.522731] pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
[    1.522854] pci 0000:00:00.2: can't derive routing for PCI INT A
[    1.522855] pci 0000:00:00.2: PCI INT A: not connected
Comment 7 Tomáš Chvátal 2020-04-13 08:37:06 UTC
(In reply to Borislav Petkov from comment #6)
> (In reply to Tomáš Chvátal from comment #4)
> > Also should I boot it with "iommu=off" with respect to the error in the
> > dmesg output?
> 
> You mean that or something else?
> 
> [    1.522731] pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
> [    1.522854] pci 0000:00:00.2: can't derive routing for PCI INT A
> [    1.522855] pci 0000:00:00.2: PCI INT A: not connected

Yes, this. I'm just asking because it is shown in red in the output, but I honestly don't know whether disabling it would bite me more.
Comment 8 Borislav Petkov 2020-04-13 08:52:53 UTC
I haven't seen that one either, let's ask Joerg.
Comment 9 Joerg Roedel 2020-04-15 15:34:35 UTC
(In reply to Borislav Petkov from comment #6)

> [    1.522731] pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.

This message is harmless; it just means that there is no support for IOMMU performance counters on this system. The IOMMU is otherwise okay and can be used.
Comment 10 Takashi Iwai 2020-05-19 15:26:42 UTC
So what is the situation now? Does the user still need to blacklist pinctrl-amd manually?
Comment 11 Tomáš Chvátal 2020-05-20 07:03:59 UTC
For fun, I removed the blacklist: no change, it still throws the trace with the latest TW kernel.
Comment 12 Borislav Petkov 2020-06-07 19:06:33 UTC
Btw, is that box yours or can I get it for experimenting? As in, try kernels and some patches...
Comment 13 Tomáš Chvátal 2020-06-08 06:08:39 UTC
(In reply to Borislav Petkov from comment #12)
> Btw, is that box yours or can I get it for experimenting? As in, try kernels
> and some patches...

Yeah, it is a SUSE-provided lappy.

The only limitation is that we aren't in the office and I kind of need it to connect to the HW in the office right now. Whenever we can go back to the office, I can bring it to your desk and you can have fun with it, though :-).
Comment 14 Miroslav Beneš 2020-09-10 10:52:14 UTC
Anything new here?

Tomas, could you provide Boris with the access to the laptop?
Comment 15 Tomáš Chvátal 2020-09-10 14:22:31 UTC
(In reply to Miroslav Beneš from comment #14)
> Anything new here?
> 
> Tomas, could you provide Boris with the access to the laptop?

Well, as we are stuck at home, I am rather fond of having a working machine :)

BUT there are at least 7 identical laptops in our prg storage, so I suppose we can take one of those.

Also, removing the blacklist still shows the trace on the latest TW kernel.
Comment 16 Borislav Petkov 2020-09-10 14:33:57 UTC
Yes, but hold on to it for now until I find someone at AMD from the client unit to deal with those issues.

Lemme take that bug and we'll talk later.
Comment 17 Borislav Petkov 2021-01-17 19:07:00 UTC
Closing. Feel free to reopen when you've found a box which I can use to debug. Alternatively, I'll reopen when I have one.

Thx.