Bug 1090458 - Problem with ath10k firmware
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 Other
Priority: P5 - None
Severity: Major
Assigned To: E-mail List
QA Contact: E-mail List

Reported: 2018-04-21 20:31 UTC by Ivan Levshin
Modified: 2018-04-26 18:13 UTC (History)

Attachments
hwinfo output (288.07 KB, application/gzip)
2018-04-22 08:01 UTC, Ivan Levshin
dmesg output (88.26 KB, text/plain)
2018-04-23 05:57 UTC, Ivan Levshin

Description Ivan Levshin 2018-04-21 20:31:08 UTC
Hello,

I'm having trouble with my ath10k Wi-Fi card. I'm using an MSI GS40 laptop with a Killer Wi-Fi chip, specifically the Killer Wireless-AC 1525.

After the latest updates I have a serious problem with the Wi-Fi connection: it starts fine, but after some time (3-5 minutes) nothing works. I checked the dmesg output and found this:

[  425.846958] ath10k_pci 0000:3e:00.0: firmware crashed! (guid 918c42d8-9086-46eb-9736-056fbc0b706b)
[  425.846965] ath10k_pci 0000:3e:00.0: qca6174 hw2.1 target 0x05010000 chip_id 0x003405ff sub 1a56:1525
[  425.846967] ath10k_pci 0000:3e:00.0: kconfig debug 0 debugfs 0 tracing 0 dfs 0 testmode 0
[  425.847307] ath10k_pci 0000:3e:00.0: firmware ver SW_RM.1.1.1-00157-QCARMSWPZ-1 api 5 features ignore-otp,no-4addr-pad crc32 10bf8e08
[  425.847485] ath10k_pci 0000:3e:00.0: board_file api 2 bmi_id N/A crc32 ae2e275a
[  425.847488] ath10k_pci 0000:3e:00.0: htt-ver 3.1 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[  425.849494] ath10k_pci 0000:3e:00.0: firmware register dump:
[  425.849500] ath10k_pci 0000:3e:00.0: [00]: 0x05010000 0x00000000 0x0092E4DC 0x43C88E15
[  425.849503] ath10k_pci 0000:3e:00.0: [04]: 0x0092E4DC 0x00060130 0x00000018 0x0041A760
[  425.849506] ath10k_pci 0000:3e:00.0: [08]: 0x43C88E01 0x00400000 0x00000000 0x000A5C88
[  425.849509] ath10k_pci 0000:3e:00.0: [12]: 0x00000009 0x00000000 0x0096C09C 0x0096C0A7
[  425.849511] ath10k_pci 0000:3e:00.0: [16]: 0x0096BDBC 0x009287BD 0x00000000 0x009287BD
[  425.849514] ath10k_pci 0000:3e:00.0: [20]: 0x4092E4DC 0x0041A710 0x00000000 0x0F000000
[  425.849517] ath10k_pci 0000:3e:00.0: [24]: 0x809432A7 0x0041A770 0x0040D400 0xC092E4DC
[  425.849520] ath10k_pci 0000:3e:00.0: [28]: 0x80942BC4 0x0041A790 0x43C88E01 0x00400000
[  425.849522] ath10k_pci 0000:3e:00.0: [32]: 0x80947BA7 0x0041A7B0 0x00404D88 0x0040E074
[  425.849525] ath10k_pci 0000:3e:00.0: [36]: 0x809BDECC 0x0041A7D0 0x00404D88 0x0040E074
[  425.849528] ath10k_pci 0000:3e:00.0: [40]: 0x8099638C 0x0041A7F0 0x00404D88 0x00000000
[  425.849531] ath10k_pci 0000:3e:00.0: [44]: 0x80992076 0x0041A810 0x0044FD68 0x0046FFE8
[  425.849534] ath10k_pci 0000:3e:00.0: [48]: 0x80996BD3 0x0041A830 0x0044FD68 0x00000000
[  425.849536] ath10k_pci 0000:3e:00.0: [52]: 0x800B4405 0x0041A850 0x00422318 0x00005002
[  425.849539] ath10k_pci 0000:3e:00.0: [56]: 0x809A6C34 0x0041A8E0 0x0042932C 0x0042CA44
[  425.849542] ath10k_pci 0000:3e:00.0: Copy Engine register dump:
[  425.849551] ath10k_pci 0000:3e:00.0: [00]: 0x00034400   1   1   3   3
[  425.849560] ath10k_pci 0000:3e:00.0: [01]: 0x00034800  31  31 386 387
[  425.849569] ath10k_pci 0000:3e:00.0: [02]: 0x00034c00  60  60  59  60
[  425.849577] ath10k_pci 0000:3e:00.0: [03]: 0x00035000  21  21  22  21
[  425.849586] ath10k_pci 0000:3e:00.0: [04]: 0x00035400 5881 5881 190 126
[  425.849595] ath10k_pci 0000:3e:00.0: [05]: 0x00035800   0   0   0   0
[  425.849603] ath10k_pci 0000:3e:00.0: [06]: 0x00035c00   9   9   9   9
[  425.849612] ath10k_pci 0000:3e:00.0: [07]: 0x00036000   1   1   1   1
[  425.855959] ath10k_pci 0000:3e:00.0: failed to create WMI vdev 0: -108
[  427.323553] ath10k_pci 0000:3e:00.0: device successfully recovered

It seems the module crashes because of the firmware, and the kernel tries to recover it. Sometimes it recovers (but the connection is extremely slow), and sometimes it doesn't (and then I have no connection at all).
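
A minimal sketch of how the crash and recovery lines in a log like the one above could be paired up, to see whether (and how quickly) each crash was followed by a recovery. The regular expression and the function name `recovery_times` are illustrative assumptions, not part of any ath10k tooling; the sample lines are copied from the dmesg output above.

```python
import re

# Matches lines such as
# "[  425.846958] ath10k_pci 0000:3e:00.0: firmware crashed! (guid ...)"
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ath10k_pci\s+(?P<dev>\S+):\s+(?P<msg>.*)$"
)


def recovery_times(lines):
    """Return (crash_timestamp, seconds_to_recover) pairs.

    seconds_to_recover is None when no recovery message followed the crash.
    """
    results = []
    crash_ts = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group("ts")), m.group("msg")
        if msg.startswith("firmware crashed!"):
            if crash_ts is not None:
                # A second crash arrived before the first one recovered.
                results.append((crash_ts, None))
            crash_ts = ts
        elif msg.startswith("device successfully recovered") and crash_ts is not None:
            results.append((crash_ts, ts - crash_ts))
            crash_ts = None
    if crash_ts is not None:
        results.append((crash_ts, None))
    return results


sample = [
    "[  425.846958] ath10k_pci 0000:3e:00.0: firmware crashed! (guid 918c42d8-9086-46eb-9736-056fbc0b706b)",
    "[  425.855959] ath10k_pci 0000:3e:00.0: failed to create WMI vdev 0: -108",
    "[  427.323553] ath10k_pci 0000:3e:00.0: device successfully recovered",
]

for crash, delta in recovery_times(sample):
    status = "not recovered" if delta is None else f"recovered after {delta:.2f}s"
    print(f"crash at {crash:.6f}: {status}")
```

On a live system the input could come from the saved dmesg output instead of the `sample` list; for the crash shown above, the recovery message follows roughly 1.5 seconds after the crash message.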

I have kernel 4.16.2-1-default #1 SMP PREEMPT Thu Apr 12 12:54:16 UTC 2018 (7b2d22b) x86_64 x86_64 x86_64 GNU/Linux installed. The firmware version, per the dmesg output above, is SW_RM.1.1.1-00157-QCARMSWPZ-1 (api 5).

All I can say is that the problem started after I applied the latest updates; before that everything worked like a charm. Unfortunately I can't say what exactly was updated, as I usually run "zypper up" or "zypper dup" and install all available updates at once. As I understand it, something is wrong with the firmware, the kernel, or both. Please guide me further: what additional info should I provide, what tests should be done, etc. This makes me very unhappy: I have to use my old Ethernet cable, which is much less comfortable, you know :)

Thanks.
Comment 1 Takashi Iwai 2018-04-22 07:11:00 UTC
Can you boot an older kernel and confirm whether WiFi still works as expected there?
If yes, which kernel worked and which was broken?  Please attach dmesg outputs taken on both the working and the non-working kernel.

If the old kernel is also broken, it may rather be an issue with the kernel-firmware package update.

In any case, please upload the hwinfo output, too.
Comment 2 Ivan Levshin 2018-04-22 08:01:24 UTC
Created attachment 767910 [details]
hwinfo output

Hi Takashi, nice to meet you again

Please check the attached hwinfo.tar.gz. It contains two outputs: the full one ("hwinfo") and one for network cards only ("hwinfo --netcard").

Regarding older kernels: I updated my system yesterday, and the oldest kernel I have at the moment is 4.16.1. Could you please tell me where to get a 4.15.x kernel (4.16.0 definitely has the same problem as 4.16.1/4.16.2)? I could also try an older version of kernel-firmware, but I don't know how to get and test that either.
Comment 3 Takashi Iwai 2018-04-22 08:15:19 UTC
Thanks.  Also, could you give the full dmesg output when WiFi crashes?

The TW repo doesn't keep old packages, unfortunately.  Sometimes one older version remains for a while, but not permanently.

So, for testing purposes, I'm now building the last 4.15.x openSUSE kernel in the OBS home:tiwai:kernel:4.15 repo.  It'll take some time (an hour or so); please give it a try later.

Getting the older kernel-firmware is also tricky; you'd need to rebuild it, too.  Maybe we should begin by checking which packages have been updated.

Do you remember roughly on which date the WiFi last worked?  You can take a look at /var/log/zypp/history and see which kernel-default and kernel-firmware packages have been updated.
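
A rough sketch of how that history file could be filtered for the two packages in question. It assumes the usual pipe-separated layout of /var/log/zypp/history (timestamp, action, package name, version, then further fields); the function name `kernel_updates` is illustrative, and the sample entries below are hypothetical apart from the package names.

```python
WATCHED = {"kernel-default", "kernel-firmware"}


def kernel_updates(history_text):
    """Yield (date, package, version) for each install of a watched package."""
    for line in history_text.splitlines():
        # Skip blank lines and the commented header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) < 4:
            continue
        timestamp, action, name, version = fields[:4]
        if action == "install" and name in WATCHED:
            yield timestamp.split()[0], name, version


# Hypothetical sample entries: versions, times, and trailing fields are
# made up purely to show the layout.
SAMPLE_HISTORY = """\
# 2018-01-01 09:00:00 sample header comment
2018-01-01 09:00:01|command|root@host|'zypper' 'dup'|
2018-01-01 09:05:10|install|kernel-firmware|1.0-1.1|noarch||repo-oss|checksum|
2018-01-01 09:06:20|install|kernel-default|1.2.3-1.1|x86_64||repo-oss|checksum|
2018-01-01 09:07:30|install|vim|8.0-1.1|x86_64||repo-oss|checksum|
"""

for date, name, version in kernel_updates(SAMPLE_HISTORY):
    print(date, name, version)
```

For the real file, the text would be read from /var/log/zypp/history (which may require root) rather than from `SAMPLE_HISTORY`.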
Comment 4 Ivan Levshin 2018-04-22 08:33:08 UTC
Thanks for the advice. Unfortunately I can't remember exactly when this problem started; all I can say at the moment is that at the end of March I had no such problem.

Looking at /var/log/zypp/history, I found that:
- April 01 2018: kernel-firmware-20180320-1.1 was installed together with kernel-default-4.15.14-1.6
- April 04 2018: kernel-firmware-20180402-1.1 was installed together with kernel-default-4.16.0-1.5

The problem started around April 04 2018, and it definitely existed with kernel-firmware-20180402-1.1 and kernel-default-4.16.0-1.5. I'm sure because before yesterday's update I had the 4.16.0 kernel and checked whether the problem existed there. It did.

I think we can try kernel-firmware first; as I understand it, that will affect fewer things than an old kernel, won't it?
Comment 5 Ivan Levshin 2018-04-23 05:57:29 UTC
Created attachment 767926 [details]
dmesg output

Hi Takashi, trust you're well

I think I found the root cause: it's the firmware, not the kernel. I found this in the dmesg output:

[    5.661061] ath10k_pci 0000:3e:00.0: enabling device (0000 -> 0002)
[    5.662044] ath10k_pci 0000:3e:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[    5.984890] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:3e:00.0.bin failed with error -2
[    5.984901] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/cal-pci-0000:3e:00.0.bin failed with error -2
[    5.985538] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/QCA6174/hw2.1/firmware-6.bin failed with error -2
[    5.989291] ath10k_pci 0000:3e:00.0: qca6174 hw2.1 target 0x05010000 chip_id 0x003405ff sub 1a56:1525
[    5.989293] ath10k_pci 0000:3e:00.0: kconfig debug 0 debugfs 0 tracing 0 dfs 0 testmode 0
[    5.989586] ath10k_pci 0000:3e:00.0: firmware ver SW_RM.1.1.1-00157-QCARMSWPZ-1 api 5 features ignore-otp,no-4addr-pad crc32 10bf8e08
[    6.055008] ath10k_pci 0000:3e:00.0: board_file api 2 bmi_id N/A crc32 ae2e275a
[    7.228251] ath10k_pci 0000:3e:00.0: htt-ver 3.1 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[    7.343807] ath10k_pci 0000:3e:00.0 wlp62s0: renamed from wlan0

As you can see, the problem happens at the very beginning of boot, so the other WiFi troubles might be explained by it: everything happens because no firmware is loaded.

I tried to gather additional info as per your recommendation for the other bug (you probably remember we worked on the Bluetooth bug on the same laptop) but failed. Could you please tell me how to get additional info from the device? This time it's a PCI device, not a USB one.
Comment 6 Takashi Iwai 2018-04-25 13:20:43 UTC
Well, the non-existing firmware isn't a problem.  The driver supports a new firmware API version 6, and it falls back to API version 5 when that file is missing.  The API v6 firmware doesn't seem to be publicly available at the moment, so the behavior you've seen is correct.
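
The fallback described above can be sketched roughly as follows. This is only an illustration of the idea, not the driver's actual code: the function name `pick_firmware`, the lower bound `api_min=2`, and the assumption that every API level uses a file named after the firmware-N.bin pattern seen in the log are all assumptions.

```python
def pick_firmware(available, api_max=6, api_min=2):
    """Return (chosen_file, missing_files), trying the newest API first.

    `available` is the set of firmware file names present on disk.
    Each entry in `missing_files` corresponds to one
    "Direct firmware load ... failed with error -2" line in the log.
    """
    missing = []
    for api in range(api_max, api_min - 1, -1):
        name = f"firmware-{api}.bin"
        if name in available:
            return name, missing
        missing.append(name)
    return None, missing


# Only an API 5 image is installed, as in the log above.
chosen, missing = pick_firmware({"firmware-5.bin"})
print("loaded:", chosen)
print("not found (harmless):", missing)
```

With only an API 5 image present, the sketch reports firmware-6.bin as not found and then selects firmware-5.bin, which matches the "api 5" firmware version the driver prints after the failed load.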

Now I took a look at the recent ath10k development and found a fix in 4.17-rc1 that is likely relevant to your problem.

55cc11da69895a680940c1733caabc37be685f5e
    Revert "ath10k: send (re)assoc peer command when NSS changed"

I'm going to build a kernel with this fix for testing.
Comment 7 Takashi Iwai 2018-04-25 13:32:43 UTC
A test kernel is being built on OBS home:tiwai:bnc1090458 repo.
It'll appear at
  http://download.opensuse.org/repositories/home:/tiwai:/bnc1090458/standard/

The kernel contains the possible fix for the WiFi crash, and the revert of the previous BT fix.  So BT may be broken with this kernel.

In this bug entry, let's concentrate on the WiFi part.  If the test kernel is confirmed to work for WiFi, I'll push the corresponding fix to the TW kernel.
Comment 8 Ivan Levshin 2018-04-25 18:40:50 UTC
Hi Takashi,

I can't see x86_64 in this repo; could you please check whether it will be available?
Comment 9 Takashi Iwai 2018-04-25 19:01:30 UTC
It's still being built.  You can check the build status via build.opensuse.org.
  https://build.opensuse.org/project/show/home:tiwai:bnc1090458
Comment 10 Ivan Levshin 2018-04-26 08:15:49 UTC
Hi Takashi,

The kernel-default build failed; could you please check it?
Comment 11 Takashi Iwai 2018-04-26 08:43:52 UTC
Looks like some error in OBS.  I retriggered the build.  Let's see whether it works now.
Comment 12 Ivan Levshin 2018-04-26 14:13:11 UTC
Takashi,

Now x86_64 built fine, but when I try to install kernel-default 4.16.4-2.1 from your repo, zypper gives me a lot of warnings (for more than 2K packages), and I'm afraid this kernel will break my system. Could you please tell me how to try this kernel without disrupting the system?
Comment 13 Takashi Iwai 2018-04-26 14:41:31 UTC
(In reply to Ivan Levshin from comment #12)
> Takashi,
> 
> Now x86_64 built fine but when I'm trying to install kernel-default
> 4.16.4-2.1 from your repo zypper gives me a lot of warnings (for more than
> 2K of packets) and I'm affraid this kernel will breaks my system. Could you
> please tell me how to try this kernel without system disruption?

Could you give the exact messages?  Are you sure that you're installing the x86_64 one?
Comment 14 Ivan Levshin 2018-04-26 14:56:20 UTC
I was using YaST when I got those messages; in the end I installed it with zypper without any problem.

Now WiFi seems to be running fine, though I still need some time for monitoring. Right now I can say that the firmware load error is still there:

ivan@mynote:~> dmesg|grep ath10k
[    5.571975] ath10k_pci 0000:3e:00.0: enabling device (0000 -> 0002)
[    5.572932] ath10k_pci 0000:3e:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[    5.896645] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:3e:00.0.bin failed with error -2
[    5.896654] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/cal-pci-0000:3e:00.0.bin failed with error -2
[    5.896976] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/QCA6174/hw2.1/firmware-6.bin failed with error -2
[    5.900202] ath10k_pci 0000:3e:00.0: qca6174 hw2.1 target 0x05010000 chip_id 0x003405ff sub 1a56:1525
[    5.900204] ath10k_pci 0000:3e:00.0: kconfig debug 0 debugfs 0 tracing 0 dfs 0 testmode 0
[    5.900524] ath10k_pci 0000:3e:00.0: firmware ver SW_RM.1.1.1-00157-QCARMSWPZ-1 api 5 features ignore-otp,no-4addr-pad crc32 10bf8e08
[    5.964525] ath10k_pci 0000:3e:00.0: board_file api 2 bmi_id N/A crc32 ae2e275a
[    7.137174] ath10k_pci 0000:3e:00.0: htt-ver 3.1 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[    7.243263] ath10k_pci 0000:3e:00.0 wlp62s0: renamed from wlan0

But WiFi seems to be running fine.
Comment 15 Takashi Iwai 2018-04-26 15:06:24 UTC
So far, so good.  The firmware load error is not a real problem as long as it's only about firmware-6.bin and the other *-pci-*.bin files.
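
As a rough illustration of that rule, the failed loads in a dmesg excerpt could be sorted into expected and unexpected ones. The patterns, the function name `classify`, and the choice to treat only error -2 (file not found) as harmless are assumptions made for this sketch; the sample lines are taken from the output in comment 14.

```python
import fnmatch
import re

LOAD_FAIL_RE = re.compile(
    r"Direct firmware load for (?P<path>\S+) failed with error (?P<err>-?\d+)"
)

# Files whose absence is expected here: the API v6 image and the
# per-device calibration files (pre-cal-pci-*, cal-pci-*).
OPTIONAL_PATTERNS = ("*/firmware-6.bin", "ath10k/*-pci-*.bin")


def classify(lines):
    """Split failed firmware loads into (benign, suspicious) path lists."""
    benign, suspicious = [], []
    for line in lines:
        m = LOAD_FAIL_RE.search(line)
        if not m:
            continue
        path = m.group("path")
        optional = any(fnmatch.fnmatchcase(path, p) for p in OPTIONAL_PATTERNS)
        # -2 is "no such file"; anything else is treated as worth a look.
        if optional and m.group("err") == "-2":
            benign.append(path)
        else:
            suspicious.append(path)
    return benign, suspicious


sample = [
    "[    5.896645] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:3e:00.0.bin failed with error -2",
    "[    5.896654] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/cal-pci-0000:3e:00.0.bin failed with error -2",
    "[    5.896976] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/QCA6174/hw2.1/firmware-6.bin failed with error -2",
]

benign, suspicious = classify(sample)
print("benign:", benign)
print("suspicious:", suspicious)
```

All three failures from the log fall into the benign list, so nothing in this excerpt points at a missing required file.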
Comment 16 Ivan Levshin 2018-04-26 15:20:52 UTC
That's cool, but what was the root cause if not the firmware?
Comment 17 Takashi Iwai 2018-04-26 15:24:51 UTC
(In reply to Ivan Levshin from comment #16)
> That's cool but what was the root cause if not firmware?

It was a commit in the ath10k driver, which was reverted in 4.17-rc1.
The revert should have been sent to the stable 4.16.y tree, too.  I'll submit it to Greg later.
Comment 18 Ivan Levshin 2018-04-26 16:13:08 UTC
Many thanks, Takashi
Comment 19 Takashi Iwai 2018-04-26 18:13:37 UTC
The fix was merged into the stable branch and will be included in the next TW kernel.  Let's close.