Bug 1146769

Summary: kernel-firmware-amdgpu breaks polaris10
Product: [openSUSE] openSUSE Tumbleweed
Component: Kernel
Status: RESOLVED FIXED
Severity: Normal
Priority: P5 - None
Version: Current
Target Milestone: ---
Hardware: Other
OS: Other
Reporter: Vlastimil Babka <vbabka>
Assignee: Daniel Molkentin <daniel>
QA Contact: E-mail List <qa-bugs>
CC: tiwai, vbabka
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Bug Depends on:
Bug Blocks: 1161183, 1143959
Attachments: Workaround patch; Another cleanup

Description Vlastimil Babka 2019-08-22 08:08:40 UTC
With kernel-firmware-amdgpu-20190815-277.1 on kernel 5.3.0-rc5-3.g20b5ce2-default
I get:

2019-08-22T09:20:59.517060+02:00 gehinom kernel: [    5.101024] amdgpu 0000:09:00.0: Direct firmware load for amdgpu/polaris10_mc.bin failed with error -2
2019-08-22T09:20:59.517061+02:00 gehinom kernel: [    5.101026] mc: Failed to load firmware "amdgpu/polaris10_mc.bin"
2019-08-22T09:20:59.517062+02:00 gehinom kernel: [    5.101102] [drm:gmc_v8_0_sw_init.cold [amdgpu]] *ERROR* Failed to load mc firmware!
2019-08-22T09:20:59.517144+02:00 gehinom kernel: [    5.101173] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -2
2019-08-22T09:20:59.517145+02:00 gehinom kernel: [    5.101175] amdgpu 0000:09:00.0: amdgpu_device_ip_init failed
2019-08-22T09:20:59.517146+02:00 gehinom kernel: [    5.101177] amdgpu 0000:09:00.0: Fatal error during GPU init
2019-08-22T09:20:59.517150+02:00 gehinom kernel: [    5.101179] [drm] amdgpu: finishing device.

kernel-firmware works fine on the same kernel. Does amdgpu not support .xz firmware or what?
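
Note: error -2 in the log above is -ENOENT, i.e. the loader could not find the file at all. A minimal sketch for checking in which form a firmware file is present on disk follows. The helper name fw_status is hypothetical and not part of dracut or the kernel tooling; it only tests for the plain name and for the same name with an .xz suffix.

```shell
# Report in which form a firmware file is present under a directory.
# Prints "plain", "xz" or "missing". fw_status is a hypothetical helper.
fw_status() {
    # $1 = firmware directory, $2 = name as requested by the driver
    if [ -f "$1/$2" ]; then
        echo plain
    elif [ -f "$1/$2.xz" ]; then
        echo xz
    else
        echo missing
    fi
}
```

For the case in this report one could run `fw_status /lib/firmware amdgpu/polaris10_mc.bin`. The same check against an unpacked initrd tree would show whether the file made it into the image.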
Comment 1 Takashi Iwai 2019-08-22 08:14:02 UTC
The compressed firmware support is in the kernel firmware loader core, so it should be independent of the individual driver code.

My wild guess is that the firmware is missing from the initrd for some reason.

Could you check whether it's missing or not?
Comment 2 Vlastimil Babka 2019-08-22 08:18:50 UTC
(In reply to Takashi Iwai from comment #1)
> My wild guess is that the firmware is missing from the initrd for some reason.
> 
> Could you check whether it's missing or not?

You're right, there's no lib/firmware in initrd at all. Dracut problem?
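
For reference, a rough sketch of how such a check can be scripted. It assumes the listing comes from lsinitrd (the dracut tool mentioned later in this thread) and simply counts lines that mention a lib/firmware/ path. The function name count_firmware_entries is made up for illustration.

```shell
# Count firmware entries in an initrd listing read from stdin.
count_firmware_entries() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c 'lib/firmware/' || true
}
```

Typical use would be `lsinitrd | count_firmware_entries`, probably as root so the image is readable. A result of 0 corresponds to the situation described here, where the initrd has no lib/firmware at all.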
Comment 3 Takashi Iwai 2019-08-22 08:38:07 UTC
Possibly.  Are you running TW?  i.e. with the updated dracut?

Or, do you happen to have installed amdgpu-pro package?  This used to break things at uninstallation.
Comment 4 Vlastimil Babka 2019-08-22 08:48:55 UTC
(In reply to Takashi Iwai from comment #3)
> Possibly.  Are you running TW?  i.e. with the updated dracut?

Yep

Information for package dracut:
-------------------------------
Repository     : openSUSE-Tumbleweed-Oss           
Name           : dracut                            
Version        : 049+git104.1244eed7-1.1           
Arch           : x86_64                            
Vendor         : openSUSE                          
Installed Size : 1.2 MiB                           
Installed      : Yes (automatically)               
Status         : up-to-date                        

 
> Or, do you happen to have installed amdgpu-pro package?  This used to break
> things at uninstallation.

Doesn't seem so.
Comment 5 Takashi Iwai 2019-08-22 08:53:32 UTC
OK, I'll try to install TW on a test machine and see whether it really works.
Comment 6 Vlastimil Babka 2019-08-22 08:57:02 UTC
From what I see in /usr/lib/dracut/dracut-init.sh it should work, and it even unpacks the files:

    for _fw in $(modinfo -k $kernel -F firmware $1 2>/dev/null); do
        _found=''
        for _fwdir in $fw_dir; do
            [[ -d $_fwdir ]] || continue
            if [[ -f $_fwdir/$_fw ]]; then
                inst_simple "$_fwdir/$_fw" "/lib/firmware/$_fw"
            elif [[ -f $_fwdir/$_fw.xz ]]; then
                inst_simple "$_fwdir/$_fw.xz" "/lib/firmware/$_fw.xz"
                rm -f "${initdir}/lib/firmware/$_fw"
                unxz -f "${initdir}/lib/firmware/$_fw.xz"
            else
                continue
            fi
            _found=yes
        done
        if [[ $_found != yes ]]; then
            if ! [[ -d $(echo /sys/module/${_modname//-/_}|{ read a b; echo $a; }) ]]; then
                dinfo "Possible missing firmware \"${_fw}\" for kernel module" \
                    "\"${_modname}.ko\""
            else
                dwarn "Possible missing firmware \"${_fw}\" for kernel module" \
                    "\"${_modname}.ko\""
            fi
        fi
    done

And I don't see those warnings
Comment 7 Takashi Iwai 2019-08-22 09:00:30 UTC
Did installing kernel-firmware.rpm instead of kernel-firmware-amdgpu cure the problem?  I'm wondering whether it's about the compressed file support in dracut or in a different part.
Comment 8 Vlastimil Babka 2019-08-22 09:05:56 UTC
(In reply to Takashi Iwai from comment #7)
> Did installing kernel-firmware.rpm instead of kernel-firmware-amdgpu cure
> the problem?  I'm wondering whether it's about the compressed file support
> in dracut or in a different part.

Yes it did.
Comment 9 Vlastimil Babka 2019-08-22 09:47:07 UTC
FWIW unpacking the files installed from kernel-firmware-amdgpu manually also worked.
Comment 10 Vlastimil Babka 2019-08-22 10:04:04 UTC
I'm confused, I don't see install_kmod_with_fw() from /usr/lib/dracut/dracut-init.sh called anywhere. I have no idea how firmware files end up in the initrd.
Comment 11 Vlastimil Babka 2019-08-22 10:13:06 UTC
(In reply to Vlastimil Babka from comment #10)
> I'm confused, I don't see install_kmod_with_fw() from
> /usr/lib/dracut/dracut-init.sh called anywhere. I have no idea how firmware
> files end up in the initrd.

I suspect it's done by the .c program dracut-install, not the shell scripts:
https://github.com/dracutdevs/dracut/blob/master/install/dracut-install.c#L1128
Comment 12 Takashi Iwai 2019-08-22 10:18:10 UTC
Indeed, thanks for catching this.

Daniel, it seems that recent dracut moved this code entirely into dracut-install and broke the handling of compressed firmware.
Comment 13 Takashi Iwai 2019-08-22 11:00:12 UTC
The patch below is a quick bandaid fix.

I built a test package in OBS home:tiwai:branches:Base:System/dracut repo.
Vlastimil, could you check whether it works?
Comment 14 Takashi Iwai 2019-08-22 11:00:51 UTC
Created attachment 815238 [details]
Workaround patch
Comment 15 Takashi Iwai 2019-08-22 11:01:12 UTC
Created attachment 815239 [details]
Another cleanup
Comment 16 Vlastimil Babka 2019-08-22 12:37:33 UTC
(In reply to Takashi Iwai from comment #13)
> The patch below is a quick bandaid fix.
> 
> I built a test package in OBS home:tiwai:branches:Base:System/dracut repo.
> Vlastimil, could you check whether it works?

lsinitrd looks good (contains .xz files, while the bash version tried to unpack them, but that should be fine?) Thanks!
Comment 17 Takashi Iwai 2019-08-22 12:41:19 UTC
(In reply to Vlastimil Babka from comment #16)
> (In reply to Takashi Iwai from comment #13)
> > The patch below is a quick bandaid fix.
> > 
> > I built a test package in OBS home:tiwai:branches:Base:System/dracut repo.
> > Vlastimil, could you check whether it works?
> 
> lsinitrd looks good (contains .xz files, while the bash version tried to
> unpack them, but that should be fine?) Thanks!

Yes, the patch is intended to be as simple as possible, so it just copies the compressed firmware as-is.  It would be better to decompress the files when building the initrd, but we can live with that for now.
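
To illustrate the copy-as-is behaviour described here, a small shell sketch follows. This is not the attached patch (which changes the C program dracut-install and is not shown in this thread). It is a hypothetical helper, stage_firmware, showing only the lookup order: the plain file first, then the .xz variant, which is copied without decompressing.

```shell
# Copy one firmware file into a staging tree, keeping a compressed
# file compressed. Returns 1 if neither form exists.
# stage_firmware is a hypothetical name used for illustration only.
stage_firmware() {
    # $1 = source firmware dir, $2 = destination dir, $3 = firmware name
    for candidate in "$3" "$3.xz"; do
        if [ -f "$1/$candidate" ]; then
            mkdir -p "$(dirname "$2/$candidate")"
            cp -p "$1/$candidate" "$2/$candidate"
            return 0
        fi
    done
    return 1
}
```

Keeping the .xz file relies on the kernel firmware loader being able to read compressed files, as noted in comment 1. Decompressing at initrd build time would avoid that dependency.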
Comment 18 Takashi Iwai 2019-08-22 13:21:52 UTC
I submitted a PR to upstream:
  https://github.com/dracutdevs/dracut/pull/626
Comment 19 Takashi Iwai 2019-08-22 14:06:40 UTC
Daniel, please take a look.  Starting with the 5.3 kernel, the firmware package will switch to the compressed format, so we have to fix this ASAP.
Comment 20 Takashi Iwai 2019-08-26 12:45:20 UTC
Ping.  This is a critical bug and needs an immediate fix before we move to the 5.3 kernel.
Comment 21 Daniel Molkentin 2019-08-26 12:55:56 UTC
I just returned from vacation today. Patch looks fine to me as a bandaid fix, but we should definitely find a way to store the files uncompressed.
Comment 22 Takashi Iwai 2019-08-26 13:04:16 UTC
(In reply to Daniel Molkentin from comment #21)
> I just returned from vacation today. Patch looks fine to me as a bandaid
> fix, but we should definitely find a way to store the files uncompressed.

Right.  And the same is true for *.ko files.
Comment 23 Daniel Molkentin 2019-08-26 14:46:05 UTC
Accepted upstream, backported and submitted to Factory as sr#726198.
Comment 24 Swamp Workflow Management 2019-08-26 15:20:07 UTC
This is an autogenerated message for OBS integration:
This bug (1146769) was mentioned in
https://build.opensuse.org/request/show/726198 Factory / dracut