Categories
Hardware Linux

Kernel module amdgpu failed to load after BIOS upgrade

I recently did a BIOS upgrade on my main desktop machine (ASUS motherboard). It runs Ubuntu 24.04, has a dedicated AMD GPU and I use the proprietary AMD driver packages for ROCm compute support. What came as a surprise was the non-accelerated Gnome Shell experience that greeted me after the BIOS upgrade. I could tell something was off, and it didn’t take me long to notice that the KMS graphics subsystem was using some generic driver with poor acceleration. How can a BIOS upgrade cause this to happen ? Was my GPU suddenly bricked in some way ?

No, fortunately ! I just forgot about the fact that ASUS BIOS upgrades tend to mess around with the UEFI Secure boot key database, which is in essence a firmware trust store of public keys that is used to verify signatures of operating system binary artifacts (like boot loaders, kernel images and modules). After probing around for while, I discovered that attempting to load the amdgpu kernel module failed with an error related to cryptographic signature. (I cannot remember the exact message at this time.) Since a lot of other kernel modules where already loaded successfully, I figured this has to be caused by the proprietary driver being compiled (and signed) externally by DKMS.

The obvious next step was to attempt a complete re-installation of the AMD driver packages and DKMS, which I hoped would fix-up the module signature issues. But it didn’t help at all, even though a full recompilation of the AMD kernel modules was completed without any obvious error messages popping up. The resulting module was still not trusted by the kernel. (Even wiping all packages and re-installing didn’t do it.)

I wasn’t planning on spending my morning messing with the UEFI firmware keys, and so I pondered just going back to the open source AMD driver, which already comes bundled with each kernel package (and no ROCm compute support..). But then I discovered references to DKMS and a cryptographic key pair, lurking under the system directory /var/lib/shim-signed/mok:

# ls /var/lib/shim-signed/mok
MOK.priv  MOK.der

That’s a certificate and private key, and «MOK» refers to the Secure Boot concept Machine Owner Key. You are normally allowed to adjust what Secure Boot trusts, if you have access to the physical machine on which it runs, which means you can modify the key database (trust store). After some research (and remembering I already did something very similar years ago), I figured I needed to try enrolling this public key into the UEFI key database:

mokutil --import /var/lib/shim-signed/mok/MOK.der

The mokutil command is used to interface with UEFI machine owner keys, and you can also check which MOKs are already enrolled using --list-enrolled.

The utility will ask for a password and you can simply invent something simple here. It is used only once as a challenge when entering the UEFI Mok manager utility after rebooting. From that firmware user interface you can then choose to enroll the new key and continue booting. After completing this step, the proprietary amdgpu kernel module loaded successfully once again and the problem was fixed.

I was disappointed by the Linux user experience here. A non-working graphics driver is not an obvious consequence of a BIOS upgrade, and had it not been for years of experience with Linux desktops, I would probably have given up much sooner. I don’t know if it’s just a bug that the key was not automatically re-enrolled by DKMS on re-installation, but that is certainly not a user friendly experience.