
Kernel module amdgpu failed to load after BIOS upgrade

I recently did a BIOS upgrade on my main desktop machine (ASUS motherboard). It runs Ubuntu 24.04, has a dedicated AMD GPU, and I use the proprietary AMD driver packages for ROCm compute support. What came as a surprise was the non-accelerated GNOME Shell experience that greeted me after the BIOS upgrade. I could tell something was off, and it didn’t take me long to notice that the KMS graphics subsystem had fallen back to some generic driver with poor acceleration. How can a BIOS upgrade cause this to happen? Was my GPU suddenly bricked in some way?

No, fortunately! I had simply forgotten that ASUS BIOS upgrades tend to mess around with the UEFI Secure Boot key database, which is in essence a firmware trust store of public keys used to verify signatures on operating system binary artifacts (boot loaders, kernel images and kernel modules). After probing around for a while, I discovered that attempting to load the amdgpu kernel module failed with an error related to cryptographic signature verification. (I cannot remember the exact message at this time.) Since a lot of other kernel modules were already loaded successfully, I figured this had to be caused by the proprietary driver being compiled (and signed) outside the kernel tree by DKMS.
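
If you need to do the same kind of probing, something along these lines narrows it down quickly (the exact dmesg wording varies between kernels, so treat this as a sketch):

# Is Secure Boot actually enabled?
mokutil --sb-state

# Try loading the module by hand and inspect the kernel log
sudo modprobe amdgpu
sudo dmesg | grep -i amdgpu | tail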

The obvious next step was a complete re-installation of the AMD driver packages and DKMS, which I hoped would fix up the module signature issue. But it didn’t help at all, even though a full recompilation of the AMD kernel modules completed without any obvious error messages popping up. The resulting module was still not trusted by the kernel. (Even wiping all the packages and re-installing didn’t do it.)
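
For reference, the reinstallation attempt amounted to roughly the following (amdgpu-dkms is the package name AMD’s installer uses on my system; adjust to whatever your driver packages are called):

# Reinstall the driver package and rebuild the modules for the running kernel
sudo apt reinstall amdgpu-dkms
sudo dkms autoinstall
dkms status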

I wasn’t planning on spending my morning messing with the UEFI firmware keys, so I pondered just going back to the open source AMD driver, which comes bundled with each kernel package anyway (but without ROCm compute support). But then I discovered references to DKMS and a cryptographic key pair lurking under the system directory /var/lib/shim-signed/mok:

# ls /var/lib/shim-signed/mok
MOK.priv  MOK.der

That’s a certificate and a private key, and «MOK» refers to the Secure Boot concept of a Machine Owner Key. You are normally allowed to adjust what Secure Boot trusts if you have physical access to the machine it runs on, which means you can modify the key database (trust store). After some research (and remembering that I had done something very similar years ago), I figured I needed to try enrolling this public key into the UEFI key database:

mokutil --import /var/lib/shim-signed/mok/MOK.der

The mokutil command is used to interface with UEFI machine owner keys, and you can also check which MOKs are already enrolled using --list-enrolled.
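
For instance, you can inspect the current state before importing anything, and afterwards see what is queued for enrollment:

# Show which Machine Owner Keys the firmware already trusts
mokutil --list-enrolled

# After an --import, show keys waiting to be enrolled on the next boot
mokutil --list-new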

The utility will ask for a password, and you can simply invent something simple here; it is used only once, as a challenge when entering the MOK manager in the firmware after rebooting. From that firmware user interface you can then choose to enroll the new key and continue booting. After completing this step, the proprietary amdgpu kernel module loaded successfully once again and the problem was fixed.
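
Back in the booted system, a quick sanity check should confirm that the module now loads and carries a signature (nothing more scientific than this):

# The module should load cleanly, and modinfo should show its signature fields
sudo modprobe amdgpu
modinfo amdgpu | grep -i sig
lsmod | grep amdgpu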

I was disappointed by the Linux user experience here. A non-working graphics driver is not an obvious consequence of a BIOS upgrade, and had it not been for years of experience with Linux desktops, I would probably have given up much sooner. I don’t know if it’s just a bug that the key was not automatically re-enrolled by DKMS on re-installation, but it is certainly not a user-friendly experience.


Handling OS events in Emacs Lisp

Emacs can access files on any remote server running an ssh or SFTP service, assuming that an ssh client is installed on the host where Emacs runs. I use this extensively, since I run my own personal home servers available over the internet. On those servers, I have files and resources that I access all the time, from anywhere.

/ssh:server.example.com:~/myfile.txt

Opening this path in Emacs will automatically open an ssh connection to server.example.com and transparently allow editing of myfile.txt. Emacs uses the TRAMP package (Transparent Remote Access, Multiple Protocols) to provide this functionality.
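
The same kind of path also works programmatically, which is handy for bookmarks and small helper commands; a minimal sketch:

;; Open a remote file over ssh, exactly as if it were local
(find-file "/ssh:server.example.com:~/myfile.txt")

;; Dired works too, for browsing a remote directory
(dired "/ssh:server.example.com:~/")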

TRAMP is designed to re-use existing ssh connections when accessing multiple resources on the same server. On a laptop, where network conditions change and the system is frequently suspended, such persistent connections tend to hang for a while after the operating system resumes, which can block Emacs. This is annoying, especially when I’d like to access a remote file immediately and ssh hasn’t yet detected that its TCP connection has become unusable.

TRAMP provides a convenient function to clean up all existing connections, aptly named tramp-cleanup-all-connections, which I want to automatically call when the operating system wakes up from a suspended state or if the network changes.

Detecting operating system events on Linux

If running in a typical Linux environment, you can make Emacs listen for DBUS system bus events. Check out my packages nm.el and upower.el for code which reacts to networking and power management events.

For example, to automatically clean up remote connections whenever the network connects or re-connects, the following code would work (this requires that your Linux distribution uses NetworkManager):

(load-file "/path/to/nm.el")
(add-hook 'nm-connected-hook 'tramp-cleanup-all-connections)
(nm-enable)

Or to do the same whenever the machine resumes from suspend:

(load-file "/path/to/upower.el")
(add-hook 'upower-resume-hook 'tramp-cleanup-all-connections)
(upower-enable)

Detecting resume by using the system clock

I also use Emacs on Windows sometimes, in a WSL environment where DBUS is not available. But we can still detect if the machine has been resumed in a generic manner, by observing the system clock:

(defvar clock-jump-detector-hook nil
  "Functions to run when a system clock jump is detected.")

(defvar clock-jump-detector-threshold (* 5 60)
  "Minimum time skip, in seconds, to consider a system clock jump.
When the system clock jumps by more than this many seconds between
two consecutive checks, the hooks in `clock-jump-detector-hook' are run.")

(defvar clock-jump-detector-time (current-time)
  "Timestamp recorded at the previous check.")

(defun clock-jump-detector ()
  "Record the current time and run hooks if the clock jumped since last check."
  (let ((time-passed (float-time (time-since clock-jump-detector-time))))
    (setq clock-jump-detector-time (current-time))
    (when (> time-passed clock-jump-detector-threshold)
      (message "Clock jump of %f seconds detected, running hooks .." time-passed)
      (run-hooks 'clock-jump-detector-hook))))

(run-at-time t 15 'clock-jump-detector)

The code records the current time every 15 seconds, and if a time jump beyond a threshold is detected (a «clock jump»), then all functions in hook-variable clock-jump-detector-hook are invoked.
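
To tie this back to the TRAMP problem, the remaining step is simply to put the cleanup function on the new hook:

(add-hook 'clock-jump-detector-hook 'tramp-cleanup-all-connections)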

By having Emacs listen for operating system events over DBUS and invoking functions in hook-variables, you can make anything happen in a loosely coupled fashion.


Find the largest directories

With human-readable sizes.

📂 📂 📂

This command lists the subdirectories of a given path, along with the total disk space usage for each of them, sorted largest first:

find /some/path -maxdepth 1 -mindepth 1 -type d -print0|\
    xargs -0 du -bs|\
    sort -nr -k1|\
    awk -F\\t '{
      printf("%.2f GiB\t%s\n", $1/1024/1024/1024, $2)
    }'

Example output:

0,98 GiB	/some/path/x
0,91 GiB	/some/path/y
0,50 GiB	/some/path/.cache
0,00 GiB	/some/path/.smalldir

You can use it to do a quick analysis of disk space usage in your home directory, or anywhere else.

Explanation

The find(1) command lists all immediate subdirectories below the starting point /some/path. This is done using a combination of the -type d predicate and -mindepth 1 and -maxdepth 1. The root path /some/path has depth 0 and so will not be included. For each directory found, the path is printed with the -print0 action. This action prints the path string followed by a single null byte. This byte works as a record separator in a similar fashion to newline characters, which are more commonly used in pipelines.

Output will look something like this:

$ find /some/path -maxdepth 1 -mindepth 1 -type d -print0
/some/path/x/some/path/.cache/some/path/y/some/path/.smalldir

Since the null byte is not a printable character, you will not see it directly in a terminal, and it looks like all the paths are mashed together into a single string. So why use this character? It is the safest choice with respect to special characters and whitespace, since a null byte cannot occur in file system paths, so you are pretty much guaranteed that it will always work as an unambiguous separator. Also, the xargs command, which consumes the find output in the pipeline, supports using the null character as its record separator, so why not.
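
If you want to see the separators for yourself, you can translate them into newlines, purely for inspection (never do this in the actual pipeline):

$ find /some/path -maxdepth 1 -mindepth 1 -type d -print0 | tr '\0' '\n'
/some/path/x
/some/path/.cache
/some/path/y
/some/path/.smalldir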

|

Which brings us to the next command in the pipeline: xargs(1). Put simply, xargs reads input records and uses them as arguments for a command that it then invokes; in this case, we tell it to invoke du -bs. We also tell xargs to treat the null byte as the input record separator, with the -0 option.

The effect is that du is invoked like this:

$ du -bs /some/path/x /some/path/.cache /some/path/y /some/path/.smalldir
1048580096	/some/path/x
536875008	/some/path/.cache
978329600	/some/path/y
4096	/some/path/.smalldir

The du(1) command can calculate disk space usage for file system objects. We tell it to output sizes in bytes with -b. The -s option tells du to summarize each argument, which means it only prints the total usage for exactly the provided arguments (skipping the lines about their contents).

As we can see, du prints one line per argument, where each line consists of two columns separated by a single tab character. The first column is the total size in bytes and the second column is the directory path.

|

Next, the lines are sorted using sort(1). Option -k1 means use the first column as sorting key, -n tells sort to interpret the values numerically and -r reverses the natural ordering, so we get the biggest numbers first. Sort will output the same set of lines, but of course, in sorted order:

$ ...|sort -nr -k1
1048580096	/some/path/x
978329600	/some/path/y
536875008	/some/path/.cache
4096	/some/path/.smalldir

|

Lastly, awk(1) is used to transform the lines so that the raw byte sizes are converted to a more human-readable form. We tell awk to only consider the tab character as the column separator with -F\\t, which ensures all lines are parsed as exactly two fields¹. Otherwise, paths containing spaces would give us problems here.

The awk program simply executes the printf("%.2f GiB\t%s\n", ...) function for each line of input, using the size (in gibibytes²) as the first format argument and the directory path as the second. Field values are automatically available in the numbered variables $1, $2, ... Numbers are formatted as floating point, rounded to two decimals.

$ ...|awk -F\\t '{
      printf("%.2f GiB\t%s\n", $1/1024/1024/1024, $2)
    }'
0,98 GiB	/some/path/x
0,91 GiB	/some/path/y
0,50 GiB	/some/path/.cache
0,00 GiB	/some/path/.smalldir
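
If you prefer a different unit, only the printf line needs to change; for instance, a mebibyte variant of the same awk program:

$ ...|awk -F\\t '{
      printf("%.2f MiB\t%s\n", $1/1024/1024, $2)
    }'
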
  1. In awk terminology, columns are called fields and lines are called records. F is a mnemonic for Field separator.
  2. Often confused with gigabytes, and nearly the same thing: 1 gibibyte is 1024³ bytes, while 1 gigabyte is 1000³ bytes. You can modify the awk program to suit your human-readable size preference.