Categories
Code Linux

Handling OS events in Emacs Lisp

Emacs can access files on any remote server running an ssh or SFTP service, assuming that an ssh client is installed on the host where Emacs runs. I use this extensively, since I run my own personal home servers, available over the internet. On those servers, I have files and resources that I access all the time, from anywhere.

/ssh:server.example.com:~/myfile.txt

Opening this path in Emacs will automatically open an ssh-connection to server.example.com and transparently allow editing of myfile.txt. Emacs uses the TRAMP package (Transparent Remote Access, Multiple Protocols) to provide this functionality.

TRAMP is designed to re-use existing ssh-connections for accessing multiple resources (on the same server). When using a laptop, where network conditions change and the system is frequently suspended, such persistent connections tend to hang for a while after the operating system has resumed operation, which can block Emacs. This is annoying, especially when I’d like to access a remote file immediately, and ssh hasn’t yet detected that its TCP connection has become unusable.

TRAMP provides a convenient function to clean up all existing connections, aptly named tramp-cleanup-all-connections, which I want to automatically call when the operating system wakes up from a suspended state or if the network changes.

Detecting operating system events on Linux

If running in a typical Linux environment, you can make Emacs listen for DBUS system bus events. Check out my packages nm.el and upower.el for code which reacts to networking and power management events.

For example, to automatically clean up remote connections whenever the network connects or re-connects, the following code would work (this requires a Linux distribution that uses NetworkManager):

(load-file "/path/to/nm.el")
(add-hook 'nm-connected-hook 'tramp-cleanup-all-connections)
(nm-enable)

Or to do the same whenever the machine resumes from suspend:

(load-file "/path/to/upower.el")
(add-hook 'upower-resume-hook 'tramp-cleanup-all-connections)
(upower-enable)

Detecting resume by using system clock

I also use Emacs on Windows sometimes, in a WSL environment where DBUS is not available. But we can still detect if the machine has been resumed in a generic manner, by observing the system clock:

(defvar clock-jump-detector-hook nil
  "Functions to run when a system clock jump is detected.")
(defvar clock-jump-detector-threshold (* 5 60)
  "Minimum time skip (in seconds) to consider it a system clock jump.
When the system clock is detected to jump by more than this
number of seconds, the hooks in `clock-jump-detector-hook' are run.")

(defvar clock-jump-detector-time (current-time)
  "Timestamp recorded the last time the detector ran.")
(defun clock-jump-detector ()
  "Run `clock-jump-detector-hook' when a system clock jump is detected."
  (let ((time-passed (float-time (time-since clock-jump-detector-time))))
    (setq clock-jump-detector-time (current-time))
    (when (> time-passed clock-jump-detector-threshold)
      (message "Clock jump of %f seconds detected, running hooks ..." time-passed)
      (run-hooks 'clock-jump-detector-hook))))
(run-at-time t 15 'clock-jump-detector)

The code records the current time every 15 seconds, and if a time jump beyond a threshold is detected (a «clock jump»), then all functions in hook-variable clock-jump-detector-hook are invoked.
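To tie this back to TRAMP, the same cleanup used with the D-Bus hooks earlier can be attached to the generic detector (this assumes the clock jump detector code above has been evaluated):

```elisp
;; Run TRAMP cleanup whenever a clock jump (typically a resume
;; from suspend) is detected, mirroring the upower-resume-hook
;; setup but without any D-Bus dependency.
(add-hook 'clock-jump-detector-hook 'tramp-cleanup-all-connections)
```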

By having Emacs listen for operating system events – over DBUS or via the system clock – and invoking functions in hook variables, you can make almost anything happen in a loosely coupled fashion.

Categories
Code Linux

Find the largest directories

With human readable sizes.

📂 📂 📂

This command lists the subdirectories of a given path, along with total disk space usage for each of them, sorted by largest first:

find /some/path -maxdepth 1 -mindepth 1 -type d -print0|\
    xargs -0 du -bs|\
    sort -nr -k1|\
    awk -F\\t '{
      printf("%.2f GiB\t%s\n", $1/1024/1024/1024, $2)
    }'

Example output:

0,98 GiB	/some/path/x
0,91 GiB	/some/path/y
0,50 GiB	/some/path/.cache
0,00 GiB	/some/path/.smalldir

You can use it to do a quick analysis of disk space usage in your home directory, or anywhere else.

Explanation

The find(1) command lists all immediate subdirectories below the starting point /some/path. This is done using a combination of the -type d predicate, as well as -mindepth 1 and -maxdepth 1. The root path /some/path has depth 0 and so will not be included. For each directory found, the path is printed with the -print0 action. This action prints the path string, then a single null byte. This byte works as a record separator in a similar fashion to newline characters, which are more commonly used in pipelines.

Output will look something like this:

$ find /some/path -maxdepth 1 -mindepth 1 -type d -print0
/some/path/x/some/path/.cache/some/path/y/some/path/.smalldir

Since the null byte is not a printable character, you will not see it directly in a terminal, and it looks like all the paths are mashed together into a single string. So why use this character? Well, it is the absolute safest with respect to special characters and whitespace, as it cannot occur in file system paths. So you are pretty much guaranteed that it will always work as an unambiguous separator. Also, the xargs command, which consumes the find output in the pipeline, supports using the null character as separator, so why not.
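If you want to actually see the null-separated records, one option is to translate the null bytes to newlines with tr(1). A small sketch, using a throwaway sandbox directory instead of /some/path:

```shell
# Null bytes are invisible in a terminal; translate them to
# newlines to inspect the records one per line.
# (A temporary sandbox directory stands in for /some/path.)
dir=$(mktemp -d)
mkdir "$dir/x" "$dir/y"
find "$dir" -maxdepth 1 -mindepth 1 -type d -print0 | tr '\0' '\n'
rm -r "$dir"
```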

|

Which brings us to the next command in the pipeline: xargs(1). This program reads input records, according to rules documented in its manual page, and uses them as arguments for a command that it then executes. In this case, we tell xargs to invoke du -bs. We also tell xargs to treat the null byte as the input record separator, with the -0 option.
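A tiny way to observe this batching is to substitute echo for du, so the constructed command line is printed instead of executed (the paths here are made up):

```shell
# xargs -0 splits its input on null bytes and passes all records
# as arguments to one command invocation; echo stands in for du,
# so we see the command line xargs would have run.
printf '/some/path/x\0/some/path/y\0' | xargs -0 echo du -bs
```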

The effect is that du is invoked like this:

$ du -bs /some/path/x /some/path/.cache /some/path/y /some/path/.smalldir
1048580096	/some/path/x
536875008	/some/path/.cache
978329600	/some/path/y
4096	/some/path/.smalldir

The du(1) command can calculate disk space usage for file system objects. We tell it to output sizes in bytes with -b. The -s option tells du to summarize each argument, which means only print total usage for exactly the provided arguments (skip info about their contents).

As we can see, du prints lines by default, where each line consists of two columns, and the columns are separated by a single tab character. The first column is total size in bytes and the second column is the directory path.

|

Next, the lines are sorted using sort(1). Option -k1 means use the first column as sorting key, -n tells sort to interpret the values numerically and -r reverses the natural ordering, so we get the biggest numbers first. Sort will output the same set of lines, but of course, in sorted order:

$ ...|sort -nr -k1
1048580096	/some/path/x
978329600	/some/path/y
536875008	/some/path/.cache
4096	/some/path/.smalldir

|

Lastly, awk(1) is used to transform the lines so that the raw byte sizes are converted to a more human readable form. We tell awk to only consider the tab character as column separator with -F\\t – which ensures all lines are parsed as exactly two fields¹. Otherwise, paths with spaces would give us problems here.

The awk program simply executes the printf("%.2f GiB\t%s\n"...) function for each line of input, using the size (in gibibytes²) as the first format argument and the directory path as the second. Field values are automatically available in the numbered variables $1, $2, ... Numbers are formatted as floating point, rounded to two decimals.

$ ...|awk -F\\t '{
      printf("%.2f GiB\t%s\n", $1/1024/1024/1024, $2)
    }'
0,98 GiB	/some/path/x
0,91 GiB	/some/path/y
0,50 GiB	/some/path/.cache
0,00 GiB	/some/path/.smalldir
  1. In awk terminology, columns are called fields and lines are called records. F is a mnemonic for Field separator. ↩︎
  2. Often confused with gigabytes, and nearly the same thing. 1 gibibyte is 1024³ bytes, while 1 gigabyte is 1000³ bytes. You can modify the awk program to suit your human readable size preference. ↩︎
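If you would rather have auto-scaled units than fixed GiB, one possible variant (assuming GNU coreutils' numfmt(1) is available) replaces the awk stage with numfmt, which picks a suitable IEC unit per line:

```shell
# Same pipeline, but let numfmt(1) choose a human readable IEC
# unit for the first (tab-separated) field, instead of
# hard-coding GiB in an awk program.
find /some/path -maxdepth 1 -mindepth 1 -type d -print0|\
    xargs -0 du -bs|\
    sort -nr -k1|\
    numfmt --delimiter="$(printf '\t')" --field=1 --to=iec
```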
Categories
Code Linux

Find the largest files in a directory tree

Display human readable sizes.

🗄 🗄 🗄

This command lists the 20 biggest files anywhere in a directory tree, biggest first:

find /some/path -type f -printf '%s\t%P\n'|\
     sort -nr -k1|\
     awk -F\\t '{
         printf("%.2f MiB\t%s\n", $1/1024/1024, $2);
       }'|\
     head -n 20

Example output:

1000,00 MiB	x/file4.dat
500,00 MiB	y/file5.dat
433,00 MiB	y/z/file6.dat
300,00 MiB	file3.dat
20,00 MiB	file2.dat
1,00 MiB	file1.dat
0,00 MiB	tiny.txt

Replace /some/path with a directory path of your choice.

Explanation

The find(1) command is used to gather all the necessary data. For each file in the directory tree /some/path, the -printf action is invoked to produce a line containing the file size and the path. The format directive %s prints the size in bytes, and %P prints the path relative to the starting point.

$ find /some/path -type f -printf '%s\t%P\n'
1048576000	x/file4.dat
1048576	file1.dat
314572800	file3.dat
524288000	y/file5.dat
454033408	y/z/file6.dat
0	tiny.txt
20971520	file2.dat

The lines use a tab character as a column separator, which is typical.

These lines are piped to sort(1), which sorts on the first column -k1 numerically and in reverse order -nr (biggest first). Then, awk(1) is used to format the lines with human readable file sizes in MiB¹ using printf. Awk is explicitly told to only use a single tab character as field/column separator with -F\\t. Finally, we apply head(1) to limit output to 20 lines.
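To see just the formatting stages in isolation, you can feed them hand-written sample input (the sizes and file names below are made up):

```shell
# Run only the sort | awk | head stages of the pipeline on fixed
# sample input, to see what each stage contributes.
printf '1048576\tfile1.dat\n1048576000\tx/file4.dat\n'|\
    sort -nr -k1|\
    awk -F\\t '{
        printf("%.2f MiB\t%s\n", $1/1024/1024, $2);
      }'|\
    head -n 20
```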

  1. Short for mebibytes, which is commonly confused with megabytes and nearly the same thing. 1 mebibyte is 1024² bytes, while 1 megabyte is 1000² bytes. ↩︎