
Navigating Maven projects on the command line

Or, how to avoid this:

~/dev/myproject/src/main/java/com/example/foo $ cd ..
~/dev/myproject/src/main/java/com/example $ cd ..
~/dev/myproject/src/main/java/com $ cd ..
~/dev/myproject/src/main/java $ cd ../../..
~/dev/myproject $

The following Bash shell function, called pom.. (yes, with two dots in the name), lets you navigate up to the closest ancestor directory containing a pom.xml file (the closest module) with a single command. Put it in your ~/.bashrc:

function pom..() {
    local start_dir="$(pwd)" prev_dir= rel_dir="$1"
    # Walk upwards until we stop moving, i.e. we have hit the root /
    while [ "$prev_dir" != "$(pwd)" ]; do
        prev_dir="$(pwd)"
        cd ..
        if [ -f pom.xml ]; then
            # Found the closest module root; optionally descend into a
            # directory given as argument, relative to the pom.xml
            if [ -d "$rel_dir" ]; then
                cd "$rel_dir"
            elif [ "$rel_dir" ]; then
                echo >&2 "Directory not found relative to pom.xml: $(pwd)/$rel_dir"
                cd "$start_dir"
                return 1
            fi
            # Print the new working directory, abbreviating $HOME as ~
            pwd | sed "s#^$HOME#~#"
            return 0
        fi
    done
    echo >&2 "No pom.xml found in ancestor directories."
    cd "$start_dir"
    return 1
}

So you don’t have to waste any more time typing “cd ..” multiple times when navigating upwards to a Maven module root on the command line. Just type pom.. once.

~/dev/myproject/src/main/java/com/example/foo $ pom..
~/dev/myproject
~/dev/myproject $ 

It also accepts an optional argument, which is a desired directory relative to the nearest POM:

~/dev/myproject/src/main/java/com/example/foo $ pom.. target
~/dev/myproject/target
~/dev/myproject/target $ pom.. src/test
~/dev/myproject/src/test
~/dev/myproject/src/test $

This strategy works for any hierarchically organized source code project where some marker file or directory exists at the source roots – think .git, build.gradle or Cargo.toml. Just be creative and modify the code.
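
For example, a generalized variant could take the marker name as its first argument. A minimal sketch along those lines (the name upto and its exact behavior are arbitrary):

# Usage: upto <marker> [relative-dir]
# Navigate up to the closest ancestor directory containing <marker>
function upto() {
    local marker="$1" start_dir="$(pwd)" prev_dir= rel_dir="$2"
    while [ "$prev_dir" != "$(pwd)" ]; do
        prev_dir="$(pwd)"
        cd ..
        if [ -e "$marker" ]; then
            [ -d "$rel_dir" ] && cd "$rel_dir"
            pwd | sed "s#^$HOME#~#"
            return 0
        fi
    done
    echo >&2 "No $marker found in ancestor directories."
    cd "$start_dir"
    return 1
}

With that, upto .git jumps to the nearest Git repository root, and upto Cargo.toml target does the same for a Rust project's build directory.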

Also, see this post for a more general approach to project directory navigation.


Locking critical sections in shell scripts

A critical section is some piece of code which, due to its nature and effects, should be executed by at most one thread or process at a time. If such code is executed concurrently, the results often become undefined and arbitrary. Atomically locking a shared resource is a common pattern to synchronize execution of critical sections and ensure mutually exclusive access.

Shell scripts mostly deal in processes and files, and there are several common scenarios where code is actually a critical section. If such code is run in several processes concurrently, it could introduce race conditions and arbitrary results. Consider a script that starts a background process if it is not already running – a typical pattern to start a singleton daemon process. Such code is a critical section, because you can end up with two running daemons if the code runs concurrently (and the daemon does not check for other instances of itself). Another good example is multiple scripts writing to a shared file, or even a shared directory structure.

There are a few strategies for implementing locking in shell scripts, and some are better than others. In this post, I will focus on one of the most robust: using flock(1). This nice tool gives you access to kernel-level file locking from your shell. It has some clear advantages over traditional existence-based file locking:

  1. It is truly atomic.
  2. The kernel manages the locks and releases them automatically when lock owning processes die. So no more stale lock files to clean up.
  3. You can block and wait for a lock, indefinitely or with a timeout, and instantly get it when another process frees it – no more lock-polling loops with sleeps. (See the timeout sketch below.)
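
For example, a bounded wait is just a matter of adding the -w option. A minimal sketch, assuming an arbitrary lock file name mylock.lock:

# Open a descriptor to the lock file, then wait at most 10 seconds
exec {fd}>mylock.lock
if flock -w 10 -x $fd; then
    echo "Lock acquired"
else
    echo >&2 "Timed out waiting for the lock"
fi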

It is important to understand that file locks are tied to both a file on the file system and running processes with open file descriptors to it. Even if a file used for locking exists on the file system, it does not mean the lock is taken! The file system acts as a namespace of shared resources to which we can attach locks. Also note that the locks are advisory only – if a process does not care to check for locks, it will not participate in any synchronization and can do whatever it pleases.
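
To illustrate, you can probe – without blocking – whether any process currently holds a lock on a file, even though the file exists either way. A quick sketch, using an arbitrary file name some.lock:

# Briefly take the lock (running just 'true') if it is free;
# with -n, flock fails immediately if another process holds it
if flock -n some.lock -c true; then
    echo "some.lock exists, but nobody holds the lock"
else
    echo "some.lock is locked by another process"
fi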

Shell script with locking functions

We will look at a script which needs to protect a critical section with locking. The locking shall be done on a common file job.lock, which means any process with access to that file can obtain or check for a lock. To code along, you can copy the script to your own file and run the examples.

job.sh

#!/bin/bash

lock_acquire() {
    # Open a file descriptor to lock file
    exec {LOCKFD}>job.lock || return 1

    # Block until an exclusive lock can be obtained on the file descriptor
    flock -x $LOCKFD
}

lock_release() {
    test "$LOCKFD" || return 1
    
    # Close lock file descriptor, thereby releasing exclusive lock
    exec {LOCKFD}>&- && unset LOCKFD
}

lock_acquire || { echo >&2 "Error: failed to acquire lock"; exit 1; }

# --- Begin critical section ---

if [ -f job.dat ]; then
    value=$(<job.dat)
else
    value=0
fi
value=$((value + 1))

echo $value >job.dat

# --- End critical section ---

lock_release

The lock_acquire function uses flock -x N to obtain an exclusive lock on file descriptor N. Since the file descriptor is opened by the script process itself, the script will own the lock after flock exits. flock is able to lock the descriptor because it inherits it from the shell process that started it. The critical section reads a number from a file if it exists, increments it by one, and writes the updated number back to the file.

Testing

First we’ll run the job script once:

$ bash job.sh 
$ ls
job.dat  job.lock  job.sh
$ cat job.dat 
1

A job.dat file is produced with a value of 1, which is entirely expected and not very interesting.

Next we’ll start 100 job processes in the background as fast as possible, which means that many of them will run concurrently. We do this two times:

$ rm job.dat 
$ (for i in {1..100}; do bash job.sh & done; wait)
$ cat job.dat 
100
$ (for i in {1..100}; do bash job.sh & done; wait)
$ cat job.dat 
200

The for loop is started in a subshell to avoid job control messages. The data has been incremented exactly 100 times after the first run, and by another 100 after the second. If you look at the code in the critical section, it reads, updates and then writes the shared file, and doing this without locking would not work consistently.

Actually, let us try that by commenting out the lock_acquire call in the script:

[...]

#lock_acquire || { echo >&2 "Error: failed to acquire lock"; exit 1; }

# --- Begin critical section ---

Then we run the test again:

$ rm job.dat
$ (for i in {1..100}; do bash job.sh & done; wait)
$ cat job.dat
3
$ (for i in {1..100}; do bash job.sh & done; wait)
$ cat job.dat
14

This ends with a final result of 14 – clearly incorrect and arbitrary. The results will vary with each run and depend on things like the speed of your computer.

Releasing the lock?

In this case the script does not actually need to release the lock right before it exits, because the kernel will do that automatically when the process exits anyway. We will try it by re-enabling the locking call and commenting out the lock_release call:

[...]

lock_acquire || { echo >&2 "Error: failed to acquire lock"; exit 1; }

# --- Begin critical section ---
[...]
# --- End critical section ---

#lock_release

And run the test:

$ rm job.dat
$ (for i in {1..100}; do bash job.sh & done; wait)
$ (for i in {1..100}; do bash job.sh & done; wait)
$ cat job.dat
200

It still works fine.

Starting a daemon process from your shell init scripts

A common use case is the need to start a single daemon process from your shell init scripts, unless one is already running – you only ever want one instance of it. Consider the following:

if ! ps -ef|grep some-daemon|grep -qv grep; then
    some-daemon & pid=$!
    echo Started some-daemon with pid $pid
fi

This code is racy unless it is protected by locking. If you were to start two terminals more or less simultaneously, both executing your shell init scripts, you could possibly end up with two running daemon processes.

To protect this with flock, you could do the following:

if exec {bashrc_fd}<~/.bashrc && flock -nx $bashrc_fd; then
    # --- Begin critical section ---
    if ! ps -ef|grep some-daemon|grep -qv grep; then
        some-daemon & pid=$!
        echo Started some-daemon with pid $pid
    fi
    # --- End critical section ---

    flock -u $bashrc_fd && exec {bashrc_fd}>&-
fi

Here we open a read-only file descriptor to ~/.bashrc and then try to grab an exclusive lock on it, but we do it non-blocking with option -n. If some other bash process is already executing that part of the init file, flock will not succeed and immediately exits with a non-zero code, so the block is skipped. The effect is that only one bash process will execute the code, and others running at the same time will skip it.

You may notice that we explicitly release the lock using flock -u $bashrc_fd after the critical section. Normally it is enough to close the file descriptor used for locking, but child processes may inherit and keep such descriptors open. So the parent process closing its copy of the descriptor may not be enough to actually release the lock. Therefore we do it explicitly.
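
An alternative is to make sure the daemon never inherits the lock descriptor in the first place, for example by starting it from a subshell where the descriptor has been closed. A sketch, building on the bashrc_fd descriptor from above:

# Close the lock descriptor in a subshell, so the daemon started
# there cannot inherit it
( exec {bashrc_fd}>&-
  some-daemon & )

# Now closing the parent's own copy is enough to release the lock
exec {bashrc_fd}>&-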

Closing notes

The manual page for the flock command lists a few good examples of how you can use it in your scripts. However, none of those examples show how you can make the current shell process own and control the locks without using subshells or flock invocations that wrap critical sections/commands.
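
For reference, the subshell style from the manual page looks roughly like this – here the lock is owned by the subshell, not by your current shell process:

(
    flock -n 9 || exit 1   # lock fd 9 or give up immediately
    # ... critical section ...
) 9>job.lock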

The manual page for the flock(2) system call is a good read if you are interested in more details about how it works.

Read more about handling file descriptors with Bash in this part of the bash(1) manual.


Extract media file tags with ffprobe

The ffprobe tool, which is part of ffmpeg, can be used to extract all sorts of technical information from media streams. It is very capable, but perhaps not the most intuitive tool with regard to its options. Therefore, examples are handy for getting started. This post collects examples focusing on metadata tags and the JSON output format.

Extract format section to JSON

The format section is a global part of the media container and provides generic information about the container streams. It also contains the metadata tags, like title, artist, year, etc. In this example, the file is an MP4 audio file.

$ ffprobe -v quiet -of json -show_entries format audio.m4a
{
    "format": {
        "filename": "audio.m4a",
        "nb_streams": 1,
        "nb_programs": 0,
        "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
        "format_long_name": "QuickTime / MOV",
        "start_time": "0.000000",
        "duration": "3.024000",
        "size": "27730",
        "bit_rate": "73359",
        "probe_score": 100,
        "tags": {
            "major_brand": "M4A ",
            "minor_version": "512",
            "compatible_brands": "isomiso2",
            "genre": "Tone",
            "artist": "A",
            "title": "A title",
            "album": "X",
            "date": "2018",
            "encoder": "Lavf58.13.100"
        }
    }
}
Options breakdown

-v quiet               Be quiet with diagnostics to stderr.
-of json               Select JSON output format.
-show_entries format   Select the format section.

Extract only the metadata tags

$ ffprobe -v quiet -of json -show_entries format_tags Movie.mkv 
{
    "format": {
        "tags": {
            "title": "The Movie",
            "DATE_RELEASED": "2008",
            "creation_time": "2022-02-11T18:26:58.000000Z",
            "ENCODER": "Lavf58.29.100"
        }
    }
}

Here we select the format_tags section from a video file in Matroska format, omitting the other bits of information under format. Finally, we can use jq to select parts of the JSON data:

$ ffprobe -v quiet -of json -show_entries format_tags Movie.mkv|\
          jq -r .format.tags.title
The Movie

Other output formats

ffprobe supports several output formats. Personally I like JSON, since it is very well defined with regard to syntax, escaping of special characters, etc. However, depending on the use case, other formats may be more suitable for extracting data in contexts like shell scripts.
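
For example, the default writer can be stripped of section wrappers and keys, which makes the output trivial to consume in a shell script. A sketch using the same file as above:

$ ffprobe -v quiet -of default=noprint_wrappers=1:nokey=1 \
          -show_entries format_tags=title Movie.mkv
The Movie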