Extract media file tags with ffprobe

The ffprobe tool, which is part of ffmpeg, can be used to extract all sorts of technical information from media streams. It is very capable, but is perhaps not the most intuitive tool with regards to its options. Therefore, examples are handy to get started. This post will collect examples focusing on metadata tags and the JSON output format.

Extract format section to JSON

The format section is a global part of the media container and provides generic information about the container streams. It also contains the metadata tags, like title, artist, year, etc. In this example, the file is an MP4 audio file.

$ ffprobe -v quiet -of json -show_entries format audio.m4a
    "format": {
        "filename": "audio.m4a",
        "nb_streams": 1,
        "nb_programs": 0,
        "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
        "format_long_name": "QuickTime / MOV",
        "start_time": "0.000000",
        "duration": "3.024000",
        "size": "27730",
        "bit_rate": "73359",
        "probe_score": 100,
        "tags": {
            "major_brand": "M4A ",
            "minor_version": "512",
            "compatible_brands": "isomiso2",
            "genre": "Tone",
            "artist": "A",
            "title": "A title",
            "album": "X",
            "date": "2018",
            "encoder": "Lavf58.13.100"
-v quietBe quiet with diagnostics to stderr.
-of jsonSelect JSON output format.
-show_entries formatSelect the format section.
Options breakdown

Extract only the metadata tags

$ ffprobe -v quiet -of json -show_entries format_tags Movie.mkv 
    "format": {
        "tags": {
            "title": "The Movie",
            "DATE_RELEASED": "2008",
            "creation_time": "2022-02-11T18:26:58.000000Z",
            "ENCODER": "Lavf58.29.100"

Here we select the format_tags section from a video file in Matroska format, omitting the other bits of information under format. Finally, we can use jq to select parts of the JSON data:

$ ffprobe -v quiet -of json -show_entries format_tags Movie.mkv|\
          jq -r .format.tags.title
The Movie

Other output formats

ffprobe supports several output formats. Personally I like JSON, since it is very well defined with regard to syntax and escaping of special characters, etc. However, depending on use case, other forms may be more suitable for extraction of data in contexts like shell scripts.

Hardware Linux

Capture images from a webcam using ffmpeg

The examples are for Linux and access the web camera through the Video4Linux2 interface. To control web camera settings, use the tool v4l2-ctl. To list connected camera devices, you can use the command: v4l2-ctl --list-devices. On a typical Debian-ish Linux distro, you will also want to add your user to the video and audio groups, so that you can easily access the webcam from a non-desktop session.

Capture to an image file, continually overwriting it with new contents

ffmpeg -y -f v4l2 -video_size 1280x720 -i /dev/video0 \
       -r 0.2 -qscale:v 2 -update 1 /tmp/webcam.jpg
-f v4l2specify input format explicitly as capture from a Video4Linux2 device
-video_size 1280x720specify video frame size from webcam
-i /dev/video0select input device (a UVC-compatible webcam in my case)
-r 0.2set output frame rate to one per 5 seconds
-qscale:v 2set video quality [JPEG quality in this case], 2 is highest quality.
-update 1Image2 muxer option, enable in place update of image file for each video output frame
Options breakdown

Point the output file to a place served by your web server to make your camera image available on the web. The ffmpeg command will run until interrupted or killed.

Add a timestamp to captured images

ffmpeg -y -f v4l2 -video_size 1280x720 -i /dev/video0 \
       -r 0.2 \
       -vf "drawtext=text=%{localtime}:fontcolor=white@1.0:fontsize=26:borderw=1:x=980:y=25" \
       -qscale:v 2 -update 1 /tmp/webcam.jpg

Here we have inserted the drawtext video filter into the processing pipeline. We use its text expansion facilities to simply render the local time onto each video frame with filter-argument text=%{localtime}. It is placed in the top right corner of the image using the x and y arguments.

Running as background job

You can ssh to the host which has the web camera connected, and start the ffmpeg capture process as a background job:

ffmpeg -y -loglevel fatal \
       -f v4l2 -video_size 1280x720 -i /dev/video0 \
       -r 0.2 \
       -vf "drawtext=text=%{localtime}:fontcolor=white@1.0:fontsize=26:borderw=1:x=980:y=25" \
       -qscale:v 2 -update 1 /tmp/webcam.jpg \
       </dev/null &>/tmp/webcam-ffmpeg.log & disown $!

This silences ffmpeg to log only fatal errors, runs it in the background and finally detaches the process from your [bash] shell’s job control, to avoid it being killed if you log out. A more polished solution would be to create a systemd service which controls the ffmpeg webcam capture process, running as a dedicated low privilege system user.

Creating a time lapse video from a bunch of image files

As a sort of bonus chapter on this post, here is how to create a time lapse video from a bunch of captured image files. Assuming you have a directory with JPEG images named in such a way that they sort chronologically by their filenames (padded sequence numbers or timestamps), here’s how you can transform them into a video.

VP9 video in WebM container:

ffmpeg -y -f image2 -pattern_type glob -framerate 30 \
       -i webcam-images/\*.jpg \
       -pix_fmt yuv420p -b 1500k timelapsevid.webm

H264 video in MP4 container:

ffmpeg -y -f image2 -pattern_type glob -framerate 30 \
       -i webcam-images/\*.jpg \
       -pix_fmt yuv420p -b 1500k timelapsevid.mp4
-f image2Input demuxer is Image2, which can read image files.
-pattern_type globInstructs Image2 demuxer to treat input pattern as file name glob.
-framerate 30Set desired framerate; how many images to display per second in the resulting video.
-i webcam-images/\*.jpgSet input to a glob pattern matching the images files you would like to include in the video. Note that we do not want the shell to expand the glob, but rather pass the asterisk verbatim to ffmpeg.
-pix_fmt yuv420pSet video codec pixel format. YUV420p is selected to ensure compatibility with a broad range of decoders/players.
-b 1500kSet desired bitrate of video file.
Options breakdown

Note that all input images should have the same dimensions. Otherwise, you will likely have to add more options to ffmpeg to transform everything to a single suitable video size.

The resulting video files will be suitable for publishing on the web using the <video> tag.


How to make a shell script log JSON-messages

If you have a shell script running in some environment where logs are expected to be formatted as JSON, it can be cumbersome to ensure all commands in the script output valid single line JSON-formatted messages, instead of just raw lines of text, which is what a shell script commonly does. Here I present a technique which can be used so that the script needs very little modifications to be able to output structured JSON instead of raw text lines.

We will setup a bash script so that its regular output is redirected to a JSON encoder co-process automatically. This is setup at the beginning of the script, and subsequent commands’ output will automatically be wrapped as JSON-messages. It requires the jq command to be present on the system where script runs.

#!/usr/bin/env bash

# 1 Make copies of shell's current stdout and stderr file
# descriptors:
exec 100>&1 200>&2

# 2 Define function which logs arguments or stdin as JSON-message to stdout:
log() {
    if [ "$1" = - ]; then
        jq -Rsc '{"@timestamp": now|strftime("%Y-%m-%dT%H:%M:%S%z"),
                  "message":.}' 1>&100 2>&200
        jq --arg m "$*" -nc '{"@timestamp": now|strftime("%Y-%m-%dT%H:%M:%S%z"),
                              "message":$m}' 1>&100 2>&200

# 3 Start a co-process which transforms input lines to JSON messages:
coproc JSON_LOGGER { jq --unbuffered -Rc \
      '{"@timestamp": now|strftime("%Y-%m-%dT%H:%M:%S%z"),
        "message":.}' 1>&100 2>&200; }

# 4 Finally redirect shell's stdout/stderr to JSON logger
# co-process:
exec 1>&${JSON_LOGGER[1]} 2>&${JSON_LOGGER[1]}

# What follows is whatever you need your script to do

echo Hello brave world
echo '  testing "escaping" and white  space  '
echo >&2 this goes do stderr


# If we want multiple output lines from a single command
# wrapped in a single JSON-message, we need to pipe it to the log function:
curl -sS --head|head -n 3|log -

# .. otherwise, each curl output line would become its own
# JSON-encoded message, which may not be desirable.

Output to a terminal with color support should look something like this:

{"@timestamp":"2021-07-01T14:55:15+02:00","message":"Hello brave world"}
{"@timestamp":"2021-07-01T14:55:15+02:00","message":"  testing \"escaping\" and white  space  "}
{"@timestamp":"2021-07-01T14:55:15+02:00","message":"this goes do stderr"}
{"@timestamp":"2021-07-01T14:55:16+02:00","message":"HTTP/2 200 \r\nserver:\r\ndate: Thu, 01 Jul 2021 12:55:10 GMT\r"}

Jq will take care of all the necessary escaping and always produce valid single line JSON-structured messages, regardless of message payload.


  • You could make the above setup reusable, by putting the code in its own file and sourcing it at the beginning of scripts that need it.
  • High precision timestamps cannot be generated natively in jq. If you need millisecond or higher precision on the log event timestamps, there are ways to do it, depending on the shell and command line tools available. If you have bash >= 5, you can modify the log() function so that timestamp string is generated using the expression:
    $(date +%Y-%m-%dT%H:%M:%S).${EPOCHREALTIME#*[.,]}$(date +%z). You’ll need to pass this as an --arg to jq for each invocation and use it in the JSON template. Also, it is a bit harder to accomplish for the log transformer co-process, because ideally we’d like only a single persistent jq process running, for efficiency reasons and no pipeline buffering.
  • You could easily expand the structured log messages by adding JSON fields to the jq templates. For example, you could add a level field, to indicate log level. Or a hostname field using the $HOSTNAME shell variable.
  • Building on the previous point, you could create two separate JSON encoder functions, where one handles stderr messages and logs them at level ERROR, while another one logs regular stdout as level INFO. Then create two co-processes for stdout and stderr, with different jq JSON templates respectively. Finally, redirect main stdout to the first co-process and stderr to the second.
  • For the log transformer co-process, be aware that pipeline buffering can have unfortunate effects, for instance missing the last log events before script exits. This is why jq is invoked with --unbuffered.