The ffprobe tool, which is part of FFmpeg, can extract all sorts of technical information from media streams. It is very capable, but its options are perhaps not the most intuitive, so examples are a handy way to get started. This post collects examples focusing on metadata tags and the JSON output format.
Extract format section to JSON
The format section describes the media container as a whole and provides generic information about the container and its streams. It also contains the metadata tags, such as title, artist, and year. In this example, the file is an MP4 audio file.
$ ffprobe -v quiet -of json -show_entries format audio.m4a
{
    "format": {
        "filename": "audio.m4a",
        "nb_streams": 1,
        "nb_programs": 0,
        "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
        "format_long_name": "QuickTime / MOV",
        "start_time": "0.000000",
        "duration": "3.024000",
        "size": "27730",
        "bit_rate": "73359",
        "probe_score": 100,
        "tags": {
            "major_brand": "M4A ",
            "minor_version": "512",
            "compatible_brands": "isomiso2",
            "genre": "Tone",
            "artist": "A",
            "title": "A title",
            "album": "X",
            "date": "2018",
            "encoder": "Lavf58.13.100"
        }
    }
}
-v quiet | Suppress diagnostic messages on stderr. |
-of json | Select JSON output format. |
-show_entries format | Select the format section. |
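The -show_entries option also accepts a comma-separated list of field names after the section name, which keeps the output small when only a few values are needed. Here is a minimal sketch of that, wrapped in a helper function; the name format_fields is made up for this post and is not part of ffprobe.

```shell
# Hypothetical helper: print only selected fields of the format section as JSON.
# Usage: format_fields FILE FIELD[,FIELD...]
format_fields() {
    # "format=duration,size" restricts the format section to those fields.
    ffprobe -v quiet -of json -show_entries "format=$2" "$1"
}
```

Calling format_fields audio.m4a duration,size should then print a format object containing just those two fields.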
Extract only the metadata tags
$ ffprobe -v quiet -of json -show_entries format_tags Movie.mkv
{
    "format": {
        "tags": {
            "title": "The Movie",
            "DATE_RELEASED": "2008",
            "creation_time": "2022-02-11T18:26:58.000000Z",
            "ENCODER": "Lavf58.29.100"
        }
    }
}
Here we select the format_tags section from a video file in Matroska format, omitting the other bits of information under format. Finally, we can use jq to select parts of the JSON data:
$ ffprobe -v quiet -of json -show_entries format_tags Movie.mkv | \
    jq -r '.format.tags.title'
The Movie
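Note that the two sample outputs spell tag names differently: the MP4 file reports "encoder", while the Matroska file reports "ENCODER". A script that handles both can compare tag names case-insensitively in jq. The following is only a sketch; the function name tag_value is hypothetical, and it assumes jq 1.5 or later for ascii_downcase.

```shell
# Hypothetical helper: print the value of a format-level tag, ignoring the
# case of the tag name. Prints nothing if the tag (or the tags object) is absent.
# Usage: tag_value FILE TAGNAME
tag_value() {
    ffprobe -v quiet -of json -show_entries format_tags "$1" |
        jq -r --arg key "$2" '
            .format.tags // {}
            | to_entries[]
            | select((.key | ascii_downcase) == ($key | ascii_downcase))
            | .value'
}
```

With the Matroska file above, tag_value Movie.mkv encoder would be expected to print Lavf58.29.100.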
Other output formats
ffprobe supports several output formats. Personally I like JSON, since it is very well defined with regard to syntax and escaping of special characters. However, depending on the use case, other formats may be more suitable for extracting data in contexts like shell scripts.
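As one example of a format that suits shell scripts, the default output format can be told to drop both the section wrappers and the key names, leaving only the bare value. This is a sketch rather than a recommendation; the function name format_tag is made up for illustration.

```shell
# Hypothetical helper: print the bare value of one format-level tag, no jq needed.
# noprint_wrappers=1 drops the [FORMAT] and [/FORMAT] lines,
# nokey=1 drops the "TAG:name=" prefix in front of the value.
# Usage: format_tag FILE TAGNAME
format_tag() {
    ffprobe -v quiet -of default=noprint_wrappers=1:nokey=1 \
        -show_entries "format_tags=$2" "$1"
}
```

A script could then capture the value directly, for instance with title=$(format_tag Movie.mkv title).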