
Find the largest files in a directory tree

Display human-readable sizes.

🗄 🗄 🗄

This command lists the 20 biggest files anywhere in a directory tree, biggest first:

find /some/path -type f -printf '%s\t%P\n'|\
     sort -nr -k1|\
     awk -F\\t '{
         printf("%.2f MiB\t%s\n", $1/1024/1024, $2);
       }'|\
     head -n 20

Example output:

1000,00 MiB	x/file4.dat
500,00 MiB	y/file5.dat
433,00 MiB	y/z/file6.dat
300,00 MiB	file3.dat
20,00 MiB	file2.dat
1,00 MiB	file1.dat
0,00 MiB	tiny.txt

Replace /some/path with a directory path of your choice.

Explanation

The find(1) command gathers the necessary data. For each regular file (-type f) in the directory tree under /some/path, the -printf action prints one line containing the file size in bytes (%s) and the path relative to /some/path (%P). Note that -printf is a GNU find extension.

$ find /some/path -type f -printf '%s\t%P\n'
1048576000	x/file4.dat
1048576	file1.dat
314572800	file3.dat
524288000	y/file5.dat
454033408	y/z/file6.dat
0	tiny.txt
20971520	file2.dat

A tab character separates the two columns, a common convention for line-oriented tools such as sort(1) and awk(1).
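To try the find stage on its own without touching real data, the following sketch builds a small scratch tree and runs the same -printf format on it. It assumes GNU find plus the GNU coreutils truncate and mktemp commands; the helper name make_demo_tree and the file nested.dat are made up for illustration.

```shell
# make_demo_tree: hypothetical helper that creates a scratch directory
# with files of known sizes and prints its path.
make_demo_tree() {
    demo_dir=$(mktemp -d) || return 1
    mkdir -p "$demo_dir/y/z"
    truncate -s 1M  "$demo_dir/file1.dat"        # 1048576 bytes
    truncate -s 20M "$demo_dir/file2.dat"        # 20971520 bytes
    truncate -s 3M  "$demo_dir/y/z/nested.dat"   # 3145728 bytes
    : > "$demo_dir/tiny.txt"                     # 0 bytes
    printf '%s\n' "$demo_dir"
}

demo=$(make_demo_tree)
# %s is the size in bytes, %P the path relative to the starting point.
find "$demo" -type f -printf '%s\t%P\n'
rm -rf "$demo"
```

The order of the lines depends on how find walks the directory, which is why the full command sorts them afterwards.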

These lines are piped to sort(1), which sorts numerically on the first column (-k1) and in reverse order (-nr), so the biggest files come first. Then awk(1) formats each line with a human-readable file size in MiB¹ using printf. Awk is explicitly told to use only a single tab character as the field separator with -F\\t, so paths containing spaces stay intact in the second field. Finally, head(1) limits the output to 20 lines.
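If you use this often, the pipeline can be wrapped in a small shell function. This is only a sketch: the name largest_files and its two arguments (directory and number of lines) are assumptions, not part of the original command, and it still relies on GNU find for -printf.

```shell
# largest_files DIR [COUNT]: hypothetical wrapper around the pipeline.
# Prints the COUNT biggest files under DIR (default 20), biggest first.
largest_files() {
    if [ -z "$1" ]; then
        echo "usage: largest_files DIR [COUNT]" >&2
        return 1
    fi
    find "$1" -type f -printf '%s\t%P\n' |
        sort -nr -k1 |
        head -n "${2:-20}" |
        awk -F'\t' '{
            # Convert bytes to MiB and keep the path as the second column.
            printf("%.2f MiB\t%s\n", $1/1024/1024, $2);
        }'
}
```

For example, `largest_files /some/path 10` would list the ten biggest files. Here head runs before awk, so awk only has to format the lines that are actually shown; the result is the same as with head at the end.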

  1. Short for mebibytes, which are often confused with megabytes because the two units are close in size: 1 mebibyte is 1024² bytes (1048576), while 1 megabyte is 1000² bytes (1000000). ↩︎
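To make the difference between the two units concrete, this small sketch converts one byte count both ways with awk. The function name bytes_to_units is made up for this example.

```shell
# bytes_to_units BYTES: hypothetical helper that prints the same byte
# count once in MiB and once in MB.
bytes_to_units() {
    awk -v bytes="$1" 'BEGIN {
        printf("%.2f MiB\n", bytes / (1024 * 1024))   # mebibytes: 1024^2 bytes
        printf("%.2f MB\n",  bytes / (1000 * 1000))   # megabytes: 1000^2 bytes
    }'
}

# The size of x/file4.dat from the example above.
bytes_to_units 1048576000
```

For x/file4.dat this gives 1000.00 MiB but 1048.58 MB, a difference of almost 5 %. The decimal separator in the output may follow your locale.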