Backups now and in the future

I have never lost a single piece of data valuable to me in 20-plus years of computing. That is a very boring fact. But anyway. Deep in some archives, I even have copies of the very first web pages I made, back in 1996 (ugly and embarrassing, but funny stuff). These days, internet services (or “cloud” if you like) are all the rage for backups. There are the huge players, like Dropbox, Google Drive, OneDrive, iCloud and so forth. They all have client agents that synchronize local data to remote cloud storage. For mobile devices, everything’s automated to the point where the user doesn’t have to worry at all. (Except for their own privacy.) There are basically tons of services where you can send off your data and not worry about the details of how it is kept safe. I want more control over how and when my personal data is shipped off, so I only use such services for one backup-related thing: storing gigabytes of locally encrypted backup archives that I upload manually about once a month.

If I were to make a list of 10 general requirements for a site backup system, well, this would probably be it:

1. No human involvement: Backups must be performed automatically and regularly.
2. Low maintenance: Backup must require little effort to set up on client machines (very little configuration).
3. Availability: Backups must not interfere with regular usage of client devices.
4. Storage efficiency: Backup snapshots must be stored incrementally (using some delta scheme).
5. Versioning: Older versions of files must be kept, and the most recent backup snapshot should be easily accessible.
6. Privacy: Backups must always be encrypted at the source end before being sent off to a remote storage location or saved onto external media.
7. Redundancy: Backup archives should be kept at multiple physical sites, to avoid complete loss of everything in case one site burns down to the ground.
8. Redundancy: Encrypted backup archives of the latest snapshots should be transferred to an offsite storage location regularly, but not necessarily as often as automated local site backups are performed.
9. Redundancy: The encrypted backup archives should also be kept at the local site.
10. Monitoring: An automated backup system must warn promptly if problems arise, but otherwise stay silent and do its job. For details, a log of all operations must be available somewhere locally.
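
To make the monitoring requirement a bit more concrete, here is a minimal Python sketch of the "silent unless something is wrong" idea. It is only an illustration under my own assumptions, not how any particular backup tool does it: the function name `make_backup_logger`, the logger name and the log path are all made up for the example. Everything goes to a local log file, while only warnings and errors reach the alert stream (for example stderr, which cron typically mails to the owner of the job).

```python
import logging
import sys


def make_backup_logger(log_path, alert_stream=sys.stderr):
    """Log every operation to a local file; send only warnings and worse to alert_stream.

    Hypothetical helper for illustration; the name and arguments are assumptions.
    """
    logger = logging.getLogger("backup-sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid stacking handlers if called more than once
    logger.propagate = False  # keep messages away from the root logger

    # Full operation log, kept locally for later inspection.
    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    # Alert channel: stays silent as long as nothing goes wrong.
    alert_handler = logging.StreamHandler(alert_stream)
    alert_handler.setLevel(logging.WARNING)
    alert_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(alert_handler)
    return logger


if __name__ == "__main__":
    log = make_backup_logger("backup.log")
    log.info("host alpha: snapshot completed")  # log file only
    log.warning("host beta: unreachable")       # log file and stderr
```

The point of the two handlers is that routine runs produce no output at all, so any output that does appear is worth reading.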

This list is of course not a random selection of 10 good points about data safety best practices. I have had a custom system in place for several years, which satisfies most of these criteria. It automates all steps except for number 8; I manually upload encrypted archives to offsite cloud storage (currently using Google Drive). The archives themselves are automatically generated at the end of each month, so all I need to do is open a browser, access a local network share and initiate the upload procedure. This requires very little of my time.
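
As a rough illustration of end-of-month archive generation with encryption at the source, here is a small Python sketch. It is not my actual script. It assumes `tar` and GnuPG public-key encryption as the tools, and the function names, paths, file naming scheme and recipient key are all hypothetical.

```python
import calendar
import datetime
import subprocess


def is_last_day_of_month(day):
    """True if `day` is the final calendar day of its month."""
    return day.day == calendar.monthrange(day.year, day.month)[1]


def archive_commands(host, snapshot_dir, out_dir, recipient, day):
    """Build the tar and gpg commands for one host (hypothetical naming scheme).

    tar writes a compressed archive to stdout; gpg reads it from stdin and
    writes only the encrypted result to disk.
    """
    out_path = f"{out_dir}/{host}-{day:%Y-%m}.tar.gz.gpg"
    tar_cmd = ["tar", "-czf", "-", "-C", snapshot_dir, "."]
    gpg_cmd = ["gpg", "--batch", "--encrypt",
               "--recipient", recipient, "--output", out_path]
    return tar_cmd, gpg_cmd, out_path


def write_encrypted_archive(tar_cmd, gpg_cmd):
    """Pipe tar into gpg; return True only if both exit with status 0."""
    tar = subprocess.Popen(tar_cmd, stdout=subprocess.PIPE)
    gpg = subprocess.Popen(gpg_cmd, stdin=tar.stdout)
    tar.stdout.close()  # so tar notices if gpg exits early
    gpg_status = gpg.wait()
    tar_status = tar.wait()
    return tar_status == 0 and gpg_status == 0


if __name__ == "__main__":
    today = datetime.date.today()
    if is_last_day_of_month(today):
        # Assumed example values, not real paths or keys.
        tar_cmd, gpg_cmd, out_path = archive_commands(
            "alpha", "/srv/backup/alpha", "/srv/archives",
            "backup@example.org", today)
        print("archive written" if write_encrypted_archive(tar_cmd, gpg_cmd)
              else "archive FAILED", out_path)
```

Because the archive is piped straight into gpg, no unencrypted copy of it is ever written to the archive directory, which is the property the privacy requirement asks for.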

My setup is based around a central backup server model, where the server pulls data from client hosts that are online on the local network. It’s a rather substantial shell script solution with support for configuration as code, pattern-based exclusion rules and pluggable hooks, and it uses rdiff-backup as its engine internally. Backup snapshots are saved to local storage on the backup server. The most recent (current) host snapshots are directly available at all times on the backup server (an advantage of rdiff-backup). The backup job runs at regular intervals and, within the course of a day, retries hosts missed in previous attempts. The snapshots are complete, in that all operating system files are backed up, in addition to personal data. The SSH protocol is used for transport across the network.
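
The pull-and-retry logic can be sketched in a few lines. The real thing is a shell script, so treat this Python version as an illustration only: the function names, host names and destination layout are assumptions, and the command uses the classic rdiff-backup command-line form (`user@host::/path` as the source, `--exclude` for exclusions), which newer rdiff-backup releases may flag as deprecated.

```python
import subprocess


def rdiff_backup_command(host, dest_root, excludes=()):
    """Build a pull-style rdiff-backup command for one client host."""
    cmd = ["rdiff-backup"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    # Source is the remote root file system, reached over ssh as root;
    # destination is a per-host directory on the backup server.
    cmd += [f"root@{host}::/", f"{dest_root}/{host}"]
    return cmd


def run_pass(hosts, done_today, dest_root, excludes=(), run=subprocess.call):
    """Try every host not yet backed up today; return the updated set of done hosts.

    `run` takes a command list and returns its exit status. It is a parameter
    so the logic can be exercised without touching real hosts.
    """
    finished = set(done_today)
    for host in hosts:
        if host in finished:
            continue  # already has a snapshot from an earlier pass today
        if run(rdiff_backup_command(host, dest_root, excludes)) == 0:
            finished.add(host)
    return finished


if __name__ == "__main__":
    # Dry run with a stand-in runner that pretends "beta" is offline.
    def fake_run(cmd):
        print(" ".join(cmd))
        return 1 if "root@beta::/" in cmd else 0

    done = run_pass(["alpha", "beta"], set(), "/srv/backup", ["/proc"], fake_run)
    print(sorted(done))
```

Calling `run_pass` again later in the day with the returned set only touches the hosts that were missed, and the set would be reset when a new day starts.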

In general, this setup has worked very well over the years. Client setup is very lightweight, only requiring the installation of OpenSSH server and rdiff-backup, plus ssh key setup for the root user. That procedure is automated using Ansible. An obvious weakness is that there is no support for Windows hosts or mobile devices. I don’t regularly use Windows clients except for work-related things, so it is not a big deal. But a simple solution I’ve used in the past is to [shadow] copy from the Windows host to a network share on a Linux host that is backed up.

So, in this modern age, is my trusty but crusty backup regime still a good solution? All in all, yes, since data still boils down to files with valuable bytes in them, and my solution gives me absolute control and privacy. It has some disadvantages, though. Since it is a pull model, laptop clients will not be backed up when they are not present on my local network. So I am considering options like Déjà Dup, but whatever I choose needs to support complete system backups, and I have not investigated how well that works with Déjà Dup. (rdiff-backup has proven itself to be excellent in this respect and happily creates full file-based host system snapshots without issue.)