Backups now and in the future

I have never lost a single piece of data valuable to me in 20-plus years of computing. That is a very boring fact. But anyway. Deep in some archives, I even have copies of the very first web pages I made, back in 1996 (ugly and embarrassing, but funny stuff). These days, internet services (or “cloud” if you like) are all the rage for backups. There are the huge players, like Dropbox, Google Drive, OneDrive, iCloud and so forth. They all have client agents able to synchronize local data to remote cloud storage. For mobile devices, everything’s automated to the point where the user doesn’t have to worry at all. (Except for their own privacy.) There are tons of services where you can send off your data and not worry about the details of how it is kept safe. I want more control over how and when my personal data is shipped off, so I only use such services for one backup-related thing: storing gigabytes of locally encrypted backup archives that I upload manually about once a month.

If I were to make a list of 10 general requirements for a site backup system, well, this would probably be it:

  1. No human involvement: Backups must be performed automatically and regularly.
  2. Low maintenance: Backup must require little effort to set up on client machines (very little configuration).
  3. Availability: Backups must not interfere with regular usage of client devices.
  4. Storage efficiency: Backup snapshots must be stored incrementally (using some delta-scheme).
  5. Versioning: Older versions of files must be kept, and the most recent backup snapshot should be easily accessible.
  6. Privacy: Backups must always be encrypted at the source end before being sent off to a remote storage location or saved onto external media.
  7. Redundancy: Backup archives should be kept at multiple physical sites, to avoid complete loss of everything in case one site burns down to the ground.
  8. Redundancy: Encrypted backup archives of the latest snapshots should be transferred to an offsite storage location regularly, but not necessarily as often as automated local site backups are performed.
  9. Redundancy: The encrypted backup archives should also be kept at the local site.
  10. Monitoring: An automated backup system must warn promptly if problems arise, but otherwise stay silent and do its job. For details, a log of all operations must be available somewhere locally.
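
Requirement 10 can be sketched with Python's standard logging module. This is only an illustration and not how my own system does it: the logger name, the message formats and the log path are assumptions. Everything goes to a local log file, while the alert channel stays silent unless something is a warning or worse.

```python
import logging
import sys


def make_backup_logger(log_path, alert_stream=sys.stderr):
    """Return a logger that records everything to a file but only alerts on problems."""
    logger = logging.getLogger("backup")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    logger.propagate = False

    # Full operation log, kept locally (the path is chosen by the caller).
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    # Alert channel: says nothing unless a record is WARNING or worse.
    alert_handler = logging.StreamHandler(alert_stream)
    alert_handler.setLevel(logging.WARNING)
    alert_handler.setFormatter(
        logging.Formatter("BACKUP %(levelname)s: %(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(alert_handler)
    return logger
```

With this in place, `log.info("host alpha backed up")` only ends up in the file, while `log.warning("host beta unreachable")` ends up in the file and on the alert stream. If the job runs from cron with mail delivery configured, anything written to stderr would typically reach the job owner as mail.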

This list is of course not a random selection of 10 good points about data safety best practices. I have had a custom system in place for several years, which satisfies most of these criteria. It automates all steps except for number 8; I manually upload encrypted archives to offsite cloud storage (currently using Google Drive). The archives themselves are automatically generated at the end of each month, so all I need to do is open a browser, access a local network share and initiate the upload procedure. This requires very little of my time.

My setup is based around a central backup server model, where the server pulls data from client hosts that are online on the local network. It’s a rather substantial shell script solution with support for configuration as code, pattern-based exclusion rules and pluggable hooks, and it uses rdiff-backup as the engine internally. Backup snapshots are saved to local storage on the backup server. The most recent (current) host snapshots are directly available at all times on the backup server (an advantage of rdiff-backup). The backup job runs at regular intervals and, over the course of a day, retries hosts that were missed in previous attempts. The snapshots are complete, in that all operating system files are backed up, in addition to personal data. The ssh protocol is used for transport across the network.
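
The pull model with retries can be sketched in a few lines of Python. This is not my actual shell script, just a minimal illustration of the idea: the function names, host names and destination layout are made up, and the command uses the classic `rdiff-backup [options] user@host::/path local-dir` form (newer rdiff-backup releases also offer an action-based command line).

```python
import subprocess


def build_command(host, dest_root, excludes=()):
    """Build an rdiff-backup command line that pulls a full snapshot of `host`."""
    cmd = ["rdiff-backup"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    # Source is the remote root file system, reached over ssh as root;
    # destination is a per-host directory on the backup server.
    cmd += [f"root@{host}::/", f"{dest_root}/{host}"]
    return cmd


def run_command(cmd):
    """Run a command; a missing tool or a non-zero exit counts as failure."""
    try:
        return subprocess.run(cmd).returncode == 0
    except OSError:
        return False


def backup_round(pending, dest_root, run=run_command):
    """Try every pending host once and return the hosts that still need a backup."""
    still_pending = set()
    for host in sorted(pending):
        if not run(build_command(host, dest_root)):
            still_pending.add(host)  # offline or failed: retry in the next round
    return still_pending
```

A scheduler would start the day with all known hosts in `pending`, call `backup_round` at regular intervals and stop once the set is empty. The `run` parameter is only there so the loop can be exercised without touching any real hosts.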

In general this setup has worked very well over the years. Client setup is very lightweight, only requiring the installation of OpenSSH server and rdiff-backup, plus ssh key setup for the root user. That procedure is automated using Ansible. An obvious weakness is that there is no support for Windows hosts or mobile devices. I don’t regularly use Windows clients except for work-related things, so it is not a big deal. But a simple solution I’ve used in the past is to [shadow] copy from the Windows host to a network share on a Linux host that is backed up.

So, in this modern age, is my trusty but crusty backup regime still a good solution? All in all, yes, since data still boils down to files with valuable bytes in them, and my solution gives me absolute control and privacy. It has some disadvantages, though. Since it is a pull model, laptop clients will not be backed up when they are not present on my local network. So I am considering some options like dejadup, but any replacement needs to support complete system backups. I have not investigated how well that works with dejadup. (rdiff-backup has proven itself to be excellent in this respect and happily creates full file-based host system snapshots without issue.)

Inspector Pom on GitHub

I’ve brushed up a handy tool I wrote a few years ago. It allows quick inspection of Maven pom.xml files on the command line, producing human-readable plain text output. If you’ve ever found yourself grepping pom.xml files for version numbers or other bits of information, you might find it useful. Check it out on GitHub: https://github.com/oyvindstegard/pom

[Image: Inspector Pom command line, example of output]
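
To give an idea of the kind of information that otherwise gets grepped out of a pom.xml, here is a small Python sketch that reads the project coordinates. It is not how Inspector Pom itself works and does not show its output. The function name is made up, and it assumes the file declares the standard Maven POM 4.0.0 namespace.

```python
import xml.etree.ElementTree as ET

# Standard namespace declared by Maven POM files (model version 4.0.0).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_coordinates(pom_xml):
    """Return groupId, artifactId and version from POM text.

    groupId and version fall back to the <parent> element,
    which is where Maven inherits them from when they are omitted.
    """
    root = ET.fromstring(pom_xml)

    def text(path):
        # Paths are relative to <project>, so only direct children match.
        node = root.find(path, POM_NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "groupId": text("m:groupId") or text("m:parent/m:groupId"),
        "artifactId": text("m:artifactId"),
        "version": text("m:version") or text("m:parent/m:version"),
    }
```

Calling `pom_coordinates(open("pom.xml", encoding="utf-8").read())` returns a plain dictionary. A POM without the namespace declaration would yield `None` for every field with this sketch.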

Slow HP Elitebook 840 laptop?

I currently use an employer-issued HP Elitebook 840 G4 laptop for development work. It has annoyed me. It has bothered me. It has always been totally under-performing, despite its Intel Core i7 CPU, and even with power connected and all Windows power management settings verified to be sane. I didn’t figure out why until recently.

As it turns out, the laptop has unfortunate default BIOS settings coupled with a power supply that is probably too weak. At a glance, all the important settings look OK: CPU turbo boost enabled, multi-core, hyper-threading and virtualization enabled, runtime power management enabled. But there is another option that turns out to be extremely important for performance. It’s buried under “Built-in device options”, and it’s called “Boost Converter”. Off by default. There is no explanation in the BIOS itself about what this option does. (Hello, hardware vendors: how difficult can it be to provide a one-sentence explanation for ALL the settings?)

Eventually I found a PDF document online that explains various HP BIOS options. And it describes “Boost Converter” as:

Draws power from the battery to give the CPU a momentary performance gain.

http://h10032.www1.hp.com/ctg/Manual/c03481740

Check! Enabled. And the laptop suddenly becomes much more powerful when connected to AC power. Because now, CPU turbo boost actually works. Instead of maxing out at 2.8 GHz core speed, it now happily jumps up to 3.8 GHz when needed. (There is also a BIOS option to enable turbo boost when only running on battery power, but I left that off.)

I now enjoy more fan noise, in addition to faster software builds, faster Docker containers and a faster IntelliJ. And in case you wondered, the battery still charges, even though it is used for extra power in addition to AC when needed.