
Building an Offsite Backup NAS

Disclosure: Some of the links below are affiliate links. This means that, at zero cost to you, I will earn an affiliate commission if you click through the link and finalise a purchase. Learn more.

(Image: a hard drive on fire. Making sure that failures like these don't cause total data loss. AI generated image.)

You've likely heard of the 3-2-1 rule for backups. If not, it's really simple:

  • You want 3 copies of your data,
  • of these copies, 2 should be on different media (e.g., HDD/tape),
  • and finally there should be 1 offsite copy.

When forming your backup plan, you should consider each of these requirements and how you'll fulfil them. You should also consider the type of data you'll be storing, how frequently it will be accessed, and whether it is truly irreplaceable or something that can be re-downloaded or imported from another medium.

In this article, I'll be explaining my 3-2-1 backup solution, including the architecture, costs, and overall performance.

Types of Data

For me, the main data I store on the NAS are old video libraries from holidays over the years, backups of old computers, photo albums, and server backups. I also share this NAS with my Dad and my brother, so that we can all have (relatively) cheap on-site and off-site storage without paying through the eyeballs for Google Drive, Dropbox, or a myriad of other storage providers.

Their data needs are broadly similar to mine, and the NAS is a relatively light-duty machine, not needing to serve many files concurrently (e.g., as a video editing or database server would).

The Architecture

To fulfil the 3-2-1 rule, I elected for two almost identical builds: one living at my house and the other at Dad's. Each build comprises two 6TB hard drives in a mirrored vdev running ZFS. This means that either NAS can tolerate a single drive failure without compromising its local storage, and even if both drives fail, or a fire or some other issue kills the whole system, there is still a full remote backup at the second location.
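On each Pi, the pool is just one mirrored vdev built from the two external drives. A minimal sketch of creating such a pool is below; the pool name and by-id device paths are placeholders rather than the exact ones used here:

    # create a pool with a single mirrored vdev from the two 6TB USB drives
    # (pool name and device paths are illustrative)
    zpool create -o ashift=12 tank \
        mirror /dev/disk/by-id/usb-Seagate_Expansion_AAAA /dev/disk/by-id/usb-Seagate_Expansion_BBBB

    # confirm both drives show as ONLINE in the mirror
    zpool status tank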

This 4x overhead is larger than a typical data centre would use for their storage, as we consume 24TB of drives for 6TB of space. So far, this hasn't been an issue, as after 4 years or so including ingest of old footage, we've only used 6TB total.

One of the NAS machines also has an additional 8TB drive for the Plex library, which comprises media I can easily re-import, so there is no need for a mirrored vdev for this data. The NAS on my network is named running-lion, with the remote machine at Dad's called diving-orca.

For true disaster recovery, I also plan to back up to AWS S3 Glacier Deep Archive (GDA). It's really cheap for storing files and great as a last line of defence, but some simple sums on the AWS calculator show that I'd be paying through the eyeballs to ingest the initial files, and if I ever needed to restore from this backup, I'd be spending loads of money on outbound data transfers.

Why would it cost so much?

At current usage, we'd need to store 2.2TB of data in GDA. From a quick df -i I can see that we are currently using 1704298 inodes, a count which loosely corresponds to the total number of files.

I can put this into the AWS Calculator to estimate the costs. For this estimate, I assumed 6TB of data per month with an average size of 1.29MB per file (2.2TB divided by 1704298 inodes). As I'd need to ingest this data, I'd have to make at least 1704298 requests, which works out at about $85.
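As a sanity check, assuming the published GDA PUT request price of roughly $0.05 per 1,000 requests (us-east-1), the numbers above roughly check out:

\[ \begin{align*} 2.2\text{ TB} \div 1704298\text{ files} &\approx 1.29\text{ MB per file}\\ 1704298\text{ requests} \times \$0.05/1000\text{ requests} &\approx \$85 \end{align*} \]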

Storage of this data is cheap enough, at just over $7 per month assuming I don't need to retrieve anything, probably about $10 once I do integrity checks.

If I then need to restore this data from the archive, I suddenly hit the stupid AWS egress costs of $0.09 per GB, or $552 to get my data back.
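That restore figure is simply the full 6TB leaving AWS at the standard egress rate:

\[ 6\text{ TB} \times 1024\text{ GB/TB} \times \$0.09\text{/GB} = \$552.96 \]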

In short, the GDA service could be useful, but only if I can somehow make my average file size bigger (looking at you, node_modules), and if I never need to pull the data back out of the service. AWS is very much geared towards the new startup that needs lots of compute quickly, and it very quickly begins to bite you if you misconfigure anything or look at it wrong.

Hardware Choice

The overall system comprises two identical Pi 4Bs, which run Raspberry Pi OS. Each Pi is currently connected to two 6TB Seagate Expansion HDDs, with running-lion also connected to an 8TB version of the same drive.

The NAS systems don't include any power loss protection, which is something that I would like to change in the future with the use of a UPS. On running-lion, there is a fan connected to a couple of the GPIO pins, driven by a hardware PWM control service, PiFan, along with a small buzzer for notifications (e.g., when the internet dies) and a door contact for smart home presence detection. The server also runs Plex, which just about works, provided the video and audio streams are in the correct format. I'm looking at using PlexCleaner in the future to fix the files that can't direct play, as well as seeing whether there's any way to transcode in a clustered fashion, or to use hardware and a cheap GPU for lower-bandwidth remote connections.

As diving-orca simply serves files over the network, there is no need for a fan. Temperatures and load are monitored through Prometheus anyway, so this can be sorted with the addition of a fan at a later date if it becomes an issue.

Disk Encryption

The disks were historically encrypted with LUKS, but now that I use ZFS with mirrored vdevs, encryption is handled natively by ZFS. As the disks reside at home and are unlikely to be stolen, the encryption is mainly for end-of-life disposal. The keys reside on the Pis and automatically unlock the drives at boot.
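A minimal sketch of what a natively encrypted dataset with a key file on the Pi looks like is below; the pool/dataset names and key path are placeholders, not the exact layout used here:

    # generate a 32-byte raw key and keep it on the Pi
    dd if=/dev/urandom of=/root/tank.key bs=32 count=1

    # create an encrypted dataset whose key is read from that file
    zfs create -o encryption=aes-256-gcm -o keyformat=raw \
        -o keylocation=file:///root/tank.key tank/secure

    # at boot, load all keys and mount everything (this can be wired into a
    # small systemd unit or handled by the zfs-mount-generator)
    zfs load-key -a
    zfs mount -a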

If we preferred security to convenience, we could remove the keys from the Pis and manually enter the decryption key at boot, but this inevitably means calls to me when the shares are inaccessible. I'd like to change this in the future (you'll notice a bit of a theme here) as part of a remote unlock procedure and a 'secure boot' of the Raspberry Pis. That's a project to think about another time, however.

Access

As there are two NAS machines in operation, and we don't present a single standardised interface as we could with Ceph or GlusterFS, for example, each user picks a primary NAS. To minimise client configuration, we each connect via rl.chza.me or do.chza.me to our corresponding shares. These are simply convenience CNAMEs which point to the hostnames of the machines.

These records map to running-lion.net.argnoric.com and diving-orca.net.argnoric.com, which have A records on OVH that are kept up to date via their DynHost API. The records are locally overridden by the DNS resolver on the company VPN, which maps them to the VPN endpoints instead.
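Keeping those A records current is just a periodic dynamic-DNS update. A sketch of what such an update looks like against OVH's DynHost endpoint is below; the credentials, and the use of api.ipify.org to discover the WAN IP, are illustrative:

    # discover the current WAN IP (any "what is my IP" service works)
    IP=$(curl -s https://api.ipify.org)

    # push it to OVH DynHost for this host's A record
    curl -s --user "dynhost-user:dynhost-password" \
        "https://www.ovh.com/nic/update?system=dyndns&hostname=running-lion.net.argnoric.com&myip=${IP}"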

Possible Issues

The system described above is not perfect, and has a couple of possible drawbacks, as discussed below:

DNS Caching

Most operating systems cache DNS records for a set period of time to avoid excessive queries to the upstream DNS server. The TTL of the records on the OVH name servers is 60 seconds, so changes to the local IPs should propagate quickly. As for the VPN, we override the records there, and our VPN IPs for the NAS machines are static regardless.

This should ensure that when connecting to the NAS, we always map through to the same underlying machine, and access is either through the local network, or via our company VPN, and never over the internet.

Rebind Attacks

Some DNS resolvers implement DNS rebind attack prevention. This ensures that any public record which resolves to a private IP (as defined in RFC 1918) returns NXDOMAIN instead, since an attacker who controls a domain's DNS could otherwise point a name at a machine inside our network and effectively mount a MITM attack.

In OpenWrt, I override this behaviour by whitelisting running-lion.net.argnoric.com and diving-orca.net.argnoric.com so that they are allowed to resolve to internal addresses. If rebind attack prevention is enabled on the DNS server and can't be switched off, adding a local record (e.g., in /etc/hosts) mapping diving-orca.net.argnoric.com to its local network IP works instead.

For example, on my home network, all trusted servers and LAN clients go on the .home.chza.me network, and I could add a CNAME pointing running-lion.net.argnoric.com to running-lion.home.chza.me.
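On OpenWrt, that whitelist lives in the dnsmasq section of /etc/config/dhcp; the /etc/hosts fallback is just a static entry. Both are sketched below (the IP address is a placeholder):

    # /etc/config/dhcp (OpenWrt): allow these names to resolve to RFC1918 addresses
    config dnsmasq
            list rebind_domain 'running-lion.net.argnoric.com'
            list rebind_domain 'diving-orca.net.argnoric.com'

    # or, on a client, a static /etc/hosts entry pointing at the NAS's LAN IP
    192.168.1.20    diving-orca.net.argnoric.com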

Storage

Shares

Each of us has a share for our own personal documents, and there is also a communal share for storing documents that we want to share with each other, such as raw footage from adventures on the C&O Vlogs YouTube channel. In addition to storing the raw footage, we also archive all our old YouTube channels and a myriad of other important data that could disappear at any second. Each of us can then back up to the NAS, or store documents on it, and these are shortly afterwards asynchronously replicated to the other NAS, and then (in the future) synchronised to the offline S3 copy.

Synchronisation

As we're not using a technology as awesome, or with as much setup overhead, as Ceph, we can't really keep the NASes in perfect sync instantaneously. With the current setup, we have eventual consistency, meaning that data on one NAS is copied to the other during scheduled synchronisation periods. I've looked into various ways we could change this, including file-based solutions such as rclone (with its beta bisync function), rsnapshot, and rsync; filesystem-based solutions such as GlusterFS; block-level solutions such as DRBD; and fully enterprise-grade solutions such as Ceph.

Each of these technologies has its own merits and drawbacks, some of which are discussed on my Cloud Native page, and they may become the topic of a future blog post.

We also need to consider the total bandwidth available on the NAS machines, which on the local network is not a problem, but which from site to site over the WAN is largely capped by the upload speed of each internet connection. Both Dad's and my upload speeds are capped at about 25 Mbps, assuming that there is nothing else congesting the network. For our archival use, this isn't too bad, as we can theoretically get:

\[ \begin{align*} 25 \text{ Mbps} \times 86400 \text{ seconds in a day} &= 2160000 \text{ Mbit}\\ &= 270000 \text{ MB}\\ &= 270 \text{ GB per day} \end{align*} \]

Warning

As bisync is still considered experimental, we test it heavily before use and ensure that there is a third, immutable copy of the data somewhere else if we need to recover!

At the moment, we're making use of rclone to run an hourly bisync in the most sane possible mode, with regular emails when files change or there are conflicts syncing across. rclone handles conflicts gracefully and ensures that a sync doesn't run if there's already one in progress (thanks, lock files).
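For the curious, the hourly job looks something like the sketch below; the paths, remote name, and thresholds are illustrative rather than our exact configuration, and the very first run has to be seeded with --resync:

    # seed the sync state once
    rclone bisync /tank/shared do-nas:/tank/shared --resync

    # then the hourly cron job runs something like
    rclone bisync /tank/shared do-nas:/tank/shared \
        --check-access --max-delete 50 \
        --verbose --log-file /var/log/rclone-bisync.log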

At the moment, we do not back up to Amazon's Glacier Deep Archive service, so extra care needs to be taken to ensure that we don't accidentally delete everything. ZFS can handle this with regular snapshots, which we remove on a schedule at predefined intervals.

Therefore, a delete on one filesystem either lands in the vfs_recycle directory (if performed through Samba) or is removed from the current version of both filesystems. It is only once every snapshot referencing a file has been destroyed that the data is truly lost.
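A minimal sketch of such a snapshot rotation is below, assuming a hypothetical dataset name and a 30-snapshot retention; tools like sanoid or zfs-auto-snapshot do the same job with less hand-rolling:

    # take a dated recursive snapshot of the shared dataset
    zfs snapshot -r tank/shared@auto-$(date +%Y-%m-%d)

    # keep the 30 most recent automatic snapshots and destroy the rest
    zfs list -H -t snapshot -o name -s creation -d 1 tank/shared \
        | grep '@auto-' \
        | head -n -30 \
        | xargs -r -n 1 zfs destroy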

Accidental Deletion

As we can all be quite clumsy, we've got the vfs_recycle module enabled in Samba, which soft-deletes files into a recycle directory before their space is reclaimed completely. This is configurable, but I believe we have ours set to unlimited at the moment; in future, when running low on space, we can set this to e.g. 7 days, which should allow us to restore anything we delete relatively painlessly. If we only realise after a file has been fully purged from the system, we still have the semi-immutable copy in the ZFS snapshots, and eventually in S3 GDA, which can be retrieved using either the standard or the bulk retrieval tier within 48 hours. GDA is reasonable for 1-2 files at a time, when not restoring loads of data, as discussed in the costs section later.
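The Samba side of this is a handful of share options. A sketch with an illustrative share name and path is below, using the standard vfs_recycle options:

    # smb.conf (sketch): soft-delete into a per-user recycle directory
    [shared]
        path = /tank/shared
        vfs objects = recycle
        recycle:repository = .recycle/%U
        recycle:keeptree = yes
        recycle:versions = yes
        # recycle:maxsize and recycle:exclude can limit what gets kept
        # once space becomes tight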

Disaster Recovery

Ideally we never lose more than one NAS drive in each mirror. If we lose one drive, we can simply copy the data across from the other local drive by resilvering the ZFS pool. Otherwise, in the event of a total loss of one system, we can simply re-provision it and run a zfs send | ssh <host> zfs recv on the new host.
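In practice, re-seeding a rebuilt machine is a single replication stream from the surviving NAS; a sketch with illustrative pool and snapshot names is below:

    # snapshot everything on the surviving NAS, then replicate it to the new host
    zfs snapshot -r tank@reseed
    zfs send -R tank@reseed | ssh diving-orca zfs recv -F tank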

In the future, if this proves not to be enough, we'd be able to do a flexible recovery from GDA, and then only restore the files we realise we really can't go without, and leave the rest to stay in GDA until we potentially need them.

The S3 bucket I set up for testing was also configured with versioning, and rclone only has permission to write in an immutable, append-only fashion, such that we never delete anything from GDA. If costs run too high in the future, we can always go in via the web UI and delete anything we don't need, but at the current price of <$1/TB/mo, it's not really an issue.
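For reference, enabling versioning is a one-liner with the AWS CLI (the bucket name is a placeholder); the append-only behaviour typically comes from granting rclone's IAM user s3:PutObject and listing permissions but no s3:DeleteObject:

    # turn on object versioning for the archive bucket
    aws s3api put-bucket-versioning --bucket my-nas-archive \
        --versioning-configuration Status=Enabled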

Monitoring

Within the company we have a Prometheus/Grafana stack which we use to monitor all the cloud instances, my home network, and of course, the NASes. We want to see the current load on the system, temperatures, network throughput, disk space available, and any SMART issues the disks report, so we can hopefully replace them before they start writing complete garbage or killing the system.

We also have the rclone bisync output configured to send an email for successful sync operations, and cron job monitoring for failed jobs. This ensures that we know whether syncs are taking place or failing, and can see if there are any conflicts that have caused issues with snapshots.

Costs

Initial Outlay

These were the components and their costs at the time of purchase, where those prices could still be found, or the current costs (denoted by *):

Item | Quantity | Cost Per Item (£) | Cost Overall (£)
Seagate 6TB Expansion USB 3.0 Desktop External Hard Drive (STEB60000403) | 2 | 109.50 | 219.00
Seagate Expansion Desktop, 6TB, External Hard Drive, USB 3.0, 2 year Rescue Services (STKP6000400) | 2 | 91.00 | 182.00
Raspberry Pi Model 4B 4GB | 2 | 42.69 | 85.38
Sandisk Ultra 32GB microSDHC (although I'd recommend netbooting or using an industrial SD card instead) | 2 | 5.54 | 11.08
Spare Cat 6 Ethernet Patch Cable | 2 | -- | --
Official Raspberry Pi Case* | 1 | 5.50 | 5.50
Raspberry Pi 4 Case Fan* | 1 | 5.50 | 5.50
Spare Pi 4 Case | 1 | -- | --
Official UK Raspberry Pi 4 Power Supply (5.1V 3A)* | 2 | 9.00 | 18.00
TOTAL | | | 526.46
STORAGE | | | 401.00
ONE-OFF | | | 125.46

This yields a cost per GB of £0.0877. Compare that to plain S3 at £0.02422/GB/mo (assuming approx 500 GB/mo of accesses and 6TB stored): over the time since purchase (April 2020, ~51 months) we would be sitting at a TCO of around £7411, versus our current total of roughly £679 including maybe £2-3/month in electricity, or £0.113 per GB.
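Spelling that comparison out (taking the upper end of the £2-3/month electricity estimate):

\[ \begin{align*} \text{S3: } & \text{£}0.02422\text{/GB/mo} \times 6000\text{ GB} \times 51\text{ months} \approx \text{£}7411\\ \text{NAS: } & \text{£}526.46 + (51 \times \text{£}3) \approx \text{£}679\\ & \text{£}679 \div 6000\text{ GB} \approx \text{£}0.113\text{ per GB} \end{align*} \]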

Whilst I appreciate that S3 is not the cheapest, my Google One (which I am trying to get rid of) costs £1.79/mo for 100GB, or £0.0179/GB/mo. That means that if we utilise all 6TB of space, we break even after about 3 months. Again, other cloud providers do exist.

Saying that though, not only do we get the storage at a similar level of redundancy, but the compute power of a Pi 4, which is enough to run several services (e.g., Plex, Home Assistant, other useful scripts), is effectively thrown in for free.

Compared to a standard NAS box, our solution isn't the prettiest, but we get much better CPU and RAM specs than many consumer NASes.

Running Costs

After the initial outlay, we can assume maybe £2-3 in electricity per month. As we run this as part of our business, we already have a company VPN server, so that's not an extra cost. Outside our local storage, we plan to make use of the S3 Glacier Deep Archive service; at less than £1/TB/month, that's only a couple of quid for 6TB of storage.

Disaster Recovery Costs

The downside of GDA is that it's really expensive to restore data from the archive: about $15 to retrieve it, then $552 to download it from AWS, although if you're willing to spread the restore out, they generously give you 100GB of outbound bandwidth for free each month... vendor lock-in, whom? The main point is that this is a last resort and should only ever be needed if both running-lion AND diving-orca fail.

Alternatives (and why not implemented)

I think I've generally covered this in the costs section... The original goal of this project was to provide a simple offsite backup solution for me personally, but it's slightly spiralled into something I seem to be maintaining for the company, my brother, and Dad. If I had an unlimited budget and somebody else paying my electric bill, I'd love to have some sort of Ceph cluster on the go, and who knows, once I have a job and some spare cash, maybe I'll be able to throw a bit towards a proper cluster at home.

Commercial solutions were really underpowered in terms of processing and all ran proprietary software, making them poorly suited to running whatever I like on them and doing whatever I please with them.

Cloud can be good for some things, and it's far more convenient (case in point: my Google Photos), but it comes at a large cost once you decide you want to store all your old video archives so that, at some point in the future, you can use the raw version as opposed to the version ripped off of YouTube.

Summary

This post focused on the underlying architecture and design choices for the NAS. I might make a detailed post in the future about setting up the Raspberry Pis with Ansible, and the setup of the hard drives, benchmarking, testing heavy workloads, etc.

Let me know what you'd like to see in the comments below!
