Loïc Doubinine | Zed

Developer, Architect, DevOps and Site Reliability Engineer

  • Backup with Borg

    9 minutes read 1826 words Linux - Raspberry pi - Tech

    You have probably already heard of the recent OVH fire. It was a harsh reminder for many people and companies. Many websites and services suffered major outages as a result, and some lost everything in the fire.

    Picture of OVH's Strasbourg datacenter burning
    OVH is on fire

    Most websites went down for a few hours. Avoiding service interruption when a whole datacenter disappears is not easy. Some already had everything in place for this kind of scenario. I won’t talk about that today.

    Today, I will explain how I do my personal backups.

    The basics

    When talking about backups, we often point out 3 rules to follow: keep 3 copies of your data, on 2 different types of storage, with 1 copy kept elsewhere.

    The most important and most often forgotten rule is the last one. If your backups are all stolen or destroyed at once, they are not really useful anymore.

    The cloud is a good third place to store backups, but I decided to avoid it for cost reasons, on principle, and for the learning experience.

    Borg

    Borg is a command-line tool (there is no interface to click on) meant to back up data incrementally. This means it can reduce backup time by focusing only on changed files instead of doing a full sweep every time. Borg also uses a powerful deduplication mechanism to reduce backup archive size. If two identical files are found, they are stored only once.
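
    As a rough illustration of the deduplication idea, here is a minimal sketch that hashes a few throwaway files and counts how many distinct contents would need storing. It is a simplification: Borg deduplicates chunks of data with its own chunker rather than whole files, and the file names below are made up for the demo.

```shell
#!/bin/bash
# Toy illustration of deduplication: identical content only needs to be stored once.
# Simplification: Borg works on chunks of data, this demo hashes whole files.

demo_dir=$(mktemp -d)
printf 'holiday picture\n' > "$demo_dir/a.jpg"
printf 'holiday picture\n' > "$demo_dir/copy-of-a.jpg"   # same content as a.jpg
printf 'another picture\n' > "$demo_dir/b.jpg"

total_files=$(find "$demo_dir" -type f | wc -l)
# Hash every file, keep only the hash column, and count the distinct hashes.
unique_contents=$(sha256sum "$demo_dir"/* | awk '{print $1}' | sort -u | wc -l)

echo "files seen: $total_files, contents stored: $unique_contents"
rm -rf "$demo_dir"
```

    Three files go in, but only two distinct contents would have to be kept.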

    Borg works mostly on Linux, but I successfully used it on Windows thanks to the Windows Subsystem for Linux (WSL).

    Borg works by scanning a folder and copying its contents to a Borg archive over an SSH connection.

    A basic example:

    borg create -s locutus@host:/path/borg/backup/repo/$REPO_NAME::$ARCHIVE_NAME-$CURRENT_DATE  /path/to/backup
    

    This gives you output like the following:

    ------------------------------------------------------------------------------
    Archive name: mirror-2021-04-13-13h57
    Archive fingerprint: c2f6ffa259f358636371439767dd9edc571ec01d84aa474c052c555b70b5c76f
    Time (start): Tue, 2021-04-13 13:59:22
    Time (end):   Tue, 2021-04-13 13:59:27
    Duration: 4.67 seconds
    Number of files: 0
    Utilization of max. archive size: 0%
    ------------------------------------------------------------------------------
                           Original size      Compressed size    Deduplicated size
    This archive:                  712 B                640 B                640 B
    All archives:                2.77 TB              2.79 TB            137.29 GB
    
                           Unique chunks         Total chunks
    Chunk index:                   53243              1224153
    ------------------------------------------------------------------------------
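
    To put the "All archives" line in perspective, the figures above can be turned into a deduplication ratio. This is only arithmetic on the numbers shown, assuming decimal units (1 TB = 1000 GB).

```shell
#!/bin/bash
# Figures copied from the "All archives" line of the output above.
original_tb="2.77"        # Original size, in TB
deduplicated_gb="137.29"  # Deduplicated size, in GB

# Assumption: decimal units, so 1 TB = 1000 GB.
ratio=$(awk -v o="$original_tb" -v d="$deduplicated_gb" 'BEGIN { printf "%.1f", (o * 1000) / d }')
echo "Deduplication ratio: about ${ratio}x"
```

    In other words, the repository holds roughly twenty times more data than it takes on disk.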
    

    I invite you to follow the official documentation to find out how to create a repository. There are a lot of ways to do it, so I won’t go into detail.

    Overview

    Basically, here is how I built my backup system:

    Global schematics

    Borg server

    I have a server dedicated to receiving Borg backups. It’s a virtual machine in my case, but it could easily be a Raspberry Pi. The disk size is the most important aspect of it. It must be big enough to receive all my backups.

    I reserved 1TB on a RAID 5 disk array. If I need more one day I’ll have to buy new hard drives, or do some cleaning.

    Remember that I do incremental backups. This ensures I keep a history of my files throughout their life. If I delete a file, it will not be deleted from the backups for a long time (configurable).

    I created one Borg repository for each server and computer I wish to back up.

    root@borg:/backup# ls -la
    total 40
    drwxrwx--- 10 locutus locutus 4096 Nov  8 01:57 .
    drwxr-xr-x  4 root    root    4096 Jun 19  2020 ..
    drwxrwx---  3 locutus locutus 4096 Jul 24  2020 windows
    drwxrwx---  3 locutus locutus 4096 Nov 19 00:44 home
    drwxrwx---  3 locutus locutus 4096 Apr 14 06:18 knode01
    drwxrwx---  3 locutus locutus 4096 Apr 14 06:18 knode02
    drwxrwx---  3 locutus locutus 4096 Jun 20  2020 martine
    drwxrwx---  3 locutus locutus 4096 Nov  8 01:52 pouet
    drwxrwx---  3 locutus locutus 4096 Apr 13 00:58 pouet-home
    drwxrwx---  3 locutus locutus 4096 Nov  8 01:53 pouet-home-old
    

    All my backups are accessible to one user, locutus. This user is used to connect to the Borg server via SSH. Every computer and server I back up has credentials to connect to it (one public SSH key per server).

    All repositories are encrypted, and each uses a different key. Only the corresponding server and I know the key.

    Linux server

    On each VM I want to back up, a script goes through all my important folders and backs them up.

    #!/bin/bash
    # Load BORG_REPO and BORG_PASSPHRASE (cron does not read .bashrc on its own)
    source /root/.bashrc
    ARCHIVE_NAME="kube"
    CURRENT_DATE=$(date +"%Y-%m-%d-%Hh%M")
    echo "Creating a backup archive of /riper/kube named '$ARCHIVE_NAME-$CURRENT_DATE'"
    # "::name" targets the repository given by BORG_REPO; prune only runs if the backup succeeded
    borg create \
     --checkpoint-interval 600 \
     --exclude-caches \
     --verbose \
     -p \
     -s "::$ARCHIVE_NAME-$CURRENT_DATE" \
     /riper/kube && \
    borg prune -v --list --keep-within=10d --keep-weekly=4 --keep-monthly=12 --keep-yearly=-1
    

    I scheduled it using cron:

    18 6 * * * /root/borg-backup-volumes >> /var/log/backup-kube 2>&1
    

    In root’s .bashrc, there are environment variables containing all the details and secrets required by Borg. Do not forget to read-protect this file, otherwise the passphrase will not stay secret.

    BORG_OPTS=""
    BORG_SERVER="ssh://locutus@norghost"
    BORG_REPO_PATH="/backup/knode01"
    export BORG_REPO="$BORG_SERVER$BORG_REPO_PATH"
    export BORG_PASSPHRASE="redacted"
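
    Read-protecting the file could look like the sketch below. It works on a temporary file so it can be tried safely; on a real server the target would be /root/.bashrc. It assumes GNU stat for the permission check.

```shell
#!/bin/bash
# Demonstrated on a temporary file; on a real server the target would be /root/.bashrc.
secrets_file=$(mktemp)
echo 'export BORG_PASSPHRASE="redacted"' > "$secrets_file"
chmod 644 "$secrets_file"   # simulate a file that is readable by everyone

# Owner may read and write; group and others get no access at all.
chmod 600 "$secrets_file"

# GNU stat prints the permission bits in octal with -c %a.
perms=$(stat -c %a "$secrets_file")
echo "permissions: $perms"
rm -f "$secrets_file"
```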
    

    The last line of the script cleans up old Borg archives. I’ve decided to keep archives in my backup only for a certain amount of time.

    borg prune -v --list --keep-within=10d --keep-weekly=4 --keep-monthly=12 --keep-yearly=-1
    

    As you can read, there is an issue. After a few years I might run out of space if I do not alter the last parameter. I’m waiting until I have to before setting it; -1 means keep everything.
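
    When the time comes to cap the yearly archives, the change could be previewed first with Borg's --dry-run option. The sketch below only builds and prints such a command; KEEP_YEARLY=5 is an arbitrary value chosen for illustration, not a recommendation.

```shell
#!/bin/bash
# KEEP_YEARLY=5 is an arbitrary illustrative value.
KEEP_YEARLY=5

prune_cmd=(
  borg prune -v --list --dry-run
  --keep-within=10d
  --keep-weekly=4
  --keep-monthly=12
  "--keep-yearly=$KEEP_YEARLY"
)

# Print the command instead of running it, so this is safe to try anywhere.
printf '%s ' "${prune_cmd[@]}"; echo
# To preview against the repository set in BORG_REPO, run: "${prune_cmd[@]}"
```

    With --dry-run and --list, Borg reports which archives it would keep or prune without deleting anything.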

    Windows

    On Windows, I did not set up cron and only trigger backups manually.

    Once Debian was installed under WSL, I did exactly the same thing as on my Linux servers. Windows mount points give access to all drives, so we can back them up like any other folder.

    As an example, I back up my pictures with this script:

    ARCHIVE_NAME="Photos"
    CURRENT_DATE=$(date +"%Y-%m-%d-%Hh%M")
    echo "Creating a backup archive of /mnt/s/photos named '$ARCHIVE_NAME-$CURRENT_DATE'"
    borg create \
     --checkpoint-interval 600 \
     --exclude-caches\
     --verbose \
     -p \
     -s ::$ARCHIVE_NAME-$CURRENT_DATE \
     /mnt/s/photos && \
    borg prune -v --list --keep-within=10d --keep-weekly=4 --keep-monthly=12 --keep-yearly=-1
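
    The /mnt/s/photos path in the script is simply the S: drive as seen from WSL. As a small sketch of that mapping, the hypothetical helper below converts a Windows path into its default WSL mount path (WSL also ships a wslpath tool for this; the function here is only for illustration).

```shell
#!/bin/bash
# Hypothetical helper: turn a Windows path such as 'S:\photos' into its WSL mount path.
win_to_wsl_path() {
  local win_path=$1
  local drive=${win_path%%:*}   # text before the first colon, e.g. "S"
  local rest=${win_path#*:}     # text after the first colon, e.g. "\photos"
  rest=${rest//\\//}            # backslashes become forward slashes
  # By default WSL mounts each drive under /mnt/<lowercase drive letter>.
  printf '/mnt/%s%s\n' "${drive,,}" "$rest"
}

win_to_wsl_path 'S:\photos'
```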
    
    Raspberry Pis

    As I stated at the beginning of this article, 3 copies are needed, with one elsewhere. For now, I only have two: the original data on each machine and its backup on the Borg server.

    Both are physically located in the same place, my apartment. I decided to use Raspberry Pis for the other copies. I configured two of them (Raspberry Pi 3).

    On them, I installed the classic Raspbian and configured OpenVPN to access my backup network. This VPN ensures that once they are connected, they can only access what’s needed, meaning the Borg server. I will probably write something about how I use OpenVPN (I use pfSense).

    Each Raspberry Pi has a USB hard drive attached. Each drive has at least the same capacity as the Borg server.

    Thanks to the VPN, I can use a Raspberry Pi anywhere in the world as long as internet access is available, via Wi-Fi or, better, Ethernet.

    Borg archive in Borg archive

    My first idea was to simply rsync all the Borg repositories to each Raspberry Pi. But to avoid losing my copy if I delete the original source (willingly or not), I decided to avoid a simple copy (with or without rsync).

    Instead, I simply reused Borg. For each Borg repository, I create a corresponding repository on each Raspberry Pi, and I archive my Borg repository inside it. I end up with a Borg repository inside another Borg repository.

    I do it this way because I get the benefit of the history. If I do something wrong on my main server, I still have old copies available on each Raspberry Pi. As objects in a Borg repository are almost immutable, the deduplication process makes those copies really space-efficient.
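
    Getting data back out of such a nested setup takes two steps. The function below is a sketch of what that could look like, not a tested restore procedure: the function name and its arguments are placeholders, and it assumes the mirror was created from /backup/<repository> as described here.

```shell
#!/bin/bash
# Sketch of a two-step restore from a Raspberry Pi mirror. Nothing runs until the function is called.
# The function name and its arguments are placeholders for illustration.
restore_from_mirror() {
  local pi_host=$1 repo_name=$2 mirror_archive=$3 workdir=$4   # workdir: an absolute path

  mkdir -p "$workdir" || return 1

  # Step 1: extract the outer mirror archive into workdir. Borg stores paths without the
  # leading slash, so the inner repository reappears under backup/<repo_name>.
  ( cd "$workdir" && borg extract "locutus@${pi_host}:/backup/${repo_name}::${mirror_archive}" ) || return 1

  # Step 2: the extracted directory is itself a Borg repository. If this machine knew it at
  # its old location, Borg asks to confirm the move; the variable below answers yes.
  # The inner repository is still encrypted, so its passphrase is required.
  BORG_RELOCATED_REPO_ACCESS_IS_OK=yes borg list "$workdir/backup/${repo_name}"
}
```

    A call such as restore_from_mirror IP-RASPBERRY-PI-1 knode01 mirror-2021-04-13-13h57 /tmp/restore would list the archives of the recovered repository, from which individual files can then be extracted as usual.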

    To sum up:

    I’ve set up these cron jobs to do the copies automatically:

    57 13 * * * /home/locutus/borg/mirror-borg.sh IP-RASPBERRY-PI-1 >> /var/log/borg/IP-RASPBERRY-PI-1 2>&1
    57 13 * * * /home/locutus/borg/mirror-borg.sh IP-RASPBERRY-PI-2 >> /var/log/borg/IP-RASPBERRY-PI-2 2>&1
    
    #!/bin/bash
    export BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK="yes"
    if [ $# -eq 0 ]
      then
        echo "No arguments supplied, please specify hostname to mirror to"
        exit 1
    fi
    
    TARGET=$1
    echo "Create repositories if they do not exist"
    ls -l /backup/ | awk '{print $9}'|tail -n +2  | xargs -I {} -n 1 /usr/local/bin/borg init --encryption=none locutus@$TARGET:/backup/{}
    
    LIST=$(ls -l /backup/ | awk '{print $9}'|tail -n +2)
    
    ARCHIVE_NAME="mirror"
    CURRENT_DATE=$(date +"%Y-%m-%d-%Hh%M")
    declare -a arrayList
    while read -r line
    do
        arrayList+=("$line")
    done <<< "$LIST"
    
    for REPO_NAME in "${arrayList[@]}"
    do
      echo "Creating a backup archive of $REPO_NAME named '$ARCHIVE_NAME-$CURRENT_DATE'"
      /usr/local/bin/borg create \
       --checkpoint-interval 600 \
       --exclude-caches\
       --verbose \
       -p -x \
       -s locutus@$TARGET:/backup/$REPO_NAME::$ARCHIVE_NAME-$CURRENT_DATE \
       /backup/$REPO_NAME
    done
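
    The script above builds its repository list by parsing the output of ls, which is fragile when a name contains spaces. A possible alternative, shown here as a sketch on a throwaway directory rather than the real /backup, is to let the shell glob the directories. The function name is made up for this example.

```shell
#!/bin/bash
# Print the name of every directory directly under the given backup root, one per line.
list_repos() {
  local root=$1 dir
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob stays literal when nothing matches
    basename "$dir"
  done
}

# Demonstration on a throwaway directory instead of the real /backup.
demo_root=$(mktemp -d)
mkdir "$demo_root/knode01" "$demo_root/pouet home"
touch "$demo_root/notes.txt"    # plain files are ignored

mapfile -t repos < <(list_repos "$demo_root")
printf 'found repository: %s\n' "${repos[@]}"
rm -rf "$demo_root"
```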
    

    I end up with two Raspberry Pis containing a copy of my central Borg server, at most one day behind.

    I keep one at my place (this is my second storage type), and I keep the other one at a family member’s place (where an optical fiber connection is available). This way I have my “elsewhere” copy.

    A Raspberry Pi has really low power consumption, and you can lower it further by hacking it to power on only once a day, for just the time required. It’s also cheap; the biggest cost is the USB drives.

    Monitoring

    It’s really cool to have automatic backups and stuff, except when you need them and discover that the automation has not worked for several days. It is best to be warned when something is a bit off.

    When a cron job ends with an error, an alert email is enough.
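
    One way to get such an alert is to wrap the job in a small function that notifies only on failure. This is a minimal sketch: send_alert is a placeholder that just prints to stderr here, and on a machine with a configured mail system it could pipe the message to a mail command instead.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command and call a notifier only when it fails.
# send_alert is a placeholder; here it only prints to stderr.
send_alert() {
  echo "ALERT: $1" >&2
}

run_and_alert() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    send_alert "'$*' exited with status $status"
  fi
  return "$status"
}

run_and_alert true                                     # succeeds silently
run_and_alert false || echo "the failure was reported"
```

    In a crontab, the backup script would then be called through run_and_alert instead of directly.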

    I monitor backup disk space using Prometheus. In fact, I have an ongoing alert I must resolve!
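
    How the disk metrics reach Prometheus is not detailed here. One possible approach, sketched below, is to print disk usage in the Prometheus text format and let node_exporter's textfile collector pick it up. The metric names are made up for this example, and "/" stands in for the backup mount point so the sketch runs anywhere.

```shell
#!/bin/bash
# Emit disk usage of a mount point in the Prometheus text exposition format.
# The metric names are invented for this sketch. MOUNT_POINT would be /backup on the
# Borg server; "/" is used here so the example runs anywhere.
MOUNT_POINT="/"

# df -P -k prints one header line, then: filesystem, 1K-blocks, used, available, ...
read -r _ size_kb used_kb _ < <(df -P -k "$MOUNT_POINT" | tail -n 1)

metrics=$(printf 'backup_disk_size_bytes{mountpoint="%s"} %d\nbackup_disk_used_bytes{mountpoint="%s"} %d\n' \
  "$MOUNT_POINT" "$((size_kb * 1024))" "$MOUNT_POINT" "$((used_kb * 1024))")
echo "$metrics"
```

    Redirecting this output to a .prom file in the directory watched by the textfile collector makes the values available for graphs and alerts. node_exporter also exposes filesystem metrics on its own, so a custom script like this is optional.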

    Disk space alert

    My backup disk usage does not change much; here are 14 days of data for the Borg server and the two Raspberry Pis. (The Raspberry Pi drives are 1.7TB and 2.6TB, against only 1TB for the Borg server.)

    Disk space

    Recently, I moved the Raspberry Pi synchronisation cron job to GoCD for testing purposes. It works well: I get a nice web interface to trigger synchronisation and quickly see whether jobs are working. GoCD is in my toolbox anyway, so using it has no real cost.

    GoCD dashboard

    Summary

    I have been using this setup for less than a year, and lots of details are not perfect, however:

    Thank you for reading this,
    Bisoux 😗

    Although commenting is not an option on this blog, I am open to discussing this topic further on various social media platforms. You can follow those links to find me

    Twitter logo @ztec@mamot.fr Twitter logo @ztec6 Bluesky logo @ztec.fr