I use restic as my backup solution, and I hadn’t had any problems with it so far. Recently, though, one of my health checks suddenly stopped reporting in, telling me that something was wrong with my backup. On the other hand, restic had done its job and everything seemed to work fine, so what was the problem?
I run the whole backup using the following command:
    /usr/local/bin/restic \
        --verbose \
        --files-from "$MACHINE_NAME/restic_backup/include.txt" \
        backup \
        && curl --retry 3 $RESTIC_BACKUP_HEALTCHECK_ENDPOINT
I realized that the problem started after the last brew upgrade, and I noticed that brew had also updated the restic binary to version 0.10.0. I suspected that something had changed in restic's exit codes. I went through the changelog, and I saw this change:
Chg #2546: Return exit code 3 when failing to backup all source data
Yes, I back up a lot of directories, and some of them are not accessible without the proper permissions. For example, if you back up an entire APFS volume, you will probably get errors like these:
    error: Open: open /Volumes/Photos/.Spotlight-V100: operation not permitted
    error: NodeFromFileInfo: Listxattr: xattr.list /Volumes/Photos/.TemporaryItems : permission denied
    error: NodeFromFileInfo: Listxattr: xattr.list /Volumes/Photos/.Trashes : permission denied
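Before 0.10, these errors didn’t affect the command's exit code. Now restic distinguishes between a complete and an incomplete snapshot and returns a different exit code for each. Because my command chains curl with &&, any non-zero exit code from restic means the health check is never pinged.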
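If you would rather tolerate incomplete snapshots than treat them as failures, a wrapper can inspect the exit code instead of relying on &&. This is only a sketch, not the approach I ended up using: the function names `classify_restic_exit`, `ping_healthcheck`, and `backup_and_report` are made up for illustration, and it assumes exit code 3 means "snapshot created, some source data unreadable" as described in the changelog entry above.

```shell
# Map a restic exit code to a short label.
# 0 = complete snapshot, 3 = incomplete snapshot (restic 0.10.0 changelog),
# anything else is treated as a failure.
classify_restic_exit() {
    case "$1" in
        0) echo "complete" ;;
        3) echo "incomplete" ;;
        *) echo "failed" ;;
    esac
}

# Hypothetical helper: ping the same endpoint as the original command.
ping_healthcheck() {
    curl --retry 3 "$RESTIC_BACKUP_HEALTCHECK_ENDPOINT"
}

# Run the command passed as arguments and ping the health check
# unless the backup failed outright.
backup_and_report() {
    status=0
    "$@" || status=$?
    result=$(classify_restic_exit "$status")
    echo "backup $result (exit code $status)" >&2
    if [ "$result" = "failed" ]; then
        return "$status"
    fi
    ping_healthcheck
}

# Example call (not executed here):
# backup_and_report /usr/local/bin/restic --verbose \
#     --files-from "$MACHINE_NAME/restic_backup/include.txt" backup
```

The trade-off is that an incomplete snapshot no longer raises an alarm, so this only makes sense if you already know which paths are unreadable and don't care about them.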
How to fix it?
Exclude problematic directories and files.
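As a rough sketch of what that can look like: restic's backup command accepts an `--exclude-file` option that reads one pattern per line. The file location `exclude.txt` below is an assumption for illustration, and the three paths are simply the ones from the error output above; your list will differ.

```shell
# Hypothetical location for the exclude list; adjust to your own layout.
EXCLUDE_FILE="${EXCLUDE_FILE:-exclude.txt}"

# One pattern per line. These are the paths from the errors above.
cat > "$EXCLUDE_FILE" <<'EOF'
/Volumes/Photos/.Spotlight-V100
/Volumes/Photos/.TemporaryItems
/Volumes/Photos/.Trashes
EOF

# Then pass the list to restic (not executed here):
# /usr/local/bin/restic \
#     --verbose \
#     --files-from "$MACHINE_NAME/restic_backup/include.txt" \
#     --exclude-file "$EXCLUDE_FILE" \
#     backup \
#     && curl --retry 3 $RESTIC_BACKUP_HEALTCHECK_ENDPOINT
```

With the unreadable paths excluded, restic has nothing left to fail on, so the snapshot is complete and the exit code goes back to 0.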
As a User: Read the changelog before updating software you depend on.
- The first step is marking the current functionality as deprecated and warning users whenever they rely on it. Additionally, developers may provide a feature flag, enabled by, e.g., an environment variable, to opt in to the new behavior.
- The second step is replacing the old functionality with the new one while leaving a feature flag to fall back to the old behavior.
- The third step is removing the old functionality entirely.
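To make the first step concrete, here is a minimal sketch of how a tool could gate a new exit code behind a feature flag. Everything in it is hypothetical: `incomplete_exit_code` and the `MYTOOL_STRICT_EXIT` variable are invented for illustration and are not part of restic.

```shell
# Hypothetical tool logic: which exit code to use for an incomplete backup.
# MYTOOL_STRICT_EXIT is an invented feature flag, not a real restic variable.
incomplete_exit_code() {
    if [ "${MYTOOL_STRICT_EXIT:-0}" = "1" ]; then
        # New behavior, opt-in during the first step.
        echo 3
    else
        # Old behavior stays the default, but the user is warned.
        echo "warning: exit code 0 for incomplete backups is deprecated;" \
             "set MYTOOL_STRICT_EXIT=1 to try the new behavior" >&2
        echo 0
    fi
}
```

In the second step the default would flip, so the new exit code is returned unless the user sets a flag to keep the old behavior, and in the third step the conditional disappears altogether.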
In the case of restic, the change was small and recorded correctly in the changelog. But it revealed the main drawback of self-hosted solutions – if anything stops working, you have to change your plans and check what’s wrong, especially if your workflow depends on the process or the tool.
I still use restic because it fulfills all my needs and requires little maintenance. Seriously, this was the first time I had to investigate what happened since I created my backup script. You can read how to automate your backup based on restic in my other article.
Restic has not yet reached version 1.0, so according to SemVer 2.0, anything may change at any time. The deprecation process, in this case, depends on the goodwill of the developers. ↩︎
I wrote about the need for a deprecation process, but in a slightly different context. If this topic sounds interesting to you, you can [read it here][deprecatiton]. ↩︎