I started retaining digital data sometime in the mid-90s. My first archive was a custom-sized cardboard box with a short stack of 5.25" DS-DD floppy disks. I don’t remember how long this archive lasted, but it couldn’t have been more than a few years. I got my own computer in 2000, and started filing in earnest shortly thereafter.

The Codex Distributus

My brothers and I used CD-Rs to store “stuff”. I think this was probably 80% mp3s, 5% documents and project files, and 15% miscellaneous. This method worked, in that we could transfer files off our primary systems onto media from which they could probably be retrieved.

As our collections surpassed a few discs per category, file management got cumbersome. We had CD spindles, CD wallets, Hardshell CD cases. We had stacks of CDs, color-coded-cases, and (briefly) file folders. We tried Zip disks, and eventually landed on External Hard Drives.

The Codex Conglomeratus

External Hard Drives were amazing. One unit could hold everything, which was just fantastic. All the mp3s, all the ebooks and pdfs, all the miscellaneous. They were a bit large, and required an external power supply, but they worked.

Despite our ignorant wranglings, we had no network of any kind at this point. No wifi, no ethernet/thicknet/thinnet, no PPP/SLIP, and no file shares. If one computer needed files from a different computer, we’d use the hard drive. We’d create a new folder named “Transfer” or “Temporary” or “stuff” or something, copy the files off, and forget to clean up the drive.

When the drive eventually filled up, we got a larger drive to replace it. We copied most of the data off, then stored the old drive “just in case”. After a few cycles of this, we successfully recreated the file management issues of the Optical Storage era.

The Codex Archivus

I moved a few times, and carried the growing stack of hard drives with me. I wasn’t sure what all was on them, which files were duplicated, or which drives were flaky. By this time I had encountered bit rot, and had come across corrupted picture files, corrupted mp3s, and at least one drive that just wouldn’t spin up. The External Drive storage approach had broken down.

The aggregated volume of the drive collection was probably between 500GB and 1TB, spread over six or eight drives of various vintages. The solution, as I saw it, was to transfer the data into a storage volume so the data integrity would no longer depend on any individual bit of hardware. My requirements were:

  • Full Redundancy
  • Checksums and error correction
  • No proprietary anything

In 2014, when I put this together, that meant either Linux with BTRFS or FreeBSD/Illumos with ZFS. Linux-capable hardware was less expensive, so I cobbled together a cheap file server with RAID-1 BTRFS. It worked, but the setup was complicated enough for its own blog post.

Codex Receptus?

It took a few days to decide on a filesystem layout, and I ended up with this structure:

  • import
    • 40TB-drive
    • 6TB-drive
    • tinydrive
    • etc
  • media
    • Applications
    • Audio
      • Albums
      • Audiobooks
      • Archive
    • Books
    • ISOs
    • Pictures
      • YYYY
        • MM
      • Unsorted
        • Folder_1
        • Folder_2
        • etc
    • Video
      • Movies
      • Kids-Movies
      • TV
        • Show_One
          • Season_01
          • Season_02
        • Show_Two
      • Misc
  • home
    • jhjessup
      • documents
      • backups
    • user2
    • user3
    • etc

I copied the contents of each drive into a dedicated ‘import’ folder for subsequent sorting and deduplication. Because the filesystem had snapshot capability, I could always roll back a mistake, so I was able to use “destructive” deduplication to prune matching files and folders before moving them into the organized structure.
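
I didn’t record which tool I used for the pruning, so take the following as a minimal sketch of the idea rather than the actual method: group files by size, then by SHA-256 digest, and report the sets that match. The function names (file_digest, find_duplicates) and the example path are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group regular files under root whose contents are byte-identical."""
    # Cheap first pass: files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size means unique content
        # Expensive second pass: hash only the files that share a size.
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


if __name__ == "__main__":
    # Hypothetical mount point; adjust to wherever the import folder lives.
    for group in find_duplicates(Path("/tank/import")):
        print("keep:", group[0])
        for extra in group[1:]:
            print("  duplicate:", extra)
```

The sketch only reports duplicates and never deletes anything. The “destructive” step is safest when done by hand, after taking a snapshot.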

I think the bulk file import from the drives took about a week, and the sorting and deduplication took another few months. Some of the files had degraded and were unrecoverable, but almost everything made it into the archive.

Archivus Two

After a few years I upgraded to FreeNAS on an HP Microserver. BTRFS was fine, but didn’t have a solid track record for the RAID-6 configuration I wanted. FreeNAS gave me an easy appliance-type setup that didn’t need very much maintenance after the initial configuration, and I didn’t have to think about software upgrades or obsolescence.

I migrated from a two-disk BTRFS RAID-1 volume to a four-disk RAID-Z2 volume. The first round of disks was 500GB, 1TB, 1TB, and 2TB, for a total storage capacity of 1TB (ZFS math: RAID-Z2 treats every disk as if it were the size of the smallest one, and spends two disks’ worth of space on parity). I’ve replaced three of the disks since then, and ZFS has automatically expanded the capacity each time the smallest disk gets larger. It’s pretty slick.
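
The “ZFS math” is easy to check. Here is a rough sketch, assuming a single RAID-Z vdev and ignoring metadata, padding, and reserved-space overhead, so real pools report somewhat less. The function name raidz_usable_capacity is my own, not part of any ZFS tooling.

```python
def raidz_usable_capacity(disk_sizes_gb: list[int], parity: int = 2) -> int:
    """Approximate usable capacity of one RAID-Z vdev, in GB.

    Every disk counts as the size of the smallest member, and `parity`
    disks' worth of space goes to redundancy (2 for RAID-Z2).
    """
    if len(disk_sizes_gb) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_gb) - parity) * min(disk_sizes_gb)


# The original four disks: (4 - 2) * 500 GB
print(raidz_usable_capacity([500, 1000, 1000, 2000]))   # 1000

# After swapping the 500GB disk for a 1TB one: (4 - 2) * 1000 GB
print(raidz_usable_capacity([1000, 1000, 1000, 2000]))  # 2000
```

The second call shows why capacity only grows when the smallest disk is replaced: the other disks’ extra space sits unused until then.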

Since I’ve got dual redundancy, I’m comfortable using second-hand hard drives. This lets me use enterprise-grade drives (WD Red, Gold, and RE series) for a bit less per GB than new consumer-grade drives. I’ve had one drive failure, but no downtime or data loss.

I expect this organization method and technology platform to last for a really long time. ZFS is well supported on Linux, FreeBSD, and Illumos (and their derivatives), and runs on NetBSD and macOS, with a Windows port in the works. I have switched platforms from FreeNAS to SmartOS without impacting the dataset, and expect to be able to migrate again in the future if needed.

The software ecosystem around ZFS is sufficiently diverse that I don’t expect it to ever disappear, and the hardware requirements for a supporting O/S platform are easily met by cheap x86 hardware.

It’s not going to be the ideal setup for everybody, but it’s perfect for me.