So you’ve got a bunch of data sitting around on your computer, and you’d like to archive it.
First off, we need to have a little talk about the difference between archive and backup, and what a copy is.
A copy is an instance of your data. So, if you’ve just purchased and downloaded Prometheus from iTunes, you have one copy of it. (If you haven’t purchased and downloaded Prometheus from iTunes, why not?)
The difference between backup and archive is what happens with that (and any other) copy of the data.
With a backup, you’re making another copy – if it’s the first backup, you’re making your second copy; if you do another backup to another destination, you’re making your third copy, and so on.
On the other hand, archive is about moving your copy.*
So, as a starting point, make sure when you’re talking about archiving your data at home that you’re really talking about archive, not backup. (If you’re after tips about backup, check out this and this.)
A question people often skip when considering archive for smaller sets of data is: do you really need to? If you look at the cost of personal storage, for instance, it’s trivially easy and cheap to have up to, say, 10TB of storage attached to a single machine. So long as you’ve got an actual backup strategy, there may be no need to archive at all. Unless you’re planning to archive 2TB or more, I’d honestly suggest you’re wasting your time considering archival options for home, and should instead focus on having a good data backup strategy.
So if you’re archiving data, there are a few basic considerations and decisions you need to make:
- How long do I need to keep the data for?
- How important is the data?
- How do I avoid forgetting about the data?
- How do I find the archived data? (Similar to the above)
- How do I manage the archived data?
I’ll go through each of these individually.
How long do I need to keep the data for?
Know this in advance. Are you archiving tax/financial material? (If so, why? It’s trivially small. Keep it online.) Regardless of the content, you should know how long you intend to keep it for. E.g., are you archiving all your photos from more than 3 years ago? How long should you keep them? Another 3 years? Another 5 years?
Knowing how long you plan to keep the data helps you decide how to manage it. If you only plan to keep it for a couple of years before deleting it, you don’t have to worry too much about data formats changing to the point that you can’t access the data. However, if you really need to keep the data for 5+ years, the chances are you’ll need a much more detailed data management plan. Consider:
- The application the data was created in – will that still be around in X years?
- The application the data was created in – has the vendor been known to change or drop support for legacy formats?
- Can the data be written in a more long-term format? (E.g., consider word processing documents. Save in Rich Text Format or even plain text format, if all you want is the content, rather than formatting. Or if you need both, save/print the document as a PDF instead.)
How important is the data?
This is a serious question. If the data is largely unimportant, why are you archiving it at all? Why not just delete it? (I say this as a data hoarder. Even I recognise the importance of getting rid of old data from time to time. Much as it pains me to say it.)
If the data is important, then you should obviously be making plans around archive failure. The bare minimum to consider here is keeping two copies of your archive data, on two different media types. E.g., write some archives to Blu-ray or DVD, and others to external hard drives. Or, if the data is too big, write it to two entirely different external hard drives – not two from the same manufacturer. (In short: reduce the risk of a “batch” of hard drives being bad.)
For what it’s worth, by the way, I think optical media is generally a terrible choice for archive data. It’s small, fiddly and prone to degradation. My personal preference for home data archiving is external hard drives.
Important data also necessitates testing. After you copy the data, but before you delete the original, make sure you test that it is accessible from the archive media.
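One way to do that test is to compare checksums of the original files against the archived copies before deleting anything. The following is only a minimal sketch, assuming the archive is a plain file-for-file copy of a folder; the function names (sha256_of, verify_copy) are hypothetical, made up for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original_root: Path, archive_root: Path) -> list[str]:
    """List files that are missing from the archive or differ from the original.

    Assumes (for this sketch) that the archive mirrors the original folder
    layout exactly. An empty list means every original file was found on the
    archive with identical content.
    """
    problems = []
    for source in sorted(original_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_root)
        target = archive_root / relative
        if not target.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source) != sha256_of(target):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

If verify_copy returns an empty list, every original file was read back from the archive media with matching content, which is a reasonable point at which to consider deleting the original.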
How do I avoid forgetting about the data?
This may seem like a silly question, but it’s not. Imagine for a moment you’re in IT, and you download a lot of CD/DVD images of various operating systems and applications you deal with. If you archive all those and store them on media that’s not online, but then later need one of those images, will you download it again (and possibly archive it again), or will you remember it’s on archive and retrieve it?
Usually this question points at two habits to develop:
- Remembering to check the archives before downloading or recreating content that may be in the archives;
- Having a way of recording what is stored in the archives.
Which leads us to…
How do I find the archived data?
To find content on the archived data, do you need to reattach the archive to your computer? Sometimes this may be the case, particularly if you need to search by file content, but there are simpler solutions – e.g., archiving highly organised and well named data, and then generating a plain text list of all the files (and the folders they’re in) may allow you to search your archives between 50 and 80% of the time without having to attach them to the computer at all.
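Generating that plain text list is easy to script. Here’s a rough sketch, assuming the archive is mounted as an ordinary folder when you build the list, and that you keep the list on your main machine rather than on the archive itself; write_catalogue and search_catalogue are hypothetical names for this example.

```python
from pathlib import Path


def write_catalogue(archive_root: Path, catalogue_file: Path) -> int:
    """Write one relative file path per line; return how many files were listed."""
    entries = sorted(
        path.relative_to(archive_root).as_posix()
        for path in archive_root.rglob("*")
        if path.is_file()
    )
    catalogue_file.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)


def search_catalogue(catalogue_file: Path, term: str) -> list[str]:
    """Return the catalogue lines that contain the term, ignoring case."""
    needle = term.lower()
    lines = catalogue_file.read_text(encoding="utf-8").splitlines()
    return [line for line in lines if needle in line.lower()]
```

Because the list is just text, you can search it with any tool you like while the archive drive stays in the cupboard. This only helps with searching by name and folder, of course; searching by file content still means attaching the archive.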
How do I manage the archived data?
If you don’t want to spend time managing your archived data, there’s a really simple way to avoid this: don’t archive it.
If, however, you plan on archiving data, you must have a management plan. This plan needs to answer the following basic questions:
- How often will I power up the archive to make sure it’s OK?
- How often will I attempt to access the data on the archive to make sure it’s OK? (Not always the same activity as the above.)
- How often will I do a full media check on the archive to make sure it’s OK? (E.g., copying it from one piece of storage to another.)
- How will I know when it’s time to delete data I don’t need to keep archived any longer?
- Where will I store my archive(s)?
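A management plan doesn’t need to be elaborate; even a small record per archive that tells you what’s due would do. The sketch below is one possible shape for that, not a recommendation: the ArchiveRecord fields and the 180-day and 365-day intervals are assumptions picked purely for illustration, so substitute whatever answers you give to the questions above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed intervals, purely illustrative; choose your own.
POWER_UP_EVERY = timedelta(days=180)
FULL_CHECK_EVERY = timedelta(days=365)


@dataclass
class ArchiveRecord:
    """Hypothetical record of one archive and when it was last looked at."""
    name: str
    location: str          # where the archive is physically stored
    last_power_up: date    # last time it was powered up and files were opened
    last_full_check: date  # last full media check (e.g. a complete copy-off)
    delete_after: date     # date after which the data no longer needs keeping


def tasks_due(record: ArchiveRecord, today: date) -> list[str]:
    """Return the management tasks that are due for this archive as of today."""
    tasks = []
    if today - record.last_power_up >= POWER_UP_EVERY:
        tasks.append("power up")
    if today - record.last_full_check >= FULL_CHECK_EVERY:
        tasks.append("full media check")
    if today >= record.delete_after:
        tasks.append("review for deletion")
    return tasks
```

Run against each record from time to time (say, from a calendar reminder), this covers the power-up, media check and deletion questions, and the location field answers the last one.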
If all of the above looks like a lot of effort, remember my original suggestion: unless you’ve got a large amount of data at home, you’re probably better off spending time working out a good backup strategy instead of archiving the data.
* In enterprise land, moving is a bit simplistic. Often archive involves moving the content, but leaving a ‘stub’ behind, so that the file appears to still be present on your main storage, and if you go to open it, the file is automatically retrieved from the archival storage for you. This is technically “hierarchical storage management”, but it’s not the typical thing a home user will do, so in this context, I’m talking about moving the copy, thereby removing the original.