Ultra Long-term Archiving and the Cloud – It’s a Good Thing?
Some companies face the prospect of archiving electronic documents for very long periods of time, up to 100 years, for regulatory or business reasons. For example, construction companies involved in large projects (bridges, dams, skyscrapers, airports, etc.) must keep all documents related to a project for the construction period plus 30, 50, or even 100 years, depending on state and local government regulations. The records must also remain quickly searchable and readable over those same periods.
Insurance companies are faced with very long retention periods as well. The insurance industry will keep policies and associated documentation available and searchable for as long as the policyholder is a client – which could be decades for home policies. Healthcare providers in the U.K. must keep patient records until ten years after the patient's death or until after the patient has permanently left the country.
These three examples are just a small sampling of industries/companies that have these retention requirements. And with today’s data analytics capabilities, many companies are keeping all of their data for much longer periods of time.
Remember banker’s boxes?
The standard practice for most corporate archiving has been to print the records and keep them in physical records storage, store them electronically in on-premises archives, or back up the electronic records onto tape. The downsides of printing and storing records include paying for square footage, the risk of physical damage (fire, flood, etc.), and the difficulty of quickly finding specific records across hundreds or thousands of boxes. Retrieval adds further cost, since the storage company usually charges a fee to pull records manually. And because physical storage is a remote, manual process, expired records may not be disposed of on schedule, or may never be destroyed at all. In essence, the company ends up paying an unending storage charge.
On the other hand, using backup tape for long-term archiving does have a couple of advantages. They include very low cost, a small footprint for the amount of data stored, and the capability to be physically removed from the electronic world. This is known in IT circles as isolated storage, the concept that once stored, backup tapes are physically isolated from the rest of the electronic world and therefore not subject to hacking, viruses, and ransomware. This strategy is referred to as having an “air gap” between the data and the internet.
The downsides of tape include slow response times (finding and retrieving specific files can take minutes, hours, or days), the need to physically store the tapes, and the risk of tape degradation over long periods of time.
Over very long periods of time, obsolescence is the enemy of archiving
A major hurdle for companies required to keep electronic files for very long periods is how to store and manage those files over 20, 30, or 50 years and still be able to read them. The sheer volume of electronically stored information (ESI) flowing into and out of companies today means that converting it to paper records for storage is no longer possible.
As companies transitioned completely to electronic files, they began purchasing on-premises, server-based archiving systems to archive and manage their content. They quickly realized that these systems were expensive and required constant upgrades, and that the vendors offering them could go out of business or be purchased by another company with a different product strategy. There was also the risk that the archive formats would change over very long periods of time (eventually losing backward compatibility) to the point where very old archived files become unreadable.
An alternative that companies have moved to is cloud-based, third-party, proprietary archiving solutions. With these, you get the scalability of a cloud platform, automatic updates, higher levels of security, and potentially lower cost due to economies of scale.
Downsides of proprietary cloud archives include file conversion, data throttling, and cost. Also, many of these proprietary archives are housed in yet another third party's data center, meaning the customer could have additional difficulty retrieving their data if problems with the vendor arise.
Many proprietary cloud archiving vendors convert your data as they move it into their cloud. They do this for storage efficiency but also to make it more difficult for you to move your archive somewhere else. Aside from any term contract entered into, some cloud vendors charge huge additional fees for you to move your data to another vendor, sometimes as much as $12/GB. Others don’t charge “conversion fees” but instead throttle the data migration to an unrealistically slow speed, drawing the migration out over months or in some cases years, in the hope that the company gives up and stays.
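To get a feel for what these exit terms can add up to, here is a rough back-of-the-envelope sketch in Python. Only the $12/GB figure comes from the text above; the 50 TB archive size, the 2 MB/s throttled rate, and the function names are hypothetical values chosen purely for illustration.

```python
# Rough estimate of what leaving a proprietary archive could cost.
# Only the $12/GB fee comes from the article; everything else is assumed.

GB_PER_TB = 1000          # decimal units (assumption)
MB_PER_GB = 1000
SECONDS_PER_DAY = 86_400


def exit_fee_usd(archive_tb: float, fee_per_gb: float) -> float:
    """Total fee to move an archive out at a flat per-GB charge."""
    return archive_tb * GB_PER_TB * fee_per_gb


def migration_days(archive_tb: float, throttle_mb_per_s: float) -> float:
    """Days needed to move an archive at a sustained, throttled rate."""
    total_mb = archive_tb * GB_PER_TB * MB_PER_GB
    return total_mb / throttle_mb_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    archive_tb = 50  # hypothetical archive size
    print(f"Exit fee at $12/GB: ${exit_fee_usd(archive_tb, 12.0):,.0f}")
    print(f"Days to migrate at 2 MB/s: {migration_days(archive_tb, 2.0):.0f}")
```

Under these assumptions, a 50 TB archive would face a $600,000 exit fee, or roughly 289 days of continuous transfer if throttled to 2 MB/s.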
Is the cloud a viable solution for ultra-long-term archiving?
As we have seen, converting electronic records to paper and archiving them is no longer feasible. On-premises archives can be extremely expensive and complex, and are still not suited to very long retention periods, and third-party cloud archives can be risky.
Public cloud platforms from major technology companies such as Microsoft, Amazon, Google, and IBM lend more credibility to long-term archiving in the cloud. These well-known companies offer several cloud models to choose from: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), allowing customers to design a cloud solution for their particular needs. Also, because these cloud platforms are public and benefit from economies of scale, they are much less costly, at $0.005 to $0.05 per GB per month. An additional benefit of these well-known technology companies is that the risk of their disappearing or being bought by another company is very low.
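As a rough illustration of what that price range means over an ultra-long retention period, the Python sketch below multiplies it out. The 10 TB archive size and the 100-year horizon are hypothetical, and the calculation assumes a flat price with no growth in data volume and no retrieval, transaction, or egress charges, which real cloud bills would include.

```python
# Simplified storage-only cost model. The per-GB prices are the range quoted
# in the article; archive size and flat pricing over time are assumptions.

MONTHS_PER_YEAR = 12


def archive_storage_cost_usd(archive_gb: float,
                             price_per_gb_month: float,
                             years: int) -> float:
    """Total storage charge for a fixed-size archive at a flat monthly price."""
    return archive_gb * price_per_gb_month * MONTHS_PER_YEAR * years


if __name__ == "__main__":
    archive_gb = 10_000  # hypothetical 10 TB archive (decimal units)
    for price in (0.005, 0.05):
        total = archive_storage_cost_usd(archive_gb, price, years=100)
        print(f"${price}/GB/month over 100 years: ${total:,.0f}")
```

With these assumed inputs, the total comes to roughly $60,000 at the low end of the range and $600,000 at the high end, so the storage tier chosen matters a great deal over a century.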
A public cloud such as Microsoft’s Azure is a suitable platform for very long-term archiving for the following reasons:
- Azure is not just a cloud storage system; it’s a platform that vendors can design archiving applications to run on.
- It's not a proprietary system.
- It's infinitely scalable.
- It's extremely low cost.
- Its features and capabilities grow over time, taking advantage of the newest technology such as machine learning/AI.
Archive360’s Archive2Azure information management and archiving solution removes the issues of vendor lock-in and migration throttling while providing state-of-the-art information management and archiving. It manages your data in your company’s Azure subscription, so you own the data and the cloud it's housed in. Archive2Azure also stores and manages your data in its native format, meaning that if you want to move away from Archive2Azure, you can, because the data is in your Azure cloud in its native format.
Is there a file format for long-term archiving?
The final issue is that of format obsolescence. How can you guarantee that a file will be readable in 100 years? In fact, a long-term file format has been created and adopted by companies worldwide: PDF/A, a standard published by the International Organization for Standardization (ISO 19005-1) for archiving electronic documents for extended periods of time. PDF/A is an ISO-standardized version of the Portable Document Format (PDF) designed for the archiving and long-term preservation of electronic documents. It differs from PDF by prohibiting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding) and encryption.
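PDF/A files declare their conformance level in embedded XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties. As a minimal sketch, the Python function below (the name `claimed_pdfa_level` is hypothetical) scans a file's raw bytes for that declaration. It is only a heuristic: it reports what a file claims, assumes the metadata is stored as uncompressed text, and does not validate the file against the standard, which requires a dedicated PDF/A validator.

```python
import re
from typing import Optional

# Match the XMP properties in either element form (<pdfaid:part>1</pdfaid:part>)
# or attribute form (pdfaid:part="1").
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONFORMANCE = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims (e.g. 'PDF/A-1B'), or None.

    Heuristic only: this reads the declaration and does not check that
    the file actually satisfies the standard.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    level = "PDF/A-" + part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(pdf_bytes)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").upper()
    return level
```

In use, you would pass in the contents of a file opened in binary mode, for example `claimed_pdfa_level(open(path, "rb").read())`, and treat a result of `None` as a prompt to convert or inspect the file before it enters the archive.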
For companies needing to archive content for extended periods of time due to regulatory, legal, or business reasons, the combination of a standardized archival file format and the Azure public cloud platform, with Archive2Azure as the information management application, can provide the ideal solution.
About Bill Tolson
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.