Best long-term data storage solution? A cloud archive.
- Bill Tolson
- August 26, 2020
Industries such as pharmaceuticals, life sciences, and healthcare must preserve documents for very long periods, up to 95 years, for regulatory compliance. For example, drug manufacturers must comply with several international quality guidelines that carry records-retention requirements, such as the GxP guidelines: Good Manufacturing Practice, Good Clinical Practice, and Good Laboratory Practice.
Drug manufacturers regulated by the Food and Drug Administration (FDA), including under Title 21 CFR Part 11, must keep records easily accessible for agency inspection, guarantee their integrity, and keep them easily viewable for extended periods.
Additionally, electronic healthcare records for drug test subjects must be kept for the life of the test subject plus “n” years (where “n” depends on the situation). These retention requirements mean that some FDA records must be retained, accessible, and viewable for many decades.
Healthcare insurance companies also face very long records-retention periods. Standard insurance industry practice is to keep policies and associated documentation available and searchable for as long as the policyholder remains a client, which can also mean decades. Another example of long-term records retention comes from the UK, where patient records must be kept for ten years after the patient's death.
There are many other examples of long-term retention requirements in other sectors, including the U.S. federal government (through the National Archives and Records Administration, NARA), mortgage providers, and other insurers.
With today's data analytics capabilities, many organizations retain much of their data for ongoing analysis longer than generally accepted industry norms, except where doing so could run afoul of privacy regulations.
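Event-based rules like these ("the life of the test subject plus n years", "ten years after the patient's death") share one feature: the retention clock does not start until a trigger event occurs. The following is a minimal, hypothetical Python sketch of that date arithmetic; the function names, the sample dates, and the value chosen for n are illustrative assumptions, and real retention schedules involve more conditions than this.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    """Add whole calendar years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Only reachable when start is Feb 29 and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


def retain_until(trigger_event: Optional[date], years_after: int) -> Optional[date]:
    """Earliest disposition date for an event-based retention rule (hypothetical helper).

    trigger_event is the date the retention clock starts, for example a
    test subject's or patient's death. None means the event has not
    happened yet, so the record must be kept indefinitely for now.
    """
    if trigger_event is None:
        return None
    return add_years(trigger_event, years_after)


# Hypothetical examples: the dates and the value of n are made up.
uk_patient = retain_until(date(2020, 3, 15), 10)  # ten years after death
living_subject = retain_until(None, 15)           # subject still alive, n = 15
print(uk_patient, living_subject)                 # prints: 2030-03-15 None
```

Returning None rather than a far-future placeholder date keeps "not yet eligible for disposition" distinct from any real deadline.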
Remember banker's boxes?
In the past, standard corporate archiving practice was to print records and send them to physical records storage, store them electronically in on-premises archives, or back up the electronic files to tape.
The downsides of printing and storing records include the monthly payment for storage space, the risk of loss in transit and of physical damage (fire, flood, etc.), and the difficulty of quickly finding specific records across hundreds or thousands of boxes.
Long-term hard-copy storage also carries the cost of recalling records for inspection and review: the storage company usually charges to retrieve and deliver document boxes manually. And because physical storage is a remote, manual process, expired records may never be disposed of or destroyed at all, so the company ends up paying storage charges indefinitely. Most organizations now recognize the obvious usability problem with hard-copy records, namely that the data cannot be used for machine learning training or included in data analytics projects, and many are digitizing hard-copy records for inclusion in automated information management solutions.
Using backup tape for long-term archiving, on the other hand, does have several advantages: very low cost, a relatively small footprint for the amount of data stored (compared with hard-copy storage), and the ability to be physically disconnected from networked systems. In IT circles this separation is known as isolated storage or “air gapping”: once written, backup tapes are physically isolated from the rest of the electronic world and are therefore protected from hacking, viruses, and ransomware.
The downsides of tape include slow response times (finding and retrieving specific files for inspection can take minutes, hours, or days), the inability to run analytics on tape-based data, the need to physically store the tapes and possibly pay for that floor space, and the risk of data degradation over long periods: data on magnetic tape must be periodically regenerated.
Obsolescence is the enemy of electronic data archive storage over decades
For companies required to store electronic files for very long periods, a significant hurdle is how to store, manage, and eventually find archived data over 20, 50, or 100 years and still be able to view it. The sheer volume of electronically stored information (ESI) flowing into and out of regulated companies today means that printing records and storing them on paper is no longer feasible.
As companies transitioned to electronic files, they began purchasing on-premises, server-based archiving systems to archive and manage their regulated content. Over time, companies realized that these on-premises systems were expensive, required constant software and hardware upgrades, and were not suitable for very long-term archiving, and that the vendors offering them could go out of business or fall victim to industry consolidation, being acquired by companies with different product strategies. Companies were also at the mercy of archiving application formats changing over very long periods (eventually losing backward compatibility), to the point where aging archived files become unreadable.
Proprietary clouds for long-term data storage and archiving
A newer approach some companies have adopted is a cloud-based, third-party, proprietary archiving solution. With these platforms, you get the scalability of a SaaS cloud, generally adequate (one-size-fits-all) security, and potentially lower storage costs due to cloud economies of scale.
Downsides of proprietary cloud data archives include unwanted file conversion, data ransoming (charging steep fees to release your data), data throttling, and a high total cost of operation. Many of these proprietary cloud archives are also hosted in yet another third party's data center, so if problems arise with the SaaS vendor, the customer may face additional obstacles in retrieving or moving its data.
One practice many proprietary cloud archiving vendors do not advertise is automatically converting the format of your data as they move it into their cloud. They say they do this for storage efficiency, but it also makes it harder for you to move your archived data elsewhere if you are dissatisfied. Some SaaS cloud archiving vendors charge huge additional fees, sometimes $30 per GB or more, to move your data out of their cloud to another. Their justification is that they must re-convert your data to its original format, which takes time and compute power. Others don't charge "conversion fees" but instead throttle outbound migration to an unrealistically slow rate, drawing the migration out over months or, in some cases, years, in the hope that the customer gives up and stays.
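A quick back-of-the-envelope calculation shows why both tactics bite. The following minimal Python sketch applies the $30/GB figure cited above to a hypothetical 50 TB archive and estimates how long a migration takes at a few hypothetical throughput caps. The archive size, the caps, and the helper names are illustrative assumptions, and the model ignores protocol overhead and retries.

```python
SECONDS_PER_DAY = 86_400


def exit_fee_usd(archive_gb: float, fee_per_gb: float = 30.0) -> float:
    """Flat per-GB exit or "re-conversion" charge (hypothetical helper)."""
    return archive_gb * fee_per_gb


def migration_days(archive_gb: float, throughput_mb_per_s: float) -> float:
    """Days needed to move an archive at a sustained (possibly throttled) rate.

    Uses decimal units (1 GB = 1,000 MB) and ignores protocol overhead.
    """
    seconds = archive_gb * 1_000 / throughput_mb_per_s
    return seconds / SECONDS_PER_DAY


archive_gb = 50_000  # hypothetical 50 TB archive
print(f"Exit fee: ${exit_fee_usd(archive_gb):,.0f}")
for rate in (100, 5, 1):  # hypothetical throughput caps in MB/s
    print(f"{rate:>3} MB/s -> {migration_days(archive_gb, rate):,.1f} days")
```

Under these assumptions, the exit fee alone comes to $1,500,000, and a cap of 1 MB/s stretches the same move to roughly 579 days, about a year and seven months.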
Long-term data storage and cloud archiving formats
Another consideration when contemplating long-term cloud storage and archiving is data format. Keep in mind that a company archiving electronic data for 30, 50, or 100 years must worry about the ability to view that data decades into the future. I know this seems far-fetched, but in 50 years, a document saved as a 2016 Word doc might not be recognizable to Microsoft Office 2070! Long-term archivists must therefore choose the format in which data is saved with viewability in 100 years in mind.
The adoption of the cloud and advanced object-based storage services provides a whole new set of opportunities to support emerging models for long-term data archiving. Among the benefits of cloud for long-term archiving is leveraging the cloud's built-in scalability and redundancy as well as the ability to adapt quickly to changing requirements over long periods of time.
One long-term archiving model is the Open Archival Information System (OAIS) reference model, developed by the Consultative Committee for Space Data Systems (CCSDS) and first published as ISO 14721 in 2003. OAIS has been adapted to work in cloud environments but requires special consideration when being set up, and it is used mainly by archives, libraries, and scientific data centers rather than by commercial organizations.
The PDF/A file format, by contrast, is an ISO-standardized (ISO 19005) version of Adobe's Portable Document Format (PDF), first published in 2005 specifically for the archiving and long-term preservation of electronic documents. PDF/A differs from standard PDF by prohibiting features unsuitable for long-term archiving, such as font linking (as opposed to font embedding) and encryption.
PDF/A has become the most popular file format for commercial organizations that need to store documents for decades while keeping them accessible and viewable.
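PDF/A files declare which part of ISO 19005 they claim to conform to in their XMP metadata, using the `pdfaid` namespace (`pdfaid:part`, `pdfaid:conformance`), while an encrypted PDF references its encryption dictionary through an `/Encrypt` key. The following minimal Python sketch uses those two markers for a rough byte-level screen of a file. It is only a heuristic, not a conformance check: it misses XMP packets stored in compressed streams, cannot detect font linking, and trusts whatever the file declares. The function name is hypothetical. For real compliance decisions, use a dedicated validator such as veraPDF.

```python
import re
from typing import Optional

# Declared conformance in XMP, either as an attribute (pdfaid:part="1")
# or as an element (<pdfaid:part>1</pdfaid:part>).
_PDFA_PART = re.compile(
    rb'pdfaid:part\s*=\s*["\'](\d)["\']|<pdfaid:part>\s*(\d)\s*</pdfaid:part>'
)
# An encrypted PDF points to its encryption dictionary with an /Encrypt key.
_ENCRYPT = re.compile(rb"/Encrypt(?![A-Za-z])")


def pdfa_screen(pdf_bytes: bytes) -> dict:
    """Rough byte-level screen (hypothetical helper). NOT a conformance check."""
    match = _PDFA_PART.search(pdf_bytes)
    part: Optional[int] = None
    if match:
        # Exactly one of the two alternatives' groups participates in a match.
        part = int(match.group(1) or match.group(2))
    return {
        "declared_pdfa_part": part,
        "encrypted": bool(_ENCRYPT.search(pdf_bytes)),
    }


# Synthetic fragment for illustration; in practice, read the file's bytes.
sample = b'<rdf:Description pdfaid:part="2" pdfaid:conformance="B"/> trailer << /Root 1 0 R >>'
print(pdfa_screen(sample))  # {'declared_pdfa_part': 2, 'encrypted': False}
```

A file that reports no declared part, or reports encryption, is worth a closer look before it goes into a long-term archive.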
Is the cloud a viable ultra-long-term data storage solution?
Most organizations have concluded that printing records and archiving them on paper is no longer feasible. On-premises archives can be expensive, complicated, and less secure, making them unsuitable for very long-term archiving, and third-party SaaS cloud archives can be risky for the reasons already discussed.
Public cloud platforms from major technology companies such as Microsoft, Amazon, and Google lend more credibility to long-term cloud storage. These companies offer several cloud models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), so customers can choose the cloud architecture that best meets their long-term needs. Because of the scale and scope of these public cloud platforms, they are also much less costly than on-premises solutions, at roughly $0.005 to $0.05 per GB per month, mainly thanks to economies of scale. Another benefit of these well-known platforms is the extremely low risk that they will disappear or be acquired by another company and shut down.
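To put the quoted price range in perspective over an archival time frame, the following minimal Python sketch computes flat-rate storage cost for a hypothetical 100 TB archive held for 30 years at both ends of the $0.005 to $0.05 per GB per month range. The archive size, the 30-year term, and the helper name are assumptions. The model deliberately ignores data growth, price changes, redundancy options, and retrieval or transaction charges.

```python
def storage_cost_usd(archive_gb: float, price_per_gb_month: float, years: int) -> float:
    """Flat-rate storage cost: GB x price per GB-month x number of months."""
    return archive_gb * price_per_gb_month * 12 * years


archive_gb = 100_000  # hypothetical 100 TB archive
for price in (0.005, 0.05):  # the per-GB-per-month range quoted above
    total = storage_cost_usd(archive_gb, price, years=30)
    print(f"${price}/GB-month for 30 years: ${total:,.0f}")
```

Under these assumptions, 30 years of storage costs about $180,000 at the low end and about $1,800,000 at the high end. That tenfold spread shows why the choice of storage tier matters for decades-long retention.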
A public cloud such as Microsoft's Azure is an excellent platform for very long-term archiving for the following reasons:
- Azure is not just a cloud storage system; it's a cloud computing platform that vendors leverage to build and run specialized archiving applications.
- These massive cloud providers have the brand, reputation, and resources to be considered a safe choice for very long-term data archiving.
- They don't impose a proprietary archive format on your data.
- They scale practically without limit.
- Depending on usage, they're extremely low-cost, especially compared with on-premises solutions.
- Their features and capabilities grow over time, taking advantage of the newest technologies, such as machine learning and AI.
Archive360's Archive2Azure information management and archiving solution removes the problems of vendor lock-in and migration throttling while providing state-of-the-art information management and archiving. It manages your data in your company's own Azure subscription, which means you own both the data and the cloud tenancy where it's housed. Archive2Azure can also store and manage your data in its native format or in PDF/A, so if you ever want to move away from Archive2Azure, you can: the data is in your Azure cloud tenancy, in a format you control.
For companies that need to archive disparate content for extended periods for regulatory, legal, or business reasons, combining a standardized industry file format with the Azure public cloud platform and Archive2Azure as the information management application provides the perfect long-term data storage solution. Contact us to learn more.
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.