- January 25, 2023
As corporate data continues to pile up within the enterprise, a frequently asked question, at least around the IT water cooler, is why all of this data is accumulating instead of being deleted. Employees create, send, and receive approximately 20 MB of data per day. The vast majority of this data is retained because employees feel that they will need to reuse or reference it at a later date, so it accumulates on local storage, on file shares, in the email system and archive, and lately, in employees' corporate and private clouds (figure 1). In fact, 70 to 80% of corporate unstructured data is unindexed, unmanaged, and invisible to IT.
The Lifecycle of Data chart above (figure 1) shows the effect of time on data importance or value. As data ages, the probability that it will ever be looked at again drops dramatically over a short period of time and approaches (but never reaches) zero. To many, this means that unless data is categorized as having direct business value, is subject to eDiscovery, or is regulated, it should be disposed of quickly. This disposal would reduce IT storage and management costs as well as litigation risk.
The CGOC put forth this theory after publishing an important survey on this topic in 2012. The survey showed that the average enterprise data store contained 1% of data that was subject to litigation hold, 5% that was regulated, and 25% that had some business value, leaving 69% as a probable subject of defensible disposal. We have termed this “valueless” data grey or low touch data. Some of it is in reality duplicates and useless system files, but much of it is not.
Grey Data is Not Valueless
Having worked directly with companies in the information governance consulting industry for many years, I know for a fact that much of this grey data still has value to an organization and that its disposal could cause issues later. For example, many employees keep old data for reference and reuse at a later date. In my case, I regularly go back and look for old presentations, spreadsheets, and reports years after they were created to see how I came to a particular conclusion, what formula I used in a spreadsheet, or which graphics I used for a report.
I have learned over the years to file this content with later search in mind. Most employees, however, don’t take the time, subjecting themselves to aggravating and time-consuming searches years later. This is not to say that employees should keep all data at their fingertips forever, but some percentage of this grey data does have potential value, and if the cost of retaining it is low enough, why not keep it under management for an extended period of time? Figure 2 below shows the CGOC survey with an important addition: the further segmentation of the 69% valueless data into 39% grey/low touch/low value data. I admit this 39% is strictly a guess on my part based on experience, but the fact remains that not all of this data should be disposed of.
A big issue with the huge, unmanaged (and unindexed) volumes of corporate grey data is finding particular content when you need it. Finding data when it’s not indexed/managed is a major cost for companies. To make the point, an old CEO from my past, T. M. Ravi, CEO of Mimosa Systems, once stated that “It costs up to 500 times more to find and utilize information once, than to store it untouched for 20 years.”
This statement makes the case for not simply leaving data “where it lies”, unindexed and unmanaged, but rather for recognizing it for what it is and managing it cost effectively. A potential strategy would be to let employees retain data on specific enterprise storage resources for a period of time, but once both the data itself and its last accessed date have aged past a threshold (say 2 years), that data could be identified and migrated to a centralized repository accessible only by IT or legal. This would ensure the grey data is managed, is searchable, and is disposed of at a later date.
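The age-based identification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any product: the function name `find_grey_candidates` and its parameters are hypothetical, "age" is approximated by the file's last-modified time, and the 2-year threshold is the example figure from the text. Last-accessed timestamps are not reliably updated on every file system, so a real policy would need to verify that they can be trusted before acting on them.

```python
import os
import time
from pathlib import Path

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def find_grey_candidates(root, min_age_years=2.0, now=None):
    """Return files under `root` that look like grey data migration candidates.

    Hypothetical helper: a file qualifies only when BOTH its last-modified
    time and its last-accessed time are older than the threshold.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_years * SECONDS_PER_YEAR
    candidates = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        # Both timestamps must be older than the cutoff.
        if info.st_mtime < cutoff and info.st_atime < cutoff:
            candidates.append(path)
    return sorted(candidates)
```

The function only reports candidates. Copying them to a central repository, indexing them, and applying retention would be separate steps, which keeps the identification pass safe to run repeatedly.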
Ex-Employee Grey Data
Another grey data challenge is the large data stores related to departed employees. Their laptop hard drives, email boxes, file system folders, OneDrive cloud accounts, SharePoint content, OneNote content, etc. can consume huge amounts of corporate enterprise storage resources. In fact, many companies report that 20, 30, or even 50% of their unstructured enterprise data comes from departed employees. Many companies simply ignore this data after the employee leaves, while others delete it by re-imaging the laptop and closing out the employee's enterprise accounts, effectively deleting all of their data.
The issue many legal people have with this blind deletion strategy is the potential of later wrongful termination lawsuits cropping up. A best practice for departed employee data is to quickly capture all of it (as part of the exit process) and migrate it into a central storage repository for retention and management for at least a time period equal to the local statute of limitations for wrongful termination lawsuits. This practice has the added benefit of safely and securely storing departed data for others to have access to.
The Enterprise Value of Grey Data (EVGD)
Just this month, the Harvard Business Review published an article on the enterprise value of data. This article talked about determining the actual value of enterprise data. I have begun looking into expanding this theory, focusing in on determining the value of grey data. To do that we must look at costs such as:
- The cost of ongoing storage and management
- The cost of responding to eDiscovery which includes this grey data
- The cost of responding to a regulatory information request
- The effect on employee productivity
- The effect of reduced employee productivity on potential revenue
The actual value of grey data can be calculated by taking the above costs, cost savings, and potential increased revenue into account, along with the cost of the investment to better manage it over the long term.
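One way to picture this calculation is as a comparison between the annual cost of leaving grey data unmanaged and the cost of managing it, less the investment required. The sketch below is an assumption about how the listed factors might be combined, not a published EVGD formula: the class `GreyDataCosts`, its field names, and the function `estimated_grey_data_value` are all hypothetical, and potential increased revenue is modeled as a reduction in revenue lost to poor productivity.

```python
from dataclasses import dataclass


@dataclass
class GreyDataCosts:
    """Annual costs tied to grey data (hypothetical fields, same currency)."""
    storage_and_management: float
    ediscovery_response: float
    regulatory_response: float
    lost_productivity: float
    lost_revenue: float  # revenue lost because of reduced productivity

    def total(self) -> float:
        return (
            self.storage_and_management
            + self.ediscovery_response
            + self.regulatory_response
            + self.lost_productivity
            + self.lost_revenue
        )


def estimated_grey_data_value(unmanaged: GreyDataCosts,
                              managed: GreyDataCosts,
                              investment: float) -> float:
    """Net annual benefit of managing grey data (assumed formula).

    Savings are the unmanaged total minus the managed total; the cost of
    the investment to manage the data is then subtracted.
    """
    return unmanaged.total() - managed.total() - investment
```

A positive result would suggest that managing the grey data pays for itself under the chosen inputs. The inputs themselves have to come from an organization's own estimates.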
Microsoft Azure as the Managed Grey Data Repository
Archive2Azure is Archive360’s Compliance Storage Solution targeting long-term storage and management of unstructured grey data in the Microsoft Azure platform. The Archive2Azure solution leverages Microsoft Azure’s low-cost ‘cool’ storage as an alternative to expensive on-premises enterprise storage. Azure costs as little as $0.02 per GB per month and eliminates the expensive overhead costs of traditional on-premises storage.
Archive2Azure importantly provides automated retention, indexing on demand, encryption, search, review, and production – all important components of a low cost, searchable storage solution. Given the clear cost advantages of the Azure cloud, it’s no surprise many companies are looking to Azure and Archive2Azure for grey data management and storage.
If you are attending Microsoft Ignite, please visit booth #2144 for a more in-depth discussion on grey data.
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.