- February 24, 2021
- Bill Tolson
- Data archiving
- Cloud archiving
- Information Management
Over the years, a lot has been written about the benefits of enterprise file or data consolidation, i.e., storing and managing both structured and unstructured data in a common repository.
The majority of organizations still have data spread around the enterprise in stand-alone data silos, usually unindexed (so you can't search them easily) and unmanaged. These silos include employee computers and personal devices, removable media, personal cloud accounts, file shares, email systems, OneDrive accounts, and SharePoint servers. Sometimes this is the result of acquisitions and mergers, but more often it stems from data retention and records management protocols not keeping up with technology. A scattered approach to data management exposes organizations to eDiscovery and regulatory issues and leaves them unable to run meaningful data analytics processes. The move to a remote workforce due to COVID-19 complicates a scattered approach even further.
How is remote employee data secured? Where is it stored? Are there active access controls? Does the company have access to the data or is it locked up on a single laptop? Is there PII present?
Because of this new corporate reality, data consolidation for greater cost control, increased security, ongoing data management, and increased employee productivity is a new information management focus.
The critical principle of responding to an eDiscovery request is to find, secure, review, and turn over all potentially responsive data to the opposing party within the timeframe directed by the court. If all relevant data is not found and turned over, you run the risk of receiving an adverse inference instruction. This is where the judge instructs the jury that potentially relevant information was not turned over (and was potentially destroyed) because the defendant did not want the judge or jury to see it. The inference is that the data was withheld from production because it would have been detrimental to the defendant's case, i.e., a smoking gun. In practice, an adverse inference usually indicates that the case is already lost, and the only question left is how many zeros will be included on the judgment check.
This situation occurs regularly, even when the defendant makes a good-faith effort to find all relevant data but cannot because of a complex data storage environment and a lack of data management.
Note that because of the 2015 amendments to the Federal Rules of Civil Procedure (FRCP), it is more difficult to get an adverse inference ruling from a judge due to missing or inadvertently destroyed data. Plaintiffs must now show the missing data was destroyed or not "found" on purpose, meaning good-faith mistakes are generally excused. However, this is dependent on the judge and does not altogether remove the possibility of an adverse instruction.
Consolidating, indexing, and managing most unstructured and structured data in a single cloud repository dramatically simplifies the collection (and protection/legal hold) of legally responsive data: there is only one location to secure and one location to search, and search accuracy is much higher. It also reduces the risk of inadvertent custodian spoliation (destruction of evidence) because case-specific litigation holds can be applied to specific data across the entire repository with granular access controls and audit logs.
Many cloud-based office and collaboration platforms, as well as archiving platforms, have pushed the concept of "blanket legal holds" for indefinite periods of time: the practice of placing legal holds on entire custodian mailboxes or, even worse, on all custodian mailboxes. This "hold everything" strategy is highly inefficient in that it bypasses all retention/disposition policies, and it is legally risky. Most corporate general counsel, as well as their external law firms, frown on this strategy, preferring instead the granular placement of litigation holds on individual files so as not to retain expired data that could be drawn into an eDiscovery request in the future.
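To make the contrast concrete, here is a minimal Python sketch of how granular and blanket holds interact with a retention policy. It is an illustration only: the `Record` class, the function names, the file names, and the case ID are all hypothetical and do not describe any particular product's API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    name: str
    retain_until: date                       # end of the retention period
    holds: set = field(default_factory=set)  # case IDs with an active hold


def can_dispose(record: Record, today: date) -> bool:
    """A record may be disposed of only if retention has expired and no hold applies."""
    return today > record.retain_until and not record.holds


def apply_granular_hold(records, case_id, responsive_names):
    """Place a hold only on the records identified as responsive to the case."""
    for record in records:
        if record.name in responsive_names:
            record.holds.add(case_id)


def apply_blanket_hold(records, case_id):
    """Place a hold on every record, responsive or not."""
    for record in records:
        record.holds.add(case_id)


def make_mailbox():
    # Hypothetical sample data: one live contract, one long-expired newsletter.
    return [
        Record("contract.docx", date(2025, 1, 1)),
        Record("old-newsletter.msg", date(2019, 1, 1)),
    ]


today = date(2021, 2, 24)

granular = make_mailbox()
apply_granular_hold(granular, "case-001", {"contract.docx"})
disposable_granular = [r.name for r in granular if can_dispose(r, today)]

blanket = make_mailbox()
apply_blanket_hold(blanket, "case-001")
disposable_blanket = [r.name for r in blanket if can_dispose(r, today)]

print(disposable_granular)  # the expired newsletter can still be disposed of
print(disposable_blanket)   # nothing can be disposed of while the hold lasts
```

With the granular hold, the expired newsletter remains eligible for normal disposition. With the blanket hold, it is retained indefinitely and stays discoverable in future matters.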
Much like the data silo issues with eDiscovery, relying on numerous unmanaged data silos can raise the risk of regulatory non-compliance. Data retention regulations require companies to (1) collect and store specific industry business documents for certain periods, (2) be able to react quickly and thoroughly to a regulatory information request, and (3) turn over all regulated data (subject to the specific information request) with all original metadata in a reviewable format. Failure to fully respond to an information request can trigger large fines, loss of business, and, in rare cases, jail terms.
Until now, larger organizations have utilized expensive and complex enterprise content management (ECM) systems for records management. However, most organizations using ECM systems rely on employees to determine which documents are records and to move them into the ECM system manually, so many regulated records never find their way into the ECM system. This is the main challenge of records management for regulatory compliance: human nature and overwork put employee filing of records on the back burner, leaving the ECM system of little practical use in most organizations.
Again, consolidating, indexing, and managing (i.e., applying retention policies with access controls) all data in a single repository greatly simplifies the finding, collection, and presentation of requested data, especially when all records are fully indexed and searchable with one powerful search capability. ECM systems, by contrast, are not architected to manage many types of structured and unstructured data.
Additionally, data consolidation utilizing machine learning-assisted document categorization and policy placement provides more consistent and accurate records management while also providing one location to search, one search application to use, and much greater search speed. It also reduces the risk of inadvertent custodian deletion because of granular system access controls and secure litigation holds.
Data analytics is the science of drawing insights from raw data. Data analytics techniques can reveal valuable trends and metrics that would otherwise be lost in the mass of information across the many data silos. The results can then be used to uncover sales opportunities, additional marketing focuses, and regulatory/litigation trends, and to optimize processes to increase overall efficiency.
Data analytics has recently become a significant focus for organizations looking to uncover value from their archived data stores. Most data analytics applications operate only within a single data repository, for example, a file system, email system, or SharePoint server. The issue companies currently face is this: because their data is spread across numerous unconnected data silos, it is difficult or impossible to run a meaningful analytics process on all of their structured and unstructured data across all of the repositories. Consolidating all data into a single repository simplifies analytics processing and ensures more insightful results.
You may be asking yourself, "What do separate data silos have to do with employee productivity?" In reality, employees searching for older content to reference or reuse consumes a great deal of time in corporations. Over the years, market research firms have tracked this loss of productivity. How many times per week does an employee search for shared or older data for reference, research, and so on? How much time is spent looking for the data, how often do they find it, and if they don't find it, how much time do they spend recreating it?
A conservative employee productivity model showed employees consuming an average of 2.5 to 4 hours per week searching for or recreating data. At 4 hours per week, that comes to 208 hours per year. Assuming an average fully loaded annual wage of $80,000 and a 2,080-hour work year, the average hourly wage is $38.46. Now multiply $38.46 by 208 hours, and you get an eye-popping $8,000 of time spent per employee trying to find or recreate existing data that couldn't be found. In a company of 5,000 employees, that comes to a cost of approximately $40 million of wasted time (productivity loss) trying to find existing data.
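The arithmetic behind these figures can be reproduced with a short Python sketch. The inputs are the numbers quoted above; the 2,080-hour work year (40 hours times 52 weeks) and the choice of the 4-hour upper bound are assumptions needed to arrive at the quoted 208 hours and $38.46, and the function name is purely illustrative.

```python
# Inputs taken from the figures quoted in the text.
HOURS_LOST_PER_WEEK = 4        # upper end of the 2.5 to 4 hour range
WEEKS_PER_YEAR = 52
WORK_HOURS_PER_YEAR = 2080     # assumption: 40 hours x 52 weeks
LOADED_ANNUAL_WAGE = 80_000    # fully loaded, in dollars
EMPLOYEES = 5_000


def productivity_loss(hours_per_week, annual_wage, employees):
    """Return (hours lost per year, cost per employee, cost for the whole company)."""
    hours_per_year = hours_per_week * WEEKS_PER_YEAR
    hourly_wage = annual_wage / WORK_HOURS_PER_YEAR
    per_employee = hours_per_year * hourly_wage
    return hours_per_year, per_employee, per_employee * employees


hours, per_employee, total = productivity_loss(
    HOURS_LOST_PER_WEEK, LOADED_ANNUAL_WAGE, EMPLOYEES
)
print(f"{hours} hours/year, ${per_employee:,.0f} per employee, ${total:,.0f} total")
```

Re-running the function with 2.5 hours per week gives the low end of the range, which is still $25 million per year for the same 5,000-person company.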
A closely related metric is that of lost revenue due to lost productivity. If you could recover the 208 hours of lost productivity per employee, how much additional revenue could have been generated? Let's calculate an example.
Let's assume a company has an annual average revenue per employee of $150,000, which is again conservative in the high-tech industry. With this number, we can calculate an average revenue per hour of $72. Multiply that by 208 hours, and we get an average lost revenue per employee of $15,000. Now multiply that by the 5,000 employees, and we see a massive total annual lost revenue of $75 million. To be more conservative, let's assume the average employee couldn't convert every lost hour of productivity into an hour of additional revenue. We can halve that $75 million to a total annual lost revenue of $37.5 million, which is still a massive number.
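The lost-revenue estimate follows the same pattern, with one extra knob for how much recovered time actually turns into revenue. The sketch below is illustrative: the `conversion` parameter and the 2,080-hour work year are assumptions introduced here, not figures from a formal model.

```python
WORK_HOURS_PER_YEAR = 2080  # assumption: 40 hours x 52 weeks


def lost_revenue(revenue_per_employee, hours_lost, employees, conversion=1.0):
    """Estimate annual revenue lost to unproductive hours.

    conversion is the fraction of each recovered hour assumed to turn
    into revenue (1.0 = every hour, 0.5 = half of them).
    """
    revenue_per_hour = revenue_per_employee / WORK_HOURS_PER_YEAR
    return revenue_per_hour * hours_lost * employees * conversion


full = lost_revenue(150_000, 208, 5_000)
half = lost_revenue(150_000, 208, 5_000, conversion=0.5)
print(f"Full conversion: ${full:,.0f}")
print(f"Half conversion: ${half:,.0f}")
```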
The final question to address is the return on investment (ROI) for a solution that fixes the eDiscovery and regulatory data issues and the lost productivity challenge. Estimating eDiscovery costs, and the savings that consolidating data could deliver, requires a detailed understanding of your company's overall eDiscovery process, so I will leave that for later discussions. However, just taking the approximate conservative annual cost of $40 million in lost productivity and an estimated cost of $500,000 (on the high side) for a file consolidation solution for a large organization, we calculate an ROI of 7,900%, a number CFOs dream of. You could halve the productivity-loss estimate twice and still realize an ROI of nearly 2,000%.
A more in-depth ROI model would determine the cost and cost savings numbers more accurately. Still, the point is obvious: file consolidation is a clear plus for any business.
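For readers who want to check the ROI figure or test its sensitivity, here is a small Python sketch. It assumes the common net ROI definition (benefit minus cost, divided by cost) and treats the full $40 million as recoverable, which is the optimistic simplification used in the text; the function name is illustrative. With these inputs the result is about 7,900%.

```python
def roi_percent(annual_benefit, solution_cost):
    """Net return on investment as a percentage: (benefit - cost) / cost * 100."""
    return (annual_benefit - solution_cost) / solution_cost * 100


SOLUTION_COST = 500_000  # high-side estimate quoted in the text

# Start from the $40 million estimate and halve it repeatedly.
benefit = 40_000_000
for _ in range(4):
    print(f"benefit ${benefit:>12,.0f} -> ROI {roi_percent(benefit, SOLUTION_COST):,.0f}%")
    benefit /= 2
```

Even if only one-eighth of the estimated productivity loss were recovered, the first-year return would still be several times the cost of the solution.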
With the hyperscale public clouds offered by Microsoft, AWS, and Google, automatic storage tiering (hierarchical storage management) is now a tool for file archiving and management. For example, the Microsoft Azure Cloud includes three storage tiers: Hot, Cool, and Archive. With these tiers, each record can be directed to the most appropriate and cost-effective storage tier. Active data can be stored on the Hot tier so that it can be accessed and viewed quickly, whereas the Cool tier offers a lower storage cost for infrequently accessed data in exchange for higher access charges. The Archive tier, meant for long-term archival with little or no access requirements, is, in reality, stored on tape libraries, so data can take hours to recover; however, the storage cost is extremely low.
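A tiering policy of this kind often boils down to a rule based on how recently a record was accessed. The Python sketch below shows one such rule. The 30-day and 180-day thresholds, the function name, and the sample file names are hypothetical, and the code deliberately makes no Azure API calls; in practice the decision would be expressed through the cloud provider's own lifecycle management features.

```python
from datetime import date

# Hypothetical thresholds (days since last access); tune to your own access patterns.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier name based on how long ago the record was last accessed."""
    age_days = (today - last_accessed).days
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "Archive"
    if age_days >= COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"


today = date(2021, 2, 24)
samples = {
    "q1-forecast.xlsx": date(2021, 2, 20),   # accessed days ago
    "2020-budget.xlsx": date(2021, 1, 1),    # accessed weeks ago
    "2015-contract.pdf": date(2020, 1, 1),   # untouched for over a year
}
for name, last_accessed in samples.items():
    print(f"{name}: {choose_tier(last_accessed, today)}")
```

Records under litigation hold or frequent regulatory review may warrant staying on a faster tier regardless of age, so a real policy would likely combine access age with record category.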
A related topic we've written about before is legacy application retirement and what to do with the remaining data from retired applications. When a legacy application is retired, its remaining data must often be kept for regulatory or legal reasons. In many cases, the orphaned application data is either deleted or left to sit in the application's data repository. Over time, these data repositories are overlooked and forgotten and tend not to be included in data searches. Additionally, once orphaned, this data is no longer accessible because the application needed to view it is no longer available. As you can imagine, this becomes a compliance and eDiscovery issue. Moving this legacy application data into a consolidated archive enables retention/disposition management and provides access when responding to regulatory information requests or conducting eDiscovery.
In August of 2020, we published a blog, "Retire Your Legacy Applications – Keep the Data Available," focused on the compliance, eDiscovery, and cost savings benefits associated with application retirement and data consolidation.
Simple, accurate file consolidation and management is now easier and more cost-effective than ever in your company's Azure Cloud tenancy. Archive2Azure (a native Azure application) provides file consolidation and data management that works with your data in your company's own Azure tenancy. It provides automated processes that move (or copy) data from your numerous enterprise data repositories into your company's Azure tenancy, index the data, categorize it, optionally scan it for PII or other content, secure it, tag it, and apply retention/disposition policies. It also provides centralized eDiscovery, elastic search, and case management. Finally, Archive2Azure provides end-user access (based on granular access controls) to address the employee productivity issue already discussed.
Decommission your long-held legacy applications while still meeting regulatory compliance and eDiscovery requirements.
Bill is the Vice President of Global Compliance for Archive360. Bill brings more than 29 years of experience with multinational corporations and technology start-ups, including 19-plus years in the archiving, information governance, and eDiscovery markets. Bill is a frequent speaker at legal and information governance industry events and has authored numerous eBooks, articles and blogs.