The other day I decided it was finally time to clean – really clean – my desk. As I worked through years of accumulation, I noticed an actual pencil lying forgotten on an old pad of paper. The pencil was dusty and unused, and the pad had yellowed, curled edges. That got me thinking about how long it’s been since pen and paper were used for daily communication and record keeping. Now, some of you may brand me an old codger for being able to remember that at all, but bear in mind that the first usable version of Microsoft Windows, version 3.0, shipped just 18 years ago. At that time, many pencil pushers resisted transferring everything to the “computer,” but even the most stubborn holdouts gave way within a year or two.
Oh, some of us might have used an email system back then. At Hewlett-Packard, I remember sending emails on HP’s proprietary HP Desk mail system in the mid-1980s. However, at the time, the majority of office workers were actual pencil pushers.
Can you find anyone today who doesn’t compulsively check their email or scroll through file listings to find what they need? OK, John McCain doesn’t, but he has “people.” I don’t know about you, but I don’t have “people.” However, I do have my fully networked system with Internet access, Instant Messaging, stunning graphics, oodles of productivity, and regular backups. And I love it!
In those eighteen years, the world changed faster and more completely than it ever had. The Internet (once the exclusive domain of the nerdiest of nerds – cosmological physicists) became everyone’s instant window to the world. Easy-to-use authoring tools allowed everyone to be productive. It’s as if the sorcerer’s apprentice were replicating legions of pencils instead of brooms to generate a catastrophic data deluge. Today, corporations are drowning in it.
Back in 2006, IDC conducted an exhaustive study (Source: The Expanding Digital Universe, IDC, March 2007) and forecast a 57% year-over-year growth rate, from 2006 through 2010, in the amount of information created, captured and replicated.
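To see what a 57% annual rate means in practice, here is a small Python sketch that compounds it over the forecast window. It assumes the rate applies as a constant compound annual rate over four intervals (2006 to 2010) and normalizes the 2006 volume to 1.0, since we are only interested in the multiplier, not absolute sizes. The function name is just for illustration.

```python
def compound_growth(initial: float, annual_rate: float, years: int) -> float:
    """Return the size after `years` of growth at a fixed annual rate."""
    return initial * (1 + annual_rate) ** years


# Normalize the 2006 volume to 1.0 and apply the forecast 57% rate
# for each year from 2006 through 2010 (four compounding periods).
volume_2006 = 1.0
for year in range(2006, 2011):
    multiplier = compound_growth(volume_2006, 0.57, year - 2006)
    print(year, round(multiplier, 2))
```

Under that assumption, 1.57 raised to the fourth power is a bit over 6, so the amount of information roughly sextuples in just four years.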
So, where is all this information coming from, and why aren’t companies able to deal with it? Well, it comes from everyone, and it’s a problem because most of it is unstructured. Most people aren’t aware of this, but the Enterprise Strategy Group estimates that 80 to 85% of all business data is unstructured (Source: Extending Discovery to All Corporate Information, Enterprise Strategy Group, December 2007).
What is unstructured data? It consists of emails, reports, all user files (documents, spreadsheets, PPTs, PDFs), images, video, HTML/XML, MP3, etc. It varies in importance, too. The average user will save pictures of their children, emails about what a good job they are doing, CYA “email trails,” work-related spreadsheets, thick Word documents, etc.
In the book “Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence” (Bill Inmon and Anthony Nesavich, Prentice Hall, 2008), the authors describe the various types of unstructured data created by the typical departments in a corporation. These include: Accounting, Call Centers, Engineering, Finance, Human Resources, Legal, Marketing, Sales, Shipping and Operations. That means everyone is contributing to the challenge while they look to the data center to control it.
The Challenges of Unbridled Information Growth
Let’s take a look at some of the major challenges in dealing with this unbridled growth of information.
Factor 1: Information must be stored
The more data we generate, the more storage is required. This demand opened up tremendous opportunities for storage vendors as customers bought more and more equipment. The storage industry introduced the term Information Lifecycle Management (ILM) to describe more cost-effective ways to deal with this growth. It also introduced the concept of tiered storage to help companies manage data along various dimensions: price, performance, capacity and function. Initially, storage cost was the biggest impact of this growth on corporations. However, as storage costs quickly declined, they were dwarfed by other factors.
Factor 2: Information can be sensitive and needs to be protected
As companies created more and more information, the importance of protecting that information and ensuring the proper access levels became more apparent. While it sounds easy (i.e., making sure the right people have access to the right information), it’s not so easy to actually do, and the costs of not securing data can be astounding. Examples include:
Hefty fines under PCI, SOX and HIPAA for breaches and noncompliance
Bad PR and damage to the corporate brand due to the need to publicly disclose privacy breaches
Outright IP theft, where trade secrets and proprietary information could fall into the hands of a competitor and materially damage the company’s business prospects
Factor 3: Information must be preserved for regulatory reasons
Every company is governed by a set of regulations that determine how long information must be retained. The more familiar of these include:
Health Insurance Portability and Accountability Act (HIPAA) of 1996
Sarbanes-Oxley Act of 2002
SEC Rules 17a-3 and 17a-4
There are countless more. Some industries (e.g., pharmaceuticals and finance) are more heavily regulated than others. And, of course, with the recent credit crisis, we expect the number of regulations to skyrocket in the coming years.
In the good old days, retaining this information was simple: we put everything in a box and stored that box in a warehouse for as long as required. With the explosive growth of easily replicated electronic information, it’s much more challenging.
Factor 4: Information is subject to electronic discovery
A critical event occurred in December 2006, when amendments to the Federal Rules of Civil Procedure (FRCP) took effect. The FRCP governs procedures for civil suits in United States district (federal) courts. The amendments outline how electronic documents can be used to support litigation proceedings and how they should be handled during litigation search and discovery.
Essentially, this means that all information is potentially discoverable, which presents a problem. Companies are required to keep information for a set period of time (for regulatory purposes), but they also have an incentive to get rid of it as soon as that period ends. It simply isn’t practical for a company to pay an attorney $400/hour to perform discovery across all of its information.
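The keep-it-but-not-too-long tension can be sketched as a simple retention-schedule check. The Python below is a minimal illustration, not a compliance tool: the categories and retention periods are hypothetical placeholders (not actual HIPAA, SOX or SEC requirements), and the `Record`, `add_years` and `eligible_for_disposal` names are made up for this example. A record becomes eligible for disposal once its retention period has passed, unless it is under a litigation hold.

```python
from dataclasses import dataclass
from datetime import date

# HYPOTHETICAL retention periods in years, keyed by record category.
# Placeholders for illustration only; real periods come from counsel
# and the regulations that actually apply to the business.
RETENTION_YEARS = {
    "financial": 7,
    "email": 3,
    "hr": 5,
}


@dataclass
class Record:
    name: str
    category: str
    created: date
    legal_hold: bool = False  # records under litigation hold must be kept


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def eligible_for_disposal(record: Record, today: date) -> bool:
    """True once the retention period has elapsed and no hold applies."""
    if record.legal_hold:
        return False
    expires = add_years(record.created, RETENTION_YEARS[record.category])
    return today >= expires


# Example run against a few made-up records.
today = date(2008, 11, 1)
records = [
    Record("q1_2001_ledger.xls", "financial", date(2001, 3, 31)),
    Record("team_lunch.msg", "email", date(2007, 6, 1)),
    Record("contract_dispute.msg", "email", date(2004, 2, 10), legal_hold=True),
]
for r in records:
    print(r.name, eligible_for_disposal(r, today))
```

Disposing of records as soon as they become eligible shrinks the pool an attorney has to search, while the hold flag keeps anything relevant to pending litigation out of the shredder.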