Around 59 zettabytes (ZB) of data – that’s 59 followed by 21 zeros – were expected to be created, captured, copied and consumed worldwide in 2020, according to the Global DataSphere forecast from International Data Corporation (IDC). The ratio of unique data created and captured to data copied and consumed is roughly 1:9, and the trend is toward less unique and more replicated data. The Covid-19 pandemic hindered the creation of new data but increased the consumption of downloaded and streamed video. IDC predicts that the amount of data created over the next three years will exceed the data created over the past 30 years, and that the world will create more than three times as much data over the next five years as it did in the previous five.
This is a lot of data, and it comes in two forms. Structured data is often what first comes to mind when you think of digital data and big data analytics. So, what is structured data? It’s the type of information that can be stored in traditional databases composed of columns and rows – for example, a customer database comprising names, addresses, telephone numbers and orders. Highly organized, it’s easy to process, access and work with.
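As a sketch, structured customer data of the kind described above maps naturally onto a relational table. The table, names and values here are hypothetical, purely to illustrate the rows-and-columns shape:

```python
import sqlite3

# Hypothetical customer table: structured data fits neatly into
# predefined columns, which is what makes it easy to query and process.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        address   TEXT,
        telephone TEXT,
        orders    INTEGER
    )"""
)
conn.execute(
    "INSERT INTO customers (name, address, telephone, orders) "
    "VALUES (?, ?, ?, ?)",
    ("Jane Doe", "1 Example Street", "555-0100", 3),
)

# Because the organization is fixed up front, retrieval is straightforward.
row = conn.execute(
    "SELECT name, orders FROM customers WHERE telephone = ?", ("555-0100",)
).fetchone()
print(row)  # ('Jane Doe', 3)
conn.close()
```

Unstructured data – an email thread or a slide deck – has no such predefined schema, which is exactly why it is harder to catalog and protect.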
What is unstructured data? In short, it is everything else, and every organization has a whole host of it. Common unstructured data examples are email conversations or chat logs, word processing documents, slideshow presentations, image libraries and videos. In fact, some 80 percent of data is unstructured, and much of it is sensitive information.
The financial industry is particularly affected by this. In addition to their large databases of client information and transactions, financial institutions hold a wide range of other data and documents such as trading reports, HR records, meeting notes, business plans, financial statements and spreadsheets, many of which are highly confidential. Our love of spreadsheets represents a particular data security problem as they often contain highly sensitive data but are weakly protected.
Traditionally, we have tried to protect all data with multiple layers of security to prevent access, but the relentless flow of headlines around successful cyber-attacks and breaches proves that this is not working. Most organizations also accept that any given data file is likely to be accessible by staff who have no reason to see that information. So, if we can neither keep the cybercriminals out nor fully trust the people around us, we must rethink the traditional ‘castle and moat’ methods of protection and adopt a data-centric approach, where security is built into the data itself.
Full disk encryption will protect structured and unstructured data at rest on a hard disk or USB stick, which is great if you lose your laptop but is of absolutely no use in protecting data against unauthorized access or theft from a running system. Staff may need to run reports, analyze data, make presentations or work on proposals, all of which extract data from applications and data silos. And though the situation may gradually change, currently most organizations deploy endpoints with local storage, where extracted, sensitive data is often kept. Data therefore needs to be protected not only at rest, but also in transit and in use, on-site or in the cloud.
But this is no easy task. In a 2020 Ponemon report, 67 percent of respondents said that discovering where sensitive data resides in the organization is the number one challenge in planning and executing a data encryption strategy. Data classification technology is often used to identify ‘important’ or ‘sensitive’ data so that it can be encrypted, yet the same report found that 31 percent cited classifying which data to encrypt as difficult.
When it comes to unstructured data, deciding what is most important to protect is even more difficult.
The first step is to assess and classify the data – intellectual property, merger and acquisition plans, letters, emails, human resources records and so on – taking into account risk, business impact analysis and regulatory requirements. Manual classification is impractical for most organizations, but automation requires search patterns and rules to be developed, so it is highly likely that a proportion of sensitive data will be misclassified – and often the user is allowed to override the assigned classification anyway.
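The search patterns and rules that such automation relies on can be sketched with a few regular expressions. The tags and patterns below are illustrative assumptions, not any real product’s rule set, and they show exactly why misclassification happens:

```python
import re

# A minimal sketch of rule-based data classification. Both patterns are
# deliberately simple and far from exhaustive.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> set:
    """Return the set of sensitivity tags whose pattern matches the text."""
    return {tag for tag, pattern in PATTERNS.items() if pattern.search(text)}

print(classify("Contact jane@example.com re: card 4111 1111 1111 1111"))
# A card number written in an unusual format, or sensitive content that
# matches no pattern at all, slips past the rules entirely - which is the
# misclassification risk described above.
```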
The initial effort to catalog and assign classifications to all existing data must then become an ongoing process, with users assigning classification tags as new information is created, modified and shared. Then there is the question of where to set the bar. Even seemingly trivial information can be useful to a cybercriminal, since criminals are adept at amalgamating small pieces of data into a bigger picture – to build a spear-phishing attack, for example.
A universal approach
So why is it that the accepted norm is to encrypt only the ‘most important’ or ‘sensitive’ data? Traditionally, encryption has been considered complex and costly to deploy, and detrimental to performance and productivity. Also, there is the belief that we can protect data by adding more access control and authentication mechanisms to put control barriers in front of information.
Data encryption has been with us for decades. It is tried and trusted technology, and with today’s processing power it can deliver full data protection that is transparent to the end user. The ability exists to slide encryption technology in ‘behind’ other software, automatically securing data – structured or unstructured – without having to change any applications or decide what is important. Encryption should therefore be used to protect all data, not just that which is classified as most important. This way, classification can be used for what it’s good at, while encryption ensures that stolen information remains protected and useless to the thief. By actively choosing to encrypt all data – whether it is stored, in transit or in use – we are finally designing security into the only thing which has value: the data itself.
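The principle of building security into the data itself can be shown with a toy sketch: once a file’s bytes are encrypted, a stolen copy is useless without the key, wherever the copy ends up. The keystream construction below (SHA-256 over a key and counter) is purely illustrative and not a production cipher; real products use vetted algorithms such as AES:

```python
import hashlib
import secrets

# Toy illustration only: derive a keystream by hashing key + counter and
# XOR it with the data. Applying the same keystream twice decrypts.
def xor_keystream(key: bytes, data: bytes) -> bytes:
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

key = secrets.token_bytes(32)
plaintext = b"Q3 merger plan: confidential"

ciphertext = xor_keystream(key, plaintext)   # what a thief would obtain
recovered  = xor_keystream(key, ciphertext)  # transparent to the authorized user

print(recovered == plaintext)  # True
```

The point of the sketch is the asymmetry: the authorized user, who holds the key, sees no difference, while the stolen ciphertext carries no usable information.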
Nigel Thorpe, technical director, SecureAge