This article was contributed by Anselmo Diaz, principal consultant and associate lecturer.
Data isn’t new. Since ancient times, people have recorded facts for inventory purposes, to carve events in the annals of history, and for everything else our ancestors deemed important. The need to preserve information, coupled with advances in technology, has made data ubiquitous in modern societies. In fact, technology makes the creation, processing, and sharing of data such a common activity that people have stopped paying much attention to it.
Data explosion
Data is everywhere, and we interact with it in numerous diverse ways: credit cards, identification documents, medical records, CCTV footage, digital photographs, emails, social media, and more. The list is truly endless and is constantly expanding with the use of the Internet of Things (IoT), smartphones, and wearable technologies, to name a few.
It is estimated that in 2021, the amount of data generated daily surpassed 2.5 quintillion bytes, which is 25 followed by a staggering 17 zeros. Most of that is stored in immense server farms or in the cloud.
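As a quick sanity check on that figure, the arithmetic can be sketched in a few lines of Python. The conversion to exabytes is added here purely for scale and assumes decimal (SI) units; it is not part of the original estimate.

```python
# 2.5 quintillion (short scale) is 2.5 x 10^18, written here as an exact integer.
bytes_per_day = 25 * 10**17

# Written out in full, the number is "25" followed by 17 zeros.
digits = str(bytes_per_day)
assert digits == "25" + "0" * 17

# Assumption for illustration: 1 exabyte = 10^18 bytes (decimal units).
exabytes_per_day = bytes_per_day / 10**18
print(digits, exabytes_per_day)
```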
Most people currently see no problem with their data being spread all over the place, and are generally unaware of the privacy concerns this raises. However, identity theft is notably on the rise for individuals, as are data exfiltration and man-in-the-middle (MitM) attacks for organizations. Both are the result of exploiting vulnerabilities to compromise personal data and breach confidentiality, and both can be prevented or mitigated by judicious application of data protection principles.
Data: It’s personal
Personal data is a subset of data that relates to individuals. It has been in the spotlight since the GDPR took effect in 2018. Since then, other pieces of legislation have been adopted across the globe, including in China (PIPL), Brazil (LGPD), and California (CCPA). These legislative efforts have one common objective: to provide a framework to protect personal data.
The GDPR defines personal data as, “Any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”
This concept of personal data, understood in a broad sense, is shared across the aforementioned laws. Given that most of these have provisions for cross-border transfers, chances are your organization falls within the scope of one jurisdiction or another when it comes to the processing and protection of personal data.
Examples of personal data include:
Definition differences
One word of caution: although often erroneously used interchangeably, Personally Identifiable Information (PII) and personal data are not the same. Personal data also encompasses information that can point indirectly to an individual, whereas PII, a term mostly used in the U.S., is narrower and points directly to an individual.
For instance, “the red-haired woman who sat by the window” could constitute personal data if it helps identify a single person. Notice how, in this example, there is probably no need to obtain a name or a number. Some ambiguity is at play, though, which is largely dependent on context. The context in this sense would combine different data items like pieces of a puzzle. This can be achieved with information from different sources that can be aggregated in such a way as to make an inference.
Another example of the role context plays is a random sequence of digits. On its own it is not personal data; it becomes personal data only when the digits are an individual’s telephone number and the connection between that individual and the number can be established.
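The puzzle-piece effect described above can be illustrated with a minimal Python sketch. The records, attribute names, and the `unique_matches` helper are all hypothetical and invented for this illustration; the point is only that attributes which identify nobody on their own can single people out once combined.

```python
from collections import Counter

# Hypothetical records: no names and no ID numbers, only descriptive attributes.
people = [
    {"hair": "red", "seat": "window", "gender": "female"},
    {"hair": "red", "seat": "aisle", "gender": "female"},
    {"hair": "brown", "seat": "window", "gender": "female"},
    {"hair": "brown", "seat": "window", "gender": "male"},
    {"hair": "brown", "seat": "aisle", "gender": "male"},
]

def unique_matches(records, attributes):
    """Count the records that are singled out by this combination of attributes."""
    combos = Counter(tuple(r[a] for a in attributes) for r in records)
    return sum(1 for count in combos.values() if count == 1)

# Compare how many people each attribute set singles out.
for attrs in (["hair"], ["hair", "seat"], ["hair", "seat", "gender"]):
    print(attrs, unique_matches(people, attrs))
```

In this toy data set, hair color alone singles out no one, while each added attribute narrows the field further. That is why context, rather than any single data item, often determines whether information counts as personal data.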
It follows that any PII can be considered personal data, but not vice versa.
Data vs personal data vs special categories of data
We have seen how data differs from personal data, but there is another important data type: special category data. The three types nest within one another: special category data is a subset of personal data, which in turn is a subset of data in general.
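Unlike personal data, special category data is defined prescriptively under Article 9 of the GDPR, and consists of:
- personal data revealing racial or ethnic origin
- personal data revealing political opinions
- personal data revealing religious or philosophical beliefs
- personal data revealing trade union membership
- genetic data
- biometric data processed for the purpose of uniquely identifying a natural person
- data concerning health
- data concerning a natural person’s sex life or sexual orientation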
Data mapping and data classification
It is critical to know the personal data your organization handles and map how that personal data is processed, stored, transmitted, and ultimately deleted. This will help adapt your organizational and technical measures to protect the personal data according to their associated risk and prioritize wisely.
This can be achieved in one of two ways:
- Performing a data inventory, which consists of creating a record of the data that enters, resides in, or exits your organization
- Compiling a Record of Processing Activities (RoPA), which is similar to a data inventory but focuses on the activities (read ‘flows of data’) rather than on each data element. This takes less effort, though by no means a negligible amount.
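As a rough illustration of what a data inventory might record, here is a minimal Python sketch. The `InventoryEntry` class, its field names, and the sample rows are assumptions made for this example, not a prescribed format; they simply mirror the idea of tracking where data enters, resides in, and exits the organization, plus when it is deleted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InventoryEntry:
    """One row of a hypothetical data inventory."""
    data_element: str                 # what the data is
    source: str                       # where it enters the organization
    storage_location: str             # where it resides
    recipients: List[str] = field(default_factory=list)  # where it exits to
    retention_days: Optional[int] = None  # None: no deletion rule recorded yet

# Invented sample rows for illustration only.
inventory = [
    InventoryEntry("customer email address", "web form", "CRM database",
                   ["email service provider"], 730),
    InventoryEntry("passport scan", "HR onboarding", "shared drive"),
    InventoryEntry("CCTV footage", "office cameras", "on-site recorder", [], 30),
]

def missing_retention(entries):
    """Return the data elements that have no retention period recorded."""
    return [e.data_element for e in entries if e.retention_days is None]

print(missing_retention(inventory))
```

Even a simple structure like this makes gaps visible, such as data that is stored with no rule for when it should be deleted.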
Data classification sorts the data into several categories. For commercial organizations, a typical classification scheme is composed of:
- strictly confidential
- confidential
- personal
- internal use only
- public
These two activities can be considered two sides of the same coin: a data inventory or RoPA without data classification is not particularly useful, and data classification cannot take place without a data inventory or RoPA.
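The interplay between the two can be sketched in Python. The `Classification` enum below uses the five labels from the scheme above, but the numeric ordering (higher means more restrictive) is an assumption for this example, as are the inventory items and their labels.

```python
from enum import IntEnum

class Classification(IntEnum):
    """Labels from the scheme above; the numeric ranking is an assumption."""
    PUBLIC = 1
    INTERNAL_USE_ONLY = 2
    PERSONAL = 3
    CONFIDENTIAL = 4
    STRICTLY_CONFIDENTIAL = 5

# Hypothetical inventory items and the labels assigned so far.
inventory = ["passport scan", "marketing brochure", "payroll record", "visitor log"]
labels = {
    "passport scan": Classification.PERSONAL,
    "marketing brochure": Classification.PUBLIC,
    "payroll record": Classification.CONFIDENTIAL,
}

# Items that appear in the inventory but have not been classified yet.
unclassified = [item for item in inventory if item not in labels]

# Classified items, most restrictive first, to help prioritize protection.
by_risk = sorted(labels, key=lambda item: labels[item], reverse=True)

print(unclassified)
print(by_risk)
```

Without the inventory there would be nothing to label, and without the labels the inventory could not be used to decide which items need the strongest protection first.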
Where to find personal data
To support your search for personal data, it is a good idea to build an asset inventory first. That makes it easier to determine what data may be circulating through your devices and systems. Regardless, these are some locations to examine:
- Digital records and physical records held as part of a filing system
- Structured and unstructured databases
- Data lakes and data warehouses, which often hold copious amounts of personal data
- Web forms used on your website or in any other method of communicating with your customers
- Cookies and related technologies deployed on your website, in particular those set by third parties, if present
- Shared drives used internally, particularly those accessible to external parties
- Email inboxes
- Bring Your Own Device (BYOD) phones, including SIM cards and internal memory
- Removable media such as USB drives, CDs, and DVDs
- Dormant accounts, with usernames and other data
- HR records
- Accident books containing health-related data
As part of digitalization, I have seen many organizations struggle with data previously (or concurrently) stored in physical form. Office filing cabinets holding all sorts of paper documents, including passport scans, are still quite common. Their bigger brother is the ‘data vault,’ an entire room dedicated to storing printouts. These pose a high risk to the organization, and often an unquantified one, because the data is normally just dumped there without any form of consideration. The effort and time required to categorize these documents can be daunting, and the process is prone to errors if attempted using a ‘speedy’ approach.
Tools exist to aid with the data discovery exercise, but a good policy needs to be in place to ensure newly acquired data is appropriately labeled and safely stored.
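To give a flavor of what such tools do at their simplest, here is a minimal Python sketch that scans text for two kinds of personal data. The patterns, the `scan_text` helper, and the sample string are illustrative assumptions; real discovery tools rely on far more robust detection and will catch cases these patterns miss.

```python
import re

# Deliberately simple, illustrative patterns; not suitable for production use.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def scan_text(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

# Made-up sample text containing an email address and a phone number.
sample = "Contact jane.doe@example.com or call +44 20 7946 0958 about order 1234."
findings = scan_text(sample)
print(findings)
```

A scan like this only locates candidate data. Labeling and safely storing what it finds still depends on the policy described above.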
This looks like a lot of effort. Why bother?
The consequences of not protecting personal data, which as we have seen requires knowledge of where the data is and what it is, could be severe and can be framed as follows:
Should a security incident involving personal data occur, it constitutes a data breach and may mandate notification and reporting to supervisory authorities and data subjects, depending on the extent of the breach.
Although personal data is pervasive and perceived as a commodity, it has been elevated by data-protection laws worldwide to something that needs to be handled with care. Organizations must be mindful of applicable laws and act in accordance with them to avoid nasty surprises that may impact their business objectives.
Anselmo Diaz is an experienced principal consultant and associate lecturer with an extensive academic background in law, information security, and engineering, including globally recognized certifications such as Fellow of Information Privacy (FIP), CIPP/E, CIPM, CIPT, CDPSE, and CISSP.