Data privacy is all the rage these days, especially since Apple decided to make it one of the company’s core values. Let's take a little peek into how it works behind the scenes.
Let’s start with what needs to be protected. Not all data is sensitive in nature. Broadly speaking, sensitive data can be divided into two buckets: PII and PCI. PII (personally identifiable information) lets you identify a unique individual; think email or SSN. PCI data is payment-related data, such as card numbers, that companies capture while processing payments. The name comes from the Payment Card Industry standards that govern it.
Most companies capture user data with the aim of personalization.
Personalization → Better user experience → Growth
They don't actually care about spying on you using your data. Instead, user data is a liability for them because if it’s leaked, let’s just say it'll be a nightmare.
So how do they protect your data? There are two facets to this. Encryption protects a chunk of data, like a file that is about to be transferred, whereas data that is going to live in a table that others will look at is protected with tokenization.
Encryption, as the name suggests, uses a (public) key to encrypt data, which can then only be decrypted by a trusted entity holding the matching private key. To everyone else, this data looks like writing on a cave wall. The main use case is transferring data externally. Here your only responsibility is to protect the private key, and this is done by storing the key in a “vault”. A vault is like a safe that can only be accessed by the right people; HashiCorp Vault is one example.
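To make the public/private key idea concrete, here is a minimal sketch using textbook RSA with tiny primes. It is a toy for illustration only and is not secure: the primes, the byte-by-byte encryption, and the function names `encrypt` and `decrypt` are all assumptions made for this example. Real systems would rely on a vetted cryptography library with large keys and proper padding.

```python
from typing import List, Tuple

# Toy textbook RSA with tiny primes: illustration only, NOT secure.
p, q = 61, 53
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # safe to hand to whoever sends you data
private_key = (d, n)         # the part that would live in the vault


def encrypt(message: str, key: Tuple[int, int]) -> List[int]:
    """Encrypt each UTF-8 byte separately with the given key."""
    exponent, modulus = key
    return [pow(b, exponent, modulus) for b in message.encode("utf-8")]


def decrypt(ciphertext: List[int], key: Tuple[int, int]) -> str:
    """Reverse encrypt() using the matching key."""
    exponent, modulus = key
    return bytes(pow(c, exponent, modulus) for c in ciphertext).decode("utf-8")


ciphertext = encrypt("jane@example.com", public_key)
print(ciphertext)                        # a list of numbers, unreadable as text
print(decrypt(ciphertext, private_key))  # the original email again
```

Anyone can encrypt with the public key, but only the holder of the private key can get the original value back, which is why the private key is the one thing that has to be locked away.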
Tokenization entails creating a token for a particular piece of data, like a customer’s name. The token then replaces the actual data in the table, hence protecting it. These tokens look like a combination of letters that shouldn’t be together. They’re created by passing the data through a special tokenization function. To get the original value back, the token needs to be passed through the inverse function. Different token mechanisms can be used for different types of data.
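As a rough sketch of what this can look like, the example below tokenizes a name column using a lookup table: a random token is generated per value, and the mapping is kept so the token can be reversed later. This is one possible mechanism chosen for illustration, not necessarily how any given company does it; the class name `TokenStore`, its methods, and the `tok_` prefix are hypothetical.

```python
import secrets


class TokenStore:
    """Hypothetical lookup-table tokenizer: maps values to random tokens and back."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)  # random, so it reveals nothing about the value
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # The "inverse function": only possible with access to the stored mapping.
        return self._value_by_token[token]


store = TokenStore()
rows = [{"name": "Jane Doe", "plan": "premium"}, {"name": "John Roe", "plan": "free"}]

# Replace the sensitive column with tokens before the table is shared.
protected = [{**row, "name": store.tokenize(row["name"])} for row in rows]
print(protected)
print(store.detokenize(protected[0]["name"]))
```

Because the same value always gets the same token, analysts can still count or join on the tokenized column without ever seeing the real names.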
Now you might wonder: if this is done merely using a function, can’t anyone who has the function get the data back? Here's where we come to the next point, which is having a secure space where you handle sensitive data. Only in this environment can you access the vault mentioned above for decryption, or the tokenization functions. An example of such an environment is a secure server or a secure Spark cluster, where it all goes down.
Having said that, some teams or individuals in a company are still going to have access to user data, and this is where the company needs to have practices in place for how sensitive data is handled.
Here’s what we covered today:
What data needs to be protected?
How is data protected?
Encryption
Tokenization
Having a safe space
Liked this? Check out this thread about how Apple's privacy changes affect marketing:
If you’re into this, you can find me on X @abhishek27297 where I talk data.