Data pseudonymisation
Pseudonymisation is a security measure. Pseudonymising personal data makes it more difficult to trace the data back to individuals. This reduces, for example, the impact of a data breach. Hashing is one of the methods you can use to pseudonymise personal data.
Pseudonymisation is not anonymisation
Pseudonymisation is not the same as anonymisation. In the case of pseudonymised data it is not immediately clear to which individuals the data relate, but the data may still be traceable to specific individuals by using additional data.
Organisations, for example, often store pseudonyms together with additional information. If this additional information is sufficiently distinctive, it can be used to trace the pseudonymised data back to individuals.
A birthdate in combination with the four digits of a postcode, for example, is often sufficiently unique to identify a person with information from the Key Register of Persons (Dutch BRP).
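The risk described above can be checked on your own data by counting how many records have a combination of date of birth and four-digit postcode that occurs only once. Any such record stands out and could be matched against an external register. The following Python sketch is an illustration only: the records are made up, and the helper `unique_share` is a hypothetical name, not an existing tool.

```python
from collections import Counter

# Hypothetical records: names have been removed, but quasi-identifiers remain.
records = [
    {"birthdate": "1985-03-14", "postcode4": "3511", "diagnosis": "A"},
    {"birthdate": "1985-03-14", "postcode4": "3512", "diagnosis": "B"},
    {"birthdate": "1990-07-01", "postcode4": "3511", "diagnosis": "C"},
    {"birthdate": "1990-07-01", "postcode4": "3511", "diagnosis": "D"},
    {"birthdate": "1972-11-30", "postcode4": "1012", "diagnosis": "E"},
]

def unique_share(rows, keys):
    """Return the fraction of rows whose combination of values for `keys` occurs only once."""
    combos = [tuple(r[k] for k in keys) for r in rows]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(combos)

print(unique_share(records, ["birthdate"]))               # 0.2
print(unique_share(records, ["birthdate", "postcode4"]))  # 0.6
```

Adding the postcode to the date of birth raises the share of records that are unique in this small sample from 20% to 60%. In a real dataset the effect depends on its size and the spread of the values.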
Guidelines on anonymisation and pseudonymisation
The European Data Protection Board (EDPB) is working on guidelines on anonymisation and pseudonymisation. As soon as the guidelines have been published, you will find them on this website.
Compliance with the GDPR in the case of pseudonymisation
Pseudonymisation is a form of processing personal data, so you have to comply with the General Data Protection Regulation (GDPR). Among other things, this means that you must still protect the pseudonymised data properly.
Note: if you only remove directly identifying information, such as names, while the remaining data still contain enough information to trace them back to individuals, this does not count as pseudonymisation.
Hashing of personal data
Hashing is the application of a calculation (a cryptographic hash function) that converts input data of any size into output of a fixed size: the hash value. Hashed personal data are often pseudonymised personal data, so you must also comply with the GDPR if you apply hashing.
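As an illustration of the fixed output size, the sketch below uses SHA-256 from Python's standard `hashlib` module. SHA-256 is just one example of a cryptographic hash function; the helper name `sha256_hex` and the input values are hypothetical.

```python
import hashlib

def sha256_hex(value: str) -> str:
    """Hash a string with SHA-256 and return the 64-character hexadecimal digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

short_digest = sha256_hex("0612345678")  # hypothetical telephone number
long_digest = sha256_hex("x" * 10_000)   # much longer input

# Both digests are 64 hex characters (256 bits), regardless of input length.
print(len(short_digest), len(long_digest))
# The same input always gives the same hash value.
print(sha256_hex("0612345678") == short_digest)
```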
Cryptographic hash functions are designed to make it as difficult as possible to recover the original data from a hash value. In practice, however, it turns out that hashed personal data can often still be traced back to an individual.
Hashed data often not anonymous
These are the 3 most common reasons why hashed personal data are often not anonymous:
- Original data are often still available. Data are not anonymous as long as the original data are still available. By hashing all the original data again and storing the resulting pseudonyms alongside the original data, you can build a lookup table. With this lookup table, the pseudonymised data can still be linked to the original data. In practice, this often turns out to be possible. In that case, the data are pseudonymised, not anonymised.
- Hash values are reproducible. The same input always produces the same hash value, and in practice each distinct input has its own hash value. You can therefore recover the original data by calculating the hash values of all possible original data until you find a match. This is called a 'brute force' attack. For example: you have a list of hashed Dutch telephone numbers. Someone could then hash all telephone numbers in the Netherlands and compare the hash values with your list. Given the processing power of computers, this is also a risk for data such as citizen service numbers and IP addresses.
- Additional information may be available. With additional information that is available at the organisation itself or from external sources it may be possible to trace pseudonymised data back to individuals.
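The first two reasons can be illustrated with a short Python sketch. It uses SHA-256 from the standard `hashlib` module as an example hash function; the telephone numbers and the helper names `sha256_hex` and `brute_force` are hypothetical. To keep the run short, the brute-force part only enumerates a tiny, made-up range (`06` followed by four digits). The real range of Dutch mobile numbers (`06` followed by eight digits) contains about 100 million candidates, which is still a feasible amount of work for an ordinary computer.

```python
import hashlib

def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Reason 1: the original data are still available, so a lookup table can be rebuilt.
originals = ["0612345678", "0687654321"]         # hypothetical source data
pseudonyms = [sha256_hex(v) for v in originals]  # the "pseudonymised" column
lookup = {sha256_hex(v): v for v in originals}   # re-hash the originals to link back
print([lookup[p] for p in pseudonyms])           # recovers the original numbers

# Reason 2: hash values are reproducible, so the whole input space can be enumerated.
def brute_force(target_hashes, prefix="06", digits=4):
    """Hash every string `prefix` + `digits` digits and return those matching a target.

    `digits=4` is a deliberately tiny, hypothetical search space; an attacker
    would enumerate the full range of valid numbers instead.
    """
    found = {}
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"
        h = sha256_hex(candidate)
        if h in target_hashes:
            found[h] = candidate
    return found

leaked = {sha256_hex("061234"), sha256_hex("069999")}  # hypothetical hashed list
print(sorted(brute_force(leaked).values()))            # ['061234', '069999']
```

Neither attack needs to invert the hash function itself: both simply recompute hash values and compare them.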
Truncating hashes
Do you want to anonymise personal data by hashing them and then truncating the hash values? If so, make sure that you remove enough. If you do not remove enough of each hash value, unique identifiers will remain, and the data will not be anonymised. See also: Tech blogpost: the practical problems of trimming hashes
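The effect of truncation can be explored with a small Python sketch: hash a set of identifiers, keep only the first few hexadecimal characters (4 bits each), and measure how many truncated values are still unique. The population of 10,000 phone-number-like strings and the helper names `truncated_hash` and `share_unique` are hypothetical, and SHA-256 is used only as an example hash function.

```python
import hashlib
from collections import Counter

def truncated_hash(value: str, hex_chars: int) -> str:
    """SHA-256 hash of `value`, keeping only the first `hex_chars` hex characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:hex_chars]

def share_unique(values, hex_chars):
    """Fraction of inputs whose truncated hash is not shared with any other input."""
    hashes = [truncated_hash(v, hex_chars) for v in values]
    counts = Counter(hashes)
    return sum(1 for h in hashes if counts[h] == 1) / len(hashes)

# Hypothetical population of 10,000 phone-number-like identifiers.
population = [f"06{n:08d}" for n in range(10_000)]

for hex_chars in (16, 4, 2):
    print(hex_chars, share_unique(population, hex_chars))
```

With these inputs you should see that keeping 16 characters (64 bits) leaves every value unique, and that even 4 characters (16 bits) still leaves the large majority unique (around 85%), because 10,000 values are spread over 65,536 possible outcomes. Only at 2 characters (8 bits, 256 outcomes) do virtually all values collide. The right amount to remove therefore depends on the number of people in the dataset, not just on the length of the hash.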