A data controller should use a technical measure that follows the two design goals and pseudonymization properties defined by the GDPR:
D1: The pseudonyms should not allow easy re-identification by any third party (other than the data controller or data processor) within a specific data processing context.
D2: It should not be trivial for any third party (other than the data controller or data processor) to reproduce the pseudonyms, so that the same pseudonyms are not used across different data processing domains.
The following techniques are generally considered strong with respect to the GDPR, since they follow both D1 and D2:
Currently, encryption is regarded as the most robust pseudonymization technique. Without access to the decryption key, retrieving the original identifier from the encrypted data is very difficult.
A hash function whose output depends on both the input and a secret key or salt is a good technique. Depending on the key or salt chosen, different pseudonyms can be generated for the same input, which makes the pseudonyms hard to re-identify or reproduce without knowing the key or salt.
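As a rough illustration of the keyed hash approach, the sketch below derives pseudonyms with HMAC-SHA256 from Python's standard library. The function name pseudonymize, the two domain keys, and the choice of HMAC-SHA256 are assumptions made for this example, not something the text above prescribes. It shows that the same identifier maps to different pseudonyms under different keys, and that the same key always reproduces the same pseudonym.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a keyed pseudonym (HMAC-SHA256, hex-encoded) for an identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical per-domain secret keys; a real deployment would load them
# from a protected key store rather than generate them on the fly.
billing_key = secrets.token_bytes(32)
analytics_key = secrets.token_bytes(32)

phone = "017655762298"
billing_pseudonym = pseudonymize(phone, billing_key)
analytics_pseudonym = pseudonymize(phone, analytics_key)

# Same input, different keys: the pseudonyms cannot be linked across domains.
print(billing_pseudonym != analytics_pseudonym)
# Same input, same key: the controller can reproduce the pseudonym.
print(pseudonymize(phone, billing_key) == billing_pseudonym)
```

Using a separate key per processing domain is one way to address D2, since a party without the key cannot recompute the pseudonym for a known identifier.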
The following techniques are generally considered weak with respect to the GDPR, since they follow neither D1 nor D2:
A cryptographic hash function creates one-way pseudonyms, so even a third party who knows the pseudonyms cannot invert the hash function to recover the initial data. However, when the set of possible identifiers is small or predictable, it is vulnerable to brute-force and dictionary attacks: an attacker can hash candidate identifiers and compare the results with the pseudonyms.
Masking hides part of an individual’s identifier with random characters or other data. If masking is not carefully designed, different users can be assigned the same pseudonym, leading to collisions.
Scrambling mixes or rearranges the characters of an identifier, e.g. 017655762298 could be represented as 982756571026. A simple rearrangement is easy to reverse, which reveals the initial identifier.
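The weaknesses described above can be sketched in a few lines of Python. Everything here is illustrative: the helper names, the candidate range used for the dictionary attack, the masking rule (keep the first four characters), and the fixed permutation are assumptions chosen for this example. The permutation was picked so that it reproduces the scrambled number used in the text.

```python
import hashlib


def plain_hash(identifier: str) -> str:
    """Unkeyed SHA-256 pseudonym: anyone can recompute it."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates):
    """Hash each candidate and return the one matching the pseudonym, if any."""
    for candidate in candidates:
        if plain_hash(candidate) == target_hash:
            return candidate
    return None


def mask(identifier: str, visible: int = 4) -> str:
    """Keep the first `visible` characters and hide the rest."""
    return identifier[:visible] + "*" * (len(identifier) - visible)


# Hypothetical fixed permutation: output position i takes input character PERMUTATION[i].
PERMUTATION = [10, 11, 8, 2, 4, 3, 5, 6, 1, 0, 9, 7]


def scramble(identifier: str) -> str:
    return "".join(identifier[i] for i in PERMUTATION)


def unscramble(scrambled: str) -> str:
    """Invert the permutation to recover the original identifier."""
    original = [""] * len(scrambled)
    for position, source_index in enumerate(PERMUTATION):
        original[source_index] = scrambled[position]
    return "".join(original)


phone = "017655762298"

# 1. Plain hashing: a small candidate space is searched exhaustively.
candidates = [f"0176557622{n:02d}" for n in range(100)]
recovered = dictionary_attack(plain_hash(phone), candidates)

# 2. Masking: two different users end up with the same pseudonym.
collision = mask("017655762298") == mask("017699999999")

# 3. Scrambling: whoever learns the rearrangement can undo it.
scrambled = scramble(phone)
restored = unscramble(scrambled)

print(recovered, collision, scrambled, restored)
```

The candidate list here covers only 100 numbers to keep the example fast; a real phone number space is larger but still small enough to enumerate, which is why unkeyed hashing of such identifiers is considered weak.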
Explore more implementation guidelines.