For years, telecom provider T-Mobile and the Dutch Central Bureau of Statistics (CBS) have been working together behind closed doors to develop an algorithm for measuring the residence patterns and mobility behaviour of Dutch citizens. As part of this work, CBS was given access, for analysis purposes, to pseudonymised location data covering T-Mobile’s entire subscriber base. But what led to the choice of pseudonymisation? Why weren’t the data anonymised instead?
On March 10, 2021, Dutch newspaper NRC published a remarkable story revealing that the Dutch Central Bureau of Statistics (CBS) had been given access to location data relating to T-Mobile customers in the Netherlands since as early as 2017. Under the telecom provider’s supervision, several CBS employees were allowed to analyse data tracking the whereabouts of mobile phone users while on site at T-Mobile’s Dutch head office. Responding to the article, CBS stated that the information studied by its employees had consisted of pseudonymised personal data, which had been used to develop an algorithm for measuring residence patterns and mobility behaviour from location data such as those held by T-Mobile. Local authorities might use information of this kind, for instance, when deciding on infrastructure investments, or when deciding whether to clear or close potentially overcrowded areas.
The matter has since become a topic of heated discussion, both in the public sphere and in the political arena, prompting questions in the Dutch House of Representatives, where Member of Parliament Mr Verhoeven asked: ‘What other CBS pilot projects involving large-scale data collection are currently active?’ It is an obvious question, given that this particular data processing was reportedly aimed at measuring mobility behaviour across the Netherlands as a whole. If the research really was meant to be nationwide, T-Mobile’s customer base alone may have been too limited in size and diversity to provide the full picture. Could the T-Mobile case, then, be just one of several instances of CBS cooperating with other organisations?
Large-scale data processors typically regard location data, and metadata more generally, as information that poses no serious threat to the right to privacy, let alone a violation of it. That view is incorrect. Location data can paint a very detailed picture of a person’s movements, regular contacts, frequently visited shops and place of residence. They may even reveal health-related information: monitoring and sharing the daily movements of a district nurse, for instance, will reveal the addresses, and thus the identities, of her patients, along with the times and frequency of her visits.
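To illustrate how little a name is needed, here is a minimal Python sketch, assuming a made-up list of location pings that carry only a pseudonym, a timestamp and an area label. Simply taking the area where a pseudonym is most often seen at night already gives a plausible guess at that person’s place of residence. The night-time window (22:00 to 06:00), the function name `likely_residence` and the sample data are all hypothetical; real inference from telecom data would be far more elaborate.

```python
from collections import Counter
from datetime import datetime


def likely_residence(pings, pseudonym, night_start=22, night_end=6):
    """Guess a residence as the area most often seen at night (assumed heuristic)."""
    # Keep only this pseudonym's pings that fall inside the night-time window.
    night_areas = [
        area
        for who, ts, area in pings
        if who == pseudonym and (ts.hour >= night_start or ts.hour < night_end)
    ]
    if not night_areas:
        return None
    # The most frequent night-time area is taken as the likely home location.
    return Counter(night_areas).most_common(1)[0][0]


# Hypothetical pings: (pseudonym, timestamp, area label). No names or IMSIs.
pings = [
    ("a1", datetime(2021, 3, 1, 23, 10), "cell-17"),
    ("a1", datetime(2021, 3, 2, 2, 40), "cell-17"),
    ("a1", datetime(2021, 3, 2, 9, 15), "cell-04"),
    ("a1", datetime(2021, 3, 2, 14, 5), "cell-04"),
    ("a1", datetime(2021, 3, 2, 16, 30), "cell-04"),
    ("b2", datetime(2021, 3, 2, 1, 0), "cell-09"),
]

print(likely_residence(pings, "a1"))  # prints cell-17
```

Note that "a1" spends more daytime pings in `cell-04`, yet the night-time filter points to `cell-17`; linking that area to an address register would be the next step towards identification.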
To determine whether the data subjects’ privacy has been violated in a specific case, we first need to establish what exactly ‘pseudonymised personal data’ means. That is the central question of this week’s blog: ‘What does it take, in terms of the GDPR, for personal data to qualify as pseudonymised?’
Parliamentary questions and further investigation
The publicity surrounding this case has prompted the Dutch Telecom Agency (TA), together with the Data Protection Authority (DPA), to investigate whether it constitutes a violation of applicable privacy legislation. At the same time, the matter has led to debate in the Dutch House of Representatives, with Member of Parliament Mrs Buitenweg submitting several questions: ‘What, in this context, is the minister’s definition of pseudonymised personal data? Does telecommunication traffic information from which only the unique IMSI numbers have been removed still qualify as personal data?’ The latter question in particular is relevant in determining whether the GDPR has been violated: for that to be the case, the information involved has to qualify as personal data, which anonymous information does not.
Since the GDPR does not apply to anonymous data, no violation can have occurred if the data involved are shown to be clearly anonymous, meaning, crucially, data that do not allow the data subject to be identified, or that never related to an individual person in the first place. Also outside the scope of the GDPR are ‘anonymised personal data’: information that could originally be traced back to an individual person, but from which the possibility of identification has been irreversibly removed by applying specific processing techniques. This also means that once information has been anonymised, no additional data may be available that would still allow natural persons to be identified.
Pseudonymised personal data
In its press statement, CBS says that the information it was given access to consisted of ‘pseudonymised personal data’. As the existence of a separate term suggests, pseudonymisation refers to a procedure or set of techniques distinct from anonymisation. Not surprisingly, then, one of the parliamentary questions mentioned above specifically asks for a definition of ‘pseudonymised personal data’. That is a relevant question indeed, since pseudonymised personal data, unlike anonymised data, do fall within the scope of the GDPR: with the use of additional information, they can still be traced back to individual natural persons. Article 4(5) GDPR accordingly describes pseudonymisation as processing personal data in such a way that they can no longer be attributed to a specific data subject without the use of additional information, provided that this information is kept separately and protected by technical and organisational measures. Unlike anonymisation, then, pseudonymisation is not irreversible: the additional information that allows individuals to be identified still exists somewhere, in some form.
According to CBS, the pseudonymisation process consisted of two steps. First, the IMSI numbers (globally unique numeric codes linked to SIM cards, and through them to identifiable subscribers) were replaced with random numbers. Next, an additional layer of encryption was applied, with a new key used every 30 days.
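The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual CBS or T-Mobile implementation: the source does not name the encryption scheme, so HMAC-SHA256 from the standard library stands in for it (unlike real encryption, an HMAC cannot be decrypted, but it shows the effect of a key that changes every 30 days). The class `Pseudonymiser`, the `EPOCH` start date of the key periods and the sample IMSI are all hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from datetime import date

ROTATION_DAYS = 30          # key lifetime reported by CBS
EPOCH = date(2017, 1, 1)    # hypothetical start of the first key period


class Pseudonymiser:
    """Hypothetical two-step pseudonymiser: random IDs, then rotating keys."""

    def __init__(self) -> None:
        self._random_ids: dict[str, int] = {}     # IMSI -> random number
        self._period_keys: dict[int, bytes] = {}  # period index -> secret key

    def _random_id(self, imsi: str) -> int:
        # Step 1: replace the IMSI with a random number, reused for that IMSI.
        if imsi not in self._random_ids:
            self._random_ids[imsi] = secrets.randbits(64)
        return self._random_ids[imsi]

    def _key_for(self, day: date) -> bytes:
        # A fresh secret key for every 30-day window counted from EPOCH.
        period = (day - EPOCH).days // ROTATION_DAYS
        if period not in self._period_keys:
            self._period_keys[period] = secrets.token_bytes(32)
        return self._period_keys[period]

    def pseudonymise(self, imsi: str, day: date) -> str:
        # Step 2: key the random number with the current period's key.
        # HMAC-SHA256 stands in for the unspecified encryption step.
        rid = self._random_id(imsi).to_bytes(8, "big")
        return hmac.new(self._key_for(day), rid, hashlib.sha256).hexdigest()

    def reidentify(self, pseudonym: str, day: date) -> str | None:
        # Whoever holds both tables can link a pseudonym back to an IMSI,
        # which is why this is pseudonymisation rather than anonymisation.
        for imsi in self._random_ids:
            if self.pseudonymise(imsi, day) == pseudonym:
                return imsi
        return None


p = Pseudonymiser()
imsi = "204160123456789"  # made-up IMSI for illustration
march_1 = p.pseudonymise(imsi, date(2021, 3, 1))
march_2 = p.pseudonymise(imsi, date(2021, 3, 2))    # same 30-day period
april_15 = p.pseudonymise(imsi, date(2021, 4, 15))  # a later 30-day period
print(march_1 == march_2, march_1 == april_15)
print(p.reidentify(april_15, date(2021, 4, 15)) == imsi)
```

In this sketch the same subscriber keeps one pseudonym within a 30-day period but receives a new one afterwards, which presumably limits how long movements can be linked together. The `reidentify` method makes the legal point concrete: as long as the random-number table and the keys exist, pseudonyms can be traced back to IMSI numbers, and that is exactly the kind of ‘additional information’ that keeps such data within the scope of the GDPR.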
In conclusion, it remains very difficult to say definitively whether the T-Mobile case constitutes a privacy violation, primarily because CBS reportedly had access only to pseudonymised personal data, and only under T-Mobile’s supervision. In other words, personal data were not shared in the literal sense of the word. It is important to remember, however, that the claim that access was restricted to pseudonymised data comes from CBS; the joint TA and DPA investigation will have to establish whether it matches the facts. Likewise, whether CBS and T-Mobile correctly interpreted the concept of ‘pseudonymised personal data’ in this case cannot be answered until that investigation has concluded and the Dutch government, currently a caretaker (demissionary) government, has answered the parliamentary questions.