These days, often without even being aware of it, we are constantly subjected to algorithms utilising machine learning mechanisms. Examples include the Captchas on web pages and our interactions with streaming services like Spotify and Netflix, where we are continuously flooded with suggestions for future viewing and listening content based on the choices we made in the past. Aside from the ethical objections that may be raised against this deployment of machine learning technology, there is another issue at stake here – the potentially problematic state of a thing called privacy. This is why data protection authorities are keeping a watchful eye on businesses making large-scale use of machine learning algorithms. Recently, a new type of machine learning was introduced which allegedly works in a more privacy-friendly manner, as smaller volumes of data are required for ‘training’ the algorithms. This new technique, known as ‘federated learning’ and developed by Google, is getting a lot of good press from computer scientists because of its practical application options. In this blog, we will take a closer look at federated learning to see if it really makes a difference in the protection of our privacy.
What is federated learning?
Federated learning is a special form of machine learning which, in turn, is a form of artificial intelligence allowing software applications to reach enhanced predictive accuracy without having been explicitly programmed for the purpose. In general, machine learning algorithms utilise historical data for the prediction of new output values. But where most other algorithms achieve their learning by drawing from large databases centrally storing huge collections of information, the principles of federated learning are based on a significantly different system of learning conditions. With federated learning, the data required for training the algorithms never leave the physical devices on which they have been created. Federated learning, in other words, introduces a process in which the machine learning algorithm goes to the device and the learning model is installed on all ‘local’ devices. When a federated learning algorithm is being trained to recognise certain word suggestions in search engines, the model will be installed on, for example, individual mobile phones on the condition of consent given by the users. This model is then trained based on input from the phone’s keyboard as used in search engines. Once the algorithm has been sufficiently trained on the local device, the results – the updated model parameters, not the underlying keystrokes – are shared with a centralised database. What this means is that at no point in the process is privacy-sensitive information shared with the organisation executing the model.
In summary, the process is as follows. First, the central server sends copies of the centralised model to the local devices, where the training program is installed. Next, the model is updated with data available on the individual device, after which, in a third step, the local devices send the updated models back to the central server. This server then uses all the updated models from all the local devices to update the central model, and the process is ready to start all over again. This way, the exchange of information is limited to the updated model parameters, with no unnecessary data being shared with the central server.
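The four steps above can be sketched in code. The toy example below is a minimal illustration, not Google’s actual implementation: it assumes a one-parameter linear model (y = w·x) trained by gradient descent, three hypothetical devices each holding a few private data points, and illustrative values for the learning rate and number of rounds.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Step 2: train the received model copy on the device's own data.
    The raw (x, y) pairs never leave this function."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w  # only the updated parameter is sent back, never the data

def federated_round(w_global, all_client_data):
    """Steps 1, 3 and 4: send copies of the model out, collect the
    locally updated parameters, and average them into a new central model."""
    updates = [local_update(w_global, data) for data in all_client_data]
    return sum(updates) / len(updates)

# Three hypothetical devices, each privately holding samples of roughly y = 3x
clients = [
    [(1.0, 3.1), (2.0, 5.9)],
    [(1.5, 4.4), (3.0, 9.2)],
    [(0.5, 1.6), (2.5, 7.4)],
]

w = 0.0
for _ in range(20):                  # the process "starts all over again"
    w = federated_round(w, clients)
print(round(w, 2))                   # the central model approaches the true slope of ~3
```

Note that the central server only ever sees the three returned parameter values per round; the design choice of averaging local updates, rather than pooling raw data, is exactly what keeps the training data on the devices.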
Is federated learning good news for our privacy?
The impact of federated learning on our privacy can be described in terms of the basic guidelines of the GDPR. For one thing, federated learning can help simplify compliance with the principle of data minimisation mentioned in Article 5(1)(c) GDPR, which specifies that personal data should be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.
In the context of federated learning applications, what this means is that the party executing the model may only use data strictly necessary for appropriate training of the model. Where these training data are personal data in the sense of the GDPR, the advantage of federated learning is immediately apparent in that no training data are being sent to a central server for processing, which prevents unnecessary duplication of data. By avoiding duplication and centralisation of information, federated learning can help reduce the risk of data being re-used for purposes incompatible with the original purpose of collection. As a result, federated learning also contributes to better adherence to the principle of purpose limitation mentioned in Article 5(1)(b) GDPR, which requires that personal data be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes. In these two significant and privacy-relevant ways, federated learning constitutes an important step forward compared to more traditional forms of machine learning algorithms.
Federated learning is a form of machine learning that differs from other models in that, instead of relying on a system of central data storage and processing, it works according to a setup in which model training is handled on multiple local devices, with only anonymised parameters being shared with the central server. In other words, no information is shared with the central server other than what is strictly necessary for enhancing the model, and no copies of the data are stored centrally. Information related to the user of the local device never leaves that device. In short, federated learning conforms to the GDPR principles of data minimisation and purpose limitation and, in doing so, marks a significant step forward in the general protection of privacy.