Because they consume large volumes of personal data to train their models, artificial intelligence projects must comply with a number of rules at every operational stage, from data collection to fully automated decision-making.


Valued at $136.6 billion in 2022, the global artificial intelligence market is expected to reach $1,811.8 billion by 2030, a compound annual growth rate of 38.2% over the period, according to figures from Grand View Research.

However, the various artificial intelligence technologies (Deep Learning, Machine Learning, Natural Language Processing, Machine Vision, etc.) are heavy consumers of personal data. A framework must therefore be established so that this data is used in compliance with regulations, in particular the GDPR.


A draft European regulation

In April 2021, the European Commission published its draft regulation on trustworthy artificial intelligence, intended to guarantee the security and fundamental rights of citizens and businesses while encouraging the adoption of AI. The new rules will be directly applicable in all Member States. They follow a risk-based approach, with risk levels ranging from “minimal” to “unacceptable”.

In response to this draft European regulation, the CNIL and its European counterparts took a position by publishing an opinion. The data protection authorities particularly welcomed the risk-based approach adopted by the European Commission.

This should make it possible to focus the regulatory effort on the limited set of AI systems considered “high risk” for fundamental rights, such as AI technologies used in education and vocational training, in employment and workforce management, in credit risk assessment, or in policing.


Define a purpose and clearly distinguish the learning and production phases

The CNIL also recalls that, to comply with the GDPR, an artificial intelligence system based on the exploitation of personal data must always be developed, trained, and deployed with a well-defined purpose. This objective must be determined at the design stage of the project, be legitimate (compatible with the organization’s missions), and be explicit. It is this purpose that ensures only relevant data is used and that an appropriate retention period is chosen.

Furthermore, setting up an AI system based on machine learning involves two successive phases: the learning phase and the production phase. From a data protection point of view, these two steps do not serve the same purpose and should therefore be separated. This is particularly true of so-called “continuous” learning systems, in which the data collected during the production phase is also used to improve the system, forming a complete feedback loop.
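By way of illustration, here is a minimal Python sketch of how a pipeline might keep the two phases apart: each record carries the purpose attached at collection time, the training store refuses anything collected for another purpose, and moving production data into the learning phase requires an explicit, auditable re-qualification step. All names (Record, TrainingStore, etc.) are hypothetical; nothing here comes from the CNIL or from the GDPR text itself.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    TRAINING = "model_training"      # learning phase
    PRODUCTION = "service_delivery"  # production phase


@dataclass
class Record:
    subject_id: str
    payload: dict
    purpose: Purpose  # the purpose attached at collection time


class TrainingStore:
    """Holds only data whose purpose is model training."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def add(self, record: Record) -> None:
        # Refuse production data: in a "continuous learning" setup,
        # reusing it for training is a distinct purpose that must be
        # explicitly qualified (and documented) first.
        if record.purpose is not Purpose.TRAINING:
            raise ValueError(
                f"record {record.subject_id} was collected for "
                f"{record.purpose.value}, not for training"
            )
        self._records.append(record)


def requalify_for_training(record: Record) -> Record:
    """Explicit, auditable step turning a production record into
    training data (e.g., after a compatibility assessment)."""
    return Record(record.subject_id, record.payload, Purpose.TRAINING)
```

Making the re-qualification a distinct function keeps the feedback loop visible and documentable, rather than letting production data silently flow back into training.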


Proper database construction

AI systems, and especially those based on machine learning, require large volumes of data to train their models. To build their databases, companies can either collect personal data specifically for this purpose or reuse data already collected for another purpose. In the latter case, the question arises whether the new purpose is compatible with the purposes for which the data were initially collected, and under what conditions the initial database was formed.

In any case, the creation of personal databases, which often rely on long data retention periods, cannot come at the expense of the rights of the persons concerned. In particular, it must be accompanied by information measures: either at the time of collection, or, when the database is received from a third party, within one month of receipt.
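As a purely illustrative sketch (the class and field names are assumptions, and the fixed 30-day window only approximates the regulation’s “one month”), a pipeline ingesting third-party databases could track this information deadline per batch:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ImportedBatch:
    """A set of personal data received from a third party."""
    source: str
    received_on: date
    subjects_informed: bool = False

    def information_deadline(self) -> date:
        # GDPR Article 14: the persons concerned must be informed within
        # one month of receipt; 30 days is used here as an approximation.
        return self.received_on + timedelta(days=30)

    def is_overdue(self, today: date) -> bool:
        return not self.subjects_informed and today > self.information_deadline()


# Usage: flag batches whose information duty has not been met in time.
batches = [ImportedBatch("partner_crm_export", date(2023, 1, 10))]
print([b.source for b in batches if b.is_overdue(date(2023, 3, 1))])
# ['partner_crm_export']
```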


Fully automated profiling and decisions

Profiling is the processing of an individual’s personal data with a view to analyzing and predicting their behavior, such as determining their work performance, financial situation, health, preferences, or lifestyle. A fully automated decision is a decision made about a person, through algorithms applied to their personal data, without any human being involved in the process.

The two notions are intimately linked: profiling a person frequently leads to making a decision about them, and many fully automated decisions are made on the basis of profiling. Building profiles and applying algorithms to sets of personal data can thus lead to fully automated decision-making in fields as diverse as health, education, insurance, social protection, and the fight against fraud.

Under Article 22 of the GDPR, individuals have the right not to be subject to a fully automated decision – often based on profiling – that produces legal effects (a decision produces legal effects when it impacts a person’s rights and freedoms) or similarly significantly affects them. An organization can nevertheless automate this type of decision if the person has given their explicit consent, if the decision is necessary for a contract concluded with the organization, or if the automated decision is authorized by specific legal provisions.

In these cases, the person must be able to be informed that a fully automated decision has been made about them, to ask about the logic and criteria used to reach it, to challenge the decision and express their point of view, and to request the intervention of a human being who can reconsider the decision.
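A minimal sketch, assuming an in-house decision service with hypothetical names, of how these guarantees might be materialized in code: each automated decision retains the criteria that produced it, can be contested, and can be routed to a human reviewer empowered to change the outcome.

```python
from dataclasses import dataclass


@dataclass
class AutomatedDecision:
    subject_id: str
    outcome: str                # e.g. "rejected", "approved"
    criteria: dict[str, float]  # the logic/criteria used, kept so
                                # the person can ask to know them
    contested: bool = False
    human_reviewed: bool = False

    def explain(self) -> dict[str, float]:
        """Right to know the logic and criteria behind the decision."""
        return dict(self.criteria)

    def contest(self) -> None:
        """Right to challenge the decision and express a point of view."""
        self.contested = True


def human_review(decision: AutomatedDecision, new_outcome: str) -> None:
    """Right to human intervention: a reviewer may reconsider
    (and possibly change) the automated outcome."""
    decision.outcome = new_outcome
    decision.human_reviewed = True
```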

In its draft guide on recruitment, the CNIL analyzes the use of certain tools that automatically rank, or even evaluate, applications. Such solutions can, by design, lead to a “decision based exclusively on automated processing” when applications are discarded outright, or when applications are relegated to a secondary pile that humans never review for lack of time, for example.

Because of the risks associated with this method of decision-making, which is often opaque for candidates, such procedures are in principle prohibited by the GDPR. Their use is permitted only under exceptional conditions and is subject to specific safeguards intended to protect the rights and interests of candidates.

As we can see, artificial intelligence raises crucial new questions, particularly with regard to the protection of personal data. Companies must therefore keep a close watch on this field and put in place the appropriate measures to guarantee the rights of individuals.