ECMI Minorities Blog. Towards Evidence-Based Minority Policy: Processing of Ethnic Data and Monitoring the Quality of National Minority Protection


Author: Dr. Ljubica Djordjević

Over the years, a consensus has been established at the international level about the importance of ethnic data for the fight against discrimination, the protection of national minorities and the integration of diverse societies. Several bodies of the UN, the Council of Europe and the OSCE have called on states to provide statistical (aggregated) data broken down by ethnicity that can be indicative of the position of persons belonging to national minorities in various areas of life. Despite these clear and consonant voices at the international level, the practice of European states reveals considerable reluctance or opposition towards registering ethnicity in the population census, let alone in administrative registers or targeted surveys.

The main justification countries give for not collecting ethnic data lies in data protection legislation and the argument that the collection of such data is prohibited. The ‘sensitivity’ of personal ethnic data is indisputable: such data can reveal information on the basis of which an individual can be put into a vulnerable position, ranging from any form of disadvantage, exclusion from rights or services and exposure to violence to, in the most extreme cases, physical execution. The trauma of World War Two and the misuse of population data systems for the annihilation of Jews and Roma in Europe have a significant impact on the reluctance towards the collection of ethnic data today. In countries with a colonial past, the collection of ethnic data evokes memories of the classification of colonised populations and the creation of explicit or implicit hierarchies between different groups. More contemporary concerns regarding ethnic data collection relate to the misuse of data leading to discrimination against individuals and groups, ethnic profiling, and the perpetuation of negative stereotypes and stigmatisation of specific groups and the individuals belonging to them.

The ultimate safeguard against the misuse of ethnic data could be “not to gather or save data that permits associating an individual with a potentially vulnerable group”. Yet addressing the complexity of processing ethnic data through an absolute prohibition seems excessive and disproportionate. It would ignore and sacrifice the benefits that well-designed processing of ethnic data can offer. The core of these benefits relates to fighting discrimination, protecting and empowering minorities, and monitoring minority and integration policies. The European legal framework on personal data protection recognises the complexity of ethnic data processing and takes a balanced approach aimed at minimising the risks and utilising the benefits. It lists ethnic data among the ‘special categories of data’, commonly referred to as sensitive data, for which additional safeguards are prescribed, but these safeguards do not amount to an absolute prohibition on processing such data. In addition to legal safeguards, appropriate technical and organisational measures are also significant for ensuring the security of ethnic data: restricted (authorised) access to data, pseudonymisation and encryption of personal data, as well as measures ensuring the integrity and resilience of the processing system.

The controversy around ethnic data collection often stems from misunderstanding and/or neglecting the fact that the purpose of the processing can be achieved through aggregated (macro) data. Identifying population composition and access to employment, education, services etc., all broken down by ethnicity, as well as tracking the exposure of different ethnic groups to apparently neutral rules or practices (which can indicate indirect discrimination), rests on statistical data. Hence, it does not necessarily require personal data, notwithstanding the fact that “every statistic derives from data which were personal before they were converted into statistical information”. Against this background, anonymisation serves as an important method for minimising the risks of processing ethnic data. The main consequence of the anonymisation of data is that, by breaking the links between the data and an identified or identifiable (natural) person, the data ceases to be personal and as such does not fall under the special data protection regime. However, anonymisation as a data processing technique is not always applicable, because the enjoyment of certain minority rights presupposes an explicit declaration of ethnic affiliation and the processing of personal ethnic data (for instance, voting for minority representatives or benefiting from affirmative action measures aimed at supporting the participation of national minorities in public affairs). In such cases, the above-mentioned legal and technical safeguards are essential, as is the level of trust within society (stable interethnic relations and trust in the state as a data controller or as the main guarantor of lawful and fair ethnic data processing).
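
To illustrate the difference between personal (micro) data and aggregated (macro) data, the following is a minimal sketch in Python. The records, group labels and field names are invented for illustration. The small-cell suppression threshold (`MIN_CELL_SIZE`) is an additional assumption commonly associated with statistical disclosure control; it is not discussed in this text. Aggregation of this kind does not by itself guarantee that data count as anonymised in the legal sense.

```python
from collections import Counter

# Hypothetical micro-level records: (person_id, ethnicity, employed).
# All values are invented for illustration.
records = [
    ("p1", "A", True),
    ("p2", "A", False),
    ("p3", "A", True),
    ("p4", "B", True),
    ("p5", "B", True),
    ("p6", "C", False),
]

# Assumed threshold: groups smaller than this are suppressed so that
# published figures cannot easily be traced back to individuals.
MIN_CELL_SIZE = 2


def aggregate(records, min_cell_size=MIN_CELL_SIZE):
    """Turn personal records into a statistical table by ethnicity.

    The person identifier is discarded; only counts and rates remain.
    """
    totals = Counter()
    employed = Counter()
    for _person_id, ethnicity, is_employed in records:
        totals[ethnicity] += 1
        if is_employed:
            employed[ethnicity] += 1

    table = {}
    for group, n in totals.items():
        if n < min_cell_size:
            table[group] = None  # suppressed: too few persons in the cell
        else:
            table[group] = {
                "count": n,
                "employment_rate": employed[group] / n,
            }
    return table


summary = aggregate(records)
```

The resulting `summary` contains no identifiers, only group-level counts and rates, which is the kind of statistical information that monitoring of (indirect) discrimination relies on.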

One of the greatest challenges in the collection of ethnic data comes from the categorisation of ethnicity. The problem is twofold and arises around the question of how to ‘put ethnicity in boxes’ and then ‘classify’ people accordingly. Ethnicity is not a clear-cut phenomenon (like age or, to a lesser degree, gender), and the question of how ethnicity is to be (statistically) measured is open to debate and different approaches. For that reason, the collection of ethnic data goes beyond pure demographic technicality and involves different social, political and legal factors and different stakeholders (some of them with opposing claims). Thus, it amounts to a complex endeavour, which reflects “political responses to diversity” and “debates over the nature of citizenship and belonging in a given country”.

When it comes to the classification of groups, the main question is whether they are (as ethnic categories) listed on the enquiry form or not. The former method, with set categories and codes, makes statistical processing of data easier but raises important conceptual questions: which groups will be listed and in which order, and are multiple identities recognised through pre-set mixed categories or through the possibility of providing multiple answers? An alternative to predetermined categories is to collect ethnic data based on a fully open question (the response field is left blank). In this case, it is entirely up to the respondents to define their own ethnic affiliation. This makes statistical processing more complicated, as it theoretically leaves space for an indefinite number of combinations, but it appears less intrusive on respondents’ choice (self-identification) than the options with pre-coded categories. The drawbacks of the pre-coded approach can be mitigated by offering response options that include the possibility of write-in (open) responses or of indicating ‘none’ or ‘not declared’. Conversely, to mitigate the pitfalls of the open question approach, a brief explanation and/or a few examples can be provided with the question or enquiry.
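
How pre-coded categories, write-in responses, multiple answers and a ‘not declared’ option might be combined in statistical processing can be sketched as follows. This is only an illustration: the category labels, the codes and the convention of separating multiple answers with a semicolon are all assumptions made for the example and do not describe the practice of any particular statistical office.

```python
# Hypothetical pre-coded categories (lower-cased label -> code).
PRECODED = {"group a": "A", "group b": "B"}
NOT_DECLARED = "NOT_DECLARED"


def code_response(raw):
    """Code one answer into a sorted tuple of category codes.

    - An empty or missing answer is respected as 'not declared'.
    - Multiple affiliations are assumed to be separated by ';'.
    - Labels outside the pre-coded list are kept as write-in
      responses rather than being forced into an existing box.
    """
    if raw is None or not raw.strip():
        return (NOT_DECLARED,)

    codes = set()
    for part in raw.split(";"):
        label = part.strip().lower()
        if not label:
            continue
        codes.add(PRECODED.get(label, "WRITE_IN:" + label))

    return tuple(sorted(codes)) if codes else (NOT_DECLARED,)
```

For example, `code_response("Group A; Group B")` yields both codes, so a multiple affiliation is preserved instead of being reduced to a single category, while an unlisted self-description is recorded as a write-in.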

Regardless of whether the ethnic question is designed around pre-coded categories or as an open question, the guiding principle for individual classification is voluntary self-identification. It is for every individual to identify themselves as being a member (or not) of a particular racial or ethnic group or groups. Along the same lines, respondents should have the possibility to indicate more than one ethnic affiliation or a combination of ethnic affiliations. Furthermore, the principle of self-identification presupposes that respondents are not bound to one static identity: it can change over time or depending on specific situations. As a consequence, “persons (…) must not be required always to self-identify in the same manner”. This does pose some (methodological) challenges, yet it is in line with the dynamic understanding of multiple and multi-layered ethnic identity.

Because of the controversies around ethnicity, there is a tendency to use some objective parameters (citizenship, place of birth, language or similar) either as ‘corrective’ or ‘control’ factors or as proxies for ethnicity. In the former case, data on ethnicity are compared with data on selected (objective) parameters in order to identify discrepancies that might indicate underreporting or overreporting of ethnicity and to check interdependences or tendencies. In the latter case, the lack of explicit ethnic data is compensated for with some other data that can be indicative of ethnicity. This method is widespread in (Western) Europe for assessing the integration of immigrants, but it is of almost no use for monitoring the status of national minorities, who are born in the country and hold its citizenship. Language seems to be the most appropriate proxy for national minorities, but this is also not straightforward, bearing in mind the complexity of language use (mother tongue, primary language, language used in public or language used in private communication, just to name a few). Religion could also be indicative in cases where national minorities are also religious minorities, but, again, although religion plays an important role in identity building, it is a separate identity trait and should not be conflated with ethnicity. Against this background, the FCNM Advisory Committee advises against making “automatic reference from a particular indication (for example language use) to another indication (for instance religion, ethnicity) and no assumption of certain linguistic, religious or ethnic affiliations (...) based on a person’s name or other characteristics”.

Collection of ethnic data is a tool and not a goal. It should be part of a systematic and carefully considered process throughout which the crucial questions on “what type of data or ethnicity and/or race are processed, using which definitions and for which purposes are they collected” are clarified. Accordingly, the strategy of ‘the more the better’ is not a desirable remedy for the problem of the lack of ethnic data, because it can prove to be inadequate. It can cause “data fatigue” if there is too much data but not enough action is being taken; it can disguise the lack of a genuine policy or be presented as a ‘policy’ in itself; and it can also ethnicise areas for which ethnicity should be irrelevant. A desirable approach is a balanced one, which is aligned with a well-thought-out diversity policy and serves its goals. Ultimately, the collection of ethnic data cannot substitute for an effective diversity policy, but can only support it.

The lack of ethnic data or the absence of a “coherent, systematic and long-term approach” to ethnic data collection clearly indicates the lack of a (genuine) state interest (which can have various reasons) in a coherent, systematic and long-term diversity policy and national minority protection that goes beyond the two typical issues: security threat and minority folklore. Hence, the plea for the processing of ethnic data, on condition that all legal, technical and organisational safeguards for (personal) data protection are established and that the endeavour is undertaken in close cooperation with (the representatives of) national minorities, is at its core a call for an evidence-based minority policy and the systematic monitoring of its effects.

