Fair representation in datasets: a little-known battle front in the fight for equality
Sarahjane Delany, Professor of Inclusive Computer Science at TU Dublin; research collaborator with the Insight SFI Research Centre for Data Analytics
A global tech company receives thousands of job applications every year. Imagine an AI system that processes these applications to screen out unsuitable candidates. Suppose that system has been trained on a dataset that includes all successful and unsuccessful CVs from the preceding 10 years. So far, so efficient.
Now imagine your name is Michael. No one by the name of Michael has been hired by this tech company in the last ten years. The AI screening system ‘learns’ that all candidates with the name Michael have been unsuccessful in their applications. The system concludes that people called Michael must be unsuitable for recruitment.
This is an absurd example: AI systems of this type, already in widespread use, are anonymised and do not screen for people’s names. However, there are many other details buried in CVs that might read as ‘signals’ to an AI system, flagging as ‘unsuitable’ categories of people who have been unsuccessful in the past for reasons other than job suitability.
We already know that women are underrepresented in many employment areas, including tech. What details peculiar to rejected female candidates might ‘teach’ an AI system to reject them again?
We have evidence of this learned prejudice already. In 2015 tech giant Amazon’s machine-learning specialists discovered that their new AI-powered recruitment system had a women problem.
The company’s hiring tool used AI to rate candidates using data from 10 years of job applications. Most CVs in the dataset were from male applicants – a common phenomenon in the tech sector.
Not surprisingly, the system ‘preferred’ male candidates. It rejected CVs that included the word ‘women’ (e.g. ‘women’s basketball team’ or ‘Women in STEM programme’). Graduates of all-women’s colleges in the US were automatically demoted.
Amazon engineers reprogrammed the system to remove this particular example of bias. This episode reveals, however, a fundamental weakness of AI systems that ‘learn’ from datasets. What datasets are we using, and what are they teaching our AI?
Where else might biased or incomplete datasets create problems in our day-to-day lives, in ways we are not even aware of? The New York Times revealed in 2019 that female applicants for the Apple credit card were given significantly lower credit limits than their male partners, even where the details of the applications were otherwise identical. Apple co-founder Steve Wozniak was awarded a credit limit 10 times the size of his partner’s even though ‘we have no separate bank or credit card accounts or any separate assets,’ he said. It is likely that this bias arose from a training dataset that was skewed towards male applicants.
My work at Technological University Dublin is concerned with this very question. AI cannot be representative if the data it learns from is not representative. The Equal Status Acts 2000-2018 cover nine grounds of discrimination: gender, marital status, family status, age, disability, sexual orientation, race, religion, and membership of the Traveller community. As we move to automate many public-facing systems – social welfare, government, law, healthcare screening, recruitment, parole, education, banking – we must ensure that we are not building decision-making machines that have learned from datasets that leave out or militate against members of any of these nine groups.
Image recognition software is a case in point. As a judge in the BT Young Scientists Competition, I recently reviewed a project by Solomon Doyle of Dundalk Grammar School. Doyle created a mobile app to diagnose malignant skin lesions by analysing a photo and searching for similar characteristics in a database of images of diagnosed malignancies. Doyle optimised his system to improve accuracy for people of colour – he discovered biases within existing software that had been trained primarily on images of white skin. Doyle went on to win the Analog Technology Award.
How do we correct for these biases? Firstly, we need diversity in tech. Humans build AI, and where we have diverse groups developing software, we are more likely to identify and eradicate biases as they emerge.
Secondly, we must put processes in place to evaluate datasets before we use them to build decision-making systems. Natural language models, of the sort used to power chatbots, have been shown to reflect gender bias present in their training data. This bias can affect the downstream tasks that machine-learning models built on that data are meant to accomplish. Several techniques have been proposed to mitigate gender bias in training data. In one study at TU Dublin we compare different gender bias mitigation approaches on a classification task, to see which are the most effective. In a second study we compare and evaluate different systems for labelling datasets by gender. Building knowledge in this field is essential to support AI researchers in handling the datasets they use to build the systems we increasingly rely upon to make decisions on our behalf.
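To make the idea concrete, here is a minimal sketch of one widely used mitigation technique, counterfactual data augmentation, in which each training sentence is paired with a copy whose gendered words have been swapped so a model cannot lean on gender as a signal. This is an illustration only, not the method from the TU Dublin studies; the word list and function names are hypothetical and deliberately simplified.

```python
# Sketch of counterfactual data augmentation: for every training sentence,
# add a copy with gendered words exchanged. Illustrative only; a real word
# list would be far larger and handle grammar (e.g. 'her' vs 'hers') properly.

GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "hers", "hers": "his",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
}

def swap_gendered_words(sentence: str) -> str:
    """Return a copy of the sentence with gendered words exchanged."""
    return " ".join(GENDER_SWAPS.get(word.lower(), word)
                    for word in sentence.split())

def augment(corpus: list[str]) -> list[str]:
    """Pair every original sentence with its gender-swapped counterpart."""
    return corpus + [swap_gendered_words(s) for s in corpus]

if __name__ == "__main__":
    corpus = ["she organised the women in STEM event",
              "he led the engineering team"]
    for sentence in augment(corpus):
        print(sentence)
```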
Thirdly, we must evaluate the decision-making systems themselves for fairness and inclusion. In the case of gender bias, does the system behave differently for groups of women than for groups of men? The same question can be asked, and extended, for any of the subgroups covered by the nine grounds of discrimination.
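As a sketch of what such an evaluation can look like in practice, the snippet below compares selection rates between groups, a simple demographic-parity check. The group labels and example decisions are illustrative assumptions, not output from any real system, and a full audit would examine several fairness measures across each protected group.

```python
# Sketch of a group-fairness check: compare the rate of positive decisions
# (e.g. 'shortlisted') across groups. A large gap is a flag for further
# investigation, not proof of unfairness on its own.

from collections import defaultdict

def selection_rates(decisions, groups):
    """Fraction of positive decisions per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision)
    return {g: positives[g] / totals[g] for g in totals}

if __name__ == "__main__":
    # 1 = candidate shortlisted by the system, 0 = rejected (illustrative data)
    decisions = [1, 0, 1, 1, 0, 0, 1, 0]
    groups = ["female", "female", "male", "male",
              "female", "female", "male", "male"]
    print(selection_rates(decisions, groups))  # {'female': 0.25, 'male': 0.75}
```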
The work of creating a fairer society for all consists of examining the reproduction of prejudice wherever we find it. Artificial intelligence is a brave new world, but without vigilance it will carry forward the worst aspects of a blinkered old one.