
14 December 2021

Machine learning is here… but do you know it is biased?

Written by Andy Gray

In January 2018, Google's CEO, Sundar Pichai, claimed that Artificial Intelligence (AI) would be more transformative to humanity than electricity.

So what is AI?

Well, the concept of AI and machine learning (ML) is not new. However, thanks to the computational power we now have, it is only since the 1980s that ML has really taken off. PwC has reported that AI-enabled activities could raise global GDP by 14% (equal to $15.7 trillion) by 2030, and 84% of enterprises believe investing in AI will lead to significant competitive advantages.

So, with AI having such a substantial impact on our daily lives now and in the future, it is vital that we empower ourselves to understand it. AI is the overarching term; within it sit several subfields:

  • ML: the approach of using more traditional statistical algorithms.
  • Natural Language Processing (NLP): the method for computers to make sense of human languages. This is the main driver behind translator apps and AI voice assistant devices like Apple's Siri and Google's Voice Assistant.
  • Deep Learning: the field that looks into the different forms of neural networks.
  • Knowledge-Based Systems: one of the early methods of AI that uses a series of ‘if/else’ statements for knowledge representation (used a lot within the field of law).
  • Computer Vision: the process of allowing a computer to see, so it can classify images or label people's faces.
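The 'if/else' knowledge representation mentioned above can be sketched in a few lines of Python. The rules, thresholds and function name below are invented purely for illustration and are not taken from any real legal or financial system:

```python
# A toy knowledge-based system: eligibility rules encoded as explicit
# if/else statements, the style used in early legal expert systems.
# All rules and thresholds here are hypothetical.
def loan_advice(age: int, income: int, has_guarantor: bool) -> str:
    if age < 18:
        return "ineligible: applicant must be an adult"
    elif income >= 30000:
        return "eligible"
    elif has_guarantor:
        return "eligible with guarantor"
    else:
        return "ineligible: income too low and no guarantor"

print(loan_advice(25, 35000, False))  # eligible
print(loan_advice(17, 50000, True))   # ineligible: applicant must be an adult
```

The appeal of this approach is that every decision can be traced back to an explicit, human-readable rule, which is why it persists in domains like law.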

In general, AI aims to learn from past data to make predictions about future events. You may have heard the saying "a model is only as good as its training data". This means that a model will only be effective in production if the training (or past) data truly contains useful insight.

Collecting quality data is hard: datasets may contain missing values, or may be biased in how they were gathered. For example, if the data collected represents more white men than any other demographic, the model will learn and reinforce that bias, because that is what the past data has told it to do.
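As a minimal sketch of how such an imbalance might be spotted before training, assuming records stored as simple dictionaries with a hypothetical `demographic` field:

```python
from collections import Counter

# Hypothetical training records; in practice these would come from a
# real dataset with many more fields.
records = [
    {"demographic": "white male"}, {"demographic": "white male"},
    {"demographic": "white male"}, {"demographic": "white male"},
    {"demographic": "black female"}, {"demographic": "asian male"},
]

# Count how often each group appears in the training data.
counts = Counter(r["demographic"] for r in records)
total = len(records)
for group, n in counts.most_common():
    print(f"{group}: {n / total:.0%}")

# One group dominates the sample, so a model trained on this data
# will be skewed towards it.
```

A simple frequency check like this is not a full fairness audit, but it makes the most obvious imbalances visible before any model is trained.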

ML has three main approaches, determined by the nature of the data: supervised, unsupervised, and semi-supervised learning. We will focus on the first two:

  • Supervised learning involves data that has targets, more commonly known as labels. Labels let the algorithm (either regression or classification) know what the correct output should be during training, as it constantly updates its parameters with the aim of predicting more accurately.
  • Unsupervised learning is used when the data does not have output labels, so the algorithms look for patterns within the dataset, using techniques such as clustering, anomaly detection and probabilistic methods. Unsupervised methods can help you find features within your data that are useful for categorising it. Unlabelled data is far easier to obtain than labelled data, which needs manual intervention.
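The difference between the two approaches can be sketched with the same toy data, first with labels (a simple nearest-neighbour classifier) and then without (a two-cluster version of k-means). All the points and labels below are made up for illustration:

```python
# Supervised: labelled points; classify a new point by its nearest
# labelled neighbour.
labelled = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
            ((5.0, 5.0), "dog"), ((4.8, 5.2), "dog")]

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def predict(point):
    """Return the label of the closest labelled point."""
    return min(labelled, key=lambda item: dist2(item[0], point))[1]

print(predict((1.1, 0.9)))  # cat

# Unsupervised: the same points without labels; group them by
# proximity using a few iterations of 2-means.
points = [p for p, _ in labelled]
centroids = [points[0], points[2]]          # initial guesses
for _ in range(5):
    clusters = [[], []]
    for p in points:                        # assign each point to nearest centroid
        idx = 0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
        clusters[idx].append(p)
    # move each centroid to the mean of its cluster
    centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) for cl in clusters]

print(centroids)  # two centres, one per natural group
```

The supervised half needs the "cat"/"dog" labels up front; the unsupervised half recovers the two groups from the geometry alone, but cannot name them.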

The problem of bias

A scary statistic from the AI Index found that men continue to dominate the AI applicant pool, making up 71% of applicants on average. Having such a large share from one demographic leads to biased data, as people unintentionally tend to build systems in their own image.

An example is Google Photos' image classifier, which was labelling people with an African Caribbean heritage as gorillas. Users first reported the misclassification to Google in 2015; Google acknowledged the issue and applied a quick solution while promising a more long-term one. The 'quick fix' was to remove the gorilla label as a possible classification, and it is still in place.

Issues with using historical data have caused multiple problems, especially in the financial sector. For example, 10.1% of Asian applicants were denied a conventional loan, compared with just 7.9% of white applicants, while 19.3% of black borrowers and 13.5% of Hispanic borrowers were turned down. Black and Hispanic people are also more likely to pay higher mortgage rates. And it doesn't stop there: men are more likely to get a higher credit card limit than women, even when the man has a worse credit rating.

It is clear we face an overabundance of poor-quality data, leading to concerns that too many data sources can hide proxies for illegal discrimination. For example, the law makes it unlawful to use gender to determine credit eligibility or pricing. Still, countless proxies for gender exist, from the type of deodorant you buy to the movies you watch.

Key requirements to resolve the issues

Due to all the issues surrounding data, a movement has started within the academic tech space around transparency and explainable AI (XAI). This has also led the EU to set seven key requirements for all AI systems, expected to come into law by 2022.

The key features are:

  • Human agency and oversight: AI systems should empower human beings, allowing them to make informed decisions and fostering their fundamental rights. At the same time, proper oversight mechanisms need to be in place; these can be achieved through human-in-the-loop, human-on-the-loop, and human-in-command approaches.
  • Technical robustness and safety: AI systems need to be resilient and secure. They need to be safe (ensuring a fallback plan in case something goes wrong), accurate, reliable, and reproducible. This is the only way to ensure unintentional harm can be minimised and prevented.
  • Privacy and data governance: besides ensuring full respect for privacy and data protection, adequate data governance mechanisms must also be ensured. These should take into account quality and integrity of the data, therefore ensuring legitimised access to data.
  • Transparency: the data, system and AI business models should be transparent. Traceability mechanisms can help achieve this. Moreover, AI systems and their decisions should be explained in a manner that can be understood. Humans need to be aware they are interacting with an AI system, and must be informed of the system’s capabilities and limitations.
  • Diversity, non-discrimination and fairness: unfair bias must be avoided, as it could have multiple negative implications, from the marginalisation of vulnerable groups, to the exacerbation of prejudice and discrimination. To foster diversity, AI systems should be accessible to all, regardless of any disability, and involve relevant stakeholders throughout their entire life cycle.
  • Societal and environmental well-being: AI systems should benefit all human beings, including future generations. Ensuring they are sustainable and environmentally friendly, they should take into account the environment, including other living beings, and their social and societal impact should be carefully considered. 
  • Accountability: mechanisms should be in place to ensure responsibility and accountability for AI systems and their outcomes. Auditability, which enables the assessment of algorithms, data and design processes, plays a key role therein, especially in critical applications. And, adequate and accessible redress should be ensured.

AI and ML are now a permanent part of our lives, so we must understand how these algorithms work. Empowering ourselves with this knowledge, alongside legislation, will allow us to challenge biased decisions and push for models to be trained more fairly.

The only way we can do this is to put pressure on companies to adopt transparent models and XAI, so everyone can see why a model came to its decision. Placing this pressure on the big tech companies will allow society to move forward and break the underlying biases embedded in past datasets and within our tech.

Resources for teachers to learn about Artificial Intelligence

  1. CAS in a BOX – Artificial Intelligence (Beginners) – available from your CAS Community Leader at a CAS Community Meeting (ask your community leader to feature AI on the agenda)
  2. CAS in a BOX – Artificial Intelligence (Intermediate) - available from your CAS Community Leader at a CAS Community Meeting (ask your community leader to feature AI on the agenda)

Resources to support teaching Artificial Intelligence Concepts