AI-based software can inherit and perpetuate biases present in its training data. If that data contains biased or discriminatory patterns, the resulting models learn and replicate those patterns, potentially exacerbating existing inequalities.
Training Data Bias
Scenario: Hiring Bias in a Resume Screening Model
Background:
A company develops an AI model to assist in the screening of resumes for job applicants. The goal is to automate the initial phase of the hiring process by identifying qualified candidates based on their resumes.
Training Data:
The training data for the model consists of historical resumes that were used by the company's hiring team to make decisions. The resumes include information about education, work experience, skills, and other relevant details.
Bias in the Training Data:
- Gender Bias - Historical hiring data often favors male candidates over equally qualified female candidates, leading the model to score male applicants more positively (a simple audit of this disparity is sketched after this list).
- Education Bias - If most successful past candidates came from a small set of prestigious institutions, the model may learn to prefer applicants from those institutions, potentially biasing hiring decisions against candidates from lesser-known or non-traditional educational backgrounds.
- Experience Bias - The historical hiring decisions may favor candidates with a particular type or amount of industry experience. The model may learn to replicate that preference, potentially excluding qualified candidates with less conventional experience.
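Before training on such records, the disparity can be quantified directly. The sketch below is a minimal audit, assuming a pandas DataFrame with hypothetical `gender` and `hired` columns (the data shown is invented); it computes the selection rate per group and applies the common four-fifths rule of thumb.

```python
import pandas as pd

# Hypothetical historical hiring records; the column names ("gender",
# "hired") and values are assumptions for illustration only.
df = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "M", "F", "M", "F", "M"],
    "hired":  [1,   1,   0,   0,   1,   1,   0,   0,   0,   1],
})

# Selection rate per group: the fraction of applicants who were hired.
rates = df.groupby("gender")["hired"].mean()
print(rates)

# Four-fifths (disparate impact) rule of thumb: flag the data if the
# lowest group's selection rate falls below 80% of the highest group's.
ratio = rates.min() / rates.max()
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:
    print("warning: historical labels show a large selection-rate gap")
```

In practice this audit would run on the real historical records, and the same check can be repeated for education and experience attributes.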
Impact on the Model:
As a result of these biases in the training data, the AI model may inadvertently favor male candidates, candidates from prestigious universities, or those with a specific type of work experience.
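One subtlety worth demonstrating: removing the sensitive attribute from the inputs does not remove the bias, because correlated proxy features can stand in for it. The following is a toy sketch under that assumption, not the company's actual model; the feature names (`skill`, `career_gap`) and the synthetic data are entirely hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Synthetic applicants: gender is NOT a model feature, but a proxy
# ("career_gap") correlates with it, and the historical labels were
# biased against female applicants. All names here are invented.
gender = rng.integers(0, 2, n)                    # 0 = male, 1 = female
skill = rng.normal(0, 1, n)                       # skill distributed equally
career_gap = rng.normal(0, 1, n) + 1.5 * gender   # proxy correlated with gender
biased_label = (skill - 0.8 * gender + rng.normal(0, 0.5, n) > 0).astype(int)

X = np.column_stack([skill, career_gap])          # gender itself is excluded
model = LogisticRegression().fit(X, biased_label)
pred = model.predict(X)

# Selection rates diverge even though gender is never an input.
for g, name in [(0, "male"), (1, "female")]:
    print(f"{name}: predicted selection rate = {pred[gender == g].mean():.2f}")
```

Because `career_gap` correlates with gender in this synthetic data, the classifier reproduces the label bias despite never seeing gender directly.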
Algorithmic Bias
Scenario: Movie Recommendation Algorithm
Objective:
An online streaming service develops an algorithm to recommend movies to users based on their viewing history and preferences.
Algorithm Training:
The recommendation algorithm is trained using user data like viewing history, ratings, and genre preferences. The goal is to provide personalized movie recommendations to enhance user satisfaction.
Algorithmic Bias:
- Genre Bias - If the training data predominantly reflects certain genres because of existing user preferences, the algorithm becomes increasingly biased towards recommending movies from those genres. Users may then receive recommendations skewed towards a few genres, limiting the variety of movies suggested.
- Popularity Bias - When recommending movies, the algorithm may prioritize popular or trending titles over lesser-known independent films. This can create a feedback loop in which popular movies receive more views, further reinforcing their prominence in the recommendations (a toy simulation of this loop follows the Impact paragraph below).
Impact on Recommendations:
Due to these biases, users may find their recommendations confined to a narrow set of genres and popular titles, missing the opportunity to discover a broader range of content.
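The rich-get-richer dynamic described above can be reproduced in a few lines. The simulation below is a deliberately crude toy model, assuming a recommender that always surfaces the current top-k most-viewed titles and that recommended titles capture most new views; all numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_movies, n_steps, top_k = 100, 50, 10

# Start with near-uniform view counts; the values are illustrative.
views = rng.uniform(1.0, 2.0, n_movies)

for _ in range(n_steps):
    # The recommender surfaces only the current top-k most-viewed titles...
    recommended = np.argsort(views)[-top_k:]
    # ...and recommended titles receive the bulk of the new views,
    # while the rest get only a trickle of organic traffic.
    new_views = np.zeros(n_movies)
    new_views[recommended] += 100
    views += new_views + rng.uniform(0, 1, n_movies)

# Concentration: share of all views captured by the top 10% of titles.
top_share = np.sort(views)[-n_movies // 10:].sum() / views.sum()
print(f"share of views held by top 10% of titles: {top_share:.0%}")
```

Under these assumptions the initially most-viewed titles keep their lead indefinitely, and the top 10% of titles end up with the overwhelming majority of views.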
Data Representation Bias
Scenario: Bias in Predictive Healthcare Model
Objective:
A healthcare organization is developing an AI model to predict patient outcomes based on historical medical data.
Data Representation Bias:
- Socioeconomic Bias in Health Records - The health records used to train the predictive model may not reflect the true distribution of socioeconomic status, with most of the data coming from patients with higher incomes and/or better access to healthcare.
- Geographic Bias - If the model is trained on data from urban areas and lacks sufficient representation from rural communities, it may not generalize well to predict outcomes for patients in those underserved regions.
Impact on the Model:
The model may perform well for populations that are well represented in the training data but struggle to generalize to under-represented demographic groups.
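A straightforward way to surface this is to evaluate the model per subgroup rather than only in aggregate. The sketch below assumes a held-out results table with a hypothetical `region` column; the values are invented for illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical held-out predictions; "region", "y_true", and "y_pred"
# are assumed names, not part of any real dataset.
results = pd.DataFrame({
    "region": ["urban"] * 8 + ["rural"] * 4,
    "y_true": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1],
    "y_pred": [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1],
})

# Overall accuracy hides the gap; stratifying by region exposes it.
print("overall:", accuracy_score(results["y_true"], results["y_pred"]))
for region, group in results.groupby("region"):
    acc = accuracy_score(group["y_true"], group["y_pred"])
    print(f"{region}: accuracy = {acc:.2f} (n = {len(group)})")
```

Here the aggregate number looks acceptable while the under-represented rural subgroup performs far worse, which is exactly the pattern representation bias produces.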
Equality and AI