One of the apparent strengths of artificial intelligence is its ability to remove human bias. While this may be the intention, AI systems learn only what they are taught: if they are not trained on robust and diverse data sets, bias can still emerge.

The challenge in training AI is clearly demonstrated by facial recognition technology. Facial recognition systems use biometrics to map facial features from an image, and then compare these with a database of known faces to find a match.
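
The matching step described above can be illustrated with a minimal sketch. It assumes each face has already been reduced to a short list of numbers (a feature vector) and that a match is the most similar stored vector above a cut-off. The names `find_match` and `gallery`, the three-number vectors and the 0.9 threshold are all hypothetical choices for illustration, not details of any particular product.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two feature vectors, from -1 (opposite) to 1 (identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information to compare
    return dot / (norm_a * norm_b)

def find_match(probe, gallery, threshold=0.9):
    """Return the name of the most similar known face, or None if nothing is close enough."""
    best_name, best_score = None, -1.0
    for name, vector in gallery.items():
        score = cosine_similarity(probe, vector)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical database of known faces (toy vectors, not real biometric data).
gallery = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}

print(find_match([0.88, 0.12, 0.31], gallery))  # alice
print(find_match([0.0, 0.0, 1.0], gallery))     # None
```

In a sketch like this, the quality of the match depends entirely on how well the feature vectors separate different people, which is where the training data comes in.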

If the data used when training the machine learning software favours particular facial characteristics, problems arise. If, for example, a larger proportion of the data comes from people of a certain ethnicity or skin colour, the system will be better equipped to recognise certain facial features, and will struggle to recognise others.

This means that some users may encounter problems when using facial recognition. According to the New York Times, a study conducted last year by Joy Buolamwini, a researcher at the MIT Media Lab, found that commercial facial analysis software can recognise the face of a white man 99% of the time. However, for darker-skinned women, the software made errors in 35% of cases, often misidentifying gender.
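
Gaps of this kind only become visible when error rates are broken down by demographic group rather than averaged over everyone. The sketch below shows one way that breakdown might be computed. The function name `error_rate_by_group` and the records are hypothetical toy values chosen for illustration; they are not figures from the study.

```python
from collections import defaultdict

def error_rate_by_group(records):
    """Return the share of wrong predictions for each group.

    Each record is a (group, predicted_label, actual_label) tuple.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Toy records for illustration only (not real results).
records = [
    ("group_a", "male", "male"),
    ("group_a", "female", "female"),
    ("group_a", "male", "male"),
    ("group_a", "female", "female"),
    ("group_b", "male", "male"),
    ("group_b", "male", "female"),  # a misclassification
    ("group_b", "female", "female"),
    ("group_b", "male", "male"),
]

print(error_rate_by_group(records))  # {'group_a': 0.0, 'group_b': 0.25}
```

Here the overall error rate is 12.5%, a single figure that hides the fact that every error falls on one group.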

To combat this, data sets must be large and varied enough that the technology learns to recognise a wide range of faces, regardless of age, gender, ethnicity and skin tone. Errors are not only annoying for users; they also point to an inherently unrepresentative data set.

This will only become more apparent as facial recognition software becomes more commonplace. The iPhone XR is equipped with Face ID, and many airports are expected to replace passports with biometric facial recognition in the future, highlighting the need for AI systems that are fair and accurate.

IBM’s facial recognition dataset

Today, IBM Research, the research division of the computer hardware company, released a new, large and diverse dataset called Diversity in Faces (DiF) to advance the study of accuracy in facial recognition technology.

Believed to be the first of its kind, DiF provides annotations of 1 million human facial images, drawn from the publicly available YFCC-100M Creative Commons data set.

IBM annotated the faces using ten different coding schemes, which measure craniofacial features such as head length, nose length and forehead height, as well as other factors including age and gender.
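
To give a sense of what a craniofacial measurement can look like in practice, the sketch below computes two distances from facial landmark coordinates. It is an illustration under stated assumptions, not IBM's coding scheme: the landmark names, the pixel coordinates and the choice of which points define "nose length" and "forehead height" are all hypothetical.

```python
import math

# Hypothetical (x, y) pixel positions of three landmarks on one face image.
landmarks = {
    "trichion": (50.0, 5.0),    # midpoint of the hairline
    "nasion": (50.0, 40.0),     # bridge of the nose, between the eyes
    "subnasale": (50.0, 70.0),  # base of the nose
}

def distance(points, a, b):
    """Straight-line distance in pixels between two named landmarks."""
    return math.dist(points[a], points[b])

# Assumed definitions, for illustration only.
nose_length = distance(landmarks, "nasion", "subnasale")
forehead_height = distance(landmarks, "trichion", "nasion")

print(nose_length)      # 30.0
print(forehead_height)  # 35.0
```

Raw pixel distances depend on image size and how close the subject is to the camera, so measurements like these are usually more useful when expressed as ratios of one another.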

It is hoped that DiF, by offering a more balanced distribution and broader coverage of facial images than previous data sets, will improve the diversity of the data used to train AI facial recognition.
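
One simple way to put a number on how "balanced" a distribution is would be an evenness score: 1.0 when every category is equally represented, falling towards 0 as one category dominates. The sketch below uses Shannon entropy divided by its maximum possible value. The function name `shannon_evenness` and the counts are hypothetical, and this is offered as one common measure rather than the method used for DiF.

```python
import math

def shannon_evenness(counts):
    """Return a balance score between 0 and 1 for a list of category counts.

    The score is normalised by the number of categories listed, so a
    category with zero images lowers the score rather than being ignored.
    """
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return 0.0  # balance is undefined here; 0.0 is a convention for this sketch
    proportions = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in proportions)
    return entropy / math.log(len(counts))

# Hypothetical image counts across four categories of some attribute.
balanced = [250, 250, 250, 250]
skewed = [970, 10, 10, 10]

print(round(shannon_evenness(balanced), 3))
print(round(shannon_evenness(skewed), 3))
```

With these toy counts, the balanced set scores 1.0 and the skewed set scores far lower, even though both contain the same number of images.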

The dataset is now available to the global research community upon request.