In this article, we list 10 face datasets that can be used to launch a face recognition project.
1 | Flickr-Faces-HQ dataset (FFHQ)
The Flickr-Faces-HQ dataset (FFHQ) is a dataset of human faces that contains more variation in age, ethnicity, and image background than the CelebA-HQ dataset, and has better coverage of accessories such as glasses, sunglasses, and hats. The images were crawled from Flickr, then automatically aligned and cropped.
Size: The dataset consists of 70,000 high-quality PNG images at 1024×1024 resolution.
Project: This dataset was originally created as a benchmark for generative adversarial networks (GANs).
2 | Tufts Face Database
The Tufts Face Database is a comprehensive, large-scale face dataset containing seven image modalities: visible light, near infrared, thermal, computerized sketch, LYTRO, recorded video, and 3D images.
Size: The dataset contains more than 10,000 images of 112 participants (74 women and 38 men) from more than 15 countries, ranging in age from 4 to 70.
Project: The database can be used by researchers worldwide to benchmark facial recognition algorithms for sketch, thermal, NIR, and 3D face recognition, as well as heterogeneous face recognition.
3 | Real and Fake Face Detection
The dataset contains high-quality Photoshopped face images produced by experts. The fake images are composites of different faces, with the edited region being the eyes, nose, mouth, or the whole face.
Size: The dataset is 215MB in size.
Project: This data set can be used to distinguish between real and fake images.
4 | Google Facial Expression Comparison dataset
Google's dataset is a large-scale facial expression dataset consisting of face image triplets together with human annotations that specify which two faces in each triplet form the most similar pair in terms of facial expression.
Size: The dataset is 200MB in size and includes 500K triplets and 156K face images.
Project: This dataset is designed to help researchers study topics related to facial expression analysis, such as expression-based image retrieval, expression-based photo album summarization, emotion classification, and expression synthesis.
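As a rough sketch of how such triplet annotations could be used, the snippet below checks whether a set of face embeddings agrees with a human label. The `Triplet` type, the 0/1/2 index convention, and the embedding vectors are hypothetical illustrations, not the dataset's actual file format.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass
class Triplet:
    # Hypothetical container: three embedding vectors, one per face image.
    embeddings: list
    # Indices (0, 1 or 2) of the two faces annotators judged most similar.
    similar_pair: tuple


def predicted_pair(embeddings):
    """Return the index pair whose embeddings are closest (Euclidean distance)."""
    return min(
        combinations(range(3), 2),
        key=lambda pair: math.dist(embeddings[pair[0]], embeddings[pair[1]]),
    )


def agrees_with_annotation(triplet):
    """True if the closest embedding pair matches the human annotation."""
    return predicted_pair(triplet.embeddings) == tuple(sorted(triplet.similar_pair))
```

The fraction of triplets for which `agrees_with_annotation` returns `True` would give a simple agreement score for an expression embedding model.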
5 | Face Images with Marked Landmark Points
Face Images with Marked Landmark Points is a Kaggle dataset used to predict the locations of key points on an image of a face.
Size: The dataset is 497MB in size and contains 7,049 face images, each with up to 15 key points marked on it.
Project: This dataset can be used as a building block for applications such as tracking faces in images and videos, analyzing facial expressions, detecting signs of facial deformity for medical diagnosis, and biometrics or facial recognition.
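Because an image may carry fewer than 15 key points, code that reads the annotations usually has to tolerate missing values. The sketch below assumes each image's annotation arrives as a flat row of alternating x and y coordinates with missing points stored as `None` or NaN. That layout and the function name are assumptions for illustration, not a description of the dataset's files.

```python
import math


def row_to_keypoints(row):
    """Group a flat [x1, y1, x2, y2, ...] row into (x, y) pairs, skipping missing points."""
    if len(row) % 2 != 0:
        raise ValueError("expected an even number of coordinate values")
    points = []
    for x, y in zip(row[0::2], row[1::2]):
        # A key point is treated as missing if either coordinate is absent.
        if x is None or y is None or math.isnan(x) or math.isnan(y):
            continue
        points.append((float(x), float(y)))
    return points
```

For example, a row with one missing point such as `[1, 2, float("nan"), 4, 5, 6]` yields two key points.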
6 | Labeled Faces in the Wild (LFW) dataset
The Labeled Faces in the Wild (LFW) dataset is a database of face photographs designed for studying unconstrained face recognition. It is a common benchmark for face verification, also known as pair matching.
Size: The dataset is 173MB in size and contains more than 13,000 images of faces collected from the web.
Project: The dataset can be used for face verification and other forms of face recognition.
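Face verification (pair matching) means deciding whether two images show the same person. A minimal sketch, assuming each face has already been turned into a non-zero embedding vector by some model: compare the two vectors with cosine similarity and apply a threshold. The threshold value of 0.5 and the function names are illustrative assumptions, not part of LFW.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def same_person(a, b, threshold=0.5):
    """Predict 'same identity' when the embeddings are similar enough."""
    return cosine_similarity(a, b) >= threshold


def verification_accuracy(pairs, threshold=0.5):
    """pairs: iterable of (embedding_a, embedding_b, is_same) tuples."""
    pairs = list(pairs)
    correct = sum(same_person(a, b, threshold) == is_same for a, b, is_same in pairs)
    return correct / len(pairs)
```

In practice the threshold would be tuned on held-out pairs rather than fixed in advance.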
7 | UTKFace large-scale face data set
The UTKFace dataset is a large-scale face dataset with a long age span, ranging from 0 to 116 years. The images cover large variations in pose, facial expression, illumination, occlusion, resolution, and more.
Size: The dataset contains more than 20,000 images annotated with age, gender, and ethnicity.
Project: This dataset can be used for a variety of tasks, such as face detection, age estimation, age progression and regression, and landmark localization.
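The age, gender, and ethnicity annotations are commonly distributed inside the image file names, in the form `[age]_[gender]_[race]_[date&time].jpg` with integer codes. The sketch below parses that pattern. It is written under that assumption about the naming scheme, the function name is hypothetical, and file names that do not match the pattern are skipped rather than guessed at.

```python
def parse_utkface_name(filename):
    """Extract integer age, gender and race codes from a UTKFace-style file name.

    Returns None when the name does not follow the assumed
    [age]_[gender]_[race]_[date&time] pattern.
    """
    parts = filename.split("_")
    if len(parts) < 4:
        return None  # a label field is missing
    try:
        age, gender, race = (int(p) for p in parts[:3])
    except ValueError:
        return None  # a label field is not an integer
    return {"age": age, "gender": gender, "race": race}
```

Returning `None` for malformed names lets a loading loop filter out unusable files before training.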
8 | YouTube Faces with Facial Keypoints dataset
This dataset is a processed version of the YouTube Faces Dataset and consists mainly of short, publicly available videos of celebrities downloaded from YouTube. Each celebrity has multiple videos (up to six per celebrity).
Size: The dataset is 10GB in size and contains approximately 1,293 videos, each with up to 240 consecutive frames taken from the original video. In total there are 155,560 individual image frames.
Project: This dataset can be used to identify faces in unconstrained videos.
9 | Large-scale CelebFaces Attributes (CelebA) dataset
The CelebFaces Attributes Dataset (CelebA) is a large-scale face attribute dataset with over 200K celebrity images and 40 attribute annotations per image. The images in this dataset cover large pose variations and background clutter.
Size: The dataset includes 10,177 identities and 202,599 face images, with 5 landmark locations and 40 binary attribute annotations per image.
Project: This data set can be used as a training and test set for computer vision tasks such as face attribute recognition, face detection, localization of landmarks (or parts of faces), and face editing and composition.
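Binary attribute annotations like these are often stored as +1 (present) and -1 (absent), while most training code expects 0/1 targets. The sketch below converts such rows and computes how often each attribute occurs. The +1/-1 encoding and the function names are assumptions for illustration, so check the annotation file you actually download.

```python
def to_binary(values):
    """Map +1/-1 attribute flags to 1/0 targets."""
    return [1 if v == 1 else 0 for v in values]


def attribute_frequency(rows):
    """Fraction of images in which each attribute is present.

    rows: list of per-image attribute lists, all of the same length.
    """
    binary_rows = [to_binary(row) for row in rows]
    n = len(binary_rows)
    # zip(*rows) transposes per-image rows into per-attribute columns.
    return [sum(column) / n for column in zip(*binary_rows)]
```

Attribute frequencies are a quick way to spot heavily imbalanced attributes before training a classifier.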
10 | Yale Face Database
The Yale Face Database contains 165 grayscale GIF images of 15 people. Each subject has 11 images, one for each facial expression or configuration: center light, with glasses, happy, left light, without glasses, normal, right light, sad, sleepy, surprised, and winking.
Size: The dataset is 6.4MB in size. The figure of 5,760 single-light-source images, covering 10 subjects each seen under 576 viewing conditions, describes the separate Yale Face Database B rather than this 165-image set.
Project: The dataset can be used for face recognition, doppelganger list comparison, etc.