Video Recordings of Male Face and Neck Movements

1. Summary

This dataset comprises video of multiple male subjects’ head and shoulder areas, recorded while the subjects make regular movements. It can be used to train and test facial recognition algorithms. The subjects’ movements are synchronized across the dataset to facilitate time-based recognition comparisons.

For each subject, data was collected for multiple head positions, light brightness levels and light temperatures. The set of videos includes approximately 33,000 frames, recorded at a rate of 29.97 frames per second. Because these are frames in a video rather than discrete images, face detection and identification during movement, as well as recognition from video, can also be assessed using this data.

To this end, the subjects’ movements are synchronized between videos to facilitate comparison of algorithms based on the total time the subject is detected during the video. Specifically, the subjects move their head and neck through an entire range of motion. This facilitates the testing of recognition at different points using individual frames as well as the testing of algorithm performance on moving video data. This dataset can be used alongside large training sets available elsewhere to test recognition against a large database of potential subjects to match. The data was collected in a controlled environment with a consistent white background.

2. Background

Facial recognition has multiple uses including retail store [1,2], access control [3] and law enforcement [4] applications. Facial data can also help classify subjects by gender [5] and age [6] and provide insight into their current interest level and emotional state [7].

While much facial recognition research focuses on static images, many applications require the real-time or near-real-time identification of moving faces from video. Even if single video frames are used and presented to static facial recognition systems, they have movement blur and facial orientations not supported by the static recognition system. For this reason, facial recognition from video, in many cases with static image training, is a key area of research.

Approaches which use support vector machines [8], tree-augmented naive-Bayes classifiers [9], AdaBoost [10], linear discriminant analysis [10], independent component analysis [11], Fisher linear discriminant analysis [11], sparse network of Winnows classifiers [12], k-nearest neighbor classifiers [12] and multiple other techniques [13] have been proposed. Some techniques have also been developed which take advantage of video properties [14] and motion history [12].

This dataset provides training and testing data for evaluating the performance of video facial recognition systems. In particular, it provides data for both static image and video-based training and video-based recognition. Subjects move their head to multiple positions, facilitating the comparison of algorithm performance with regard to subject head position. Additionally, data collected under different lighting brightness and temperature levels is included to facilitate the evaluation of the effect of lighting (as part of training or testing data sets) on algorithms.

3. Experimental Design, Materials, and Methods

This section discusses the equipment, configuration and experimental methods used to collect the dataset. First, the equipment and its configuration are discussed. Then the lighting conditions are presented. Finally, the experimental design is reviewed.

3.1. Equipment and Setup

A Sony AX100 4K Expert Handycam (Tokyo, Japan) was used for video recording. The high-definition 4K videos (3840 × 2160 resolution at 29.97 frames per second) were saved in the MP4 file format with the AVC codec. Two Neewer LED500LRC LED lights (Edison, NJ, USA) were used as background lighting. One Yongnuo YN600L LED light (Hot Springs, AK, USA) was used for the lighting of the subject. All lights and the camera were placed at stationary positions, depicted in Figure 1, with location measurements presented in Table 1. A Tekpower lumen meter (Montclair, CA, USA) was used to measure the lumens produced in each lighting configuration. A standard projector screen served as the backdrop for the recordings. The audio present in the videos is the sound of an electronic metronome that indicates to the subject when it is time to change positions. This audio was recorded in stereo with a 48 kHz sampling rate using PCM encoding.

One lighting angle was used and the light was pointed directly towards the subject. The camera was placed in a fixed position near the light, also aimed at the subject.

3.2. Lighting

The subject was filmed under five lighting configurations, which are summarized in Table 2. The Yongnuo YN600L LED light illuminating the subject was set at 60% brightness on warm (3200 K); 60% brightness on cold (5500 K); 10% brightness on warm (3200 K) with 10% brightness on cold (5500 K); 40% brightness on warm (3200 K) with 40% brightness on cold (5500 K); and 70% brightness on warm (3200 K) with 70% brightness on cold (5500 K). Lumen readings were measured using a Tekpower lumen meter and these values are also included in Table 2.

3.3. Subjects & Procedure

Five videos were taken for each of 11 subjects, using a protocol approved by the NDSU institutional review board, for a total of 55 videos. The subjects were males between the ages of 18 and 26. A few of the subjects had small beards, while the rest had minimal facial hair.

Each of the five videos, for each subject, uses a different lighting configuration. These lighting configurations are depicted in Figure 2. The videos are approximately 20 s long and the subject moves his head to a new standardized position every second. Markers were placed on the walls, floor, and ceilings that the subjects were instructed to look at, to correctly position their heads.
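Because the head position changes once per second, a frame index can be mapped to the position interval it falls in. The following is a minimal sketch, assuming the first metronome tick coincides with the first frame of the video and ignoring the subject's reaction time; neither is stated in the text, and the function name `position_slot` is hypothetical.

```python
FPS = 29.97                  # frame rate stated in the text
SECONDS_PER_POSITION = 1.0   # one metronome tick per second


def position_slot(frame_index: int,
                  fps: float = FPS,
                  seconds_per_position: float = SECONDS_PER_POSITION) -> int:
    """Return the zero-based position interval that a frame falls in.

    Assumes (not confirmed by the source) that the first tick occurs at
    frame 0 and that the subject moves instantly on each tick.
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    timestamp = frame_index / fps          # seconds since the first frame
    return int(timestamp // seconds_per_position)


# Frames 0-29 fall in the first interval; frame 30 starts the second.
first_slot = position_slot(29)
second_slot = position_slot(30)
```

In practice the tick times could instead be recovered from the metronome audio track, which would remove the alignment assumption.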

The subject was told to move his head immediately upon hearing a metronome-like clicking sound. The time between each tick was one second. Because of this, each subject spends approximately the same amount of time facing in each position and moving between positions. This allows the impact of the lighting conditions on different facial recognition algorithms to be compared in terms of aggregate face detection time, as each view of the subject’s face was visible for approximately the same amount of time in each video. Figure 3 depicts the various positions in which each subject positioned his head.
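Aggregate face detection time can be computed from per-frame detection results. The sketch below assumes a detector has already produced one boolean per frame; the detector itself, the function names and the example data are hypothetical and not part of the dataset.

```python
from typing import Sequence

FPS = 29.97  # frame rate stated in the text


def aggregate_detection_seconds(detected: Sequence[bool],
                                fps: float = FPS) -> float:
    """Total time, in seconds, for which a face was detected."""
    detected_frames = sum(1 for flag in detected if flag)
    return detected_frames / fps


def detection_fraction(detected: Sequence[bool]) -> float:
    """Share of frames in which a face was detected (0.0 to 1.0)."""
    if not detected:
        return 0.0
    return sum(1 for flag in detected if flag) / len(detected)


# Illustrative data only: a detector that succeeds on half of 600 frames.
example_flags = [True] * 300 + [False] * 300
example_seconds = aggregate_detection_seconds(example_flags)
example_fraction = detection_fraction(example_flags)
```

Since the videos share the same timing, these totals can be compared directly between lighting configurations or between algorithms.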

4. Comparison to Other Data Sets

A number of data sets exist that have been collected for performing facial recognition work. The majority of these data sets are collections of individual images. Commonly used data sets range in size from the “ORL Database of Faces” which has 400 images for 10 individuals [15] to the “MS-Celeb-1M” dataset which has 10,000,000 images for 100,000 people [16]. Data sets also vary in the number of images that are presented for each subject. The “Labeled Faces in the Wild” data set, for example, included only two images per individual [17]. The “Pgu-Face” dataset had 4 images per individual [18] and the “FRGC 1.0.4” dataset had approximately 5 images per person [19]. Other data sets, such as the “IARPA Janus Benchmark A” [20] and “Extended Cohn-Kanade Dataset (CK+)” [21] have a larger number, including 11.4 and 23 images per individual, respectively. Prior work included producing data sets with 525 [22] and 735 [23] images per subject.

The dataset described herein comprises 55 videos (of 11 subjects), each approximately 20 seconds in length. Recorded at 29.97 frames per second, each video contains approximately 600 frames, giving approximately 3,000 frames of each subject and 33,000 frames in total for all 11 subjects. This places the dataset towards the larger end of the spectrum in terms of the total number of frames or images.
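The frame counts above follow from simple arithmetic, sketched here. The 20-second duration is the approximate figure given in the text, so the results are estimates rather than exact counts from the files.

```python
FPS = 29.97              # frames per second, from the text
VIDEO_SECONDS = 20       # approximate video length, from the text
SUBJECTS = 11
VIDEOS_PER_SUBJECT = 5   # one per lighting configuration

total_videos = SUBJECTS * VIDEOS_PER_SUBJECT

# Estimated frame counts; rounded because the duration is approximate.
frames_per_video = round(FPS * VIDEO_SECONDS)
frames_per_subject = round(FPS * VIDEO_SECONDS * VIDEOS_PER_SUBJECT)
total_frames = round(FPS * VIDEO_SECONDS * total_videos)

print(total_videos, frames_per_video, frames_per_subject, total_frames)
```

The estimates come out just under the rounded figures of 600, 3,000 and 33,000 quoted in the text.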

Data sets have been collected in a variety of ways. Learned-Miller, et al. [17] created a dataset entitled “Labeled Faces in the Wild” which included 13,000 images for 5,749 people. This data set was harvested from websites. Guo, et al. [16] also created a harvested dataset, “MS-Celeb-1M” of 10,000,000 images covering 100,000 people. Many other datasets are manually collected using volunteer subjects. These databases are typically smaller with images of fewer individuals. The “ORL Database of Faces” contained images of 10 people [15]. Larger datasets included the Georgia Tech Face Database with images of 50 people [24] and the AR Face Database [25] with images of 126 individuals.

Most manually collected datasets present multiple views of the subject, in many cases from different angles. In some cases, lighting or other environmental conditions are varied. The “Extended Yale Face Database B” [26], for example, included 9 poses and 64 lighting settings per subject. In other cases (such as the “Pgu-Face” dataset [18]), objects are placed in front of the subjects to facilitate the testing of the recognition of partially occluded faces. In prior work [22,23], subjects were imaged from multiple camera perspectives and with lighting in different positions and at different levels of brightness and at different temperatures.

The dataset described herein includes multiple lighting levels; however, the subject is asked to reposition his head into multiple positions (shown in Figure 3) instead of changing the position of the camera, the lighting location or other variables. In addition, this dataset includes continuous recordings of videos of subjects’ movement, allowing assessment of recognition in the fixed positions as well as in intermediate positions.

A final aspect of datasets that should be compared is their resolution. Datasets vary significantly in this regard, ranging from smaller images such as the Georgia Tech database (with a resolution of 640 × 640 pixels) [24] to multi-megapixel images (such as those presented in [22,23]). The 4 K video files presented in this dataset have a resolution of 3840 × 2160 pixels.

Author Contributions

Conceptualization, C.G. and J.S.; data curation, C.G.; writing—original draft preparation, C.G. and J.S.; writing—review and editing, J.S.; supervision, J.S.; project administration, J.S.; funding acquisition, J.S.

Funding

The collection of this data was supported by the United States National Science Foundation (NSF award # 1757659).

Acknowledgments

Thanks are given to William Clemons and Marco Colasito, who aided in the collection of this data. Facilities and some equipment used for the collection of this data were provided by the North Dakota State University Institute for Cyber Security Education and Research and the North Dakota State University Department of Computer Science.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures and Tables

Figure 1. Positions of the lights and the video camera.

Figure 2. The different lighting settings used for each video (left to right: Warm, Cold, Low, Medium, and High).

Figure 3. Subjects were told to position their head in multiple orientations, for one second at a time, during video recording.

Table 1

Camera, lights and subject positions.

Location	X Coordinate	Y Coordinate
Subject Light	84.5	127.5
Background 1	43.5	50.5
Background 2	129	47
Camcorder	97	132
Subject	96.5	63.5
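The coordinates in Table 1 can be used to derive the straight-line distance from each piece of equipment to the subject. This is a sketch only: the table does not state its units or the origin of the coordinate system, so the results are in the same unstated units, and the helper name `distance_to_subject` is hypothetical.

```python
import math

# (x, y) coordinates copied from Table 1; units are not stated in the source.
POSITIONS = {
    "Subject Light": (84.5, 127.5),
    "Background 1": (43.5, 50.5),
    "Background 2": (129.0, 47.0),
    "Camcorder": (97.0, 132.0),
    "Subject": (96.5, 63.5),
}


def distance_to_subject(name: str) -> float:
    """Euclidean distance from the named item to the subject."""
    x, y = POSITIONS[name]
    subject_x, subject_y = POSITIONS["Subject"]
    return math.hypot(x - subject_x, y - subject_y)


for item in POSITIONS:
    if item != "Subject":
        print(f"{item}: {distance_to_subject(item):.1f}")
```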

Table 2

Light setting equipment configuration and measurements of lumens produced for each light setting.

Configuration	Light Settings	Lumens
Warm	60% brightness on warm (3200 K)	280
Cold	60% brightness on cold (5500 K)	391
Low	10% brightness on warm (3200 K) and 10% brightness on cold (5500 K)	155
Medium	40% brightness on warm (3200 K) and 40% brightness on cold (5500 K)	492
High	70% brightness on warm (3200 K) and 70% brightness on cold (5500 K)	745
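When grouping results by lighting configuration, Table 2 can be held as a small lookup structure. The sketch below is one possible encoding; the key names are hypothetical, and `None` marks a channel setting that Table 2 does not state.

```python
# Values copied from Table 2. Percentages are brightness settings on the
# warm (3200 K) and cold (5500 K) channels; None means "not stated".
LIGHTING = {
    "Warm": {"warm_pct": 60, "cold_pct": None, "lumens": 280},
    "Cold": {"warm_pct": None, "cold_pct": 60, "lumens": 391},
    "Low": {"warm_pct": 10, "cold_pct": 10, "lumens": 155},
    "Medium": {"warm_pct": 40, "cold_pct": 40, "lumens": 492},
    "High": {"warm_pct": 70, "cold_pct": 70, "lumens": 745},
}

# Configuration names ordered from dimmest to brightest measured reading.
by_brightness = sorted(LIGHTING, key=lambda name: LIGHTING[name]["lumens"])

# Each reading relative to the brightest configuration.
brightest = max(cfg["lumens"] for cfg in LIGHTING.values())
relative = {name: cfg["lumens"] / brightest for name, cfg in LIGHTING.items()}
```

Ordering by the measured reading rather than by the percentage settings reflects that the two single-channel configurations produce different readings at the same 60% setting.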

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Facial recognition is made more difficult by unusual facial positions and movement. However, for many applications, the ability to accurately recognize moving subjects with movement-distorted facial features is required. This dataset includes videos of multiple subjects, taken under multiple lighting brightness and temperature conditions, which can be used to train and evaluate the performance of facial recognition systems.

Dataset: https://doi.org/10.17632/xgg8xcscr5.1; https://doi.org/10.17632/f47pm7rwt3.1

Dataset License: CC-BY

Details

Title

Video Recordings of Male Face and Neck Movements for Facial Recognition and Other Purposes

Author

Gros, Collin¹; Straub, Jeremy²

¹ Department of Computer Science, Texas Tech University, Lubbock, TX 79409, USA
² Department of Computer Science, North Dakota State University, Fargo, ND 58108, USA

First page

130

Publication year

2019

Publication date

2019

Publisher

MDPI AG

e-ISSN

23065729

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/data4030130

ProQuest document ID

2548372009
