Abstract- Automatic speaker recognition technology appears to have reached a sufficient level of maturity for realistic application in forensic science. However, key issues must be solved before the forensic community will accept its use as an investigative aid or as evidence in actual criminal cases. To assess the state of the technology, the Federal Bureau of Investigation (FBI) built a speech corpus with multiple levels of increasing difficulty based on text-independence, channel-independence, speaking mode, and speech duration. An evaluation of multiple automatic speaker recognition programs indicated that an algorithm based on a large Gaussian mixture model (GMM), operating on features that are robust to channel variations, performed best. In this paper we describe (1) the challenges, (2) the FBI's initial Forensic Automatic Speaker Recognition (FASR) program based on these concepts, and (3) a confidence measurement method that indicates the probabilistic level of certainty that each recognition decision is correct. We also discuss the need for, and justification of, screening and pre-processing the input speech to improve the recognition performance of the FASR system in a real forensic environment.
Keywords- Text-independence, Speaker recognition, Challenges, Correctness.
I. INTRODUCTION
Speaker recognition is the general term for all of the many different tasks of discriminating one person from another based on the sound of their voices. Forensics means the use of science or technology to investigate and establish facts or evidence in a court of law. The role of forensic science is to provide information (factual or opinion) that helps answer questions of importance to investigators and to courts of law. Forensic speaker recognition (FSR) is the process of determining whether a specific individual (the suspected speaker) is the source of a questioned voice recording (the trace). This process involves comparing recordings of an unknown voice (the questioned recording) with one or more recordings of a known voice (the voice of the suspected speaker). There are several types of forensic speaker recognition. When the recognition employs any trained skill or any technologically supported procedure, the term technical forensic speaker recognition is often used. In contrast, so-called naive forensic speaker recognition refers to the application of people's everyday ability to recognize familiar voices.
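The comparison of a questioned recording with a known voice can be illustrated, in the spirit of the GMM-based approach mentioned in the abstract, as a log-likelihood ratio test. The following is a minimal sketch under stated assumptions and is not the FBI's FASR implementation: we assume each recording has already been converted to frames of cepstral features, that a suspected-speaker GMM and a background GMM with diagonal covariances have already been trained (training, e.g. by EM or MAP adaptation, is omitted), and we use cepstral mean normalization (CMN) as one common example of channel compensation. The names `cepstral_mean_normalize`, `DiagonalGMM`, and `llr_score` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical illustration only; not the FASR system's actual code.
Frame = Sequence[float]


def cepstral_mean_normalize(frames: List[Frame]) -> List[List[float]]:
    """Subtract the per-utterance mean of each feature dimension (CMN).

    A fixed convolutional channel adds a roughly constant offset in the
    cepstral domain, so removing the mean suppresses much of that offset.
    """
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


class DiagonalGMM:
    """Gaussian mixture model with diagonal covariances (parameters assumed given)."""

    def __init__(self, weights: List[float], means: List[List[float]],
                 variances: List[List[float]]):
        self.weights = weights
        self.means = means
        self.variances = variances

    def frame_log_likelihood(self, x: Frame) -> float:
        # log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), via log-sum-exp
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_n = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            comps.append(math.log(w) + log_n)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))

    def average_log_likelihood(self, frames: List[Frame]) -> float:
        # Mean per-frame log-likelihood over the whole recording
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def llr_score(frames: List[Frame], speaker: DiagonalGMM,
              background: DiagonalGMM) -> float:
    """Average per-frame log-likelihood ratio; > 0 favours the suspected speaker."""
    return (speaker.average_log_likelihood(frames)
            - background.average_log_likelihood(frames))
```

In use, the questioned recording's frames would typically be passed through `cepstral_mean_normalize` (with the models trained on features normalized the same way) before `llr_score`, and the score compared against a decision threshold. Averaging per frame keeps scores comparable across recordings of different durations, which matters given the range of speech durations in the FBI corpus.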
The success of a speaker recognition system depends...




