Voice Biometrics: Technology, trust and security

Length: 300 pages
Edition: 1
Language: English
Publisher: The Institution of Engineering and Technology
Publication Date: 2021-10-27
ISBN-10: 1785619004
ISBN-13: 9781785619007

Voice biometrics are being implemented globally in large-scale applications such as remote banking, government e-services, transportation and building security access, autonomous vehicles, and healthcare. They have been integrated into numerous apps, often coupled with face biometrics and artificial intelligence methods. To be deployed successfully, voice biometrics products and solutions must meet three key requirements: they must be highly trustworthy with regard to privacy protection, easy to use, and always available.

This edited book presents the state of the art in voice biometrics research and technologies, including implementation and deployment challenges in terms of interoperability, scalability, performance and security. The team of editors and chapter authors combines a wealth of expertise from academia and industry. Topics covered include the fundamentals of voice biometrics; design of countermeasures for replay attacks; the attacker’s perspective on voice biometrics; voice biometrics privacy in paralinguistic and extralinguistic tasks for health applications; speaker de-identification; performance evaluation of voice biometrics solutions; standardization of voice biometrics technology; industry perspectives; joining forces of voice and facial biometrics; and future trends and challenges in voice biometrics.

Providing comprehensive coverage of the field of voice biometrics, this authoritative volume will be of great interest to researchers, scientists, engineers, practitioners and advanced students working in security, biometrics, forensic sciences, human-computer interaction, speech processing, acoustics, multimedia, pattern recognition, privacy-preserving technologies, digital signal processing and speech technologies. It will also be of interest to researchers and professionals working in law and criminology.

Cover
Halftitle Page
Series Page
Title Page
Copyright
Contents
List of figures
List of tables
Short biographies of the editors and authors
Preface to Voice Biometrics
About the editors
1   Introduction
    Chapter 2 – Fundamentals of voice biometrics: classical and machine learning approaches
    Chapter 3 – Voice biometrics: attacker’s perspective
    Chapter 4 – Voice biometrics: privacy in paralinguistic and extralinguistic tasks for health applications
    Chapter 5 – Voice privacy in biometrics: speaker de-identification
    Chapter 6 – Performance evaluation of voice biometrics solutions
    Chapter 7 – Voice biometrics: how the technology is standardized
    Chapter 8 – Voice biometrics: perspective from the industry
    Chapter 9 – Joining forces of voice and facial biometrics: a case study in the scope of NIST SRE’19
    Chapter 10 – Voice biometrics: future trends and challenges ahead
2   Fundamentals of voice biometrics: classical and machine learning approaches
    2.1   Introduction to speaker recognition systems
    2.2   Metrics for system performance evaluation
        2.2.1   ROC, DET and EER
        2.2.2   Detection cost function
    2.3   Text-independent speaker recognition
        2.3.1   Classical acoustic approaches: GMM-UBM, i-vector and PLDA
        2.3.2   DNN approaches
            2.3.2.1   Basic concepts of neural networks
            2.3.2.2   Some applications of DNNs to speech processing
        2.3.3   DNNs for speaker recognition
    2.4   Text-dependent speaker recognition
        2.4.1   Classification of systems and techniques
        2.4.2   Databases and benchmarks
    2.5   Calibration of speaker recognition scores
        2.5.1   Motivation: why to calibrate?
        2.5.2   What is calibration?
        2.5.3   Score-to-LR computation methods
            2.5.3.1   Generative calibration models: fitting distributions to scores
            2.5.3.2   Discriminative calibration models: transforming scores into LR values to optimize a cost function
        2.5.4   Performance measurement of score-to-LR methods
    References
3   Voice biometrics: attacker’s perspective
    Abstract
    3.1   Introduction
    3.2   Direct attacks
        3.2.1   Spoofing attacks
        3.2.2   Black box hardware attacks
        3.2.3   Black box adversarial attacks
    3.3   Indirect attacks
        3.3.1   Attacks on corpora
        3.3.2   Gray box hardware attacks
        3.3.3   Gray box and white box adversarial attacks
    3.4   Technological challenges
        3.4.1   Extracting prosodic information
        3.4.2   Enrolled users with malicious intent
        3.4.3   Number of trials permitted on the ASV
        3.4.4   Minuteness of the perturbation in adversarial attacks
        3.4.5   Privacy preservation of speech and voice privacy
    3.5   Conclusions and future work
    Acknowledgments
    References
4   Voice biometrics: privacy in paralinguistic and extralinguistic tasks for health applications
    4.1   Introduction
    4.2   Paralinguistic and extralinguistic tasks
        4.2.1   Speech-affecting diseases
        4.2.2   Methods
    4.3   Cryptographic primitives and MPC for PPML
        4.3.1   Homomorphic encryption
        4.3.2   Oblivious transfer
        4.3.3   Secure Multiparty Computation
            4.3.3.1   Yao’s GCs protocol
            4.3.3.2   Secret sharing
            4.3.3.3   Security models
        4.3.4   Distance-preserving hashing techniques
            Secure binary embeddings
            Secure modular hashing
    4.4   PPML for paralinguistic and extralinguistic tasks
        4.4.1   PPML for non-health-related tasks
        4.4.2   PPML for health-related tasks
        4.4.3   Private SVM+RBF for health-related tasks
            4.4.3.1   Private RBF computation
            4.4.3.2   Private SVM computation
            4.4.3.3   Experimental setup
            4.4.3.4   Model training and parameters
            4.4.3.5   Private SVM implementation details
            4.4.3.6   Classification results
            4.4.3.7   Security and computational performance
    4.5   Conclusions
    Acknowledgements
    References
5   Voice privacy in biometrics: speaker de-identification
    5.1   Introduction
    5.2   How to evaluate speaker de-identification?
        5.2.1   Subjective measures
        5.2.2   Objective measures
    5.3   Speaker de-identification techniques
        5.3.1   Codebook mapping
        5.3.2   Gaussian mixture model
        5.3.3   Frequency warping
        5.3.4   Deep learning techniques
    5.4   Experiment definition
        5.4.1   Piecewise definition of transformation functions
        5.4.2   Pretrained transformation functions
        5.4.3   De-identification based on DNNs
        5.4.4   De-identification based on generative adversarial networks
    5.5   Evaluation corpora
        5.5.1   Evaluation metrics
    5.6   Results and analysis
    5.7   Conclusion
    Acknowledgements
    References
6   Performance evaluation of voice biometrics solutions
    6.1   Introduction
    6.2   Evaluating methods or technology
        6.2.1   Existing benchmarking evaluations
        6.2.2   Evaluation criteria
            6.2.2.1   Evaluating a system producing hard decisions
            6.2.2.2   Evaluating the goodness of verification scores
        6.2.3   Statistical significance
        6.2.4   Specific evaluation aspects
        6.2.5   Evaluating related technologies
    6.3   Bias in testing
    6.4   Summary and propositions
    References
7   Voice biometrics: how the technology is standardized
    7.1   Introduction
    7.2   Biometrics standardization within ISO/IEC
        7.2.1   Generalized system design
        7.2.2   Harmonized biometric vocabulary
        7.2.3   Performance testing and reporting
        7.2.4   Presentation attack detection
        7.2.5   Biometric information protection
    7.3   Data interchange formats for passports and beyond
        7.3.1   Motivation and background on encoding biometric data
        7.3.2   Data interchange standard ISO/IEC 19794
        7.3.3   Format structure
        7.3.4   ISO/IEC 19794 Part 13: voice data
    7.4   Discussion: de facto and ISO/IEC standards
        7.4.1   On the general system design
        7.4.2   Gap analysis: performance testing and reporting
        7.4.3   Regarding implementations and data interchange formats
    7.5   Conclusion
    Acknowledgements
    References
8   Voice biometrics: perspective from the industry
    8.1   Automated password reset: an example of a commercial application using voice biometrics
        8.1.1   Overview
        8.1.2   Introduction
        8.1.3   System architecture
        8.1.4   Voice biometric system
        8.1.5   Summary
    8.2   Testing of commercial voice biometric systems
        8.2.1   Introduction
            8.2.1.1   Biometric testing
        8.2.2   User analysis
        8.2.3   Summary
    8.3   Forensic speaker recognition
        8.3.1   Introduction
        8.3.2   Forensic speaker recognition and the strength of evidence
        8.3.3   The forensic expert’s workflow
        8.3.4   Technical challenges
            8.3.4.1   Improving interpretability of scores
            8.3.4.2   Score normalization
            8.3.4.3   Score calibration
            8.3.4.4   Condition adaptation
            8.3.4.5   Dealing with multi-speaker recordings
        8.3.5   Training–communication between system developers and end-users
        8.3.6   Conclusions
    References
9   Joining forces of voice and facial biometrics: a case study in the scope of NIST SRE’19
    9.1   Introduction to the NIST SRE’19 challenge
        9.1.1   The SRE’19 CTS challenge
        9.1.2   The SRE’19 multimedia challenge
        9.1.3   SRE’19 evaluation metrics
    9.2   TSP speaker verification system for the SRE’19 evaluation
        9.2.1   A brief review of state of the art in speaker verification
        9.2.2   TSP speaker verification common pipeline for the SRE’19 CTS and multimedia challenges
            TDNN
            E-TDNN
        9.2.3   TSP speaker verification system for the SRE’19 CTS challenge
        9.2.4   TSP speaker verification system for the SRE’19 multimedia challenge
        9.2.5   Results for TSP speaker verification systems on the SRE’19 CTS and multimedia challenges
        9.2.6   Conclusions
    9.3   TSP face recognition system for SRE’19
        9.3.1   Survey of face recognition systems
        9.3.2   TSP face recognition system pipeline
        9.3.3   Databases used in the TSP face recognition system
        9.3.4   Face preprocessing
        9.3.5   Embedding extractor
            Initial version of the DNN architecture
            Final version of the DNN architecture
        9.3.6   Conclusions
    9.4   Audiovisual biometric system for the SRE’19 multimedia challenge
    9.5   Conclusions and perspectives
    Acknowledgements
    References
10   Voice biometrics: future trends and challenges ahead
    10.1   Applications
    10.2   Privacy and security
    10.3   Research
    References
Index
Back Cover