Speech Recognition Technology and Applications
- Length: 240 pages
- Edition: 1
- Language: English
- Publisher: Nova Science Pub Inc
- Publication Date: 2022
- ISBN-10: 1685079296
- ISBN-13: 9781685079291
Speech is the most natural means of communication between humans. With Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems, machines can also interact with humans through speech. This is of particular importance for building interactive robots or speech-enabled chatbots. This book starts by exploring state-of-the-art ASR and TTS approaches that make use of artificial neural networks and are also relevant to low-resource scenarios. It then explores the application of speech technology to specific domains, such as the medical domain, human-robot interaction, and the interlinking of speech and text resources using linguistic linked open data (LLOD) principles. The book also presents punctuation restoration techniques, which enable the production of high-quality text transcripts. The included algorithms have low latency and can be parallelized, which makes them usable in interactive systems. The chapter authors are professors and scientific researchers with experience in building and using natural language processing algorithms and speech applications.
Computer Science, Technology and Applications
Speech Recognition Technology and Applications
Contents
Preface

Chapter 1: Building an Automatic Speech Recognition System for a Low-Resource Language
  Abstract
  1. Introduction
    1.1. Romanian as a Low-Resource Language
  2. State-of-the-Art Architectures
    2.1. HMM-GMM Based Architectures
    2.2. Deep Neural Networks Architectures
    2.3. Hybrid Architectures
    2.4. Language Models
  3. Method
    3.1. Corpora
    3.2. Automatic Grapheme-to-Phoneme Conversion
    3.3. Language Models
    3.4. Data Augmentation
    3.5. Speech-to-Text Architectures for Romanian
      3.5.1. CMUSphinx
      3.5.2. DeepSpeech
      3.5.3. DeepSpeech 2
      3.5.4. Kaldi
    3.6. Replicable Experiments with Containerization
  4. Results
    4.1. CMUSphinx
    4.2. DeepSpeech
    4.3. Kaldi
    4.4. Data Augmentation and SpecAugment
  5. Discussion
    5.1. CMUSphinx
    5.2. DeepSpeech
    5.3. Kaldi
  Conclusion and Future Work
  Acknowledgment
  References

Chapter 2: Self-Supervised Pre-Training in Speech Recognition Systems
  Abstract
  1. Introduction
  2. Contrastive Representation Learning
    2.1. Training Objectives
    2.2. Essential Components
  3. Pre-Trained ASR Architectures
    3.1. Wav2Vec
    3.2. VQ-Wav2Vec
    3.3. Wav2Vec2
  4. Comparison with Non-Pre-Trained Models
    4.1. Dataset
    4.2. Baseline Models
    4.3. Pre-Trained Wav2Vec2 Models
    4.4. Experimental Setup
    4.5. Results
  5. RELATE Integration
  Conclusion
  References

Chapter 3: The Impact of Speech Recognition Performance on Human-Computer Interaction
  Abstract
  1. Introduction
  2. Architecture of a Speech-Based Dialogue System
  3. Implementation Details
    3.1. Automatic Speech Recognition
    3.2. Dialogue Manager
    3.3. Text-to-Speech
  4. ASR Enhancements Leading to Increased Performance of the Overall System
    4.1. End-to-End Neural ASR System
    4.2. Fine-Tuning the ASR System with Domain-Specific Data
  5. Impact of ASR Enhancements
    5.1. Evaluation of RDM with a Fine-Tuned ASR System
    5.2. Overall System Response Time
  Conclusion
  References

Chapter 4: The Role of Automatic Speech Recognition Systems in Developing Medical Applications
  Abstract
  1. Introduction
  2. General Overview of ASR
  3. NLP Applications in Medical Domain
    3.1. Named Entity Recognition
    3.2. Classification
    3.3. Summarization
  4. ASR Applications in Medical Domain
    4.1. Digital Scribes for Medical Domain
      4.1.1. Challenges of Developing Digital Scribes for the Medical Domain
    4.2. Software and Platforms with ASR-Based Capabilities in the Medical Domain
      4.2.1. Case Study: Amazon Medical
        • Amazon Transcribe Medical
        • Amazon Comprehend Medical
    4.3. ASR and Vocal Biomarkers
    4.4. Medical IOT and ASR
  Conclusion
  References

Chapter 5: Punctuation Recovery for Romanian Transcribed Documents
  Abstract
  1. Introduction
  2. Punctuation in Romanian Language
  3. Corpora and Resources
  4. Algorithms
  5. Results
  Conclusion
  References

Chapter 6: Linguistic Linked Open Data for Speech Processing
  Abstract
  1. Introduction
  2. Linguistic Linked Open Data
  3. Romanian Resources as Linguistic Linked Open Data
  4. LLOD Resources for Speech Processing
  5. Romanian LLOD Resources for Speech Processing
    5.1. The RoLEX Lexicon
    5.2. The RTASC Corpus
  6. Exploiting Multiple Resources for Advanced Usage Scenarios
  Conclusion
  References

Chapter 7: Transformer-Based Romanian Text-to-Speech System Using Boolean Masking for Improved Prosody
  Abstract
  1. Introduction
  2. Related Work
    2.1. Deep Neural Models for Speech Synthesis
    2.2. Speech Synthesis for the Romanian Language
  3. Datasets for the Romanian Language
    3.1. Existing Datasets
    3.2. Introducing a New Male Voice Dataset - RSS-Alex
  4. FastSpeech TTS with Boolean Masking
  5. Experiments
  6. Results
  Conclusion
  Future Work
  Acknowledgments
  References

About the Editor
About the Contributors
Index