An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications
- Length: 464 pages
- Edition: 2
- Language: English
- Publisher: Wiley-IEEE Press
- Publication Date: 2022-12-08
- ISBN-10: 1119890942
- ISBN-13: 9781119890942
- Sales Rank: #1442967
Enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches
An Introduction to Audio Content Analysis serves as a comprehensive guide to audio content analysis, explaining how signal processing and machine learning approaches can be used to extract musical content from audio. It gives readers the algorithmic understanding needed to teach a computer to interpret music signals and thus to design tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how audio content analysis can detect musical characteristics automatically. It presents a multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach and providing practical guidance on implementation details and evaluation.
To aid in reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, followed by a detailed algorithmic model and its evaluation, and concluded with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website.
Written by a well-known expert in the music industry, An Introduction to Audio Content Analysis covers sample topics including:
- Digital audio signals and their representation, common time-frequency transforms, audio features
- Pitch and fundamental frequency detection, key detection, and chord recognition
- Representation of dynamics in music and intensity-related features
- Onset and tempo detection, beat histograms, detection of structure in music, and sequence alignment
- Audio fingerprinting, musical genre, mood, and instrument classification
An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics in music information retrieval and machine listening, allowing students and researchers to quickly gain a solid, holistic grounding in audio analysis and to dig deeper into specific aspects of the field with the help of a large number of references.
Cover
Title Page
Copyright
Contents
Author Biography
Preface
Acronyms
List of Symbols
Source Code Repositories
Chapter 1 Introduction
  1.1 A Short History of Audio Content Analysis
  1.2 Applications and Use Cases
    1.2.1 Music Browsing and Music Discovery
    1.2.2 Music Consumption
    1.2.3 Music Production
    1.2.4 Music Education
    1.2.5 Generative Music
  References
Part I Fundamentals of Audio Content Analysis
Chapter 2 Analysis of Audio Signals
  2.1 Audio Content
  2.2 Audio Content Analysis Process
  2.3 Exercises
    2.3.1 Questions
  References
Chapter 3 Input Representation
  3.1 Audio Signals
    3.1.1 Periodic Signals
    3.1.2 Random Signals
    3.1.3 Statistical Signal Description
      3.1.3.1 Arithmetic Mean
      3.1.3.2 Geometric Mean
      3.1.3.3 Harmonic Mean
      3.1.3.4 Variance and Standard Deviation
      3.1.3.5 Quantiles and Quantile Ranges
    3.1.4 Digital Audio Signals
  3.2 Audio Preprocessing
    3.2.1 Down‐Mixing
    3.2.2 DC Removal
    3.2.3 Normalization
    3.2.4 Sample Rate Conversion
    3.2.5 Block‐Based Processing
    3.2.6 Other Preprocessing Options
  3.3 Time‐Frequency Representations
    3.3.1 Fourier Transform
    3.3.2 Constant Q Transform
    3.3.3 Log‐Mel Spectrogram
    3.3.4 Filterbanks
  3.4 Other Input Representations
  3.5 Instantaneous Features
    3.5.1 Spectral Centroid
    3.5.2 Spectral Spread
    3.5.3 Spectral Skewness and Spectral Kurtosis
    3.5.4 Spectral Rolloff
    3.5.5 Spectral Decrease
    3.5.6 Spectral Slope
    3.5.7 Mel Frequency Cepstral Coefficients
    3.5.8 Spectral Flux
    3.5.9 Spectral Crest Factor
    3.5.10 Spectral Flatness
    3.5.11 Tonal Power Ratio
    3.5.12 Maximum of Autocorrelation Function
    3.5.13 Zero Crossing Rate
  3.6 Learned Features
  3.7 Feature Post‐Processing
    3.7.1 Derived Features
    3.7.2 Feature Aggregation
    3.7.3 Normalization and Mapping
    3.7.4 Feature Dimensionality Reduction
      3.7.4.1 Feature Subset Selection
      3.7.4.2 Feature Space Transformation
  3.8 Exercises
    3.8.1 Questions
    3.8.2 Assignments
  References
Chapter 4 Inference
  4.1 Classification
  4.2 Regression
  4.3 Clustering
  4.4 Distance and Similarity
  4.5 Underfitting and Overfitting
  4.6 Exercises
    4.6.1 Questions
    4.6.2 Assignments
  References
Chapter 5 Data
  5.1 Data Split
    5.1.1 N‐Fold Cross Validation
  5.2 Training Data Augmentation
  5.3 Utilization of Data From Related Tasks
  5.4 Reducing Accuracy Requirements for Data Annotation
  5.5 Semi‐, Self‐, and Unsupervised Learning
  5.6 Exercises
    5.6.1 Questions
    5.6.2 Assignments
  References
Chapter 6 Evaluation
  6.1 Metrics
    6.1.1 Classification
    6.1.2 Regression
    6.1.3 Clustering
  6.2 Exercises
    6.2.1 Questions
  References
Part II Music Transcription
Chapter 7 Tonal Analysis
  7.1 Human Perception of Pitch
    7.1.1 Pitch Scales
    7.1.2 Chroma Perception
  7.2 Representation of Pitch in Music
    7.2.1 Pitch Classes and Names
    7.2.2 Intervals
    7.2.3 The Frequency of Musical Pitch
      7.2.3.1 Temperament
      7.2.3.2 Intonation
  7.3 Fundamental Frequency Detection
    7.3.1 Detection Accuracy
      7.3.1.1 Time Domain
      7.3.1.2 Frequency Domain
      7.3.1.3 Potential Solutions
    7.3.2 Preprocessing
    7.3.3 Monophonic Input Signals
      7.3.3.1 Zero Crossing Rate
      7.3.3.2 Autocorrelation Function
      7.3.3.3 Average Magnitude Difference Function
      7.3.3.4 Harmonic Product Spectrum and Harmonic Sum Spectrum
      7.3.3.5 Autocorrelation Function of the Magnitude Spectrum
      7.3.3.6 Cepstral Pitch Detection
      7.3.3.7 Maximum Likelihood and Template Matching
      7.3.3.8 Auditory‐Motivated Pitch Tracking
    7.3.4 Polyphonic Input Signals
      7.3.4.1 Iterative Subtraction
      7.3.4.2 Nonnegative Matrix Factorization
      7.3.4.3 Other Approaches
    7.3.5 Evaluation
      7.3.5.1 Metrics
      7.3.5.2 Datasets
      7.3.5.3 Results
  7.4 Tuning Frequency Estimation
    7.4.1 Approaches to Tuning Frequency Estimation
    7.4.2 Evaluation
  7.5 Key Detection
    7.5.1 Pitch Chroma
      7.5.1.1 Pitch Chroma Properties
      7.5.1.2 Features Derived from the Pitch Chroma
    7.5.2 Approaches to Key Detection
      7.5.2.1 Key Profiles
      7.5.2.2 Similarity Measure between Template and Extracted Vector
    7.5.3 Evaluation
      7.5.3.1 Metrics
      7.5.3.2 Datasets
      7.5.3.3 Results
  7.6 Chord Recognition
    7.6.1 Approaches to Chord Recognition
    7.6.2 Viterbi Algorithm
    7.6.3 Evaluation
      7.6.3.1 Metrics
      7.6.3.2 Datasets
      7.6.3.3 Results
  7.7 Exercises
    7.7.1 Questions
    7.7.2 Assignments
  References
Chapter 8 Intensity
  8.1 Human Perception of Intensity and Loudness
  8.2 Representation of Dynamics in Music
  8.3 Features
    8.3.1 Root Mean Square
    8.3.2 Weighted Root Mean Square
    8.3.3 Peak Envelope
    8.3.4 Psycho‐Acoustic Loudness Features
  8.4 Exercises
    8.4.1 Questions
    8.4.2 Assignments
  References
Chapter 9 Temporal Analysis
  9.1 Human Perception of Temporal Events
    9.1.1 Onsets
    9.1.2 Tempo and Meter
    9.1.3 Rhythm
    9.1.4 Timing
  9.2 Representation of Temporal Events in Music
    9.2.1 Tempo and Time Signature
    9.2.2 Note Value
  9.3 Onset Detection
    9.3.1 Novelty Function
    9.3.2 Peak Picking
    9.3.3 Evaluation
      9.3.3.1 Metrics
      9.3.3.2 Datasets
      9.3.3.3 Results
  9.4 Beat Histogram
    9.4.1 Beat Histogram Features
  9.5 Detection of Tempo and Beat Phase
    9.5.1 Evaluation
      9.5.1.1 Metrics
      9.5.1.2 Datasets
      9.5.1.3 Results
  9.6 Detection of Meter and Downbeat
  9.7 Structure Detection
    9.7.1 Self‐Similarity Matrix
    9.7.2 Approaches to Structure Detection
      9.7.2.1 Novelty Analysis
      9.7.2.2 Homogeneity Analysis
      9.7.2.3 Repetition Analysis
    9.7.3 Evaluation
      9.7.3.1 Metrics
      9.7.3.2 Datasets
      9.7.3.3 Results
  9.8 Automatic Drum Transcription
    9.8.1 Transcription of Drum Onsets
    9.8.2 Evaluation
  9.9 Exercises
    9.9.1 Questions
    9.9.2 Assignments
  References
Chapter 10 Alignment
  10.1 Dynamic Time Warping
    10.1.1 Example
    10.1.2 Common Variants
    10.1.3 Optimizations
  10.2 Audio‐to‐Audio Alignment
  10.3 Audio‐to‐Score Alignment
    10.3.1 Real‐Time Systems
    10.3.2 Non‐Real‐Time Systems
  10.4 Evaluation
    10.4.1 Metrics
    10.4.2 Data
  10.5 Exercises
    10.5.1 Questions
    10.5.2 Assignments
  References
Part III Music Identification, Classification, and Assessment
Chapter 11 Audio Fingerprinting
  11.1 Fingerprint Extraction
  11.2 Fingerprint Matching
  11.3 Fingerprinting System: Example
  11.4 Evaluation
  References
Chapter 12 Music Similarity Detection and Music Genre Classification
  12.1 Music Similarity Detection
    12.1.1 Approaches to Music Similarity Computation
    12.1.2 Evaluation
  12.2 Musical Genre Classification
    12.2.1 Approaches to Musical Genre Classification
    12.2.2 Genre Classification: Example
    12.2.3 Evaluation
      12.2.3.1 Metrics
      12.2.3.2 Data
      12.2.3.3 Results
    12.2.4 Exercises
    12.2.5 Questions
    12.2.6 Assignments
  References
Chapter 13 Mood Recognition
  13.1 Approaches to Mood Recognition
  13.2 Evaluation
  References
Chapter 14 Musical Instrument Recognition
  14.1 Evaluation
  References
Chapter 15 Music Performance Assessment
  15.1 Music Performance
  15.2 Music Performance Analysis
  15.3 Approaches to Music Performance Assessment
  References
Part IV Appendices
Appendix A Fundamentals
  A.1 Sampling and Quantization
    A.1.1 Sampling
    A.1.2 Quantization
  A.2 Convolution
    A.2.1 Identity
    A.2.2 Commutativity
    A.2.3 Associativity
    A.2.4 Distributivity
    A.2.5 Circularity
    A.2.6 Simple Filter Examples
      A.2.6.1 Moving Average Filter
      A.2.6.2 Single‐Pole Low‐Pass Filter
    A.2.7 Zero‐Phase Filtering with IIRs
  A.3 Correlation Function
    A.3.1 Normalization
    A.3.2 Autocorrelation Function
    A.3.3 Applications
    A.3.4 Calculation in the Frequency Domain
      A.3.4.1 Frequency Domain Compression
  References
Appendix B Fourier Transform
  B.1 Properties of the Fourier Transformation
    B.1.1 Inverse Fourier Transform
    B.1.2 Superposition
    B.1.3 Convolution and Multiplication
    B.1.4 Parseval's Theorem
    B.1.5 Time and Frequency Shift
    B.1.6 Symmetry
    B.1.7 Time and Frequency Scaling
    B.1.8 Derivatives
  B.2 Spectrum of Example Time Domain Signals
    B.2.1 Delta Function
    B.2.2 Constant
    B.2.3 Cosine
    B.2.4 Rectangular Window
    B.2.5 Delta Pulse
  B.3 Transformation of Sampled Time Signals
  B.4 Short Time Fourier Transform of Continuous Signals
    B.4.1 Window Functions
      B.4.1.1 Rectangular Window
      B.4.1.2 Bartlett Window
      B.4.1.3 Generalized Superposed Cosines
      B.4.1.4 Generalized Power of Cosine
  B.5 Discrete Fourier Transform
    B.5.1 Window Functions
      B.5.1.1 Discrete Window Properties
    B.5.2 Fast Fourier Transform
  B.6 Frequency Reassignment: Instantaneous Frequency
  References
Appendix C Principal Component Analysis
  C.1 Computation of the Transformation Matrix
  C.2 Interpretation of the Transformation Matrix
Appendix D Linear Regression
Appendix E Software for Audio Analysis
  E.1 Frameworks and Libraries
    E.1.1 librosa
    E.1.2 Essentia
    E.1.3 openSMILE
    E.1.4 Marsyas
    E.1.5 jMIR
    E.1.6 MIRtoolbox
    E.1.7 Yaafe
    E.1.8 madmom
    E.1.9 Software for Education
    E.1.10 Other Software
  E.2 Data Annotation and Visualization
  References
Appendix F Datasets
  References
Index
EULA