An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications

Length: 464 pages
Edition: 2
Language: English
Publisher: Wiley-IEEE Press
Publication Date: 2022-12-08
ISBN-10: 1119890942
ISBN-13: 9781119890942
Sales Rank: #1442967

An Introduction to Audio Content Analysis

Enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches

An Introduction to Audio Content Analysis serves as a comprehensive guide to audio content analysis, explaining how signal processing and machine learning approaches can be used to extract musical content from audio. It gives readers the algorithmic understanding needed to teach a computer to interpret music signals, and thus to design tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how to use audio content analysis to pick up musical characteristics automatically. It presents a multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach and providing practical guidance on implementation details and evaluation.

To aid reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, continues with a detailed algorithmic model and its evaluation, and concludes with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website.

Written by a well-known expert in the music industry, An Introduction to Audio Content Analysis covers sample topics including:

Digital audio signals and their representation, common time-frequency transforms, audio features
Pitch and fundamental frequency detection, key detection, and chord recognition
Representation of dynamics in music and intensity-related features
Beat histograms, onset and tempo detection, detection of structure in music, and sequence alignment
Audio fingerprinting, musical genre, mood, and instrument classification

An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics pertaining to music information retrieval and machine listening, allowing students and researchers to quickly gain core holistic knowledge in audio analysis and dig deeper into specific aspects of the field with the help of a large number of references.

Cover
Title Page
Copyright
Contents
Author Biography
Preface
Acronyms
List of Symbols
Source Code Repositories
Chapter 1 Introduction
	1.1 A Short History of Audio Content Analysis
	1.2 Applications and Use Cases
		1.2.1 Music Browsing and Music Discovery
		1.2.2 Music Consumption
		1.2.3 Music Production
		1.2.4 Music Education
		1.2.5 Generative Music
	References
Part I Fundamentals of Audio Content Analysis
	Chapter 2 Analysis of Audio Signals
		2.1 Audio Content
		2.2 Audio Content Analysis Process
		2.3 Exercises
			2.3.1 Questions
		References
	Chapter 3 Input Representation
		3.1 Audio Signals
			3.1.1 Periodic Signals
			3.1.2 Random Signals
			3.1.3 Statistical Signal Description
				3.1.3.1 Arithmetic Mean
				3.1.3.2 Geometric Mean
				3.1.3.3 Harmonic Mean
				3.1.3.4 Variance and Standard Deviation
				3.1.3.5 Quantiles and Quantile Ranges
			3.1.4 Digital Audio Signals
		3.2 Audio Preprocessing
			3.2.1 Down‐Mixing
			3.2.2 DC Removal
			3.2.3 Normalization
			3.2.4 Sample Rate Conversion
			3.2.5 Block‐Based Processing
			3.2.6 Other Preprocessing Options
		3.3 Time‐Frequency Representations
			3.3.1 Fourier Transform
			3.3.2 Constant Q Transform
			3.3.3 Log‐Mel Spectrogram
			3.3.4 Filterbanks
		3.4 Other Input Representations
		3.5 Instantaneous Features
			3.5.1 Spectral Centroid
			3.5.2 Spectral Spread
			3.5.3 Spectral Skewness and Spectral Kurtosis
			3.5.4 Spectral Rolloff
			3.5.5 Spectral Decrease
			3.5.6 Spectral Slope
			3.5.7 Mel Frequency Cepstral Coefficients
			3.5.8 Spectral Flux
			3.5.9 Spectral Crest Factor
			3.5.10 Spectral Flatness
			3.5.11 Tonal Power Ratio
			3.5.12 Maximum of Autocorrelation Function
			3.5.13 Zero Crossing Rate
		3.6 Learned Features
		3.7 Feature Post‐Processing
			3.7.1 Derived Features
			3.7.2 Feature Aggregation
			3.7.3 Normalization and Mapping
			3.7.4 Feature Dimensionality Reduction
				3.7.4.1 Feature Subset Selection
				3.7.4.2 Feature Space Transformation
		3.8 Exercises
			3.8.1 Questions
			3.8.2 Assignments
		References
	Chapter 4 Inference
		4.1 Classification
		4.2 Regression
		4.3 Clustering
		4.4 Distance and Similarity
		4.5 Underfitting and Overfitting
		4.6 Exercises
			4.6.1 Questions
			4.6.2 Assignments
		References
	Chapter 5 Data
		5.1 Data Split
			5.1.1 N‐Fold Cross Validation
		5.2 Training Data Augmentation
		5.3 Utilization of Data From Related Tasks
		5.4 Reducing Accuracy Requirements for Data Annotation
		5.5 Semi‐, Self‐, and Unsupervised Learning
		5.6 Exercises
			5.6.1 Questions
			5.6.2 Assignments
		References
	Chapter 6 Evaluation
		6.1 Metrics
			6.1.1 Classification
			6.1.2 Regression
			6.1.3 Clustering
		6.2 Exercises
			6.2.1 Questions
		References
Part II Music Transcription
	Chapter 7 Tonal Analysis
		7.1 Human Perception of Pitch
			7.1.1 Pitch Scales
			7.1.2 Chroma Perception
		7.2 Representation of Pitch in Music
			7.2.1 Pitch Classes and Names
			7.2.2 Intervals
			7.2.3 The Frequency of Musical Pitch
				7.2.3.1 Temperament
				7.2.3.2 Intonation
		7.3 Fundamental Frequency Detection
			7.3.1 Detection Accuracy
				7.3.1.1 Time Domain
				7.3.1.2 Frequency Domain
				7.3.1.3 Potential Solutions
			7.3.2 Preprocessing
			7.3.3 Monophonic Input Signals
				7.3.3.1 Zero Crossing Rate
				7.3.3.2 Autocorrelation Function
				7.3.3.3 Average Magnitude Difference Function
				7.3.3.4 Harmonic Product Spectrum and Harmonic Sum Spectrum
				7.3.3.5 Autocorrelation Function of the Magnitude Spectrum
				7.3.3.6 Cepstral Pitch Detection
				7.3.3.7 Maximum Likelihood and Template Matching
				7.3.3.8 Auditory‐Motivated Pitch Tracking
			7.3.4 Polyphonic Input Signals
				7.3.4.1 Iterative Subtraction
				7.3.4.2 Nonnegative Matrix Factorization
				7.3.4.3 Other Approaches
			7.3.5 Evaluation
				7.3.5.1 Metrics
				7.3.5.2 Datasets
				7.3.5.3 Results
		7.4 Tuning Frequency Estimation
			7.4.1 Approaches to Tuning Frequency Estimation
			7.4.2 Evaluation
		7.5 Key Detection
			7.5.1 Pitch Chroma
				7.5.1.1 Pitch Chroma Properties
				7.5.1.2 Features Derived from the Pitch Chroma
			7.5.2 Approaches to Key Detection
				7.5.2.1 Key Profiles
				7.5.2.2 Similarity Measure between Template and Extracted Vector
			7.5.3 Evaluation
				7.5.3.1 Metrics
				7.5.3.2 Datasets
				7.5.3.3 Results
		7.6 Chord Recognition
			7.6.1 Approaches to Chord Recognition
			7.6.2 Viterbi Algorithm
			7.6.3 Evaluation
				7.6.3.1 Metrics
				7.6.3.2 Datasets
				7.6.3.3 Results
		7.7 Exercises
			7.7.1 Questions
			7.7.2 Assignments
		References
	Chapter 8 Intensity
		8.1 Human Perception of Intensity and Loudness
		8.2 Representation of Dynamics in Music
		8.3 Features
			8.3.1 Root Mean Square
			8.3.2 Weighted Root Mean Square
			8.3.3 Peak Envelope
			8.3.4 Psycho‐Acoustic Loudness Features
		8.4 Exercises
			8.4.1 Questions
			8.4.2 Assignments
		References
	Chapter 9 Temporal Analysis
		9.1 Human Perception of Temporal Events
			9.1.1 Onsets
			9.1.2 Tempo and Meter
			9.1.3 Rhythm
			9.1.4 Timing
		9.2 Representation of Temporal Events in Music
			9.2.1 Tempo and Time Signature
			9.2.2 Note Value
		9.3 Onset Detection
			9.3.1 Novelty Function
			9.3.2 Peak Picking
			9.3.3 Evaluation
				9.3.3.1 Metrics
				9.3.3.2 Datasets
				9.3.3.3 Results
		9.4 Beat Histogram
			9.4.1 Beat Histogram Features
		9.5 Detection of Tempo and Beat Phase
			9.5.1 Evaluation
				9.5.1.1 Metrics
				9.5.1.2 Datasets
				9.5.1.3 Results
		9.6 Detection of Meter and Downbeat
		9.7 Structure Detection
			9.7.1 Self‐Similarity Matrix
			9.7.2 Approaches to Structure Detection
				9.7.2.1 Novelty Analysis
				9.7.2.2 Homogeneity Analysis
				9.7.2.3 Repetition Analysis
			9.7.3 Evaluation
				9.7.3.1 Metrics
				9.7.3.2 Datasets
				9.7.3.3 Results
		9.8 Automatic Drum Transcription
			9.8.1 Transcription of Drum Onsets
			9.8.2 Evaluation
		9.9 Exercises
			9.9.1 Questions
			9.9.2 Assignments
		References
	Chapter 10 Alignment
		10.1 Dynamic Time Warping
			10.1.1 Example
			10.1.2 Common Variants
			10.1.3 Optimizations
		10.2 Audio‐to‐Audio Alignment
		10.3 Audio‐to‐Score Alignment
			10.3.1 Real‐Time Systems
			10.3.2 Non‐Real‐Time Systems
		10.4 Evaluation
			10.4.1 Metrics
			10.4.2 Data
		10.5 Exercises
			10.5.1 Questions
			10.5.2 Assignments
		References
Part III Music Identification, Classification, and Assessment
	Chapter 11 Audio Fingerprinting
		11.1 Fingerprint Extraction
		11.2 Fingerprint Matching
		11.3 Fingerprinting System: Example
		11.4 Evaluation
		References
	Chapter 12 Music Similarity Detection and Music Genre Classification
		12.1 Music Similarity Detection
			12.1.1 Approaches to Music Similarity Computation
			12.1.2 Evaluation
		12.2 Musical Genre Classification
			12.2.1 Approaches to Musical Genre Classification
			12.2.2 Genre Classification: Example
			12.2.3 Evaluation
				12.2.3.1 Metrics
				12.2.3.2 Data
				12.2.3.3 Results
			12.2.4 Exercises
			12.2.5 Questions
			12.2.6 Assignments
		References
	Chapter 13 Mood Recognition
		13.1 Approaches to Mood Recognition
		13.2 Evaluation
		References
	Chapter 14 Musical Instrument Recognition
		14.1 Evaluation
		References
	Chapter 15 Music Performance Assessment
		15.1 Music Performance
		15.2 Music Performance Analysis
		15.3 Approaches to Music Performance Assessment
		References
Part IV Appendices
	Appendix A Fundamentals
		A.1 Sampling and Quantization
			A.1.1 Sampling
			A.1.2 Quantization
		A.2 Convolution
			A.2.1 Identity
			A.2.2 Commutativity
			A.2.3 Associativity
			A.2.4 Distributivity
			A.2.5 Circularity
			A.2.6 Simple Filter Examples
				A.2.6.1 Moving Average Filter
				A.2.6.2 Single‐Pole Low‐Pass Filter
			A.2.7 Zero‐Phase Filtering with IIRs
		A.3 Correlation Function
			A.3.1 Normalization
			A.3.2 Autocorrelation Function
			A.3.3 Applications
			A.3.4 Calculation in the Frequency Domain
				A.3.4.1 Frequency Domain Compression
		References
	Appendix B Fourier Transform
		B.1 Properties of the Fourier Transformation
			B.1.1 Inverse Fourier Transform
			B.1.2 Superposition
			B.1.3 Convolution and Multiplication
			B.1.4 Parseval's Theorem
			B.1.5 Time and Frequency Shift
			B.1.6 Symmetry
			B.1.7 Time and Frequency Scaling
			B.1.8 Derivatives
		B.2 Spectrum of Example Time Domain Signals
			B.2.1 Delta Function
			B.2.2 Constant
			B.2.3 Cosine
			B.2.4 Rectangular Window
			B.2.5 Delta Pulse
		B.3 Transformation of Sampled Time Signals
		B.4 Short Time Fourier Transform of Continuous Signals
			B.4.1 Window Functions
				B.4.1.1 Rectangular Window
				B.4.1.2 Bartlett Window
				B.4.1.3 Generalized Superposed Cosines
				B.4.1.4 Generalized Power of Cosine
		B.5 Discrete Fourier Transform
			B.5.1 Window Functions
				B.5.1.1 Discrete Window Properties
			B.5.2 Fast Fourier Transform
		B.6 Frequency Reassignment: Instantaneous Frequency
		References
	Appendix C Principal Component Analysis
		C.1 Computation of the Transformation Matrix
		C.2 Interpretation of the Transformation Matrix
	Appendix D Linear Regression
	Appendix E Software for Audio Analysis
		E.1 Frameworks and Libraries
			E.1.1 librosa
			E.1.2 Essentia
			E.1.3 openSMILE
			E.1.4 Marsyas
			E.1.5 jMIR
			E.1.6 MIRtoolbox
			E.1.7 Yaafe
			E.1.8 madmom
			E.1.9 Software for Education
			E.1.10 Other Software
		E.2 Data Annotation and Visualization
		References
	Appendix F Datasets
		References
Index
EULA