Object Oriented Data Analysis

Length: 424 pages
Edition: 1
Language: English
Publisher: Chapman and Hall/CRC
Publication Date: 2021-11-08
ISBN-10: 0815392826
ISBN-13: 9780815392828

Object Oriented Data Analysis is a framework that facilitates interdisciplinary research through new terminology for discussing the many possible approaches to the analysis of complex data. Such data arise naturally in a wide variety of areas. This book aims to provide ways of thinking that enable sensible choices.

The main points are illustrated with many real data examples, based on the authors’ personal experiences, which have motivated the invention of a wide array of analytic methods.

While the mathematics goes far beyond what is usual in statistics (including differential geometry and even topology), the book aims to be accessible to graduate students. There is a deliberate focus on ideas over mathematical formulas.

Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Contents
Preface
1. What Is OODA?
	1.1. Case Study: Curves as Data Objects
	1.2. Case Study: Shapes as Data Objects
		1.2.1. The Segmentation Challenge
		1.2.2. General Shape Representations
		1.2.3. Skeletal Shape Representations
		1.2.4. Bayes Segmentation via Principal Geodesic Analysis
2. Breadth of OODA
	2.1. Amplitude and Phase Data Objects
	2.2. Tree-Structured Data Objects
	2.3. Sounds as Data Objects
	2.4. Images as Data Objects
3. Data Object Definition
	3.1. OODA Foundations
		3.1.1. OODA Terminology
		3.1.2. Object and Feature Space Example
		3.1.3. Scree Plots
		3.1.4. Formalization of Modes of Variation
	3.2. Mathematical Notation
	3.3. Overview of Object and Feature Spaces
		3.3.1. Example: Probability Distributions as Data Objects
4. Exploratory and Confirmatory Analyses
	4.1. Exploratory Analysis–Discover Structure in Data
		4.1.1. Example: Tilted Parabolas FDA
		4.1.2. Example: Twin Arches FDA
		4.1.3. Case Study: Lung Cancer Data
		4.1.4. Case Study: Pan-Cancer Data
	4.2. Confirmatory Analysis–Is It Really There?
	4.3. Further Major Statistical Tasks
5. OODA Preprocessing
	5.1. Visualization of Marginal Distributions
		5.1.1. Case Study: Spanish Mortality Data
		5.1.2. Case Study: Drug Discovery Data
	5.2. Standardization–Appropriate Linear Scaling
		5.2.1. Example: Two Scale Curve Data
		5.2.2. Overview of Standardization
	5.3. Transformation–Appropriate Nonlinear Scaling
	5.4. Registration–Appropriate Alignment
6. Data Visualization
	6.1. Heat-Map Views of Data Matrices
	6.2. Curve Views of Matrices and Modes of Variation
	6.3. Data Centering and Combined Views
	6.4. Scatterplot Matrix Views of Scores
	6.5. Alternatives to PCA Directions
7. Distance Based Methods
	7.1. Fréchet Centers In Metric Spaces
	7.2. Multi-Dimensional Scaling For Object Representation
	7.3. Important Distance Examples
		7.3.1. Conventional Norms
		7.3.2. Wasserstein Distances
		7.3.3. Procrustes Distances
		7.3.4. Generalized Procrustes Analysis
		7.3.5. Covariance Matrix Distances
8. Manifold Data Analysis
	8.1. Directional Data
	8.2. Introduction to Shape Manifolds
	8.3. Statistical Analysis of Shapes
	8.4. Landmark Shapes
		8.4.1. Shape Tangent Space
		8.4.2. Case Study: Digit 3 Data
		8.4.3. Case Study: DNA Molecule Data
		8.4.4. Principal Nested Shape Spaces
		8.4.5. Size-and-Shape Space
		8.4.6. Further Methodology
	8.5. Central Limit Theory on Manifolds
	8.6. Backwards PCA
	8.7. Covariance Matrices as Data Objects
9. FDA Curve Registration
	9.1. Fisher-Rao Curve Registration
		9.1.1. Example: Shifted Betas Data
		9.1.2. Introduction to Warping Functions
		9.1.3. Fisher-Rao Mathematics
	9.2. Principal Nested Spheres Decomposition
10. Graph Structured Data Objects
	10.1. Arterial Trees as Data Objects
		10.1.1. Combinatoric Approaches
		10.1.2. Phylogenetics
		10.1.3. Dyck Path
		10.1.4. Persistent Homology
		10.1.5. Comparison of Tree Analysis Methods
	10.2. Networks as Data Objects
		10.2.1. Graph Laplacians
		10.2.2. Example: A Tale of Two Cities
		10.2.3. Extrinsic and Intrinsic Analysis
		10.2.4. Case Study: Corpus Linguistics
		10.2.5. Labeled versus Unlabeled Nodes
11. Classification–Supervised Learning
	11.1. Classical Methods
	11.2. Kernel Methods
	11.3. Support Vector Machines
	11.4. Distance Weighted Discrimination
	11.5. Other Classification Approaches
12. Clustering–Unsupervised Learning
	12.1. K-Means Clustering
	12.2. Hierarchical Clustering
	12.3. Visualization Based Methods
		12.3.1. Hybrid Clustering Methods
13. High-Dimensional Inference
	13.1. DiProPerm–Two Sample Testing
	13.2. Statistical Significance in Clustering
		13.2.1. High Dimensional SigClust
14. High Dimensional Asymptotics
	14.1. Random Matrix Theory
	14.2. High Dimension Low Sample Size
	14.3. High Dimension Medium Sample Size
15. Smoothing and SiZer
	15.1. Why Not Histograms?–Hidalgo Stamps Data
	15.2. Smoothing Basics–Bralower Fossils Data
	15.3. Smoothing Parameter Selection
	15.4. Statistical Inference and SiZer
		15.4.1. Case Study: British Family Incomes Data
		15.4.2. Case Study: Bralower Fossils Data
		15.4.3. Case Study: Mass Flux Data
		15.4.4. Case Study: Kidney Cancer Data
		15.4.5. Additional SiZer Applications and Variants
16. Robust Methods
	16.1. Robustness Controversies
	16.2. Robust Methods for OODA
		16.2.1. Case Study: Cornea Curvature Data
		16.2.2. Case Study: Genome-Wide Association Data
	16.3. Other Robustness Areas
17. PCA Details and Variants
	17.1. Viewpoints of PCA
		17.1.1. Data Centering
		17.1.2. Singular Value Decomposition
		17.1.3. Gaussian Likelihood View
		17.1.4. PCA Computational Issues
	17.2. Two Block Decompositions
		17.2.1. Partial Least Squares
		17.2.2. Canonical Correlations
		17.2.3. Joint and Individual Variation Explained
18. OODA Context and Related Areas
	18.1. History and Terminology
	18.2. OODA Analogy with Object-Oriented Programming
	18.3. Compositional Data Analysis
	18.4. Symbolic Data Analysis
	18.5. Other Research Areas
Bibliography
Index