Data Science for Engineers
- Length: 344 pages
- Edition: 1
- Language: English
- Publisher: CRC Press
- Publication Date: 2022-12-16
- ISBN-10: 0367754266
- ISBN-13: 9780367754266
With the tremendous improvement in computational power and the availability of rich data, almost all engineering disciplines now use data science at some level. This textbook presents the material of data science comprehensively and in a structured manner. It provides a conceptual understanding of data science, machine learning, and artificial intelligence, with the level of mathematical detail readers need to follow the methods. This foundation helps readers grasp the major thematic ideas in data science, machine learning, and artificial intelligence, and implement first-level data science solutions to practical engineering problems.
The book
- Provides a systematic approach to understanding data science techniques
- Explains why machine learning techniques cut across several disciplines
- Covers statistics, linear algebra, and optimization from a data science perspective
- Provides multiple examples to explain the underlying ideas in machine learning algorithms
- Describes several contemporary machine learning algorithms
The textbook is primarily written for undergraduate students, particularly senior undergraduates, in various engineering disciplines, including chemical engineering, mechanical engineering, electrical engineering, and electronics and communications engineering, for courses on data science, machine learning, and artificial intelligence.
Contents

Front matter: Cover, Half Title, Title Page, Copyright Page, Dedication, Contents, Preface, Authors Bio

Chapter 1: Introduction to DS, ML, and AI
- 1.1. Definitions of DS and ML
- 1.2. What Is Learnt in ML Algorithms?
- 1.3. How Does Learning Happen in ML?
- 1.4. Decision-Making in ML
- 1.5. Discussion on AI
- 1.6. Why Are ML and AI Techniques Effective?

Chapter 2: DS and ML—Fundamental Concepts
- 2.1. Classification and Function Approximation
- 2.2. Model Forms
- 2.3. Training Philosophy
- 2.4. Generality of Data Science/ML Solutions
  - 2.4.1. Examples of Function Approximation Problems
    - 2.4.1.1. Predicting Materials Property for Different Chemicals
    - 2.4.1.2. Predicting Scores in a Game of Cricket
    - 2.4.1.3. Predicting Mechanical Properties of a Part
    - 2.4.1.4. Predicting Value of a Board Position in Chess
  - 2.4.2. Examples of Classification Problems
    - 2.4.2.1. Fraud Detection in Credit Card Transactions
    - 2.4.2.2. Distinguishing Objects—"Self-driving Cars"
    - 2.4.2.3. Detecting Failures in Built Systems/Equipment
    - 2.4.2.4. Classifying Emails
  - 2.4.3. Feature Engineering as a Connector Between Domain and Data Science/ML
- 2.5. Data Classification
- 2.6. Viewing ML Algorithms as Tools to Understand Multi-dimensional Data
- 2.7. A Framework for Solving Data Science Problems
  - 2.7.1. Data Imputation
    - 2.7.1.1. Start: Problem Arrival
    - 2.7.1.2. Problem Statement
    - 2.7.1.3. Solution Conceptualization
    - 2.7.1.4. Method Identification
    - 2.7.1.5. Solution Realization
    - 2.7.1.6. Assess Assumptions
    - 2.7.1.7. Validate-Revise-Assess Cycle
- 2.8. Conclusions

Chapter 3: Linear Algebra for DS and ML
- 3.1. Matrices as a Concept for Data Organization
- 3.2. Matrix View of Linear Algebra
  - 3.2.1. Rank of a Matrix
  - 3.2.2. LU Decomposition
- 3.3. Fundamental Subspaces
  - 3.3.1. Row and Column Spaces of a Matrix
  - 3.3.2. Null and Left-Null Spaces of a Matrix
- 3.4. Data Science and Fundamental Subspaces
  - 3.4.1. Understanding Linear Relationships between Variables and Samples
- 3.5. Solving Linear Equations—Multiple Views
  - 3.5.1. Case 1: m = n (Square Matrix)
  - 3.5.2. Case 2: m > n
  - 3.5.3. Case 3: m < n
- 3.6. Orthogonality, Projections, and Hyperplanes
  - 3.6.1. Notion of Distance and Orthogonality
  - 3.6.2. Projection of Vectors onto Subspaces
  - 3.6.3. Generating Orthogonal Vectors through Projections
  - 3.6.4. Understanding Noise Removal through Projections
  - 3.6.5. Understanding Partitions through Hyperplanes and Half-spaces
- 3.7. Eigenvalues, Eigenvectors, and SVD
  - 3.7.1. Eigenvalues and Eigenvectors
  - 3.7.2. Singular Value Decomposition (SVD)
  - 3.7.3. Understanding Data Spread, Significant Directions, Linear Relationships and Noise Removal through SVD

Chapter 4: Optimization for DS and ML
- 4.1. Elements of an Optimization Formulation
- 4.2. Discussion on Objective Functions for Classification and Function Approximation Problems
  - 4.2.1. Function Approximation Objective Function
  - 4.2.2. Classification Objective Function
- 4.3. First- and Second-Order Analytical Conditions for Optimality of Unconstrained NLPs
- 4.4. Numerical Approaches to Solving Optimization Problems
  - 4.4.1. Univariate Problems
    - 4.4.1.1. Gradient-Based Approach for Univariate Optimization Problems
    - 4.4.1.2. Bracketing Methods
  - 4.4.2. Multivariate Problems
    - 4.4.2.1. Steepest Descent Algorithm
    - 4.4.2.2. Newton's Method
    - 4.4.2.3. Algebraic Derivation of Steepest Descent Method
- 4.5. Description of Stochastic Gradient Descent
- 4.6. Alternate Learning Algorithms
- 4.7. Impact of Non-Convexity on ML Algorithms
- 4.8. Handling Constraints
  - 4.8.1. Equality Constraints
  - 4.8.2. Inequality Constraints
- 4.9. Dynamic Programming
  - 4.9.1. Recursion
  - 4.9.2. Optimal Substructure and Overlapping Subproblems
  - 4.9.3. 2D Dynamic Programming Example

Chapter 5: Statistical Foundations for DS and ML
- 5.1. Decomposition of a Data Matrix into Model and Uncertainty Matrices
- 5.2. Uncertainty Characterization
  - 5.2.1. A Simple Probability Model
  - 5.2.2. Computing Probabilities from Experimental Data
- 5.3. Random Variables and Probability Mass Functions
- 5.4. Deriving Model Probability Distribution Functions
  - 5.4.1. Alternate Model Distributions for the Group Meeting Attendance Problem
  - 5.4.2. Another Example—Students Clearing Exams
  - 5.4.3. Summary of Discussions on Theoretical Distributions
- 5.5. Properties of Probability Distribution Functions
  - 5.5.1. Continuous Random Variables and Their Distributions
  - 5.5.2. Summary of Distributions for Continuous Random Variables and Their Properties
- 5.6. Qualitative Validation of Random Variable Probability Distribution Functions
  - 5.6.1. Computing Quantiles
  - 5.6.2. Computing Probabilities
- 5.7. Estimating Parameters of a Distribution
  - 5.7.1. Mean and Variance
  - 5.7.2. Method of Moments
  - 5.7.3. Maximum Likelihood Estimation
  - 5.7.4. Modeling Unknown Distribution—Consolidation of Ideas
- 5.8. Mixed Models—Joint Identification of Model and Distribution Parameters
  - 5.8.1. Mixed Models—Error Only in Dependent Variable
  - 5.8.2. Mixed Models—Errors in Both Dependent and Independent Variables
- 5.9. Sampling Distributions
- 5.10. Important Sampling Distributions
  - 5.10.1. z-Distribution
  - 5.10.2. Sampling Distribution of Mean of Data from Normal Distribution
  - 5.10.3. Central Limit Theorem
  - 5.10.4. t-Distribution
  - 5.10.5. Chi-Squared Distribution
  - 5.10.6. F-Distribution
- 5.11. Determining Quality of Estimates
  - 5.11.1. Unbiasedness
  - 5.11.2. Consistency
  - 5.11.3. Bias-Variance Trade-off
- 5.12. Hypothesis Testing
  - 5.12.1. Confidence Intervals
- 5.13. Distributions of Multiple Related Random Variables

Chapter 6: Function Approximation Methods
- 6.1. Setting Up the Problem
- 6.2. Parametric Methods
  - 6.2.1. Linear Regression
    - 6.2.1.1. Quantities that Indicate Relationships between Variables
    - 6.2.1.2. Univariate Linear Regression
    - 6.2.1.3. Multivariate Regression
  - 6.2.2. Principal Component Analysis (PCA)
  - 6.2.3. Neural Networks
    - 6.2.3.1. Neural Network Structures
    - 6.2.3.2. Training of Neural Networks
    - 6.2.3.3. Backpropagation Algorithm
- 6.3. Non-Parametric Methods
  - 6.3.1. k-Nearest Neighbors (k-NN)
  - 6.3.2. Decision Trees
  - 6.3.3. Random Forests

Chapter 7: Classification Methods
- 7.1. Types of Classification Problems
- 7.2. Parametric Methods
  - 7.2.1. Naive Bayes Classifier
  - 7.2.2. Linear Discriminant Analysis (LDA)
  - 7.2.3. Quadratic Discriminant Analysis (QDA)
  - 7.2.4. Logistic Regression
  - 7.2.5. Clustering Techniques
    - 7.2.5.1. k-Means Clustering
  - 7.2.6. Neural Networks
    - 7.2.6.1. Softmax Layer
    - 7.2.6.2. Cross-Entropy Loss
    - 7.2.6.3. Summary
- 7.3. Non-Parametric Methods
  - 7.3.1. k-NN
  - 7.3.2. Hierarchical Clustering
  - 7.3.3. Support Vector Machines
  - 7.3.4. Decision Trees and Random Forests

Chapter 8: Conclusions and Future Directions
- 8.1. Future Directions
  - 8.1.1. Improvements in ML Techniques
  - 8.1.2. Deep Learning
  - 8.1.3. Reinforcement Learning (RL)
  - 8.1.4. Integrating Domain Knowledge in ML/AI

References
Index