Modern Deep Learning for Tabular Data: Novel Approaches to Common Modeling Problems
Deep learning is one of the most powerful tools in the modern artificial intelligence landscape. Although it has been applied predominantly to highly specialized image, text, and signal datasets, this book synthesizes and presents novel deep learning approaches to a seemingly unlikely domain – tabular data. Whether in finance, business, security, medicine, or countless other domains, deep learning can help mine and model complex patterns in tabular data – a ubiquitous form of structured data.
Part I of the book offers a rigorous overview of machine learning principles, algorithms, and implementation skills relevant to holistically modeling and manipulating tabular data. Part II studies five dominant deep learning model designs – Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Attention and Transformers, and Tree-Rooted Networks – through both their ‘default’ usage and their application to tabular data. Part III compounds the power of the previously covered methods by surveying strategies and techniques to supercharge deep learning systems: autoencoders, deep data generation, meta-optimization, multi-model arrangement, and neural network interpretability. Each chapter comes with extensive visualization, code, and relevant research coverage.
Modern Deep Learning for Tabular Data is one of the first of its kind – a wide exploration of deep learning theory and applications to tabular data, integrating and documenting novel methods and techniques in the field. This book provides a strong conceptual and theoretical toolkit to approach challenging tabular data problems.
What You Will Learn
- Learn important concepts and developments in modern machine learning and deep learning, with a strong emphasis on tabular data applications.
- Understand the promising links between deep learning and tabular data, and when a deep learning approach is or isn’t appropriate.
- Apply promising research and unique modeling approaches in real-world data contexts.
- Explore and engage with modern, research-backed theoretical advances in deep tabular modeling.
- Use effective preprocessing methods to prepare tabular data for successful modeling.
Who This Book Is For
Data scientists and researchers of all levels, from beginner to advanced, who want to improve their results on tabular data with deep learning or to understand the theoretical and practical aspects of deep tabular modeling research. The book also suits readers applying deep learning to complex tabular data in contexts including business, finance, medicine, education, and security.
Table of Contents
About the Authors
About the Technical Reviewer
Acknowledgments
Foreword
Foreword
Introduction
Part I: Machine Learning and Tabular Data
Chapter 1: Classical Machine Learning Principles and Methods
Fundamental Principles of Modeling; What Is Modeling?; Modes of Learning; Quantitative Representations of Data: Regression and Classification; The Machine Learning Data Cycle: Training, Validation, and Test Sets; Bias-Variance Trade-Off; Feature Space and the Curse of Dimensionality; Optimization and Gradient Descent; Metrics and Evaluation; Mean Absolute Error; Mean Squared Error (MSE); Confusion Matrix; Accuracy; Precision; Recall; F1 Score; Area Under the Receiver Operating Characteristics Curve (ROC-AUC); Algorithms; K-Nearest Neighbors; Theory and Intuition; Implementation and Usage; Linear Regression; Theory and Intuition; Implementation and Usage; Other Variations on Simple Linear Regression; Logistic Regression; Theory and Intuition; Implementation and Usage; Other Variations on Logistic Regression; Decision Trees; Theory and Intuition; Implementation and Usage; Random Forest; Gradient Boosting; Theory and Intuition; AdaBoost; XGBoost; LightGBM; Summary of Algorithms; Thinking Past Classical Machine Learning; Key Points
Chapter 2: Data Preparation and Engineering
Data Storage and Manipulation; TensorFlow Datasets; Creating a TensorFlow Dataset; TensorFlow Sequence Datasets; Handling Large Datasets; Datasets That Fit in Memory; Pickle; SciPy and TensorFlow Sparse Matrices; Datasets That Do Not Fit in Memory; Pandas Chunker; h5py; NumPy Memory Map; Data Encoding; Discrete Data; Label Encoding; One-Hot Encoding; Binary Encoding; Frequency Encoding; Target Encoding; Leave-One-Out Encoding; James-Stein Encoding; Weight of Evidence; Continuous Data; Min-Max Scaling; Robust Scaling; Standardization; Text Data; Keyword Search; Raw Vectorization; Bag of Words; N-Grams; TF-IDF; Sentiment Extraction; Word2Vec; Time Data; Geographical Data; Feature Extraction; Single- and Multi-feature Transformations; Principal Component Analysis; t-SNE; Linear Discriminant Analysis; Statistics-Based Engineering; Feature Selection; Information Gain; Variance Threshold; High-Correlation Method; Recursive Feature Elimination; Permutation Importance; LASSO Coefficient Selection; Key Points
Part II: Applied Deep Learning Architectures
Chapter 3: Neural Networks and Tabular Data
What Exactly Are Neural Networks?; Neural Network Theory; Starting with a Single Neuron; Feed-Forward Operation; Introduction to Keras; Modeling with Keras; Defining the Architecture; Compiling the Model; Training and Evaluation; Loss Functions; Math Behind Feed-Forward Operation; Activation Functions; Sigmoid and Hyperbolic Tangent; Rectified Linear Unit; LeakyReLU; Swish; The Nonlinearity and Variability of Activation Functions; The Math Behind Neural Network Learning; Gradient Descent in Neural Networks; The Backpropagation Algorithm; Optimizers; Mini-batch Stochastic Gradient Descent (SGD) and Momentum; Nesterov Accelerated Gradient (NAG); Adaptive Moment Estimation (Adam); A Deeper Dive into Keras; Training Callbacks and Validation; Batch Normalization and Dropout; The Keras Functional API; Nonlinear Topologies; Multi-input and Multi-output Models; Embeddings; Model Weight Sharing; The Universal Approximation Theorem; Selected Research; Simple Modifications to Improve Tabular Neural Networks; Ghost Batch Normalization; Leaky Gates; Wide and Deep Learning; Self-Normalizing Neural Networks; Regularization Learning Networks; Key Points
Chapter 4: Applying Convolutional Structures to Tabular Data
Convolutional Neural Network Theory; Why Do We Need Convolutions?; The Convolution Operation; The Pooling Operation; Base CNN Architectures; ResNet; Inception v3; EfficientNet; Multimodal Image and Tabular Models; 1D Convolutions for Tabular Data; 2D Convolutions for Tabular Data; DeepInsight; IGTD (Image Generation for Tabular Data); Key Points
Chapter 5: Applying Recurrent Structures to Tabular Data
Recurrent Models Theory; Why Are Recurrent Models Necessary?; Recurrent Neurons and Memory Cells; Backpropagation Through Time (BPTT) and Vanishing Gradients; LSTMs and Exploding Gradients; Gated Recurrent Units (GRUs); Bidirectionality; Introduction to Recurrent Layers in Keras; Return Sequences and Return State; Standard Recurrent Model Applications; Natural Language; Time Series; Multimodal Recurrent Modeling; Direct Tabular Recurrent Modeling; A Novel Modeling Paradigm; Optimizing the Sequence; Optimizing the Initial Memory State(s); Further Resources; Key Points
Chapter 6: Applying Attention to Tabular Data
Attention Mechanism Theory; The Attention Mechanism; The Transformer Architecture; BERT and Pretraining Language Models; Taking a Step Back; Working with Attention; Simple Custom Bahdanau Attention; Native Keras Attention; Attention in Sequence-to-Sequence Tasks; Improving Natural Language Models with Attention; Direct Tabular Attention Modeling; Attention-Based Tabular Modeling Research; TabTransformer; TabNet; SAINT; ARM-Net; Key Points
Chapter 7: Tree-Based Deep Learning Approaches
Tree-Structured Neural Networks; Deep Neural Decision Trees; Soft Decision Tree Regressors; NODE; Tree-Based Neural Network Initialization; Net-DNF; Boosting and Stacking Neural Networks; GrowNet; XBNet; Distillation; DeepGBM; Key Points
Part III: Deep Learning Design and Tools
Chapter 8: Autoencoders
The Concept of the Autoencoder; Vanilla Autoencoders; Autoencoders for Pretraining; Multitask Autoencoders; Sparse Autoencoders; Denoising and Reparative Autoencoders; Key Points
Chapter 9: Data Generation
Variational Autoencoders; Theory; Implementation; Generative Adversarial Networks; Theory; Simple GAN in TensorFlow; CTGAN; Key Points
Chapter 10: Meta-optimization
Meta-optimization: Concepts and Motivations; No-Gradient Optimization; Optimizing Model Meta-parameters; Optimizing Data Pipelines; Neural Architecture Search; Key Points
Chapter 11: Multi-model Arrangement
Average Weighting; Input-Informed Weighting; Meta-evaluation; Key Points
Chapter 12: Neural Network Interpretability
SHAP; LIME; Activation Maximization; Key Points
Closing Remarks
Appendix: NumPy and Pandas
NumPy Arrays; NumPy Array Construction; Simple NumPy Indexing; Quantitative Manipulation; Advanced NumPy Indexing; NumPy Data Types; Function Application and Vectorization; NumPy Array Application: Image Manipulation; Pandas DataFrames; Constructing Pandas DataFrames; Simple Pandas Mechanics; Advanced Pandas Mechanics; Pivot; Melt; Explode; Stack; Unstack; Conclusion
Index
How to download source code?
1. Go to:
2. In the Find a repository… box, search the book title:
Modern Deep Learning for Tabular Data: Novel Approaches to Common Modeling Problems. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.