Machine Learning for Data Streams: with Practical Examples in MOA
- Length: 288 pages
- Edition: 1
- Language: English
- Publisher: The MIT Press
- Publication Date: 2018-03-02
- ISBN-10: 0262037793
- ISBN-13: 9780262037792
- Sales Rank: #1162545 (See Top 100 Books)
A hands-on approach to tasks and techniques in data stream mining and real-time analytics, with examples in MOA, a popular freely available open-source software framework.
Today many information sources—including sensor networks, financial markets, social networks, and healthcare monitoring—are so-called data streams, arriving sequentially and at high speed. Analysis must take place in real time, with partial data and without the capacity to store the entire data set. This book presents algorithms and techniques used in data stream mining and real-time analytics. Taking a hands-on approach, the book demonstrates the techniques using MOA (Massive Online Analysis), a popular, freely available open-source software framework, allowing readers to try out the techniques after reading the explanations.
The book first offers a brief introduction to the topic, covering big data mining, basic methodologies for mining data streams, and a simple example of MOA. More detailed discussions follow, with chapters on sketching techniques, change, classification, ensemble methods, regression, clustering, and frequent pattern mining. Most of these chapters include exercises, an MOA-based lab session, or both. Finally, the book discusses the MOA software, covering the MOA graphical user interface, the command line, use of its API, and the development of new methods within MOA. The book will be an essential reference for readers who want to use data stream mining as a tool, researchers in innovation or data stream mining, and programmers who want to create new algorithms for MOA.
Cover Series Page Title Page Copyright Page Table of Contents List of Figures List of Tables Preface I: Introduction 1. Introduction 1.1. Big Data 1.1.1. Tools: Open-Source Revolution 1.1.2. Challenges in Big Data 1.2. Real-Time Analytics 1.2.1. Data Streams 1.2.2. Time and Memory 1.2.3. Applications 1.3. What This Book Is About 2. Big Data Stream Mining 2.1. Algorithms 2.2. Classification 2.2.1. Classifier Evaluation in Data Streams 2.2.2. Majority Class Classifier 2.2.3. No-Change Classifier 2.2.4. Lazy Classifier 2.2.5. Naive Bayes 2.2.6. Decision Trees 2.2.7. Ensembles 2.3. Regression 2.4. Clustering 2.5. Frequent Pattern Mining 3. Hands-on Introduction to MOA 3.1. Getting Started 3.2. The Graphical User Interface for Classification 3.2.1. Drift Stream Generators 3.3. Using the Command Line II: Stream Mining 4. Streams and Sketches 4.1. Setting: Approximation Algorithms 4.2. Concentration Inequalities 4.3. Sampling 4.4. Counting Total Items 4.5. Counting Distinct Elements 4.5.1. Linear Counting 4.5.2. Cohen’s Logarithmic Counter 4.5.3. The Flajolet-Martin Counter and HyperLogLog 4.5.4. An Application: Computing Distance Functions in Graphs 4.5.5. Discussion: Log vs. Linear 4.6. Frequency Problems 4.6.1. The SpaceSaving Sketch 4.6.2. The CM-Sketch Algorithm 4.6.3. CountSketch 4.6.4. Moment Computation 4.7. Exponential Histograms for Sliding Windows 4.8. Distributed Sketching: Mergeability 4.9. Some Technical Discussions and Additional Material 4.9.1. Hash Functions 4.9.2. Creating (ε, δ)-Approximation Algorithms 4.9.3. Other Sketching Techniques 4.10. Exercises 5. Dealing with Change 5.1. Notion of Change in Streams 5.2. Estimators 5.2.1. Sliding Windows and Linear Estimators 5.2.2. Exponentially Weighted Moving Average 5.2.3. Unidimensional Kalman Filter 5.3. Change Detection 5.3.1. Evaluating Change Detection 5.3.2. The CUSUM and Page-Hinkley Tests 5.3.3. Statistical Tests 5.3.4. Drift Detection Method 5.3.5. ADWIN 5.4. Combination with Other Sketches and Multidimensional Data 5.5. Exercises 6. Classification 6.1. Classifier Evaluation 6.1.1. Error Estimation 6.1.2. Distributed Evaluation 6.1.3. Performance Evaluation Measures 6.1.4. Statistical Significance 6.1.5. A Cost Measure for the Mining Process 6.2. Baseline Classifiers 6.2.1. Majority Class 6.2.2. No-change Classifier 6.2.3. Naive Bayes 6.2.4. Multinomial Naive Bayes 6.3. Decision Trees 6.3.1. Estimating Split Criteria 6.3.2. The Hoeffding Tree 6.3.3. CVFDT 6.3.4. VFDTc and UFFT 6.3.5. Hoeffding Adaptive Tree 6.4. Handling Numeric Attributes 6.4.1. VFML 6.4.2. Exhaustive Binary Tree 6.4.3. Greenwald and Khanna’s Quantile Summaries 6.4.4. Gaussian Approximation 6.5. Perceptron 6.6. Lazy Learning 6.7. Multi-label Classification 6.7.1. Multi-label Hoeffding Trees 6.8. Active Learning 6.8.1. Random Strategy 6.8.2. Fixed Uncertainty Strategy 6.8.3. Variable Uncertainty Strategy 6.8.4. Uncertainty Strategy with Randomization 6.9. Concept Evolution 6.10. Lab Session with MOA 7. Ensemble Methods 7.1. Accuracy-Weighted Ensembles 7.2. Weighted Majority 7.3. Stacking 7.4. Bagging 7.4.1. Online Bagging Algorithm 7.4.2. Bagging with a Change Detector 7.4.3. Leveraging Bagging 7.5. Boosting 7.6. Ensembles of Hoeffding Trees 7.6.1. Hoeffding Option Trees 7.6.2. Random Forests 7.6.3. Perceptron Stacking of Restricted Hoeffding Trees 7.6.4. Adaptive-Size Hoeffding Trees 7.7. Recurrent Concepts 7.8. Lab Session with MOA 8. Regression 8.1. Introduction 8.2. Evaluation 8.3. Perceptron Learning 8.4. Lazy Learning 8.5. Decision Tree Learning 8.6. Decision Rules 8.7. Regression in MOA 9. Clustering 9.1. Evaluation Measures 9.2. The k-means Algorithm 9.3. BIRCH, BICO, and CluStream 9.4. Density-Based Methods: DBSCAN and Den-Stream 9.5. ClusTree 9.6. StreamKM++: Coresets 9.7. Additional Material 9.8. Lab Session with MOA 10. Frequent Pattern Mining 10.1. An Introduction to Pattern Mining 10.1.1. Patterns: Definitions and Examples 10.1.2. Batch Algorithms for Frequent Pattern Mining 10.1.3. Closed and Maximal Patterns 10.2. Frequent Pattern Mining in Streams: Approaches 10.2.1. Coresets of Closed Patterns 10.3. Frequent Itemset Mining on Streams 10.3.1. Reduction to Heavy Hitters 10.3.2. Moment 10.3.3. FP-Stream 10.3.4. IncMine 10.4. Frequent Subgraph Mining on Streams 10.4.1. WinGraphMiner 10.4.2. AdaGraphMiner 10.5. Additional Material 10.6. Exercises III: The MOA Software 11. Introduction to MOA and Its Ecosystem 11.1. MOA Architecture 11.2. Installation 11.3. Recent Developments in MOA 11.4. Extensions to MOA 11.5. ADAMS 11.6. MEKA 11.7. OpenML 11.8. StreamDM 11.9. Streams 11.10. Apache SAMOA 12. The Graphical User Interface 12.1. Getting Started with the GUI 12.2. Classification and Regression 12.2.1. Tasks 12.2.2. Data Feeds and Data Generators 12.2.3. Bayesian Classifiers 12.2.4. Decision Trees 12.2.5. Meta Classifiers (Ensembles) 12.2.6. Function Classifiers 12.2.7. Drift Classifiers 12.2.8. Active Learning Classifiers 12.3. Clustering 12.3.1. Data Feeds and Data Generators 12.3.2. Stream Clustering Algorithms 12.3.3. Visualization and Analysis 13. Using the Command Line 13.1. Learning Task for Classification and Regression 13.2. Evaluation Tasks for Classification and Regression 13.3. Learning and Evaluation Tasks for Classification and Regression 13.4. Comparing Two Classifiers 14. Using the API 14.1. MOA Objects 14.2. Options 14.3. Prequential Evaluation Example 15. Developing New Methods in MOA 15.1. Main Classes in MOA 15.2. Creating a New Classifier 15.3. Compiling a Classifier 15.4. Good Programming Practices in MOA Bibliography Index Series List
Donate to keep this site alive
1. Disable the AdBlock plugin. Otherwise, you may not get any links.
2. Solve the CAPTCHA.
3. Click download link.
4. Lead to download server to download.