Machine Learning for Streaming Data with Python: Rapidly build practical online machine learning solutions using River and other top key frameworks
- Length: 258 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2022-07-15
- ISBN-10: 180324836X
- ISBN-13: 9781803248363
- Sales Rank: #2651517
Apply machine learning to streaming data with the help of practical examples, and deal with challenges that surround streaming
Key Features
- Work on streaming use cases that are not taught in most data science courses
- Gain experience with state-of-the-art tools for streaming data
- Mitigate various challenges while handling streaming data
Book Description
Streaming data is the new top technology to watch out for in the field of data science and machine learning. As business needs become more demanding, many use cases require real-time analysis as well as real-time machine learning. This book will help you to get up to speed with data analytics for streaming data and focus strongly on adapting machine learning and other analytics to the case of streaming data.
You will first learn about the architecture for streaming and real-time machine learning. Next, you will look at the state-of-the-art frameworks for streaming data like River. Later chapters will focus on various industrial use cases for streaming data like Online Anomaly Detection and others. As you progress, you will discover various challenges and learn how to mitigate them. In addition to this, you will learn best practices that will help you use streaming data to generate real-time insights.
By the end of this book, you will have gained the confidence you need to stream data in your machine learning models.
What you will learn
- Understand the challenges and advantages of working with streaming data
- Develop real-time insights from streaming data
- Understand the implementation of streaming data with various use cases to boost your knowledge
- Develop a PCA alternative that can work on real-time data
- Explore best practices for handling streaming data that you absolutely need to remember
- Develop an API for real-time machine learning inference
Who this book is for
This book is for data scientists and machine learning engineers who have a background in machine learning, are practice- and technology-oriented, and want to learn how to apply machine learning to streaming data through practical examples with modern technologies. Although an understanding of basic Python and machine learning concepts is a must, no prior knowledge of streaming is required.
Table of Contents
Front matter
- Contributors
- About the author
- About the reviewer
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Share Your Thoughts
Part 1: Introduction and Core Concepts of Streaming Data
Chapter 1: An Introduction to Streaming Data
- Technical requirements
- Setting up a Python environment
- A short history of data science
- Working with streaming data
- Streaming data versus batch data
- Advantages of streaming data
- Examples of successful implementation of streaming analytics
- Challenges of streaming data
- How to get started with streaming data
- Common use cases for streaming data
- Streaming versus big data
- Real-time data formats and importing an example dataset in Python
- Summary
- Further reading
Chapter 2: Architectures for Streaming and Real-Time Machine Learning
- Technical requirements
- Python environment
- Defining your analytics as a function
- Understanding microservices architecture
- Communicating between services through APIs
- Demystifying the HTTP protocol
- The GET request
- The POST request
- JSON format for communication between systems
- RESTful APIs
- Building a simple API on AWS
- API Gateway in AWS
- Lambda in AWS
- Data-generating process on a local machine
- Implementing the example
- More architectural considerations
- Other AWS services and other services in general that have the same functionality
- Big data tools for real time streaming
- Calling a big data environment in real time
- Summary
- Further reading
Chapter 3: Data Analysis on Streaming Data
- Technical requirements
- Python environment
- Descriptive statistics on streaming data
- Why are descriptive statistics different on streaming data?
- Introduction to sampling theory
- Comparing population and sample
- Population parameters and sample statistics
- Sampling distribution
- Sample size calculations and confidence level
- Rolling descriptive statistics from streaming
- Exponential weight
- Tracking convergence as an additional KPI
- Overview of the main descriptive statistics
- The mean
- The median
- The mode
- Standard deviation
- Variance
- Quartiles and interquartile range
- Correlations
- Real-time visualizations
- Opening the dashboard
- Comparing Plotly's Dash and other real-time visualization tools
- Building basic alerting systems
- Alerting systems on extreme values
- Alerting systems on process stability (mean and median)
- Alerting systems on constant variability (std and variance)
- Basic alerting systems using statistical process control
- Summary
- Further reading
Part 2: Exploring Use Cases for Data Streaming
Chapter 4: Online Learning with River
- Technical requirements
- Python environment
- What is online machine learning?
- How is online learning different from regular learning?
- Advantages of online learning
- Challenges of online learning
- Types of online learning
- Using River for online learning
- Training an online model with River
- Improving the model evaluation
- Building a multiclass classifier using one-vs-rest
- Summary
- Further reading
Chapter 5: Online Anomaly Detection
- Technical requirements
- Python environment
- Defining anomaly detection
- Are outliers a problem?
- Exploring use cases of anomaly detection
- Fraud detection in financial institutions
- Anomaly detection on your log data
- Fault detection in manufacturing and production lines
- Hacking detection in computer networks (cyber security)
- Medical risks in health data
- Predictive maintenance and sensor data
- Comparing anomaly detection and imbalanced classification
- The problem of imbalanced data
- The F1 score
- SMOTE oversampling
- Anomaly detection versus classification
- Algorithms for detecting anomalies in River
- The use of thresholders in River anomaly detection
- Anomaly detection algorithm 1 – One-Class SVM
- Anomaly detection algorithm 2 – Half-Space-Trees
- Going further with anomaly detection
- Summary
- Further reading
Chapter 6: Online Classification
- Technical requirements
- Python environment
- Defining classification
- Identifying use cases of classification
- Use case 1 – email spam classification
- Use case 2 – face detection in phone camera
- Use case 3 – online marketing ad selection
- Overview of classification algorithms in River
- Classification algorithm 1 – LogisticRegression
- Classification algorithm 2 – Perceptron
- Classification algorithm 3 – AdaptiveRandomForestClassifier
- Classification algorithm 4 – ALMAClassifier
- Classification algorithm 5 – PAClassifier
- Evaluating benchmark results
- Summary
- Further reading
Chapter 7: Online Regression
- Technical requirements
- Python environment
- Defining regression
- Use cases of regression
- Use case 1 – Forecasting
- Use case 2 – Predicting the number of faulty products in manufacturing
- Overview of regression algorithms in River
- Regression algorithm 1 – LinearRegression
- Regression algorithm 2 – HoeffdingAdaptiveTreeRegressor
- Regression algorithm 3 – SGTRegressor
- Regression algorithm 4 – SRPRegressor
- Summary
- Further reading
Chapter 8: Reinforcement Learning
- Technical requirements
- Python environment
- Defining reinforcement learning
- Comparing online and offline reinforcement learning
- A more detailed overview of feedback loops in reinforcement learning
- The main steps of a reinforcement learning model
- Making the decisions
- Updating the decision rules
- Exploring Q-learning
- The goal of Q-learning
- Parameters of the Q-learning algorithm
- Deep Q-learning
- Using reinforcement learning for streaming data
- Use cases of reinforcement learning
- Use case one – trading system
- Use case two – social network ranking system
- Use case three – a self-driving car
- Use case four – chatbots
- Use case five – learning games
- Implementing reinforcement learning in Python
- Summary
- Further reading
Part 3: Advanced Concepts and Best Practices around Streaming Data
Chapter 9: Drift and Drift Detection
- Technical requirements
- Python environment
- Defining drift
- Three types of drift
- Introducing model explicability
- Measuring drift
- Measuring data drift
- Measuring concept drift
- Measuring drift in Python
- A basic intuitive approach to measuring drift
- Measuring drift with robust tools
- Counteracting drift
- Offline learning with retraining strategies against drift
- Online learning against drift
- Summary
- Further reading
Chapter 10: Feature Transformation and Scaling
- Technical requirements
- Python environment
- Challenges of data preparation with streaming data
- Scaling data for streaming
- Introducing scaling
- Adapting scaling to a streaming context
- Transforming features in a streaming context
- Introducing PCA
- Mathematical definition of PCA
- Regular PCA in Python
- Incremental PCA for streaming
- Summary
- Further reading
Chapter 11: Catastrophic Forgetting
- Technical requirements
- Python environment
- Introducing catastrophic forgetting
- Catastrophic forgetting in online models
- Detecting catastrophic forgetting
- Using Python to detect catastrophic forgetting
- Model explicability versus catastrophic forgetting
- Explaining models using linear coefficients
- Explaining models using dendrograms
- Explaining models using variable importance
- Summary
- Further reading
Chapter 12: Conclusion and Best Practices
- Going further
- Summary
Back matter
- Why subscribe?
- Other Books You May Enjoy
- Packt is searching for authors like you
- Share Your Thoughts
How to download the source code?
1. Go to: https://github.com/PacktPublishing
2. In the "Find a repository…" box, search for the book title: Machine Learning for Streaming Data with Python: Rapidly build practical online machine learning solutions using River and other top key frameworks. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.