Practical Weak Supervision: Doing More with Less Data
- Length: 200 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2021-12-21
- ISBN-10: 1492077062
- ISBN-13: 9781492077060
- Sales Rank: #5330202
Most data scientists and engineers today rely on quality labeled data to train their machine learning models. But building training sets manually is time-consuming and expensive, leaving many companies with unfinished ML projects. There’s a more practical approach. In this book, Amit Bahree, Senja Filipi, and Wee Hyong Tok from Microsoft show you how to create products using weakly supervised learning models.
You’ll learn how to build natural language processing and computer vision projects using weakly labeled datasets from Snorkel, a spin-off from the Stanford AI Lab. Because so many companies pursue ML projects that never go beyond their labs, this book also provides a guide on how to ship the deep learning models you build.
- Get a practical overview of weak supervision
- Dive into data programming with help from Snorkel
- Perform text classification using Snorkel’s weakly labeled dataset
- Use Snorkel’s labeled indoor-outdoor dataset for computer vision tasks
- Scale up weak supervision using scaling strategies and underlying technologies
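The data-programming idea the book covers can be sketched without Snorkel itself: several noisy, hand-written "labeling functions" each vote on an example, and their votes are combined into a single training label. The heuristics below and the plain majority vote are illustrative stand-ins invented for this sketch; Snorkel's actual LabelModel learns the accuracies and correlations of the labeling functions instead of voting naively.

```python
# Minimal, library-free sketch of data programming.
# The heuristics are hypothetical examples, not from the book.
ABSTAIN, REAL, FAKE = -1, 0, 1

def lf_clickbait(text):
    # Hypothetical heuristic: clickbait phrasing suggests fake news.
    return FAKE if "you won't believe" in text.lower() else ABSTAIN

def lf_cites_source(text):
    # Hypothetical heuristic: explicit attribution suggests real news.
    return REAL if "according to" in text.lower() else ABSTAIN

def majority_vote(text, lfs):
    """Combine labeling-function votes; a stand-in for LabelModel."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no function fired on this example
    return max(set(votes), key=votes.count)

lfs = [lf_clickbait, lf_cites_source]
print(majority_vote("You won't believe this trick!", lfs))  # 1 (FAKE)
```

In Snorkel proper, the votes form a label matrix that a `LabelModel` turns into probabilistic labels, which then train a downstream classifier.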
Table of Contents

Preface
- Who Should Read This Book
- Navigating This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments

1. Introduction to Weak Supervision
- What Is Weak Supervision?
- Real-World Weak Supervision with Snorkel
- Approaches to Weak Supervision
  - Incomplete supervision
  - Inexact supervision
  - Inaccurate supervision
  - Data programming
  - Getting training data
  - How data programming is helping accelerate Software 2.0
- Summary
- Bibliography

2. Diving into Data Programming with Snorkel
- Snorkel, a Data Programming Framework
- Getting Started with Labeling Functions
  - Applying the labels to the datasets
  - Analyzing the labeling performance
  - Using a validation set
  - Reaching labeling consensus with LabelModel
  - Strategies to improve the labeling functions
- Data Augmentation with Snorkel Transformers
  - Data augmentation through word removal
  - Snorkel preprocessors
  - Data augmentation through GPT-2 prediction
  - Data augmentation through translation
  - Applying the transformation functions to the dataset
- Summary
- Bibliography

3. Labeling in Action
- Labeling a Text Dataset: Identifying Fake News
  - Exploring the fake news detection (FakeNewsNet) dataset
  - Importing Snorkel and setting up representative constants
  - Fact-checking sites
  - Is the speaker a "liar"?
  - Twitter profile and Botometer score
  - Generating agreements between weak classifiers
- Labeling an Images Dataset: Determining Indoor Versus Outdoor Images
  - Creating a dataset of images from Bing
  - Defining and training weak classifiers in TensorFlow
  - Training the various classifiers
  - Weak classifiers out of image tags
  - Deploying the Computer Vision Service
  - Interacting with the Computer Vision Service
  - Preparing the data frame
  - Learning a label model
- Summary
- Bibliography

4. Using the Snorkel-Labeled Dataset for Text Classification
- Getting Started with Natural Language Processing (NLP)
  - Transformers
  - Hard vs. probabilistic labels
- Using ktrain for Performing Text Classification
  - Data preparation
  - Dealing with an imbalanced dataset
  - Training the model
  - Using the text classification model for prediction
  - Finding a good learning rate
- Using Hugging Face and Transformers
  - Loading the relevant Python packages
  - Dataset preparation
  - Checking whether GPU hardware is available
  - Performing tokenization
  - Model training
  - Testing the fine-tuned model
- Summary
- Bibliography

5. Using the Snorkel-Labeled Dataset for Image Classification
- Visual Object Recognition Overview
- Representing Image Features
- Transfer Learning for Computer Vision
- Using PyTorch for Image Classification
  - Loading the indoor/outdoor dataset
  - Utility functions
  - Visualizing the training data
  - Fine-tuning the pretrained model
- Summary
- Bibliography

6. Scalability and Distributed Training
- The Need for Scalability
- Distributed Training
- Apache Spark: An Introduction
  - Spark application design
- Using Azure Databricks to Scale
  - Cluster setup for weak supervision
  - Fake news detection dataset on Databricks
  - Labeling functions for Snorkel
  - Setting up dependencies
  - Loading the data
  - Fact-checking sites
  - Transfer learning using the LIAR dataset
  - Weak classifiers: generating agreement
  - Type conversions needed for Spark runtime
- Summary
- Bibliography
How to download the source code
1. Go to https://www.oreilly.com/.
2. Search for the book title: Practical Weak Supervision: Doing More with Less Data. If no results appear, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.
1. Disable the AdBlock plugin; otherwise you may not see any links.
2. Solve the CAPTCHA.
3. Click the download link.
4. You will be redirected to the download server to download the file.