Applied Unsupervised Learning with Python
- Length: 482 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2019-05-28
- ISBN-10: 1789952298
- ISBN-13: 9781789952292
- Sales Rank: #328043
Design clever algorithms that can uncover interesting structures and hidden relationships in unstructured, unlabeled data
Key Features
- Learn how to select the most suitable Python library to solve your problem
- Compare k-Nearest Neighbor (k-NN) and non-parametric methods and decide when to use them
- Delve into the applications of neural networks using real-world datasets
Book Description
Unsupervised learning is a useful and practical solution in situations where labeled data is not available.
Applied Unsupervised Learning with Python guides you through best practices for using unsupervised learning techniques in tandem with Python libraries and for extracting meaningful information from unstructured data. The course begins by explaining how basic clustering works to find similar data points in a set. Once you are well versed in the k-means algorithm and how it operates, you’ll learn what dimensionality reduction is and where to apply it. As you progress, you’ll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also learn how to mine topics that are trending on Twitter and Facebook and build a news recommendation engine for users. You will complete the course by challenging yourself with activities such as performing a Market Basket Analysis and identifying relationships between different items of merchandise.
By the end of this course, you will have the skills you need to confidently build your own models using Python.
What you will learn
- Understand the basics and importance of clustering
- Build k-means, hierarchical, and DBSCAN clustering algorithms from scratch and with built-in packages
- Explore dimensionality reduction and its applications
- Use scikit-learn (sklearn) to implement and analyse principal component analysis (PCA) on the Iris dataset
- Employ Keras to build autoencoder models for the CIFAR-10 dataset
- Apply the Apriori algorithm with machine learning extensions (Mlxtend) to study transaction data
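As a rough illustration of the "k-means from scratch" item above, here is a minimal sketch in plain Python. It is not taken from the book: the function names `euclidean` and `k_means`, the `seed` parameter, and the toy data are all assumptions made for this example, and the book's own exercises may be structured differently.

```python
import math
import random


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into `k` groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k data points chosen at random
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups in two dimensions.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
print(labels)
```

The first three points should end up sharing one label and the last three the other. Which group gets label 0 depends on the random starting centroids.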
Who this book is for
This course is designed for developers, data scientists, and machine learning enthusiasts who are interested in unsupervised learning. Some familiarity with Python programming along with basic knowledge of mathematical concepts including exponents, square roots, means, and medians will be beneficial.
Table of Contents
- Introduction to Clustering
- Hierarchical Clustering
- Neighborhood Approaches and DBSCAN
- An Introduction to Dimensionality Reduction and PCA
- Autoencoders
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Topic Modeling
- Market Basket Analysis
- Hotspot Analysis
Preface About the Book About the Authors Learning Objectives Audience Approach Hardware Requirements Software Requirements Conventions Installation and Setup Install Anaconda on Windows Install Anaconda on Linux Install Anaconda on macOS Install Python on Windows Install Python on Linux Install Python on macOS X Additional Resources

Chapter 1 Introduction to Clustering Introduction Unsupervised Learning versus Supervised Learning Clustering Identifying Clusters Two-Dimensional Data Exercise 1: Identifying Clusters in Data Introduction to k-means Clustering No-Math k-means Walkthrough k-means Clustering In-Depth Walkthrough Alternative Distance Metric – Manhattan Distance Deeper Dimensions Exercise 2: Calculating Euclidean Distance in Python Exercise 3: Forming Clusters with the Notion of Distance Exercise 4: Implementing k-means from Scratch Exercise 5: Implementing k-means with Optimization Clustering Performance: Silhouette Score Exercise 6: Calculating the Silhouette Score Activity 1: Implementing k-means Clustering Summary

Chapter 2 Hierarchical Clustering Introduction Clustering Refresher k-means Refresher The Organization of Hierarchy Introduction to Hierarchical Clustering Steps to Perform Hierarchical Clustering An Example Walk-Through of Hierarchical Clustering Exercise 7: Building a Hierarchy Linkage Activity 2: Applying Linkage Criteria Agglomerative versus Divisive Clustering Exercise 8: Implementing Agglomerative Clustering with scikit-learn Activity 3: Comparing k-means with Hierarchical Clustering k-means versus Hierarchical Clustering Summary

Chapter 3 Neighborhood Approaches and DBSCAN Introduction Clusters as Neighborhoods Introduction to DBSCAN DBSCAN In-Depth Walkthrough of the DBSCAN Algorithm Exercise 9: Evaluating the Impact of Neighborhood Radius Size DBSCAN Attributes – Neighborhood Radius Activity 4: Implement DBSCAN from Scratch DBSCAN Attributes – Minimum Points Exercise 10: Evaluating the Impact of Minimum Points Threshold Activity 5: Comparing DBSCAN with k-means and Hierarchical Clustering DBSCAN Versus k-means and Hierarchical Clustering Summary

Chapter 4 Dimension Reduction and PCA Introduction What Is Dimensionality Reduction? Applications of Dimensionality Reduction The Curse of Dimensionality Overview of Dimensionality Reduction Techniques Dimensionality Reduction and Unsupervised Learning PCA Mean Standard Deviation Covariance Covariance Matrix Exercise 11: Understanding the Foundational Concepts of Statistics Eigenvalues and Eigenvectors Exercise 12: Computing Eigenvalues and Eigenvectors The Process of PCA Exercise 13: Manually Executing PCA Exercise 14: Scikit-Learn PCA Activity 6: Manual PCA versus scikit-learn Restoring the Compressed Dataset Exercise 15: Visualizing Variance Reduction with Manual PCA Exercise 16: Visualizing Variance Reduction with Exercise 17: Plotting 3D Plots in Matplotlib Activity 7: PCA Using the Expanded Iris Dataset Summary

Chapter 5 Autoencoders Introduction Fundamentals of Artificial Neural Networks The Neuron Sigmoid Function Rectified Linear Unit (ReLU) Exercise 18: Modeling the Neurons of an Artificial Neural Network Activity 8: Modeling Neurons with a ReLU Activation Function Neural Networks: Architecture Definition Exercise 19: Defining a Keras Model Neural Networks: Training Exercise 20: Training a Keras Neural Network Model Activity 9: MNIST Neural Network Autoencoders Exercise 21: Simple Autoencoder Activity 10: Simple MNIST Autoencoder Exercise 22: Multi-Layer Autoencoder Convolutional Neural Networks Exercise 23: Convolutional Autoencoder Activity 11: MNIST Convolutional Autoencoder Summary

Chapter 6 t-Distributed Stochastic Neighbor Embedding (t-SNE) Introduction Stochastic Neighbor Embedding (SNE) t-Distributed SNE Exercise 24: t-SNE MNIST Activity 12: Wine t-SNE Interpreting t-SNE Plots Perplexity Exercise 25: t-SNE MNIST and Perplexity Activity 13: t-SNE Wine and Perplexity Iterations Exercise 26: t-SNE MNIST and Iterations Activity 14: t-SNE Wine and Iterations Final Thoughts on Visualizations Summary

Chapter 7 Topic Modeling Introduction Topic Models Exercise 27: Setting Up the Environment A High-Level Overview of Topic Models Business Applications Exercise 28: Data Loading Cleaning Text Data Data Cleaning Techniques Exercise 29: Cleaning Data Step by Step Exercise 30: Complete Data Cleaning Activity 15: Loading and Cleaning Twitter Data Latent Dirichlet Allocation Variational Inference Bag of Words Exercise 31: Creating a Bag-of-Words Model Using the Count Vectorizer Perplexity Exercise 32: Selecting the Number of Topics Exercise 33: Running Latent Dirichlet Allocation Exercise 34: Visualize LDA Exercise 35: Trying Four Topics Activity 16: Latent Dirichlet Allocation and Health Tweets Bag-of-Words Follow-Up Exercise 36: Creating a Bag-of-Words Using TF-IDF Non-Negative Matrix Factorization Frobenius Norm Multiplicative Update Exercise 37: Non-negative Matrix Factorization Exercise 38: Visualizing NMF Activity 17: Non-Negative Matrix Factorization Summary

Chapter 8 Market Basket Analysis Introduction Market Basket Analysis Use Cases Important Probabilistic Metrics Exercise 39: Creating Sample Transaction Data Support Confidence Lift and Leverage Conviction Exercise 40: Computing Metrics Characteristics of Transaction Data Exercise 41: Loading Data Data Cleaning and Formatting Exercise 42: Data Cleaning and Formatting Data Encoding Exercise 43: Data Encoding Activity 18: Loading and Preparing Full Online Retail Data Apriori Algorithm Computational Fixes Exercise 44: Executing the Apriori algorithm Activity 19: Apriori on the Complete Online Retail Dataset Association Rules Exercise 45: Deriving Association Rules Activity 20: Finding the Association Rules on the Complete Online Retail Dataset Summary

Chapter 9 Hotspot Analysis Introduction Spatial Statistics Probability Density Functions Using Hotspot Analysis in Business Kernel Density Estimation The Bandwidth Value Exercise 46: The Effect of the Bandwidth Value Selecting the Optimal Bandwidth Exercise 47: Selecting the Optimal Bandwidth Using Grid Search Kernel Functions Exercise 48: The Effect of the Kernel Function Kernel Density Estimation Derivation Exercise 49: Simulating the Derivation of Kernel Density Estimation Activity 21: Estimating Density in One Dimension Hotspot Analysis Exercise 50: Loading Data and Modeling with Seaborn Exercise 51: Working with Basemaps Activity 22: Analyzing Crime in London Summary

Appendix
Chapter 1: Introduction to Clustering Activity 1: Implementing k-means Clustering
Chapter 2: Hierarchical Clustering Activity 3: Comparing k-means with Hierarchical Clustering
Chapter 3: Neighborhood Approaches and DBSCAN Activity 4: Implement DBSCAN from Scratch Activity 5: Comparing DBSCAN with k-means and Hierarchical Clustering
Chapter 4: Dimension Reduction and PCA Activity 6: Manual PCA versus scikit-learn Activity 7: PCA Using the Expanded Iris Dataset
Chapter 5: Autoencoders Activity 8: Modeling Neurons with a ReLU Activation Function Activity 9: MNIST Neural Network Activity 10: Simple MNIST Autoencoder Activity 11: MNIST Convolutional Autoencoder
Chapter 6: t-Distributed Stochastic Neighbor Embedding (t-SNE) Activity 12: Wine t-SNE Activity 13: t-SNE Wine and Perplexity Activity 14: t-SNE Wine and Iterations
Chapter 7: Topic Modeling Activity 15: Loading and Cleaning Twitter Data Activity 16: Latent Dirichlet Allocation and Health Tweets Activity 17: Non-Negative Matrix Factorization
Chapter 8: Market Basket Analysis Activity 18: Loading and Preparing Full Online Retail Data Activity 19: Apriori on the Complete Online Retail Dataset Activity 20: Finding the Association Rules on the Complete Online Retail Dataset
Chapter 9: Hotspot Analysis Activity 21: Estimating Density in One Dimension Activity 22: Analyzing Crime in London
How to download source code?
1. Go to: https://github.com/PacktPublishing
2. In the Find a repository… box, search for the book title: Applied Unsupervised Learning with Python. If the search returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.