Applied Unsupervised Learning with Python

by Aaron Jones, Benjamin Johnston, Christopher Kruger

Length: 482 pages
Edition: 1
Language: English
Publisher: Packt Publishing
Publication Date: 2019-05-28
ISBN-10: 1789952298
ISBN-13: 9781789952292
Sales Rank: #328043

Design clever algorithms that can uncover interesting structures and hidden relationships in unstructured, unlabeled data

Key Features

Learn how to select the most suitable Python library to solve your problem
Compare k-Nearest Neighbor (k-NN) and non-parametric methods and decide when to use them
Delve into the applications of neural networks using real-world datasets

Book Description

Unsupervised learning is a useful and practical solution in situations where labeled data is not available.

Applied Unsupervised Learning with Python guides you through best practices for using unsupervised learning techniques with Python libraries to extract meaningful information from unstructured data. The course begins by explaining how basic clustering finds similar data points in a set. Once you are well versed in the k-means algorithm and how it operates, you’ll learn what dimensionality reduction is and where to apply it. As you progress, you’ll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also learn how to mine topics that are trending on Twitter and Facebook and build a news recommendation engine for users. You will complete the course by challenging yourself with activities such as performing a Market Basket Analysis and identifying relationships between different items of merchandise.

By the end of this course, you will have the skills you need to confidently build your own models using Python.

What you will learn

Understand the basics and importance of clustering
Build k-means, hierarchical, and DBSCAN clustering algorithms from scratch with built-in packages
Explore dimensionality reduction and its applications
Use scikit-learn (sklearn) to implement and analyse principal component analysis (PCA) on the Iris dataset
Employ Keras to build autoencoder models for the CIFAR-10 dataset
Apply the Apriori algorithm with machine learning extensions (Mlxtend) to study transaction data
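
The last item above concerns association-rule mining on transaction data. As a rough illustration of three of the metrics listed in the Chapter 8 contents (support, confidence, and lift), here is a minimal standard-library sketch. It is not the book's code and does not use Mlxtend; the transactions and the function names are invented for illustration only.

```python
# Hypothetical toy transactions, made up for this illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Evaluate the candidate rule {bread} -> {milk}.
antecedent, consequent = {"bread"}, {"milk"}
print("support:   ", support(antecedent | consequent, transactions))
print("confidence:", confidence(antecedent, consequent, transactions))
print("lift:      ", lift(antecedent, consequent, transactions))
```

A lift close to 1 suggests the two itemsets occur together about as often as they would if they were independent. In this toy data the rule {bread} -> {milk} has a lift of 8/9, slightly below 1, so buying bread does not make milk more likely here.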

Who this book is for

This course is designed for developers, data scientists, and machine learning enthusiasts who are interested in unsupervised learning. Some familiarity with Python programming along with basic knowledge of mathematical concepts including exponents, square roots, means, and medians will be beneficial.

Introduction to Clustering
Hierarchical Clustering
Neighborhood Approaches and DBSCAN
An Introduction to Dimensionality Reduction and PCA
Autoencoders
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Topic Modeling
Market Basket Analysis
Hotspot Analysis

Preface
    About the Book
        About the Authors
        Learning Objectives
        Audience
        Approach
        Hardware Requirements
        Software Requirements
        Conventions
        Installation and Setup
        Install Anaconda on Windows
        Install Anaconda on Linux
        Install Anaconda on macOS
        Install Python on Windows
        Install Python on Linux
        Install Python on macOS X
        Additional Resources
Chapter 1
Introduction to Clustering
    Introduction
    Unsupervised Learning versus Supervised Learning
    Clustering
        Identifying Clusters
        Two-Dimensional Data
        Exercise 1: Identifying Clusters in Data
    Introduction to k-means Clustering
        No-Math k-means Walkthrough
        k-means Clustering In-Depth Walkthrough 
        Alternative Distance Metric – Manhattan Distance
        Deeper Dimensions
        Exercise 2: Calculating Euclidean Distance in Python
        Exercise 3: Forming Clusters with the Notion of Distance
        Exercise 4: Implementing k-means from Scratch
        Exercise 5: Implementing k-means with Optimization
        Clustering Performance: Silhouette Score
        Exercise 6: Calculating the Silhouette Score
        Activity 1: Implementing k-means Clustering
    Summary
Chapter 2
Hierarchical Clustering
    Introduction
    Clustering Refresher
        k-means Refresher
    The Organization of Hierarchy
    Introduction to Hierarchical Clustering
        Steps to Perform Hierarchical Clustering
        An Example Walk-Through of Hierarchical Clustering
        Exercise 7: Building a Hierarchy
    Linkage
        Activity 2: Applying Linkage Criteria
    Agglomerative versus Divisive Clustering
        Exercise 8: Implementing Agglomerative Clustering with scikit-learn
        Activity 3: Comparing k-means with Hierarchical Clustering
    k-means versus Hierarchical Clustering
    Summary
Chapter 3
Neighborhood Approaches and DBSCAN
    Introduction
        Clusters as Neighborhoods
    Introduction to DBSCAN
        DBSCAN In-Depth
        Walkthrough of the DBSCAN Algorithm
        Exercise 9: Evaluating the Impact of Neighborhood Radius Size
        DBSCAN Attributes – Neighborhood Radius
        Activity 4: Implement DBSCAN from Scratch
        DBSCAN Attributes – Minimum Points
        Exercise 10: Evaluating the Impact of Minimum Points Threshold
        Activity 5: Comparing DBSCAN with k-means and Hierarchical Clustering
    DBSCAN Versus k-means and Hierarchical Clustering
    Summary
Chapter 4
Dimension Reduction and PCA
    Introduction
        What Is Dimensionality Reduction?
        Applications of Dimensionality Reduction
        The Curse of Dimensionality
    Overview of Dimensionality Reduction Techniques
        Dimensionality Reduction and Unsupervised Learning
    PCA
        Mean
        Standard Deviation
        Covariance
        Covariance Matrix
        Exercise 11: Understanding the Foundational Concepts of Statistics
        Eigenvalues and Eigenvectors
        Exercise 12: Computing Eigenvalues and Eigenvectors
        The Process of PCA
        Exercise 13: Manually Executing PCA
        Exercise 14: Scikit-Learn PCA
        Activity 6: Manual PCA versus scikit-learn
        Restoring the Compressed Dataset
        Exercise 15: Visualizing Variance Reduction with Manual PCA
        Exercise 16: Visualizing Variance Reduction with
        Exercise 17: Plotting 3D Plots in Matplotlib
        Activity 7: PCA Using the Expanded Iris Dataset
    Summary
Chapter 5
Autoencoders
    Introduction
    Fundamentals of Artificial Neural Networks
        The Neuron
        Sigmoid Function
        Rectified Linear Unit (ReLU)
        Exercise 18: Modeling the Neurons of an Artificial Neural Network
        Activity 8: Modeling Neurons with a ReLU Activation Function
        Neural Networks: Architecture Definition
        Exercise 19: Defining a Keras Model
        Neural Networks: Training
        Exercise 20: Training a Keras Neural Network Model
        Activity 9: MNIST Neural Network
    Autoencoders
        Exercise 21: Simple Autoencoder
        Activity 10: Simple MNIST Autoencoder
        Exercise 22: Multi-Layer Autoencoder
        Convolutional Neural Networks
        Exercise 23: Convolutional Autoencoder
        Activity 11: MNIST Convolutional Autoencoder
    Summary
Chapter 6
t-Distributed Stochastic Neighbor Embedding (t-SNE)
    Introduction
    Stochastic Neighbor Embedding (SNE)
    t-Distributed SNE
        Exercise 24: t-SNE MNIST
        Activity 12: Wine t-SNE
    Interpreting t-SNE Plots
        Perplexity
        Exercise 25: t-SNE MNIST and Perplexity
        Activity 13: t-SNE Wine and Perplexity
        Iterations
        Exercise 26: t-SNE MNIST and Iterations
        Activity 14: t-SNE Wine and Iterations
        Final Thoughts on Visualizations
    Summary
Chapter 7
Topic Modeling
    Introduction
        Topic Models
        Exercise 27: Setting Up the Environment
        A High-Level Overview of Topic Models
        Business Applications
        Exercise 28: Data Loading
    Cleaning Text Data
        Data Cleaning Techniques
        Exercise 29: Cleaning Data Step by Step
        Exercise 30: Complete Data Cleaning
        Activity 15: Loading and Cleaning Twitter Data
    Latent Dirichlet Allocation
        Variational Inference
        Bag of Words
        Exercise 31: Creating a Bag-of-Words Model Using the Count Vectorizer
        Perplexity
        Exercise 32: Selecting the Number of Topics
        Exercise 33: Running Latent Dirichlet Allocation
        Exercise 34: Visualize LDA
        Exercise 35: Trying Four Topics
        Activity 16: Latent Dirichlet Allocation and Health Tweets
        Bag-of-Words Follow-Up
        Exercise 36: Creating a Bag-of-Words Using TF-IDF
    Non-Negative Matrix Factorization
        Frobenius Norm
        Multiplicative Update
        Exercise 37: Non-negative Matrix Factorization
        Exercise 38: Visualizing NMF
        Activity 17: Non-Negative Matrix Factorization
    Summary
Chapter 8
Market Basket Analysis
    Introduction
    Market Basket Analysis
        Use Cases
        Important Probabilistic Metrics
        Exercise 39: Creating Sample Transaction Data
        Support
        Confidence
        Lift and Leverage
        Conviction
        Exercise 40: Computing Metrics
    Characteristics of Transaction Data
        Exercise 41: Loading Data
        Data Cleaning and Formatting
        Exercise 42: Data Cleaning and Formatting
        Data Encoding
        Exercise 43: Data Encoding
        Activity 18: Loading and Preparing Full Online Retail Data
    Apriori Algorithm
        Computational Fixes
        Exercise 44: Executing the Apriori algorithm
        Activity 19: Apriori on the Complete Online Retail Dataset
    Association Rules
        Exercise 45: Deriving Association Rules
        Activity 20: Finding the Association Rules on the Complete Online Retail Dataset
    Summary
Chapter 9
Hotspot Analysis
    Introduction
        Spatial Statistics
        Probability Density Functions
        Using Hotspot Analysis in Business
    Kernel Density Estimation
        The Bandwidth Value
        Exercise 46: The Effect of the Bandwidth Value
        Selecting the Optimal Bandwidth
        Exercise 47: Selecting the Optimal Bandwidth Using Grid Search
        Kernel Functions
        Exercise 48: The Effect of the Kernel Function
        Kernel Density Estimation Derivation
        Exercise 49: Simulating the Derivation of Kernel Density Estimation
        Activity 21: Estimating Density in One Dimension
    Hotspot Analysis
        Exercise 50: Loading Data and Modeling with Seaborn
        Exercise 51: Working with Basemaps
        Activity 22: Analyzing Crime in London
    Summary
Appendix
    Chapter 1: Introduction to Clustering
        Activity 1: Implementing k-means Clustering
    Chapter 2: Hierarchical Clustering
        Activity 3: Comparing k-means with Hierarchical Clustering
    Chapter 3: Neighborhood Approaches and DBSCAN
        Activity 4: Implement DBSCAN from Scratch
        Activity 5: Comparing DBSCAN with k-means and Hierarchical Clustering
    Chapter 4: Dimension Reduction and PCA
        Activity 6: Manual PCA versus scikit-learn
        Activity 7: PCA Using the Expanded Iris Dataset
    Chapter 5: Autoencoders
        Activity 8: Modeling Neurons with a ReLU Activation Function
        Activity 9: MNIST Neural Network
        Activity 10: Simple MNIST Autoencoder
        Activity 11: MNIST Convolutional Autoencoder
    Chapter 6: t-Distributed Stochastic Neighbor Embedding (t-SNE)
        Activity 12: Wine t-SNE
        Activity 13: t-SNE Wine and Perplexity
        Activity 14: t-SNE Wine and Iterations
    Chapter 7: Topic Modeling
        Activity 15: Loading and Cleaning Twitter Data
        Activity 16: Latent Dirichlet Allocation and Health Tweets
        Activity 17: Non-Negative Matrix Factorization
    Chapter 8: Market Basket Analysis
        Activity 18: Loading and Preparing Full Online Retail Data
        Activity 19: Apriori on the Complete Online Retail Dataset
        Activity 20: Finding the Association Rules on the Complete Online Retail Dataset
    Chapter 9: Hotspot Analysis
        Activity 21: Estimating Density in One Dimension
        Activity 22: Analyzing Crime in London

How to download the source code

1. Go to: https://github.com/PacktPublishing

2. In the Find a repository… box, search for the book title, Applied Unsupervised Learning with Python. If the search returns no results, try searching for the main title only.