Practitioner’s Guide to Data Science: Streamlining Data Science Solutions using Python, Scikit-Learn, and Azure ML Service Platform

by Nasir Ali Mirza

Length: 242 pages
Edition: 1
Language: English
Publisher: BPB Publications
Publication Date: 2022-01-17
ISBN-10: 9391392873
ISBN-13: 9789391392871
Sales Rank: #4639285

Covers Data Science concepts, processes, and real-world, hands-on use cases.

Key Features

Covers the journey from a basic programmer to an effective Data Science developer.
Applied use of Data Science-specific processes such as CRISP-DM and Microsoft TDSP.
Implementation of MLOps using Microsoft Azure DevOps.

Description

This book gives a conceptually sound answer to the question “How should a Data Science project be implemented?” It provides an in-depth look at the current state of the world’s data and at the pivotal role Data Science plays in everything we do.

This book explains and implements the entire Data Science lifecycle using well-known data science processes such as CRISP-DM and Microsoft TDSP. It also explains why these processes matter, given the high failure rate of Data Science projects.

The book helps build a solid foundation in Data Science concepts and related frameworks. It teaches how to implement real-world use cases using the HMDA (Home Mortgage Disclosure Act) dataset. It explains the Azure ML Service architecture, its capabilities, and its implementation, preparing the DS team to implement MLOps. Finally, it shows how to use Azure DevOps to make the process repeatable.

By the end of this book, you will have strong Python coding skills, a firm grasp of concepts such as feature engineering, the ability to create insightful visualizations, and familiarity with techniques for building machine learning models.

What you will learn

Organize Data Science projects using CRISP-DM and Microsoft TDSP.
Learn to acquire and explore data using Python visualizations.
Become well versed in implementing data pre-processing and Feature Engineering.
Understand algorithm selection, model development, and model evaluation.
Get hands-on with Azure ML Service, its architecture, and its capabilities.
Learn to use Azure ML SDK and MLOps for implementing real-world use cases.
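As a rough illustration of the pre-processing, model development, and model evaluation steps listed above, here is a minimal Scikit-Learn sketch. It is not code from the book: it assumes a synthetic dataset from `make_classification` in place of the HMDA use case data, and the choice of `StandardScaler`, `LogisticRegression`, accuracy, and ROC AUC is an illustrative assumption.

```python
# Minimal sketch (not from the book): synthetic data stands in for the HMDA use case data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a reproducible synthetic binary classification dataset.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=42
)

# Hold out 20% of rows for evaluation; stratify to preserve the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Chain pre-processing and the model so the scaler is fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the held-out rows with two common classification metrics.
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_proba)
print(f"accuracy={accuracy:.3f}  roc_auc={auc:.3f}")
```

Wrapping the scaler and the estimator in one pipeline keeps the pre-processing step from seeing the test rows, which is one simple way to avoid data leakage during evaluation.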

Who this book is for

This book is intended for programmers who wish to pursue AI/ML development and build a solid conceptual foundation and familiarity with related processes and frameworks. Additionally, this book is an excellent resource for Software Architects and Managers involved in the design and delivery of Data Science-based solutions.

Cover Page
Title Page
Copyright Page
Foreword
Dedication Page
About the Author
About the Reviewer
Acknowledgement
Preface
Errata
Table of Contents
1. Data Science for Business
    Structure
    Objectives
    Application programmer to Data Science professional
    What is Data Science?
    The unprecedented scope of Data Science
    Data Science application
    Big Data, DM, ML, DL, AI, and Data Science
    Legal, ethical, and security aspects of Data Science
    Methodology used in organizing this book
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
2. Data Science Project Methodologies and Team Processes
    Structure
    Objectives
    What is a process and its importance?
    Data Science from a process perspective
    Software engineering and Data Science
    Data Science project methodologies and processes
        Knowledge Discovery in Databases
        CCC Big Data pipeline
        CRoss-Industry Standard Process for Data Mining
        Domino’s Data Science Life Cycle
        Microsoft’s Team Data Science Process
            Data Science lifecycle
            Standardized project structure
            Infrastructure and resources
            Tools and utilities
        Sample, Explore, Modify, Model, and Assess
        Data-Driven Scrum (DDS)
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
3. Business Understanding and Its Data Landscape
    Structure
    Objectives
    What is involved in business understanding?
        CRISP-DM guidelines
        Microsoft TDSP guidelines
    Business problem types and Data Science solutions
    Reliability and validity of business data
    Hands-on use case
        Project charter
            Business background
            Project scope
            Project team
            Evaluation metrics
            Project plan
            Solution architecture
            Communication plan
        Data sources
        Data dictionary
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
4. Acquire, Explore, and Analyze Data
    Structure
    Objectives
    Development environment options
    Guidelines for data acquisition and understanding
        CRISP-DM
        Microsoft TDSP
    Data acquisition and sampling
        Essential considerations
        Use case data
        Down-sampling the use case data
            Down-sampling for rate spread use case
    Data exploration and visualization
        Essential considerations
        Explore and visualize HMDA use case data
            HMDA use case data distribution
            Data relations (bivariate)
            Categorical variables
            Data relations (multivariate)
    Data quality report and decision checkpoint
        Data quality
        Decision checkpoint
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
5. Pre-processing and Preparing Data
    Structure
    Objectives
    Guidelines for data preparation
        CRISP-DM for data preparation
            Selection of data
            Cleaning of data
            Construction of data
            Integration of data
            Data formatting
        Microsoft TDSP for data preparation
            Data pre-processing concept
            Data health screening
            Data pre-processing major tasks
            Feature engineering
    Data pre-processing and cleaning
    Feature engineering
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
6. Developing a Machine Learning Model
    Structure
    Objectives
    Guidelines for model development
        CRISP-DM
            Selection of modeling technique
            Generation of test design
            Model building
            Model assessment
        Microsoft TDSP
            Goals
            Tasks
            Deliverables
    Modeling algorithms and evaluation
        What is a model?
        How to choose an algorithm?
        Metrics for model evaluation
            Classification metrics
            Regression metrics
        Model development procedure
    Modeling for HMDA use case
        Choosing an algorithm
        Modeling scenario-1
        Modeling scenario-2
        Model tuning
            Feature selection
            Dimensionality reduction
            Cross-validation
            Regularization
            Bagging and boosting
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
7. Lap Around Azure ML Service
    Structure
    Objectives
    Azure ML Service overview
    Architecture and key concepts
        Workspace
        Compute
            Managed compute
            Un-managed compute
        Datasets and datastores
        Environments
        Experiments
            Runs
            Run configurations
            Snapshots
        Pipelines
        Models
            Model registry
        Deployment
        Endpoints
            Web service endpoint
            IoT module endpoints
    Getting started: signup and provisioning
    AutoML in Azure ML Service
    Model development with Azure ML Service
        Azure ML Designer
        AutoML using ML Studio UI
        AutoML using Python SDK
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
8. Deploying and Managing Models
    Structure
    Objectives
    Guidelines for deployment and evaluation
        CRISP-DM
        Microsoft TDSP
    Model lifecycle management
        Model lifecycle using Azure ML SDK
            Training the model
            Registering model
            Deploying the model
            Testing/consuming deployed model
            Retraining a model
        Model lifecycle using Azure ML Studio UI
        MLOps with Azure Pipelines
            Pre-requisites
            Azure DevOps project
            Project repository
            Azure Subscription
            Azure Service connection
            Creating a build pipeline
            Creating a release pipeline
    Conclusion
    Points to remember
    Multiple choice questions
        Answers
    Questions
    Key terms
Index