Practical Data Science with Jupyter: Explore Data Cleaning, Pre-processing, Data Wrangling, Feature Engineering and Machine Learning using Python and Jupyter

Length: 360 pages
Edition: 1
Language: English
Publisher: BPB Publications
Publication Date: 2021-03-01
ISBN-10: 9389898064
ISBN-13: 9789389898064
Sales Rank: #1673217

Solve business problems with data-driven techniques and easy-to-follow Python examples

Key Features

Essential coverage of statistics and data science techniques.
Exposure to Jupyter, PyCharm, and GitHub.
Real use cases, best practices, and smart techniques for applying data science to data applications.

Description
This book begins with an introduction to data science, followed by the Python concepts you will need. You will learn how to interact with various databases and how to implement statistics concepts in Python. You will learn how to import various types of data in Python, which is the first step of the data analysis process. Once you are comfortable importing data, you will clean the dataset and then learn about various visualization charts. The book shows how to apply feature engineering techniques to make your data more valuable to an algorithm. You will also get to know various machine learning algorithms and concepts, time-series data, and a few real-world case studies. Finally, the book presents some best practices that will help you become industry-ready.

This book focuses on how to practice data science techniques while learning their concepts using Python and Jupyter. It is a complete answer to the most common question, how you can get started with data science, rather than an explanation of the mathematics and statistics behind machine learning algorithms.

What you will learn

Gain a rapid understanding of Python concepts for data science applications.
Understand and practice how to run data analysis with data science techniques and algorithms.
Learn feature engineering, how to deal with different datasets, and the most popular machine learning algorithms.
Become self-sufficient in performing data science tasks with the best tools and techniques.

Who this book is for
This book is for beginners and experienced professionals who are considering a career in, or a career switch to, data science. Each chapter contains easy-to-follow Python examples.

About the Author
Prateek Gupta is a data enthusiast who loves data-driven technologies. He holds a B.Tech in Computer Science & Engineering and currently works as a Data Scientist at an IT company. He has a total of 9 years of experience in the software industry and currently works in the computer vision area. He has implemented various end-to-end data science projects for fishing, winery, and e-commerce clients. The object detection and recognition models and product recommendation engines he implemented have solved many business problems for various clients. He has a keen interest in natural language processing and computer vision. In his leisure time, he writes posts about artificial intelligence on his blog.

Blog link: http://dsbyprateekg.blogspot.com/
LinkedIn Profile: https://www.linkedin.com/in/prateek-gupta-64203354/

Cover Page
Title Page
Copyright Page
Dedication Page
About the Author
Acknowledgement
Preface
Errata
Table of Contents
1. Data Science Fundamentals
    Structure
    Objective
    What is data?
        Structured data
        Unstructured data
        Semi-structured data
    What is data science?
    What does a data scientist do?
    Real-world use cases of data science
    Why Python for data science?
    Conclusion
2. Installing Software and System Setup
    Structure
    Objective
    System requirements
    Downloading Anaconda
    Installing the Anaconda on Windows
    Installing the Anaconda in Linux
    How to install a new Python library in Anaconda?
    Open your notebook – Jupyter
    Know your notebook
    Conclusion
3. Lists and Dictionaries
    Structure
    Objective
    What is a list?
    How to create a list?
    Different list manipulation operations
    Difference between Lists and Tuples
    What is a Dictionary?
    How to create a dictionary?
    Some operations with dictionary
    Conclusion
4. Package, Function, and Loop
    Structure
    Objective
    The help() function in Python
    How to import a Python package?
    How to create and call a function?
    Passing parameter in a function
    Default parameter in a function
    How to use unknown parameters in a function?
    A global and local variable in a function
    What is a Lambda function?
    Understanding main in Python
    while and for loop in Python
    Conclusion
5. NumPy Foundation
    Structure
    Objective
    Importing a NumPy package
    Why use NumPy array over list?
    NumPy array attributes
    Creating NumPy arrays
    Accessing an element of a NumPy array
    Slicing in NumPy array
    Array concatenation
    Conclusion
6. Pandas and DataFrame
    Structure
    Objective
    Importing Pandas
    Pandas data structures
        Series
        DataFrame
    .loc[] and .iloc[]
    Some Useful DataFrame Functions
    Handling missing values in DataFrame
    Conclusion
7. Interacting with Databases
    Structure
    Objective
    What is SQLAlchemy?
    Installing SQLAlchemy package
    How to use SQLAlchemy?
    SQLAlchemy engine configuration
    Creating a table in a database
    Inserting data in a table
    Update a record
    How to join two tables
        Inner join
        Left join
        Right join
    Conclusion
8. Thinking Statistically in Data Science
    Structure
    Objective
    Statistics in data science
    Types of statistical data/variables
        Mean, median, and mode
    Basics of probability
    Statistical distributions
        Poisson distribution
        Binomial distribution
        Normal distribution
    Pearson correlation coefficient
    Probability Density Function (PDF)
    Real-world example
    Statistical inference and hypothesis testing
    Conclusion
9. How to Import Data in Python?
    Structure
    Objective
    Importing text data
    Importing CSV data
    Importing Excel data
    Importing JSON data
    Importing pickled data
    Importing a compressed data
    Conclusion
10. Cleaning of Imported Data
    Structure
    Objective
    Know your data
    Analyzing missing values
    Dropping missing values
    Automatically fill missing values
    How to scale and normalize data?
    How to parse dates?
    How to apply character encoding?
    Cleaning inconsistent data
    Conclusion
11. Data Visualization
    Structure
    Objective
    Bar chart
    Line chart
    Histograms
    Scatter plot
    Stacked plot
    Box plot
    Conclusion
12. Data Pre-processing
    Structure
    Objective
    About the case-study
    Importing the dataset
    Exploratory data analysis
    Data cleaning and pre-processing
    Feature Engineering
    Conclusion
13. Supervised Machine Learning
    Structure
    Objective
    Some common ML terms
    Introduction to machine learning (ML)
        Supervised learning
        Unsupervised learning
        Semi-supervised learning
        Reinforcement learning
    List of common ML algorithms
    Supervised ML fundamentals
        Logistic Regression
        Decision Tree Classifier
        K-Nearest Neighbor Classifier
        Linear Discriminant Analysis (LDA)
        Gaussian Naive Bayes Classifier
        Support Vector Classifier
    Solving a classification ML problem
        About the dataset
        Attribute information
        Why train/test split and cross-validation?
    Solving a regression ML problem
    How to tune your ML model?
    How to handle categorical variables in sklearn?
    The advanced technique to handle missing data
    Conclusion
14. Unsupervised Machine Learning
    Structure
    Objective
    Why unsupervised learning?
    Unsupervised learning techniques
        Clustering
        K-mean clustering
        Hierarchical clustering
        t-SNE
    Principal Component Analysis (PCA)
    Case study
    Validation of unsupervised ML
    Conclusion
15. Handling Time-Series Data
    Structure
    Objective
    Why time-series is important?
    How to handle date and time?
    Transforming a time-series data
    Manipulating a time-series data
    Comparing time-series growth rates
    How to change time-series frequency?
    Conclusion
16. Time-Series Methods
    Structure
    Objective
    What is time-series forecasting?
    Basic steps in forecasting
    Time-series forecasting techniques
        Autoregression (AR)
        Moving Average (MA)
        Autoregressive Moving Average (ARMA)
        Autoregressive Integrated Moving Average (ARIMA)
        Seasonal Autoregressive Integrated Moving-Average (SARIMA)
        Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
        Vector Autoregression Moving-Average (VARMA)
        Holt Winter’s Exponential Smoothing (HWES)
    Forecast future traffic to a web page
    Conclusion
17. Case Study-1
    Predict whether or not an applicant will be able to repay a loan
    Conclusion
18. Case Study-2
    Build a prediction model that will accurately classify which text messages are spam
    Conclusion
19. Case Study-3
    Build a film recommendation engine
    Conclusion
20. Case Study-4
    Predict house sales in King County, Washington State, USA, using regression
    Conclusion
21. Python Virtual Environment
    Structure
    Objective
    What is a Python virtual environment?
    How to create and activate a virtual environment?
    How to open Jupyter notebook with this new environment?
    How to set an activated virtual environment in PyCharm IDE?
    What is requirements.txt file?
    What is README.md file?
    Upload your project in GitHub
    Conclusion
22. Introduction to An Advanced Algorithm - CatBoost
    Structure
    Objective
    What is a Gradient Boosting algorithm?
    Introduction to CatBoost
    Install CatBoost in Python virtual environment
    How to solve a classification problem with CatBoost?
    Push your notebook in your GitHub repository
    Conclusion
23. Revision of All Chapters’ Learning
    Conclusion
Index