The Pandas Workshop: A comprehensive guide to using Python for data analysis with real-world case studies
- Length: 744 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2022-06-17
- ISBN-10: 1800208936
- ISBN-13: 9781800208933
- Sales Rank: #1785966
Learn the fundamentals of data science with Python by analyzing real datasets and solving problems using pandas
Key Features
- Learn how to apply data retrieval, transformation, visualization, and modeling techniques using pandas
- Become highly efficient in unlocking deeper insights from your data, including databases, web data, and more
- Build your experience and confidence with hands-on exercises and activities
Book Description
The Pandas Workshop will teach you how to be more productive with data and generate real business insights to inform your decision-making. You will be guided through real-world data science problems and shown how to apply key techniques in the context of realistic examples and exercises. Engaging activities will then challenge you to apply your new skills in a way that prepares you for real data science projects.
You’ll see how experienced data scientists tackle a wide range of problems using data analysis with pandas. Unlike other Python books, which focus on theory and spend too long on dry, technical explanations, this workshop is designed to quickly get you to write clean code and build your understanding through hands-on practice. As you work through this Python pandas book, you’ll tackle various real-world scenarios, such as using an air quality dataset to understand the pattern of nitrogen dioxide emissions in a city, as well as analyzing transportation data to improve bus transportation services.
By the end of this data analytics book, you’ll have the knowledge, skills, and confidence you need to solve your own challenging data science problems with pandas.
What you will learn
- Access and load data from different sources using pandas
- Work with a range of data types and structures to understand your data
- Perform data transformation to prepare it for analysis
- Use Matplotlib for data visualization to create a variety of plots
- Create data models to find relationships and test hypotheses
- Manipulate time-series data to perform date-time calculations
- Optimize your code to ensure more efficient business data analysis
Who this book is for
This data analysis book is for anyone with prior experience working with the Python programming language who wants to learn the fundamentals of data analysis with pandas. Previous knowledge of pandas is not necessary.
Table of Contents
Front matter
- The Pandas Workshop; Contributors; About the authors; About the reviewer; Preface; Who this book is for; What this book covers; To get the most out of this book; Download the example code files; Download the color images; Conventions used; Get in touch; Share Your Thoughts
Part 1 – Introduction to pandas
Chapter 1: Introduction to pandas
- Introduction to the world of pandas; Exploring the history and evolution of pandas; Components and applications of pandas; Understanding the basic concepts of pandas; The Series object; The DataFrame object; Working with local files; Reading a CSV file; Displaying a snapshot of the data; Writing data to a file; Data types in pandas; Data selection; Data transformation; Data visualization; Time series data; Code optimization; Utility functions; Exercise 1.02 – basic numerical operations with pandas; Data modeling; Exercise 1.03 – comparing data from two DataFrames; Activity 1.01 – comparing sales data for two stores; Summary
Chapter 2: Working with Data Structures
- Introduction to data structures; The need for data structures; Data structures; Creating DataFrames in pandas; Exercise 2.01 – Creating a DataFrame; Indexes and columns; Exercise 2.02 – Reading DataFrames and manipulating the index; Working with columns; Series; The Series index; Exercise 2.03 – Series to DataFrames; Using time as the index; Exercise 2.04 – DataFrame indices; Activity 2.01 – Working with pandas data structures; Summary
Chapter 3: Data I/O
- The world of data; Exploring data sources; Text files and binary files; Online data sources; Exercise 3.01 – reading data from web pages; Fundamental formats; Text data; Exercise 3.02 – text character encoding and data separators; Binary data; Databases – SQL data; sqlite3; Additional text formats; Working with JSON; Working with HTML/XML; Working with XML data; Working with Excel; SAS data; SPSS data; Stata data; HDF5 data; Manipulating SQL data; Exercise 3.03 – working with SQL; Choosing a format for a project; Activity 3.01 – using SQL data for pandas analytics; Summary
Chapter 4: Pandas Data Types
- Introducing pandas dtypes; Obtaining the underlying data types; Converting from one type into another; Exercise 4.01 – underlying data types and conversion; Missing data types; The missing alphabet soup; Nullable types; Exercise 4.02 – missing data and converting into non-nullable dtypes; Activity 4.01 – optimizing memory usage by converting into the appropriate dtypes; Subsetting by data types; Working with the dtype category; Working with dtype = datetime64[ns]; Working with dtype = timedelta64[ns]; Exercise 4.03 – working with text data using string methods; Selecting data in a DataFrame by its dtype; Summary
Part 2 – Working with Data
Chapter 5: Data Selection – DataFrames
- Introduction to DataFrames; The need for data selection methods; Data selection in pandas DataFrames; The index and its forms; Exercise 5.01 – identifying the row and column indices in a dataset; Slicing and indexing methods; Exercise 5.02 – subsetting rows and columns; Using labels as the index and the pandas multi-index; Creating a multi-index from columns; Activity 5.01 – Creating a multi-index from columns; Bracket and dot notation; Bracket notation; Dot notation; Exercise 5.03 – integer row numbers versus labels; Using extended indexing; Type exceptions; Changing DataFrame values using bracket or dot notation; Exercise 5.04 – selecting data using bracket and dot notation; Summary
Chapter 6: Data Selection – Series
- Introduction to pandas Series; The Series index; Data selection in a pandas Series; Brackets, dots, Series.loc, and Series.iloc; Exercise 6.01 – basic Series data selection; Preparing Series from DataFrames and vice versa; Exercise 6.02 – using a Series index to select values; Activity 6.01 – Series data selection; Understanding the differences between base Python and pandas data selection; Lists versus Series access; DataFrames versus dictionary access; Activity 6.02 – DataFrame data selection; Summary
Chapter 7: Data Exploration and Transformation
- Introduction to data transformation; Dealing with messy data; Working on data without column headers; Multiple values in one column; Duplicate observations in both rows and columns; Exercise 7.01 – working with messy addresses; Multiple variables stored in one column; Multiple DataFrames with identical structures; Exercise 7.02 – storing sales by demographics; Dealing with missing data; What is missing data?; Strategies for missing data; Summarizing data; Grouping and aggregation; Exploring pivot tables; Activity 7.01 – data analysis using pivot tables; Summary
Chapter 8: Understanding Data Visualization
- Introduction to data visualization; Understanding the basics of pandas visualization; Exercise 8.01 – Building histograms for the Titanic dataset; Exploring matplotlib; Visualizing data of different types; Visualizing numerical data; Visualizing categorical data; Visualizing statistical data; Exercise 8.02 – Boxplots for the Titanic dataset; Visualizing multiple data plots; Activity 8.01 – Using data visualization for exploratory data analysis; Summary
Part 3 – Data Modeling
Chapter 9: Data Modeling – Preprocessing
- An introduction to data modeling; Exploring dependent and independent variables; Training, validation, and test splits of data; Exercise 9.01 – Creating training, validation, and test data; Avoiding information leakage; Complete model validation; Understanding data scaling and normalization; Different ways to Scale Data; Scaling data yourself; Min/max scaling; Standardization – addressing variance; Transforming back to real units; Exercise 9.02 – Scaling and normalizing data; Activity 9.01 – Data splitting, scaling, and modeling; Summary
Chapter 10: Data Modeling – Modeling Basics
- Introduction to data modeling; Learning the modeling basics; Modeling tools; Pandas modeling tools; Predicting future values of time series; Exercise 10.01 – Smoothing data to discover patterns; Activity 10.01 – Normalizing and smoothing data; Summary
Chapter 11: Data Modeling – Regression Modeling
- An introduction to regression modeling; Exploring regression modeling; Using linear models; Exercise 11.1 – Linear regression; Non-linear models; Model diagnostics; Comparing predicted and actual values; Using the Q-Q plot; Exercise 11.02 – Multiple regression and non-linear models; Activity 11.01 – Multiple regression with non-linear models; Summary
Part 4 – Additional Use Cases for pandas
Chapter 12: Using Time in pandas
- Introduction to time series; What are datetimes?; Attributes of datetime objects; Exercise 12.01 – working with datetime; Creating and manipulating datetime objects/time series; Time periods in pandas; Information in pandas time-aware objects; Exercise 12.02 – math with datetimes; Timestamp formats; Activity 12.01 – understanding power usage; Datetime math operations; Date ranges; Timedeltas, offsets, and differences; Date offsets; Exercise 12.03 – timedeltas and date offsets; Summary
Chapter 13: Exploring Time Series
- The time series as an index; Time series periods/frequencies; Shifting, lagging, and converting frequency; Resampling, grouping, and aggregation by time; Using the resample method; Exercise 13.01 – Aggregating and resampling; Windowing operations with the rolling method; Activity 13.01 – Creating a time series model; Summary
Chapter 14: Applying pandas Data Processing for Case Studies
- Introduction to the case studies and datasets; Recap of the preprocessing steps; Preprocessing the German climate data; Exercise 14.01 – preprocessing the German climate data; Exercise 14.02 – merging DataFrames and renaming variables; Exercise 14.03 – data interpolation and answering questions after data preprocessing; Exercise 14.04 – using data visualizations to answer questions; Exercise 14.05 – using data visualizations to answer questions; Exercise 14.06 – analyzing data on bus trajectories; Activity 14.01 – analyzing air quality data; Summary
Chapter 15: Appendix
- Solution 1.1; Solution 2.1; Solution 3.1; Solution 4.1; Solution 5.1; Solution 6.1; Solution 6.2; Solution 7.1; Solution 8.1; Solution 9.1; Solution 10.1; Solution 11.1; Solution 12.1; Solution 13.1; Solution 14.1
Back matter
- Why subscribe?; Other Books You May Enjoy; Packt is searching for authors like you; Share Your Thoughts
How to download the source code
1. Go to: https://github.com/PacktPublishing
2. In the "Find a repository…" box, search for the book title: The Pandas Workshop: A comprehensive guide to using Python for data analysis with real-world case studies. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.