Take the next steps in your data science career! This friendly and hands-on guide shows you how to start mastering Pandas with skills you already know from spreadsheet software.
In Pandas in Action you will learn how to:
- Import datasets, identify issues with their data structures, and optimize them for efficiency
- Sort, filter, pivot, and draw conclusions from a dataset and its subsets
- Identify trends from text-based and time-based data
- Organize, group, merge, and join separate datasets
- Use a GroupBy object to store multiple DataFrames
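As a rough illustration of the skills listed above, here is a minimal sketch of sorting, filtering, and grouping in pandas. The `sales` table and its column names are invented for this example and do not come from the book.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
sales = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West", "East"],
        "product": ["A", "A", "B", "B", "A"],
        "units": [10, 4, 7, 12, 3],
    }
)

# Sort rows by a column, largest values first.
by_units = sales.sort_values("units", ascending=False)

# Filter rows with a Boolean condition.
large_orders = sales[sales["units"] >= 7]

# Group rows by region; each group can be pulled out as its own DataFrame.
groups = sales.groupby("region")
east = groups.get_group("East")

# Aggregate within each group.
units_per_region = groups["units"].sum()

print(by_units)
print(large_orders)
print(east)
print(units_per_region)
```

A GroupBy object behaves like a container of smaller DataFrames keyed by group label, which is why `get_group` returns a DataFrame and aggregations such as `sum` return one value per group.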
Pandas has rapidly become one of Python’s most popular data analysis libraries. In Pandas in Action, a friendly and example-rich introduction, author Boris Paskhaver shows you how to master this versatile tool and take the next steps in your data science career. You’ll learn how easy pandas makes it to efficiently sort, analyze, filter, and munge almost any type of data.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Data analysis with Python doesn’t have to be hard. If you can use a spreadsheet, you can learn pandas! While its grid-style layouts may remind you of Excel, pandas is far more flexible and powerful. This Python library quickly performs operations on millions of rows, and it interfaces easily with other tools in the Python data ecosystem. It’s a perfect way to up your data game.
About the book
Pandas in Action introduces Python-based data analysis using the amazing pandas library. You’ll learn to automate repetitive operations and gain deeper insights into your data that would be impractical—or impossible—in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you’ll find in the real world.
- Organize, group, merge, split, and join datasets
- Find trends in text-based and time-based data
- Sort, filter, pivot, optimize, and draw conclusions
- Apply aggregate operations
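To make the merge, pivot, and aggregate items above more concrete, the following sketch joins two small tables and summarizes the result. The `orders` and `customers` tables, their columns, and their values are assumptions made up for this example, not datasets from the book.

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
orders = pd.DataFrame(
    {
        "customer_id": [1, 2, 1, 3],
        "quarter": ["Q1", "Q1", "Q2", "Q2"],
        "amount": [100.0, 250.0, 75.0, 40.0],
    }
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "city": ["Boston", "Denver", "Boston"]}
)

# Left join: keep every order and attach the matching customer's city.
merged = orders.merge(customers, on="customer_id", how="left")

# Pivot: one row per city, one column per quarter, summing the amounts.
summary = merged.pivot_table(
    index="city", columns="quarter", values="amount", aggfunc="sum", fill_value=0
)

# Apply several aggregate operations per city in one call.
totals = merged.groupby("city")["amount"].agg(["sum", "mean"])

print(summary)
print(totals)
```

The `fill_value=0` argument is a design choice for this sketch: it replaces the missing city and quarter combination with zero instead of leaving it as NaN.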
About the reader
For readers experienced with spreadsheets and basic Python programming.
About the author
Boris Paskhaver is a software engineer, Agile consultant, and online educator. His programming courses have been taken by 300,000 students across 190 countries.
Contents

preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
Other online resources
about the author
about the cover illustration

Part 1 Core pandas

1 Introducing pandas
1.1 Data in the 21st century
1.2 Introducing pandas
1.2.1 Pandas vs. graphical spreadsheet applications
1.2.2 Pandas vs. its competitors
1.3 A tour of pandas
1.3.1 Importing a data set
1.3.2 Manipulating a DataFrame
1.3.3 Counting values in a Series
1.3.4 Filtering a column by one or more criteria
1.3.5 Grouping data
Summary

2 The Series object
2.1 Overview of a Series
2.1.1 Classes and instances
2.1.2 Populating the Series with values
2.1.3 Customizing the Series index
2.1.4 Creating a Series with missing values
2.2 Creating a Series from Python objects
2.3 Series attributes
2.4 Retrieving the first and last rows
2.5 Mathematical operations
2.5.1 Statistical operations
2.5.2 Arithmetic operations
2.5.3 Broadcasting
2.6 Passing the Series to Python’s built-in functions
2.7 Coding challenge
2.7.1 Problems
2.7.2 Solutions
Summary

3 Series methods
3.1 Importing a data set with the read_csv function
3.2 Sorting a Series
3.2.1 Sorting by values with the sort_values method
3.2.2 Sorting by index with the sort_index method
3.2.3 Retrieving the smallest and largest values with the nsmallest and nlargest methods
3.3 Overwriting a Series with the inplace parameter
3.4 Counting values with the value_counts method
3.5 Invoking a function on every Series value with the apply method
3.6 Coding challenge
3.6.1 Problems
3.6.2 Solutions
Summary

4 The DataFrame object
4.1 Overview of a DataFrame
4.1.1 Creating a DataFrame from a dictionary
4.1.2 Creating a DataFrame from a NumPy ndarray
4.2 Similarities between Series and DataFrames
4.2.1 Importing a DataFrame with the read_csv function
4.2.2 Shared and exclusive attributes of Series and DataFrames
4.2.3 Shared methods of Series and DataFrames
4.3 Sorting a DataFrame
4.3.1 Sorting by a single column
4.3.2 Sorting by multiple columns
4.4 Sorting by index
4.4.1 Sorting by row index
4.4.2 Sorting by column index
4.5 Setting a new index
4.6 Selecting columns and rows from a DataFrame
4.6.1 Selecting a single column from a DataFrame
4.6.2 Selecting multiple columns from a DataFrame
4.7 Selecting rows from a DataFrame
4.7.1 Extracting rows by index label
4.7.2 Extracting rows by index position
4.7.3 Extracting values from specific columns
4.8 Extracting values from Series
4.9 Renaming columns or rows
4.10 Resetting an index
4.11 Coding challenge
4.11.1 Problems
4.11.2 Solutions
Summary

5 Filtering a DataFrame
5.1 Optimizing a data set for memory use
5.1.1 Converting data types with the astype method
5.2 Filtering by a single condition
5.3 Filtering by multiple conditions
5.3.1 The AND condition
5.3.2 The OR condition
5.3.3 Inversion with ~
5.3.4 Methods for Booleans
5.4 Filtering by condition
5.4.1 The isin method
5.4.2 The between method
5.4.3 The isnull and notnull methods
5.4.4 Dealing with null values
5.5 Dealing with duplicates
5.5.1 The duplicated method
5.5.2 The drop_duplicates method
5.6 Coding challenge
5.6.1 Problems
5.6.2 Solutions
Summary

Part 2 Applied pandas

6 Working with text data
6.1 Letter casing and whitespace
6.2 String slicing
6.3 String slicing and character replacement
6.4 Boolean methods
6.5 Splitting strings
6.6 Coding challenge
6.6.1 Problems
6.6.2 Solutions
6.7 A note on regular expressions
Summary

7 MultiIndex DataFrames
7.1 The MultiIndex object
7.2 MultiIndex DataFrames
7.3 Sorting a MultiIndex
7.4 Selecting with a MultiIndex
7.4.1 Extracting one or more columns
7.4.2 Extracting one or more rows with loc
7.4.3 Extracting one or more rows with iloc
7.5 Cross-sections
7.6 Manipulating the Index
7.6.1 Resetting the index
7.6.2 Setting the index
7.7 Coding challenge
7.7.1 Problems
7.7.2 Solutions
Summary

8 Reshaping and pivoting
8.1 Wide vs. narrow data
8.2 Creating a pivot table from a DataFrame
8.2.1 The pivot_table method
8.2.2 Additional options for pivot tables
8.3 Stacking and unstacking index levels
8.4 Melting a data set
8.5 Exploding a list of values
8.6 Coding challenge
8.6.1 Problems
8.6.2 Solutions
Summary

9 The GroupBy object
9.1 Creating a GroupBy object from scratch
9.2 Creating a GroupBy object from a data set
9.3 Attributes and methods of a GroupBy object
9.4 Aggregate operations
9.5 Applying a custom operation to all groups
9.6 Grouping by multiple columns
9.7 Coding challenge
9.7.1 Problems
9.7.2 Solutions
Summary

10 Merging, joining, and concatenating
10.1 Introducing the data sets
10.2 Concatenating data sets
10.3 Missing values in concatenated DataFrames
10.4 Left joins
10.5 Inner joins
10.6 Outer joins
10.7 Merging on index labels
10.8 Coding challenge
10.8.1 Problems
10.8.2 Solutions
Summary

11 Working with dates and times
11.1 Introducing the Timestamp object
11.1.1 How Python works with datetimes
11.1.2 How pandas works with datetimes
11.2 Storing multiple timestamps in a DatetimeIndex
11.3 Converting column or index values to datetimes
11.4 Using the DatetimeProperties object
11.5 Adding and subtracting durations of time
11.6 Date offsets
11.7 The Timedelta object
11.8 Coding challenge
11.8.1 Problems
11.8.2 Solutions
Summary

12 Imports and exports
12.1 Reading from and writing to JSON files
12.1.1 Loading a JSON file Into a DataFrame
12.1.2 Exporting a DataFrame to a JSON file
12.2 Reading from and writing to CSV files
12.3 Reading from and writing to Excel workbooks
12.3.1 Installing the xlrd and openpyxl libraries in an Anaconda environment
12.3.2 Importing Excel workbooks
12.3.3 Exporting Excel workbooks
12.4 Coding challenge
12.4.1 Problems
12.4.2 Solutions
Summary

13 Configuring pandas
13.1 Getting and setting pandas options
13.2 Precision
13.3 Maximum column width
13.4 Chop threshold
13.5 Option context
Summary

14 Visualization
14.1 Installing matplotlib
14.2 Line charts
14.3 Bar graphs
14.4 Pie charts
Summary

appendix A Installation and setup
A.1 The Anaconda distribution
A.2 The macOS setup process
A.2.1 Installing Anaconda in macOS
A.2.2 Launching Terminal
A.2.3 Common Terminal commands
A.3 The Windows setup process
A.3.1 Installing Anaconda in Windows
A.3.2 Launching Anaconda Prompt
A.3.3 Common Anaconda Prompt commands
A.4 Creating a new Anaconda environment
A.5 Anaconda Navigator
A.6 The basics of Jupyter Notebook

appendix B Python crash course
B.1 Simple data types
B.1.1 Numbers
B.1.2 Strings
B.1.3 Booleans
B.1.4 The None object
B.2 Operators
B.2.1 Mathematical operators
B.2.2 Equality and inequality operators
B.3 Variables
B.4 Functions
B.4.1 Arguments and return values
B.4.2 Custom functions
B.5 Modules
B.6 Classes and objects
B.7 Attributes and methods
B.8 String methods
B.9 Lists
B.9.1 List iteration
B.9.2 List comprehension
B.9.3 Converting a string to a list and vice versa
B.10 Tuples
B.11 Dictionaries
B.11.1 Dictionary Iteration
B.12 Sets

appendix C NumPy crash course
C.1 Dimensions
C.2 The ndarray object
C.2.1 Generating a numeric range with the arange method
C.2.2 Attributes on a ndarray object
C.2.3 The reshape method
C.2.4 The randint function
C.2.5 The randn function
C.3 The nan object

appendix D Generating fake data with Faker
D.1 Installing Faker
D.2 Getting started with Faker
D.3 Populating a DataFrame with fake values

appendix E Regular expressions
E.1 Introduction to Python’s re module
E.2 Metacharacters
E.3 Advanced search patterns
E.4 Regular expressions and pandas

index