Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd Edition

Length: 579 pages
Edition: 3
Language: English
Publisher: O'Reilly Media
Publication Date: 2022-09-27
ISBN-10: 109810403X
ISBN-13: 9781098104030
Sales Rank: #86558

Get the definitive handbook for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

Use the Jupyter notebook and IPython shell for exploratory computing
Learn basic and advanced features in NumPy
Get started with data analysis tools in the pandas library
Use flexible tools to load, clean, transform, merge, and reshape data
Create informative visualizations with matplotlib
Apply the pandas groupby facility to slice, dice, and summarize datasets
Analyze and manipulate regular and irregular time series data
Learn how to solve real-world data analysis problems with thorough, detailed examples

Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Acknowledgments
        In Memoriam: John D. Hunter (1968–2012)
        Acknowledgments for the Third Edition (2022)
        Acknowledgments for the Second Edition (2017)
        Acknowledgments for the First Edition (2012)
1. Preliminaries
    1.1. What Is This Book About?
        What Kinds of Data?
    1.2. Why Python for Data Analysis?
        Python as Glue
        Solving the “Two-Language” Problem
        Why Not Python?
    1.3. Essential Python Libraries
        NumPy
        pandas
        matplotlib
        IPython and Jupyter
        SciPy
        scikit-learn
        statsmodels
        Other Packages
    1.4. Installation and Setup
        Miniconda on Windows
        GNU/Linux
        Miniconda on macOS
        Installing Necessary Packages
        Integrated Development Environments and Text Editors
    1.5. Community and Conferences
    1.6. Navigating This Book
        Code Examples
        Data for Examples
        Import Conventions
2. Python Language Basics, IPython, and Jupyter Notebooks
    2.1. The Python Interpreter
    2.2. IPython Basics
        Running the IPython Shell
        Running the Jupyter Notebook
        Tab Completion
        Introspection
    2.3. Python Language Basics
        Language Semantics
            Indentation, not braces
            Everything is an object
            Comments
            Function and object method calls
            Variables and argument passing
            Dynamic references, strong types
            Attributes and methods
            Duck typing
            Imports
            Binary operators and comparisons
            Mutable and immutable objects
        Scalar Types
            Numeric types
            Strings
            Bytes and Unicode
            Booleans
            Type casting
            None
            Dates and times
        Control Flow
            if, elif, and else
            for loops
            while loops
            pass
            range
    2.4. Conclusion
3. Built-In Data Structures, Functions, and Files
    3.1. Data Structures and Sequences
        Tuple
            Unpacking tuples
            Tuple methods
        List
            Adding and removing elements
            Concatenating and combining lists
            Sorting
            Slicing
        Dictionary
            Creating dictionaries from sequences
            Default values
            Valid dictionary key types
        Set
        Built-In Sequence Functions
            enumerate
            sorted
            zip
            reversed
        List, Set, and Dictionary Comprehensions
            Nested list comprehensions
    3.2. Functions
        Namespaces, Scope, and Local Functions
        Returning Multiple Values
        Functions Are Objects
        Anonymous (Lambda) Functions
        Generators
            Generator expressions
            itertools module
        Errors and Exception Handling
            Exceptions in IPython
    3.3. Files and the Operating System
        Bytes and Unicode with Files
    3.4. Conclusion
4. NumPy Basics: Arrays and Vectorized Computation
    4.1. The NumPy ndarray: A Multidimensional Array Object
        Creating ndarrays
        Data Types for ndarrays
        Arithmetic with NumPy Arrays
        Basic Indexing and Slicing
            Indexing with slices
        Boolean Indexing
        Fancy Indexing
        Transposing Arrays and Swapping Axes
    4.2. Pseudorandom Number Generation
    4.3. Universal Functions: Fast Element-Wise Array Functions
    4.4. Array-Oriented Programming with Arrays
        Expressing Conditional Logic as Array Operations
        Mathematical and Statistical Methods
        Methods for Boolean Arrays
        Sorting
        Unique and Other Set Logic
    4.5. File Input and Output with Arrays
    4.6. Linear Algebra
    4.7. Example: Random Walks
        Simulating Many Random Walks at Once
    4.8. Conclusion
5. Getting Started with pandas
    5.1. Introduction to pandas Data Structures
        Series
        DataFrame
        Index Objects
    5.2. Essential Functionality
        Reindexing
        Dropping Entries from an Axis
        Indexing, Selection, and Filtering
            Selection on DataFrame with loc and iloc
            Integer indexing pitfalls
            Pitfalls with chained indexing
        Arithmetic and Data Alignment
            Arithmetic methods with fill values
            Operations between DataFrame and Series
        Function Application and Mapping
        Sorting and Ranking
        Axis Indexes with Duplicate Labels
    5.3. Summarizing and Computing Descriptive Statistics
        Correlation and Covariance
        Unique Values, Value Counts, and Membership
    5.4. Conclusion
6. Data Loading, Storage, and File Formats
    6.1. Reading and Writing Data in Text Format
        Reading Text Files in Pieces
        Writing Data to Text Format
        Working with Other Delimited Formats
        JSON Data
        XML and HTML: Web Scraping
            Parsing XML with lxml.objectify
    6.2. Binary Data Formats
        Reading Microsoft Excel Files
        Using HDF5 Format
    6.3. Interacting with Web APIs
    6.4. Interacting with Databases
    6.5. Conclusion
7. Data Cleaning and Preparation
    7.1. Handling Missing Data
        Filtering Out Missing Data
        Filling In Missing Data
    7.2. Data Transformation
        Removing Duplicates
        Transforming Data Using a Function or Mapping
        Replacing Values
        Renaming Axis Indexes
        Discretization and Binning
        Detecting and Filtering Outliers
        Permutation and Random Sampling
        Computing Indicator/Dummy Variables
    7.3. Extension Data Types
    7.4. String Manipulation
        Python Built-In String Object Methods
        Regular Expressions
        String Functions in pandas
    7.5. Categorical Data
        Background and Motivation
        Categorical Extension Type in pandas
        Computations with Categoricals
            Better performance with categoricals
        Categorical Methods
            Creating dummy variables for modeling
    7.6. Conclusion
8. Data Wrangling: Join, Combine, and Reshape
    8.1. Hierarchical Indexing
        Reordering and Sorting Levels
        Summary Statistics by Level
        Indexing with a DataFrame’s columns
    8.2. Combining and Merging Datasets
        Database-Style DataFrame Joins
        Merging on Index
        Concatenating Along an Axis
        Combining Data with Overlap
    8.3. Reshaping and Pivoting
        Reshaping with Hierarchical Indexing
        Pivoting “Long” to “Wide” Format
        Pivoting “Wide” to “Long” Format
    8.4. Conclusion
9. Plotting and Visualization
    9.1. A Brief matplotlib API Primer
        Figures and Subplots
            Adjusting the spacing around subplots
        Colors, Markers, and Line Styles
        Ticks, Labels, and Legends
            Setting the title, axis labels, ticks, and tick labels
            Adding legends
        Annotations and Drawing on a Subplot
        Saving Plots to File
        matplotlib Configuration
    9.2. Plotting with pandas and seaborn
        Line Plots
        Bar Plots
        Histograms and Density Plots
        Scatter or Point Plots
        Facet Grids and Categorical Data
    9.3. Other Python Visualization Tools
    9.4. Conclusion
10. Data Aggregation and Group Operations
    10.1. How to Think About Group Operations
        Iterating over Groups
        Selecting a Column or Subset of Columns
        Grouping with Dictionaries and Series
        Grouping with Functions
        Grouping by Index Levels
    10.2. Data Aggregation
        Column-Wise and Multiple Function Application
        Returning Aggregated Data Without Row Indexes
    10.3. Apply: General split-apply-combine
        Suppressing the Group Keys
        Quantile and Bucket Analysis
        Example: Filling Missing Values with Group-Specific Values
        Example: Random Sampling and Permutation
        Example: Group Weighted Average and Correlation
        Example: Group-Wise Linear Regression
    10.4. Group Transforms and “Unwrapped” GroupBys
    10.5. Pivot Tables and Cross-Tabulation
        Cross-Tabulations: Crosstab
    10.6. Conclusion
11. Time Series
    11.1. Date and Time Data Types and Tools
        Converting Between String and Datetime
    11.2. Time Series Basics
        Indexing, Selection, Subsetting
        Time Series with Duplicate Indices
    11.3. Date Ranges, Frequencies, and Shifting
        Generating Date Ranges
        Frequencies and Date Offsets
            Week of month dates
        Shifting (Leading and Lagging) Data
            Shifting dates with offsets
    11.4. Time Zone Handling
        Time Zone Localization and Conversion
        Operations with Time Zone-Aware Timestamp Objects
        Operations Between Different Time Zones
    11.5. Periods and Period Arithmetic
        Period Frequency Conversion
        Quarterly Period Frequencies
        Converting Timestamps to Periods (and Back)
        Creating a PeriodIndex from Arrays
    11.6. Resampling and Frequency Conversion
        Downsampling
            Open-high-low-close (OHLC) resampling
        Upsampling and Interpolation
        Resampling with Periods
        Grouped Time Resampling
    11.7. Moving Window Functions
        Exponentially Weighted Functions
        Binary Moving Window Functions
        User-Defined Moving Window Functions
    11.8. Conclusion
12. Introduction to Modeling Libraries in Python
    12.1. Interfacing Between pandas and Model Code
    12.2. Creating Model Descriptions with Patsy
        Data Transformations in Patsy Formulas
        Categorical Data and Patsy
    12.3. Introduction to statsmodels
        Estimating Linear Models
        Estimating Time Series Processes
    12.4. Introduction to scikit-learn
    12.5. Conclusion
13. Data Analysis Examples
    13.1. Bitly Data from 1.USA.gov
        Counting Time Zones in Pure Python
        Counting Time Zones with pandas
    13.2. MovieLens 1M Dataset
        Measuring Rating Disagreement
    13.3. US Baby Names 1880–2010
        Analyzing Naming Trends
            Measuring the increase in naming diversity
            The “last letter” revolution
            Boy names that became girl names (and vice versa)
    13.4. USDA Food Database
    13.5. 2012 Federal Election Commission Database
        Donation Statistics by Occupation and Employer
        Bucketing Donation Amounts
        Donation Statistics by State
    13.6. Conclusion
A. Advanced NumPy
    A.1. ndarray Object Internals
        NumPy Data Type Hierarchy
    A.2. Advanced Array Manipulation
        Reshaping Arrays
        C Versus FORTRAN Order
        Concatenating and Splitting Arrays
            Stacking helpers: r_ and c_
        Repeating Elements: tile and repeat
        Fancy Indexing Equivalents: take and put
    A.3. Broadcasting
        Broadcasting over Other Axes
        Setting Array Values by Broadcasting
    A.4. Advanced ufunc Usage
        ufunc Instance Methods
        Writing New ufuncs in Python
    A.5. Structured and Record Arrays
        Nested Data Types and Multidimensional Fields
        Why Use Structured Arrays?
    A.6. More About Sorting
        Indirect Sorts: argsort and lexsort
        Alternative Sort Algorithms
        Partially Sorting Arrays
        numpy.searchsorted: Finding Elements in a Sorted Array
    A.7. Writing Fast NumPy Functions with Numba
        Creating Custom numpy.ufunc Objects with Numba
    A.8. Advanced Array Input and Output
        Memory-Mapped Files
        HDF5 and Other Array Storage Options
    A.9. Performance Tips
        The Importance of Contiguous Memory
B. More on the IPython System
    B.1. Terminal Keyboard Shortcuts
    B.2. About Magic Commands
        The %run Command
            Interrupting running code
        Executing Code from the Clipboard
    B.3. Using the Command History
        Searching and Reusing the Command History
        Input and Output Variables
    B.4. Interacting with the Operating System
        Shell Commands and Aliases
        Directory Bookmark System
    B.5. Software Development Tools
        Interactive Debugger
            Other ways to use the debugger
        Timing Code: %time and %timeit
        Basic Profiling: %prun and %run -p
        Profiling a Function Line by Line
    B.6. Tips for Productive Code Development Using IPython
        Reloading Module Dependencies
        Code Design Tips
            Keep relevant objects and data alive
            Flat is better than nested
            Overcome a fear of longer files
    B.7. Advanced IPython Features
        Profiles and Configuration
    B.8. Conclusion
Index

How to download source code?

1. Go to: https://www.oreilly.com/

2. Search for the book title: Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd Edition. If the full title returns no results, search for the main title only.

3. Click the book title in the search results

4. In the Publisher resources section, click Download Example Code.