Python Data Science Handbook: Essential Tools for Working with Data, 2nd Edition
- Length: 550 pages
- Edition: 2
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2023-01-31
- ISBN-10: 1098121228
- ISBN-13: 9781098121228
- Sales Rank: #149091
Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all: IPython, NumPy, pandas, Matplotlib, scikit-learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.
With this handbook, you’ll learn how:
- IPython and Jupyter provide computational environments for scientists using Python
- NumPy includes the ndarray for efficient storage and manipulation of dense data arrays
- Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data
- Matplotlib includes capabilities for a flexible range of data visualizations
- Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms
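As a rough illustration of the first two data structures named above (a sketch, not an excerpt from the book), a NumPy ndarray holds values of a single dtype, while a pandas DataFrame attaches row and column labels to columnar data. This assumes NumPy and pandas are installed; the row labels, column names, and values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# A dense, fixed-type NumPy array: every element shares one dtype
arr = np.arange(6, dtype=np.float64).reshape(2, 3)
print(arr.dtype, arr.shape)  # float64 (2, 3)

# Vectorized arithmetic applies to every element without a Python loop
doubled = arr * 2

# A pandas DataFrame adds row and column labels on top of the data
# (hypothetical labels, chosen only for illustration)
df = pd.DataFrame(doubled, index=["row_a", "row_b"], columns=["x", "y", "z"])

# Label-based selection with .loc, position-based selection with .iloc
print(df.loc["row_b", "y"])  # 8.0
print(df.iloc[0, 2])         # 4.0
```

The `.loc` and `.iloc` indexers shown here are the ones listed under Chapter 14 of the contents.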
Table of Contents:
Preface
  What Is Data Science?; Who Is This Book For?; Why Python?; Outline of the Book; Installation Considerations; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us
I. Jupyter: Beyond Normal Python
- 1. Getting Started in IPython and Jupyter
  Launching the IPython Shell; Launching the Jupyter Notebook; Help and Documentation in IPython; Accessing Documentation with ?; Accessing Source Code with ??; Exploring Modules with Tab Completion; Tab completion of object contents; Tab completion when importing; Beyond tab completion: Wildcard matching; Keyboard Shortcuts in the IPython Shell; Navigation Shortcuts; Text Entry Shortcuts; Command History Shortcuts; Miscellaneous Shortcuts
- 2. Enhanced Interactive Features
  IPython Magic Commands; Running External Code: %run; Timing Code Execution: %timeit; Help on Magic Functions: ?, %magic, and %lsmagic; Input and Output History; IPython’s In and Out Objects; Underscore Shortcuts and Previous Outputs; Suppressing Output; Related Magic Commands; IPython and Shell Commands; Quick Introduction to the Shell; Shell Commands in IPython; Passing Values to and from the Shell; Shell-Related Magic Commands
- 3. Debugging and Profiling
  Errors and Debugging; Controlling Exceptions: %xmode; Debugging: When Reading Tracebacks Is Not Enough; Profiling and Timing Code; Timing Code Snippets: %timeit and %time; Profiling Full Scripts: %prun; Line-by-Line Profiling with %lprun; Profiling Memory Use: %memit and %mprun; More IPython Resources; Web Resources; Books
II. Introduction to NumPy
- 4. Understanding Data Types in Python
  A Python Integer Is More Than Just an Integer; A Python List Is More Than Just a List; Fixed-Type Arrays in Python; Creating Arrays from Python Lists; Creating Arrays from Scratch; NumPy Standard Data Types
- 5. The Basics of NumPy Arrays
  NumPy Array Attributes; Array Indexing: Accessing Single Elements; Array Slicing: Accessing Subarrays; One-Dimensional Subarrays; Multidimensional Subarrays; Subarrays as No-Copy Views; Creating Copies of Arrays; Reshaping of Arrays; Array Concatenation and Splitting; Concatenation of Arrays; Splitting of Arrays
- 6. Computation on NumPy Arrays: Universal Functions
  The Slowness of Loops; Introducing Ufuncs; Exploring NumPy’s Ufuncs; Array Arithmetic; Absolute Value; Trigonometric Functions; Exponents and Logarithms; Specialized Ufuncs; Advanced Ufunc Features; Specifying Output; Aggregations; Outer Products; Ufuncs: Learning More
- 7. Aggregations: min, max, and Everything in Between
  Summing the Values in an Array; Minimum and Maximum; Multidimensional Aggregates; Other Aggregation Functions; Example: What Is the Average Height of US Presidents?
- 8. Computation on Arrays: Broadcasting
  Introducing Broadcasting; Rules of Broadcasting; Broadcasting Example 1; Broadcasting Example 2; Broadcasting Example 3; Broadcasting in Practice; Centering an Array; Plotting a Two-Dimensional Function
- 9. Comparisons, Masks, and Boolean Logic
  Example: Counting Rainy Days; Comparison Operators as Ufuncs; Working with Boolean Arrays; Counting Entries; Boolean Operators; Boolean Arrays as Masks; Using the Keywords and/or Versus the Operators &/|
- 10. Fancy Indexing
  Exploring Fancy Indexing; Combined Indexing; Example: Selecting Random Points; Modifying Values with Fancy Indexing; Example: Binning Data
- 11. Sorting Arrays
  Fast Sorting in NumPy: np.sort and np.argsort; Sorting Along Rows or Columns; Partial Sorts: Partitioning; Example: k-Nearest Neighbors
- 12. Structured Data: NumPy’s Structured Arrays
  Exploring Structured Array Creation; More Advanced Compound Types; Record Arrays: Structured Arrays with a Twist; On to Pandas
III. Data Manipulation with Pandas
- 13. Introducing Pandas Objects
  The Pandas Series Object; Series as Generalized NumPy Array; Series as Specialized Dictionary; Constructing Series Objects; The Pandas DataFrame Object; DataFrame as Generalized NumPy Array; DataFrame as Specialized Dictionary; Constructing DataFrame Objects; From a single Series object; From a list of dicts; From a dictionary of Series objects; From a two-dimensional NumPy array; From a NumPy structured array; The Pandas Index Object; Index as Immutable Array; Index as Ordered Set
- 14. Data Indexing and Selection
  Data Selection in Series; Series as Dictionary; Series as One-Dimensional Array; Indexers: loc and iloc; Data Selection in DataFrames; DataFrame as Dictionary; DataFrame as Two-Dimensional Array; Additional Indexing Conventions
- 15. Operating on Data in Pandas
  Ufuncs: Index Preservation; Ufuncs: Index Alignment; Index Alignment in Series; Index Alignment in DataFrames; Ufuncs: Operations Between DataFrames and Series
- 16. Handling Missing Data
  Trade-offs in Missing Data Conventions; Missing Data in Pandas; None as a Sentinel Value; NaN: Missing Numerical Data; NaN and None in Pandas; Pandas Nullable Dtypes; Operating on Null Values; Detecting Null Values; Dropping Null Values; Filling Null Values
- 17. Hierarchical Indexing
  A Multiply Indexed Series; The Bad Way; The Better Way: The Pandas MultiIndex; MultiIndex as Extra Dimension; Methods of MultiIndex Creation; Explicit MultiIndex Constructors; MultiIndex Level Names; MultiIndex for Columns; Indexing and Slicing a MultiIndex; Multiply Indexed Series; Multiply Indexed DataFrames; Rearranging Multi-Indexes; Sorted and Unsorted Indices; Stacking and Unstacking Indices; Index Setting and Resetting
- 18. Combining Datasets: concat and append
  Recall: Concatenation of NumPy Arrays; Simple Concatenation with pd.concat; Duplicate Indices; Treating repeated indices as an error; Ignoring the index; Adding MultiIndex keys; Concatenation with Joins; The append Method
- 19. Combining Datasets: merge and join
  Relational Algebra; Categories of Joins; One-to-One Joins; Many-to-One Joins; Many-to-Many Joins; Specification of the Merge Key; The on Keyword; The left_on and right_on Keywords; The left_index and right_index Keywords; Specifying Set Arithmetic for Joins; Overlapping Column Names: The suffixes Keyword; Example: US States Data
- 20. Aggregation and Grouping
  Planets Data; Simple Aggregation in Pandas; groupby: Split, Apply, Combine; Split, Apply, Combine; The GroupBy Object; Column indexing; Iteration over groups; Dispatch methods; Aggregate, Filter, Transform, Apply; Aggregation; Filtering; Transformation; The apply method; Specifying the Split Key; A list, array, series, or index providing the grouping keys; A dictionary or series mapping index to group; Any Python function; A list of valid keys; Grouping Example
- 21. Pivot Tables
  Motivating Pivot Tables; Pivot Tables by Hand; Pivot Table Syntax; Multilevel Pivot Tables; Additional Pivot Table Options; Example: Birthrate Data
- 22. Vectorized String Operations
  Introducing Pandas String Operations; Tables of Pandas String Methods; Methods Similar to Python String Methods; Methods Using Regular Expressions; Miscellaneous Methods; Vectorized item access and slicing; Indicator variables; Example: Recipe Database; A Simple Recipe Recommender; Going Further with Recipes
- 23. Working with Time Series
  Dates and Times in Python; Native Python Dates and Times: datetime and dateutil; Typed Arrays of Times: NumPy’s datetime64; Dates and Times in Pandas: The Best of Both Worlds; Pandas Time Series: Indexing by Time; Pandas Time Series Data Structures; Regular Sequences: pd.date_range; Frequencies and Offsets; Resampling, Shifting, and Windowing; Resampling and Converting Frequencies; Time Shifts; Rolling Windows; Example: Visualizing Seattle Bicycle Counts; Visualizing the Data; Digging into the Data
- 24. High-Performance Pandas: eval and query
  Motivating query and eval: Compound Expressions; pandas.eval for Efficient Operations; DataFrame.eval for Column-Wise Operations; Assignment in DataFrame.eval; Local Variables in DataFrame.eval; The DataFrame.query Method; Performance: When to Use These Functions; Further Resources
IV. Visualization with Matplotlib
- 25. General Matplotlib Tips
  Importing Matplotlib; Setting Styles; show or No show? How to Display Your Plots; Plotting from a Script; Plotting from an IPython Shell; Plotting from a Jupyter Notebook; Saving Figures to File; Two Interfaces for the Price of One; MATLAB-style Interface; Object-oriented interface
- 26. Simple Line Plots
  Adjusting the Plot: Line Colors and Styles; Adjusting the Plot: Axes Limits; Labeling Plots; Matplotlib Gotchas
- 27. Simple Scatter Plots
  Scatter Plots with plt.plot; Scatter Plots with plt.scatter; plot Versus scatter: A Note on Efficiency; Visualizing Uncertainties; Basic Errorbars; Continuous Errors
- 28. Density and Contour Plots
  Visualizing a Three-Dimensional Function; Histograms, Binnings, and Density; Two-Dimensional Histograms and Binnings; plt.hist2d: Two-Dimensional Histogram; plt.hexbin: Hexagonal Binnings; Kernel Density Estimation
- 29. Customizing Plot Legends
  Choosing Elements for the Legend; Legend for Size of Points; Multiple Legends
- 30. Customizing Colorbars
  Customizing Colorbars; Choosing the Colormap; Color Limits and Extensions; Discrete Colorbars; Example: Handwritten Digits
- 31. Multiple Subplots
  plt.axes: Subplots by Hand; plt.subplot: Simple Grids of Subplots; plt.subplots: The Whole Grid in One Go; plt.GridSpec: More Complicated Arrangements
- 32. Text and Annotation
  Example: Effect of Holidays on US Births; Transforms and Text Position; Arrows and Annotation
- 33. Customizing Ticks
  Major and Minor Ticks; Hiding Ticks or Labels; Reducing or Increasing the Number of Ticks; Fancy Tick Formats; Summary of Formatters and Locators
- 34. Customizing Matplotlib: Configurations and Stylesheets
  Plot Customization by Hand; Changing the Defaults: rcParams; Stylesheets; Default Style; FiveThirtyEight Style; ggplot Style; Bayesian Methods for Hackers Style; Dark Background Style; Grayscale Style; Seaborn Style
- 35. Three-Dimensional Plotting in Matplotlib
  Three-Dimensional Points and Lines; Three-Dimensional Contour Plots; Wireframes and Surface Plots; Surface Triangulations; Example: Visualizing a Möbius Strip
- 36. Visualization with Seaborn
  Exploring Seaborn Plots; Histograms, KDE, and Densities; Pair Plots; Faceted Histograms; Categorical Plots; Joint Distributions; Bar Plots; Example: Exploring Marathon Finishing Times; Further Resources; Other Python Visualization Libraries
V. Machine Learning
- 37. What Is Machine Learning?
  Categories of Machine Learning; Qualitative Examples of Machine Learning Applications; Classification: Predicting Discrete Labels; Regression: Predicting Continuous Labels; Clustering: Inferring Labels on Unlabeled Data; Dimensionality Reduction: Inferring Structure of Unlabeled Data; Summary
- 38. Introducing Scikit-Learn
  Data Representation in Scikit-Learn; The Features Matrix; The Target Array; The Estimator API; Basics of the API; Supervised Learning Example: Simple Linear Regression; 1. Choose a class of model; 2. Choose model hyperparameters; 3. Arrange data into a features matrix and target vector; 4. Fit the model to the data; 5. Predict labels for unknown data; Supervised Learning Example: Iris Classification; Unsupervised Learning Example: Iris Dimensionality; Unsupervised Learning Example: Iris Clustering; Application: Exploring Handwritten Digits; Loading and Visualizing the Digits Data; Unsupervised Learning Example: Dimensionality Reduction; Classification on Digits; Summary
- 39. Hyperparameters and Model Validation
  Thinking About Model Validation; Model Validation the Wrong Way; Model Validation the Right Way: Holdout Sets; Model Validation via Cross-Validation; Selecting the Best Model; The Bias-Variance Trade-off; Validation Curves in Scikit-Learn; Learning Curves; Validation in Practice: Grid Search; Summary
- 40. Feature Engineering
  Categorical Features; Text Features; Image Features; Derived Features; Imputation of Missing Data; Feature Pipelines
- 41. In Depth: Naive Bayes Classification
  Bayesian Classification; Gaussian Naive Bayes; Multinomial Naive Bayes; Example: Classifying Text; When to Use Naive Bayes
- 42. In Depth: Linear Regression
  Simple Linear Regression; Basis Function Regression; Polynomial Basis Functions; Gaussian Basis Functions; Regularization; Ridge Regression (L2 Regularization); Lasso Regression (L1 Regularization); Example: Predicting Bicycle Traffic
- 43. In Depth: Support Vector Machines
  Motivating Support Vector Machines; Support Vector Machines: Maximizing the Margin; Fitting a Support Vector Machine; Beyond Linear Boundaries: Kernel SVM; Tuning the SVM: Softening Margins; Example: Face Recognition; Summary
- 44. In Depth: Decision Trees and Random Forests
  Motivating Random Forests: Decision Trees; Creating a Decision Tree; Decision Trees and Overfitting; Ensembles of Estimators: Random Forests; Random Forest Regression; Example: Random Forest for Classifying Digits; Summary
- 45. In Depth: Principal Component Analysis
  Introducing Principal Component Analysis; PCA as Dimensionality Reduction; PCA for Visualization: Handwritten Digits; What Do the Components Mean?; Choosing the Number of Components; PCA as Noise Filtering; Example: Eigenfaces; Summary
- 46. In Depth: Manifold Learning
  Manifold Learning: “HELLO”; Multidimensional Scaling; MDS as Manifold Learning; Nonlinear Embeddings: Where MDS Fails; Nonlinear Manifolds: Locally Linear Embedding; Some Thoughts on Manifold Methods; Example: Isomap on Faces; Example: Visualizing Structure in Digits
- 47. In Depth: k-Means Clustering
  Introducing k-Means; Expectation–Maximization; Examples; Example 1: k-Means on Digits; Example 2: k-Means for Color Compression
- 48. In Depth: Gaussian Mixture Models
  Motivating Gaussian Mixtures: Weaknesses of k-Means; Generalizing E–M: Gaussian Mixture Models; Choosing the Covariance Type; Gaussian Mixture Models as Density Estimation; Example: GMMs for Generating New Data
- 49. In Depth: Kernel Density Estimation
  Motivating Kernel Density Estimation: Histograms; Kernel Density Estimation in Practice; Selecting the Bandwidth via Cross-Validation; Example: Not-so-Naive Bayes; Anatomy of a Custom Estimator; Using Our Custom Estimator
- 50. Application: A Face Detection Pipeline
  HOG Features; HOG in Action: A Simple Face Detector; 1. Obtain a Set of Positive Training Samples; 2. Obtain a Set of Negative Training Samples; 3. Combine Sets and Extract HOG Features; 4. Train a Support Vector Machine; 5. Find Faces in a New Image; Caveats and Improvements; Further Machine Learning Resources
Index
How to download the source code:
1. Go to: https://www.oreilly.com/
2. Search for the book title: Python Data Science Handbook: Essential Tools for Working with Data, 2nd Edition. If the search returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.