Dealing With Data Pocket Primer
- Length: 250 pages
- Edition: 1
- Language: English
- Publisher: Mercury Learning and Information
- Publication Date: 2022-05-31
- ISBN-10: 1683928202
- ISBN-13: 9781683928201
As part of the best-selling Pocket Primer series, this book is designed to introduce the reader to the basic concepts of managing data using a variety of computer languages and applications. It is intended to be a fast-paced introduction to some basic features of data management and covers statistical concepts, data-related techniques, features of Pandas, RDBMS, SQL, NLP topics, Matplotlib, and data visualization. Companion files with source code and color figures are available.

FEATURES:
- Covers Pandas, RDBMS, NLP, data cleaning, SQL, and data visualization
- Introduces probability and statistical concepts
- Features numerous code samples throughout
- Includes companion files with source code and figures
Introduction to Probability and Statistics
What Is a Probability?; Calculating the Expected Value; Random Variables; Discrete versus Continuous Random Variables; Well-Known Probability Distributions; Fundamental Concepts in Statistics; The Mean; The Median; The Mode; The Variance and Standard Deviation; Population, Sample, and Population Variance; Chebyshev's Inequality; What Is a P-Value?; The Moments of a Function (Optional); What Is Skewness?; What Is Kurtosis?; Data and Statistics; The Central Limit Theorem; Correlation versus Causation; Statistical Inferences; Statistical Terms RSS, TSS, R^2, and F1 Score; What Is an F1 Score?; Gini Impurity, Entropy, and Perplexity; What Is Gini Impurity?; What Is Entropy?; Calculating Gini Impurity and Entropy Values; Multidimensional Gini Index; What Is Perplexity?; Cross-Entropy and KL Divergence; What Is Cross-Entropy?; What Is KL Divergence?; What Is Their Purpose?; Covariance and Correlation Matrices; The Covariance Matrix; Covariance Matrix: An Example; The Correlation Matrix; Eigenvalues and Eigenvectors; Calculating Eigenvectors: A Simple Example; Gauss Jordan Elimination (Optional); PCA (Principal Component Analysis); The New Matrix of Eigenvectors; Well-Known Distance Metrics; Pearson Correlation Coefficient; Jaccard Index (or Similarity); Local Sensitivity Hashing (Optional); Types of Distance Metrics; What Is Bayesian Inference?; Bayes' Theorem; Some Bayesian Terminology; What Is MAP?; Why Use Bayes' Theorem?; Summary

Working with Data
Dealing With Data: What Can Go Wrong?; What Is Data Drift?; What Are Datasets?; Data Preprocessing; Data Types; Preparing Datasets; Discrete Data versus Continuous Data; "Binning" Continuous Data; Scaling Numeric Data via Normalization; Scaling Numeric Data via Standardization; Scaling Numeric Data via Robust Standardization; What to Look for in Categorical Data; Mapping Categorical Data to Numeric Values; Working With Dates; Working With Currency; Working With Outliers and Anomalies; Outlier Detection/Removal; Finding Outliers With Numpy; Finding Outliers With Pandas; Calculating Z-Scores to Find Outliers; Finding Outliers with SkLearn (Optional); Working With Missing Data; Imputing Values: When Is Zero a Valid Value?; Dealing With Imbalanced Datasets; What Is SMOTE?; SMOTE Extensions; The Bias-Variance Tradeoff; Types of Bias in Data; Analyzing Classifiers (Optional); What Is LIME?; What Is ANOVA?; Summary

Introduction to Pandas
What Is Pandas?; Pandas DataFrames; Pandas Operations: In-place or Not?; Data Frames and Data Cleaning Tasks; A Pandas DataFrame Example; Describing a Pandas Data Frame; Pandas Boolean Data Frames; Transposing a Pandas Data Frame; Pandas Data Frames and Random Numbers; Converting Categorical Data to Numeric Data; Merging and Splitting Columns in Pandas; Combining Pandas DataFrames; Data Manipulation With Pandas DataFrames; Pandas DataFrames and CSV Files; Useful Options for the Pandas read_csv() Function; Reading Selected Rows From CSV Files; Pandas DataFrames and Excel Spreadsheets; Useful Options for Reading Excel Spreadsheets; Select, Add, and Delete Columns in Data frames; Handling Outliers in Pandas; Pandas DataFrames and Simple Statistics; Finding Duplicate Rows in Pandas; Finding Missing Values in Pandas; Missing Values in Iris-Based Dataset; Sorting Data Frames in Pandas; Working With groupby() in Pandas; Aggregate Operations With the titanic.csv Dataset; Working With apply() and mapapply() in Pandas; Useful One-Line Commands in Pandas; Working With JSON-Based Data; Python Dictionary and JSON; Python, Pandas, and JSON; Summary

Introduction to RDBMS and SQL
What Is an RDBMS?; What Relationships Do Tables Have in an RDBMS?; Features of an RDBMS; What Is ACID?; When Do We Need an RDBMS?; The Importance of Normalization; A Four-Table RDBMS; Detailed Table Descriptions; The customers Table; The purchase_orders Table; The line_items Table; The item_desc Table; What Is SQL?; DCL, DDL, DQL, DML, and TCL; SQL Privileges; Properties of SQL Statements; The CREATE Keyword; What Is MySQL?; What About MariaDB?; Installing MySQL; Data Types in MySQL; The CHAR and VARCHAR Data Types; String-Based Data Types; FLOAT and DOUBLE Data Types; BLOB and TEXT Data Types; MySQL Database Operations; Creating a Database; Display a List of Databases; Display a List of Database Users; Dropping a Database; Exporting a Database; Renaming a Database; The INFORMATION_SCHEMA Table; The PROCESSLIST Table; SQL Formatting Tools; Summary

Working with SQL and MySQL
Create Database Tables; Manually Creating Tables for mytools.com; Creating Tables via an SQL Script for mytools.com; Creating Tables With Japanese Text; Creating Tables From the Command Line; Drop Database Tables; Dropping Tables via a SQL Script for mytools.com; Altering Database Tables With the ALTER Keyword; Add a Column to a Database Table; Drop a Column From a Database Table; Change the Data Type of a Column; What Are Referential Constraints?; Combining Data for a Table Update (Optional); Merging Data for a Table Update; Appending Data to a Table From a CSV File; Appending Table Data from CSV Files via SQL; Inserting Data Into Tables; Populating Tables From Text Files; Working With Simple SELECT Statements; Duplicate versus Distinct Rows; Unique Rows; The EXISTS Keyword; The LIMIT Keyword; DELETE, TRUNCATE, and DROP in SQL; More Options for the DELETE Statement in SQL; Creating Tables From Existing Tables in SQL; Working With Temporary Tables in SQL; Creating Copies of Existing Tables in SQL; What Is an SQL Index?; Types of Indexes; Creating an Index; Disabling and Enabling an Index; View and Drop Indexes; Overhead of Indexes; Considerations for Defining Indexes; Selecting Columns for an Index; Finding Columns Included in Indexes; Export Data From MySQL; Export the Result Set of a SQL Query; Export a Database or Its Contents; Using LOAD DATA in MySQL; Data Cleaning in SQL; Replace NULL With; Replace NULL Values With Average Value; Replace Multiple Values With a Single Value; Handle Mismatched Attribute Values; Convert Strings to Date Values; Data Cleaning From the Command Line (Optional); Working With the sed Utility; Working With the awk Utility; Summary

NLP and Data Cleaning
NLP Tasks in ML; NLP Steps for Training a Model; Text Normalization and Tokenization; Word Tokenization in Japanese; Text Tokenization With Unix Commands; Handling Stop Words; What Is Stemming?; Singular versus Plural Word Endings; Common Stemmers; Stemmers and Word Prefixes; Over Stemming and Under Stemming; What Is Lemmatization?; Stemming/Lemmatization Caveats; Limitations of Stemming and Lemmatization; Working With Text: POS; POS Tagging; POS Tagging Techniques; Cleaning Data With Regular Expressions; Cleaning Data With the cleantext Library; Handling Contracted Words; What Is BeautifulSoup?; Web Scraping With Pure Regular Expressions; What Is Scrapy?; Summary

Data Visualization
What Is Data Visualization?; Types of Data Visualization; What Is Matplotlib?; Lines in a Grid in Matplotlib; A Colored Grid in Matplotlib; Randomized Data Points in Matplotlib; A Histogram in Matplotlib; A Set of Line Segments in Matplotlib; Plotting Multiple Lines in Matplotlib; Trigonometric Functions in Matplotlib; Display IQ Scores in Matplotlib; Plot a Best-Fitting Line in Matplotlib; The Iris Dataset in Sklearn; Sklearn, Pandas, and the Iris Dataset; Working With Seaborn; Features of Seaborn; Seaborn Built-In Datasets; The Iris Dataset in Seaborn; The Titanic Dataset in Seaborn; Extracting Data From the Titanic Dataset in Seaborn (1); Extracting Data from Titanic Dataset in Seaborn (2); Visualizing a Pandas Dataset in Seaborn; Data Visualization in Pandas; What Is Bokeh?; Summary