Data Wrangling Using Pandas, SQL, and Java
- Length: 300 pages
- Edition: 1
- Language: English
- Publisher: Mercury Learning and Information
- Publication Date: 2022-10-30
- ISBN-10: 1683929047
- ISBN-13: 9781683929048
This book is intended primarily for those who plan to become data scientists, as well as anyone who needs to perform data cleaning tasks. It covers a variety of NumPy and Pandas features and shows how to create databases and tables in MySQL. Chapter 7 covers many data wrangling tasks using Python scripts and awk-based shell scripts. Companion files with code are available for download from the publisher.
Features:
- Provides the reader with basic Python 3, Java, and Pandas programming concepts, and an introduction to awk
- Includes a chapter on RDBMS and SQL
- Companion files with code
Table of Contents:
Cover, Title Page, Copyright, Dedication, Contents, Preface

Chapter 1: Introduction to Python
Tools for Python easy_install and pip virtualenv IPython Python Installation Setting the PATH Environment Variable (Windows Only) Launching Python on Your Machine The Python Interactive Interpreter Python Identifiers Lines, Indentation, and Multi-Lines Quotation and Comments Saving Your Code in a Module Some Standard Modules The help() and dir() Functions Compile Time and Runtime Code Checking Simple Data Types Working with Numbers Working with Other Bases The chr() Function The round() Function in Python Formatting Numbers in Python Working with Fractions Unicode and UTF-8 Working with Unicode Working with Strings Comparing Strings Formatting Strings in Python Uninitialized Variables and the Value None Slicing and Splicing Strings Testing for Digits and Alphabetic Characters Search and Replace a String in Other Strings Remove Leading and Trailing Characters Printing Text Without NewLine Characters Text Alignment Working with Dates Converting Strings to Dates Exception Handling Handling User Input Command-Line Arguments Summary

Chapter 2: Working with Data
Dealing with Data: What Can Go Wrong? What is Data Drift? What are Datasets? Data Preprocessing Data Types Preparing Datasets Discrete Data vs. Continuous Data “Binning” Continuous Data Scaling Numeric Data via Normalization Scaling Numeric Data via Standardization Scaling Numeric Data via Robust Standardization What to Look for in Categorical Data Mapping Categorical Data to Numeric Values Working with Dates Working with Currency Working with Outliers and Anomalies Outlier Detection/Removal Finding Outliers with NumPy Finding Outliers with Pandas Calculating Z-Scores to Find Outliers Finding Outliers with SkLearn (Optional) Working with Missing Data Imputing Values: When is Zero a Valid Value? Dealing with Imbalanced Datasets What is SMOTE? SMOTE Extensions The Bias-Variance Tradeoff Types of Bias in Data Analyzing Classifiers (Optional) What is LIME? What is ANOVA? Summary

Chapter 3: Introduction to Pandas
What is Pandas? Pandas Data Frames Data Frames and Data Cleaning Tasks A Pandas Data Frame Example Describing a Pandas Data Frame Pandas Boolean Data Frames Transposing a Pandas Data Frame Pandas Data Frames and Random Numbers Converting Categorical Data to Numeric Data Merging and Splitting Columns in Pandas Combining Pandas Data Frames Data Manipulation with Pandas Data Frames Pandas Data Frames and CSV Files Useful Options for the Pandas read_csv() Function Reading Selected Rows from CSV Files Pandas Data Frames and Excel Spreadsheets Useful Options for Reading Excel Spreadsheets Select, Add, and Delete Columns in Data Frames Handling Outliers in Pandas Pandas Data Frames and Simple Statistics Finding Duplicate Rows in Pandas Finding Missing Values in Pandas Missing Values in an Iris-Based Dataset Sorting Data Frames in Pandas Working with groupby() in Pandas Aggregate Operations with the titanic.csv Dataset Working with apply() and mapapply() in Pandas Useful One-line Commands in Pandas Working with JSON-based Data Python Dictionary and JSON Python, Pandas, and JSON Summary

Chapter 4: RDBMS and SQL
What is an RDBMS? What Relationships Do Tables Have in an RDBMS? Features of an RDBMS What is ACID? When Do We Need an RDBMS? The Importance of Normalization A Four-Table RDBMS Detailed Table Descriptions The customers Table The purchase_orders Table The line_items Table The item_desc Table What is SQL? DCL, DDL, DQL, DML, and TCL SQL Privileges Properties of SQL Statements The CREATE Keyword What is MySQL? What about MariaDB? Installing MySQL Data Types in MySQL The CHAR and VARCHAR Data Types String-based Data Types FLOAT and DOUBLE Data Types BLOB and TEXT Data Types MySQL Database Operations Creating a Database Display a List of Databases Display a List of Database Users Dropping a Database Exporting a Database Renaming a Database The INFORMATION_SCHEMA Table The PROCESSLIST Table SQL Formatting Tools Summary

Chapter 5: Java, JSON, and XML
Working with Java and MySQL Performing the Set-up Steps Creating a MySQL Database in Java Creating a MySQL Table in Java Inserting Data into a MySQL Table in Java Deleting Data and Dropping MySQL Tables in Java Selecting Data from a MySQL Table in Java Updating Data in a MySQL Table in Java Working with JSON, MySQL, and Java Select JSON-based Data from a MySQL Table in Java Working with XML, MySQL, and Java What is XML? What is an XML Schema? When are XML Schemas Useful? Create a MySQL Table for XML Data in Java Read an XML Document in Java Read an XML Document as a String in Java Insert XML-based Data into a MySQL Table in Java Select XML-based Data from a MySQL Table in Java Parse XML-based String Data from a MySQL Table in Java Working with XML Schemas Summary

Chapter 6: Data Cleaning Tasks
What is Data Cleaning? Data Cleaning for Personal Titles Data Cleaning in SQL Replace NULL with 0 Replace NULL Values with Average Value Replace Multiple Values with a Single Value Handle Mismatched Attribute Values Convert Strings to Date Values Data Cleaning from the Command Line (Optional) Working with the sed Utility Working with Variable Column Counts Truncating Rows in CSV Files Generating Rows with Fixed Columns with the awk Utility Converting Phone Numbers Converting Numeric Date Formats Converting Alphabetic Date Formats Working with Date and Time Date Formats Working with Codes, Countries, and Cities Data Cleaning on a Kaggle Dataset Summary

Chapter 7: Data Wrangling
What is Data Wrangling? Data Transformation: What Does This Mean? CSV Files with Multi-Row Records Pandas Solution (1) Pandas Solution (2) CSV Solution CSV Files, Multi-row Records, and the awk Command Quoted Fields Split on Two Lines (Optional) Overview of the Events Project Why This Project? Project Tasks Generate Country Codes Prepare a List of Cities in Countries Generating City Codes from Country Codes: awk Generating City Codes from Country Codes: Python Generating SQL Statements for the city_codes Table Generating a CSV File for Band Members (Java) Generating a CSV File for Band Members (Python) Generating a Calendar of Events (COE) Project Automation Script Project Follow-up Comments Summary

Appendix A: Working with awk
The awk Command Built-in Variables That Control awk How Does the awk Command Work? Aligning Text with the printf() Statement Conditional Logic and Control Statements The while Statement A for Loop in awk A for Loop with a break Statement The next and continue Statements Deleting Alternate Lines in Datasets Merging Lines in Datasets Printing File Contents as a Single Line Joining Groups of Lines in a Text File Joining Alternate Lines in a Text File Matching with Meta Characters and Character Sets Printing Lines Using Conditional Logic Splitting Filenames with awk Working with Postfix Arithmetic Operators Numeric Functions in awk One-line awk Commands Useful Short awk Scripts Printing the Words in a Text String in awk Count Occurrences of a String in Specific Rows Printing a String in a Fixed Number of Columns Printing a Dataset in a Fixed Number of Columns Aligning Columns in Datasets Aligning Columns and Multiple Rows in Datasets Removing a Column from a Text File Subsets of Column-aligned Rows in Datasets Counting Word Frequency in Datasets Displaying Only “Pure” Words in a Dataset Working with Multi-line Records in awk A Simple Use Case Another Use Case Summary

Index