Beginning Data Science in R 4: Data Analysis, Visualization, and Modelling for the Data Scientist, 2nd Edition
- Length: 539 pages
- Edition: 2
- Language: English
- Publisher: Apress
- Publication Date: 2022-07-08
- ISBN-10: 1484281543
- ISBN-13: 9781484281543
- Sales Rank: #8690131
Discover best practices for data analysis and software development in R and start on the path to becoming a fully fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you the best way to develop new software packages for R.
Beginning Data Science in R 4, Second Edition details how data science combines statistics, computational science, and machine learning. You’ll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This work requires computational methods and programming, and R is an ideal language for it.
Modern data analysis requires computational skills and usually at least some programming. After reading and using this book, you’ll have what you need to get started programming data science applications in R. The source code will also be available to support your own projects.
What You Will Learn
- Perform data science and analytics using statistics and the R programming language
- Visualize and explore data, including working with large data sets found in big data
- Build an R package
- Test and check your code
- Practice version control
- Profile and optimize your code
Who This Book Is For
Those with some data science or analytics background, but not necessarily experience with the R programming language.
Table of Contents
- About the Author
- About the Technical Reviewer
- Acknowledgments
- Introduction
  - What Is Data Science?; Prerequisites for Reading This Book; Plan for the Book; Data Analysis and Visualization; Software Development; Getting R and RStudio; Projects
- Chapter 1: Introduction to R Programming
  - Basic Interaction with R; Using R As a Calculator; Simple Expressions; Assignments; Indexing Vectors; Vectorized Expressions; Comments; Functions; Getting Documentation for Functions; Writing Your Own Functions; Summarizing and Vector Functions; A Quick Look at Control Flow; Factors; Data Frames; Using R Packages; Dealing with Missing Values; Data Pipelines; Writing Pipelines of Function Calls; Writing Functions That Work with Pipelines; The Magical “.” Argument; Other Pipeline Operations; Coding and Naming Conventions; Exercises; Mean of Positive Values; Root Mean Square Error
- Chapter 2: Reproducible Analysis
  - Literate Programming and Integration of Workflow and Documentation; Creating an R Markdown/knitr Document in RStudio; The YAML Language; The Markdown Language; Formatting Text; Cross-Referencing; Bibliographies; Controlling the Output (Templates/Stylesheets); Running R Code in Markdown Documents; Using chunks when analyzing data (without compiling documents); Caching Results; Displaying Data; Exercises; Create an R Markdown Document; Different Output; Caching
- Chapter 3: Data Manipulation
  - Data Already in R; Quickly Reviewing Data; Reading Data; Examples of Reading and Formatting Data Sets; Breast Cancer Data set; Boston Housing Data Set; The readr Package; Manipulating Data with dplyr; Some Useful dplyr Functions; Breast Cancer Data Manipulation; Tidying Data with tidyr; Exercises; Importing Data; Using dplyr; Using tidyr
- Chapter 4: Visualizing Data
  - Basic Graphics; The Grammar of Graphics and the ggplot2 Package; Using qplot(); Using Geometries; Facets; Scaling; Themes and Other Graphics Transformations; Figures with Multiple Plots; Exercises
- Chapter 5: Working with Large Data Sets
  - Subsample Your Data Before You Analyze the Full Data Set; Running Out of Memory During an Analysis; Too Large to Plot; Too Slow to Analyze; Too Large to Load; Exercises; Subsampling; Hex and 2D Density Plots
- Chapter 6: Supervised Learning
  - Machine Learning; Supervised Learning; Regression vs. Classification; Inference vs. Prediction; Specifying Models; Linear Regression; Logistic Regression (Classification, Really); Model Matrices and Formula; Validating Models; Evaluating Regression Models; Evaluating Classification Models; Confusion Matrix; Accuracy; Sensitivity and Specificity; Other Measures; More Than Two Classes; Sampling Approaches; Random Permutations of Your Data; Cross-Validation; Selecting Random Training and Testing Data; Examples of Supervised Learning Packages; Decision Trees; Random Forests; Neural Networks; Support Vector Machines; Naive Bayes; Exercises; Fitting Polynomials; Evaluating Different Classification Measures; Breast Cancer Classification; Leave-One-Out Cross-Validation (Slightly More Difficult); Decision Trees; Random Forests; Neural Networks; Support Vector Machines; Compare Classification Algorithms
- Chapter 7: Unsupervised Learning
  - Dimensionality Reduction; Principal Component Analysis; Multidimensional Scaling; Clustering; k-means Clustering; Hierarchical Clustering; Association Rules; Exercises; Dealing with Missing Data in the HouseVotes84 Data; k-means
- Chapter 8: Project 1: Hitting the Bottle
  - Importing Data; Exploring the Data; Distribution of Quality Scores; Is This Wine Red or White?; Fitting Models; Exercises; Exploring Other Formulas; Exploring Different Models; Analyzing Your Own Data Set
- Chapter 9: Deeper into R Programming
  - Expressions; Arithmetic Expressions; Boolean Expressions; Basic Data Types; Numeric; Integer; Complex; Logical; Character; Data Structures; Vectors; Matrix; Lists; Indexing; Named Values; Factors; Formulas; Control Structures; Selection Statements; Loops; Functions; Named Arguments; Default Parameters; Return Values; Lazy Evaluation; Scoping; Function Names Are Different from Variable Names; Recursive Functions; Exercises; Fibonacci Numbers; Outer Product; Linear Time Merge; Binary Search; More Sorting; Selecting the k Smallest Element
- Chapter 10: Working with Vectors and Lists
  - Working with Vectors and Vectorizing Functions; ifelse; Vectorizing Functions; The apply Family; apply; Nothing Good, It Would Seem; lapply; sapply and vapply; Advanced Functions; Special Names; Infix Operators; Replacement Functions; How Mutable Is Data Anyway?; Exercises; between; rmq
- Chapter 11: Functional Programming
  - Anonymous Functions; Higher-Order Functions; Functions Taking Functions As Arguments; Functions Returning Functions (and Closures); Filter, Map, and Reduce; Functional Programming with purrr; Functions As Both Input and Output; Ellipsis Parameters…; Exercises; apply_if; power; Row and Column Sums; Factorial Again…; Function Composition; Implement This Operator
- Chapter 12: Object-Oriented Programming
  - Immutable Objects and Polymorphic Functions; Data Structures; Example: Bayesian Linear Model Fitting; Classes; Polymorphic Functions; Defining Your Own Polymorphic Functions; Class Hierarchies; Specialization As Interface; Specialization in Implementations; Exercises; Shapes; Polynomials
- Chapter 13: Building an R Package
  - Creating an R Package; Package Names; The Structure of an R Package; .Rbuildignore; Description; Title; Version; Description; Author and Maintainer; License; Type, Date, LazyData; URL and BugReports; Dependencies; Using an Imported Package; Using a Suggested Package; NAMESPACE; R/ and man/; Checking the Package; Roxygen; Documenting Functions; Import and Export; Package Scope vs. Global Scope; Internal Functions; File Load Order; Adding Data to Your Package; NULL; Building an R Package; Exercises
- Chapter 14: Testing and Package Checking
  - Unit Testing; Automating Testing; Using testthat; Writing Good Tests; Using Random Numbers in Tests; Testing Random Results; Checking a Package for Consistency; Exercise
- Chapter 15: Version Control
  - Version Control and Repositories; Using Git in RStudio; Installing Git; Making Changes to Files, Staging Files, and Committing Changes; Adding Git to an Existing Project; Bare Repositories and Cloning Repositories; Pushing Local Changes and Fetching and Pulling Remote Changes; Handling Conflicts; Working with Branches; Typical Workflows Involve Lots of Branches; Pushing Branches to the Global Repository; GitHub; Moving an Existing Repository to GitHub; Installing Packages from GitHub; Collaborating on GitHub; Pull Requests; Forking Repositories Instead of Cloning; Exercises
- Chapter 16: Profiling and Optimizing
  - Profiling; A Graph-Flow Algorithm; Speeding Up Your Code; Parallel Execution; Switching to C++; Exercises
- Chapter 17: Project 2: Bayesian Linear Regression
  - Bayesian Linear Regression; Exercises: Priors and Posteriors; Sample from a Multivariate Normal Distribution; Computing the Posterior Distribution; Predicting Target Variables for New Predictor Values; Formulas and Their Model Matrix; Working with Model Matrices in R; Exercises; Building Model Matrices; Fitting General Models; Model Matrices Without Response Variables; Exercises; Model Matrices for New Data; Predicting New Targets; Interface to a blm Class; Constructor; Updating Distributions: An Example Interface; Designing Your blm Class; Model Methods; coefficients; confint; deviance; fitted; plot; predict; print; residuals; summary; Building an R Package for blm; Deciding on the Package Interface; Organization of Source Files; Document Your Package Interface Well; Adding README and NEWS Files to Your Package; README; NEWS; Testing; GitHub
- Conclusions
  - Data Science; Machine Learning; Data Analysis; R Programming; The End
- Index
How to download the source code
1. Go to: https://github.com/Apress
2. In the Find a repository… box, search for the book title: Beginning Data Science in R 4: Data Analysis, Visualization, and Modelling for the Data Scientist, 2nd Edition. If the search returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.