Julia for Data Analysis
- Length: 246 pages
- Edition: 1
- Language: English
- Publisher: Manning
- Publication Date: 2023-01-10
- ISBN-10: 1633439364
- ISBN-13: 9781633439368
- Sales Rank: #1857150
Master core data analysis skills using Julia. Interesting hands-on projects guide you through time series data, predictive models, popularity ranking, and more.
Julia was designed for the unique needs of data scientists: it’s expressive and easy to use while also delivering very fast code execution.
Julia for Data Analysis teaches you how to perform core data science tasks with this amazing language. It’s written by Bogumil Kaminski, a top contributor to Julia, #1 Julia answerer on StackOverflow, and a lead developer of Julia’s core data package DataFrames.jl. You’ll learn how to write production-quality code in Julia, and utilize Julia’s core features for data gathering, visualization, and working with data frames. Plus, the engaging hands-on projects get you into the action quickly.
Table of contents

brief contents
contents
foreword
preface
acknowledgments
about this book
  Who should read this book
  How this book is organized: A roadmap
  About the code
  liveBook discussion forum
  Other online resources
about the author
about the cover illustration

1 Introduction
  1.1 What is Julia and why is it useful?
  1.2 Key features of Julia from a data scientist’s perspective
    1.2.1 Julia is fast because it is a compiled language
    1.2.2 Julia provides full support for interactive workflows
    1.2.3 Julia programs are highly reusable and easy to compose together
    1.2.4 Julia has a built-in state-of-the-art package manager
    1.2.5 It is easy to integrate existing code with Julia
  1.3 Usage scenarios of tools presented in the book
  1.4 Julia’s drawbacks
  1.5 What data analysis skills will you learn?
  1.6 How can Julia be used for data analysis?
  Summary

Part 1 Essential Julia skills

2 Getting started with Julia
  2.1 Representing values
  2.2 Defining variables
  2.3 Using the most important control-flow constructs
    2.3.1 Computations depending on a Boolean condition
    2.3.2 Loops
    2.3.3 Compound expressions
    2.3.4 A first approach to calculating the winsorized mean
  2.4 Defining functions
    2.4.1 Defining functions using the function keyword
    2.4.2 Positional and keyword arguments of functions
    2.4.3 Rules for passing arguments to functions
    2.4.4 Short syntax for defining simple functions
    2.4.5 Anonymous functions
    2.4.6 Do blocks
    2.4.7 Function-naming convention in Julia
    2.4.8 A simplified definition of a function computing the winsorized mean
  2.5 Understanding variable scoping rules
  Summary

3 Julia’s support for scaling projects
  3.1 Understanding Julia’s type system
    3.1.1 A single function in Julia may have multiple methods
    3.1.2 Types in Julia are arranged in a hierarchy
    3.1.3 Finding all supertypes of a type
    3.1.4 Finding all subtypes of a type
    3.1.5 Union of types
    3.1.6 Deciding what type restrictions to put in method signature
  3.2 Using multiple dispatch in Julia
    3.2.1 Rules for defining methods of a function
    3.2.2 Method ambiguity problem
    3.2.3 Improved implementation of winsorized mean
  3.3 Working with packages and modules
    3.3.1 What is a module in Julia?
    3.3.2 How can packages be used in Julia?
    3.3.3 Using StatsBase.jl to compute the winsorized mean
  3.4 Using macros
  Summary

4 Working with collections in Julia
  4.1 Working with arrays
    4.1.1 Getting the data into a matrix
    4.1.2 Computing basic statistics of the data stored in a matrix
    4.1.3 Indexing into arrays
    4.1.4 Performance considerations of copying vs. making a view
    4.1.5 Calculating correlations between variables
    4.1.6 Fitting a linear regression
    4.1.7 Plotting the Anscombe’s quartet data
  4.2 Mapping key-value pairs with dictionaries
  4.3 Structuring your data by using named tuples
    4.3.1 Defining named tuples and accessing their contents
    4.3.2 Analyzing Anscombe’s quartet data stored in a named tuple
    4.3.3 Understanding composite types and mutability of values in Julia
  Summary

5 Advanced topics on handling collections
  5.1 Vectorizing your code using broadcasting
    5.1.1 Understanding syntax and meaning of broadcasting in Julia
    5.1.2 Expanding length-1 dimensions in broadcasting
    5.1.3 Protecting collections from being broadcasted over
    5.1.4 Analyzing Anscombe’s quartet data using broadcasting
  5.2 Defining methods with parametric types
    5.2.1 Most collection types in Julia are parametric
    5.2.2 Rules for subtyping of parametric types
    5.2.3 Using subtyping rules to define the covariance function
  5.3 Integrating with Python
    5.3.1 Preparing data for dimensionality reduction using t-SNE
    5.3.2 Calling Python from Julia
    5.3.3 Visualizing the results of the t-SNE algorithm
  Summary

6 Working with strings
  6.1 Getting and inspecting the data
    6.1.1 Downloading files from the web
    6.1.2 Using common techniques of string construction
    6.1.3 Reading the contents of a file
  6.2 Splitting strings
  6.3 Using regular expressions to work with strings
    6.3.1 Working with regular expressions
    6.3.2 Writing a parser of a single line of movies.dat file
  6.4 Extracting a subset from a string with indexing
    6.4.1 UTF-8 encoding of strings in Julia
    6.4.2 Character vs. byte indexing of strings
    6.4.3 ASCII strings
    6.4.4 The Char type
  6.5 Analyzing genre frequency in movies.dat
    6.5.1 Finding common movie genres
    6.5.2 Understanding genre popularity evolution over the years
  6.6 Introducing symbols
    6.6.1 Creating symbols
    6.6.2 Using symbols
  6.7 Using fixed-width string types to improve performance
    6.7.1 Available fixed-width strings
    6.7.2 Performance of fixed-width strings
  6.8 Compressing vectors of strings with PooledArrays.jl
    6.8.1 Creating a file containing flower names
    6.8.2 Reading in the data to a vector and compressing it
    6.8.3 Understanding the internal design of PooledArray
  6.9 Choosing appropriate storage for collections of strings
  Summary

7 Handling time-series data and missing values
  7.1 Understanding the NBP Web API
    7.1.1 Getting the data via a web browser
    7.1.2 Getting the data by using Julia
    7.1.3 Handling cases when an NBP Web API query fails
  7.2 Working with missing data in Julia
    7.2.1 Definition of the missing value
    7.2.2 Working with missing values
  7.3 Getting time-series data from the NBP Web API
    7.3.1 Working with dates
    7.3.2 Fetching data from the NBP Web API for a range of dates
  7.4 Analyzing data fetched from the NBP Web API
    7.4.1 Computing summary statistics
    7.4.2 Finding which days of the week have the most missing values
    7.4.3 Plotting the PLN/USD exchange rate
  Summary

Part 2 Toolbox for data analysis

8 First steps with data frames
  8.1 Fetching, unpacking, and inspecting the data
    8.1.1 Downloading the file from the web
    8.1.2 Working with bzip2 archives
    8.1.3 Inspecting the CSV file
  8.2 Loading the data to a data frame
    8.2.1 Reading a CSV file into a data frame
    8.2.2 Inspecting the contents of a data frame
    8.2.3 Saving a data frame to a CSV file
  8.3 Getting a column out of a data frame
    8.3.1 Understanding the data frame’s storage model
    8.3.2 Treating a data frame column as a property
    8.3.3 Getting a column by using data frame indexing
    8.3.4 Visualizing data stored in columns of a data frame
  8.4 Reading and writing data frames using different formats
    8.4.1 Apache Arrow
    8.4.2 SQLite
  Summary

9 Getting data from a data frame
  9.1 Advanced data frame indexing
    9.1.1 Getting a reduced puzzles data frame
    9.1.2 Overview of allowed column selectors
    9.1.3 Overview of allowed row-subsetting values
    9.1.4 Making views of data frame objects
  9.2 Analyzing the relationship between puzzle difficulty and popularity
    9.2.1 Calculating mean puzzle popularity by its rating
    9.2.2 Fitting LOESS regression
  Summary

10 Creating data frame objects
  10.1 Reviewing the most important ways to create a data frame
    10.1.1 Creating a data frame from a matrix
    10.1.2 Creating a data frame from vectors
    10.1.3 Creating a data frame using a Tables.jl interface
    10.1.4 Plotting a correlation matrix of data stored in a data frame
  10.2 Creating data frames incrementally
    10.2.1 Vertically concatenating data frames
    10.2.2 Appending a table to a data frame
    10.2.3 Adding a new row to an existing data frame
    10.2.4 Storing simulation results in a data frame
  Summary

11 Converting and grouping data frames
  11.1 Converting a data frame to other value types
    11.1.1 Conversion to a matrix
    11.1.2 Conversion to a named tuple of vectors
    11.1.3 Other common conversions
  11.2 Grouping data frame objects
    11.2.1 Preparing the source data frame
    11.2.2 Grouping a data frame
    11.2.3 Getting group keys of a grouped data frame
    11.2.4 Indexing a grouped data frame with a single value
    11.2.5 Comparing performance of indexing methods
    11.2.6 Indexing a grouped data frame with multiple values
    11.2.7 Iterating a grouped data frame
  Summary

12 Mutating and transforming data frames
  12.1 Getting and loading the GitHub developers data set
    12.1.1 Understanding graphs
    12.1.2 Fetching GitHub developer data from the web
    12.1.3 Implementing a function that extracts data from a ZIP file
    12.1.4 Reading the GitHub developer data into a data frame
  12.2 Computing additional node features
    12.2.1 Creating a SimpleGraph object
    12.2.2 Computing features of nodes by using the Graphs.jl package
    12.2.3 Counting a node’s web and machine learning neighbors
  12.3 Using the split-apply-combine approach to predict the developer’s type
    12.3.1 Computing summary statistics of web and machine learning developer features
    12.3.2 Visualizing the relationship between the number of web and machine learning neighbors of a node
    12.3.3 Fitting a logistic regression model predicting developer type
  12.4 Reviewing data frame mutation operations
    12.4.1 Performing low-level API operations
    12.4.2 Using the insertcols! function to mutate a data frame
  Summary

13 Advanced transformations of data frames
  13.1 Getting and preprocessing the police stop data set
    13.1.1 Loading all required packages
    13.1.2 Introducing the @chain macro
    13.1.3 Getting the police stop data set
    13.1.4 Comparing functions that perform operations on columns
    13.1.5 Using short forms of operation specification syntax
  13.2 Investigating the violation column
    13.2.1 Finding the most frequent violations
    13.2.2 Vectorizing functions by using the ByRow wrapper
    13.2.3 Flattening data frames
    13.2.4 Using convenience syntax to get the number of rows of a data frame
    13.2.5 Sorting data frames
    13.2.6 Using advanced functionalities of DataFramesMeta.jl
  13.3 Preparing data for making predictions
    13.3.1 Performing initial transformation of the data
    13.3.2 Working with categorical data
    13.3.3 Joining data frames
    13.3.4 Reshaping data frames
    13.3.5 Dropping rows of a data frame that hold missing values
  13.4 Building a predictive model of arrest probability
    13.4.1 Splitting the data into train and test data sets
    13.4.2 Fitting a logistic regression model
    13.4.3 Evaluating the quality of a model’s predictions
  13.5 Reviewing functionalities provided by DataFrames.jl
  Summary

14 Creating web services for sharing data analysis results
  14.1 Pricing financial options by using a Monte Carlo simulation
    14.1.1 Calculating the payoff of an Asian option definition
    14.1.2 Computing the value of an Asian option
    14.1.3 Understanding GBM
    14.1.4 Using a numerical approach to computing the Asian option value
  14.2 Implementing the option pricing simulator
    14.2.1 Starting Julia with multiple-thread support
    14.2.2 Computing the option payoff for a single sample of stock prices
    14.2.3 Computing the option value
  14.3 Creating a web service serving the Asian option valuation
    14.3.1 A general approach to building a web service
    14.3.2 Creating a web service using Genie.jl
    14.3.3 Running the web service
  14.4 Using the Asian option pricing web service
    14.4.1 Sending a single request to the web service
    14.4.2 Collecting responses to multiple requests from a web service in a data frame
    14.4.3 Unnesting a column of a data frame
    14.4.4 Plotting the results of Asian option pricing
  Summary

appendix A First steps with Julia
  A.1 Installing and setting up Julia
  A.2 Getting help in and about Julia
  A.3 Managing packages in Julia
    A.3.1 Project environments
    A.3.2 Activating project environments
    A.3.3 Potential issues with installing packages
    A.3.4 Managing packages
    A.3.5 Setting up integration with Python
    A.3.6 Setting up integration with R
  A.4 Reviewing standard ways to work with Julia
    A.4.1 Using a terminal
    A.4.2 Using Visual Studio Code
    A.4.3 Using Jupyter Notebook
    A.4.4 Using Pluto notebooks

appendix B Solutions to exercises

appendix C Julia packages for data science
  C.1 Plotting ecosystems in Julia
  C.2 Scaling computing with Julia
  C.3 Working with databases and data storage formats
  C.4 Using data science methods
  Summary

index