Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale
- Length: 519 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2019-12-03
- ISBN-10: 1492044466
- ISBN-13: 9781492044468
- Sales Rank: #62042 (See Top 100 Books)
Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll examine how to analyze data at scale to derive insights from large datasets efficiently.
Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.
Copyright Table of Contents Preface Who Is This Book For? Conventions Used in This Book Using Code Examples O’Reilly Online Learning How to Contact Us Acknowledgments Chapter 1. What Is Google BigQuery? Data Processing Architectures Relational Database Management System MapReduce Framework BigQuery: A Serverless, Distributed SQL Engine Working with BigQuery Deriving Insights Across Datasets ETL, EL, and ELT Powerful Analytics Simplicity of Management How BigQuery Came About What Makes BigQuery Possible? Separation of Compute and Storage Storage and Networking Infrastructure Managed Storage Integration with Google Cloud Platform Security and Compliance Summary Chapter 2. Query Essentials Simple Queries Retrieving Rows by Using SELECT Aliasing Column Names with AS Filtering with WHERE SELECT *, EXCEPT, REPLACE Subqueries with WITH Sorting with ORDER BY Aggregates Computing Aggregates by Using GROUP BY Counting Records by Using COUNT Filtering Grouped Items by Using HAVING Finding Unique Values by Using DISTINCT A Brief Primer on Arrays and Structs Creating Arrays by Using ARRAY_AGG Array of STRUCT TUPLE Working with Arrays UNNEST an Array Joining Tables The JOIN Explained INNER JOIN CROSS JOIN OUTER JOIN Saving and Sharing Query History and Caching Saved Queries Views Versus Shared Queries Summary Chapter 3. Data Types, Functions, and Operators Numeric Types and Functions Mathematical Functions Standard-Compliant Floating-Point Division SAFE Functions Comparisons Precise Decimal Calculations with NUMERIC Working with BOOL Logical Operations Conditional Expressions Cleaner NULL-Handling with COALESCE Casting and Coercion Using COUNTIF to Avoid Casting Booleans String Functions Internationalization Printing and Parsing String Manipulation Functions Transformation Functions Regular Expressions Summary of String Functions Working with TIMESTAMP Parsing and Formatting Timestamps Extracting Calendar Parts Arithmetic with Timestamps Date, Time, and DateTime Working with GIS Functions Summary Chapter 4. Loading Data into BigQuery The Basics Loading from a Local Source Specifying a Schema Copying into a New Table Data Management (DDL and DML) Loading Data Efficiently Federated Queries and External Data Sources How to Use Federated Queries When to Use Federated Queries and External Data Sources Interactive Exploration and Querying of Data in Google Sheets SQL Queries on Data in Cloud Bigtable Transfers and Exports Data Transfer Service Exporting Stackdriver Logs Using Cloud Dataflow to Read/Write from BigQuery Moving On-Premises Data Data Migration Methods Summary Chapter 5. Developing with BigQuery Developing Programmatically Accessing BigQuery via the REST API Google Cloud Client Library Accessing BigQuery from Data Science Tools Notebooks on Google Cloud Platform Working with BigQuery, pandas, and Jupyter Working with BigQuery from R Cloud Dataflow JDBC/ODBC drivers Incorporating BigQuery Data into Google Slides (in G Suite) Bash Scripting with BigQuery Creating Datasets and Tables Executing Queries BigQuery Objects Summary Chapter 6. Architecture of BigQuery High-Level Architecture Life of a Query Request BigQuery Upgrades Query Engine (Dremel) Dremel Architecture Query Execution Storage Storage Data Metadata Summary Chapter 7. Optimizing Performance and Cost Principles of Performance Key Drivers of Performance Controlling Cost Measuring and Troubleshooting Measuring Query Speed Using REST API Measuring Query Speed Using BigQuery Workload Tester Troubleshooting Workloads Using Stackdriver Reading Query Plan Information Increasing Query Speed Minimizing I/O Caching the Results of Previous Queries Performing Efficient Joins Avoiding Overwhelming a Worker Using Approximate Aggregation Functions Optimizing How Data Is Stored and Accessed Minimizing Network Overhead Choosing an Efficient Storage Format Partitioning Tables to Reduce Scan Size Clustering Tables Based on High-Cardinality Keys Time-Insensitive Use Cases Batch Queries File Loads Summary Checklist Chapter 8. Advanced Queries Reusable Queries Parameterized Queries SQL User-Defined Functions Reusing Parts of Queries Advanced SQL Working with Arrays Window Functions Table Metadata Data Definition Language and Data Manipulation Language Beyond SQL JavaScript UDFs Scripting Advanced Functions BigQuery Geographic Information Systems Useful Statistical Functions Hash Algorithms Summary Chapter 9. Machine Learning in BigQuery What Is Machine Learning? Formulating a Machine Learning Problem Types of Machine Learning Problems Building a Regression Model Choose the Label Exploring the Dataset to Find Features Creating a Training Dataset Training and Evaluating the Model Predicting with the Model Examining Model Weights More-Complex Regression Models Building a Classification Model Training Evaluation Prediction Choosing the Threshold Customizing BigQuery ML Controlling Data Split Balancing Classes Regularization k-Means Clustering What’s Being Clustered? Clustering Bicycle Stations Carrying Out Clustering Understanding the Clusters Data-Driven Decisions Recommender Systems The MovieLens Dataset Matrix Factorization Making Recommendations Incorporating User and Movie Information Custom Machine Learning Models on GCP Hyperparameter Tuning AutoML Support for TensorFlow Summary Chapter 10. Administering and Securing BigQuery Infrastructure Security Identity and Access Management Identity Role Resource Administering BigQuery Job Management Authorizing Users Restoring Deleted Records and Tables Continuous Integration/Continuous Deployment Cost/Billing Exports Dashboards, Monitoring, and Audit Logging Availability, Disaster Recovery, and Encryption Zones, Regions, and Multiregions BigQuery and Failure Handling Durability, Backups, and Disaster Recovery Privacy and Encryption Regulatory Compliance Data Locality Restricting Access to Subsets of Data Removing All Transactions Related to a Single Individual Data Loss Prevention CMEK Data Exfiltration Protection Summary Index About the Authors Colophon
Donate to keep this site alive
How to download source code?
1. Go to: https://www.oreilly.com/
2. Search the book title: Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale
, sometime you may not get the results, please search the main title
3. Click the book title in the search results
3. Publisher resources
section, click Download Example Code
.
1. Disable the AdBlock plugin. Otherwise, you may not get any links.
2. Solve the CAPTCHA.
3. Click download link.
4. Lead to download server to download.