Snowflake: The Definitive Guide: Architecting, Designing, and Deploying on the Snowflake Data Cloud
- Length: 430 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2022-09-20
- ISBN-10: 1098103823
- ISBN-13: 9781098103828
- Sales Rank: #469521
Snowflake’s ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you’re an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you.
You’ll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you’ll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily.
You’ll be able to:
- Efficiently capture, store, and process large amounts of data at high speed
- Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes
- Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs
- Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace
Preface: Origin of the Book Who Is This Book For? Goals of the Book Navigating this Book Using Code Examples Conventions Used in This Book O’Reilly Online Learning How to Contact Us Acknowledgments
1. Getting Started: Snowflake Web User Interfaces Prep Work Snowsight Orientation Snowsight Preferences Navigating Snowsight Worksheets Context Setting Improved Productivity Using contextual suggestions from Smart Autocomplete Formatting SQL Using shortcuts Accessing version history Snowflake Community Snowflake Certifications Snowday and Snowflake Summit Events Important Caveats About Code Examples in the Book Code Cleanup Summary Knowledge Check
2. Creating and Managing the Snowflake Architecture: Prep Work Traditional Data Platform Architectures Shared-Disk (Scalable) Architecture Shared-Nothing (Scalable) Architecture NoSQL Alternatives The Snowflake Architecture The Cloud Services Layer Managing the Cloud Services Layer Billing for the Cloud Services Layer The Query Processing (Virtual Warehouse) Compute Layer Virtual Warehouse Size Scaling Up a Virtual Warehouse to Process Large Data Volumes and Complex Queries Scaling Out with Multicluster Virtual Warehouses to Maximize Concurrency Creating and Using Virtual Warehouses Separation of Workloads and Workload Management Billing for the Virtual Warehouse Layer Centralized (Hybrid Columnar) Database Storage Layer Introduction to Zero-Copy Cloning Introduction to Time Travel Billing for the Storage Layer Snowflake Caching Query Result Cache Metadata Cache Virtual Warehouse Local Disk Cache Code Cleanup Summary Knowledge Check
3. Creating and Managing Snowflake Securable Database Objects: Prep Work Creating and Managing Snowflake Databases Creating and Managing Snowflake Schemas INFORMATION_SCHEMA ACCOUNT_USAGE Schema Schema Object Hierarchy Introduction to Snowflake Tables Creating and Managing Views Introduction to Snowflake Stages: File Format Included Extending SQL with Stored Procedures and UDFs User-Defined Function (UDF): Task Included Secure SQL UDTF That Returns Tabular Value (Market Basket Analysis Example) Stored Procedures Introduction to Pipes, Streams, and Sequences Snowflake Streams (Deep Dive) Snowflake Tasks (Deep Dive) Code Cleanup Summary Knowledge Check
4. Exploring Snowflake SQL Commands, Data Types, and Functions: Prep Work Working with SQL Commands in Snowflake DDL Commands DCL Commands DML Commands TCL Commands DQL Command SQL Query Development, Syntax, and Operators in Snowflake SQL Development and Management Query Syntax Subqueries, derived columns, and CTEs Caution about multirow inserts Query Operators Long-Running Queries, and Query Performance and Optimization Snowflake Query Limits Introduction to Data Types Supported by Snowflake Numeric Data Types String and Binary Data Types Date and Time Input/Output Data Types Semi-Structured Data Types Unstructured Data Types How Snowflake Supports Unstructured Data Use Stage file URL access Scoped URL access Presigned URL access Processing unstructured data with Java functions and external functions Snowflake SQL Functions and Session Variables Using System-Defined (Built-In) Functions Scalar functions Aggregate functions Table functions System functions Creating SQL and JavaScript UDFs and Using Session Variables External Functions Code Cleanup Summary Knowledge Check
5. Leveraging Snowflake Access Controls: Prep Work Creating Snowflake Objects Snowflake System-Defined Roles Creating Custom Roles Functional-Level Business and IT Roles System-Level Service Account and Object Access Roles Role Hierarchy Assignments: Assigning Roles to Other Roles Granting Privileges to Roles Assigning Roles to Users Testing and Validating Our Work User Management Role Management Snowflake Multi-Account Strategy Managing Users and Groups with SCIM Code Cleanup Summary Knowledge Check
6. Data Loading and Unloading: Prep Work Basics of Data Loading and Unloading Data Types Semi-structured data types File Formats Data File Compression Frequency of Data Processing Batch processing Streaming, continuous loading, and micro-batch processing Snowflake Stage References Named stages User stages Table stages Data Sources Data Loading Tools Snowflake Worksheet SQL Using INSERT INTO and INSERT ALL Commands Single-row inserts for structured and semi-structured data Multirow inserts for structured and semi-structured data Multitable inserts ARRAY_INSERT OBJECT_INSERT Web UI Load Data Wizard Structured data example SnowSQL CLI SQL PUT and COPY INTO Commands Data Pipelines Using Apache Kafka Automating Snowpipe using cloud messaging and optional on-premises Kafka clusters Calling Snowpipe REST endpoints Third-Party ETL and ELT Tools Alternatives to Loading Data Tools to Unload Data Data Loading Best Practices for Snowflake Data Engineers Select the Right Data Loading Tool and Consider the Appropriate Data Type Options Avoid Row-by-Row Data Processing Choose the Right Snowflake Virtual Warehouse Size and Split Files as Needed Transform Data in Steps and Use Transient Tables for Intermediate Results Code Cleanup Summary Knowledge Check
7. Implementing Data Governance, Account Security, and Data Protection and Recovery: Prep Work Snowflake Security Controlling Account Access Authentication and user management Managing network security policies and firewall access Monitoring Activity with the Snowflake ACCESS_HISTORY Account Usage View Data Protection and Recovery Encryption and key management Time Travel and fail-safe Replication and Failover Democratizing Data with Data Governance Controls INFORMATION_SCHEMA Data Dictionary Object Tagging Classification Classification category types Data Masking Dynamic data masking Conditional masking Static masking Row Access Policies and Row-Level Security External Tokenization Secure Views and UDFs Object Dependencies Code Cleanup Summary Knowledge Check
8. Managing Snowflake Account Costs: Prep Work Snowflake Monthly Bill Storage Fees Data Transfer Costs Compute Credits Consumed Creating Resource Monitors to Manage Virtual Warehouse Usage and Reduce Costs Resource Monitor Credit Quota Resource Monitor Credit Usage Resource Monitor Notifications and Other Actions Resource Monitor Rules for Assignments DDL Commands for Creating and Managing Resource Monitors Using Object Tagging for Cost Centers Querying the ACCOUNT_USAGE View Using BI Partner Dashboards to Monitor Snowflake Usage and Costs Snowflake Agile Software Delivery Why Do We Need DevOps? Continuous Data Integration, Continuous Delivery, and Continuous Deployment What Is Database Change Management? Overcoming the unique challenges of database changes DCM tools for a heavyweight Snowflake DevOps framework How Zero-Copy Cloning Can Be Used to Support Dev/Test Environments Code Cleanup Summary Knowledge Check
9. Analyzing and Improving Snowflake Query Performance: Prep Work Analyzing Query Performance QUERY_HISTORY Profiling HASH() Function Web UI History Using Snowflake’s Query Profile tool Understanding Snowflake Micro-Partitions and Data Clustering Partitions Explained Snowflake Micro-Partitions Explained Snowflake Data Clustering Explained Clustering Width and Depth Choosing a Clustering Key Table data characteristics and workload considerations Creating a Clustering Key Reclustering Performance Benefits of Materialized Views Exploring Other Query Optimization Techniques Search Optimization Service Query Optimization Techniques Compared Summary Code Cleanup Knowledge Check
10. Configuring and Managing Secure Data Sharing: Snowflake Architecture Data Sharing Support The Power of Snowgrid Data Sharing Use Cases Snowflake Support for Unified ID 2.0 Snowflake Secure Data Sharing Approaches Prep Work Snowflake’s Direct Secure Data Sharing Approach Creating Outbound Shares Data providers’ role in creating and managing shares Setting up a reader account How Inbound Shares Are Used by Snowflake Data Consumers Understanding consumer accounts: reader accounts versus full accounts How the ACCOUNT_USAGE share is different from all other inbound shares Comparison between databases on inbound shares and regular databases How to List and Shop on the Public Snowflake Marketplace Snowflake Marketplace for Providers Provider Studio Standard Versus Personalized Data Listings Harnessing the Power of a Snowflake Private Data Exchange Snowflake Data Clean Rooms Important Design, Security, and Performance Considerations Share Design Considerations Share Security Considerations Share Performance Considerations Difference Between Database Sharing and Database Cloning Data Shares and Time Travel Considerations Sharing of Data Shares Summary Code Cleanup Knowledge Check
11. Visualizing Data in Snowsight: Prep Work Data Sampling in Snowsight Fixed-Size Sampling Based on a Specific Number of Rows Fraction-Based Sampling Based on Probability Previewing Fields and Data Sampling Examples Using Automatic Statistics and Interactive Results Snowsight Dashboard Visualization Creating a Dashboard and Tiles Working with Chart Visualizations Aggregating and Bucketing Data Editing and Deleting Tiles Collaboration Sharing Your Query Results Using a Private Link to Collaborate on Dashboards Summary Code Cleanup Knowledge Check
12. Workloads for the Snowflake Data Cloud: Prep Work Data Engineering Data Warehousing Data Vault 2.0 Modeling Transforming Data within Snowflake Data Lake Data Collaboration Data Monetization Regulatory and Compliance Requirements for Data Sharing Data Analytics Advanced Analytics for the Finance Industry Advanced Analytics for the Healthcare Industry Advanced Analytics for the Manufacturing Industry and Logistics Services Marketing Analytics for Retail Verticals and the Communications and Media Industry Data Applications Data Science Snowpark Streamlit Cybersecurity Using Snowflake as a Security Data Lake Overcoming the Challenges of a SIEM-Only Architecture Search Optimization Service Versus Clustering Unistore Transactional Workload Versus Analytical Workload Hybrid Tables Summary Code Cleanup Knowledge Check
A. Answers to the Knowledge Check Questions: Chapter 1 Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6 Chapter 7 Chapter 8 Chapter 9 Chapter 10 Chapter 11 Chapter 12
B. Snowflake Object Naming Best Practices: General (Character Related) General (Not Character Related) Standard Label Abbreviations
C. Setting Up a Snowflake Trial Account
Index
How to download the source code:
1. Go to: https://www.oreilly.com/
2. Search for the book title: Snowflake: The Definitive Guide: Architecting, Designing, and Deploying on the Snowflake Data Cloud. If the search returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.