Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure

by Patrik Borosch

Length: 520 pages
Edition: 1
Language: English
Publisher: Packt Publishing
Publication Date: 2021-07-23
ISBN-10: 1800562934
ISBN-13: 9781800562936
Sales Rank: #996897

A practical guide to implementing a scalable and fast state-of-the-art analytical data estate

Key Features

Store and analyze data with enterprise-grade security and auditing
Perform batch, streaming, and interactive analytics to optimize your big data solutions with ease
Develop and run parallel data processing programs using real-world enterprise scenarios

Book Description

Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build their own customized analytical platform to fit any analytical requirements in terms of volume, speed, and quality.

This book is your guide to learning all the features and capabilities of Azure data services for storing, processing, and analyzing data (structured, unstructured, and semi-structured) of any size. You will explore key techniques for ingesting and storing data and perform batch, streaming, and interactive analytics. The book also shows you how to overcome various challenges and complexities relating to productivity and scaling. Next, you will develop and run massive data workloads. Using a cloud-based setup that combines big data, a modern data warehouse, and analytics, you will also be able to build secure, scalable data estates for enterprises. Finally, you will learn not only how to develop a data warehouse but also how to build big data programs with enterprise-grade security and auditing.

By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.

What you will learn

Implement data governance with Azure services
Use integrated monitoring in the Azure Portal and integrate Azure Data Lake Storage into Azure Monitor
Explore the serverless feature for ad-hoc data discovery, logical data warehousing, and data wrangling
Implement networking with Synapse Analytics and Spark pools
Create and run Spark jobs with Databricks clusters
Implement streaming using Azure Functions, a serverless runtime environment on Azure
Explore the predefined ML services in Azure and use them in your app

Who this book is for

This book is for data architects, ETL developers, or anyone who wants to become well versed in Azure data services to implement an analytical data estate for their enterprise. The book will also appeal to data scientists and data analysts who want to explore all the capabilities of Azure data services, which can be used to store, process, and analyze any kind of data. A beginner-level understanding of data analysis and streaming is required.

Table of Contents

Balancing the benefits of Data Lakes over Data Warehouses
The Modern Data Warehouse and Azure Data Services
Understanding the Data Lake Storage Layer
Relational Storage components: Synapse SQL Pools, SQL DB, Azure Databases
Data integration: enterprise-grade and even code-free
Spark on Azure: Synapse Spark Pools
Spark on Azure: Databricks
Streaming
Azure Cognitive Services / Azure Machine Learning
Machine Learning with Spark on Azure: Synapse Spark Pools / Azure Databricks
Synapse SQL Pools / Synapse Analytics
Analysis Services / Power BI / Data Share
Industry Data Models
Data Governance

Cloud Scale Analytics with Azure Data Services
Contributors
About the author
About the reviewers
Preface
    Who this book is for
    What this book covers
    To get the most out of this book
    Download the example code files
    Download the color images
    Conventions used
    Get in touch
    Reviews
    Share Your Thoughts
Section 1: Data Warehousing and Considerations Regarding Cloud Computing
Chapter 1: Balancing the Benefits of Data Lakes Over Data Warehouses
    Distinguishing between Data Warehouses and Data Lakes
        Understanding Data Warehouse patterns
        Investigating ETL/ELT
        Understanding Data Warehouse layers
        Implementing reporting and dashboarding
        Loading bigger amounts of data
        Starting with Data Lakes  
        Understanding the Data Lake ecosystem
        Comparing Data Lake zones
        Discovering caveats
    Understanding the opportunities of modern cloud computing
        Understanding Infrastructure-as-a-Service 
        Understanding Platform-as-a-Service
        Understanding Software-as-a-Service
        Examining the possibilities of virtual machines
        Understanding Serverless Functions
        Looking at the importance of containers
        Exploring the advantages of scalable environments
        Implementing elastic storage and compute
    Exploring the benefits of AI and ML
        Understanding ML challenges
        Sorting ML into the Modern Data Warehouse
        Understanding responsible ML/AI
    Answering the question
    Summary
Chapter 2: Connecting Requirements and Technology
    Formulating your requirements
        Asking in the right direction
    Understanding basic architecture patterns
        Examining the scalable storage component
        Looking at data integration
        Sorting in compute
        Adding a presentation layer
        Planning for dashboard/reporting
        Adding APIs/API management
        Relying on SSO/MFA/networking
        Not forgetting DevOps and CI/CD
    Finding the right Azure tool for the right purpose
    Understanding Industry Data Models
    Thinking about different sizes
        Planning for S size
        Planning for M size
        Planning for L size
    Understanding the supporting services
        Requiring data governance
        Establishing security
        Establishing DevOps and CI/CD
    Summary
    Questions
Section 2: The Storage Layer
Chapter 3: Understanding the Data Lake Storage Layer
    Technical requirements
    Setting up your Cloud Big Data Storage
        Provisioning a standard storage account instead
        Creating an Azure Data Lake Gen2 storage account
    Organizing your data lake
        Talking about zones in your data lake
        Creating structures in your data lake
        Planning the leaf level
        Understanding data life cycles
        Investigating storage tiers
        Planning for criticality
        Setting up confidentiality
        Using filetypes 
    Implementing a data model in your Data Lake
        Understanding interconnectivity between your data lake and the presentation layer
        Examining key implementation and usage
    Monitoring your storage account
        Creating alerts for Azure storage accounts
    Talking about backups
        Configuring delete locks for the storage service
        Backing up your data
    Implementing access control in your Data Lake
        Understanding RBAC
        Understanding ACLs
        Understanding the evaluation sequence of RBAC and ACLs 
        Understanding Shared Key authorization
        Understanding Shared Access Signature authorization
    Setting the networking options
        Understanding storage account firewalls
        Adding Azure virtual networks
        Using private endpoints with Data Lake Storage
    Discovering additional knowledge
    Summary
    Further reading
Chapter 4: Understanding Synapse SQL Pools and SQL Options
    Uncovering MPP in the cloud – the power of 60
        Understanding the control node
        Understanding compute nodes
        Understanding the data movement service
        Understanding distributions
    Provisioning a Synapse dedicated SQL pool
        Connecting to your database for the first time
        Distributing, replicating, and round-robin
        Understanding CCI 
    Talking about partitioning
    Implementing workload management
        Understanding concurrency and memory settings
        Using resource classes
        Implementing workload classification
        Adding workload importance
        Understanding workload isolation
    Scaling the database
        Using PowerShell to handle scaling and start/stop
        Using T-SQL to scale your database
    Loading data
        Using the COPY statement
        Maintaining statistics
    Understanding other SQL options in Azure
    Summary
    Further reading
        Additional links
        Static resource classes and concurrency slots
        Dynamic resource classes, memory allocation, and concurrency slots
        Effective values for REQUEST_MIN_RESOURCE_GRANT_PERCENT
Section 3: Cloud-Scale Data Integration and Data Transformation
Chapter 5: Integrating Data into Your Modern Data Warehouse
    Technical requirements
    Setting up Azure Data Factory
        Creating the Data Factory service
    Examining the authoring environment
        Understanding the Author section
        Understanding the Monitor section
        Understanding the Manage section
        Understanding the object types
    Using wizards
        Working with parameters
        Using variables
    Adding data transformation logic
        Understanding mapping flows
        Understanding wrangling flows
    Understanding integration runtimes
    Integrating with DevOps
    Summary
    Further reading
Chapter 6: Using Synapse Spark Pools
    Technical requirements
    Setting up a Synapse Spark pool
        Bringing your Spark cluster live for the first time
    Examining the Synapse Spark architecture
        Understanding the Synapse Spark pool and its components
        Running a Spark job
        Examining Synapse Spark instances
        Understanding Spark pools and Spark instances
        Understanding resource usage
    Programming with Synapse Spark pools
        Understanding Synapse Spark notebooks
        Running Spark applications
        Benefiting of the Synapse metadata exchange
    Using additional libraries with your Spark pool
        Using public libraries
        Adding your own packages
    Handling security
    Monitoring your Synapse Spark pools
    Summary
    Further reading
Chapter 7: Using Databricks Spark Clusters
    Technical requirements
    Provisioning Databricks
    Examining the Databricks workspace
    Understanding the Databricks components
        Creating Databricks clusters
        Managing clusters
        Using Databricks notebooks
        Using Databricks Spark jobs
        Adding dependent libraries to a job
        Creating Databricks tables
        Understanding Databricks Delta Lake
        Having a glance at Databricks SQL Analytics
        Adding libraries
        Adding dashboards
    Setting up security
        Examining access controls
        Understanding secrets
        Understanding networking
    Monitoring Databricks
    Summary
    Further reading
Chapter 8: Streaming Data into Your MDWH
    Technical requirements
    Provisioning ASA
    Implementing an ASA job
        Integrating sources
        Writing to sinks
    Understanding ASA SQL
        Understanding windowing
        Using window functions in your SQL
        Delivering to more than one output
        Adding reference data to your query
        Adding functions to your ASA job
        Understanding streaming units
        Resuming your job
    Using Structured Streaming with Spark
    Security in your streaming solution
        Connecting to sources and sinks
        Understanding ASA clusters
    Monitoring your streaming solution 
        Using Azure Monitor
    Summary
    Further reading
Chapter 9: Integrating Azure Cognitive Services and Machine Learning
    Technical requirements
    Understanding Azure Cognitive Services
        Examining available Cognitive Services
        Getting in touch with Cognitive Services
    Using Cognitive Services with your data
        Understanding the Azure Text Analytics cognitive service
        Implementing the call to your Text Analytics cognitive service with Spark
    Examining Azure Machine Learning
        Browsing the different Azure ML tools
        Examining Azure Machine Learning Studio
        Understanding the ML designer
        Creating a linear regression model with the designer
        Publishing your trained model for usage
    Using Azure Machine Learning with your modern data warehouse
        Connecting the services
        Understanding further options to integrate Azure ML with your modern data warehouse
    Summary
    Further reading
Chapter 10: Loading the Presentation Layer
    Technical requirements
    Understanding the loading strategy with Synapse-dedicated SQL pools
    Loading data into Synapse-dedicated SQL pools
        Examining PolyBase
        Loading data into a dedicated SQL pool using COPY
        Adding data with Synapse pipelines/Data Factory
    Using Synapse serverless SQL pools
        Browsing data ad hoc
        Using a serverless SQL pool to ELT
        Building a virtual data warehouse layer with Synapse serverless SQL pools
    Integrating data with Synapse Spark pools
        Reading and loading data
    Exchanging metadata between computes 
    Summary
    Further reading
Section 4: Data Presentation, Dashboarding, and Distribution
Chapter 11: Developing and Maintaining the Presentation Layer
    Developing with Synapse Studio
        Integrating Synapse Studio with Azure DevOps
        Understanding the development life cycle
        Automating deployments
        Understanding developer productivity with Synapse Studio
        Using the Copy Data Wizard
        Integrating Spark notebooks with Synapse pipelines
        Analyzing data ad hoc with Azure Synapse Spark pools
        Creating Spark tables
        Enriching Spark tables
        Enriching dedicated SQL pool tables
        Creating new integration datasets
        Starting serverless SQL analysis
    Backing up and DR in Azure Synapse
        Backing up data
        Backing up dedicated SQL pools
    Monitoring your MDWH
    Understanding security in your MDWH
        Implementing access control
        Implementing networking
    Summary
    Further reading
Chapter 12: Distributing Data
    Technical requirements
    Building data marts with Power BI
        Understanding the Power BI ecosystem
        Understanding Power BI object types
        Understanding Power BI offerings
        Acquiring data
        Optimizing the columnstore database in Power BI
        Building business logic with Data Analysis Expressions
        Visualizing data
        Publishing insights
    Creating data models with Azure Analysis Services
        Developing AAS models
    Distributing data using Azure Data Share
    Summary
    Further reading
Chapter 13: Introducing Industry Data Models
    Understanding Common Data Model
        Examining the basics of the SDK
        Understanding solutions and the manifest file
    Examining and leveraging predefined entities
        Finding CDM definitions
        Using the APIs of CDM
        Introducing Dataverse
    Discovering Azure Industry Data Workbench
    Summary
    Further reading
Chapter 14: Establishing Data Governance
    Technical requirements
    Discovering Azure Purview
        Provisioning the service
        Connecting to your data
        Scanning data
        Searching your catalog
        Browsing assets
        Examining assets
    Classifying data
        Creating a custom classification
        Creating a custom classification rule
        Using custom classifications 
    Integrating with Azure services
        Integrating with Synapse
        Integrating with Power BI
        Integrating with Azure Data Factory
    Using data lineage
    Discovering Insights
    Discovering more Purview
    Summary
    Further reading
    Why subscribe?
Other Books You May Enjoy
    Packt is searching for authors like you
    Share Your Thoughts

How to download source code?

1. Go to: https://github.com/PacktPublishing

2. In the Find a repository… box, search for the book title: Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure. If that returns no results, search for the main title only.

Unifying Business, Data, and Code: Designing Data Products With Json Schema

Linkerd: Up and Running: A Guide to Operationalizing a Kubernetes-native Service Mesh

Mastering Prometheus: Gain expert tips to monitoring your infrastructure, applications, and services

Postman Cookbook: Hand-picked Solutions and Techniques across API Design, Testing, Performance, Networking, Kubernetes and Integration

Painless Docker: Unlock the Power of Docker and its Ecosystem

Solutions Architect's Handbook, 3rd Edition: Kick-start your career with architecture design principles, strategies, and generative AI techniques