Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure
A practical guide to implementing a scalable and fast state-of-the-art analytical data estate
- Store and analyze data with enterprise-grade security and auditing
- Perform batch, streaming, and interactive analytics to optimize your big data solutions with ease
- Develop and run parallel data processing programs using real-world enterprise scenarios
Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build their own customized analytical platform to fit any analytical requirements in terms of volume, speed, and quality.
This book is your guide to the features and capabilities of Azure data services for storing, processing, and analyzing data (structured, unstructured, and semi-structured) of any size. You will explore key techniques for ingesting and storing data and perform batch, streaming, and interactive analytics. The book also shows you how to overcome challenges and complexities relating to productivity and scaling. Next, you will develop and run massive data workloads. Using a cloud-based setup that combines big data, a modern data warehouse, and analytics, you will also be able to build secure, scalable data estates for enterprises. Finally, you will learn not only how to develop a data warehouse but also how to build enterprise-grade security and auditing into your big data programs.
By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.
What you will learn
- Implement data governance with Azure services
- Use integrated monitoring in the Azure portal and integrate Azure Data Lake Storage with Azure Monitor
- Explore the serverless feature for ad-hoc data discovery, logical data warehousing, and data wrangling
- Implement networking with Synapse Analytics and Spark pools
- Create and run Spark jobs with Databricks clusters
- Implement streaming using Azure Functions, a serverless runtime environment on Azure
- Explore the predefined ML services in Azure and use them in your app
Who this book is for
This book is for data architects, ETL developers, or anyone who wants to become well versed in Azure data services to implement an analytical data estate for their enterprise. The book will also appeal to data scientists and data analysts who want to explore the capabilities of Azure data services, which can be used to store, process, and analyze any kind of data. A beginner-level understanding of data analysis and streaming is required.
Table of Contents
- Balancing the benefits of Data Lakes over Data Warehouses
- The Modern Data Warehouse and Azure Data Services
- Understanding the Data Lake Storage Layer
- Relational Storage components: Synapse SQL Pools, SQL DB, Azure Databases
- Data integration enterprise grade and even code-free
- Spark on Azure: Synapse Spark Pools
- Spark on Azure: Databricks
- Azure Cognitive Services / Azure Machine Learning
- Machine Learning with Spark on Azure: Synapse Spark Pools / Azure Databricks
- Synapse SQL Pools / Synapse Analytics
- Analysis Service / Power BI / Data Share
- Industry Data Models
- Data Governance
Cloud Scale Analytics with Azure Data Services
- Contributors: About the author; About the reviewers
- Preface: Who this book is for; What this book covers; To get the most out of this book; Download the example code files; Download the color images; Conventions used; Get in touch; Reviews; Share Your Thoughts
Section 1: Data Warehousing and Considerations Regarding Cloud Computing
- Chapter 1: Balancing the Benefits of Data Lakes Over Data Warehouses: Distinguishing between Data Warehouses and Data Lakes; Understanding Data Warehouse patterns; Investigating ETL/ELT; Understanding Data Warehouse layers; Implementing reporting and dashboarding; Loading bigger amounts of data; Starting with Data Lakes; Understanding the Data Lake ecosystem; Comparing Data Lake zones; Discovering caveats; Understanding the opportunities of modern cloud computing; Understanding Infrastructure-as-a-Service; Understanding Platform-as-a-Service; Understanding Software-as-a-Service; Examining the possibilities of virtual machines; Understanding Serverless Functions; Looking at the importance of containers; Exploring the advantages of scalable environments; Implementing elastic storage and compute; Exploring the benefits of AI and ML; Understanding ML challenges; Sorting ML into the Modern Data Warehouse; Understanding responsible ML/AI; Answering the question; Summary
- Chapter 2: Connecting Requirements and Technology: Formulating your requirements; Asking in the right direction; Understanding basic architecture patterns; Examining the scalable storage component; Looking at data integration; Sorting in compute; Adding a presentation layer; Planning for dashboard/reporting; Adding APIs/API management; Relying on SSO/MFA/networking; Not forgetting DevOps and CI/CD; Finding the right Azure tool for the right purpose; Understanding Industry Data Models; Thinking about different sizes; Planning for S size; Planning for M size; Planning for L size; Understanding the supporting services; Requiring data governance; Establishing security; Establishing DevOps and CI/CD; Summary; Questions
Section 2: The Storage Layer
- Chapter 3: Understanding the Data Lake Storage Layer: Technical requirements; Setting up your Cloud Big Data Storage; Provisioning a standard storage account instead; Creating an Azure Data Lake Gen2 storage account; Organizing your data lake; Talking about zones in your data lake; Creating structures in your data lake; Planning the leaf level; Understanding data life cycles; Investigating storage tiers; Planning for criticality; Setting up confidentiality; Using filetypes; Implementing a data model in your Data Lake; Understanding interconnectivity between your data lake and the presentation layer; Examining key implementation and usage; Monitoring your storage account; Creating alerts for Azure storage accounts; Talking about backups; Configuring delete locks for the storage service; Backing up your data; Implementing access control in your Data Lake; Understanding RBAC; Understanding ACLs; Understanding the evaluation sequence of RBAC and ACLs; Understanding Shared Key authorization; Understanding Shared Access Signature authorization; Setting the networking options; Understanding storage account firewalls; Adding Azure virtual networks; Using private endpoints with Data Lake Storage; Discovering additional knowledge; Summary; Further reading
- Chapter 4: Understanding Synapse SQL Pools and SQL Options: Uncovering MPP in the cloud – the power of 60; Understanding the control node; Understanding compute nodes; Understanding the data movement service; Understanding distributions; Provisioning a Synapse dedicated SQL pool; Connecting to your database for the first time; Distributing, replicating, and round-robin; Understanding CCI; Talking about partitioning; Implementing workload management; Understanding concurrency and memory settings; Using resource classes; Implementing workload classification; Adding workload importance; Understanding workload isolation; Scaling the database; Using PowerShell to handle scaling and start/stop; Using T-SQL to scale your database; Loading data; Using the COPY statement; Maintaining statistics; Understanding other SQL options in Azure; Summary; Further reading; Additional links; Static resource classes and concurrency slots; Dynamic resource classes, memory allocation, and concurrency slots; Effective values for REQUEST_MIN_RESOURCE_GRANT_PERCENT
Section 3: Cloud-Scale Data Integration and Data Transformation
- Chapter 5: Integrating Data into Your Modern Data Warehouse: Technical requirements; Setting up Azure Data Factory; Creating the Data Factory service; Examining the authoring environment; Understanding the Author section; Understanding the Monitor section; Understanding the Manage section; Understanding the object types; Using wizards; Working with parameters; Using variables; Adding data transformation logic; Understanding mapping flows; Understanding wrangling flows; Understanding integration runtimes; Integrating with DevOps; Summary; Further reading
- Chapter 6: Using Synapse Spark Pools: Technical requirements; Setting up a Synapse Spark pool; Bringing your Spark cluster live for the first time; Examining the Synapse Spark architecture; Understanding the Synapse Spark pool and its components; Running a Spark job; Examining Synapse Spark instances; Understanding Spark pools and Spark instances; Understanding resource usage; Programming with Synapse Spark pools; Understanding Synapse Spark notebooks; Running Spark applications; Benefiting of the Synapse metadata exchange; Using additional libraries with your Spark pool; Using public libraries; Adding your own packages; Handling security; Monitoring your Synapse Spark pools; Summary; Further reading
- Chapter 7: Using Databricks Spark Clusters: Technical requirements; Provisioning Databricks; Examining the Databricks workspace; Understanding the Databricks components; Creating Databricks clusters; Managing clusters; Using Databricks notebooks; Using Databricks Spark jobs; Adding dependent libraries to a job; Creating Databricks tables; Understanding Databricks Delta Lake; Having a glance at Databricks SQL Analytics; Adding libraries; Adding dashboards; Setting up security; Examining access controls; Understanding secrets; Understanding networking; Monitoring Databricks; Summary; Further reading
- Chapter 8: Streaming Data into Your MDWH: Technical requirements; Provisioning ASA; Implementing an ASA job; Integrating sources; Writing to sinks; Understanding ASA SQL; Understanding windowing; Using window functions in your SQL; Delivering to more than one output; Adding reference data to your query; Adding functions to your ASA job; Understanding streaming units; Resuming your job; Using Structured Streaming with Spark; Security in your streaming solution; Connecting to sources and sinks; Understanding ASA clusters; Monitoring your streaming solution; Using Azure Monitor; Summary; Further reading
- Chapter 9: Integrating Azure Cognitive Services and Machine Learning: Technical requirements; Understanding Azure Cognitive Services; Examining available Cognitive Services; Getting in touch with Cognitive Services; Using Cognitive Services with your data; Understanding the Azure Text Analytics cognitive service; Implementing the call to your Text Analytics cognitive service with Spark; Examining Azure Machine Learning; Browsing the different Azure ML tools; Examining Azure Machine Learning Studio; Understanding the ML designer; Creating a linear regression model with the designer; Publishing your trained model for usage; Using Azure Machine Learning with your modern data warehouse; Connecting the services; Understanding further options to integrate Azure ML with your modern data warehouse; Summary; Further reading
- Chapter 10: Loading the Presentation Layer: Technical requirements; Understanding the loading strategy with Synapse-dedicated SQL pools; Loading data into Synapse-dedicated SQL pools; Examining PolyBase; Loading data into a dedicated SQL pool using COPY; Adding data with Synapse pipelines/Data Factory; Using Synapse serverless SQL pools; Browsing data ad hoc; Using a serverless SQL pool to ELT; Building a virtual data warehouse layer with Synapse serverless SQL pools; Integrating data with Synapse Spark pools; Reading and loading data; Exchanging metadata between computes; Summary; Further reading
Section 4: Data Presentation, Dashboarding, and Distribution
- Chapter 11: Developing and Maintaining the Presentation Layer: Developing with Synapse Studio; Integrating Synapse Studio with Azure DevOps; Understanding the development life cycle; Automating deployments; Understanding developer productivity with Synapse Studio; Using the Copy Data Wizard; Integrating Spark notebooks with Synapse pipelines; Analyzing data ad hoc with Azure Synapse Spark pools; Creating Spark tables; Enriching Spark tables; Enriching dedicated SQL pool tables; Creating new integration datasets; Starting serverless SQL analysis; Backing up and DR in Azure Synapse; Backing up data; Backing up dedicated SQL pools; Monitoring your MDWH; Understanding security in your MDWH; Implementing access control; Implementing networking; Summary; Further reading
- Chapter 12: Distributing Data: Technical requirements; Building data marts with Power BI; Understanding the Power BI ecosystem; Understanding Power BI object types; Understanding Power BI offerings; Acquiring data; Optimizing the columnstore database in Power BI; Building business logic with Data Analysis Expressions; Visualizing data; Publishing insights; Creating data models with Azure Analysis Services; Developing AAS models; Distributing data using Azure Data Share; Summary; Further reading
- Chapter 13: Introducing Industry Data Models: Understanding Common Data Model; Examining the basics of the SDK; Understanding solutions and the manifest file; Examining and leveraging predefined entities; Finding CDM definitions; Using the APIs of CDM; Introducing Dataverse; Discovering Azure Industry Data Workbench; Summary; Further reading
- Chapter 14: Establishing Data Governance: Technical requirements; Discovering Azure Purview; Provisioning the service; Connecting to your data; Scanning data; Searching your catalog; Browsing assets; Examining assets; Classifying data; Creating a custom classification; Creating a custom classification rule; Using custom classifications; Integrating with Azure services; Integrating with Synapse; Integrating with Power BI; Integrating with Azure Data Factory; Using data lineage; Discovering Insights; Discovering more Purview; Summary; Further reading
- Why subscribe?
- Other Books You May Enjoy: Packt is searching for authors like you; Share Your Thoughts
How to download source code?
1. Go to:
2. In the Find a repository… box, search the book title:
Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.