Data Modeling for Azure Data Services: Implement professional data design and structures in Azure
- Length: 428 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2021-07-30
- ISBN-10: 1801077347
- ISBN-13: 9781801077347
- Sales Rank: #202954
Choose the right Azure data service and correct model design for successful implementation of your data model with the help of this hands-on guide
Key Features
- Design a cost-effective, performant, and scalable database in Azure
- Choose and implement the most suitable design for a database
- Discover how your database can scale with growing data volumes, concurrent users, and query complexity
Book Description
Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design becomes paramount to successful implementation.
Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you’ll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse, explore dimensional modeling and Data Vault modeling, and design and implement a data lake using Azure Storage. You’ll also learn how to implement ETL with Azure Data Factory.
By the end of this book, you’ll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.
What you will learn
- Model a relational database using normalization, dimensional, or Data Vault modeling
- Provision and implement Azure SQL DB and Azure Synapse SQL Pools
- Discover how to model a Data Lake and implement it using Azure Storage
- Model a NoSQL database and provision and implement an Azure Cosmos DB database
- Use Azure Data Factory to implement ETL/ELT processes
- Create a star schema model using dimensional modeling
Who this book is for
This book is for business intelligence developers and consultants who work on (modern) cloud data warehousing and design and implement databases. Beginner-level knowledge of cloud data management is expected.
Table of Contents
- Introduction to Databases
- Entity Analysis
- Normalizing Data
- Provisioning and Implementing an Azure SQL DB
- Designing a NoSQL Database
- Provisioning and Implementing an Azure Cosmos DB Database
- Dimensional Modeling
- Provisioning and Implementing an Azure Synapse SQL Pool
- Data Vault Modeling
- Designing and Implementing a Data Lake Using Azure Storage
- Implementing ETL Using Azure Data Factory
Data Modeling for Azure Data Services; Contributors; About the author; About the reviewers; Preface; Who this book is for; What this book covers; To get the most out of this book; Download the example code files; Download the color images; Conventions used; Get in touch; Share Your Thoughts

Section 1 – Operational/OLTP Databases

Chapter 1: Introduction to Databases
Overview of relational databases; Files; Relational databases; Introduction to Structured Query Language; Different categories of SQL; Understanding the database schema; Impact of intended usage patterns on database design; Understanding relational theory; Pillar 1 – Elements of a set are not ordered; Pillar 2 – All elements in a set are unique; Keys; Types of keys; Choosing the primary key; Integrity; The Check and Unique constraints; Types of workload; OLTP; OLAP; Summary

Chapter 2: Entity Analysis
Scope; Project scope; Product scope; Understanding entity relationship diagrams; Entities; Understanding super- and sub-entities; Naming entities; Relationships; Types of relationships; Drawing conventions; Recap; Creating your first ERD; Context of an ERD; Summary; Exercises; Exercise 1 – student registration; Exercise 2 – airline

Chapter 3: Normalizing Data
When to use normalization as a design strategy; Considering all the details; Preventing redundancy; How to avoid redundancy; The normalization steps; Step zero; First normal form; Second normal form; Third normal form; Boyce-Codd and the fourth normal form; Normalizing – a recap; An alternative approach to normalizing data; Step 1; Step 2; Step 3; Step 4; Integrating separate results; Entity relationship diagram; Summary; Exercises; Exercise 1 – Stock management of a bicycle shop

Chapter 4: Provisioning and Implementing an Azure SQL DB
Technical requirements; Understanding SQL Server data types; Numerical data; Alphanumerical data; Varying-length data types; Dates; Other data types; Quantifying the data model; Estimating the database size; Analyzing expected usage patterns; Provisioning an Azure SQL database; Provisioned versus serverless; vCores versus DTU; Hyperscale and Business Critical; Elastic pool; Networking; Additional settings; Tags; Review + create; Connecting to the database; Azure portal; Azure Data Studio; Data definition language; Creating a table; Altering a table; Dropping a table; Inserting data; Indexing; Clustered index; Nonclustered index; Automatic tuning; Summary

Chapter 5: Designing a NoSQL Database
Understanding big data; Understanding big data clusters; Partitioning; Getting to know Cosmos DB; JSON; Modeling JSON; Using embedding versus referencing; Referring to objects; Cosmos DB partitioning; Putting it together; Key-value databases; Modeling key-value databases; Other NoSQL databases; Gremlin; Cassandra; Extra considerations; Polyglot persistence; Concurrency; Summary; Exercise

Chapter 6: Provisioning and Implementing an Azure Cosmos DB Database
Technical requirements; Provisioning a Cosmos DB database; Basics; Networking; Backup policy; Encryption; Creating a container; Uploading documents to a container; Cosmos DB container settings; Importing data using the Azure Cosmos DB Data Migration tool; Summary

Section 2 – Analytics with a Data Lake and Data Warehouse

Chapter 7: Dimensional Modeling
Background to dimensional modeling; Performance; Consistency; Data quality; The complexity of normalized database schemas; Lack of historical data; Understanding dimensional modeling; Minimizing redundancy; Using dependencies between attributes; Understanding star schemas; Understanding fact tables; Understanding dimension tables; Steps in dimensional modeling; Choosing a process and defining the scope; Determining the needed grain; Determining the dimensions; Determining the facts; Designing dimensions; Defining the primary key of a dimension table; Adding an unknown member; Creating star schemas versus creating snowflake schemas; Implementing a date dimension; Slowly changing dimensions; Junk dimension; Degenerate dimension; Designing fact tables; Understanding additive facts; Understanding semi-additive facts; Understanding non-additive facts; Understanding transactional fact tables; Understanding periodic snapshot fact tables; Understanding accumulating snapshot fact tables; Understanding the roleplaying dimension; Using a coverage fact table; Using a Kimball data warehouse versus data marts; Summary; Exercise

Chapter 8: Provisioning and Implementing an Azure Synapse SQL Pool
Overview of Synapse Analytics; Introducing SQL pools; Introducing Spark pools; Introducing data integration; Provisioning a Synapse Analytics workspace; Creating a dedicated SQL pool; Implementing tables in Synapse SQL pools; Using hash distribution; Using replicated distribution; Using ROUND_ROBIN distribution; Implementing columnstore indexes; Understanding workload management; Creating a workload group; Creating a workload classifier; Using PolyBase to load data; Enabling a SQL pool to access a data lake account; Configuring and using PolyBase; Using CTAS to import data; Using COPY to import data; Connecting to and using a dedicated SQL pool; Working with Azure Data Studio; Working with Power BI; Summary

Chapter 9: Data Vault Modeling
Background to Data Vault modeling; Designing Hub tables; Defining the business key; Implementing a hash key; Adding the load date; Adding the name of the source system; Adding optional columns; Designing Link tables; Designing Satellite tables; Adding optional columns to a Satellite; Choosing the number of Satellites to use; Using hash keys; Designing a Data Vault structure; Choosing the Hubs; Choosing the Links; Choosing the Satellites; Designing business vaults; Adding a Meta Mart; Adding a Metrics Vault; Adding an Error Mart; Using Point-in-Time tables; Adding Bridge tables; Adding a hierarchical link; Implementing a Data Vault; Summary; Exercise

Chapter 10: Designing and Implementing a Data Lake Using Azure Storage
Technical requirements; Background of data lakes; Modeling a data lake; Defining data lake zones; Defining a data lake folder structure; Designing time slices; Using different file formats; AVRO file format; Parquet file format; ORC file format; Choosing the proper file size; Provisioning an Azure storage account; Locally redundant storage (LRS); Zone-redundant storage (ZRS); Geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS); Read-access geo-redundant storage (RA_GRS) and read-access geo-zone-redundant storage (RA_GZRS); Creating a data lake filesystem; Creating multiple storage accounts; Considering DTAP; Considering data diversity; Considering cost sensitivity; Considering management overhead; Summary

Section 3 – ETL with Azure Data Factory

Chapter 11: Implementing ETL Using Azure Data Factory
Technical requirements; Introducing Azure Data Factory; Introducing the main components of Azure Data Factory; Understanding activities; Understanding datasets; Understanding linked services; Understanding pipelines; Understanding triggers and integration runtimes; Using the copy activity; Copying a single table to the data lake; Copying all tables to the data lake; Implementing a data flow; Executing SQL code from Data Factory; Summary

Why subscribe?; Other Books You May Enjoy; Packt is searching for authors like you; Share Your Thoughts
How to download source code?
1. Go to: https://github.com/PacktPublishing
2. In the Find a repository… box, search for the book title: Data Modeling for Azure Data Services: Implement professional data design and structures in Azure. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.