Mastering Databricks Lakehouse Platform: Perform Data Warehousing, Data Engineering, Machine Learning, DevOps, and BI into a Single Platform
- Length: 332 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2022-07-11
- ISBN-10: 9355511396
- ISBN-13: 9789355511393
- Sales Rank: #1207320
Enable data and AI workloads with absolute security and scalability
Key Features
- Detailed, step-by-step instructions for every data professional starting a career in data engineering.
- Access to DevOps, Machine Learning, and Analytics within a single unified platform.
- Includes design considerations and security best practices for efficient utilization of the Databricks platform.
Description
Starting with the fundamentals of the Databricks Lakehouse Platform, the book teaches readers to administer various data operations, including Machine Learning, DevOps, Data Warehousing, and BI, on a single platform.
The subsequent chapters discuss working with data pipelines on the Databricks Lakehouse Platform, with data processing and an audit quality framework. The book teaches readers to leverage the platform to develop Delta Live Tables, streamline ETL/ELT operations, and administer data sharing and orchestration. It explores how to schedule and manage jobs through the Databricks notebook UI and the Jobs API, and how to implement DevOps methods on the platform for data and AI workloads. It also helps readers prepare and process data and standardize the entire ML lifecycle, from experimentation to production.
The book does not stop there: it also teaches how to query the data lake directly with your favourite BI tools, such as Power BI, Tableau, or Qlik. Industry best practices for building data engineering solutions are demonstrated towards the end of the book.
What you will learn
- Acquire the capabilities to administer the Databricks Lakehouse Platform end to end.
- Utilize MLflow to deploy and monitor machine learning solutions.
- Gain practical experience with SQL Analytics and connect Tableau, Power BI, and Qlik.
- Configure clusters and automate CI/CD deployment.
- Learn how to use Airflow, Data Factory, Delta Live Tables, Databricks notebook UI, and the Jobs API.
Who this book is for
This book is for every data professional, including data engineers, ETL developers, database administrators, data scientists, SQL developers, and BI specialists. You don’t need any prior expertise with this platform because the book covers all the basics.
Cover Page; Title Page; Copyright Page; Dedication Page; About the Authors; About the Reviewer; Acknowledgement; Preface; Errata; Table of Contents
1. Getting Started with Databricks Platform
Structure; Objectives; Introduction to Databricks; What can we do with Databricks?; Databricks architecture; Control plane; Data plane; How does it work?; Databricks for Data Engineers and Data Scientists; Databricks SQL; Features of Databricks SQL; SQL endpoints for Databricks SQL; Databricks components; Workspace; Notebooks; Libraries; Folder; MLflow experiment; Interface; Databricks UI; Databricks API; Databricks CLI; Data management; DBFS; Tables; Database; Metastore; Computation management; Cluster; All-purpose cluster; Job cluster; Pools; Databricks runtime; Databricks runtime; Databricks runtime for machine learning; Photon; Databricks light; Databricks runtime for genomics (deprecated); Access management; User; Group; Access Control Lists (ACLs); Conclusion; Multiple choice questions; Answers
2. Management of Databricks Platform
Structure; Objectives; Databricks cluster basics; Cluster computation resources; Clusters; Cluster governance; Platform architecture, security, and data protection; Platform architecture; Platform security; Data Protection; Databricks data access management; Databricks cluster management; Databricks SQL Analytics administration; Conclusion; Multiple choice questions; Answers
3. Spark, Databricks, and Building a Data Quality Framework
Structure; Objectives; Introduction to Apache Spark; History; Evolution to DataBricks; What happened to Apache Spark?; Features of Apache Spark; The book paraphrase and translation analogy; Spark and its evolution; Components of Apache Spark; Resilient Distributed Dataset (RDD); Datasets and DataFrames; Directed Acyclic Graph (DAG); Execution mechanism; Processing data using Databricks pipeline; Building an audit framework with Databricks; Time travel; Conclusion; Multiple choice questions; Answers
4. Data Sharing and Orchestration with Databricks
Orchestrating Data and Machine Learning pipelines in Databricks; Running Databricks tasks using Amazon Managed Airflow; Run and orchestrate the Databricks tasks using Data Factory; Create an Azure Databricks linked service; Conclusion; Multiple choice questions; Answers
5. Simplified ETL with Delta Live Tables
Structure; Objectives; Delta Live Table concepts; Components of the Delta Live Table; Creating Delta Live Tables using Python and SQL; Delta Live Table components; Development workflow with Delta Live Table; Delta Live Table configurations; Conclusion; Multiple choice questions; Answers
6. SCD Type 2 Implementation with Delta Lake
Structure; Objectives; Streaming data with structure streaming; Change Data Feed; Conclusion; Multiple choice questions; Answers
7. Machine Learning Model Management with Databricks
Structure; Objectives; Introduction to MLOps and MLflow; Model life cycle management using MLflow; Getting started with MLflow environment; MLflow installation; Setting up MLflow project with model repository; Train and deploy the model; Log model metrics; Conclusion; Multiple choice questions; Answers
8. Continuous Integration and Delivery with Databricks
Structure; Objectives; Repos for Git integration; Conclusion; Multiple Choice Questions; Answers
9. Visualization with Databricks
Structure; Objectives; Databricks SQL Analytics; Databricks as a data source with Tableau; Databricks DirectQuery with Power BI; Databricks DirectQuery with Qlik; Databricks DirectQuery with TIBCO Spotfire Analyst; Conclusion; Multiple choice questions; Answers
10. Best Security and Compliance Practices of Databricks
Structure; Objectives; Delta Lake: hyperparameter tuning with Hyperopt; Access control and secret management; Cluster configuration and policies; Data governance; GDPR and CCPA compliance using Delta Lake; Conclusion; Multiple choice questions; Answers
Index