Hands-on Data Virtualization with Polybase
- Length: 496 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2021-04-05
- ISBN-10: 9390684412
- ISBN-13: 9789390684410
Run queries and analysis on big data clusters across relational and non-relational databases
Key Features
● Connect to Hadoop, Azure, Spark, Oracle, Teradata, Cassandra, MongoDB, CosmosDB, MySQL, PostgreSQL, MariaDB, and SAP HANA.
● Numerous techniques for querying data and troubleshooting PolyBase for better data analytics.
● Exclusive coverage of Azure Synapse Analytics and building Big Data Clusters.
Description
This book covers how to establish and manage data virtualization using PolyBase. It teaches how to configure PolyBase against almost all relational and non-relational databases. You will learn to set up a test environment for any tool or software quickly and without hassle. You will practice designing and building high-performing data warehousing solutions in a few minutes.
You will come close to being an expert in connecting to all databases, including Hadoop, Cassandra, MySQL, PostgreSQL, MariaDB, and Oracle Database. The book also brings exclusive coverage of how to build data clusters on Azure and how to use Azure Synapse Analytics. By the end of this book, you will not only administer PolyBase for managing big data clusters but also know how to optimize and boost its performance to enable data analytics and easier data access.
What you will learn
● Learn to configure PolyBase and process Transact-SQL queries with ease.
● Create a Docker container with SQL Server 2019 on Windows and PolyBase.
● Connect a SQL Server instance with any other software or tool using PolyBase.
● Connect with Cassandra, MongoDB, MySQL, PostgreSQL, MariaDB, and IBM DB2.
Who this book is for
This book is for database developers and administrators familiar with the SQL language and command prompt. Managers and decision-makers will also find this book useful. No prior knowledge of any other technology or language is required.
Table of Contents
1. What is Data Virtualization (Polybase)
2. History of Polybase
3. Polybase current state
4. Differences with other technologies
5. Usage
6. Future
7. SQL Server
8. Hadoop Cloudera and Hortonworks
9. Windows Azure Storage Blob
10. Spark
11. From Azure Synapse Analytics
12. From Big Data Clusters
13. Oracle
14. Teradata
15. Cassandra
16. MongoDB
17. CosmosDB
18. MySQL
19. PostgreSQL
20. MariaDB
21. SAP HANA
22. IBM DB2
23. Excel
About the Author
Pablo Echeverria is a talented database and software developer. He has tuned long-running queries in Oracle and SQL, achieving execution times of under one second and reducing resource usage by up to 10%, and has streamlined client processes, reducing work time by 50%. He is a critical thinker who focuses on implementation and testing. He loves learning and connecting new technologies.
LinkedIn profile: https://www.linkedin.com/in/pablo-echeverria/
Blog Link: https://www.sqlservercentral.com/author/pabechevb
Cover Page Title Page Copyright Page Dedication Page About the Author About the Reviewer Acknowledgement Preface Errata Table of Contents
1. Data Virtualization Structure Objectives Filtering the information Link relational data with storage/file system data What you would have to do without data virtualization How data virtualization simplifies querying external data How learning PolyBase can help you irrespective of your role Conclusion Points to remember Multiple choice questions Answer Questions
2. History of PolyBase Structure Objectives The data warehousing market November 2010, the basis: Parallel Data Warehouse (PDW) November 2012: PolyBase official announcement at the SQL PASS session July 12, 2013: PDW 2012 (v2) release May 1, 2014: PDW v2 AU1 also known as Analytics Platform System (APS) Other APS AU releases (2, 3, 4, and 5) SQL Server 2016 SQL Server 2017 Conclusion Points to remember Multiple choice questions Answer Questions
3. PolyBase Current State Structure Objectives Analytics Platform System and Azure Data Warehouse Azure Synapse Analytics Data Warehouse Fast Track deployment SQL Server 2019 Enterprise Big Data Clusters Conclusion Points to remember Multiple choice questions Answer Questions
4. Difference between PolyBase and Other Technologies Structure Objectives Reasoning OPENROWSET OPENDATASOURCE Linked servers Oracle EXTERNAL TABLE Oracle DATABASE LINK Oracle Data Service Integrator (ODSI) Teradata Other relational database management systems Any programming language/environment: Java, Scala, Python, R, .Net C#, PowerShell, etc. Azure Data Lake Storage (ADLS) U-SQL Power BI SQL Server Analysis Services/Azure Analysis Services Tibco Data Virtualization (TDV) Other data virtualization tools Conclusion Points to remember Multiple choice questions Answer Questions
5. Usage Structure Objectives Aging out or archiving tables Logging tables Programming language and environment Data Warehouse (DW) applications Data consuming programs: ETL, analysis, and reporting Other relational database management systems Places where you manifest your preferences, directly or indirectly Machine learning (ML) and artificial intelligence (AI) Conclusion Points to remember Multiple choice questions Answer Questions
6. Future of PolyBase Structure Objectives Support other file formats Query multiple Hadoop types at once Additional external sources Native support replacing Open DataBase Connectivity (ODBC) Execute commands directly on the external source Add an entire database instead of individual tables Query/write in parallel different external sources at the same time Independence from Java Other security protocols Large rows and large objects Tight control over results Table partitions to an external table Dashboard reports Get notified when the underlying data has changed Reuse data sets within the same plan and across nodes Index the external data Allow CRUD operations unsupported on the external source Automatically aggregate statistics about the external data Conclusion Points to remember Multiple choice questions Answer Questions
7. SQL Server Structure Objectives Download required media Installation considerations Installation process Configuration Test using a loopback source Read SQL Server data from a remote server Read SQL Server data using an intermediate server Serial vs. parallel loading Monitoring and troubleshooting Conclusion Points to remember Multiple choice questions Answer Questions
8. Hadoop Cloudera and Hortonworks Structure Objectives Prerequisites Hadoop SQL Server SequenceIQ Hadoop Hortonworks Data Platform (HDP) Sandbox Cloudbreak for Hortonworks Data Platform (HDP) Conclusion Points to remember Multiple choice questions Answer Questions
9. Azure Storage Structure Objectives Azure Storage SQL Server WASB Cloudbreak for Hortonworks Data Platform with Cloud Storage Conclusion Points to remember Multiple choice questions Answer Questions
10. Spark Structure Objectives Prerequisites Spark SQL Server HDInsight Conclusion Points to remember Multiple choice questions Answer Questions
11. Azure Synapse Analytics Structure Objectives Azure Data Lake Store (ADLS) Gen1 Azure Data Lake Store (ADLS) Gen2 Azure Synapse Analytics (formerly SQL DW) Azure Synapse Analytics (workspaces) Cloudbreak for Hortonworks Data Platform (HDP) Conclusion Points to remember Multiple choice questions Answer Questions
12. Big Data Clusters Structure Objectives Prerequisites for creation of Big Data Clusters Manual creation of Big Data Clusters Creation of Big Data Clusters using Azure Data Studio Conclusion Points to remember Multiple choice questions Answer Questions
13. Oracle Structure Objectives SQL Server Oracle Conclusion Points to remember Multiple choice questions Answer Questions
14. Teradata Structure Objectives SQL Server Acronyms and definitions Teradata in Azure Teradata in Amazon Web Services (AWS) Teradata in a Virtual Machine Conclusion Points to remember Multiple choice questions Answer Questions
15. Cassandra Structure Objectives SQL Server Cassandra Conclusion Points to remember Multiple choice questions Answer Questions
16. MongoDB Structure Objectives SQL Server MongoDB Conclusion Points to remember Multiple choice questions Answer Questions
17. Cosmos DB Structure Objectives Introduction to Cosmos DB SQL Server Cosmos DB Emulator in a Docker container on Windows Azure Cosmos DB Conclusion Points to remember Multiple choice questions Answer Questions
18. MySQL Structure Objectives SQL Server MySQL Conclusion Points to remember Multiple choice questions Answer Questions
19. PostgreSQL Structure Objectives SQL Server PostgreSQL Conclusion Points to remember Multiple choice questions Answer Questions
20. MariaDB Structure Objectives SQL Server MariaDB Conclusion Points to remember Multiple choice questions Answer Questions
21. SAP HANA Structure Objectives SQL Server SAP HANA Conclusion Points to remember Multiple choice questions Answer Questions
22. IBM Db2 Structure Objectives SQL Server IBM Db2 Conclusion Points to remember Multiple choice questions Answer Questions
23. Excel Structure Objectives SQL Server Excel Conclusion Points to remember Multiple choice questions Answer Questions
Index