Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS
- Length: 416 pages
- Edition: 1
- Language: English
- Publisher: Sybex
- Publication Date: 2023-05-02
- ISBN-10: 1119909244
- ISBN-13: 9781119909248
- Sales Rank: #5942895
A comprehensive and accessible roadmap to performing data analytics in the AWS cloud
In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you’ll explore every relevant aspect of data analytics (from data engineering to analysis, business intelligence, DevOps, and MLOps) as you discover how to integrate machine learning predictions with analytics engines and visualization tools.
You’ll also find:
- Real-world use cases of AWS architectures that demystify the applications of data analytics
- Accessible introductions to data acquisition, importation, storage, visualization, and reporting
- Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance
A can’t-miss for data architects, analysts, engineers and technical professionals, Data Analytics in the AWS Cloud will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.
Cover Title Page Copyright Page About the Author About the Technical Editor Acknowledgments Contents at a Glance Contents
Introduction What Is a Data Lake? When You Do Not Need a Data Lake When Do You Need Analytics? When Do You Need a Data Lake for Analytics? How About an Analytics Team? The Data Platform The End of the Beginning
Chapter 1 AWS Data Lakes and Analytics Technology Overview Why AWS? What Does a Data Lake Look Like in AWS? Analytics on AWS Skills Required to Build and Maintain an AWS Analytics Pipeline
Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team The Data Vision Support DA Team Roles Early Stage Roles Team Lead Data Architect Data Engineer Data Analyst Maturity Stage Roles Data Scientist Cloud Engineer Business Intelligence (BI) Developer Machine Learning Engineer Business Analyst Niche Roles Analytics Flow at a Process Level Workflow Methodology The DA Team Mantra: “Automate Everything” Analytics Models in the Wild: Centralized, Distributed, Center of Excellence Centralized Distributed Center of Excellence Summary
Chapter 3 Working on AWS Accessing AWS Everything Is a Resource S3: An Important Exception IAM: Policies, Roles, and Users Policies Identity-Based Policies Resource-Based Policies Roles Users and User Groups Summarizing IAM Working with the Web Console The AWS Command-Line Interface Installing AWS CLI Linux Installation macOS Installation Windows Configuring AWS CLI A Note on Region Setting Individual Parameters Using Profiles and Configuration Files Final Notes on Configuration Using the AWS CLI Using Skeletons and File Inputs Cleaning Up! Infrastructure-as-Code: CloudFormation and Terraform CloudFormation CloudFormation Stacks CloudFormation Template Anatomy CloudFormation Changesets Getting Stack Information Cleaning Up Again CloudFormation Conclusions Terraform Coding Style Modularity Limitations Terraform vs. CloudFormation Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions AWS CDK Pulumi Cloudcraft Infrastructure Management Conclusions
Chapter 4 Serverless Computing and Data Engineering Serverless vs. Fully Managed AWS Serverless Technologies AWS Lambda Pricing Model Laser Focus on Code The Lambda Paradigm Shift Virtually Infinite Scalability Geographical Distribution A Lambda Hello World Lambda Configuration Runtime Container-Based Lambdas Architectures Memory Networking Execution Role Environment Variables AWS EventBridge AWS Fargate AWS DynamoDB AWS SNS Amazon SQS AWS CloudWatch Amazon QuickSight AWS Step Functions Amazon API Gateway Amazon Cognito AWS Serverless Application Model (SAM) Ephemeral Infrastructure AWS SAM Installation Configuration Creating Your First AWS SAM Project Application Structure SAM Resource Types SAM Lambda Template !! Recursive Lambda Invocation !! Function Metadata Outputs Implicitly Generated Resources Other Template Sections Lambda Code Building Your First SAM Application Testing the AWS SAM Application Locally Deployment Cleaning Up Summary
Chapter 5 Data Ingestion AWS Data Lake Architecture Serverless Data Lake Architecture Structure Ingestion Storage and Processing Cataloging, Governance, and Search Security and Monitoring Consumption Sample Processing Architecture: Cataloging Images into DynamoDB Use Case Description SAM Application Creation S3-Triggered Lambda Adding DynamoDB Lambda Execution Context Inserting into DynamoDB Cleaning Up Serverless Ingestion AWS Fargate AWS Lambda Example Architecture: Fargate-Based Periodic Batch Import The Basic Importer ECS CLI AWS Copilot CLI Clean Up AWS Kinesis Ingestion Example Architecture: Two-Pronged Delivery Fully Managed Ingestion with AppFlow Operational Data Ingestion with Database Migration Service DMS Concepts DMS Instance DMS Endpoints DMS Tasks Summary of the Workflow Common Use of DMS Example Architecture: DMS to S3 DMS Instance DMS Endpoints DMS Task Summary
Chapter 6 Processing Data Phases of Data Preparation What Is ETL? Why Should I Care? ETL Job vs. Streaming Job Overview of ETL in AWS ETL with AWS Glue ETL with Lambda Functions ETL with Hadoop/EMR Other Ways to Perform ETL ETL Job Design Concepts Source Identification Destination Identification Mappings Validation Filter Join, Denormalization, Relationalization AWS Glue for ETL Really, It’s Just Spark Visual Spark Script Editor Python Shell Script Editor Jupyter Notebook Connectors Creating Connections Creating Connections with the Web Console Creating Connections with the AWS CLI Creating ETL Jobs with AWS Glue Visual Editor ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet) Job Bookmarks Transformations Apply Mapping Filter Other Available Transforms Run the Edited Job Visual Editor with Source and Target Conclusions Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target) Creating ETL Jobs with the Spark Script Editor Developing ETL Jobs with AWS Glue Notebooks What Is a Notebook? Notebook Structure Step 1: Load Code into a DynamicFrame Step 2: Apply Field Mapping Step 3: Apply the Filter Step 4: Write to S3 in Parquet Format Example: Joining and Denormalizing Data from Two S3 Locations Conclusions for Manually Authored Jobs with Notebooks Creating ETL Jobs with AWS Glue Interactive Sessions It’s Magic Development Workflow Streaming Jobs Differences with a Standard ETL Job Streaming Sources Example: Process Kinesis Streams with a Streaming Job Streaming ETL Jobs Conclusions Summary
Chapter 7 Cataloging, Governance, and Search Cataloging with AWS Glue AWS Glue and the AWS Glue Data Catalog Glue Databases and Tables Databases The Idea of Schema-on-Read Tables Create Table Manually Creating a Table from an Existing Schema Creating a Table with a Crawler Summary on Databases and Tables Crawlers Updating or Not Updating? Running the Crawler Creating a Crawler from the AWS CLI Retrieving Table Information from the CLI Classifiers Classifier Example Crawlers and Classifiers Summary Search with Amazon Athena: The Heart of Analytics in AWS A Bit of History Interface Overview Creating Tables Manually Athena Data Types Complex Types Running a Query Connecting with JDBC and ODBC Query Stats Recent Queries and Saved Queries The Power of Partitions Athena Pricing Model Automatic Naming Athena Query Output Athena Peculiarities (SQL and Not) Computed Fields Gotcha and WITH Statement Workaround Lowercase! Query Explain Deduplicating Records Working with JSON, Flattening, and Unnesting Athena Views CREATE TABLE AS SELECT (CTAS) Saving Queries and Reusing Saved Queries Running Parameterized Queries Athena Federated Queries Athena Lambda Connectors Note on Connection Errors Performing Federated Queries Creating a View from a Federated Query Governing: Athena Workgroups, Lake Formation, and More Athena Workgroups Fine-Grained Athena Access with IAM Recap of Athena-Based Governance AWS Lake Formation Registering a Location in Lake Formation Creating a Database in Lake Formation Assigning Permissions in Lake Formation LF-Tags and Permissions in Lake Formation Data Filters Governance Conclusions Summary
Chapter 8 Data Consumption: BI, Visualization, and Reporting QuickSight Signing Up for QuickSight Standard Plan Enterprise Plan Users and User Groups Managing Users and Groups Managing QuickSight Users and Groups Your Subscriptions SPICE Capacity Account Settings Security and Permissions VPC Connections Mobile Settings Domains and Embedding Single Sign-On Data Sources and Datasets Creating an Athena Data Source Creating Other Data Sources Creating a Data Source from the AWS CLI Creating a Dataset from a Table Creating a Dataset from a SQL Query Duplicating Datasets Note on Creating Datasets QuickSight Favorites, Recent, and Folders SPICE Manage SPICE Capacity Refresh Schedule QuickSight Data Editor QuickSight Data Types Change Data Types Calculated Fields Joining Data Excluding Fields Filtering Data Removing Data Geospatial Hierarchies and Adding Fields to Hierarchies Unsupported Format Dates Visualizing Data: QuickSight Analysis Adding a Title and a Description to Your Analysis Renaming the Sheet Your First Visual with AutoGraph Field Wells Visual Types Saving and Autosaving A First Example: Pie Chart Renaming a Visual Filtering Data Adding Drill-Downs Parameters Actions Insights ML-Powered Insights Sharing an Analysis Dashboards Dashboard Layouts and Themes Publishing a Dashboard Embedding Visuals and Dashboards Data Consumption: Not Only Dashboards Summary
Chapter 9 Machine Learning at Scale Machine Learning and Artificial Intelligence What Are ML/AI Use Cases? Types of ML Models Overview of ML/AI AWS Solutions Amazon SageMaker SageMaker Domains Adding a User to the Domain SageMaker Studio SageMaker Example Notebook Step 1: Prerequisites and Preprocessing Step 2: Data Ingestion Step 3: Data Inspection Step 4: Data Conversion Step 5: Upload Training Data Step 6: Train the Model Step 7: Set Up Hosting and Deploy the Model Step 8: Validate the Model Step 9: Use the Model Inference Real Time Asynchronous Serverless Batch Transform Data Wrangler SageMaker Canvas Summary
Appendix Example Data Architectures in AWS Modern Data Lake Architecture ETL in a Lake House Consuming Data in the Lake House The Modern Data Lake Architecture Batch Processing Stream Processing Architecture Design Recommendations Automate Everything Build on Events Performance = Cost Savings AWS Glue Catalog and Athena-Centric Workflow Design Flexible Pick Your Battles Parquet Summary
Index
EULA