APACHE SPARK: INVENT THE FUTURE
by Ernesto Lee
- Length: 482 pages
- Edition: 1
- Language: English
- Publisher: Independently published
- Publication Date: 2021-06-23
- ISBN-10: B097SNB8T6
- ISBN-13: 9798525708488
- Sales Rank: #3583299
http://Ernesto.Net is the leader in high-tech training courseware in the fields of Data Science, Full Stack Programming, Big Data, and Blockchain. This book is the primary courseware artifact used in the Apache Spark Master Class. In addition to the book, the following materials are also available:
- Complete Containerized Lab Environment (zero student setup; hands-on labs are all done in the browser), powered by http://FeNAgO.com
- Student Workbook
- Instructor Workbook
- Customized Video Training (Optional)
- Customizable Content (Optional)
CHAPTER 1: INTRODUCTION TO APACHE SPARK
- Theory: An Overview of Big Data; Quick Introduction to Hadoop; Why Hadoop?; Quick Introduction to Hadoop Distributed File System; Block Placement in HDFS; HDFS Architecture; Introduction to MapReduce; Architecture of MapReduce
- LAB EXERCISE
- SUMMARY
- REFERENCES

CHAPTER 2: PROGRAMMING WITH SCALA
- Theory: What is Scala?; Why Scala?; Data Types in Scala; Functions in Scala; Collections in Scala; Coding Scala; Conclusion
- AIM
- LAB EXERCISE 1: PROGRAMMING WITH SCALA
- Task 1: Download and Install JDK
- Task 2: Download and Install Scala
- Task 3: Scala Basics
- Task 4: Loops
- Task 5: Functions
- Task 6: Collections
- LAB CHALLENGE
- SUMMARY
- REFERENCES

CHAPTER 3: HANDS ON SPARK
- Theory: Introduction to RDD; Architecture of Spark
- AIM
- LAB EXERCISE 2: HANDS ON SPARK
- Task 1: Download and Install Spark
- Task 2: Installing Spark on Multi-Node Cluster
- Task 3: Creating RDDs from Spark-Shell
- Task 4: Basic RDD operations
- Task 5: Download and Install IntelliJ IDEA
- Task 6: Configuring IntelliJ IDEA
- SUMMARY
- REFERENCES

CHAPTER 4: INTERNALS OF SPARK
- Theory: Characteristics of RDD; RDD Operations; RDD Transformations; RDD Actions; Lineage Graph; Directed Acyclic Graph
- AIM
- LAB EXERCISE 3: SPARK PROGRAM
- Task 1: Creating a new package in IntelliJ IDEA
- Task 2: Spark Program – Loading Data
- Task 3: Spark Program – Performing Operations
- Task 4: Spark Program – Saving Data
- Task 5: Spark Program – Lineage Graph
- Task 6: Spark Web Interface
- SUMMARY
- REFERENCES

CHAPTER 5: RDD KEY-VALUE PAIRS & CACHING
- Theory: Paired RDD; Paired RDD Transformations; Two Paired RDD Transformations; Paired RDD Actions; RDD Caching and Persistence; Persistence Storage Levels
- AIM
- LAB EXERCISE 4: PAIRED RDD – HANDS ON
- Task 1: Creating a Tuple
- Task 2: Creating a Paired RDD
- Task 3: Performing Operations on Paired RDD
- Task 4: Performing more Operations on Paired RDD
- Task 5: Performing Joins on Paired RDDs
- Task 6: Performing Actions on Paired RDDs
- LAB CHALLENGE
- SUMMARY
- REFERENCES

CHAPTER 6: SHARED VARIABLES
- Theory: What are Shared Variables?; Why Shared Variables?; Broadcast Variables; Optimizing Broadcast Variables; Accumulators; Points to remember when Accumulators are used; Scala Monadic Collections; Either Monadic Collection; Option Monadic Collection; Try Monadic Collection
- AIM
- LAB EXERCISE 5: SHARED VARIABLES – HANDS ON
- Task 1: Using Accumulator method
- Task 2: Implementing Record Parser
- Task 3: Implementing Counters
- Task 4: Implementing Accumulators V2
- Task 5: Implementing Custom Accumulators V2
- Task 6: Using Broadcast Variables
- SUMMARY
- REFERENCES

CHAPTER 7: SPARK SQL
- Theory: Types of Data; What is Spark SQL?; Why Spark SQL?; Spark SQL Architecture
- AIM
- LAB EXERCISE 6: SPARK SQL – HANDS ON
- Task 1: Creating Data Frame using Data Source API
- Task 2: Creating DataFrame from an RDD
- Task 3: Creating Data Frame using StructType
- Task 4: Querying data using Spark SQL
- Task 5: Joins using Spark SQL
- Task 6: Operations using DataFrame API
- SUMMARY
- REFERENCES

CHAPTER 8: DATASETS
- Theory: RDD vs. DataFrame; What are Datasets?; Why Datasets?
- AIM
- LAB EXERCISE 7: DATASETS & FUNCTIONS
- Task 1: Creating Dataset using Data Source API
- Task 2: Creating Dataset from an RDD
- Task 3: Aggregate and Collection Functions (Aggregate Functions; Collection Functions)
- Task 4: Date/Time Functions
- Task 5: Math and String Functions (Math Functions; String Functions)
- Task 6: Window Functions
- SUMMARY
- REFERENCES

CHAPTER 9: USER-DEFINED FUNCTIONS
- Theory: Why User-Defined Functions?; Steps to implement User-Defined Function; UDAF Types; Function currying in Scala; Partially applied functions in Scala
- AIM
- LAB EXERCISE 8: USER DEFINED FUNCTIONS
- Task 1: Defining Currying Functions
- Task 2: Using partially applied functions
- Task 3: Writing User Defined Function
- Task 4: Writing Untyped UDAF
- Task 5: Using Untyped UDAF
- Task 6: Typed UDAF
- SUMMARY
- REFERENCES

CHAPTER 10: FILE FORMATS
- Theory: DataSource API; Reading Data; Read Modes; Writing Data; Save Modes; Text Files; CSV Files; JSON; Parquet Files; ORC Files; RDD API; Text Files; Sequence Files; Hadoop Files
- AIM
- LAB EXERCISE 9: USING FILE FORMATS
- Task 1: Text Files (RDD API; DataSource API)
- Task 2: CSV Files
- Task 3: JSON Files
- Task 4: Parquet Files
- Task 5: ORC Files
- Task 6: Hadoop and Sequence Files (Sequence Files; Hadoop Files)
- SUMMARY
- REFERENCES

CHAPTER 11: SPARK CONFIGURATIONS & OPTIMIZATIONS
- Theory: Spark Configurations; Spark Configuration Properties; Environment Variables; Logging; Performance Optimization; Using Datasets extensively; Avoiding UDF and UDAF; Data Serialization; Spark Memory Tuning; Level of Parallelism; Levels of Data Locality; Use Broadcast Variables; Filter Data as soon as possible; Logs More Power
- AIM
- LAB EXERCISE 10: SPARK CONFIGURATIONS & OPTIMIZATIONS
- Task 1: Spark Configuration File
- Task 2: Using spark-submit Tool
- Task 3: Environment Variables File
- Task 4: Logging Properties File
- Task 5: Checking Log Files
- SUMMARY
- REFERENCES