Data Processing and Modeling with Hadoop: Mastering Hadoop Ecosystem Including ETL, Data Vault, DMBok, GDPR, and Various Data-Centric Tools
Understand data in a simple way using a data lake.
- In-depth practical demonstration of Hadoop/YARN concepts with numerous examples.
- Includes graphical illustrations and visual explanations for Hadoop commands and parameters.
- Includes details of dimensional modeling and Data Vault modeling.
- Includes details of how to create and define a structure for a data lake.
The book ‘Data Processing and Modeling with Hadoop’ explains, in a clear and straightforward manner, how a distributed system works and what benefits it offers in the big data era. After reading the book, you will be able to plan and organize projects involving a massive amount of data.
The book describes the standards and technologies that aid in data management and compares them to other technology business standards. The reader receives practical guidance on how to separate data into zones, as well as how to develop a model that can aid in data evolution. It discusses security and the measures used to reduce the impact of security risks. Self-service analytics, Data Lake, Data Vault 2.0, and Data Mesh are also discussed in the book.
After reading this book, the reader will have a thorough understanding of how to structure a data lake, as well as the ability to plan, organize, and carry out the implementation of a data-driven business with full governance and security.
What you will learn
- Learn the basics of the components of the Hadoop Ecosystem.
- Understand the structure, files, and zones of a Data Lake.
- Learn to implement the security part of the Hadoop Ecosystem.
- Learn to work with Data Vault 2.0 modeling.
- Learn to develop a strategy to define good governance.
- Learn new tools to work with Data and Big Data.
Who this book is for
This book caters to big data developers, technical specialists, consultants, and students who want to build good proficiency in big data. Basic knowledge of SQL concepts, modeling, and development is helpful, although not mandatory.
Table of contents
1. Understanding the Current Moment
Introduction; Structure; Objectives; A little context; Why use it?; Solving problems; Hadoop ecosystem; Building the data lake; What does the data tell us?; Conclusion; Points to remember; Questions; Multiple choice questions; Answers
2. Defining the Zones
Introduction; Structure; Objectives; Why separate data into zones?; Transition zone; RAW zone; Trusted zone; Refined zone; Where to put my Sandbox; Conclusion; Points to remember; Questions; Multiple choice questions; Answers
3. The Importance of Modeling
Introduction; Structure; Objectives; Why should we model our environment?; Data Vault 2.0; How to plan modeling; Conclusion; Points to remember; Questions; Multiple choice questions; Answers
4. Massive Parallel Processing
Introduction; Structure; Objectives; How did we arrive and where did we arrive?; What is MapReduce?; MapReduce features; Introduction to Spark; Resource Manager – YARN; Introduction to Apache Tez; Conclusion; Points to remember; Questions; Multiple choice questions; Answers
5. Doing ETL/ELT
Introduction; Structure; Objectives; Transforming data into information; Identifying enemies; Main types of transformations; Planning the rollback; Why a data mart?; Feedback; Data lake and data warehouse secrets; Conclusion; Points to remember; Questions; Multiple choice questions; Answers
6. A Little Governance
Introduction; Structure; Objectives; Governing the data; Main difficulties; What methodologies and tools to use?; Defining a deployment roadmap; Conclusion; Points to remember; Questions; Multiple choice questions; Answers
7. Talking About Security
Introduction; Structure; Objectives; Need to worry about security; The main difficulties; Making identification, authorization, and authentication; The main tools; Defining a schedule; Conclusion; Points to remember; Questions; Multiple choice questions; Answers
8. What Are the Next Steps?
Introduction; Structure; Objectives; A new era; Separating a batch from real time; Defining the visualization tools; Machine learning; New tendencies; Conclusion; Questions; Multiple choice questions; Answers
Index