Big Data and Hadoop: Learn by Example
- Length: 333 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2018-04-26
- ISBN-10: 9386551993
- ISBN-13: 9789386551993
- Sales Rank: #8188670
This book covers Big Data and Hadoop, one of the latest trends in the IT industry. It explains how big 'Big Data' really is and why so many organizations are trying to adopt it in their IT projects. It combines research work on various topics with both theoretical and practical approaches, and describes each component of the architecture along with current industry trends. Taken together, Big Data and Hadoop form a new skill set by industry standards. Readers get a compact book, grounded in industry experience, that can also serve as a reference.

KEY FEATURES
- Overview of Big Data
- Basics of Hadoop
- Hadoop Distributed File System
- HBase
- MapReduce
- Hive: the data warehouse of Hadoop
- Pig: the higher-level programming environment
- Sqoop: importing data from heterogeneous sources
- Flume, Oozie, ZooKeeper & Big Data stream mining
- Chapter-wise questions & previous years' questions
Chapter 1: Big Data-Introduction and Demand
- 1.1 Big Data
  - 1.1.1 Characteristics of Big Data
  - 1.1.2 Why Big Data
- 1.2 Hadoop
  - 1.2.1 History of Hadoop
  - 1.2.2 Name of Hadoop
  - 1.2.3 Hadoop Ecosystem
- 1.3 Convergence of Key Trends
  - 1.3.1 Convergence of Big Data into Business
  - 1.3.2 Big Data vs other techniques
- 1.4 Unstructured Data
- 1.5 Industry Examples of Big Data
  - 1.5.1 Use of Big Data-Hadoop at Yahoo
  - 1.5.2 In Rackspace for log processing
  - 1.5.3 Hadoop at Facebook
- 1.6 Usages of Big Data
  - 1.6.1 Web analytics
  - 1.6.2 Big Data and marketing
  - 1.6.3 Big Data and fraud
  - 1.6.4 Risk management in Big Data with credit cards
  - 1.6.5 Big Data and algorithmic trading
  - 1.6.6 Big Data in healthcare

Chapter 2: NoSQL Data Management
- 2.1 Introduction to NoSQL databases
  - 2.1.1 Terminology used in NoSQL and RDBMS
  - 2.1.2 Database use in NoSQL
- 2.2 SQL vs NoSQL
  - 2.2.1 Denormalization
  - 2.2.2 Data distribution
  - 2.2.3 Data durability
- 2.3 Consistency in NoSQL
  - 2.3.1 ACID vs BASE
  - 2.3.2 Relaxing consistency
- 2.4 HBase
  - 2.4.1 Installation
  - 2.4.2 History
  - 2.4.3 HBase data structure
  - 2.4.4 Physical storage
  - 2.4.5 Components
  - 2.4.6 HBase shell commands
  - 2.4.7 The different usages of the scan command
  - 2.4.8 Terminologies
    - 2.4.8.1 Version stamp
    - 2.4.8.2 Region
    - 2.4.8.3 Locking
- 2.5 MapReduce
  - 2.5.1 MapReduce architecture
  - 2.5.2 MapReduce data types
  - 2.5.3 File input formats
  - 2.5.4 Java MapReduce
- 2.6 Partitioner and Combiner
  - 2.6.1 Example in MapReduce
  - 2.6.2 Situations for Partitioner and Combiner
  - 2.6.3 Use of the Combiner
- 2.7 Composing MapReduce Calculations

Chapter 3: Basics of Hadoop
- 3.1 Data Format
- 3.2 Analysing Data with Hadoop
- 3.3 Scale-in vs Scale-out
  - 3.3.1 Number of reducers used
  - 3.3.2 Driver class with no reducer
- 3.4 Hadoop Streaming
  - 3.4.1 Streaming in Ruby
  - 3.4.2 Streaming in Python
  - 3.4.3 Streaming in Java
- 3.5 Hadoop Pipes
- 3.6 Design of HDFS
  - 3.6.1 Very large files
  - 3.6.2 Streaming data access
  - 3.6.3 Commodity hardware
  - 3.6.4 Low-latency data access
  - 3.6.5 Lots of small files
  - 3.6.6 Arbitrary file modifications
- 3.7 HDFS Concepts
  - 3.7.1 Blocks
  - 3.7.2 Namenodes and Datanodes
  - 3.7.3 HDFS group
  - 3.7.4 All-time availability
- 3.8 Hadoop File Systems
- 3.9 Java Interface
  - 3.9.1 HTTP
  - 3.9.2 C
  - 3.9.3 FUSE (Filesystem in Userspace)
  - 3.9.4 Reading data using the Java interface (URL)
  - 3.9.5 Reading data using the Java interface (FileSystem API)
- 3.10 Data Flow
  - 3.10.1 File read
  - 3.10.2 File write
  - 3.10.3 Coherency model
  - 3.10.4 Cluster balance
  - 3.10.5 Hadoop archives
- 3.11 Hadoop I/O
  - 3.11.1 Data integrity
  - 3.11.2 Local file system
- 3.12 Compression
  - 3.12.1 Codecs
  - 3.12.2 Compression and input splits
  - 3.12.3 Map output
- 3.13 Serialization
- 3.14 Avro File-Based Data Structures
  - 3.14.1 Data types and schemas
  - 3.14.2 Serialization and deserialization
  - 3.14.3 Avro MapReduce

Chapter 4: Hadoop Installation (Step by Step)
- 4.1 Introduction
  - 4.1.1 On VMware
  - 4.1.2 On Oracle VirtualBox
- 4.2 On Ubuntu 16.04
- 4.3 Fully Distributed Mode

Chapter 5: MapReduce Applications
- 5.1 Understanding MapReduce
- 5.2 The Traditional Way
- 5.3 MapReduce Workflow
  - 5.3.1 Map side
  - 5.3.2 Reduce side
- 5.4 Unit Testing with MRUnit
  - 5.4.1 Testing the Mapper class
  - 5.4.2 Testing the Reducer class
  - 5.4.3 Testing the Driver class of a program
  - 5.4.4 Test output of a program
- 5.5 Test Data and Local Data Checks
  - 5.5.1 Debugging a MapReduce job
  - 5.5.2 Job control
- 5.6 Anatomy of a MapReduce Job
  - 5.6.1 Anatomy of a file write
  - 5.6.2 Anatomy of a file read
  - 5.6.3 Replica management
- 5.7 MapReduce Job Run
  - 5.7.1 Classic MapReduce (MapReduce 1)
  - 5.7.2 MapReduce 2 (YARN)
  - 5.7.3 Failures in MapReduce 1
  - 5.7.4 Failures in YARN
- 5.8 Job Scheduling
- 5.9 Shuffle and Sort
  - 5.9.1 Map side
  - 5.9.2 Reduce side
- 5.10 Task Execution
  - 5.10.1 Task JVM
  - 5.10.2 Skipping bad records
- 5.11 MapReduce Types
  - 5.11.1 Input types
  - 5.11.2 Output types

Chapter 6: Hadoop-Related Tools I (HBase & Cassandra)
- 6.1 Installation of HBase
- 6.2 Conceptual Architecture
  - 6.2.1 Regions
  - 6.2.2 Locking
- 6.3 Implementation
- 6.4 HBase vs RDBMS
- 6.5 HBase Clients
- 6.6 HBase Examples and Commands
  - 6.6.1 Inserting data using the HBase shell
  - 6.6.2 Updating data using the HBase shell
  - 6.6.3 Reading data using the HBase shell
  - 6.6.4 Reading a given column
  - 6.6.5 Deleting a specific cell in a table
  - 6.6.6 Deleting all cells in a table
  - 6.6.7 Scanning using the HBase shell
  - 6.6.8 Count
  - 6.6.9 Truncate
  - 6.6.10 Disable table
  - 6.6.11 Verification
  - 6.6.12 Alter table
  - 6.6.13 Scope operator for alter table
  - 6.6.14 Deleting a column family
  - 6.6.15 Existence of a table
  - 6.6.16 Dropping a table
  - 6.6.17 Dropping all tables
- 6.7 HBase Using Java APIs
  - 6.7.1 Creating a table
  - 6.7.2 Listing the tables in HBase
  - 6.7.3 Disabling a table
  - 6.7.4 Adding a column family
  - 6.7.5 Deleting a column family
  - 6.7.6 Verifying the existence of a table
  - 6.7.7 Deleting a table
  - 6.7.8 Stopping HBase
- 6.8 Praxis
  - 6.8.1 Versions
  - 6.8.2 HDFS
  - 6.8.3 Schema design
- 6.9 Cassandra
  - 6.9.1 CAP theorem
  - 6.9.2 Characteristics of Cassandra
  - 6.9.3 Installing Cassandra
  - 6.9.4 Basic CLI commands
- 6.10 Cassandra Data Model
  - 6.10.1 Super column families
  - 6.10.2 Clusters
  - 6.10.3 Keyspaces
  - 6.10.4 Column families
  - 6.10.5 Super columns
- 6.11 Cassandra Examples
  - 6.11.1 Creating a keyspace
  - 6.11.2 Altering a keyspace
  - 6.11.3 Dropping a keyspace
  - 6.11.4 Creating a table
  - 6.11.5 Primary keys
  - 6.11.6 Altering a table
  - 6.11.7 Truncating a table
  - 6.11.8 Executing a batch
  - 6.11.9 Deleting an entire row
  - 6.11.10 Describe
- 6.12 Cassandra Clients
  - 6.12.1 Thrift
  - 6.12.2 Avro
  - 6.12.3 Hector
  - 6.12.4 Chirper
  - 6.12.5 Pelops
- 6.13 Hadoop Integration
- 6.14 Use Cases
  - 6.14.1 eBay
  - 6.14.2 Hulu

Chapter 7: Hadoop-Related Tools II (Pig Latin & HiveQL)
- 7.1 Pig Latin
- 7.2 Installation
- 7.3 Execution Types
  - 7.3.1 Local mode
  - 7.3.2 MapReduce mode
- 7.4 Platforms for Running Pig Programs
  - 7.4.1 Script
  - 7.4.2 Grunt
  - 7.4.3 Embedded
- 7.5 Grunt
  - 7.5.1 Example
  - 7.5.2 Commands in Grunt
- 7.6 Pig Data Model
  - 7.6.1 Scalar types
  - 7.6.2 Complex types
- 7.7 Pig Latin
  - 7.7.1 Input and output
  - 7.7.2 Store
  - 7.7.3 Relational operations
  - 7.7.4 User-defined functions
- 7.8 Developing and Testing Pig Latin Scripts
  - 7.8.1 The dump operator
  - 7.8.2 The describe operator
  - 7.8.3 The explain operator
  - 7.8.4 The illustrate operator
- 7.9 Hive
  - 7.9.1 Installing Hive
  - 7.9.2 Hive architecture
  - 7.9.3 Hive services
- 7.10 Data Types and File Formats
- 7.11 Comparison of HiveQL with Traditional Databases
- 7.12 HiveQL
  - 7.12.1 Data Definition Language
  - 7.12.2 Data Manipulation Language
  - 7.12.3 Examples for practice

Chapter 8: Practical & Research-Based Topics
- 8.1 Data Analysis with Twitter
  - 8.1.1 Using Flume
  - 8.1.2 Data extraction using Java
  - 8.1.3 Data extraction using Python
- 8.2 Use of Bloom Filters in MapReduce
  - 8.2.1 Function of a Bloom filter
  - 8.2.2 Working of a Bloom filter
  - 8.2.3 Applications of Bloom filters
  - 8.2.4 Implementation of a Bloom filter in MapReduce
- 8.3 Amazon Web Services
  - 8.3.1 AWS
  - 8.3.2 Setting up AWS
  - 8.3.3 Setting up Hadoop on EC2
- 8.4 Documents Archived from the NY Times
- 8.5 Data Mining in Mobiles
- 8.6 Hadoop Diagnosis
  - 8.6.1 System health
  - 8.6.2 Setting permissions
  - 8.6.3 Managing quotas
  - 8.6.4 Enabling trash
  - 8.6.5 Removing a datanode

Appendix: Hadoop Commands
Chapter-wise Questions
Previous Year Question Papers