Data Science and Data Analytics: Opportunities and Challenges

Length: 482 pages
Edition: 1
Language: English
Publisher: Chapman and Hall/CRC
Publication Date: 2021-09-23
ISBN-10: 0367628821
ISBN-13: 9780367628826

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is central to the future of Artificial Intelligence (AI) and is increasingly needed to make work easier and more productive. In simple terms, data science is the discovery of hidden patterns (such as complex behaviors, trends, and inferences) in data. Big Data analytics and data analytics are the analysis mechanisms that data scientists use. Tools such as Hadoop and R are used to analyze large amounts of data, extract valuable information, and support decision-making. Structured data can be analyzed readily with available business intelligence tools, while most data (80% of data by 2020) is in an unstructured form that requires advanced analytics tools. Analyzing this data raises several concerns, such as complexity, scalability, privacy leaks, and trust issues.

Data science helps us extract meaningful information or insights from unstructured, complex, or large amounts of data (available or stored virtually in the cloud). Data Science and Data Analytics: Opportunities and Challenges covers the areas and applications of this emerging field, along with the serious concerns and challenges they raise, in detail and with a comparative analysis/taxonomy.

FEATURES

Introduces the concepts of data science and the tools and algorithms available for many useful applications
Presents many challenges and opportunities in data science and data analytics to help researchers identify research gaps or problems
Identifies many areas and uses of data science in the smart era
Applies data science to agriculture, healthcare, graph mining, education, security, etc.

Academicians, data scientists, and stockbrokers from industry/business will find this book useful for designing optimal strategies to enhance their firm’s productivity.

Cover
Half Title
Title Page
Copyright Page
Contents
Preface
Editor
Contributors
Section I: Introduction about Data Science and Data Analytics
1. Data Science and Data Analytics: Artificial Intelligence and Machine Learning Integrated Based Approach
	1.1 Introduction
	1.2 Artificial Intelligence
	1.3 Machine Learning (ML)
		1.3.1 Regression
			1.3.1.1 Linear Regression
			1.3.1.2 Logistic Regression
				Multi-class Logistic Regression
				Polytomous Logistic Regression
		1.3.2 Support Vector Machine (SVM)
	1.4 Deep Learning (DL)
		1.4.1 Methods for Deep Learning
			1.4.1.1 Convolutional Neural Networks (CNNs)
				General Model of Convolutional Neural Network
				Convolution Layer
				Nonlinear Activation Function
				Pooling Layer
				Fully Connected Layer
				Last Layer Activation Function
			1.4.1.2 Extreme Learning Machine
			1.4.1.3 Transfer Learning (TL)
				Important Considerations for Transfer Learning
				Types of Transfer Learning
	1.5 Bio-inspired Algorithms for Data Analytics
	1.6 Conclusion
	References
2. IoT Analytics/Data Science for IoT
	2.1 Preface
		2.1.1 Data Science Components
		2.1.2 Method for Data Science
		2.1.3 The Internet of Stuff
			2.1.3.1 Difficulties in the Comprehension of Stuff on the Internet
			2.1.3.2 Sub-domain of Data Science for IoT
			2.1.3.3 IoT and Relationship with Data
			2.1.3.4 IoT Applications in Data Science Challenges
			2.1.3.5 Ways to Distribute Algorithms in Computer Science to IoT Data
	2.2 Computational Methodology-IoT Science for Data Science
		2.2.1 Regression
		2.2.2 Set of Trainings
		2.2.3 Pre-processing
		2.2.4 Sensor Fusion Leverage for the Internet of Things
	2.3 Methodology-IoT Mechanism of Privacy
		2.3.1 Principles for IoT Security
		2.3.2 IoT Architecture Offline
		2.3.3 Offline IoT Architecture
		2.3.4 Online IoT Architecture
		2.3.5 IoT Security Issues
		2.3.6 Applications
	2.4 Consummation
	References
3. A Model to Identify Agriculture Production Using Data Science Techniques
	3.1 Agriculture System Application Based on GPS/GIS Gathered Information
		3.1.1 Important Tools Required for Developing GIS/GPS-Based Agricultural System
			3.1.1.1 Information (Gathered Data)
			3.1.1.2 Map
			3.1.1.3 System Apps
			3.1.1.4 Data Analysis
		3.1.2 GPS/GIS in Agricultural Conditions
			3.1.2.1 GIS System in Agriculture
		3.1.3 System Development Using GIS and GPS Data
	3.2 Design of Interface to Extract Soil Moisture and Mineral Content in Agricultural Lands
		3.2.1 Estimating Level of Soil Moisture and Mineral Content Using COSMIC-RAY (C-RAY) Sensing Technique
			3.2.1.1 Cosmic
		3.2.2 Soil Moisture and Mineral Content Measurement Using Long Duration Optical Fiber Grating (LDOPG)
		3.2.3 Moisture Level and Mineral Content Detection System Using a Sensor Device
		3.2.4 Soil Moisture Experiment
			3.2.4.1 Dataset Description
		3.2.5 Experimental Result
	3.3 Analysis and Guidelines for Seed Spacing
		3.3.1 Correct Spacing
		3.3.2 System Components
			3.3.2.1 Electronic Compass
			3.3.2.2 Optical Flow Sensor
			3.3.2.3 Motor Driver
			3.3.2.4 Microcontroller
	3.4 Analysis of Spread of Fertilizers
		3.4.1 Relationship between Soil pH Value and Nutrient Availability
		3.4.2 Methodology
			3.4.2.1 Understand Define Phase
			3.4.2.2 Analysis and Quick Design Phase
			3.4.2.3 Prototype Development Phase
			3.4.2.4 Testing Phase
		3.4.3 System Architecture
		3.4.4 Experimental Setup
		3.4.5 Implementation Phase
		3.4.6 Experimental Results
	3.5 Conclusion and Future Work
	References
4. Identification and Classification of Paddy Crop Diseases Using Big Data Machine Learning Techniques
	4.1 Introduction
		4.1.1 Overview of Paddy Crop Diseases
		4.1.2 Overview of Big Data
			4.1.2.1 Features of Big Data
		4.1.3 Overview of Machine Learning Techniques
			4.1.3.1 K-Nearest Neighbor
			4.1.3.2 Support Vector Machine
			4.1.3.3 K-Means
			4.1.3.4 Fuzzy C-Means
			4.1.3.5 Decision Tree
		4.1.4 Overview of Big Data Machine Learning Tools
			4.1.4.1 Hadoop
			4.1.4.2 Hadoop Distributed File System (HDFS)
			4.1.4.3 YARN ("Yet Another Resource Negotiator")
	4.2 Related Work
		4.2.1 Image Recognition/Processing
		4.2.2 Classification and Feature Extraction
		4.2.3 Problems and Diseases
	4.3 Proposed Architecture
		4.3.1 Image Acquisition
		4.3.2 Image Enhancement
		4.3.3 Image Segmentation
		4.3.4 Feature Extraction
		4.3.5 Classification
	4.4 Proposed Algorithms and Implementation Details
		4.4.1 Image Preprocessing
		4.4.2 Image Segmentation and the Fuzzy C-Means Model Using Spark
		4.4.3 Feature Extraction
		4.4.4 Classification
			4.4.4.1 Support Vector Machine (SVM)
			4.4.4.2 Naïve Bayes
			4.4.4.3 Decision Tree and Random Forest
	4.5 Result Analysis
		4.5.1 Comparison of Speed-up Performance between the Spark-Based and Hadoop-Based FCM Approach
		4.5.2 Comparison of Scale-up Performance between the Spark-Based and Hadoop-Based FCM Approach
		4.5.3 Result Analysis of Various Segmentation Techniques
		4.5.4 Results of Disease Identification
	4.6 Conclusion and Future Work
	References
Section II: Algorithms, Methods, and Tools for Data Science and Data Analytics
5. Crop Models and Decision Support Systems Using Machine Learning
	5.1 Introduction
		5.1.1 Decision Support System
		5.1.2 Decision Support System for Crop Yield
		5.1.3 What Is Crop Modeling?
		5.1.4 Necessity of Crop Modeling
		5.1.5 Recent Trends in Crop Modeling
	5.2 Methodologies
		5.2.1 Machine-Learning-Based Techniques
		5.2.2 Deep-Learning-Based Techniques
		5.2.3 Hyper-Spectral Imaging
		5.2.4 Popular Band Selection Techniques
		5.2.5 Leveraging Conventional Neural Network
	5.3 Role of Hyper-Spectral Data
		5.3.1 Farm Based
		5.3.2 Crop Based
		5.3.3 Advanced HSI Processing
	5.4 Potential Challenges and Strategies to Overcome the Challenges
	5.5 Current and Future Scope
	5.6 Conclusion
	References
6. An Ameliorated Methodology to Predict Diabetes Mellitus Using Random Forest
	6.1 Motivation to Use the "R" Language to Predict Diabetes Mellitus?
	6.2 Related Work
	6.3 Collection of Datasets
		6.3.1 Implementation Methods
			6.3.1.1 Decision Tree
			6.3.1.2 Random Forest
			6.3.1.3 Naïve Bayesian Algorithm
			6.3.1.4 Support Vector Machine (SVM)
	6.4 Visualization
	6.5 Correlation Matrix
	6.6 Training and Testing the Data
	6.7 Model Fitting
	6.8 Experimental Analysis
	6.9 Results and Analysis
	6.10 Conclusion
	References
7. High Dimensionality Dataset Reduction Methodologies in Applied Machine Learning
	7.1 Problems Faced with High Dimensionality Data: An Introduction
	7.2 Dimensionality Reduction Algorithms with Visualizations
		7.2.1 Feature Selection Using Covariance Matrix
			7.2.1.1 Importing the Modules
			7.2.1.2 The Boston Housing Dataset
			7.2.1.3 Perform Basic Data Visualization
			7.2.1.4 Pearson Coefficient Correlation Matrix
			7.2.1.5 Detailed Correlation Matrix Analysis
			7.2.1.6 3-Dimensional Data Visualization
			7.2.1.7 Extracting the Features and Target
			7.2.1.8 Feature Scaling
			7.2.1.9 Create Training and Testing Datasets
			7.2.1.10 Training and Evaluating Regression Model with Reduced Dataset
			7.2.1.11 Limitations of the Correlation Matrix Analysis
		7.2.2 t-Distributed Stochastic Neighbor Embedding (t-SNE)
			7.2.2.1 The MNIST Handwritten Digits Dataset
			7.2.2.2 Perform Exploratory Data Visualization
			7.2.2.3 Random Sampling of the Large Dataset
			7.2.2.4 t-Distributed Stochastic Neighbor Embedding (t-SNE) - An Introduction
			7.2.2.5 Probability and Mathematics behind t-SNE
			7.2.2.6 Implementing and Visualizing t-SNE in 2-D
			7.2.2.7 Implementing and Visualizing t-SNE in 3-D
			7.2.2.8 Applying k-Nearest Neighbors (k-NN) on the t-SNE MNIST Dataset
			7.2.2.9 Data Preparation - Extracting the Features and Target
			7.2.2.10 Create Training and Testing Dataset
			7.2.2.11 Choosing the k-NN hyperparameter - k
			7.2.2.12 Model Evaluation - Jaccard Index, F1 Score, Model Accuracy, and Confusion Matrix
			7.2.2.13 Limitations of the t-SNE Algorithm
		7.2.3 Principal Component Analysis (PCA)
			7.2.3.1 The UCI Breast Cancer Dataset
			7.2.3.2 Perform Basic Data Visualization
			7.2.3.3 Create Training and Testing Dataset
			7.2.3.4 Principal Component Analysis (PCA): An Introduction
			7.2.3.5 Transposing the Data for Usage into Python
			7.2.3.6 Standardization - Finding the Mean Vector
			7.2.3.7 Computing the n-Dimensional Covariance Matrix
			7.2.3.8 Calculating the Eigenvalues and Eigenvectors of the Covariance Matrix
			7.2.3.9 Sorting the Eigenvalues and Corresponding Eigenvectors Obtained
			7.2.3.10 Construct Feature Matrix - Choosing the k Eigenvectors with the Largest Eigenvalues
			7.2.3.11 Data Transformation - Derivation of New Dataset by PCA - Reduced Number of Dimensions
			7.2.3.12 PCA Using Scikit-Learn
			7.2.3.13 Verification of Library and Stepwise PCA
			7.2.3.14 PCA - Captured Variance and Data Lost
			7.2.3.15 PCA Visualizations
			7.2.3.16 Splitting the Data into Test and Train Sets
			7.2.3.17 An Introduction to Classification Modeling with Support Vector Machines (SVM)
			7.2.3.18 Types of SVM
			7.2.3.19 Limitations of PCA
			7.2.3.20 PCA vs. t-SNE
	Conclusion
8. Hybrid Cellular Automata Models for Discrete Dynamical Systems
	8.1 Introduction
	8.2 Basic Concepts
		8.2.1 Cellular Automaton
	8.3 Discussions on CA Evolutions
		8.3.1 Relation between Local and Global Transition Function of a Spatially Hybrid CA
	8.4 CA Modeling of Dynamical Systems
		8.4.1 Spatially Hybrid CA Models
		8.4.2 Temporally Hybrid CA Models
		8.4.3 Spatially and Temporally Hybrid CA Models
	8.5 Conclusion
	References
9. An Efficient Imputation Strategy Based on Adaptive Filter for Large Missing Value Datasets
	9.1 Introduction
		9.1.1 Motivation
	9.2 Literature Survey
	9.3 Proposed Algorithm
	9.4 Experiment Procedure
		9.4.1 Data Collection
		9.4.2 Data Preprocessing
		9.4.3 Classification
		9.4.4 Evaluation
	9.5 Experiment Results and Discussion
	9.6 Conclusions and Future Work
	References
10. An Analysis of Derivative-Based Optimizers on Deep Neural Network Models
	10.1 Introduction
	10.2 Methodology
		10.2.1 SGD
		10.2.2 SGD with Momentum
		10.2.3 RMSprop
		10.2.4 Adagrad
		10.2.5 Adadelta
		10.2.6 Adam
		10.2.7 AdaMax
		10.2.8 NADAM
	10.3 Result and Analysis
	10.4 Conclusion
	References
Section III: Applications of Data Science and Data Analytics
11. Wheat Rust Disease Detection Using Deep Learning
	11.1 Introduction
	11.2 Literature Review
	11.3 Proposed Model
	11.4 Experiment and Results
		11.4.1 Dataset Preparation
		11.4.2 Image Pre-processing
		11.4.3 Image Segmentation
		11.4.4 Discussion for the Model on Grayscale Images
		11.4.5 Evaluating the Model on RGB Images
		11.4.6 Result Comparison of the Model on RGB Images Based on Learning Rate
	11.5 Conclusion
	References
12. A Novel Data Analytics and Machine Learning Model Towards Prediction and Classification of Chronic Obstructive Pulmonary Disease
	12.1 Introduction
	12.2 Literature Review
	12.3 Research Methodology
		12.3.1 Logistic Regression Model for Disease Classification
		12.3.2 Random Forest (RF) for Disease Classification
		12.3.3 SVM for Disease Classification
		12.3.4 Decision Tree Analyses for Disease Classification
		12.3.5 KNN Algorithm for Disease Classification
	12.4 Experiment Results
	12.5 Concluding Remarks and Future Scope
	12.6 Declarations
	References
13. A Novel Multimodal Risk Disease Prediction of Coronavirus by Using Hierarchical LSTM Methods
	13.1 Introduction
	13.2 Related Works
	13.3 About Multimodality
		13.3.1 Risk Factors
	13.4 Methodology
		13.4.1 Naïve Bayes (NB)
		13.4.2 RNN-Multimodal
		13.4.3 LSTM Model
		13.4.4 Support Vector Machine (SVM)
		13.4.5 Performance Evaluation
			13.4.5.1 Accuracy
			13.4.5.2 Specificity
			13.4.5.3 Sensitivity
			13.4.5.4 Precision
			13.4.5.5 F1-Score
	13.5 Experimental Analysis
	13.6 Discussion
	13.7 Conclusion
	13.8 Future Enhancement
	References
14. A Tier-based Educational Analytics Framework
	14.1 Introduction
	14.2 Related Works
	14.3 The Three-Tiered Education Analysis Framework
		14.3.1 Structured Data Analysis
			14.3.1.1 Techniques for Structured Data Analysis
				14.3.1.1.1 Correlation Analysis
				14.3.1.1.2 Association Mining
				14.3.1.1.3 Predictive Modeling
			14.3.1.2 Challenges in Structured Data Analysis
		14.3.2 Analysis of Semi-Structured Data and Text Analysis
			14.3.2.1 Use Cases for Analysis of Semi-Structured and Text Content
			14.3.2.2 Challenges of Semi-Structured/Textual Data Analysis
		14.3.3 Analysis of Unstructured Data
			14.3.3.1 Analysis of Unstructured Data: Study and Use Cases
			14.3.3.2 Challenges in Unstructured and Multimodal Educational Data Analysis
	14.4 Implementation of the Three-Tiered Framework
	14.5 Scope and Boundaries of the Framework
	14.6 Conclusion and Scope of Future Research
	Note
	References
15. Breast Invasive Ductal Carcinoma Classification Based on Deep Transfer Learning Models with Histopathology Images
	15.1 Introduction
	15.2 Background Study
		15.2.1 Breast Cancer Detection Based on Machine Learning Approach
		15.2.2 Breast Cancer Detection Based on Deep Convolutional Neural Network Approach
		15.2.3 Breast Cancer Detection Based on Deep Transfer Learning Approach
	15.3 Methodology
		15.3.1 Data Acquisition
		15.3.2 Data Preprocessing Stage
		15.3.3 Transfer Learning Model
			15.3.3.1 Visual Geometry Group Network (VGGNet)
			15.3.3.2 Residual Neural Network (ResNet)
			15.3.3.3 Dense Convolutional Networks (DenseNet)
	15.4 Experimental Setup and Results
		15.4.1 Performance Evaluation Metrics
		15.4.2 Training Phase
		15.4.3 Result Analysis
		15.4.4 Comparison with Other State of Art Models
	15.5 Discussion with Advantages and Future Work
		15.5.1 Discussion
		15.5.2 Advantages
		15.5.3 Future Works
	15.6 Conclusion
	References
16. Prediction of Acoustic Performance Using Machine Learning Techniques
	16.1 Introduction
	16.2 Materials and Methods
	16.3 Proposed Methodology
		16.3.1 Step 1: Data Preprocessing
		16.3.2 Step 2: Fitting Regression Model
		16.3.3 Building a Backward Elimination Model
		16.3.4 Building the Model Using Forward Selection Model
		16.3.5 Step 3: Optimizing the Regressor Model—Mean Squared Error
		16.3.6 Step 4: Understanding the Results and Cross Validation
		16.3.7 Step 5: Deployment and Optimization
			16.3.7.1 Structural Parameters of Each Layer Material Is Shown in
	16.4 Results and Discussions
		16.4.1 Error Analysis and Validating Model Performance for All Test Samples
	16.5 Conclusion
	References
Section IV: Issue and Challenges in Data Science and Data Analytics
17. Feedforward Multi-Layer Perceptron Training by Hybridized Method between Genetic Algorithm and Artificial Bee Colony
	17.1 Introduction
	17.2 Nature-Inspired Metaheuristics
	17.3 Genetic Algorithm Overview
	17.4 Proposed Hybridized GA Metaheuristic
	17.5 MLP Training by GGEABC
	17.6 Simulation Setup and Results
	17.7 Conclusion
	Acknowledgment
	References
18. Algorithmic Trading Using Trend Following Strategy: Evidence from Indian Information Technology Stocks
	18.1 Introduction
	18.2 Literature Survey
		18.2.1 Data and Period of Study
	18.3 Methodology
	18.4 Results and Discussions
	18.5 Conclusions
		18.5.1 Future Scope
	References
19. A Novel Data Science Approach for Business and Decision Making for Prediction of Stock Market Movement Using Twitter Data and News Sentiments
	19.1 Introduction
	19.2 Review of Literature
	19.3 Proposed Methodology
		19.3.1 Sentiment Score
		19.3.2 Labeling
		19.3.3 Feature Matrix
		19.3.4 Probabilistic Neural Network
	19.4 Numerical Results and Discussion
		19.4.1 Data Description
		19.4.2 Statistical Measure
	19.5 Simulation Results and Validation
		19.5.1 Comparative Analysis over Existing and Proposed Decision-Making Methods
	19.6 Conclusion and Future Enhancement
	References
20. Churn Prediction in the Banking Sector
	20.1 Introduction
		20.1.1 Problem Statement
		20.1.2 Current Scenario
		20.1.3 Motivation
		20.1.4 Objective
	20.2 Related Work
	20.3 Methodology
		20.3.1 Dataset
		20.3.2 Proposed System for Customer Churn Prediction
	20.4 Results
		20.4.1 Analysis of Clustering of Churned Customers
	20.5 Conclusion
	20.6 Future Work
	References
21. Machine and Deep Learning Techniques for Internet of Things Based Cloud Systems
	21.1 Introduction
		21.1.1 Power of Remote Computing
		21.1.2 Security and Privacy Policies
		21.1.3 Integration of Data
		21.1.4 For Hosting, Providers Remove Entry Barrier
		21.1.5 Improves Business Continuity
		21.1.6 Facilitates Inter-device Communication
		21.1.7 Pairing with Edge Computing
		21.1.8 How IoT and Cloud Complement Each Other?
		21.1.9 Cloud and IoT: Which Is Better?
		21.1.10 The Challenges Posed by the Cloud and IoT Together?
			21.1.10.1 Handling an Outsized Amount of Knowledge
			21.1.10.2 Networking and Communication Protocols
			21.1.10.3 Sensor Networks
			21.1.10.4 Security Challenges
	21.2 Security Issues in IoT-Based Cloud Systems
		21.2.1 Attacks in IoT
			21.2.1.1 Active Attack
			21.2.1.2 Passive Attack
	21.3 Machine Learning and Deep Learning: A Solution to Cyber Security Challenges in IoT-Based Cloud Systems
		21.3.1 Machine Learning and Deep Learning Techniques Introduction
			21.3.1.1 A Tour of Machine Learning Algorithms
				21.3.1.1.1 Regression Algorithms
				21.3.1.1.2 Instance-Based Algorithms
				21.3.1.1.3 Regularization Algorithms
				21.3.1.1.4 Decision Tree Algorithms
				21.3.1.1.5 Bayesian Algorithms
				21.3.1.1.6 Clustering Algorithms
				21.3.1.1.7 Association Rule Learning Algorithms
				21.3.1.1.8 Artificial Neural Network Algorithms
				21.3.1.1.9 Deep Learning Algorithms
				21.3.1.1.10 Dimensionality Reduction Algorithms
				21.3.1.1.11 Ensemble Algorithms
		21.3.2 Machine Learning and Deep Learning Techniques Used in IoT Security
			21.3.2.1 Supervised Machine Learning
				21.3.2.1.1 Decision Trees
				21.3.2.1.2 Support Vector Machines (SVMs)
				21.3.2.1.3 Bayesian Theorem-Based Algorithms
				21.3.2.1.4 K-Nearest Neighbor (KNN)
				21.3.2.1.5 Random Forest (RF)
				21.3.2.1.6 Association Rule (AR) Algorithms
				21.3.2.1.7 Ensemble Learning (EL)
			21.3.2.2 Unsupervised ML
				21.3.2.2.1 K-Means Clustering
				21.3.2.2.2 Principal Component Analysis (PCA)
			21.3.2.3 Deep Learning (DL) Methods for IoT Security
				21.3.2.3.1 Convolution Neural Networks (CNNs)
				21.3.2.3.2 Recurrent Neural Networks (RNNs)
			21.3.2.4 Unsupervised DL (Generative Learning)
				21.3.2.4.1 Deep Auto Encoders (AEs)
				21.3.2.4.2 Restricted Boltzmann Machines (RBMs)
				21.3.2.4.3 Deep Belief Networks (DBNs)
			21.3.2.5 Semi-Supervised or Hybrid DL
				21.3.2.5.1 Generative Adversarial Networks (GANs)
				21.3.2.5.2 Ensemble of DL Networks (EDLNs)
				21.3.2.5.3 Deep Reinforcement Learning (DRL)
	21.4 Conclusion
	References
Section V: Future Research Opportunities towards Data Science and Data Analytics
22. Dialect Identification of the Bengali Language
	22.1 Introduction
	22.2 Previous Works
	22.3 Proposed Methodology
		22.3.1 Computation of Features
			22.3.1.1 Feature Selection
				22.3.1.1.1 Zero Crossing Rate (ZCR) Based Feature Computation
				22.3.1.1.2 Mel Frequency Cepstral Coefficients (MFCCs) Based Feature Computation
				22.3.1.1.3 Skewness-Based Feature Computation
				22.3.1.1.4 Spectral Flux Based Feature Computation
		22.3.2 Formation of Feature Vector and Classification
	22.4 Experimental Results
		22.4.1 Relative Analysis
	22.5 Conclusion
	References
23. Real-Time Security Using Computer Vision
	23.1 Introduction
		23.1.1 Biometric
		23.1.2 Computer Vision
		23.1.3 OpenCV Library
	23.2 Data Security
	23.3 Technology
		23.3.1 Face Detection
		23.3.2 Face Recognition
		23.3.3 Haar Cascade Classifier
	23.4 Algorithm
		23.4.1 Algorithm to Capture the Image for Database
		23.4.2 Algorithm to Recognize the Face
		23.4.3 Algorithm to Train the Face Recognizer
		23.4.4 Algorithm for Security
	23.5 Result
	23.6 Conclusion
	23.7 Future Scope
	Reference
24. Data Analytics for Detecting DDoS Attacks in Network Traffic
	24.1 Introduction
	24.2 Background
	24.3 Related Work
	24.4 Methodology
		24.4.1 Oversampling and Synthetic Sampling of Data
		24.4.2 Detection of Stealthy DDoS Attacks
		24.4.3 Performance Evaluation by Ranking Machine Learning Algorithms
	24.5 Result and Discussion
		24.5.1 Datasets Used for Evaluation
		24.5.2 Evaluation Metrics Used
		24.5.3 Observations
	24.6 Conclusion
	Notes
	References
25. Detection of Patterns in Attributed Graph Using Graph Mining
	25.1 Introduction
	25.2 Research Background
	25.3 Literature Survey
	25.4 General Definitions
		25.4.1 Multi-relational Edge-attributed Graph
		25.4.2 Multi-layer Edge-attributed Graph
		25.4.3 Attributed Graph
	25.5 Problem Definition
	25.6 Proposed Approach
		25.6.1 Pattern Length of 4, 5, and
			25.6.1.1 For Length =
			25.6.1.2 For Length =
			25.6.1.3 For Length =
		25.6.2 Node-Pair Generations
			25.6.2.1 Node-Pair Generation for Three Attributed Line and Loop Patterns
			25.6.2.2 Node-Pair Generation for Four Attributed Line and Loop Patterns
			25.6.2.3 Node-Pair Generation for Four Attributed Star Patterns
			25.6.2.4 Node-Pair Generation for Five Attributed Elongated Star Patterns
		25.6.3 Pattern Detections
			25.6.3.1 Three-Attributed Line Pattern
			25.6.3.2 Three-Attributed Loop Pattern
			25.6.3.3 Four-Attributed Line Pattern
			25.6.3.4 Four-Attributed Loop Pattern
			25.6.3.5 Four-Attributed Star Pattern
			25.6.3.6 Five-Attributed Elongated Star Pattern
	25.7 Proposed Algorithm for Detection of Patterns - Line, Loop, Star, and Elongated Star
		25.7.1 Algorithm PDAGraph345( )
		25.7.2 Procedure for Node-Pair Assignment
		25.7.3 Procedure to Create Three-Attributed Line and Loop Patterns
		25.7.4 Procedure to Display Three-Attributed Line and Loop Patterns
		25.7.5 Procedure to Create Four-Attributed Line and Loop Patterns
		25.7.6 Procedure to Display Four-Attributed Line and Loop Patterns
		25.7.7 Procedure to Create Four-Attributed Star Patterns
		25.7.8 Procedure to Display Four-Attributed Star Patterns
		25.7.9 Procedure to Create Five-Attributed Elongated Star Patterns
		25.7.10 Procedure to Assign Node IDs of Five-Attributed Elongated Star Patterns
		25.7.11 Procedure to Display Five-Attributed Elongated Star Patterns
		25.7.12 Procedure to Generate Node-Pairs
		25.7.13 Explanation of PDAGraph345( )
	25.8 Experimental Results
		25.8.1 Using C++ Programming Language
			25.8.1.1 Three-Attributed Line Pattern (1-2-3)
			25.8.1.2 Three-Attributed Loop Pattern (2-3-4-2)
			25.8.1.3 Four-Attributed Line Pattern (1-3-2-4)
			25.8.1.4 Four-Attributed Loop Pattern (1-3-4-2-1)
			25.8.1.5 Four-Attributed Star Pattern (1-3-2-3-4)
			25.8.1.6 Five-Attributed Elongated Star Pattern (1-2-3-4-3-2)
		25.8.2 Using Python Programming Language
			25.8.2.1 Three-Attributed Line Pattern (1-2-3)
			25.8.2.2 Three-Attributed Loop Pattern (2-3-4-2)
			25.8.2.3 Four-Attributed Line Pattern (1-3-2-4)
			25.8.2.4 Four-Attributed Loop Pattern (1-3-4-2-1)
			25.8.2.5 Four-Attributed Star Pattern (1-3-2-3-4)
			25.8.2.6 Five-Attributed Elongated Star Pattern (1-2-3-4-3-2)
	25.9 Analysis of Experimental Results
	25.10 Conclusion
	References
26. Analysis and Prediction of the Update of Mobile Android Version
	26.1 Introduction
		26.1.1 Mobile Fragmentation
		26.1.2 Treble - Google
		26.1.3 Security Fix Support and Android Update
	26.2 Systematic Literature Survey
		26.2.1 API Compatibility Issues and Android Updates
		26.2.2 Android Updates and Software Aging
		26.2.3 Android Updates and Google Play Store
		26.2.4 Security Standards Hardware Rooted in Mobile Phones
		26.2.5 Security Fixes and Android Update
		26.2.6 Machine Learning and Android Antivirus Updates
		26.2.7 Smells Detection in Android Using Machine Learning
		26.2.8 Android Malicious Classification Using Various ML Algorithms
	26.3 Existing Techniques
	26.4 Methodology and Tools Used in Existing Techniques
	26.5 Proposed System
		26.5.1 Schematic Overview of Mobile Android Update Prediction and Analysis
		26.5.2 Flow Chart Depicting Mobile Android Update Prediction and Analysis
		26.5.3 Algorithm for the Prediction and Analysis
			26.5.3.1 Algorithm for Linear Regression Model and R Programming
			26.5.3.2 Algorithm for Logistic Regression Model
			26.5.3.3 Algorithm for Decision Tree Model
		26.5.4 Methodology
		26.5.5 Software Packages Used
		26.5.6 Dataset Description
			26.5.6.1 Attribute and Values Information
			26.5.6.2 Missing Attribute Values: None
	26.6 Experimental Results and Discussions
		26.6.1 Graphical Representation
	26.7 Conclusions and Future Work
	References
	Appendix: Datasets Sample Attachments
Index