Introduction to Machine Learning with Applications in Information Security, 2nd Edition
- Length: 432 pages
- Edition: 2
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2022-10-04
- ISBN-10: 1032204923
- ISBN-13: 9781032204925
- Sales Rank: #6977587
Introduction to Machine Learning with Applications in Information Security, Second Edition provides a classroom-tested introduction to a wide variety of machine learning and deep learning algorithms and techniques, reinforced via realistic applications. The book is accessible: it does not prove theorems or dwell on mathematical theory. The goal is to present topics at an intuitive level, with just enough detail to clarify the underlying concepts.
The book covers core classic machine learning topics in depth, including Hidden Markov Models (HMM), Support Vector Machines (SVM), and clustering. Additional machine learning topics include k-Nearest Neighbor (k-NN), boosting, Random Forests, and Linear Discriminant Analysis (LDA). The fundamental deep learning topics of backpropagation, Convolutional Neural Networks (CNN), Multilayer Perceptrons (MLP), and Recurrent Neural Networks (RNN) are covered in depth. A broad range of advanced deep learning architectures are also presented, including Long Short-Term Memory (LSTM), Generative Adversarial Networks (GAN), Extreme Learning Machines (ELM), Residual Networks (ResNet), Deep Belief Networks (DBN), Bidirectional Encoder Representations from Transformers (BERT), and Word2Vec. Finally, several cutting-edge deep learning topics are discussed, including dropout regularization, attention, explainability, and adversarial attacks.
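As a flavor of the first core topic listed above, here is a minimal sketch of the HMM forward algorithm, which computes the probability of an observation sequence given a model. The two-state, two-symbol model below is a hypothetical toy example chosen only to illustrate the recursion; it is not taken from the book.

```python
def forward(obs, pi, A, B):
    """Return P(obs | model) via the HMM forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : probability of transitioning from state i to state j
    B[i][k] : probability of emitting symbol k from state i
    """
    n = len(pi)
    # Initialization: alpha[i] = pi[i] * B[i][obs[0]]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: sum over all paths reaching state j at each time step
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: total probability of the observation sequence
    return sum(alpha)

# Hypothetical two-state model with symbols 0 and 1
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
p = forward([0, 1, 0], pi, A, B)
```

In practice the recursion is usually done with scaled probabilities (or log-probabilities) to avoid underflow on long sequences; the unscaled version above is only meant to show the structure of the algorithm.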
Most of the examples in the book are drawn from the field of information security, with many of the machine learning and deep learning applications focused on malware. The applications presented serve to demystify the topics by illustrating the use of various learning techniques in straightforward scenarios. Some of the exercises in this book require programming, and elementary computing concepts are assumed in a few of the application sections. However, anyone with a modest amount of computing experience should have no trouble with this aspect of the book.
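To give a sense of how such applications look in code, the sketch below classifies a sample with k-nearest neighbors, one of the techniques listed above. The feature vectors and labels are invented toy data (imagine normalized opcode frequencies for malware and benign files); they are not drawn from the book or from any real malware corpus.

```python
from collections import Counter
import math

def knn_predict(train, labels, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Indices of training points, ordered by Euclidean distance to x
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], x))
    # Majority vote among the k closest neighbors
    votes = Counter(labels[i] for i in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors: [fraction of opcode A, fraction of opcode B]
train = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],   # "malware"-like
         [0.20, 0.80], [0.10, 0.90], [0.15, 0.85]]   # "benign"-like
labels = ["malware"] * 3 + ["benign"] * 3

pred = knn_predict(train, labels, [0.82, 0.18], k=3)
```

The point of the toy example is the workflow, not the features: extract numeric features from samples, then let a simple learner separate the classes.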
Instructor resources, including PowerPoint slides, lecture videos, and other relevant material are provided on an accompanying website: http://www.cs.sjsu.edu/~stamp/ML/.
Table of Contents:
Cover, Half Title, Series Page, Title Page, Copyright Page, Contents, Preface
1. Introduction
1.1. Basic sampling concepts
1.1.1. Population parameters
1.1.2. Descriptive statistics vs. inference about a population
1.1.3. Random sampling vs. probability sampling
1.2. Design-based vs. model-based approach
1.3. Populations used in sampling experiments
1.3.1. Soil organic matter in Voorst, the Netherlands
1.3.2. Poppy fields in Kandahar, Afghanistan
1.3.3. Aboveground biomass in Eastern Amazonia, Brazil
1.3.4. Annual mean air temperature in Iberia

I. Probability sampling for estimating population parameters
2. Introduction to probability sampling
2.1. Horvitz-Thompson estimator
2.2. Hansen-Hurwitz estimator
2.3. Using models in design-based approach
3. Simple random sampling
3.1. Estimation of population parameters
3.1.1. Population proportion
3.1.2. Cumulative distribution function and quantiles
3.2. Sampling variance of estimator of population parameters
3.3. Confidence interval estimate
3.3.1. Confidence interval for a proportion
3.4. Simple random sampling of circular plots
3.4.1. Sampling from a finite set of fixed circles
3.4.2. Sampling from an infinite set of floating circles
4. Stratified simple random sampling
4.1. Estimation of population parameters
4.1.1. Population proportion, cumulative distribution function, and quantiles
4.1.2. Why should we stratify?
4.2. Confidence interval estimate
4.3. Allocation of sample size to strata
4.4. Cum-root-f stratification
4.5. Stratification with multiple covariates
4.6. Geographical stratification
4.7. Multiway stratification
4.8. Multivariate stratification
5. Systematic random sampling
5.1. Estimation of population parameters
5.2. Approximating the sampling variance of the estimator of the mean
6. Cluster random sampling
6.1. Estimation of population parameters
6.2. Clusters selected with probabilities proportional to size, without replacement
6.3. Simple random sampling of clusters
6.4. Stratified cluster random sampling
7. Two-stage cluster random sampling
7.1. Estimation of population parameters
7.2. Primary sampling units selected without replacement
7.3. Simple random sampling of primary sampling units
7.4. Stratified two-stage cluster random sampling
8. Sampling with probabilities proportional to size
8.1. Probability-proportional-to-size sampling with replacement
8.2. Probability-proportional-to-size sampling without replacement
8.2.1. Systematic pps sampling without replacement
8.2.2. The pivotal method
9. Balanced and well-spread sampling
9.1. Balanced sampling
9.1.1. Balanced sample vs. balanced sampling design
9.1.2. Unequal inclusion probabilities
9.1.3. Stratified random sampling
9.1.4. Multiway stratification
9.2. Well-spread sampling
9.2.1. Local pivotal method
9.2.2. Generalised random-tessellation stratified sampling
9.3. Balanced sampling with spreading
10. Model-assisted estimation
10.1. Generalised regression estimator
10.1.1. Simple and multiple regression estimators
10.1.2. Penalised least squares estimation
10.1.3. Regression estimator with stratified simple random sampling
10.2. Ratio estimator
10.2.1. Ratio estimators with stratified simple random sampling
10.2.2. Poststratified estimator
10.3. Model-assisted estimation using machine learning techniques
10.3.1. Predicting with a regression tree
10.3.2. Predicting with a random forest
10.4. Big data and volunteer data
11. Two-phase random sampling
11.1. Two-phase random sampling for stratification
11.2. Two-phase random sampling for regression
12. Computing the required sample size
12.1. Standard error
12.2. Length of confidence interval
12.2.1. Length of confidence interval for a proportion
12.3. Statistical testing of hypothesis
12.3.1. Sample size for testing a proportion
12.4. Accounting for design effect
12.5. Bayesian sample size determination
12.5.1. Bayesian criteria for sample size computation
12.5.2. Mixed Bayesian-likelihood approach
12.5.3. Estimation of population mean
12.5.4. Estimation of a population proportion
13. Model-based optimisation of probability sampling designs
13.1. Model-based optimisation of sampling design type and sample size
13.1.1. Analytical approach
13.1.2. Geostatistical simulation approach
13.1.3. Bayesian approach
13.2. Model-based optimisation of spatial strata
14. Sampling for estimating parameters of domains
14.1. Direct estimator for large domains
14.2. Model-assisted estimators for small domains
14.2.1. Regression estimator
14.2.2. Synthetic estimator
14.3. Model-based prediction
14.3.1. Random intercept model
14.3.2. Geostatistical model
14.4. Supplemental probability sampling of small domains
15. Repeated sample surveys for monitoring population parameters
15.1. Space-time designs
15.2. Space-time population parameters
15.3. Design-based generalised least squares estimation of spatial means
15.3.1. Current mean
15.3.2. Change of the spatial mean
15.3.3. Temporal trend of the spatial mean
15.3.4. Space-time mean
15.4. Case study: annual mean daily temperature in Iberia
15.4.1. Static-synchronous design
15.4.2. Independent synchronous design
15.4.3. Serially alternating design
15.4.4. Supplemented panel design
15.4.5. Rotating panel design
15.4.6. Sampling experiment
15.5. Space-time sampling with stratified random sampling in space

II. Sampling for mapping
16. Introduction to sampling for mapping
16.1. When is probability sampling not required?
16.2. Sampling for simultaneously mapping and estimating means
16.3. Broad overview of sampling designs for mapping
17. Regular grid and spatial coverage sampling
17.1. Regular grid sampling
17.2. Spatial coverage sampling
17.3. Spatial infill sampling
18. Covariate space coverage sampling
18.1. Covariate space infill sampling
18.2. Performance of covariate space coverage sampling in random forest prediction
19. Conditioned Latin hypercube sampling
19.1. Conditioned Latin hypercube infill sampling
19.2. Performance of conditioned Latin hypercube sampling in random forest prediction
20. Spatial response surface sampling
20.1. Increasing the sample size
20.2. Stratified spatial response surface sampling
20.3. Mapping
21. Introduction to kriging
21.1. Ordinary kriging
21.2. Block-kriging
21.3. Kriging with an external drift
21.4. Estimating the semivariogram
21.4.1. Method-of-moments
21.4.2. Maximum likelihood
21.5. Estimating the residual semivariogram
21.5.1. Iterative method-of-moments
21.5.2. Restricted maximum likelihood
22. Model-based optimisation of the grid spacing
22.1. Optimal grid spacing for ordinary kriging
22.2. Controlling the mean or a quantile of the ordinary kriging variance
22.3. Optimal grid spacing for block-kriging
22.4. Optimal grid spacing for kriging with an external drift
22.5. Bayesian approach
23. Model-based optimisation of the sampling pattern
23.1. Spatial simulated annealing
23.2. Optimising the sampling pattern for ordinary kriging
23.3. Optimising the sampling pattern for kriging with an external drift
23.4. Model-based infill sampling for ordinary kriging
23.5. Model-based infill sampling for kriging with an external drift
24. Sampling for estimating the semivariogram
24.1. Nested sampling
24.2. Independent sampling of pairs of points
24.3. Optimisation of sampling pattern for semivariogram estimation
24.3.1. Uncertainty about semivariogram parameters
24.3.2. Uncertainty about the kriging variance
24.4. Optimisation of sampling pattern for semivariogram estimation and mapping
24.5. A practical solution
25. Sampling for validation of maps
25.1. Map quality indices
25.1.1. Estimation of map quality indices
25.2. Real-world case study
25.2.1. Estimation of the population mean error and mean squared error
25.2.2. Estimation of the standard error of the estimator of the population mean error and mean squared error
25.2.3. Estimation of model efficiency coefficient
25.2.4. Statistical testing of hypothesis about population ME and MSE
26. Design-based, model-based, and model-assisted approach for sampling and inference
26.1. Two sources of randomness
26.2. Identically and independently distributed
26.3. Bias and variance
26.4. Effective sample size
26.5. Exploiting spatial structure in design-based approach
26.6. Model-assisted vs. model-dependent

A. Answers to exercises
Bibliography
Index