Data Mining and Predictive Analytics for Business Decisions: A Case Study Approach
- Length: 272 pages
- Edition: 1
- Language: English
- Publisher: Mercury Learning and Information
- Publication Date: 2023-01-30
- ISBN-10: 1683926757
- ISBN-13: 9781683926757
Recent advances in data science have given data analysts many more tools and techniques for extracting information from data sets. This book helps data analysts move up from simple tools such as Excel for descriptive analytics to answering more sophisticated questions with machine learning. Most of the exercises use R and Python, but rather than focusing on coding algorithms, the book uses interactive interfaces to these tools to perform the analysis. Following the CRISP-DM data mining standard, the early chapters cover the preparatory steps in data mining: translating business information needs into framed analytical questions, and data preparation. The Jamovi and JASP interfaces are used with R, and the Orange3 data mining interface is used with Python. Where appropriate, Voyant and other open-source programs are used for text analytics. The techniques covered range from basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics. The book includes companion files with the case study files, solution spreadsheets, data sets, charts, and other material from the book.
FEATURES:
- Covers techniques ranging from basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics
- Uses R, Python, Jamovi and JASP interfaces, and the Orange3 data mining interface
- Includes companion files with the case study files from the book, solution spreadsheets, data sets, etc.
TABLE OF CONTENTS:
- Cover
- Title Page
- Copyright
- Dedication
- Contents
- Preface
- Acknowledgments

Chapter 1: Data Mining and Business
- Data Mining Algorithms and Activities
- Data is the New Oil
- Data-Driven Decision-Making
- Business Analytics and Business Intelligence
- Algorithmic Technologies Associated with Data Mining
- Data Mining and Data Warehousing
- Case Study 1.1: Business Applications of Data Mining
- Case A – Classification
- Case B – Regression
- Case C – Anomaly Detection
- Case D – Time Series
- Case E – Clustering
- Reference

Chapter 2: The Data Mining Process
- Data Mining as a Process
- Exploration
- Analysis
- Interpretation
- Exploitation
- Selecting a Data Mining Process
- The CRISP-DM Process Model
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
- Selecting Data Analytics Languages
- The Choices for Languages
- References

Chapter 3: Framing Analytical Questions
- How Does CRISP-DM Define the Business and Data Understanding Step?
- The World of the Business Data Analyst
- How Does Data Analysis Relate to Business Decision-Making?
- How Do We Frame Analytical Questions?
- What Are the Characteristics of Well-framed Analytical Questions?
- Exercise 3.1 – Framed Questions About the Titanic Disaster
- Case Study 3.1 – The San Francisco Airport Survey
- Case Study 3.2 – Small Business Administration Loans
- References

Chapter 4: Data Preparation
- How Does CRISP-DM Define Data Preparation?
- Steps in Preparing the Data Set for Analysis
- Data Sources and Formats
- What is Data Shaping?
- The Flat-File Format
- Application of Tools for Data Acquisition and Preparation
- Exercise 4.1 – Shaping the Data File
- Exercise 4.2 – Cleaning the Data File
- Ensuring the Right Variables are Included
- Using SQL to Extract the Right Data Set from Data Warehouses
- Case Study 4.1: Cleaning and Shaping the SFO Survey Data Set
- Case Study 4.2: Shaping the SBA Loans Data Set
- Case Study 4.3: Additional SQL Queries
- Reference

Chapter 5: Descriptive Analysis
- Getting a Sense of the Data Set
- Describe the Data Set
- Explore the Data Set
- Verify the Quality of the Data Set
- Analysis Techniques to Describe the Variables
- Exercise 5.1 – Descriptive Statistics
- Distributions of Numeric Variables
- Correlation
- Exercise 5.2 – Descriptive Analysis of the Titanic Disaster Data
- Case Study 5.1: Describing the SFO Survey Data Set
- Solution Using R
- Solution Using Python
- Case Study 5.2: Describing the SBA Loans Data Set
- Solution Using R
- Solution Using Python
- Reference

Chapter 6: Modeling
- What is a Model?
- How Does CRISP-DM Define Modeling?
- Selecting the Modeling Technique
- Modeling Assumptions
- Generate Test Design
- Design of Model Testing
- Build the Model
- Parameter Setting
- Models
- Model Assessment
- Where Do Models Reside in a Computer?
- The Data Mining Engine
- The Model
- Data Sources and Outputs
- Traditional Data Sources
- Static Data Sources
- Real-Time Data Sources
- Analytic Outputs
- Model Building
- Step 1: Framing Questions
- Step 2: Selecting the Machine
- Step 3: Selecting Known Data
- Step 4: Training the Machine
- Step 5: Testing the Model
- Step 6: Deploying the Model
- Step 7: Collecting New Data
- Step 8: Updating the Model
- Step 9: Learning – Repeat Steps 7 and 8
- Step 10: Recommending Answers to the User
- Reference

Chapter 7: Predictive Analytics with Regression Models
- What is Supervised Learning?
- Regression to the Mean
- Linear Regression
- Simple Linear Regression
- The R-squared Coefficient
- The Use of the p-value of the Coefficients
- Strength of the Correlation Between Two Variables
- Exercise 7.1 – Using SLR Analysis to Understand Franchise Advertising
- Multivariate Linear Regression
- Preparing to Build the Multivariate Model
- Exercise 7.2 – Using Multivariate Linear Regression to Model Franchise Sales
- Logistic Regression
- What is Logistic Regression?
- Exercise 7.3 – PassClass Case Study
- Multivariate Logistic Regression
- Exercise 7.4 – MLR Used to Analyze the Results of a Database Marketing Initiative
- Where is Logistic Regression Used?
- Comparing Linear and Logistic Regressions for Binary Outcomes
- Case Study 7.1: Linear Regression Using the SFO Survey Data Set
- Solution in R
- Solution in Python
- Case Study 7.2: Linear Regression Using the SBA Loans Data Set
- Solution in R
- Solution in Python
- Case Study 7.3: Logistic Regression Using the SFO Survey Data Set
- Solution in R
- Solution in Python
- Case Study 7.4: Logistic Regression Using the SBA Loans Data Set
- Solution in R
- Solution in Python

Chapter 8: Classification
- Classification with Decision Trees
- Building a Decision Tree
- Exercise 8.1 – The Iris Data Set
- The Problem with Decision Trees
- Classification with Random Forest
- Using a Random Forest Model
- Exercise 8.2 – The Iris Data Set
- Classification with Naïve Bayes
- Exercise 8.3 – The HIKING Data Set
- Computing the Conditional Probabilities
- Case Study 8.1: Classification with the SFO Survey Data Set
- Solution in R
- Solution in Python
- Case Study 8.2: Classification with the SBA Loans Data Set
- Solution in R
- Solution in Python
- Case Study 8.3: Classification with the Florence Nightingale Data Set
- Solution in Python
- Reference

Chapter 9: Clustering
- What is Unsupervised Machine Learning?
- What is Clustering Analysis?
- Applying Clustering to Old Faithful Eruptions
- Examples of Applications of Clustering Analysis
- A Simple Clustering Example Using Regression
- Hierarchical Clustering
- Applying Hierarchical Clustering to Old Faithful Eruptions
- Exercise 9.1 – Hierarchical Clustering and the Iris Data Set
- K-Means Clustering
- How Does the K-Means Algorithm Compute Cluster Centroids?
- Applying K-Means Clustering to Old Faithful Eruptions
- Exercise 9.2 – K-Means Clustering and the Iris Data Set
- Hierarchical vs. K-Means Clustering
- Case Study 9.1: Clustering with the SFO Survey Data Set
- Solution in R
- Solution in Python
- Case Study 9.2: Clustering with the SBA Loans Data Set
- Solution in R
- Solution in Python

Chapter 10: Time Series Forecasting
- What is a Time Series?
- Time Series Analysis
- Types of Time Series Analysis
- What is Forecasting?
- Exercise 10.1 – Analysis of the US and China GDP Data Set
- Case Studies
- Case Study 10.1: Time Series Analysis of the SFO Survey Data Set
- Solution in Excel
- Case Study 10.2: Time Series Analysis of the SBA Loans Data set
- Solution in R
- Solution in Python
- Case Study 10.3: Time Series Analysis of a Nest Data Set
- Solution in Python
- Reference

Chapter 11: Feature Selection
- Using the Covariance Matrix
- Factor Analysis
- When to Use Factor Analysis
- First Step in FA – Correlation
- FA for Exploratory Analysis
- Selecting the Number of Factors – The Scree Plot
- Example 11.1: Restaurant Feedback
- Factor Interpretation
- Summary Activities to Perform a Factor Analysis
- Case Study 11.1: Variable Reduction with the SFO Survey Data Set
- Solution in R
- Solution in Python
- Case Study 11.2: Hunting Diamonds
- Solution in R
- Solution in Python

Chapter 12: Anomaly Detection
- What is an Anomaly?
- What is an Outlier?
- The Case Studies for the Exercises in Anomaly Detection
- Anomaly Detection by Standardization – A Single Numerical Variable
- Exercise 12.1 - Outliers in the Airline Delays Data Set - Z-Score
- Anomaly Detection by Quartiles - Tukey Fences - With a Single Variable
- Comparing Z-scores and Tukey Fences
- Exercise 12.2 – Outliers in the Airline Delays Data Set – Tukey Fences
- Anomaly Detection by Category - A Single Variable
- Exercise 12.3 – Outliers in the Airline Delays Data Set – Categorical
- Anomaly Detection by Clustering - Multiple Variables
- Exercise 12.4 - Outliers in the Airline Delays Data Set - Clustering
- Anomaly Detection Using Linear Regression by Residuals - Multiple Variables
- Exercise 12.5 - Outliers in the Airline Delays Data Set - Residuals
- Case Study 12.1: Outliers in the SFO Survey Data Set
- Solution in R
- Solution in Python
- Case Study 12.2: Outliers in the SBA Loans Data Set
- Solution in R
- Solution in Python
- References

Chapter 13: Text Data Mining
- What is Text Data Mining?
- What are Some Examples of Text-Based Analytical Questions?
- Tools for Text Data Mining
- Sources and Formats of Text Data
- Term Frequency Analysis
- How Does It Apply to Text Business Data Analysis?
- Exercise 13.1 – Case Study Using a Training Survey Data Set
- Word Frequency Analysis Using R
- Keyword Analysis
- Exercise 13.2 – Case Study Using Data Set D: Résumé and Job Description
- Keyword Word Analysis in Voyant
- Term Frequency Analysis in R
- Visualizing Text Data
- Exercise 13.3 – Case Study Using the Training Survey Data Set
- Visualizing the Text Using Excel
- Visualizing the Text Using Voyant
- Visualizing the Text Using R
- Text Similarity Scoring
- What is Text Similarity Scoring?
- Exercise 13.4 – Case Study Using the Occupation Description Data Set
- Analysis Using an Online Text Similarity Scoring Tool
- Similarity Scoring Analysis Using R
- Exercise 13.5 – Résumé and Job Descriptions Similarly Scoring Using R
- Case Study 13.1 – Term Frequency Analysis of Product Reviews
- Term Frequency Analysis Using Voyant
- Term Frequency Analysis Using R
- References

Chapter 14: Working with Large Data Sets
- Using Sampling to Work with Large Data Files
- Exercise 14.1 – Big Data Analysis
- Case Study 14.1 Using the BankComplaints Big Data File

Chapter 15: Visual Programming
- Comparing Visual Programming to Command-line Coding
- Leading Visual Programming Environments
- Visual Programming with the SAS Enterprise Guide
- Visual Programming with IBM SPSS
- Visual Programming with RapidMiner
- Visual Programming with Orange 3
- Installing Orange 3
- References

Index