Mechanizing Hypothesis Formation: Principles and Case Studies
- Length: 346 pages
- Edition: 1
- Language: English
- Publisher: CRC Press
- Publication Date: 2022-10-20
- ISBN-10: 0367549808
- ISBN-13: 9780367549800
Mechanizing hypothesis formation is an approach to exploratory data analysis. Its development started in the 1960s, inspired by the question “Can computers formulate and verify scientific hypotheses?” The development resulted in a general theory of the logic of discovery. It comprises theoretical calculi, which deal with theoretical statements, and observational calculi, which deal with observational statements concerning finite results of observation. The two kinds of calculi are related through statistical hypothesis tests. A GUHA method is a tool of the logic of discovery. It uses a one-to-one relation between theoretical and observational statements to get all interesting theoretical statements. A GUHA procedure generates all interesting observational statements and verifies them in the given observational data. The output of the procedure consists of all observational statements that are true in the given data. Several GUHA procedures, dealing with association rules, couples of association rules, action rules, histograms, couples of histograms, and patterns based on general contingency tables, are implemented in the LISp-Miner system developed at the Prague University of Economics and Business. Various results about observational calculi were achieved and applied together with the LISp-Miner system.
The book gives a brief overview of the logic of discovery. It presents many examples of applying the GUHA procedures to real problems relevant to data mining and business intelligence. It provides an overview of recent research results on dealing with domain knowledge in data mining and on automating it. It also presents firsthand experience with implementing the GUHA method in the Python language.
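The generate-and-verify principle described above can be illustrated with a minimal Python sketch. It is not code from the book, from LISp-Miner, or from CleverMiner: the data matrix `ROWS`, the function names, and the thresholds `min_conf` and `min_base` are all hypothetical, chosen only for illustration. The sketch enumerates candidate rules of the form antecedent → succedent, builds a four-fold contingency table (a, b, c, d) for each, and keeps the rules that pass a confidence-and-support test of the kind the GUHA literature calls founded implication.

```python
from itertools import combinations

# Hypothetical toy data matrix: each row is one observed object,
# each key is a categorical attribute. Not taken from the book.
ROWS = [
    {"smoker": "yes", "bmi": "high", "bp": "high"},
    {"smoker": "yes", "bmi": "high", "bp": "high"},
    {"smoker": "yes", "bmi": "normal", "bp": "normal"},
    {"smoker": "no", "bmi": "high", "bp": "high"},
    {"smoker": "no", "bmi": "normal", "bp": "normal"},
    {"smoker": "no", "bmi": "normal", "bp": "normal"},
]


def holds(row, literals):
    """Return True if the row satisfies the conjunction of (attribute, value) literals."""
    return all(row[attr] == value for attr, value in literals)


def four_fold_table(rows, antecedent, succedent):
    """Count rows by truth of antecedent and succedent: returns (a, b, c, d)."""
    a = b = c = d = 0
    for row in rows:
        ant, suc = holds(row, antecedent), holds(row, succedent)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def mine_rules(rows, target_attr, min_conf=0.9, min_base=2, max_len=2):
    """Generate all candidate rules and keep those verified in the data.

    A rule is kept when a >= min_base and a / (a + b) >= min_conf
    (illustrative thresholds, assumed here).
    """
    attrs = sorted(k for k in rows[0] if k != target_attr)
    literals = sorted({(k, r[k]) for r in rows for k in attrs})
    succedents = sorted({(target_attr, r[target_attr]) for r in rows})
    found = []
    for n in range(1, max_len + 1):
        for antecedent in combinations(literals, n):
            # Use at most one literal per attribute in a conjunction.
            if len({attr for attr, _ in antecedent}) < n:
                continue
            for suc in succedents:
                a, b, _, _ = four_fold_table(rows, antecedent, (suc,))
                if a >= min_base and a + b > 0 and a / (a + b) >= min_conf:
                    found.append((antecedent, suc, a, b))
    return found


for antecedent, succedent, a, b in mine_rules(ROWS, "bp"):
    print(antecedent, "->", succedent, "a =", a, "b =", b)
```

The exhaustive enumeration is only practical on tiny inputs; it is meant to show the shape of the idea (generate every candidate statement, then verify each one against the data), not an efficient search.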
Cover, Title Page, Copyright Page, Dedication, Preface, Table of Contents
1. Introduction 1.1 Mechanizing Hypothesis Formation 1.1.1 Questions of logic of discovery 1.1.2 Logic of discovery and observational calculi 1.1.3 GUHA method: tool of logic of discovery 1.1.4 Notes to history and overview of results 1.2 Data Mining 1.2.1 Discipline of informatics 1.2.2 CRISP-DM 1.2.3 Rules discovery 1.2.4 Exception rules and action rules 1.2.5 Subgroup discovery 1.2.6 Presented GUHA procedures and data mining 1.3 Business Intelligence and Data Science 1.3.1 Business Intelligence and GUHA procedures 1.3.2 Data science and mechanizing hypothesis formation 1.4 Data Matrix 1.4.1 Data matrix: an example 1.4.2 Data matrix: definition 1.4.3 Boolean attributes 1.4.4 Data sub-matrix 1.5 Data Matrix and Items of Domain Knowledge 1.5.1 Groups of attributes 1.5.2 Transformations of attributes 1.5.3 Global properties of attributes 1.5.4 Mutual dependence of attributes 1.6 Goals, Structure, and Using the Book 1.6.1 Goals and structure 1.6.2 Using the book
2. Datasets 2.1 Which Datasets and Where They Are Used 2.2 Adult Dataset 2.2.1 Adult Dataset: basic info 2.2.2 Adult Dataset: derived attributes 2.2.3 Adult Dataset: items of domain knowledge 2.3 UK Car Accidents Dataset 2.3.1 Accidents data matrix and groups of attributes 2.3.2 Group Date_Time 2.3.3 Group Driver 2.3.4 Group Conditions 2.3.5 Group Vehicle 2.3.6 Group Authorities 2.3.7 Group Consequences 2.4 STULONG Dataset 2.4.1 Entry data matrix and groups of attributes 2.4.2 Group Personal 2.4.3 Group Anamnesis 2.4.4 Group Risks 2.4.5 Group Measurement 2.4.6 Group Alcohol consumption 2.4.7 Group Blood pressure 2.4.8 Group Biochemical examination 2.5 Fictive Hotel Dataset 2.5.1 HotelPlusExternal data matrix and groups of attributes 2.5.2 Group Guest 2.5.3 Group Domicile 2.5.4 Group Meteo 2.5.5 Group Questionnaire 2.5.6 Group Stay 2.5.7 Group Check-in 2.5.8 Group Price
Section I: The GUHA Procedures
3. Principle and Simple Examples 3.1 GUHA Procedures Principle 3.2 Association Rules and 4ft-Miner 3.3 Histograms and CF-Miner 3.4 Pairs of Attributes and KL-Miner 3.5 Couples of Association Rules and SD4ft-Miner 3.6 Couples of Histograms and SDCF-Miner 3.7 Couples of Pairs of Attributes and SDKL-Miner 3.8 Action Rules and Ac4ft-Miner
4. Common Features 4.1 Overview of Procedures and Patterns 4.2 Contingency Tables 4.3 Principles of Patterns Evaluation 4.4 Set of Relevant Boolean Attributes 4.4.1 Literals and types of coefficients 4.4.2 Set of relevant literals 4.4.3 Example of partial cedents 4.4.4 Set of relevant partial cedents 4.4.5 Set of relevant cedents 4.5 Missing Information 4.5.1 Data matrices with missing information 4.5.2 Secured completion of missings and Boolean attributes
5. LISp-Miner System 5.1 Overview of LISp-Miner 5.1.1 Teaching and research tool 5.1.2 Home page 5.2 Requirements and Prerequisites 5.3 Main Concept 5.3.1 Context diagram 5.3.2 Analysed data 5.3.3 Metabase 5.3.4 Knowledgebase 5.3.5 Context diagram of GUHA-procedure 5.3.6 LM Workspace module 5.3.7 Data-mining automation module 5.4 EverMinerSimple Demo 5.5 System Design and Implementation 5.5.1 Programming language and environment 5.5.2 Implementation layers 5.5.3 Bitstrings
Section II: Applying the GUHA Procedures
6. Examples Overview 6.1 Overview of 4ft-Miner Application Examples 6.1.1 4ft-Miner and arules 6.1.2 Applying important features of GUHA association rules 6.1.3 Mining for exception GUHA association rules 6.2 Overview of CF-Miner Application Examples 6.2.1 Subgroup discovery in Adult dataset 6.2.2 Subgroup discovery in Accidents dataset 6.3 Overview of KL-Miner Application Examples 6.3.1 Blood pressure: ordinal dependence and independence 6.3.2 Subgroup discovery using range of quantifiers 6.4 Overview of SD4ft-Miner Application Examples 6.4.1 Comparing districts 6.4.2 Comparing female and male drivers 6.5 Overview of SDCF-Miner Application Examples 6.5.1 Exceptional histograms and authorities 6.5.2 Trends of the number of accidents and police forces 6.6 Overview of SDKL-Miner Applications 6.7 Overview of Ac4ft-Miner Application Examples 6.7.1 Action rules and blood pressure 6.7.2 Action rules and guest satisfaction 6.8 GUHA and Business Intelligence: Overview 6.9 GUHA and Python: CleverMiner Project 6.10 Examples Summary 6.10.1 Applying coefficients 6.10.2 Applying partial cedents 6.11 Important Notes
7. 4ft-Miner: GUHA Association Rules 7.1 GUHA Association Rules and 4ft-Miner Procedure 7.1.1 GUHA association rules and related notions 7.1.2 4ft-quantifiers for classical mode of 4ft-Miner 7.1.3 4ft-quantifiers for histogram mode of 4ft-Miner 7.1.4 Association rules and missing information 7.1.5 Secured completion and association rules 7.1.6 Ignoring missing information 7.1.7 Prime association rules 7.1.8 4ft-Miner input and output 7.2 Comparing 4ft-Miner and Arules 7.2.1 Principles of comparison 7.2.2 Performance 7.2.3 Comparing ignoring missings and secured completion 7.2.4 Loss of some interesting rules 7.2.5 Applying GUHA features 7.2.6 Summary of comparison 7.3 Applying 4ft-Miner in Adult Dataset 7.3.1 Applying sequences and right cuts: extreme gain 7.3.2 Conjunctions in succedent: very rich persons 7.3.3 Disjunctions in succedent: rich persons 7.3.4 Applying logical deduction: prime rules 7.4 Applying 4ft-Miner in Accidents Dataset 7.4.1 Exception rules: increasing columns of histogram 7.4.2 Exception rules: lowering columns of histogram 7.4.3 Exception from exception: increasing confidence
8. CF-Miner: Histograms 8.1 CF-Miner and Related Notions 8.1.1 Conditional histogram, CF-table and CF-pattern 8.1.2 CF-Miner input and output 8.1.3 Range of CF-quantifiers 8.1.4 Simple frequencies CF-quantifiers 8.1.5 CF-quantifiers concerning steps in histogram 8.2 Applying CF-Miner to Adult Dataset 8.2.1 Increasing histograms 8.2.2 Decreasing histograms 8.2.3 First decreasing and then increasing histograms 8.3 Applying CF-Miner to Accidents Dataset 8.3.1 Large segments of accidents with decreasing trend 8.3.2 Exceptions to generally decreasing trend 8.3.3 Exceptions to a concrete decreasing trend 8.3.4 Exceptions to exception to generally decreasing trend
9. KL-Miner: Pairs of Categorical Attributes 9.1 KL-Miner and Related Notions 9.1.1 KL-Miner input and output 9.1.2 Four types of frequencies 9.1.3 Range of KL-quantifiers 9.1.4 Simple frequencies KL-quantifiers 9.1.5 Advanced KL-quantifiers 9.2 Applying KL-Miner in STULONG Dataset 9.2.1 Conditions indicating high ordinal dependence 9.2.2 Conditions indicating almost ordinal independence 9.3 Applying KL-Miner in Hotel Dataset 9.3.1 Applying range of KL-quantifier
10. SD4ft-Miner: Couples of GUHA Association Rules 10.1 SD4ft-Miner and Related Notions 10.1.1 SD4ft-Miner input and output 10.1.2 SD4ft-quantifiers 10.2 Applying SD4ft-Miner in Accidents Dataset 10.2.1 Differences among districts 10.2.2 Confidence in districts higher than in the whole dataset 10.2.3 Confidence in districts lower than in the whole dataset 10.2.4 Relative frequency of accidents higher for male drivers 10.2.5 Relative frequency of accidents higher for female drivers 10.2.6 Similarities between male and female
11. SDCF-Miner: Couples of Histograms 11.1 SDCF-Miner and Related Notions 11.1.1 SDCF-Miner input and output 11.1.2 Modes of SDCF-Miner and SDCF-tables 11.1.3 Simple frequencies SDCF-quantifiers 11.1.4 SDCF-quantifiers concerning steps in histogram 11.2 Applying SDCF-Miner in Accidents Dataset 11.2.1 Exceptions to increasing trends and authorities 11.2.2 Differences between police forces
12. SDKL-Miner: Couples of Pairs of Categorical Attributes 12.1 SDKL-Miner and Related Notions 12.1.1 SDKL-Miner input and output 12.1.2 SDKL-quantifiers 12.2 Applying SDKL-Miner in STULONG Dataset 12.2.1 Drinking liquors: groups with the highest τB difference 12.2.2 Drinking wine: groups with the highest τB difference 12.2.3 Drinking beer: groups with the highest τB difference
13. Ac4ft-Miner: Action Rules 13.1 Ac4ft-Miner and Related Notions 13.1.1 Flexible and stable attributes 13.1.2 Changes of Boolean attributes 13.1.3 Action rules 13.1.4 Ac4ft-quantifiers and truthfulness of action rules 13.1.5 Relevant changes of Boolean attribute 13.1.6 Ac4ft-Miner procedure input and output 13.2 Applying Ac4ft-Miner in STULONG Dataset 13.2.1 Two analytical questions: common features 13.2.2 BMI and decreasing probability of high blood pressure 13.2.3 Increasing probability of average blood pressure 13.3 Applying Ac4ft-Miner in Hotel Dataset 13.3.1 Increasing guest satisfaction 13.3.2 Increasing guest satisfaction and consequences
14. GUHA Procedures and Business Intelligence 14.1 Business Intelligence and Self Service BI 14.2 Comparing Analysis Performed by Self Service BI and GUHA 14.3 Scenarios of Complementary Usage of BI and GUHA 14.3.1 Gaining insight into specific (interesting) part of the dataset 14.3.2 Automatic BI analysis using GUHA data mining 14.4 Examples on Accidents Dataset 14.4.1 Automatic BI analysis using GUHA data mining 14.4.2 Gaining insight into specific parts of dataset 14.5 Possible Extension of the Work
15. CleverMiner: GUHA and Python 15.1 Why GUHA in Python 15.2 Goals of Python Implementation of GUHA 15.3 Data Requirements and Representation on Analyzed Data Matrix 15.3.1 Requirements on input matrix and how to achieve it 15.3.2 Internal representation of data matrix in CleverMiner 15.4 CleverMiner Procedures 15.4.1 General parameters and calling 15.4.2 Quantifiers for individual GUHA procedures 15.5 Calling CleverMiner Procedures 15.6 Future Plans with CleverMiner
Section III: Related Research and Theory
16. Artificial Data Generation and LM ReverseMiner Module 16.1 Evolutionary Approach 16.2 Evolutionary Operations 16.2.1 Evolutionary Fitness 16.3 ReverseMiner Module 16.3.1 Evolutionary Task Definition 16.3.2 Evolutionary process 16.3.3 Repeatability of evolution 16.4 Evolution Helpers 16.5 Artificial Data Hotel 16.5.1 Data Specifications 16.5.2 Requirements Checklist 16.5.3 Evolution setup 16.5.4 Data Generation 16.5.5 Experiences 16.5.6 Advantages and limitations
17. Applying Domain Knowledge 17.1 Expert Deduction Rules and Association Rules 17.1.1 Informal considerations 17.1.2 Applying expert deduction rules to association rules 17.2 Items of Domain Knowledge and Association Rules 17.2.1 BMI ↑↑ Diastolic: principle of application 17.2.2 Applying 4ft-Miner 17.2.3 Atomic consequences of BMI ↑↑ Diastolic 17.2.4 Logical consequences of atomic consequence 17.2.5 Consequences of BMI ↑↑ Diastolic 17.2.6 Interpreting results of 4ft-Miner 17.3 Expert Deduction Rules and Histograms 17.3.1 Expert deduction and histograms: considerations 17.3.2 Applying expert deduction rules to histograms
18. Observational Calculi 18.1 Definition and Overview of Results 18.1.1 Logical calculus of association rules 18.1.2 Classes of association rules 18.1.3 Missing information in calculus of association rules 18.1.4 Deduction rules in calculus of association rules 18.1.5 Logical calculus of histograms 18.1.6 Research challenges to observational calculi 18.2 Expert Deduction Rules 18.2.1 Informally on expert deduction and association rules 18.2.2 Expert deduction rules for association rules 18.2.3 Results on expert deduction rules for association rules 18.2.4 Open problems and challenges
References
Index