Game Data Science
- Length: 416 pages
- Edition: 1
- Language: English
- Publisher: Oxford University Press
- Publication Date: 2021-10-19
- ISBN-10: 019289787X
- ISBN-13: 9780192897879
- Sales Rank: #1822333
Game data science, defined as the practice of deriving insights from game data, has created a revolution in the multibillion-dollar games industry, informing and enhancing production, design, and development processes. Almost all game companies and academics have now adopted some type of game data science, and every tool utilized by game developers allows collecting data from games. Yet until now there has been no definitive resource for academics and professionals in this rapidly developing sector.
Game Data Science delivers an excellent introduction to this new domain and provides the definitive guide to methods and practices of computer science, analytics, and data science as applied to video games. It is the ideal resource for academic students and professional learners seeking to understand how data science is used within the game development and production cycle, as well as within the interdisciplinary field of games research.
Organized into chapters that integrate laboratory and game data examples, this book provides a unique resource to train and educate both industry professionals and academics about the use of game data science. Practical exercises and examples show how such processes are implemented and used in academia and industry, interweaving theoretical learning with practical application throughout.
Cover; Game Data Science; Copyright; Foreword; Intuition vs. analytics; Generosity, collaboration, and tooling; Get smart; Preface; Intended Audience; Labs and Supplementary Materials; Acknowledgment; Contents; How to Read the Book
Chapter 1: Game Data Science: An Introduction
1.1 What is game data science? 1.2 What is game data? 1.3 Advantages of game data science 1.4 The historical context for game data science 1.4.1 The rise of the MMOG 1.4.2 Social network games 1.4.3 Democratizing data collection 1.4.4 Games User Research (GUR) 1.4.5 Games as a Service 1.4.6 Rise of machine learning and game data 1.4.7 Today 1.5 The process of game data science 1.6 Game data science: A glossary 1.6.1 Telemetry, metrics, and KPIs 1.6.2 Player, performance, and process metrics 1.6.2.1 Community Metrics 1.6.2.2 Customer Metrics 1.6.2.3 Gameplay Metrics 1.6.2.3.1 Engagement 1.6.2.3.2 Acquisition 1.6.2.3.3 Retention 1.6.2.3.4 Progression 1.7 Applications of metrics to game data science 1.8 Your journey begins 1.9 Takeaways and important terms; Exercises; Bibliography
Chapter 2: Data Preprocessing
2.1 Data preprocessing overview 2.2 Chapter overview 2.3 Programming languages and libraries for data preprocessing 2.4 VPAL data: A data example 2.5 Measurement types 2.5.1 Nominal/Categorical data 2.5.2 Ordinal data 2.5.3 Ratio and interval data 2.6 Data representation 2.6.1 Variables 2.6.2 Special data 2.7 Process for preprocessing, cleaning, and normalizing data 2.7.1 Step 1: Data cleaning 2.7.1.1 Data Structures: Representing Data In R 2.7.1.2 Understanding Your Data 2.7.1.3 Reading And Parsing Files 2.7.1.4 Data Type Checks 2.7.1.5 Data Format Checks And Conversions 2.7.2 Step 2: Data consistency processing 2.7.2.1 Removing “na” 2.7.2.2 Removing “na” By Imputation 2.7.2.3 Identifying And Dealing With Outliers 2.7.2.4 Identifying And Dealing With Special Types 2.7.2.5 Identifying And Dealing With Other Types Of Inconsistencies 2.7.3 Step 3: Data normalization and standardization 2.7.3.1 Scaling 2.7.3.2 Standardization 2.7.3.3 Data Transformation From Skewed To A Normal Distribution 2.8 Data transformation 2.9 Summary; Exercises; Bibliography
Chapter 3: Introduction to Statistics and Probability Theory
3.1 Descriptive statistics 3.1.1 Measures of centrality 3.1.1.1 Mean 3.1.1.2 Median 3.1.1.3 Mode 3.1.2 Measures of spread (dispersion) 3.1.2.1 Range 3.1.2.2 Variance And Standard Deviation 3.1.3 Relationships between variables 3.1.4 Correlation analysis 3.1.5 Modeling 3.2 Inferential statistics 3.2.1 Dependent and independent variables 3.2.2 t-tests 3.2.3 ANOVA 3.4 Introduction to probability 3.4.1 Probability 3.4.2 Joint probability 3.4.3 Conditional probability 3.4.4 Independence 3.4.5 Chain rule 3.4.6 Bayes’ theorem 3.4.7 Some common probability distributions 3.5 Summary; Exercises; Untitled
Chapter 4: Data Abstraction
4.1 Dataset 4.2 Feature engineering 4.3 Feature extraction 4.3.1 PCA: Principal Component Analysis 4.3.1.1 How Does PCA Work? 4.3.1.2 A Practical Example Of Applying PCA 4.3.1.3 How To Use The Principal Components 4.3.2 How do we deal with nominal and ordinal measures? 4.3.3 Pros and cons of using PCA or PCAMix 4.4 Feature selection 4.4.1 AIC 4.4.2 Information entropy 4.5 Summary; Exercises; Bibliography
Chapter 5: Data Analysis through Visualization
5.1 Heatmaps 5.1.1 What are heatmaps? 5.1.2 Heatmaps: Practical guide 5.1.3 Heatmaps and their use 5.1.4 Current state of the art of heatmaps 5.2 Spatio-temporal visualization systems 5.2.1 Visualizing flow 5.2.2 Stratmapper 5.2.3 Current state of the art of spatio-temporal visualization systems 5.3 State-action transition visualization systems 5.3.1 Representing states and actions 5.3.2 Visualizing state-actions in games 5.3.3 Using state-action transition visualization to compare players’ dialog choices 5.4 Summary and takeaways; Exercises; Acknowledgment; Bibliography
Chapter 6: Clustering Methods in Game Data Science
6.1 What is cluster analysis? 6.2 Clustering for exploratory analysis 6.3 Clustering models 6.3.1 Hierarchical clustering 6.3.2 Centroid clustering 6.3.3 Distribution clustering 6.3.4 Density clustering 6.4 The clustering process 6.5 Challenges in applying clustering 6.5.1 Cluster definition 6.5.2 Similarity definition 6.5.3 Cluster tendency 6.5.4 Outliers 6.6 Evaluation and tuning 6.6.1 Internal evaluation metrics 6.6.2 External evaluation metrics 6.6.3 Tuning methods 6.7 Partitional methods 6.7.1 K-means 6.7.1.1 Assumptions And Target Cluster Type 6.7.1.2 Method 6.7.1.3 Hyperparameter Tuning 6.7.1.4 A Practical Example Of Applying K-means 6.7.1.5 An Example From The Wild: TERA Online 6.7.1.5.1 Dataset 6.7.1.5.2 Data preparation and analysis 6.7.1.5.3 Normalizing and scaling input data 6.7.1.5.4 Performing clustering on TERA Online data 6.7.2 Fuzzy C-means (FCM) 6.7.2.1 Assumptions And Target Cluster Type 6.7.2.2 Method 6.7.2.3 Hyperparameter Tuning 6.7.2.4 A Practical Example Of Applying FCM 6.7.3 DBSCAN 6.7.3.1 Characteristics 6.7.3.2 Method 6.7.3.3 Hyperparameter Tuning 6.8 Hierarchical clustering methods 6.8.1 Characteristics 6.8.2 AHC 6.8.3 Hyperparameter tuning 6.9 Archetypal Analysis (AA) 6.9.1 Characteristics 6.9.2 Method 6.9.3 Hyperparameter tuning 6.9.4 Advanced method: Model-based Clustering (MC) 6.9.5 Characteristics 6.9.6 Method 6.9.7 Hyperparameter tuning 6.10 Advice on applying cluster methods to behavioral telemetry 6.10.1 Time dependency 6.10.2 High dimensionality and big data 6.10.3 Finding the right features 6.10.4 Mixtures of qualitative and quantitative data 6.11 Summary; Exercises; Untitled
Chapter 7: Supervised Learning in Game Data Science
7.1 Predictive model categorizations 7.1.1 Parametric vs. nonparametric models 7.1.2 Discriminative vs. generative 7.2 Regression methods 7.2.1 Linear regression 7.2.1.1 Method 7.2.1.2 Goodness Of Fit 7.2.2 Beyond linear regression 7.3 Classification methods 7.3.1 K-Nearest Neighbor (KNN) 7.3.1.1 Method 7.3.1.2 Hyperparameter Tuning 7.3.2 NB 7.3.2.1 Method 7.3.2.2 Hyperparameter Tuning 7.3.3 Logistic regression 7.3.3.1 Method 7.3.4 Linear Discriminant Analysis (LDA) 7.3.4.1 Method 7.3.4.2 Hyperparameter Tuning 7.3.5 SVMs 7.3.5.1 Method 7.3.5.2 Kernel Methods With SVMs 7.3.5.3 Hyperparameter Tuning 7.3.6 Decision Trees 7.3.6.1 Method 7.3.6.2 Hyperparameter Tuning 7.3.7 Random Forests 7.3.7.1 Method 7.3.7.2 By-products 7.3.7.3 Hyperparameter Tuning 7.4 Final remarks
Chapter 8: Supervised Learning in Game Data Science: Model Validation and Evaluation
8.1 Machine learning pipeline 8.2 Performance metrics 8.2.1 Classification metrics 8.2.2 Area under the ROC Curve (AUC) 8.2.3 Regression metrics 8.3 Model validation process 8.3.1 Cross validation vs. dedicated validation set 8.3.2 Automated algorithms for validation 8.4 Model evaluation process 8.5 Debugging a learned model 8.5.1 Overfitting and underfitting 8.5.2 Bias, variance, and irreducible errors 8.6 Final remarks: Productization; Exercises; Bibliography
Chapter 9: Neural Networks
9.1 Feedforward neural networks (FNNs) 9.1.1 Structure of the network 9.1.2 How does an FNN work? 9.1.3 Training an FNN 9.1.4 Advantages and disadvantages of FNNs 9.1.5 DNN in game data science 9.2 CNNs 9.2.1 CNN’s architecture 9.2.2 How does the convolutional layer work? 9.2.3 An example of CNN: AlexNet 9.2.4 CNN in game data science 9.2.4.1 Case Study 1: Player Experience Extraction From Gameplay Video 9.2.4.2 Deep Convolutional Player Modeling On Log And Level Data 9.3 Summary and conclusive remarks
Chapter 10: Sequence Analysis of Game Data
10.1 Data representation: sequences, events, and actions 10.2 Why sequence data analysis? 10.2.1 Understanding sequence data 10.2.2 Sequence data enables temporal and spatial analysis 10.2.3 Sequence mining enables player profiling: A case study 10.2.3.1 Overview And Research Approach 10.2.3.2 Results 10.2.3.3 Benefits Of Sequence-based Methods 10.3 DOTA 2 data 10.4 Representation of sequence data 10.5 Explorative sequence data analysis 10.5.1 Plotting sequences and frequent sequences 10.5.1.1 Examining Overall Patterns 10.5.1.2 Examining Most Frequent Patterns 10.5.1.3 Inspecting Frequent Patterns Per Important Game Elements (e.g., Location And Game Segment) 10.5.1.4 Examining Frequencies Of States Over Time 10.5.2 Entropy of states over sequence positions 10.5.3 Transition probabilities 10.5.4 Summary of exploratory analysis 10.6 Sequence pattern mining 10.6.1 What is sequence pattern mining? 10.6.2 SPADE 10.6.3 Applying SPADE 10.6.4 Summary of sequence pattern mining 10.7 Clustering of sequences 10.7.1 Simple distance measures based on counts 10.7.2 Optimal Matching (OM) 10.7.3 Clustering based on optimal matching distances 10.8 Summary and takeaways; Exercises; Acknowledgments; Bibliography
Chapter 11: Advanced Sequence Analysis
11.1 Probabilistic planning-based approach 11.1.1 The approach 11.1.2 What does this mean for current commercial games? 11.1.3 Applying this approach to games 11.2 Bayesian Networks (BNs) or Dynamic Bayesian Networks (DBNs) 11.2.1 The approach 11.2.2 What does this mean for current commercial games? 11.2.3 Applying this approach to games 11.3 Hidden Markov Models (HMMs) 11.3.1 The approach 11.3.2 What does this mean for current commercial games? 11.3.3 Applying this approach to games 11.4 Markov Decision Process (MDP) 11.4.1 The approach 11.4.2 What does this mean for current commercial games? 11.4.3 Applying this approach to games 11.4.4 An extension of this approach: POMDP 11.4.5 Can we apply POMDP in current commercial games? 11.5 Markov Logic Networks (MLNs) 11.5.1 The approach 11.5.2 What does this mean for current commercial games? 11.5.3 Applying this approach to games 11.6 Recurrent Neural Networks (RNNs) and Deep Recurrent Neural Networks (DRNNs) 11.6.1 The approach 11.6.2 What does this mean for current commercial games? 11.6.3 Applying RNN to games 11.7 Summary; Acknowledgments; Bibliography
Chapter 12: Case Study: Social Network Analysis Applied to In-game Communities to Identify Key Social Players
12.1 The game: Tom Clancy’s The Division (TCTD) 12.2 The dataset 12.3 Data analysis 12.3.1 Identifying influencers 12.3.2 Sampling comparison players: Power users and baseline players 12.3.3 Constructs and measures 12.4 Results 12.4.1 Descriptive statistics 12.4.2 Evaluation of impact 12.4.3 Retention and conversion to influencer 12.5 Discussion and conclusion
Chapter 13: Conclusions and Remarks
13.1 Summary of the game data science process 13.2 Few words of advice on validity and reliability of your results 13.2.1 Ensure correctness 13.2.2 Replicability and reproducibility 13.3 Ethics, biases, and data 13.4 Other topics in game data science 13.4.1 Distributed big data 13.4.2 Spatio-temporal analysis 13.4.3 Probabilistic models 13.4.4 Applications of game data science 13.5 Conclusion; Bibliography
Appendix A: Games Used in the Book
VPAL Game; Intro; House; Outside; The Sheriff’s Office; The Bar; Abandoned house; Silver mine; The Hotel; Data Collected through the Game; Defense of the Ancients (DOTA) Game; Bibliography
Index