Game Data Science

by Alessandro Canossa, Anders Drachen, Magy Seif El-Nasr, Truong-Huy D. Nguyen

Length: 416 pages
Edition: 1
Language: English
Publisher: Oxford University Press
Publication Date: 2021-10-19
ISBN-10: 019289787X
ISBN-13: 9780192897879
Sales Rank: #1822333

Game data science, defined as the practice of deriving insights from game data, has created a revolution in the multibillion-dollar games industry – informing and enhancing production, design, and development processes. Almost all game companies and academics have now adopted some type of game data science, and every tool utilized by game developers allows collecting data from games. Yet until now there has been no definitive resource for academics and professionals in this rapidly developing sector.

Game Data Science delivers an excellent introduction to this new domain and provides the definitive guide to methods and practices of computer science, analytics, and data science as applied to video games. It is the ideal resource for academic students and professional learners seeking to understand how data science is used within the game development and production cycle, as well as within the interdisciplinary field of games research.

Organized into chapters that integrate laboratory and game data examples, this book provides a unique resource to train and educate both industry professionals and academics about the use of game data science, with practical exercises and examples on how such processes are implemented and used in academia and industry, interweaving theoretical learning with practical application throughout.

Cover
Game Data Science
Copyright
Foreword
	Intuition vs. analytics
	Generosity, collaboration, and tooling
	Get smart
Preface
	Intended Audience
	Labs and Supplementary Materials
Acknowledgment
Contents
How to Read the Book
Chapter 1: Game Data Science: An Introduction
	1.1 What is game data science?
	1.2 What is game data?
	1.3 Advantages of game data science
	1.4 The historical context for game data science
		1.4.1 The rise of the MMOG
		1.4.2 Social network games
		1.4.3 Democratizing data collection
		1.4.4 Games User Research (GUR)
		1.4.5 Games as a Service
		1.4.6 Rise of machine learning and game data
		1.4.7 Today
	1.5 The process of game data science
	1.6 Game data science: A glossary
		1.6.1 Telemetry, metrics, and KPIs
		1.6.2 Player, performance, and process metrics
			1.6.2.1 Community Metrics
			1.6.2.2 Customer Metrics
			1.6.2.3 Gameplay Metrics
				1.6.2.3.1 Engagement
				1.6.2.3.2 Acquisition
				1.6.2.3.3 Retention
				1.6.2.3.4 Progression
	1.7 Applications of metrics to game data science
	1.8 Your journey begins
	1.9 Takeaways and important terms
	Exercises
	Bibliography
Chapter 2: Data Preprocessing
	2.1 Data preprocessing overview
	2.2 Chapter overview
	2.3 Programming languages and libraries for data preprocessing
	2.4 VPAL data: A data example
	2.5 Measurement types
		2.5.1 Nominal/Categorical data
		2.5.2 Ordinal data
		2.5.3 Ratio and interval data
	2.6 Data representation
		2.6.1 Variables
		2.6.2 Special data
	2.7 Process for preprocessing, cleaning, and normalizing data
		2.7.1 Step 1: Data cleaning
			2.7.1.1 Data Structures: Representing Data In R
			2.7.1.2 Understanding Your Data
			2.7.1.3 Reading And Parsing Files
			2.7.1.4 Data Type Checks
			2.7.1.5 Data Format Checks And Conversions
		2.7.2 Step 2: Data consistency processing
			2.7.2.1 Removing “na”
			2.7.2.2 Removing “na” By Imputation
			2.7.2.3 Identifying And Dealing With Outliers
			2.7.2.4 Identifying And Dealing With Special Types
			2.7.2.5 Identifying And Dealing With Other Types Of Inconsistencies
		2.7.3 Step 3: Data normalization and standardization
			2.7.3.1 Scaling
			2.7.3.2 Standardization
			2.7.3.3 Data Transformation From Skewed To A Normal Distribution
	2.8 Data transformation
	2.9 Summary
	Exercises
	Bibliography
Chapter 3: Introduction to Statistics and Probability Theory
	3.1 Descriptive statistics
		3.1.1 Measures of centrality
			3.1.1.1 Mean
			3.1.1.2 Median
			3.1.1.3 Mode
		3.1.2 Measures of spread (dispersion)
			3.1.2.1 Range
			3.1.2.2 Variance And Standard Deviation
		3.1.3 Relationships between variables
		3.1.4 Correlation analysis
		3.1.5 Modeling
	3.2 Inferential statistics
		3.2.1 Dependent and independent variables
		3.2.2 t-tests
		3.2.3 ANOVA
	3.4 Introduction to probability
		3.4.1 Probability
		3.4.2 Joint probability
		3.4.3 Conditional probability
		3.4.4 Independence
		3.4.5 Chain rule
		3.4.6 Bayes’ theorem
		3.4.7 Some common probability distributions
	3.5 Summary
	Exercises
	Bibliography
Chapter 4: Data Abstraction
	4.1 Dataset
	4.2 Feature engineering
	4.3 Feature extraction
		4.3.1 PCA: Principal Component Analysis
			4.3.1.1 How Does PCA Work?
			4.3.1.2 A Practical Example Of Applying PCA
			4.3.1.3 How To Use The Principal Components
		4.3.2 How do we deal with nominal and ordinal measures?
		4.3.3 Pros and cons of using PCA or PCAMix
	4.4 Feature selection
		4.4.1 AIC
		4.4.2 Information entropy
	4.5 Summary
	Exercises
	Bibliography
Chapter 5: Data Analysis through Visualization
	5.1 Heatmaps
		5.1.1 What are heatmaps?
		5.1.2 Heatmaps—Practical guide
		5.1.3 Heatmaps and their use
		5.1.4 Current state of the art of heatmaps
	5.2 Spatio-temporal visualization systems
		5.2.1 Visualizing flow
		5.2.2 Stratmapper
		5.2.3 Current state of the art of spatio-temporal visualization systems
	5.3 State-action transition visualization systems
		5.3.1 Representing states and actions
		5.3.2 Visualizing state-actions in games
		5.3.3 Using state-action transition visualization to compare players’ dialog choices
	5.4 Summary and takeaways
	Exercises
	Acknowledgment
	Bibliography
Chapter 6: Clustering Methods in Game Data Science
	6.1 What is cluster analysis?
	6.2 Clustering for exploratory analysis
	6.3 Clustering models
		6.3.1 Hierarchical clustering
		6.3.2 Centroid clustering
		6.3.3 Distribution clustering
		6.3.4 Density clustering
	6.4 The clustering process
	6.5 Challenges in applying clustering
		6.5.1 Cluster definition
		6.5.2 Similarity definition
		6.5.3 Cluster tendency
		6.5.4 Outliers
	6.6 Evaluation and tuning
		6.6.1 Internal evaluation metrics
		6.6.2 External evaluation metrics
		6.6.3 Tuning methods
	6.7 Partitional methods
		6.7.1 K-means
			6.7.1.1 Assumptions And Target Cluster Type
			6.7.1.2 Method
			6.7.1.3 Hyperparameter Tuning
			6.7.1.4 A Practical Example Of Applying K-means
			6.7.1.5 An Example From The Wild: TERA Online
				6.7.1.5.1 Dataset
				6.7.1.5.2 Data preparation and analysis
				6.7.1.5.3 Normalizing and scaling input data
				6.7.1.5.4 Performing clustering on TERA Online data
		6.7.2 Fuzzy C-means (FCM)
			6.7.2.1 Assumptions And Target Cluster Type
			6.7.2.2 Method
			6.7.2.3 Hyperparameter Tuning
			6.7.2.4 A Practical Example Of Applying FCM
		6.7.3 DBSCAN
			6.7.3.1 Characteristics
			6.7.3.2 Method
			6.7.3.3 Hyperparameter Tuning
	6.8 Hierarchical clustering methods
		6.8.1 Characteristics
		6.8.2 AHC
		6.8.3 Hyperparameter tuning
	6.9 Archetypal Analysis (AA)
		6.9.1 Characteristics
		6.9.2 Method
		6.9.3 Hyperparameter tuning
		6.9.4 Advanced method: Model-based Clustering (MC)
		6.9.5 Characteristics
		6.9.6 Method
		6.9.7 Hyperparameter tuning
	6.10 Advice on applying cluster methods to behavioral telemetry
		6.10.1 Time dependency
		6.10.2 High dimensionality and big data
		6.10.3 Finding the right features
		6.10.4 Mixtures of qualitative and quantitative data
	6.11 Summary
	Exercises
	Bibliography
Chapter 7: Supervised Learning in Game Data Science
	7.1 Predictive model categorizations
		7.1.1 Parametric vs. nonparametric models
		7.1.2 Discriminative vs. generative
	7.2 Regression methods
		7.2.1 Linear regression
			7.2.1.1 Method
			7.2.1.2 Goodness Of Fit
		7.2.2 Beyond linear regression
	7.3 Classification methods
		7.3.1 K-Nearest Neighbor (KNN)
			7.3.1.1 Method
			7.3.1.2 Hyperparameter Tuning
		7.3.2 NB
			7.3.2.1 Method
			7.3.2.2 Hyperparameter Tuning
		7.3.3 Logistic regression
			7.3.3.1 Method
		7.3.4 Linear Discriminant Analysis (LDA)
			7.3.4.1 Method
			7.3.4.2 Hyperparameter Tuning
		7.3.5 SVMs
			7.3.5.1 Method
			7.3.5.2 Kernel Methods With SVMs
			7.3.5.3 Hyperparameter Tuning
		7.3.6 Decision Trees
			7.3.6.1 Method
			7.3.6.2 Hyperparameter Tuning
		7.3.7 Random Forests
			7.3.7.1 Method
			7.3.7.2 By-products
			7.3.7.3 Hyperparameter Tuning
	7.4 Final remarks
Chapter 8: Supervised Learning in Game Data Science: Model Validation and Evaluation
	8.1 Machine learning pipeline
	8.2 Performance metrics
		8.2.1 Classification metrics
		8.2.2 Area under the ROC Curve (AUC)
		8.2.3 Regression metrics
	8.3 Model validation process
		8.3.1 Cross validation vs. dedicated validation set
		8.3.2 Automated algorithms for validation
	8.4 Model evaluation process
	8.5 Debugging a learned model
		8.5.1 Overfitting and underfitting
		8.5.2 Bias, variance, and irreducible errors
	8.6 Final remarks: Productization
	Exercises
	Bibliography
Chapter 9: Neural Networks
	9.1 Feedforward neural networks (FNNs)
		9.1.1 Structure of the network
		9.1.2 How does an FNN work?
		9.1.3 Training an FNN
		9.1.4 Advantages and disadvantages of FNNs
		9.1.5 DNN in game data science
	9.2 CNNs
		9.2.1 CNN’s architecture
		9.2.2 How does the convolutional layer work?
		9.2.3 An example of CNN: AlexNet
		9.2.4 CNN in game data science
			9.2.4.1 Case Study 1: Player Experience Extraction From Gameplay Video
			9.2.4.2 Deep Convolutional Player Modeling On Log And Level Data
	9.3 Summary and conclusive remarks
Chapter 10: Sequence Analysis of Game Data
	10.1 Data representation: sequences, events, and actions
	10.2 Why sequence data analysis?
		10.2.1 Understanding sequence data
		10.2.2 Sequence data enables temporal and spatial analysis
		10.2.3 Sequence mining enables player profiling: A case study
			10.2.3.1 Overview And Research Approach
			10.2.3.2 Results
			10.2.3.3 Benefits Of Sequence-based Methods
	10.3 DOTA 2 data
	10.4 Representation of sequence data
	10.5 Explorative sequence data analysis
		10.5.1 Plotting sequences and frequent sequences
			10.5.1.1 Examining Overall Patterns
			10.5.1.2 Examining Most Frequent Patterns
			10.5.1.3 Inspecting Frequent Patterns Per Important Game Elements (e.g., Location And Game Segment)
			10.5.1.4 Examining Frequencies Of States Over Time
		10.5.2 Entropy of states over sequence positions
		10.5.3 Transition probabilities
		10.5.4 Summary of exploratory analysis
	10.6 Sequence pattern mining
		10.6.1 What is sequence pattern mining?
		10.6.2 SPADE
		10.6.3 Applying SPADE
		10.6.4 Summary of sequence pattern mining
	10.7 Clustering of sequences
		10.7.1 Simple distance measures based on counts
		10.7.2 Optimal Matching (OM)
		10.7.3 Clustering based on optimal matching distances
	10.8 Summary and takeaways
	Exercises
	Acknowledgments
	Bibliography
Chapter 11: Advanced Sequence Analysis
	11.1 Probabilistic planning-based approach
		11.1.1 The approach
		11.1.2 What does this mean for current commercial games?
		11.1.3 Applying this approach to games
	11.2 Bayesian Networks (BNs) or Dynamic Bayesian Networks (DBNs)
		11.2.1 The approach
		11.2.2 What does this mean for current commercial games?
		11.2.3 Applying this approach to games
	11.3 Hidden Markov Models (HMMs)
		11.3.1 The approach
		11.3.2 What does this mean for current commercial games?
		11.3.3 Applying this approach to games
	11.4 Markov Decision Process (MDP)
		11.4.1 The approach
		11.4.2 What does this mean for current commercial games?
		11.4.3 Applying this approach to games
		11.4.4 An extension of this approach—POMDP
		11.4.5 Can we apply POMDP in current commercial games?
	11.5 Markov Logic Networks (MLNs)
		11.5.1 The approach
		11.5.2 What does this mean for current commercial games?
		11.5.3 Applying this approach to games
	11.6 Recurrent Neural Networks (RNNs) and Deep Recurrent Neural Networks (DRNNs)
		11.6.1 The approach
		11.6.2 What does this mean for current commercial games?
		11.6.3 Applying RNN to games
	11.7 Summary
	Acknowledgments
	Bibliography
Chapter 12: Case Study: Social Network Analysis Applied to In-game Communities to Identify Key Social Players
	12.1 The game: Tom Clancy’s The Division (TCTD)
	12.2 The dataset
	12.3 Data analysis
		12.3.1 Identifying influencers
		12.3.2 Sampling comparison players: Power users and baseline players
		12.3.3 Constructs and measures
	12.4 Results
		12.4.1 Descriptive statistics
		12.4.2 Evaluation of impact
		12.4.3 Retention and conversion to influencer
	12.5 Discussion and conclusion
Chapter 13: Conclusions and Remarks
	13.1 Summary of the game data science process
	13.2 Few words of advice on validity and reliability of your results
		13.2.1 Ensure correctness
		13.2.2 Replicability and reproducibility
	13.3 Ethics, biases, and data
	13.4 Other topics in game data science
		13.4.1 Distributed big data
		13.4.2 Spatio-temporal analysis
		13.4.3 Probabilistic models
		13.4.4 Applications of game data science
	13.5 Conclusion
	Bibliography
Appendix A: Games Used in the Book
	VPAL Game
		Intro House
		Outside
		The Sheriff’s Office
		The Bar
		Abandoned house
		Silver mine
		The Hotel
		Data Collected through the Game
	Defense of the Ancients (DOTA) Game
	Bibliography
Index