Explainable Natural Language Processing

Length: 118 pages
Edition: 1
Language: English
Publisher: Morgan & Claypool
Publication Date: 2021-09-22
ISBN-10: 163639213X
ISBN-13: 9781636392134

This book presents a taxonomic framework and a survey of methods for explaining the decisions and analyzing the inner workings of Natural Language Processing (NLP) models. It is intended as a snapshot of Explainable NLP, a field that continues to grow rapidly, and aims to be both readable by first-year M.Sc. students and interesting to an expert audience. The book opens by motivating its focus on a consistent taxonomy, pointing out inconsistencies and redundancies in previous taxonomies. It goes on to present (i) a taxonomy or framework for thinking about how approaches to explainable NLP relate to one another; (ii) brief surveys of each of the classes in the taxonomy, with a focus on methods that are relevant for NLP; and (iii) a discussion of the inherent limitations of some classes of methods, as well as how best to evaluate them. The book closes with a list of resources for further research on explainability.

Acknowledgments
Introduction
	Two Common Distinctions
		Local and Global Explanations
		Intrinsic and Post-Hoc Explanations
	Shortcomings of Existing Taxonomies
		Guidotti et al. (2018)
		Adadi and Berrada (2019)
		Carvalho et al. (2019)
		Molnar (2019)
		Zhang et al. (2020)
		Danilevsky et al. (2020)
		Das et al. (2020)
		Atanasova et al. (2020)
		Kotonya and Toni (2020)
	The Method-Form Fallacy
	Inconsistent Classifications
	A Novel Taxonomy
A Framework for Explainable NLP
	NLP Architectures
		Linear and Nonlinear Classification
		Recurrent Models
		Transformers
		Overview of Applications and Architectures
	Local and Global Explanations
	Backward Methods
	Forward Explaining by Intermediate Representations
	Forward Explaining by Continuous Outputs
	Forward Explaining by Discrete Outputs
Local-Backward Explanations
	Vanilla Gradients
	Guided Back-Propagation
	Layer-Wise Relevance Propagation
	Deep Taylor Decomposition
	Integrated Gradients
	DeepLift
Global-Backward Explanations
	Post-Hoc Unstructured Pruning
	Lottery Tickets
	Dynamic Sparse Training
	Binary Networks and Sparse Coding
Local-Forward Explanations of Intermediate Representations
	Gates
	Attention
	Attention Roll-Out and Attention Flow
	Layer-Wise Attention Tracing
	Attention Decoding
Global-Forward Explanations of Intermediate Representations
	Gate Pruning
	Attention Head Pruning
Local-Forward Explanations of Continuous Output
	Word Association Norms
	Word Analogies
	Time Step Dynamics
Global-Forward Explanations of Continuous Output
	Correlation of Representations
	Clustering
	Probing Classifiers
	Concept Activation
	Influential Examples
Local-Forward Explanations of Discrete Output
	Challenge Datasets
	Local Uptraining
	Influential Examples
Global-Forward Explanations of Discrete Output
	Uptraining
	Meta-Analysis
	Downstream Evaluation
Evaluating Explanations
	Flavors of Explanations
	Heuristics
	Human Annotations
	Human Experiments
Perspectives
	General Observations
	Beyond Taxonomy
	Moral Foundations of Explanations
Resources
	Code
	Datasets and Benchmarks
Bibliography
Author's Biography