Deep Learning for Genomics: Data-driven approaches for genomics applications in life sciences and biotechnology

Length: 270 pages
Edition: 1
Language: English
Publisher: Packt Publishing
Publication Date: 2022-11-11
ISBN-10: 1804615447
ISBN-13: 9781804615447
Sales Rank: #121274

Learn concepts, methodologies, and applications of deep learning for building predictive models from complex genomics data sets to overcome challenges in the life sciences and biotechnology industries

Key Features

Apply deep learning algorithms to solve real-world problems in the field of genomics
Extract biological insights from deep learning models built from genomic datasets
Train, tune, evaluate, deploy, and monitor deep learning models for enabling predictions in genomics

Book Description

Deep learning has shown remarkable promise in genomics, but the discipline lacks a workforce skilled in it. This book helps researchers and data scientists develop the skill set needed to solve real-world problems in genomics. It starts with an introduction to the essential concepts and highlights the power of deep learning in handling big data in genomics. First, you’ll learn about conventional genomics analysis, then transition to state-of-the-art machine learning-based genomics applications, and finally dive into deep learning approaches for genomics. The book covers the deep learning algorithms most commonly used by the research community, explaining what they are, how they work, and how they are applied in genomics. An entire section is dedicated to operationalizing deep learning models, with hands-on tutorials that show researchers and deep learning practitioners how to build, tune, interpret, deploy, evaluate, and monitor models built from large genomics data sets.

By the end of this book, you’ll have learned about the challenges, best practices, and pitfalls of deep learning for genomics.

What you will learn

Discover the machine learning applications for genomics
Explore deep learning concepts and methodologies for genomics applications
Understand supervised deep learning algorithms for genomics applications
Get to grips with unsupervised deep learning with autoencoders
Improve deep learning models using generative models
Operationalize deep learning models from genomics datasets
Visualize and interpret deep learning models
Understand deep learning challenges, pitfalls, and best practices

Who this book is for

This deep learning book is for machine learning engineers, data scientists, and academicians practicing in the field of genomics. It assumes that readers have intermediate Python programming knowledge and basic knowledge of Python libraries such as NumPy and Pandas for manipulating and parsing data and Matplotlib and Seaborn for visualizing data, along with a foundation in genomics and genomic analysis concepts.

Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Part 1 – Machine Learning in Genomics
Introducing Machine Learning for Genomics
	What is machine learning?
	Why machine learning for genomics?
	Machine learning for genomics in life sciences and biotechnology
		Exploring machine learning software
		Python programming language
		Visualization
		Biopython
		Scikit-learn
	Summary
Genomics Data Analysis
	Technical requirements
		Installing Biopython
		Matplotlib
	What is a genome?
	Genome sequencing
		Sanger sequencing of nucleic acids
		Evolution of next-generation sequencing
	Analysis of genomic data
		Steps in genomics data analysis
	Introduction to Biopython for genomic data analysis
		What is Biopython?
		Genomic data analysis use case – Sequence analysis of Covid-19
		Calculating GC content
		Calculating nucleotide content
		Dinucleotide content
		Modeling
		Motif finder
	Summary
Machine Learning Methods for Genomic Applications
	Technical requirements
		Python packages
		ML libraries
	Genomics big data
	Supervised and unsupervised ML
		Supervised ML
		Unsupervised ML
	ML for genomics
		The basic workflow of ML in genomics
	An ML use case for genomics – Disease prediction
		Data collection
		Data preprocessing
		EDA
		Data transformation
		Data splitting
		Model training
		Model evaluation
	ML challenges in genomics
	Summary
Part 2 – Deep Learning for Genomic Applications
Deep Learning for Genomics
	Understanding what deep learning is and how it works
		Neural network definition
	Anatomy of deep neural networks
		Key concepts of DNNs
		An example of how neural networks work
		DNN architectures
	DNNs for genomics
		Deep learning workflow for genomics
		Broad application of DNNs in genomics
		Protein structure predictions
		Regulatory genomics
		Gene regulatory networks
		Single-cell RNA sequencing
	Introducing deep learning algorithms and Python libraries
		General deep learning libraries
		Deep learning libraries for genomics
	Summary
Introducing Convolutional Neural Networks for Genomics
	Introduction to CNNs
		What are CNNs?
		Transfer Learning
	CNNs for genomics
	Applications of CNNs in genomics
		DeepBind
		DeepInsight
		DeepChrome
		DeepVariant
	Summary
Recurrent Neural Networks in Genomics
	What are RNNs?
	Introducing RNNs
		How do RNNs work?
	Different RNN architectures
		Bidirectional RNNs (BiLSTM)
		LSTMs and GRUs
		Different types of RNNs
	Applications and use cases of RNNs in genomics
		DeepNano
		ProLanGo
		DanQ
		Understanding RNNs through Transcription Factor Binding Site (TFBS) predictions
	Summary
Unsupervised Deep Learning with Autoencoders
	What is unsupervised DL?
	Types of unsupervised DL
		Clustering
		Anomaly detection
		Association
	What are autoencoders?
		Properties of autoencoders
		How do autoencoders work?
		Architecture of autoencoders
		Types of autoencoders
	Autoencoders for genomics
		Gene expression
		Use case – Predicting gene expression from TCGA pan-cancer RNA-Seq data using denoising autoencoders
	Summary
GANs for Improving Models in Genomics
	What are GANs?
		Differences between Discriminative and Generative models
		Intuition about GANs
		How do GANs work?
	Challenges working with genomics datasets
		What is synthetic data?
	How can GANs help improve models?
	Practical applications of GANs in genomics
		Analysis of ScRNA-Seq data
		Generation of DNA
		Using GANs for augmenting population-scale genomics data
	Summary
Part 3 – Operationalizing models
Building and Tuning Deep Learning Models
	Technical requirements
	DL life cycle
	Data processing
		Data collection
		Data wrangling
		Feature engineering
	Developing models
		Selecting an appropriate algorithm
		Model training
	Tuning the models
		Hyperparameter tuning
		Hyperparameter tuning libraries
		Classification metrics or performance statistics
		Visualizing performance
		Regression metrics
	Use case – Predicting the binding site location of the JunD TF
		Framing the TFBS prediction problem in terms of DL
		Processing the data
		Model training
	Summary
Model Interpretability in Genomics
	What is model interpretability?
		Black-box model interpretability
	Unlocking business value from model interpretability
		Better business decisions
		Building trust
		Profitability
	Model interpretability methods in genomics
		Partial dependence plot
		Individual conditional expectation
		Permuted feature importance
		Global surrogate
		LIME
		Shapley value
		ExSum
		Saliency map
	Use case – Model interpretability for genomics
		Data collection
		Feature extraction
		Target labels
		Train-test split
		Creating a CNN architecture
	Summary
Model Deployment and Monitoring
	Technical requirements
		Streamlit
		Hugging Face
	Introducing model deployment
		Steps in model deployment
		Types of model deployment
		Deploying models as services
		A use case for deploying a DL model as a web service – building a Streamlit application of the CNN model
	Monitoring models using advanced tools
		Why monitor models?
		Reasons for model degradation
		How to monitor DL models
		Advanced tools for model monitoring
		Addressing drifts
	Summary
Challenges, Pitfalls, and Best Practices for Deep Learning in Genomics
	Deep learning challenges regarding genomics
		Lack of flexible tools
		Fewer biological samples
		Computational resource requirements
		Expertise in DL frameworks
		Lack of high-quality labeled data
		Lack of model interpretability
	Common pitfalls for applying deep learning to genomics
		Confounding
		Data leakage
		Imbalanced data
		Improper model comparisons
	Best practices for applying deep learning to genomics
		Understand the problem and know your data better
		A simple model for a simple problem
		Establish a baseline for your model
		Ensure reproducibility
		Using pre-existing models for genomics
		Do not reinvent the rule
		Tune hyperparameters automatically
		Focus on feature engineering
		Normalize the data
		Always perform model interpretation
		Avoid overfitting
	Summary
Index
About Packt
Other Books You May Enjoy

How to download source code?

1. Go to: https://github.com/PacktPublishing

2. In the "Find a repository…" box, search for the book title: Deep Learning for Genomics: Data-driven approaches for genomics applications in life sciences and biotechnology. If the full title returns no results, search for the main title only.