Algorithms in Bioinformatics: Theory and Implementation
- Length: 528 pages
- Edition: 1
- Language: English
- Publisher: Wiley
- Publication Date: 2021-08-10
- ISBN-10: 1119697964
- ISBN-13: 9781119697961
Explore a comprehensive and insightful treatment of the practical application of bioinformatic algorithms in a variety of fields
Algorithms in Bioinformatics: Theory and Implementation delivers a thorough treatment of some of the main algorithms used to explain biological functions and relationships. It introduces readers to the art of algorithms in a practical manner that is linked with biological theory and interpretation. The book covers many key areas of bioinformatics, including global and local sequence alignment, forced alignment, detection of motifs, sequence logos, Markov chains, and information entropy. Other novel approaches are also described, such as self-sequence alignment, Objective Digital Stains (ODSs), Spectral Forecast, and the Discrete Probability Detector (DPD) algorithm.
The text uses graphical illustrations to highlight the technical details of the computational algorithms it presents and to further the reader’s understanding and retention of the material. Throughout, the book is written in an accessible and practical manner, showing how algorithms can be implemented in JavaScript and run in web browsers. The author has included more than 120 open-source implementations of the material, as well as 33 ready-to-use presentations. The book contains original material that has been class-tested by the author, and numerous cases are examined in a biological and medical context. Readers will also benefit from the inclusion of:
- A thorough introduction to biological evolution, including the emergence of life, classifications and some known theories and molecular mechanisms
- A detailed presentation of new methods, such as Self-sequence alignment, Objective Digital Stains and Spectral Forecast
- A treatment of sequence alignment, including local sequence alignment, global sequence alignment and forced sequence alignment with full implementations
- Discussions of position-specific weight matrices, including the count, weight, relative-frequency, and log-likelihood matrices
- A detailed presentation of the methods related to Markov Chains as well as a description of their implementation in Bioinformatics and adjacent fields
- An examination of information and entropy, including sequence logos and explanations related to their meaning
- An exploration of the current state of bioinformatics, including what is known and what issues are usually avoided in the field
- A chapter on philosophical transactions that allows the reader a broader view of the prediction process
- Native computer implementations in the context of the field of Bioinformatics
- Extensive worked examples with detailed case studies that point out the meaning of different results
Perfect for professionals and researchers in biology, medicine, engineering, and information technology, as well as upper-level undergraduate students in these fields, Algorithms in Bioinformatics: Theory and Implementation will also earn a place in the libraries of software engineers who wish to understand how to implement bioinformatic algorithms in their products.
Cover Title Page Copyright Contents Preface About the Companion Website
Chapter 1 The Tree of Life (I) 1.1 Introduction 1.2 Emergence of Life 1.2.1 Timeline Disagreements 1.3 Classifications and Mechanisms 1.4 Chromatin Structure 1.5 Molecular Mechanisms 1.5.1 Precursor Messenger RNA 1.5.2 Precursor Messenger RNA to Messenger RNA 1.5.3 Classes of Introns 1.5.4 Messenger RNA 1.5.5 mRNA to Proteins 1.5.6 Transfer RNA 1.5.7 Small RNA 1.5.8 The Transcriptome 1.5.9 Gene Networks and Information Processing 1.5.10 Eukaryotic vs. Prokaryotic Regulation 1.5.11 What Is Life? 1.6 Known Species 1.7 Approaches for Compartmentalization 1.7.1 Two Main Approaches for Organism Formation 1.7.2 Size and Metabolism 1.8 Sizes in Eukaryotes 1.8.1 Sizes in Unicellular Eukaryotes 1.8.2 Sizes in Multicellular Eukaryotes 1.9 Sizes in Prokaryotes 1.10 Virus Sizes 1.10.1 Viruses vs. the Spark of Metabolism 1.11 The Diffusion Coefficient 1.12 The Origins of Eukaryotic Cells 1.12.1 Endosymbiosis Theory 1.12.2 DNA and Organelles 1.12.3 Membrane‐bound Organelles with DNA 1.12.4 Membrane‐bound Organelles Without DNA 1.12.5 Control and Division of Organelles 1.12.6 The Horizontal Gene Transfer 1.12.7 On the Mechanisms of Horizontal Gene Transfer 1.13 Origins of Eukaryotic Multicellularity 1.13.1 Colonies Inside an Early Unicellular Common Ancestor 1.13.2 Colonies of Early Unicellular Common Ancestors 1.13.3 Colonies of Inseparable Early Unicellular Common Ancestors 1.13.4 Chimerism and Mosaicism 1.14 Conclusions
Chapter 2 Tree of Life: Genomes (II) 2.1 Introduction 2.2 Rules of Engagement 2.3 Genome Sizes in the Tree of Life 2.3.1 Alternative Methods 2.3.2 The Weaving of Scales 2.3.3 Computations on the Average Genome Size 2.3.4 Observations on Data 2.4 Organellar Genomes 2.4.1 Chloroplasts 2.4.2 Apicoplasts 2.4.3 Chromatophores 2.4.4 Cyanelles 2.4.5 Kinetoplasts 2.4.6 Mitochondria 2.5 Plasmids 2.6 Virus Genomes 2.7 Viroids and Their Implications 2.8 Genes vs. Proteins in the Tree of Life 2.9 Conclusions
Chapter 3 Sequence Alignment (I) 3.1 Introduction 3.2 Style and Visualization 3.3 Initialization of the Score Matrix 3.4 Calculation of Scores 3.4.1 Initialization of the Score Matrix for Global Alignment 3.4.2 Initialization of the Score Matrix for Local Alignment 3.4.3 Optimization of the Initialization Steps 3.4.4 Curiosities 3.5 Traceback 3.6 Global Alignment 3.7 Local Alignment 3.8 Alignment Layout 3.9 Local Sequence Alignment – The Final Version 3.10 Complementarity 3.11 Conclusions
Chapter 4 Forced Alignment (II) 4.1 Introduction 4.2 Global and Local Sequence Alignment 4.2.1 Short Notes 4.2.2 Understanding the Technology 4.2.3 Main Objectives 4.3 Experiments and Discussions 4.3.1 Alignment Layout 4.3.2 Forced Alignment Regime 4.3.3 Alignment Scores and Significance 4.3.4 Optimal Alignments 4.3.5 The Main Significance Scores 4.3.6 The Information Content 4.3.7 The Match Percentage 4.3.8 Significance vs. Chance 4.3.9 The Importance of Randomness 4.3.10 Sequence Quality and the Score Matrix 4.3.11 The Significance Threshold 4.3.12 Optimal Alignments by Numbers 4.3.13 Chaos Theory on Sequence Alignment 4.3.14 Image‐Encoding Possibilities 4.4 Advanced Features and Methods 4.4.1 Sequence Detector 4.4.2 Parameters 4.4.3 Heatmap 4.4.4 Text Visualization 4.4.5 Graphics for Manuscript Figures and Didactic Presentations 4.4.6 Dynamics 4.4.7 Independence 4.4.8 Limits 4.4.9 Local Storage 4.5 Conclusions
Chapter 5 Self‐Sequence Alignment (I) 5.1 Introduction 5.2 True Randomness 5.3 Information and Compression Algorithms 5.4 White Noise and Biological Sequences 5.5 The Mathematical Model 5.5.1 A Concrete Example 5.5.2 Model Dissection 5.5.3 Conditions for Maxima and Minima 5.6 Noise vs. Redundancy 5.7 Global and Local Information Content 5.8 Signal Sensitivity 5.9 Implementation 5.9.1 Global Self‐Sequence Alignment 5.9.2 Local Self‐Sequence Alignment 5.10 A Complete Scanner for Information Content 5.11 Conclusions
Chapter 6 Frequencies and Percentages (II) 6.1 Introduction 6.2 Base Composition 6.3 Percentage of Nucleotide Combinations 6.4 Implementation 6.5 A Frequency Scanner 6.6 Examples of Known Significance 6.7 Observation vs. Expectation 6.8 A Frequency Scanner with a Threshold 6.9 Conclusions
Chapter 7 Objective Digital Stains (III) 7.1 Introduction 7.2 Information and Frequency 7.3 The Objective Digital Stain 7.3.1 A 3D Representation Over a 2D Plane 7.3.2 ODSs Relative to the Background 7.4 Interpretation of ODSs 7.5 The Significance of the Areas in the ODS 7.6 Discussions 7.6.1 A Similarity Between Dissimilar Sequences 7.7 Conclusions
Chapter 8 Detection of Motifs (I) 8.1 Introduction 8.2 DNA Motifs 8.2.1 DNA‐binding Proteins vs. Motifs and Degeneracy 8.2.2 Concrete Examples of DNA Motifs 8.3 Major Functions of DNA Motifs 8.3.1 RNA Splicing and DNA Motifs 8.4 Conclusions
Chapter 9 Representation of Motifs (II) 9.1 Introduction 9.2 The Training Data 9.3 A Visualization Function 9.4 The Alignment Matrix 9.5 Alphabet Detection 9.6 The Position‐Specific Scoring Matrix (PSSM) Initialization 9.7 The Position Frequency Matrix (PFM) 9.8 The Position Probability Matrix (PPM) 9.8.1 A Kind of PPM Pseudo‐Scanner 9.9 The Position Weight Matrix (PWM) 9.10 The Background Model 9.11 The Consensus Sequence 9.11.1 The Consensus – Not Necessarily Functional 9.12 Mutational Intolerance 9.13 From Motifs to PWMs 9.14 Pseudo‐Counts and Negative Infinity 9.15 Conclusions
Chapter 10 The Motif Scanner (III) 10.1 Introduction 10.2 Looking for Signals 10.3 A Functional Scanner 10.4 The Meaning of Scores 10.4.1 A Score Value Above Zero 10.4.2 A Score Value Below Zero 10.4.3 A Score Value of Zero 10.5 Conclusions
Chapter 11 Understanding the Parameters (IV) 11.1 Introduction 11.2 Experimentation 11.2.1 A Scanner Implementation Based on Pseudo‐Counts 11.2.2 A Scanner Implementation Based on Propagation of Zero Counts 11.3 Signal Discrimination 11.4 False‐Positive Results 11.5 Sensitivity Adjustments 11.6 Beyond Bioinformatics 11.7 A Scanner That Uses a Known PWM 11.8 Signal Thresholds 11.8.1 Implementation and Filter Testing 11.9 Conclusions
Chapter 12 Dynamic Backgrounds (V) 12.1 Introduction 12.2 Toward a Scanner with Two PFMs 12.2.1 The Implementation of Dynamic PWMs 12.2.2 Issues and Corrections for Dynamic PWMs 12.2.3 Solutions for Aberrant Positive Likelihood Values 12.3 A Scanner with Two PFMs 12.4 Information and Background Frequencies on Score Values 12.5 Dynamic Background vs. Null Model 12.6 Conclusions
Chapter 13 Markov Chains: The Machine (I) 13.1 Introduction 13.2 Transition Matrices 13.3 Discrete Probability Detector 13.3.1 Alphabet Detection 13.3.2 Matrix Initialization 13.3.3 Frequency Detection 13.3.4 Calculation of Transition Probabilities 13.3.5 Particularities in Calculating the Transition Probabilities 13.4 Markov Chains Generators 13.4.1 The Experiment 13.4.2 The Implementation 13.4.3 Simulation of Transition Probabilities 13.4.4 The Markov machine 13.4.5 Result Verification 13.5 Conclusions
Chapter 14 Markov Chains: Log Likelihood (II) 14.1 Introduction 14.2 The Log‐Likelihood Matrix 14.2.1 A Log‐Likelihood Matrix Based on the Null Model 14.2.2 A Log‐Likelihood Matrix Based on Two Models 14.3 Interpretation and Use of the Log‐Likelihood Matrix 14.4 Construction of a Markov Scanner 14.5 A Scanner That Uses a Known LLM 14.6 The Meaning of Scores 14.7 Beyond Bioinformatics 14.8 Conclusions
Chapter 15 Spectral Forecast (I) 15.1 Introduction 15.2 The Spectral Forecast Model 15.3 The Spectral Forecast Equation 15.4 The Spectral Forecast Inner Workings 15.4.1 Each Part on a Single Matrix 15.4.2 Both Parts on a Single Matrix 15.4.3 Both Parts on Separate Matrices 15.4.4 Concrete Example 1 15.4.5 Concrete Example 2 15.4.6 Concrete Example 3 15.5 Implementations 15.5.1 Spectral Forecast for Signals 15.5.2 What Does the Value of d Mean? 15.5.3 Spectral Forecast for Matrices 15.6 The Spectral Forecast Model for Predictions 15.6.1 The Spectral Forecast Model for Signals 15.6.2 Experiments on the Similarity Index Values 15.6.3 The Spectral Forecast Model for Matrices 15.7 Conclusions
Chapter 16 Entropy vs. Content (I) 16.1 Introduction 16.2 Information Entropy 16.3 Implementation 16.4 Information Content vs. Information Entropy 16.4.1 Implementation 16.4.2 Additional Considerations 16.5 Conclusions
Chapter 17 Philosophical Transactions 17.1 Introduction 17.2 The Frame of Reference 17.2.1 The Fundamental Layer of Complexity 17.2.2 On the Complexity of Life 17.3 Random vs. Pseudo‐random 17.4 Random Numbers and Noise 17.5 Determinism and Chaos 17.5.1 Chaos Without Noise 17.5.2 Chaos with Noise 17.5.3 Limits of Prediction 17.5.4 On the Wings of Chaos 17.6 Free Will and Determinism 17.6.1 The Greatest Disappointment 17.6.2 The Most Powerful Processor in Existence 17.6.3 Certainty vs. Interpretation 17.6.4 A Wisdom that Applies 17.7 Conclusions
Appendix A A.1 Association of Numerical Values with Letters A.2 Sorting Values on Columns A.3 The Implementation of a Sequence Logo A.4 Sequence Logos Based on Maximum Values A.5 Using Logarithms to Build Sequence Logos A.6 From a Motif Set to a Sequence Logo
References Index EULA