Software Source Code: Statistical Modeling
- Length: 256 pages
- Edition: 1
- Language: English
- Publisher: de Gruyter
- Publication Date: 2021-08-23
- ISBN-10: 3110703300
- ISBN-13: 9783110703306
This book focuses on statistical modelling of software source code to resolve issues in the software development process. Writing and maintaining source code is costly, and developers must constantly build on large existing code bases. Statistical modelling identifies patterns in software artifacts and uses them to predict likely issues.
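As an illustrative sketch (not taken from the book), the kind of statistical model the blurb alludes to — for example, the n-gram language models over code tokens covered in Chapter 4 — can be as simple as a bigram counter that suggests the most likely next token; the corpus and token stream below are hypothetical:

```python
from collections import Counter, defaultdict

def train_bigram(token_stream):
    """Count bigram frequencies over a stream of source-code tokens."""
    model = defaultdict(Counter)
    for prev, nxt in zip(token_stream, token_stream[1:]):
        model[prev][nxt] += 1
    return model

def suggest(model, prev_token, k=3):
    """Return up to k most frequent tokens seen after prev_token."""
    return [tok for tok, _ in model[prev_token].most_common(k)]

# Tiny illustrative corpus: a tokenized line of code, repeated.
corpus = "for i in range ( n ) : total += i".split() * 3
model = train_bigram(corpus)
print(suggest(model, "in"))  # → ['range']
```

Real code-completion models in the book use far richer corpora and neural architectures, but the core idea — exploiting the statistical regularity ("naturalness") of source code — is the same.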
Table of Contents

Title Page
Copyright
Contents
Preface

Chapter 1: Software development processes evolution
  1.1 Introduction
  1.2 Data science evolution
  1.3 Areas and applications of data science
  1.4 Focus areas in software development
  1.5 Objectives
  1.6 Prospective areas for exploration
  1.7 Related work
  1.8 The motivation of the work
  1.9 Software source code and its naturalness characteristics
  1.10 Machine learning in the context of text and code
  1.11 Probabilistic modeling of source code and its inspiration
  1.12 Applications of source code statistical modeling
  1.13 Code convention inferencing
  1.14 Defects associated with code and pattern recognition
  1.15 Motivation in the area of code translation and copying
  1.16 Inspiration in the area of code to text and text to code
  1.17 Inspiration in the area of documentation and traceability
  1.18 Program synthesis opportunities
  1.19 Conclusion

Chapter 2: A probabilistic model for fault prediction across functional and security aspects
  2.1 Introduction
  2.2 Statistical debugging models
    2.2.1 Program logging
    2.2.2 Statistical debugging methods
  2.3 Program sampling framework
    2.3.1 Feature selection
    2.3.2 Utility function
  2.4 Identifying multiple bugs
  2.5 Probabilistic programming
  2.6 Deep representation learning of vulnerabilities
  2.7 ANN for software vulnerability assessment
  2.8 Text analysis techniques for software vulnerability assessment
  2.9 Software vulnerability assessment automation
    2.9.1 K-fold cross-validation-based model selection on a time scale
    2.9.2 Aggregation of the feature space
    2.9.3 Assessing the influence of the character-based model
  2.10 Vulnerability assessment with stochastic models
    2.10.1 Transition matrix for the life cycle of the vulnerability
    2.10.2 Risk factor computation
    2.10.3 Exploitability exploration model
  2.11 Vulnerabilities modeled based on time series methods
    2.11.1 ARIMA
    2.11.2 Exponential smoothing
  2.12 Conclusion and future work

Chapter 3: Establishing traceability between software development domain artifacts
  3.1 Introduction
  3.2 Overview of the requirement traceability problem
  3.3 Establishing software traceability via deep learning techniques
    3.3.1 Deep NLP
    3.3.2 Word embeddings
    3.3.3 Architecture
    3.3.4 Recurrent neural network
    3.3.5 LSTM
    3.3.6 Gated recurrent unit
    3.3.7 Network for tracing
    3.3.8 Tracing network training
  3.4 Cluster hypothesis for requirements tracking
    3.4.1 Methodology
    3.4.2 Experiment on clustering traceability
    3.4.3 Outcome
  3.5 Establishing traceability based on issue reports and commits
    3.5.1 ChangeScribe
    3.5.2 Random forest
    3.5.3 Approach
    3.5.4 Threat to study
    3.5.5 Other related work
    3.5.6 Classification models application in other areas
  3.6 Topic modeling–based traceability for software
    3.6.1 Tracing based on retrospection
    3.6.2 Topic modeling
    3.6.3 LDA application
    3.6.4 LDA drawback
    3.6.5 Topic modeling capturing tool
    3.6.6 Visualization tool
  3.7 Software traceability by machine learning classification
  3.8 Classification method for nonfunctional requirements
    3.8.1 A general way of NFRs classification
    3.8.2 Experiment setup
    3.8.3 Outcomes
  3.9 Conclusion and future work
  3.10 Extension of work to other domains

Chapter 4: Auto code completion facilitation by structured prediction-based auto-completion model
  4.1 Introduction
  4.2 Code suggestion tool for language modeling
  4.3 Neural language model for code suggestions
    4.3.1 Context setting
    4.3.2 Mechanism
    4.3.3 Neural language model
    4.3.4 Attention mechanism
    4.3.5 Pointer network
  4.4 Code summarization with convolution attention network
    4.4.1 Attention model based on convolution
    4.4.2 Attention feature
    4.4.3 Convolution attention model copying
    4.4.4 Data
    4.4.5 Experiment
    4.4.6 Evaluation
  4.5 Neural Turing Machine
    4.5.1 Neuroscience and psychology
    4.5.2 RNN
    4.5.3 Neural Turing Machine
    4.5.4 Content focus
    4.5.5 Location focus
    4.5.6 Controller
    4.5.7 Experiment
  4.6 Program comments prediction with NLP
    4.6.1 Introduction
    4.6.2 Data and training
    4.6.3 Evaluation
    4.6.4 Performance
  4.7 Software naturalness
    4.7.1 Motivation
    4.7.2 Good model
    4.7.3 Method
    4.7.4 N-gram language model to capture regularity in software
    4.7.5 Next token suggestion
    4.7.6 Advantage of language model
    4.7.7 Other work
    4.7.8 Conclusion
  4.8 Conclusion and future work

Chapter 5: Transfer learning and one-shot learning to address software deployment issues
  5.1 Introduction
  5.2 History of software deployment
    5.2.1 Deployment process
    5.2.2 The life cycle of the deployment
    5.2.3 Virtualization case study
    5.2.4 Deployment issues
  5.3 Issues and challenges in deployment
    5.3.1 The granularity of components and environment
    5.3.2 Deployment based on the distribution
    5.3.3 Specifications for architecture
  5.4 Artificial Intelligence deployment as reference
  5.5 DNN model deployment on smartphones
    5.5.1 Software tools for deep learning
    5.5.2 Process of deployment
    5.5.3 Model generation for iOS
    5.5.4 Android model generation
    5.5.5 Software tools for smartphone
    5.5.6 Processors for smartphone
  5.6 Benchmarking metrics for DNN smartphones
    5.6.1 Models of DNN
    5.6.2 Multi-thread format
    5.6.3 Metrics benchmarking
  5.7 Discussion on results
  5.8 Automated software deployment survey
    5.8.1 Images and scripting
    5.8.2 Infrastructure delivery and application decoupling
    5.8.3 Software delivery tools
    5.8.4 Summarization
  5.9 Source code modeling with deep transfer learning
    5.9.1 Approach
    5.9.2 Transfer learning
  5.10 Conclusion

Chapter 6: Enabling intelligent IDEs with probabilistic models
  6.1 Introduction
  6.2 IntelliDE
  6.3 Aggregation of data and acquiring the knowledge
  6.4 General review of data aggregation (approaches)
  6.5 Smart assistance
  6.6 Knowledge graph for software
  6.7 Review of natural language processing (NLP) in software (approaches)
  6.8 IntelliDE with the software knowledge graph
  6.9 Latent topic modeling of source code for IDE (approaches)
    6.9.1 About latent topic modeling
    6.9.2 Latent semantic analysis approach
  6.10 Applications and case study of IntelliDE
  6.11 Summary of work (Lin et al. 2017)
  6.12 Review of probabilistic programming in the context of IDE
  6.13 Knowledge graph possibilities in software development
  6.14 Focus areas of the work
  6.15 Possible extension of Smart IDE (approaches)
  6.16 Constructing the graphs
  6.17 Collaboration graph and its contribution to IDE (approaches)
  6.18 Metrics in graph-based analysis
  6.19 Importance of ranking metrics in the context of smart IDE (approaches)
  6.20 Software structure evolution based on graph characterization
  6.21 Influence of machine translation approach on Smart IDE (approaches)
  6.22 Bug severity prediction
  6.23 Effort prediction
  6.24 Defect count prediction
  6.25 Conclusion of work done on knowledge graphs
  6.26 Overall conclusion

Chapter 7: Natural language processing–based deep learning in the context of statistical modeling for software source code
  7.1 Introduction
  7.2 Deep learning–based NLP review
  7.3 Distributed representations
    7.3.1 Word embeddings
    7.3.2 Word2vec
    7.3.3 Character embedding
  7.4 Convolutional neural network (CNN)
    7.4.1 Basic CNN
    7.4.2 Recurrent neural network (RNN)
    7.4.3 Variations of RNN models
    7.4.4 Gated recurrent units (GRU)
    7.4.5 Applications of RNN
  7.5 Recursive neural networks
  7.6 Deep learning for NLP with statistical source code modeling context
  7.7 Key areas of application of deep learning with NLP

Chapter 8: Impact of machine learning in cognitive computing
  8.1 Introduction
  8.2 Architecture for cognitive computing for machine learning
    8.2.1 Machine learning challenges
    8.2.2 General review so far
    8.2.3 Computing stack for cognitive
  8.3 Machine learning technique in cognitive radio
    8.3.1 General review of the section
    8.3.2 Purpose
    8.3.3 Importance of learning in cognitive radio
    8.3.4 Cognitive radio learning problems characteristics
    8.3.5 Supervised vs unsupervised learning perspectives
    8.3.6 Issues associated with learning in cognitive radio
    8.3.7 General review of the section
    8.3.8 Decision-making in cognitive radio
    8.3.9 Supervised classification for cognitive radios
    8.3.10 Cognitive radio-centralized and decentralized learning
    8.3.11 Review of the section so far
  8.4 Conclusion

Chapter 9: Predicting rainfall with a regression model
  9.1 Introduction
  9.2 Regression and time series forecasting with ensemble deep learning
    9.2.1 Context setting
    9.2.2 General review
    9.2.3 Forecasting models
    9.2.4 General review of the section
    9.2.5 Ensemble deep learning methods proposed in this work
    9.2.6 A generic review of this section
    9.2.7 Outcome and comparison
    9.2.8 Wrapping up
    9.2.9 General review of the section
  9.3 Machine learning techniques for rainfall prediction
    9.3.1 Let us begin
    9.3.2 Theory
    9.3.3 General review of the section
    9.3.4 Review of literature
    9.3.5 A generic review of the section
  9.4 Machine learning regression model for rainfall prediction
    9.4.1 Clustering
    9.4.2 SOM
    9.4.3 SVM
    9.4.4 Architecture
    9.4.5 Wrapping up
  9.5 General review of the section
  9.6 Conclusion

Chapter 10: Understanding programming language structure at its full scale
  10.1 Introduction
  10.2 Abstract syntax tree–based representation for source code
  10.3 Further exploration of the AST
  10.4 The motivation for applying AST
  10.5 Activation functions and their role in machine learning architecture
  10.6 The approach used for ASTNN
  10.7 The criticality of vanishing gradient in the context of source code modeling
  10.8 Application of the model
  10.9 Code cloning issue in software source code
  10.10 Experiments with ASTNN
  10.11 Results of the research questions
  10.12 Binary Search Tree (BST) and its significance related to source code structure
  10.13 Work related to approaches
  10.14 Key concerns
  10.15 Summarization
  10.16 Neural attention model for source code summarization
  10.17 Related approaches
  10.18 Data management
  10.19 Feature engineering in source code
  10.20 CODE-NN model
  10.21 Qualitative analysis and conclusion
  10.22 Overall conclusion

References
Index
Notes