Knowledge Management in the Development of Data-Intensive Systems

by Bedir Tekinerdogan, Bruce R. Maxim, Ivan Mistrik, Matthias Galster

Length: 342 pages
Edition: 1
Language: English
Publisher: Auerbach Publications
Publication Date: 2021-06-16
ISBN-10: 0367430789
ISBN-13: 9780367430788
Sales Rank: #1148983

Data-intensive systems are software applications that process and generate Big Data. Data-intensive systems support the use of large amounts of data strategically and efficiently to provide intelligence. For example, examining industrial sensor data or business process data can enhance production, guide proactive improvements of development processes, or optimize supply chain systems. Designing data-intensive software systems is difficult because the distribution of knowledge across stakeholders creates a symmetry of ignorance, and because a shared vision of the future requires the development of new knowledge that extends and synthesizes existing knowledge.

Knowledge Management in the Development of Data-Intensive Systems addresses new challenges arising from knowledge management in the development of data-intensive software systems. These challenges concern requirements, architectural design, detailed design, implementation, and maintenance. The book covers the current state and future directions of knowledge management in the development of data-intensive software systems. The book features both academic and industrial contributions that discuss the role software engineering can play in addressing the challenges of developing, maintaining, and evolving systems; data-intensive software systems of cloud and mobile services; and the scalability requirements they imply. The book features software engineering approaches that can efficiently deal with data-intensive systems, as well as applications and use cases that benefit from data-intensive systems.

Providing a comprehensive reference on the notion of data-intensive systems from a technical and non-technical perspective, the book focuses uniquely on software engineering and knowledge management in the design and maintenance of data-intensive systems. The book covers constructing, deploying, and maintaining high-quality software products and software engineering in and for dynamic and flexible environments. This book provides a holistic guide for those who need to understand the impact of variability on all aspects of the software life cycle. It leverages practical experience and evidence to look ahead at the challenges faced by organizations in a fast-moving world with increasingly fast-changing customer requirements and expectations.

Cover
Half Title
Title Page
Copyright Page
Table of Contents
Foreword
Preface
Acknowledgments
Editors
Contributors
1 Data-Intensive Systems, Knowledge Management, and Software Engineering
	1.1 Introduction
		1.1.1 Big Data – What It Is and What It Is Not?
		1.1.2 Data Science
		1.1.3 Data Mining
		1.1.4 Machine Learning and Artificial Intelligence
	1.2 Data-Intensive Systems
		1.2.1 What Makes a System Data-Intensive?
		1.2.2 Cloud Computing
		1.2.3 Big Data Architecture
	1.3 Knowledge Management
		1.3.1 Knowledge Identification
		1.3.2 Knowledge Creation
		1.3.3 Knowledge Acquisition
		1.3.4 Knowledge Organization
		1.3.5 Knowledge Distribution
		1.3.6 Knowledge Application
		1.3.7 Knowledge Adaption
	1.4 Relating Data-Intensive Systems, Knowledge Management, and Software Engineering
		1.4.1 Relating Knowledge Life Cycle to Software Development Life Cycle
		1.4.2 Artificial Intelligence and Software Engineering
		1.4.3 Knowledge Repositories
	1.5 Management of Software Engineering Knowledge
		1.5.1 Software Engineering Challenges in a Data-Intensive World
		1.5.2 Communication Practices
		1.5.3 Engineering Practices
	1.6 Knowledge Management in Software Engineering Processes
		1.6.1 Requirements Engineering
		1.6.2 Architectural Design
		1.6.3 Design Implementation
		1.6.4 Verification and Validation
		1.6.5 Maintenance and Support
		1.6.6 Software Evolution
	1.7 Development of Data-Intensive Systems
		1.7.1 Software Engineering Challenges
		1.7.2 Building and Maintaining Data-Intensive Systems
			1.7.2.1 Requirements Engineering
			1.7.2.2 Architecture and Design
			1.7.2.3 Debugging, Evolution, and Deployment
			1.7.2.4 Organizational Aspects and Training
		1.7.3 Ensuring Software Quality in Data-Intensive Systems
		1.7.4 Software Design Principles for Data-Intensive Systems
		1.7.5 Data-Intensive System Development Environments
	1.8 Outlook and Future Directions
	References
Part I: CONCEPTS AND MODELS
	2 Software Artifact Traceability in Big Data Systems
		Chapter Points
		2.1 Introduction
		2.2 Background
			2.2.1 Software Requirements Representation
			2.2.2 Traceability
			2.2.3 Big Data
		2.3 Uncertainty in Big Data
			2.3.1 Value
			2.3.2 Variety
			2.3.3 Velocity
			2.3.4 Veracity
			2.3.5 Volume
		2.4 Software Artifacts in the Big Data World
		2.5 Automated Traceability Techniques (State of the Art)
			2.5.1 Automated Traceability Generation
			2.5.2 Semantic Link Discovery and Recovery
		2.6 Traceability Adaptation
		2.7 Discussion
		Acknowledgments
		References
	3 Architecting Software Model Management and Analytics Framework
		3.1 Introduction
		3.2 Preliminaries
			3.2.1 Big Data Analytics
			3.2.2 Architecture Design
		3.3 Approach for Deriving Reference Architecture
		3.4 Big Data Analytics Feature Model
		3.5 Big Data Analytics Reference Architectures
			3.5.1 Lambda Architecture
			3.5.2 Functional Architecture
		3.6 Application Model Analytics Features
		3.7 Related Work and Discussion
		3.8 Conclusion
		References
	4 Variability in Data-Intensive Systems: An Architecture Perspective
		4.1 Introduction
		4.2 Variability in Data-Intensive Systems
			4.2.1 How Variability Occurs in Data-Intensive Systems
			4.2.2 Types of Variability in Data-Intensive Systems
			4.2.3 Variability Management in Data-Intensive Systems
			4.2.4 A Business Perspective
		4.3 The Role of Architecture in Data-Intensive Systems
		4.4 Reference Architectures to Support Data-Intensive Systems
		4.5 Service-Oriented Architecture and Cloud Computing
		4.6 Serverless Architectures for Data-Intensive Systems
		4.7 Ethical Considerations
		4.8 Conclusions
		References
Part II: KNOWLEDGE DISCOVERY AND MANAGEMENT
	5 Knowledge Management via Human-Centric, Domain-Specific Visual Languages for Data-Intensive Software Systems
		5.1 Introduction
		5.2 Motivation
		5.3 Approach
		5.4 High-Level Requirements Capture
			5.4.1 Brainstorming
			5.4.2 Process Definition
		5.5 Design
			5.5.1 Data Management
			5.5.2 Data Processing
		5.6 Deployment
		5.7 Tool Support
		5.8 Discussion
			5.8.1 Experience to Date
			5.8.2 Evaluation
			5.8.3 Strengths and Limitations
		5.9 Summary
		Acknowledgment
		References
	6 Augmented Analytics for Data Mining: A Formal Framework and Methodology
		6.1 Introduction
		6.2 Specific Aims of Research in Augmented Analytics
		6.3 Related Work in Augmented Analytics
			6.3.1 Axiomatic System Design
			6.3.2 Data Preparation and Data Modeling
			6.3.3 Machine Learning for Data Preparation and Data Discovery
			6.3.4 Natural Language Processing for Data Preparation and Data Discovery
			6.3.5 Business Analytics and Data Analytics
		6.4 Proposed Framework and Methodology for Research on Augmented Analytics
		6.5 Applications of Augmented Analytics and Conversational Query Tool
			6.5.1 Conversational Query Tool Architecture
		Acknowledgement
		References
	7 Mining and Managing Big Data Refactoring for Design Improvement: Are We There Yet?
		7.1 Introduction
		7.2 Mining and Detection
		7.3 Refactoring Documentation
		7.4 Refactoring Automation
			7.4.1 Refactoring Tools
			7.4.2 Lack of Use
			7.4.3 Lack of Trust
			7.4.4 Behavior Preservation
		7.5 Refactoring Recommendation
			7.5.1 Structural Relationship
			7.5.2 Semantic Relationship
			7.5.3 Historical Information
		7.6 Refactoring Visualization
		7.7 Conclusion
		References
	8 Knowledge Discovery in Systems of Systems: Observations and Trends
		8.1 Introduction
		8.2 Overview of the State of the Art on Knowledge Management in SoS
		8.3 Data Collection in Systems of Systems
		8.4 Data Integration in Systems of Systems
		8.5 Knowledge Discovery in Systems of Systems
		8.6 Research Agenda
		8.7 Final Considerations
		Acknowledgments
		References
Part III: CLOUD SERVICES FOR DATA-INTENSIVE SYSTEMS
	9 The Challenging Landscape of Cloud Monitoring
		9.1 Introduction
			9.1.1 Cloud Computing and Its Key Features
			9.1.2 Cloud Computing Delivery Options
			9.1.3 Our Contributions
		9.2 Challenges
			9.2.1 Cloud-Generated Logs, Their Importance, and Challenges
				9.2.1.1 Ensuring the Authenticity, Reliability, and Usability of Collected Logs
				9.2.1.2 Trust Among Cloud Participants
				9.2.1.3 Log Tampering Prevention/Detection Challenges
			9.2.2 Cloud Monitoring Challenges
				9.2.2.1 Monitoring Large-Scale Cloud Infrastructure
				9.2.2.2 Unique Cloud Characteristics
				9.2.2.3 Layered Architecture
				9.2.2.4 Access Requirement
				9.2.2.5 Billing and Monitoring Bound
				9.2.2.6 Diverse Service Delivery Options
				9.2.2.7 XaaS and Its Complex Monitoring Requirement
				9.2.2.8 Establishing High-Availability Failover Strategies
		9.3 Solutions
			9.3.1 Examples of Solved Challenges
			9.3.2 Proposed Solution for Authenticity, Reliability, and Usability of Cloud-Generated Logs: LCaaS
				9.3.2.1 Blockchain as a Log Storage Option
				9.3.2.2 LCaaS Technical Details
				9.3.2.3 LCaaS Summary
			9.3.3 Proposed Solution for Monitoring Large-Scale Cloud Infrastructure: Dogfooding
				9.3.3.1 Challenges of Storing and Analyzing Cloud-Generated Logs
				9.3.3.2 Dogfooding Technical Details
				9.3.3.3 Dogfooding Summary
		9.4 Conclusions
		References
	10 Machine Learning as a Service for Software Application Categorization
		10.1 Introduction
		10.2 Background and Related Work
		10.3 Methodology
		10.4 Experimental Results
		10.5 Discussion
		10.6 Conclusion
		References
	11 Workflow-as-a-Service Cloud Platform and Deployment of Bioinformatics Workflow Applications
		11.1 Introduction
		11.2 Related Work
		11.3 Prototype of WaaS Cloud Platform
			11.3.1 CloudBus Workflow Management System
			11.3.2 WaaS Cloud Platform Development
			11.3.3 Implementation of Multiple Workflows Scheduling Algorithm
		11.4 Case Study
			11.4.1 Bioinformatics Applications Workload
				11.4.1.1 Identifying Mutational Overlapping Genes
				11.4.1.2 Virtual Screening for Drug Discovery
			11.4.2 Workload Preparation
			11.4.3 Experimental Infrastructure Setup
			11.4.4 Results and Analysis
				11.4.4.1 More Cost to Gain Faster Execution
				11.4.4.2 Budget Met Analysis
				11.4.4.3 Makespan Evaluation
				11.4.4.4 VM Utilization Analysis
		11.5 Conclusions and Future Work
		References
Part IV: CASE STUDIES
	12 Application-Centric Real-Time Decisions in Practice: Preliminary Findings
		12.1 Introduction
		12.2 Opportunities and Challenges
		12.3 Application-Centric Decision Enablement
		12.4 Method
		12.5 Initial Experiences
			12.5.1 Accounts
			12.5.2 Instrumentation and Control
			12.5.3 Discovery
			12.5.4 Experimentation
			12.5.5 Analysis and Training
		12.6 Knowledge Management
		12.7 Conclusions
		References
	13 Industrial Evaluation of an Architectural Assumption Documentation Tool: A Case Study
		13.1 Introduction
			13.1.1 Relation to Our Previous Work on Architectural Assumptions and Their Management
		13.2 Assumptions in Software Development
		13.3 Related Work on AA Documentation
			13.3.1 Approaches used for AA Documentation
			13.3.2 Tools used for AA Documentation
			13.3.3 Relation to Requirements and Architecture
		13.4 Architectural Assumptions Manager – ArAM
			13.4.1 Background
			13.4.2 ArAM in Detail
				13.4.2.1 AA Detail Viewpoint
				13.4.2.2 AA Relationship and Tracing Viewpoint
				13.4.2.3 AA Evolution Viewpoint
				13.4.2.4 Putting It All Together
		13.5 Case Study
			13.5.1 Goal and Research Questions
			13.5.2 Case and Subject Selection
				13.5.2.1 Case Description and Units of Analysis
				13.5.2.2 Case Study Procedure
			13.5.3 Data Collection and Analysis
			13.5.4 Pilot Study
		13.6 Results
			13.6.1 Overview of the Case Study
			13.6.2 Results of RQ1
			13.6.3 Results of RQ2
			13.6.4 Summary of Results of RQs
		13.7 Discussion
			13.7.1 Interpretation of the Results
			13.7.2 Implications for Researchers
			13.7.3 Implications for Practitioners
		13.8 Threats to Validity
		13.9 Conclusions and Future Work
		Acknowledgments
		Appendix
		References
Glossary
Index