Data Deduplication Approaches: Concepts, Strategies, and Challenges
- Length: 404 pages
- Edition: 1
- Language: English
- Publisher: Academic Press
- Publication Date: 2020-12-11
- ISBN-10: 0128233958
- ISBN-13: 9780128233955
In the age of data science, the rapidly growing volume of data is a major concern for many computing and data storage applications. Duplicated, or redundant, data is a central challenge in data science research. Data Deduplication Approaches: Concepts, Strategies, and Challenges shows readers the various methods that can be used to eliminate multiple copies of the same files as well as duplicated segments or chunks of data within those files. As data duplication keeps increasing, deduplication has become an especially useful field of research for storage environments, in particular persistent data storage. Data Deduplication Approaches provides readers with an overview of the concepts and background of data deduplication approaches, then demonstrates in technical detail the strategies and challenges of real-time implementations for handling big data, data science, data backup, and recovery. The book also includes future research directions, case studies, and real-world applications of data deduplication, focusing on reduced storage, backup, recovery, and reliability.
Cover image
Title page
Table of Contents
Copyright
Dedication
List of contributors
About the editors
Preface
Acknowledgement
1. Introduction to data deduplication approaches
  Abstract
  1.1 Introduction
  1.2 Methods of data deduplication
  1.3 Classic research and classification of methods
  1.4 File chunking and metadata
  1.5 Implementation strategies
  1.6 Performance evaluation and concluding remarks
  References
2. Data deduplication concepts
  Abstract
  2.1 History
  2.2 Need of data deduplication
  2.3 Techniques for data redundancy removal
  2.4 Problems with existing techniques
  2.5 Redundant arrays of independent disks
  2.6 Direct attached storage
  2.7 Storage area network
  2.8 Network attached storage
  2.9 Comparison between direct attached storage, network attached storage, and storage area network
  2.10 Data deduplication techniques
  2.11 Benefits of data deduplication
  2.12 How data deduplication operates
  2.13 Hashing
  2.14 Deduplication taxonomy
  2.15 Deduplication versus compression
  2.16 Challenges in data deduplication
  References
3. Concepts, strategies, and challenges of data deduplication
  Abstract
  3.1 Deduplication approaches
  3.2 Required components for data deduplication approaches
  3.3 Centered on granularity for elimination of data duplication
  3.4 Centered on location for elimination of data duplication
  3.5 Centered on time for elimination of data duplication
  3.6 Comparative discussion on different studied and prevailing data deduplication approaches and its challenges
  3.7 Summary
  References
4. Existing mechanisms for data deduplication
  Abstract
  4.1 Introduction
  4.2 Classification of data deduplication techniques
  4.3 Data deduplication in the cloud
  4.4 Deduplication ratio
  4.5 Importance of data deduplication
  4.6 Deduplication for big data
  4.7 Conclusion
  References
5. Classification criteria for data deduplication methods
  Abstract
  5.1 Introduction
  5.2 Granularity
  5.3 Technique to handle duplicates
  5.4 Locality assumptions for efficiency
  5.5 Place
  5.6 Time
  5.7 Data format awareness
  5.8 Indexing and techniques to find duplicates
  5.9 Scope
  5.10 Data type
  5.11 Storage type
  5.12 Conclusion
  References
6. File chunking approaches
  Abstract
  6.1 Introduction
  6.2 Materials and methods
  6.3 File-level chunking
  6.4 Implementation of file chunking
  6.5 Case study: Deduplicator
  6.6 Case study: Duplicates Cleaner
  6.7 Conclusion
  6.8 Bibliographic note
  6.9 Supporting GitHub repositories and blogs
  References
7. Study of data deduplication for file chunking approaches
  Abstract
  7.1 Introduction
  7.2 Related literature
  7.3 Conclusion
  References
8. Essentials of data deduplication using open-source toolkit
  Abstract
  8.1 Introduction
  8.2 Basic deduplication structure
  8.3 Implementation using Python
  8.4 Record linkage toolkit
  8.5 Summary
  References
9. Efficient data deduplication scheme for scale-out distributed storage
  Abstract
  9.1 Introduction
  9.2 Distributed storage system
  9.3 Related work
  9.4 Overview of capacity optimization for scale-out distributed storage
  9.5 Bloom filter array–based data deduplication scheme for scale-out distributed storage
  9.6 Ensuring reliability in deduplication data by erasure-coded replication
  9.7 Summary
  References
10. Identification of duplicate bug reports in software bug repositories: a systematic review, challenges, and future scope
  Abstract
  10.1 Introduction
  10.2 Motivation
  10.3 Duplicate bug detection
  10.4 Systematic review
  10.5 Conclusion, challenges, and future scope
  References
11. A survey and critical analysis on energy generation from datacenter
  Abstract
  11.1 Introduction
  11.2 Datacenter framework
  11.3 Power supply among different components of datacenter
  11.4 Power distribution among different components of datacenter
  11.5 Significance of efficient energy consumption models
  11.6 Energy consumption reduction approaches
  11.7 Conclusion
  References
12. Review of MODIS EVI and NDVI data for data mining applications
  Abstract
  12.1 Introduction
  12.2 MODIS vegetation indices
  12.3 MODIS sinusoidal tiling system
  12.4 MODIS file naming conversion
  12.5 Data conversion
  12.6 Quality assurance
  12.7 Techniques to prepare EVI time series data set
  12.8 Data mining–based land cover change detection
  12.9 Summary
  References
13. Performance modeling for secure migration processes of legacy systems to the cloud computing
  Abstract
  13.1 Data migration in cloud computing
  13.2 Literature review
  13.3 Proposed work
  13.4 Proposed encryption approach
  13.5 Result and conclusion
  References
14. DedupCloud: an optimized efficient virtual machine deduplication algorithm in cloud computing environment
  Abstract
  14.1 Introduction
  14.2 Motivation
  14.3 Literature review
  14.4 Data deduplication on cloud storage systems
  14.5 DedupCloud: proposed methodology for data deduplication in cloud
  14.6 Conclusion
  References
15. Data deduplication for cloud storage
  Abstract
  15.1 Introduction
  15.2 Cloud storage
  15.3 Data deduplication for cloud storage
  15.4 Conclusion
  References
16. Data duplication using Amazon Web Services cloud storage
  Abstract
  16.1 Introduction
  16.2 The workflow of data deduplication
  16.3 Deduplication in Amazon Web Services
  16.4 How to deduplicate
  16.5 Integrate and deduplicate datasets using AWS Lake Formation FindMatches
  16.6 Additional services and benefits
  16.7 Comparison of Cloud backup services with AWS, GCP, Azure
  16.8 Key terms and definitions
  References
17. Game-theoretic analysis of encrypted cloud data deduplication
  Abstract
  17.1 Introduction
  17.2 Related work review and open research problems
  17.3 Preliminaries and notations
  17.4 Game-theoretic analysis of server-controlled deduplication
  17.5 Game-theoretic analysis of client-controlled deduplication
  17.6 Conclusion and future work
  Acknowledgment
  References
18. Data deduplication applications in cognitive science and computer vision research
  Abstract
  18.1 Introduction
  18.2 Redundancy and dimensionality reduction
  18.3 Interactive deduplication
  18.4 Image-specific data deduplication
  18.5 Cognitive science load and dimensionality problem
  18.6 Conclusion
  References
Index