Storage Systems: Organization, Performance, Coding, Reliability, and Their Data Processing
- Length: 746 pages
- Edition: 1
- Language: English
- Publisher: Morgan Kaufmann
- Publication Date: 2021-11-12
- ISBN-10: 0323907962
- ISBN-13: 9780323907965
Storage Systems: Organization, Performance, Coding, Reliability, and Their Data Processing covers the coding, reliability, and performance of popular RAID organizations: RAID1 mirrored disks and RAID5/6/7 arrays, which tolerate one, two, and three disk failures respectively (1/2/3DFT). Readers will learn how files and SQL and NoSQL databases are stored on disk and SSD to achieve higher efficiency. The book covers data compression, deduplication, and encryption techniques for storage systems, which have led to new technologies and startups, as well as techniques to save power in storage and server systems. It also discusses the Fast Array of Wimpy Nodes (FAWN) at CMU, RAMCloud at Stanford, and the key-value flash store LightStore at MIT, along with several storage proposals.
Finally, storage technologies from punched cards up to flash memories and beyond are discussed, along with the data placement and the scheduling of magnetic disks.
- Provides readers with an in-depth understanding of the architecture and operation of computer storage systems
- Includes descriptions of various RAID levels, their coding, organization, performance and reliability
- Covers techniques for efficient and secure data storage through data compression, deduplication and encryption
- Presents readers with an in-depth understanding of the storage of files and SQL and NoSQL databases
Cover image
Title page
Table of Contents
Copyright
Dedication
About the author
Preface
Why this book?
Text overview
Intended audience and required background to read the book
Overview of book chapters
Miscellaneous
Bibliography
Acknowledgments
Bibliography
Abbreviations and acronyms

Chapter 1: Introduction
Abstract
1.1. Computer systems after WW II
1.2. High level programming languages - Fortran
1.3. Effect of data representation on storage space requirements
1.4. Basic computer arithmetic
1.5. Author's experience with IBM computers in 1970s
1.6. IBM's System 360 and its successors
1.7. The IBM S/360 computer family
1.8. Operating systems associated with IBM mainframes
1.9. Early computer companies possibly competing with IBM
1.10. My experience at Burroughs Corp.
1.11. Computer company revenue rankings
1.12. Computer structures book
1.13. Computer family architectures - CFA
1.14. Virtual memory and page replacement algorithms
1.15. Memory space fragmentation and dynamic storage allocation
1.16. Analysis of thrashing in 2-phase locking - 2PL systems
1.17. CPU caches
1.18. Multiprogrammed computer systems
1.19. Timesharing systems
1.20. Mean response with FCFS and processor-sharing scheduling
1.21. Analysis of open and closed queueing network models
1.22. Bottleneck analysis and balanced job bounds
1.23. Performance analyses of I/O subsystems
1.24. Vector supercomputers
1.25. Parallel computers
1.26. The future of supercomputing
1.27. Microprocessor CPUs, GPUs, FPGAs, and ASICs
1.28. RISCV and other microprocessors
1.29. The IBM PC and its compatibles
1.30. Storage studies by Alan Jay Smith at Berkeley
1.31. Prefetching
1.32. Database buffers
1.33. Checkpointing in processing large jobs
1.34. Computer related rule of thumb
1.35. Conclusions and summary
Bibliography

Chapter 2: Storage technologies and their data
Abstract
2.1. Evolution of recording material
2.2. Advertising and e-commerce
2.3. Computer storage technologies
2.4. Reliability studies of DRAM, HDDs, & flash SSDs
2.5. Storage Networking Industry Association - SNIA
2.6. Big data and its sources
2.7. Sources of storage content
2.8. Ranking and description of media companies
The Cisco Cloud Services Stack - CCSS
2.9. Sources of news: newspapers, radio and TV stations
2.10. Text editing and formatting languages
2.11. Online books sources
2.12. Free book download web sites
2.13. Data, image, audio and video compression
2.14. Main memory data compression
2.15. Data deduplication in storage systems
2.16. Up and coming data deduplication companies
2.17. Storage research at IBM's Almaden Research Center in 1990s
2.18. Cleversafe and its information dispersal technology
2.19. Recent developments at IBM Research at ARC
2.20. Storage research at Hewlett-Packard - HP
2.21. Primary storage vendors and enterprise companies in 2020
2.22. All-flash upstart storage companies
2.23. Hyperconverged infrastructure for storage systems
2.24. Top enterprise storage backup players
2.25. Data storage companies: up and coming storage vendors
2.26. Parallel file systems
2.27. Cloud storage
2.28. Jai Menon's predictions on the future of clouds
2.29. Cloud storage companies
2.30. Distributed systems research related to clouds
2.31. Data encryption
2.32. Conclusions - predictions about storage systems
Bibliography

Chapter 3: Disk drive data placement and scheduling
Abstract
3.1. The organization of Hard Disk Drives - HDDs
3.2. Internal organization of files in UNIX
3.3. Review of disk arm scheduling
3.4. Disk scheduling for mixed workloads
3.5. Real time disk scheduling for multimedia
3.6. Storage virtualization
3.7. File placement on disk
3.8. Disks with Shingled Magnetic Recording - SMR
3.9. Review of analyses of disk scheduling methods
3.10. Analytic studies of disk storage
3.11. Analysis of a zoned disk with the FCFS scheduling
3.12. Performance analysis of the SCAN policy
3.13. Analysis of the SATF policy
3.14. Conclusions
Bibliography

Chapter 4: Mirrored & hybrid arrays
Abstract
4.1. Introduction to mirrored and hybrid disk arrays
4.2. Mirrored and hybrid disk array organizations
4.3. Routing read requests in mirrored disks
4.4. Shortening the tail for response times
4.5. Improving write performance in mirrored disks
4.6. Disks with multiple R/W heads on a single and multiple arms
4.7. Seek distances in single and mirrored disks
4.8. Mirrored disk performance in normal, degraded, rebuild modes
4.9. Protecting against rare event failures in archival systems
4.10. RAIDP: ReplicAtion with IntraDisk Parity for cost effective storage of warm data
4.11. Remote mirroring for disaster recovery
4.12. RAID reliability analysis
4.13. Storage reliability research at IBM's Zurich Research Lab
4.14. Conclusions
Bibliography

Chapter 5: Redundant Arrays of Independent Disks - RAID
Abstract
5.1. Redundant Arrays of Inexpensive Disks
5.2. Early RAID products
5.3. RAID classification and motivation
5.4. RAID0 and striping
5.5. RAID2
5.6. RAID3
5.7. RAID4
5.8. RAID5
5.9. RAID5 performance analysis in normal mode
5.10. RAID(4+k) disk arrays in normal and degraded mode
5.11. Rebuild processing in disk arrays
5.12. Vacationing server model for rebuild processing
5.13. RAID5 sparing configurations for rebuild
5.14. IntraDisk Redundancy - IDR for higher reliability rebuild
5.15. Disk scrubbing for higher reliability rebuild processing
5.16. Predictive Failure Analysis - PFA
5.17. Undetected disk errors and Silent Data Corruption - SDC
5.18. Clustered RAID5 layouts
5.19. Clustered RAID designs by Walter Burkhard et al. at UCSD
5.20. Log-structured file systems and arrays
5.21. RAID6
5.22. Reed-Solomon coding for higher reliability
5.23. Parity based MDS codes
5.24. RDP arrays and their optimal recovery
5.25. EVENODD defined and efficient rebuild of a single disk
5.26. Blaum-Roth - BR code
5.27. X-code disk arrays and rebuild mode with one and two disk failures
5.28. The RM2 disk array
5.29. RAID7
5.30. Erasure coding for distributed storage
5.31. ReGenerating codes
5.32. Protection schemes for flash memories
5.33. Conclusions
Bibliography

Chapter 6: Coding for multiple disk failures
Abstract
6.1. Introduction
6.2. 2-Dimensional string layouts
6.3. Simple data entanglement layouts with high reliability
6.4. Reed-Solomon codes
6.5. A family of MDS block array codes with two parities
6.6. Codes for correcting two erasures with independent parities
6.7. Row-Diagonal Parity - RDP codes
6.8. Short write operations
6.9. Additional reading
Bibliography

Chapter 7: Saving power in disks, flash memories, and servers
Abstract
7.1. Introduction to power consumption in computer systems
7.2. Saving battery power in laptop computers
7.3. Varying spindown threshold based on user behavior
7.4. Exploiting idleness in storage systems
7.5. Making enterprise computers greener by protecting them better
7.6. Policy optimization for dynamic power management
7.7. Managing energy and server resources in hosting centers
7.8. Interplay of energy and performance for RAID running OLTP
7.9. Dynamic speed control for server disk power management
7.10. Approaches to conserve disk energy in network servers
7.11. Energy efficiency through burstiness
7.12. Dempsey: a tool for modeling hard disk power consumption
7.13. MAID - Massive Arrays of Idle Disks alternative to tape storage
7.14. Self-tuning power aware storage cache replacement algorithm
7.15. Popular Data Concentration - PDC
7.16. Disk layout optimization for reducing energy consumption
7.17. Managing server energy and operational costs in hosting centers
7.18. Performance directed energy management for main memory and disks
7.19. Exploiting redundancy to conserve energy in storage systems
7.20. Thermal disk drive design: challenges and possible solutions
7.21. PARAID: the gear-shifting Power-Aware RAID
7.22. DiskGroup: energy efficient disk layout for RAID1 systems
7.23. Pergamum: replacing tape with disk-based archival storage
7.24. Energy efficient RAID - ERAID
7.25. Power reduction via write-offloading
7.26. Redundant Arrays of Hybrid Disks - RAHD
7.27. Achieving power-efficient, erasure-coded storage
7.28. Effect of energy-saving schemes on disk reliability
7.29. Mathematical model of disk reliability versus load and temperature
7.30. Sample-Replicate-Consolidate mapping - SRCMap
7.31. Power Proportional Distributed File Systems - PPDFS
7.32. Dynamic locality improvement to increase effective storage performance
7.33. Disk data reorganization for reducing energy consumption
7.34. File assignment with minimal variance of service time
7.35. Striping-based Energy Aware - SEA placement
7.36. PEARL: Performance, Energy, and ReLiability balanced dynamic data distribution
7.37. Power proportionality for data center storage
7.38. Economic evaluation of energy saving with reliability constraint
7.39. Dynamic server provisioning for data center power management
7.40. Modeling the energy costs of I/O workloads
7.41. Energy proportionality is required in addition to energy efficiency
7.42. SSD design tradeoffs from energy perspective
7.43. Green AI
7.44. Conclusions
Bibliography

Chapter 8: Database parallelism, big data and analytics, deep learning
Abstract
8.1. Stonebraker's classification of computer systems
8.2. Comparison of systems from the viewpoint of CPU performance
8.3. High performance network and channel-based interconnects for storage
8.4. Concurrency and coherency control in data sharing systems
8.5. Combined shared disk and nothing systems
8.6. Parallel systems at IBM Research
8.7. Interconnection networks in IBM's BlueGene/L
8.8. Data allocation and transaction routing in multicomputers
8.9. Data allocation with a distributed relational databases
8.10. Review of multicomputer Data Base Machines - DBMs
8.11. Benchmarking in various forms
8.12. Data Base Machines - DBMs and backend processors
8.13. Head-per-track disks
8.14. Active disks projects
8.15. Multidimensional indices on disk, DRAM, and flash
8.16. Implementing indices in flash memories
8.17. Redesign of relational databases by Stonebraker et al.
8.18. Parallel Data Base Machines - DBMs
8.19. Google File System, Bigtable, and Spanner
8.20. Microsoft Azure
8.21. IBM and other cloud service providers
8.22. Distributed databases in cloud computing
8.23. SpringFS bridging agility and performance in elastic distributed storage
8.24. Snowflake cloud based data warehousing with SQL support
8.25. Review of peer-to-peer computing
8.26. Fast Array of Wimpy Nodes - FAWN
8.27. RAMCloud project at Stanford
8.28. How flash changes the design of database storage engines
8.29. Hybrid Transaction Analytic Processing - HTAP
8.30. Intelligent page store for concurrent txn and query processing
8.31. Oracle Exadata database machine
8.32. Oracle in memory option or Database in Main Memory - DBIM
8.33. MemSQL/SingleStore
8.34. Amazon Aurora
8.35. Transaction processing in the cloud
8.36. RAPID and Oracle AutoML: a fast and predictive AutoML pipeline
8.37. Benchmarking automatic ML frameworks
8.38. Alibaba's X-engine
8.39. RocksDB with ultrafast data access
8.40. LightStore project at MIT
8.41. PinK: high-speed in-storage key-value store with bounded tails
8.42. BlueDBM: an appliance for big data analytics
8.43. WiSer highly available HTAP DBMS for IoT applications
8.44. Raven RDBMS at Microsoft provides ML
8.45. Machine Learning data platform - MLdp
8.46. Databricks
8.47. Fungible - a new storage architecture for big data
Ranking of networking companies
8.48. Network requirements for resource disaggregation
8.49. Deep learning and associated hardware
8.50. GPU accelerated database systems
8.51. Graphics Processing Unit - GPU solutions
8.52. Field Programmable Gate Array - FPGA solutions
8.53. Multichip modules
8.54. Unified solutions
8.55. Power consumption in FPGAs and ASICs
8.56. Hybrid approaches to acceleration
8.57. Application Specific Integrated Circuit - ASIC
8.58. Tensorflow and Tensor Processing Units - TPUs
8.59. Increasing computational challenges
8.60. Quantum Neural Nets - QNNs
8.61. Data acceleration examples
8.62. Cerebras wafer size chips vs GPUs
8.63. Conclusions
Bibliography

Chapter 9: Structured, unstructured, and diverse databases
Abstract
9.1. Categories of file systems
9.2. Mainframe count-key-data disk organizations
9.3. Hierarchical and network Data Base Management Systems - DBMSs
9.4. Relational data model
9.5. Ranking methodology for database engines
9.6. Overall ranking of all database types
9.7. Relational database management systems
9.8. Object relational databases
9.9. Data mining
9.10. Data warehousing and OLAP
9.11. Distinct schools of thought in data warehouse design
9.12. Data lakes
9.13. Open source big data projects
9.14. Semi-structured data and its model
9.15. Big data technology and the five Vs
9.16. Hadoop technology ecosphere
9.17. Distributed batch vs inline processing
9.18. NoSQL/non-relational databases
9.19. Key-value stores
9.20. Document stores
9.21. Time-series databases
9.22. Kubernetes and other containers
9.23. Graph databases
9.24. Object-oriented databases
9.25. Search engines for text
9.26. Web search engines
9.27. Resource Description Framework - RDF
9.28. Wide column stores
9.29. Multivalue databases
9.30. Native XML databases
9.31. Realtime stream processing
9.32. Event stores
9.33. Streaming analytics
9.34. Trill: a high-performance incremental query processor for diverse analytics
9.35. Summary of Forrester Wave™ streaming analytics, Q3, 2019
9.36. Content stores
9.37. Multimodel databases
9.38. Main memory databases
9.39. Distributed file systems and object storage
9.40. Enterprise Backup and recovery software solutions
9.41. Analytics and Business Intelligence - ABI platforms
9.42. Blockchain, Bitcoin, Ethereum
Bibliography

Chapter 10: Heterogeneous Disk Arrays - HDAs
Abstract
10.1. Introduction to RAID
10.2. Data allocation in a Heterogeneous Disk Array - HDA
10.3. Analytic justification for HDA
10.4. HDA data allocation experiment setup
10.5. Data allocation experiments
10.6. Rebuild processing in HDA
10.7. RAID+ data layout based on Latin squares
10.8. Related work
10.9. Using utility functions to provision storage systems
10.10. Conclusions
Bibliography

Chapter 11: Hierarchical RAID - HRAID
Abstract
11.1. Introduction to HRAID
11.2. Intranode & internode coding in HRAID
11.3. Concurrency control in HRAID
11.4. RAID IOPS with no disk failures
11.5. RAID IOPS with disk failures
11.6. HRAID response times
11.7. HRAID2/2 performance
11.8. RAID and HRAID reliability
11.9. Shortcut reliability analysis of HRAID
11.10. Simulation to estimate the MTTDL
11.11. Multistep recovery in HRAID
11.12. Related work
11.13. Collective Intelligent Bricks - CIB or Icecube project at IBM
11.14. Conclusions
Bibliography

Chapter 12: Conclusions
Abstract
Bibliography

Appendix
A.1. Books on topics related to storage
A.2. ACM, IEEE, USENIX, and their publications
A.3. Journals, conferences, and workshops dealing with storage systems
A.4. Web sites for trade publications
A.5. Storage research in industry
A.6. Storage research at universities
A.7. Funding agencies, national labs, and research institutes
Bibliography

Bibliography
Bibliography
Index