The Art of Writing Efficient Programs: An advanced programmer’s guide to efficient hardware utilization and compiler optimizations using C++ examples
Get to grips with various performance improvement techniques such as concurrency, lock-free programming, atomic operations, parallelism, and memory management
- Understand the limitations of modern CPUs and their performance impact
- Find out how you can avoid writing inefficient code and get the best optimizations from the compiler
- Learn the tradeoffs and costs of writing high-performance programs
The great free lunch of “performance taking care of itself” is over. Until recently, programs got faster by themselves as CPUs were upgraded, but that doesn’t happen anymore. The clock frequency of new processors has almost peaked, and new architectures bring only small improvements to existing programs. Processors do get larger and more powerful, but most of this new power is consumed by the increased number of processing cores and other “extra” computing units. To write efficient software, you now have to know how to make good use of the available computing resources, and this book will teach you how to do that.
The book covers all the major aspects of writing efficient programs, such as using CPU resources and memory efficiently, avoiding unnecessary computations, measuring performance, and putting concurrency and multithreading to good use. You’ll also learn about compiler optimizations and how to use the programming language (C++) more efficiently. Finally, you’ll understand how design decisions impact performance.
By the end of this book, you’ll not only have enough knowledge of processors and compilers to write efficient programs, but you’ll also be able to understand which techniques to use and what to measure while improving performance. At its core, this book is about learning how to learn.
What you will learn
- Discover how to use the hardware computing resources in your programs effectively
- Understand the relationship between memory order and memory barriers
- Familiarize yourself with the performance implications of different data structures and organizations
- Assess the performance impact of concurrent memory access and how to minimize it
- Discover when to use and when not to use lock-free programming techniques
- Explore different ways to improve the effectiveness of compiler optimizations
- Design APIs for concurrent data structures and high-performance data structures to avoid inefficiencies
Who this book is for
This book is for experienced developers and programmers who work on performance-critical projects and want to learn different techniques to improve the performance of their code. Programmers in the algorithmic trading, gaming, bioinformatics, computational genomics, or computational fluid dynamics communities can learn various techniques from this book and apply them in their own domains.
Although this book uses the C++ language, the concepts demonstrated in the book can be easily transferred or applied to other compiled languages such as C, Java, Rust, Go, and more.
Table of Contents
- Introduction to Performance and Concurrency
- Performance Measurements
- CPU Architecture, Resources, and Performance Implications
- Memory Architecture and Performance
- Threads, Memory, and Concurrency
- Concurrency and Performance
- Data Structures for Concurrency
- Concurrency in C++
- High-Performance C++
- Compiler Optimizations in C++
- Undefined Behavior and Performance
- Design for Performance
The Art of Writing Efficient Programs
- Contributors
  About the author; About the reviewer
- Preface
  Who is this book for?; What this book covers; To get the most out of this book; Download the example code files; Download the color images; Conventions used; Get in touch; Share Your Thoughts
Section 1 – Performance Fundamentals
- Chapter 1: Introduction to Performance and Concurrency
  Why focus on performance?; Why performance matters; What is performance?; Performance as throughput; Performance as power consumption; Performance for real-time applications; Performance as dependent on context; Evaluating, estimating, and predicting performance; Learning about high performance; Summary; Questions
- Chapter 2: Performance Measurements
  Technical requirements; Performance measurements by example; Performance benchmarking; C++ chrono timers; High-resolution timers; Performance profiling; The perf profiler; Detailed profiling with perf; The Google Performance profiler; Profiling with call graphs; Optimization and inlining; Practical profiling; Micro-benchmarking; Basics of micro-benchmarking; Micro-benchmarking and compiler optimizations; Google Benchmark; Micro-benchmarks are lies; Summary; Questions
- Chapter 3: CPU Architecture, Resources, and Performance
  Technical requirements; The performance begins with the CPU; Probing performance with micro-benchmarks; Visualizing instruction-level parallelism; Data dependencies and pipelining; Pipelining and branches; Branch prediction; Profiling for branch mispredictions; Speculative execution; Optimization of complex conditions; Branchless computing; Loop unrolling; Branchless selection; Branchless computing examples; Summary; Questions
- Chapter 4: Memory Architecture and Performance
  Technical requirements; The performance begins with the CPU but does not end there; Measuring memory access speed; Memory architecture; Measuring memory and cache speeds; The speed of memory: the numbers; The speed of random memory access; The speed of sequential memory access; Memory performance optimizations in hardware; Optimizing memory performance; Memory-efficient data structures; Profiling memory performance; Optimizing algorithms for memory performance; The ghost in the machine; What is Spectre?; Spectre by example; Spectre, unleashed; Summary; Questions
- Chapter 5: Threads, Memory, and Concurrency
  Technical requirements; Understanding threads and concurrency; What is a thread?; Symmetric multi-threading; Threads and memory; Memory-bound programs and concurrency; Understanding the cost of memory synchronization; Why data sharing is expensive; Learning about concurrency and order; The need for order; Memory order and memory barriers; Memory order in C++; Memory model; Summary; Questions
Section 2 – Advanced Concurrency
- Chapter 6: Concurrency and Performance
  Technical requirements; What is needed to use concurrency effectively?; Locks, alternatives, and their performance; Lock-based, lock-free, and wait-free programs; Different locks for different problems; Lock-based versus lock-free, what is the real difference?; Building blocks for concurrent programming; The basics of concurrent data structures; Counters and accumulators; Publishing protocol; Smart pointers for concurrent programming; Summary; Questions
- Chapter 7: Data Structures for Concurrency
  Technical requirements; What is a thread-safe data structure?; The best kind of thread safety; The real thread safety; The thread-safe stack; Interface design for thread safety; Performance of mutex-guarded data structures; Performance requirements for different uses; Stack performance in detail; Performance estimates for synchronization schemes; Lock-free stack; The thread-safe queue; Lock-free queue; Non-sequentially consistent data structures; Memory management for concurrent data structures; The thread-safe list; Lock-free list; Summary; Questions
- Chapter 8: Concurrency in C++
  Technical requirements; Concurrency support in C++11; Concurrency support in C++17; Concurrency support in C++20; The foundations of coroutines; Coroutine C++ syntax; Coroutine examples; Summary; Questions
Section 3 – Designing and Coding High-Performance Programs
- Chapter 9: High-Performance C++
  Technical requirements; What is the efficiency of a programming language?; Unnecessary copying; Copying and argument passing; Copying as an implementation technique; Copying to store data; Copying of return values; Using pointers to avoid copying; How to avoid unnecessary copying; Inefficient memory management; Unnecessary memory allocations; Memory management in concurrent programs; Avoiding memory fragmentation; Optimization of conditional execution; Summary; Questions
- Chapter 10: Compiler Optimizations in C++
  Technical requirements; Compilers optimizing code; Basics of compiler optimizations; Function inlining; What does the compiler really know?; Lifting knowledge from runtime to compile time; Summary; Questions
- Chapter 11: Undefined Behavior and Performance
  Technical requirements; What is undefined behavior?; Why have undefined behavior?; Undefined behavior and C++ optimization; Using undefined behavior for efficient design; Summary; Questions
- Chapter 12: Design for Performance
  Technical requirements; Interaction between the design and performance; Design for performance; The minimum information principle; The maximum information principle; API design considerations; API design for concurrency; Copying and sending data; Design for optimal data access; Performance trade-offs; Interface design; Component design; Errors and undefined behavior; Making informed design decisions; Summary; Questions
- Assessments
  Chapter 1; Chapter 2; Chapter 3; Chapter 4; Chapter 5; Chapter 6; Chapter 7; Chapter 8; Chapter 9; Chapter 10; Chapter 11; Chapter 12
- Why subscribe?
- Other Books You May Enjoy
  Packt is searching for authors like you; Share Your Thoughts
How to download source code?
1. Go to:
2. In the Find a repository… box, search the book title:
The Art of Writing Efficient Programs: An advanced programmer’s guide to efficient hardware utilization and compiler optimizations using C++ examples. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.