Building Git
by James Coglan
- Length: 717 pages
- Edition: 1
- Language: English
- Publication Date: 2019-02-25
Building Git is a deep dive into the internals of the Git version control system. By rebuilding it in a high-level programming language, we explore the computer science behind this widely used tool. In the process, we gain a deeper understanding of Git itself and cover a wide array of broadly applicable programming topics, including:
Unix concepts
- Reading from and writing to files, making writes appear atomic, and preventing race conditions between processes
- Launching child processes in the foreground and background, and communicating with them concurrently
- Displaying output in the terminal, including colour formatting, paged output, and interacting with the user’s text editor
- Parsing various file formats, including Git’s Merkle-tree-based commit model, the index, configuration files and packed object files
Data structures
- How Git stores content on disk to make effective use of space, make the history efficient to search, and make it easy to detect differences between commits
- Using diffs to efficiently update the contents of the workspace when checking out a new commit
- Effectively using simple in-memory data structures to solve programming problems
- Parsing and interpreting a query language for addressing commits
Concurrent editing
- How Git uses branches to model concurrent edits
- Algorithms for detecting differences between file versions and merging branches back together
- Why merge conflicts happen, how they can be avoided, and how Git helps users prevent lost updates
- How merging can be used as the basis for numerous operations to edit the commit history
Software engineering
- Bootstrapping and growing a self-hosting system
- Test-driven development
- Refactoring to enable new feature development
- Crash-only software design that allows programs to be interrupted and resumed
Networking
- Using SSH to bootstrap a network protocol
- How Git repositories communicate to minimise the data they need to transfer when fetching content
- How the network protocol uses atomic operations to prevent users overwriting each other’s changes
Building Git
Change history
  1. (1.0.0) February 25th 2019
  2. (1.0.1) May 22nd 2019
  3. (1.0.2) October 31st 2019
  4. (1.0.3) April 10th 2020
  5. (1.0.4) June 18th 2021
License and acknowledgements
1. Introduction
  1.1. Prerequisites
  1.2. How the book is structured
  1.3. Typographic conventions
  1.4. The Jit codebase
2. Getting to know .git
  2.1. The .git directory
    2.1.1. .git/config
    2.1.2. .git/description
    2.1.3. .git/HEAD
    2.1.4. .git/info
    2.1.5. .git/hooks
    2.1.6. .git/objects
    2.1.7. .git/refs
  2.2. A simple commit
    2.2.1. .git/COMMIT_EDITMSG
    2.2.2. .git/index
    2.2.3. .git/logs
    2.2.4. .git/refs/heads/master
  2.3. Storing objects
    2.3.1. The cat-file command
    2.3.2. Blobs on disk
    2.3.3. Trees on disk
    2.3.4. Commits on disk
    2.3.5. Computing object IDs
    2.3.6. Problems with SHA-1
  2.4. The bare essentials
I. Storing changes
3. The first commit
  3.1. Initialising a repository
    3.1.1. A basic init implementation
    3.1.2. Handling errors
    3.1.3. Running Jit for the first time
  3.2. The commit command
    3.2.1. Storing blobs
    3.2.2. Storing trees
    3.2.3. Storing commits
4. Making history
  4.1. The parent field
    4.1.1. A link to the past
    4.1.2. Differences between trees
  4.2. Implementing the parent chain
    4.2.1. Safely updating .git/HEAD
    4.2.2. Concurrency and the filesystem
  4.3. Don’t overwrite objects
5. Growing trees
  5.1. Executable files
    5.1.1. File modes
    5.1.2. Storing executables in trees
  5.2. Nested trees
    5.2.1. Recursive trees in Git
    5.2.2. Building a Merkle tree
    5.2.3. Flat or nested?
  5.3. Reorganising the project
6. The index
  6.1. The add command
  6.2. Inspecting .git/index
  6.3. Basic add implementation
  6.4. Storing multiple entries
  6.5. Adding files from directories
7. Incremental change
  7.1. Modifying the index
    7.1.1. Parsing .git/index
    7.1.2. Storing updates
  7.2. Committing from the index
  7.3. Stop making sense
    7.3.1. Starting a test suite
    7.3.2. Replacing a file with a directory
    7.3.3. Replacing a directory with a file
  7.4. Handling bad inputs
    7.4.1. Non-existent files
    7.4.2. Unreadable files
    7.4.3. Locked index file
8. First-class commands
  8.1. Abstracting the repository
  8.2. Commands as classes
    8.2.1. Injecting dependencies
  8.3. Testing the commands
  8.4. Refactoring the commands
    8.4.1. Extracting common code
    8.4.2. Reorganising the add command
9. Status report
  9.1. Untracked files
    9.1.1. Untracked files not in the index
    9.1.2. Untracked directories
    9.1.3. Empty untracked directories
  9.2. Index/workspace differences
    9.2.1. Changed contents
    9.2.2. Changed mode
    9.2.3. Size-preserving changes
    9.2.4. Timestamp optimisation
    9.2.5. Deleted files
10. The next commit
  10.1. Reading from the database
    10.1.1. Parsing blobs
    10.1.2. Parsing commits
    10.1.3. Parsing trees
    10.1.4. Listing the files in a commit
  10.2. HEAD/index differences
    10.2.1. Added files
    10.2.2. Modified files
    10.2.3. Deleted files
  10.3. The long format
    10.3.1. Making the change easy
    10.3.2. Making the easy change
    10.3.3. Orderly change
  10.4. Printing in colour
11. The Myers diff algorithm
  11.1. What’s in a diff?
  11.2. Time for some graph theory
    11.2.1. Walking the graph
    11.2.2. A change of perspective
    11.2.3. Implementing the shortest-edit search
  11.3. Retracing our steps
    11.3.1. Recording the search
    11.3.2. And you may ask yourself, how did I get here?
12. Spot the difference
  12.1. Reusing status
  12.2. Just the headlines
    12.2.1. Unstaged changes
    12.2.2. A common pattern
    12.2.3. Staged changes
  12.3. Displaying edits
    12.3.1. Splitting edits into hunks
    12.3.2. Displaying diffs in colour
    12.3.3. Invoking the pager
II. Branching and merging
13. Branching out
  13.1. Examining the branch command
  13.2. Creating a branch
  13.3. Setting the start point
    13.3.1. Parsing revisions
    13.3.2. Interpreting the AST
    13.3.3. Revisions and object IDs
14. Migrating between trees
  14.1. Telling trees apart
  14.2. Planning the changes
  14.3. Updating the workspace
  14.4. Updating the index
  14.5. Preventing conflicts
    14.5.1. Single-file status checks
    14.5.2. Checking the migration for conflicts
    14.5.3. Reporting conflicts
  14.6. The perils of self-hosting
15. Switching branches
  15.1. Symbolic references
    15.1.1. Tracking branch pointers
    15.1.2. Detached HEAD
    15.1.3. Retaining detached histories
  15.2. Linking HEAD on checkout
    15.2.1. Reading symbolic references
  15.3. Printing checkout results
  15.4. Updating HEAD on commit
    15.4.1. The master branch
  15.5. Branch management
    15.5.1. Parsing command-line options
    15.5.2. Listing branches
    15.5.3. Deleting branches
16. Reviewing history
  16.1. Linear history
    16.1.1. Medium format
    16.1.2. Abbreviated commit IDs
    16.1.3. One-line format
    16.1.4. Branch decoration
    16.1.5. Displaying patches
  16.2. Branching histories
    16.2.1. Revision lists
    16.2.2. Logging multiple branches
    16.2.3. Excluding branches
    16.2.4. Filtering by changed paths
17. Basic merging
  17.1. What is a merge?
    17.1.1. Merging single commits
    17.1.2. Merging a chain of commits
    17.1.3. Interpreting merges
  17.2. Finding the best common ancestor
  17.3. Commits with multiple parents
  17.4. Performing a merge
  17.5. Best common ancestors with merges
  17.6. Logs in a merging history
    17.6.1. Following all commit parents
    17.6.2. Hiding patches for merge commits
    17.6.3. Pruning treesame commits
    17.6.4. Following only treesame parents
  17.7. Revisions with multiple parents
18. When merges fail
  18.1. A little refactoring
  18.2. Null and fast-forward merges
    18.2.1. Merging an existing ancestor
    18.2.2. Fast-forward merge
  18.3. Conflicted index entries
    18.3.1. Inspecting the conflicted repository
    18.3.2. Stages in the index
    18.3.3. Storing entries by stage
    18.3.4. Storing conflicts
  18.4. Conflict detection
    18.4.1. Concurrency, causality and locks
    18.4.2. Add/edit/delete conflicts
    18.4.3. File/directory conflicts
19. Conflict resolution
  19.1. Printing conflict warnings
  19.2. Conflicted status
    19.2.1. Long status format
    19.2.2. Porcelain status format
  19.3. Conflicted diffs
    19.3.1. Unmerged paths
    19.3.2. Selecting stages
  19.4. Resuming a merge
    19.4.1. Resolving conflicts in the index
    19.4.2. Retaining state across commands
    19.4.3. Writing a merge commit
20. Merging inside files
  20.1. The diff3 algorithm
    20.1.1. Worked example
    20.1.2. Implementing diff3
    20.1.3. Using diff3 during a merge
  20.2. Logging merge commits
    20.2.1. Unifying hunks
    20.2.2. Diffs during merge conflicts
    20.2.3. Diffs for merge commits
21. Correcting mistakes
  21.1. Removing files from the index
    21.1.1. Preventing data loss
    21.1.2. Refinements to the rm command
  21.2. Resetting the index state
    21.2.1. Resetting to a different commit
  21.3. Discarding commits from your branch
    21.3.1. Hard reset
    21.3.2. I’m losing my HEAD
  21.4. Escaping from merges
22. Editing messages
  22.1. Setting the commit message
  22.2. Composing the commit message
    22.2.1. Launching the editor
    22.2.2. Starting and resuming merges
  22.3. Reusing messages
    22.3.1. Amending the HEAD
    22.3.2. Recording the committer
23. Cherry-picking
  23.1. Cherry-picking a single commit
    23.1.1. New types of pending commit
    23.1.2. Resuming from conflicts
  23.2. Multiple commits and ranges
    23.2.1. Rev-list without walking
    23.2.2. Conflicts during ranges
    23.2.3. When all else fails
24. Reshaping history
  24.1. Changing old commits
    24.1.1. Amending an old commit
    24.1.2. Reordering commits
  24.2. Rebase
    24.2.1. Rebase onto a different branch
    24.2.2. Interactive rebase
  24.3. Reverting existing commits
    24.3.1. Cherry-pick in reverse
    24.3.2. Sequencing infrastructure
    24.3.3. The revert command
    24.3.4. Pending commit status
    24.3.5. Reverting merge commits
  24.4. Stashing changes
III. Distribution
25. Configuration
  25.1. The Git config format
    25.1.1. Whitespace and comments
    25.1.2. Abstract and concrete representation
  25.2. Modelling the .git/config file
    25.2.1. Parsing the configuration
    25.2.2. Manipulating the settings
    25.2.3. The configuration stack
  25.3. Applications
    25.3.1. Launching the editor
    25.3.2. Setting user details
    25.3.3. Changing diff formatting
    25.3.4. Cherry-picking merge commits
26. Remote repositories
  26.1. Storing remote references
  26.2. The remote command
    26.2.1. Adding a remote
    26.2.2. Removing a remote
    26.2.3. Listing remotes
  26.3. Refspecs
  26.4. Finding objects
27. The network protocol
  27.1. Programs as ad-hoc servers
  27.2. Remote agents
  27.3. The packet-line protocol
  27.4. The pack format
    27.4.1. Writing packs
    27.4.2. Reading from packs
    27.4.3. Reading from a stream
28. Fetching content
  28.1. Pack negotiation
    28.1.1. Non-fast-forward updates
  28.2. The fetch and upload-pack commands
    28.2.1. Connecting to the remote
    28.2.2. Transferring references
    28.2.3. Negotiating the pack
    28.2.4. Sending object packs
    28.2.5. Updating remote refs
    28.2.6. Connecting to remote repositories
  28.3. Clone and pull
    28.3.1. Pulling and rebasing
    28.3.2. Historic disagreement
29. Pushing changes
  29.1. Shorthand refspecs
  29.2. The push and receive-pack commands
    29.2.1. Sending update requests
    29.2.2. Updating remote refs
    29.2.3. Validating update requests
  29.3. Progress meters
30. Delta compression
  30.1. The XDelta algorithm
    30.1.1. Comparison with diffs
    30.1.2. Implementation
  30.2. Delta encoding
  30.3. Expanding deltas
31. Compressing packs
  31.1. Finding similar objects
    31.1.1. Generating object paths
    31.1.2. Sorting packed objects
  31.2. Forming delta pairs
    31.2.1. Sliding-window compression
    31.2.2. Limiting delta chain length
  31.3. Writing and reading deltas
32. Packs in the database
  32.1. Indexing packs
    32.1.1. Extracting TempFile
    32.1.2. Processing the incoming pack
    32.1.3. Generating the index
    32.1.4. Reconstructing objects
    32.1.5. Storing the index
  32.2. A new database backend
    32.2.1. Reading the pack index
    32.2.2. Replacing the backend
  32.3. Offset deltas
33. Working with remote branches
  33.1. Remote-tracking branches
    33.1.1. Logging remote branches
    33.1.2. Listing remote branches
  33.2. Upstream branches
    33.2.1. Setting an upstream branch
    33.2.2. Safely deleting branches
    33.2.3. Upstream branch divergence
    33.2.4. The @{upstream} revision
    33.2.5. Fetching and pushing upstream
34. …and everything else
IV. Appendices
A. Programming in Ruby
  A.1. Installation
  A.2. Core language
    A.2.1. Control flow
    A.2.2. Error handling
    A.2.3. Objects, classes, and methods
    A.2.4. Blocks
    A.2.5. Constants
  A.3. Built-in data types
    A.3.1. true, false and nil
    A.3.2. Integer
    A.3.3. String
    A.3.4. Regexp
    A.3.5. Symbol
    A.3.6. Array
    A.3.7. Range
    A.3.8. Hash
    A.3.9. Struct
  A.4. Mixins
    A.4.1. Enumerable
    A.4.2. Comparable
  A.5. Libraries
    A.5.1. Digest
    A.5.2. FileUtils
    A.5.3. Forwardable
    A.5.4. Open3
    A.5.5. OptionParser
    A.5.6. Pathname
    A.5.7. Set
    A.5.8. Shellwords
    A.5.9. StringIO
    A.5.10. StringScanner
    A.5.11. Time
    A.5.12. URI
    A.5.13. Zlib
B. Bitwise arithmetic