Basic Computer Architecture

Length: 682 pages
Edition: 1
Language: English
Publisher: White Falcon Publishing
Publication Date: 2021-09-01
ISBN-10: 1636403034
ISBN-13: 9781636403038

This book is a comprehensive text on basic, undergraduate-level computer architecture. It starts with theoretical preliminaries and simple Boolean algebra. After a brief discussion of logic gates, it describes three assembly languages: SimpleRisc (a custom RISC ISA), ARM, and x86. The next part designs a processor for the SimpleRisc ISA from scratch, covering combinational units, ALUs, the processor itself, a basic 5-stage pipeline, and a microcode-based design. The last part of the book discusses caches, virtual memory, parallel programming, multiprocessors, storage devices, and modern I/O systems. The book's website has links to slides for each chapter and to video lectures hosted on YouTube.

Introduction to Computer Architecture
	What is a Computer?
	Structure of a Typical Desktop Computer
	Computers are Dumb Machines
	The Language of Instructions
	Instruction Set Design
		Complete – The ISA should be able to Implement all User Programs
		Concise – Limited Size of the Instruction Set
		Generic – Instructions should Capture the Common Case
		Simple – Instructions should be Simple
	How to Ensure that an ISA is Complete?
		Towards a Universal ISA*
		Turing Machine*
		Universal Turing Machine*
		A Modified Universal Turing Machine*
		Single Instruction ISA*
		Multiple Instruction ISA*
		Summary of Theoretical Results
	Design of Practical Machines
		Harvard Architecture
		Von Neumann Architecture
		Towards a Modern Machine with Registers and Stacks
	The Road Ahead
		Representing Information
		Processing Information
		Processing More Information
	Summary and Further Reading
		Summary
		Further Reading
I Architecture: Software Interface
	The Language of Bits
		Logical Operations
			Basic Operators
			Derived Operators
			Boolean Algebra
			De Morgan's Laws
			Logic Gates
			Implementing Boolean Functions
			The Road Ahead
		Positive Integers
			Ancient Number Systems
			Binary Number System
			Adding Binary Numbers
			Sizes of Integers
		Negative Integers
			Sign-Magnitude based Representation
			The 1's Complement Approach
			Bias-based Approach
			The 2's Complement Method
		Floating Point Numbers
			Fixed Point Numbers
			Generic Form of Floating Point Numbers
			IEEE 754 Format for Representing Floating Point Numbers
			Denormal Numbers
			Double Precision Numbers
			Floating Point Mathematics
		Strings
			ASCII Format
			UTF-8
			UTF-16 and UTF-32
		Summary and Further Reading
			Summary
			Further Reading
	Assembly Language
		Why Assembly Language
			Software Developer's Perspective
			Hardware Designer's Perspective
		The Basics of Assembly Language
			Machine Model
			View of Memory
			Assembly Language Syntax
			Types of Instructions
			Types of Operands
		SimpleRisc
			Different Instruction Sets
			Model of the SimpleRisc Machine
			Register Transfer Instruction – mov
			Arithmetic Instructions
			Logical Instructions
			Shift Instructions – lsl, lsr, asr
			Data Transfer Instructions: ld and st
			Unconditional Branch Instructions
			Conditional Branch Instructions
			Functions
			Function Call/Return Instructions
			The nop Instruction
			Modifiers
			Encoding the SimpleRisc Instruction Set
		Summary and Further Reading
			Summary
			Further Reading
	ARM® Assembly Language
		The ARM® Machine Model
		Basic Assembly Instructions
			Simple Data Processing Instructions
			Advanced Data-Processing Instructions
			Compare Instructions
			Instructions that Set CPSR Flags – The 'S' Suffix
			Data Processing Instructions that use CPSR Flags
			Simple Branch Instructions
			Branch and Link Instruction
			Conditional Instructions
			Load-Store Instructions
		Advanced Features
			Arrays
			Functions
		Encoding the Instruction Set
			Data Processing Instructions
			Load-Store Instructions
			Branch Instructions
		Summary and Further Reading
			Summary
			Further Reading
	x86 Assembly Language
		Overview of the x86 Family of Assembly Languages
			Brief History
			Main Features of the x86 ISA
		x86 Machine Model
			Integer Registers
			Floating Point Registers
			View of Memory
			Addressing Modes
			x86 Assembly Language
		Integer Instructions
			Data Transfer Instructions
			ALU Instructions
			Branch/Function Call Instructions
			Advanced Memory Instructions
		Floating Point Instructions
			Data Transfer Instructions
			Arithmetic Instructions
			Instructions for Special Functions
			Compare Instruction
			Stack Cleanup Instructions
		Encoding the x86 ISA
			High Level View of x86 Instruction Encoding
		Summary and Further Reading
			Summary
			Further Reading
II Organisation: Processor Design
	Logic Gates, Registers, and Memories
		Silicon based Transistors
			Doping
			P-N Junction
			NMOS Transistor
			PMOS Transistor
			A Basic CMOS based Inverter
			NAND and NOR Gates
		Combinational Logic
			XOR Gate
			Decoder
			Multiplexer
			Demultiplexer
			Encoder
			Priority Encoder
		Sequential Logic
			SR Latch
			The Clock
			Clocked SR Latch
			Edge Sensitive SR Flip-flop
			JK Flip-flop
			D Flip-flop
			Master-slave D Flip-flop
			Metastability
			Registers
		Memories
			Static RAM (SRAM)
			Content Addressable Memory (CAM)
			Dynamic RAM (DRAM)
			Read Only Memory (ROM)
			Programmable Logic Arrays
		Summary and Further Reading
			Summary
			Further Reading
	Computer Arithmetic
		Addition
			Addition of Two 1-bit Numbers
			Addition of Three 1-bit Numbers
			Ripple Carry Adder
			Carry Select Adder
			Carry Lookahead Adder
		Multiplication
			Overview
			Iterative Multiplier
			Booth Multiplier
			An O(log(n)^2) Time Algorithm
			Wallace Tree Multiplier
		Division
			Overview
			Restoring Division
			Non-Restoring Division
		Floating Point Addition and Subtraction
			Simple Addition with Same Signs
			Rounding
			Implementing Rounding
			Addition of Numbers with Opposite Signs
			Generic Algorithm for Adding Floating Point Numbers
		Multiplication of Floating Point Numbers
		Division of Floating Point Numbers
			Simple Division
			Goldschmidt Division
			Division Using the Newton-Raphson Method
		Summary and Further Reading
			Summary
			Further Reading
	Processor Design
		Design of a Basic Processor
			Overview
		Units in a Processor
			Instruction Fetch – Fetch Unit
			Data Path and Control Path
			Operand Fetch Unit
			Execute Unit
			Memory Access Unit
			Register Writeback Unit
			The Data Path
		The Control Unit
		Microprogram-Based Processor
		Microprogrammed Data Path
			Fetch Unit
			Decode Unit
			Register File
			ALU
			Memory Unit
			Overview of the Data Path
		Microassembly Language
			Machine Model
			Microinstructions
			Implementing Instructions in the Microassembly Language
			3-Address Format ALU Instructions
			2-Address Format ALU Instructions
			The nop Instruction
			ld and st instructions
			Branch Instructions
		Shared Bus and Control Signals
			Control Signals
			Functional Unit Arguments
		The Microcontrol Unit
			Vertical Microprogramming
			Horizontal Microprogramming
			Tradeoffs between Horizontal and Vertical Microprogramming
		Summary and Further Reading
			Summary
			Further Reading
	Principles of Pipelining
		A Pipelined Processor
			The Notion of Pipelining
			Overview of Pipelining
			Performance Benefits
		Design of a Simple Pipeline
			Splitting the Data Path
			Timing
			The Instruction Packet
		Pipeline Stages
			IF Stage
			OF Stage
			EX Stage
			MA Stage
			RW Stage
			Putting it All Together
		Pipeline Hazards
			The Pipeline Diagram
			Data Hazards
			Control Hazards
			Structural Hazards
		Solutions in Software
			RAW Hazards
			Control Hazards
		Pipeline with Interlocks
			A Conceptual Look at a Pipeline with Interlocks
			Ensuring the Data-Lock Condition
			Ensuring the Branch-Lock condition
		Pipeline with Forwarding
			Basic Concepts
			Forwarding Paths in a Pipeline
			Data Hazards with Forwarding
			Implementation of a Pipeline with Forwarding
			Forwarding Conditions
		Support for Interrupts/Exceptions*
			Interrupts
			Exceptions
			Precise Exceptions
			Saving and Restoring Program State
			SimpleRisc Assembly Code of an Interrupt Handler
			Processor with Support for Exceptions
		Performance Metrics
			The Performance Equation
			Performance of an Ideal Pipelined Processor
			Performance of a Non-Ideal Pipeline
			Performance of a Suite of Programs
			Inter-Relationship between Performance, the Compiler, Architecture, and Technology
		Power and Temperature Issues
			Overview
			Dynamic Power
			Leakage Power
			Modeling Temperature*
			The ED^2 Metric
		Advanced Techniques*
			Branch Prediction
			Multiple Issue In-Order Pipeline
			EPIC and VLIW Processors
			Out-of-Order Pipelines
		Summary and Further Reading
			Summary
			Further Reading
III Organisation: System Design
	The Memory System
		Overview
			Need for a Fast Memory System
			Memory Access Patterns
			Temporal and Spatial Locality of Instruction Accesses
			Characterising Temporal Locality
			Characterising Spatial Locality
			Utilising Spatial and Temporal Locality
			Exploiting Temporal Locality – Hierarchical Memory System
			Exploiting Spatial Locality – Cache Blocks
		Caches
			Overview of a Basic Cache
			Cache Lookup and Cache Design
			Data read and data write Operations
			The insert Operation
			The replace Operation
			The evict Operation
			Putting all the Pieces Together
		The Memory System
			Mathematical Model of the Memory System
			Cache Misses
			Reduction of Hit Time and Miss Penalty
			Summary of Memory System Optimisation Techniques
		Virtual Memory
			Process – A Running Instance of a Program
			The "Overlap" and "Size" Problems
			Implementation of Virtual Memory with Paging
			Swap Space
			Memory Management Unit (MMU)
			Advanced Features of the Paging System
		Summary and Further Reading
			Summary
			Further Reading
	Multiprocessor Systems
		Background
			Moore's Law
			Implications of Moore's Law
		Software for Multiprocessor Systems
			Strongly and Loosely Coupled Multiprocessing
			Shared Memory vs Message Passing
			Amdahl's Law
		Design Space of Multiprocessors
		MIMD Multiprocessors
			Logical Point of View
			Coherence
			Memory Consistency
			Physical View of Memory
			Shared Caches
			Coherent Private Caches
			Implementing a Memory Consistency Model*
			Multithreaded Processors
		SIMD Multiprocessors
			SIMD – Vector Processors
			Software Interface
			A Practical Example using SSE Instructions
			Predicated Instructions
			Design of a Vector Processor
		Interconnection Networks
			Overview
			Bisection Bandwidth and Network Diameter
			Network Topologies
		Summary and Further Reading
			Summary
			Further Reading
	I/O and Storage Devices
		I/O System – Overview
			Overview
			Requirements of the I/O System
			Design of the I/O System
			Layers in the I/O System
		Physical Layer – Transmission Sublayer
			Single Ended Signalling
			Low Voltage Differential Signalling (LVDS)
			Transmission of Multiple Bits
			Return to Zero (RZ) Protocols
			Manchester Encoding
			Non Return to Zero (NRZ) Protocol
			Non Return to Zero Inverted (NRZI) Protocol
		Physical Layer – Synchronisation Sublayer
			Synchronous Buses
			Source Synchronous Bus*
			Asynchronous Buses
		Data Link Layer
			Framing and Buffering
			Error Detection and Correction
			Arbitration
			Transaction-Oriented Buses
			Split Transaction Buses
		Network Layer
			I/O Port Addressing
			Memory Mapped Addressing
		Protocol Layer
			Polling
			Interrupts
			DMA
		Case Studies – I/O Protocols
			PCI Express®
			SATA
			SCSI and SAS
			USB
			FireWire Protocol
		Storage
			Hard Disks
			RAID Arrays
			Optical Disks – CD, DVD, Blu-ray
			Flash Memory
		Summary and Further Reading
			Summary
			Further Reading
IV Appendix
	Case Studies of Real Processors
		ARM® Processors
			ARM® Cortex®-M3
			ARM® Cortex®-A8
			ARM® Cortex®-A15
		AMD® Processors
			AMD Bobcat
			AMD Bulldozer
		Intel® Processors
			Intel® Atom™
			Intel Sandy Bridge
	Graphics Processors
		Overview
			Graphics Applications
			Graphics Pipeline
			Fusion of High Performance Computing and Graphics Computing
		NVIDIA Tesla Architecture
			Work Distribution
			GPU Compute Engines
			Interconnection Network, DRAM Modules, L2 Caches, and ROPs
		Streaming Multiprocessors (SMs)
		Computation on a GPU
		CUDA Programs
