Build Your Own Programming Language: A programmer’s guide to designing compilers, interpreters, and DSLs for solving modern computing problems
- Length: 494 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2021-12-31
- ISBN-10: 1800204809
- ISBN-13: 9781800204805
- Sales Rank: #226014 (See Top 100 Books)
Written by the creator of the Unicon programming language, this book will show you how to implement programming languages to reduce the time and cost of creating applications for new or specialized areas of computing
Key Features
- Reduce development time and solve pain points in your application domain by building a custom programming language
- Learn how to create parsers, code generators, file readers, analyzers, and interpreters
- Create an alternative to frameworks and libraries to solve domain-specific problems
Book Description
The need for different types of computer languages is growing rapidly and developers prefer creating domain-specific languages for solving specific application domain problems. Building your own programming language has its advantages. It can be your antidote to the ever-increasing size and complexity of software. However, creating a custom language isn’t easy.
In this book, you’ll be able to put the knowledge you gain to work in language design and implementation. You’ll implement the frontend of a compiler for your language, including a lexical analyzer and parser. The book covers a series of traversals of syntax trees, culminating with code generation for a bytecode virtual machine. Moving ahead, you’ll learn how domain-specific language (DSL) features are often best represented by operators and functions that are built into the language, rather than library functions. The book concludes by showing you how to implement garbage collection, including reference counting and mark-and-sweep garbage collection. Throughout the book, Dr. Jeffery weaves in his experience of building the Unicon programming language to give better context to the concepts, while providing relevant examples in Unicon and Java.
By the end of this book, you’ll be able to build and deploy your own domain-specific languages, capable of compiling and running programs.
What you will learn
- Perform requirements analysis for the new language and design language syntax and semantics
- Write lexical and context-free grammar rules for common expressions and control structures
- Develop a scanner that reads source code and generate a parser that checks syntax
- Build key data structures in a compiler and use your compiler to build a syntax-coloring code editor
- Implement a bytecode interpreter and run bytecode generated by your compiler
- Write tree traversals that insert information into the syntax tree
- Implement garbage collection in your language
Who this book is for
This book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate-level knowledge and experience working with a high-level language such as Java or the C++ language are expected to help you get the most out of this book.
Table of Contents
- Why Build Another Programming Language
- Programming Language Design
- Scanning Source Code
- Parsing
- Syntax Trees
- Symbol Tables
- Checking Base Types
- Checking Types on Arrays, Method Calls, and Structure Accesses
- Intermediate Code Generation
- Syntax Cloning in an IDE
- Bytecode Interpreters
- Generating Bytecode
- Native Code Generation
- Implementing Operators and Built-In Functions
- Domain Control Structures
- Garbage Collection
- Final Thoughts
- Appendix A – Unicon Essentials
Build Your Own Programming Language Build Your Own Programming Language 1 Why Build Another Programming Language? So, you want to write your own programming language… Types of programming language implementations Organizing a bytecode language implementation Languages used in the examples Language versus library – what’s the difference? Applicability to other software engineering tasks Establishing the requirements for your language Case study – requirements that inspired the Unicon language Unicon requirement #1 – preserve what people love about Icon Unicon requirement #2 – support large-scale programs working on big data Unicon requirement #3 – high-level input/output for modern applications Unicon requirement #4 – provide universally implementable system interfaces Summary Questions 2 Programming Language Design Determining the kinds of words and punctuation to provide in your language Specifying the control flow Deciding on what kinds of data to support Atomic types Composite types Domain-specific types Overall program structure Completing the Jzero language definition Case study – designing graphics facilities in Unicon Language support for 2D graphics Adding support for 3D graphics Summary Questions 3 Scanning Source Code Technical requirements Lexemes, lexical categories, and tokens Regular expressions Regular expression rules Regular expression examples Using UFlex and JFlex Header section Regular expressions section Writing a simple source code scanner Running your scanner Tokens and lexical attributes Expanding our example to construct tokens Writing a scanner for Jzero The Jzero flex specification Unicon Jzero code Java Jzero code Running the Jzero scanner Regular expressions are not always enough Summary Questions Answers 4 Parsing Technical requirements Analyzing syntax Understanding context-free grammars Writing context-free grammar rules Writing rules for programming constructs Using iyacc and BYACC/J Declaring symbols in the header section Putting together the yacc context-free grammar section Understanding yacc parsers Fixing conflicts in yacc parsers Syntax error recovery Putting together a toy example Writing a parser for Jzero The Jzero lex specification The Jzero yacc specification Unicon Jzero code Java Jzero parser code Running the Jzero parser Improving syntax error messages Adding detail to Unicon syntax error messages Adding detail to Java syntax error messages Using Merr to generate better syntax error messages Summary Questions Answers 5 Syntax Trees Technical requirements Using GNU make Learning about trees Defining a syntax tree type Parse trees versus syntax trees Creating leaves from terminal symbols Wrapping tokens in leaves Working with YACC’s value stack Wrapping leaves for the parser’s value stack Determining which leaves you need Building internal nodes from production rules Accessing tree nodes on the value stack Using the tree node factory method Forming syntax trees for the Jzero language Debugging and testing your syntax tree Avoiding common syntax tree bugs Printing your tree in a text format Printing your tree using dot Summary Questions 6 Symbol Tables Establishing the groundwork for symbol tables Declarations and scopes Assigning and dereferencing variables Choosing the right tree traversal for the job Creating and populating symbol tables for each scope Adding semantic attributes to syntax trees Defining classes for symbol tables and symbol table entries Creating symbol tables Populating symbol tables Synthesizing the isConst attribute Checking for undeclared variables Identifying the bodies of methods Spotting uses of variables within method bodies Finding redeclared variables Inserting symbols into the symbol table Reporting semantic errors Handling package and class scopes in Unicon Mangling names Inserting self for member variable references Inserting self as the first parameter in method calls Testing and debugging symbol tables Summary Questions Answers 7 Checking Base Types Type representation in the compiler Defining a base class for representing types Subclassing the base class for complex types Assigning type information to declared variables Synthesizing types from reserved words Inheriting types into a list of variables Determining the type at each syntax tree node Determining the type at the leaves Calculating and checking the types at internal nodes Runtime type checks and type inference in Unicon Summary Questions Answers 8 Checking Types on Arrays, Method Calls, and Structure Accesses Checking operations on array types Handling array variable declarations Checking types during array creation Checking types during array accesses Checking method calls Calculating the parameters and return type information Checking the types at each method call site Checking the type at return statements Checking structured type accesses Handling instance variable declarations Checking types at instance creation Checking types at instance accesses Summary Questions Answers 9 Intermediate Code Generation Preparing to generate code Why generate intermediate code? Learning about the memory regions in the generated program Introducing data types for intermediate code Adding the intermediate code attributes to the tree Generating labels and temporary variables An intermediate code instruction set Instructions Declarations Annotating syntax trees with labels for control flow Generating code for expressions Generating code for control flow Generating label targets for condition expressions Generating code for loops Generating intermediate code for method calls Reviewing the generated intermediate code Summary 10 Syntax Coloring in an IDE Downloading the example IDEs used in this chapter Integrating a compiler into a programmer’s editor Analyzing source code from within the IDE Sending compiler output to the IDE Avoiding reparsing the entire file on every change Using lexical information to colorize tokens Extending the EditableTextList component to support color Coloring individual tokens as they are drawn Highlighting errors using parse results Adding Java support Summary 11 Bytecode Interpreters Understanding what bytecode is Comparing bytecode with intermediate code Building a bytecode instruction set for Jzero Defining the Jzero bytecode file format Understanding the basics of stack machine operation Implementing a bytecode interpreter Loading bytecode into memory Initializing the interpreter state Fetching instructions and advancing the instruction pointer Instruction decoding Executing instructions Starting up the Jzero interpreter Writing a runtime system for Jzero Running a Jzero program Examining iconx, the Unicon bytecode interpreter Understanding goal-directed bytecode Leaving type information in at runtime Fetching, decoding, and executing instructions Crafting the rest of the runtime system Summary Questions Answers 12 Generating Bytecode Converting intermediate code to Jzero bytecode Adding a class for bytecode instructions Mapping intermediate code addresses to bytecode addresses Implementing the bytecode generator method Generating bytecode for simple expressions Generating code for pointer manipulation Generating bytecode for branches and conditional branches Generating code for method calls and returns Handling labels and other pseudo-instructions in intermediate code Comparing bytecode assembler with binary formats Printing bytecode in assembler format Printing bytecode in binary format Linking, loading, and including the runtime system Unicon example – bytecode generation in icont Summary Questions Answers 13 Native Code Generation Deciding whether to generate native code Introducing the x64 instruction set Adding a class for x64 instructions Mapping memory regions to x64 register-based address modes Using registers Starting from a null strategy Assigning registers to speed up the local region Converting intermediate code to x64 code Mapping intermediate code addresses to x64 locations Implementing the x64 code generator method Generating x64 code for simple expressions Generating code for pointer manipulation Generating native code for branches and conditional branches Generating code for method calls and returns Handling labels and pseudo-instructions Generating x64 output Writing the x64 code in assembly language format Going from native assembler to an object file Linking, loading, and including the runtime system Summary Questions Answers 14 Implementing Operators and Built-In Functions Implementing operators Asking whether operators imply hardware support and vice versa Adding String concatenation to intermediate code generation Adding String concatenation to the bytecode interpreter Adding String concatenation to the native runtime system Writing built-in functions Adding built-in functions to the bytecode interpreter Writing built-in functions for use with the native code implementation Integrating built-ins with control structures Developing operators and functions for Unicon Writing operators in Unicon Developing Unicon’s built-in functions Summary Questions Answers 15 Domain Control Structures Knowing when you need a new control structure Defining what a control structure is Reducing excessive redundant parameters Scanning strings in Icon and Unicon Scanning environments and their primitive operations Eliminating excessive parameters via a control structure Rendering regions in Unicon Rendering 3D graphics from a display list Specifying rendering regions using built-in functions Varying graphical levels of detail using nested rendering regions Creating a rendering region control structure Summary Questions Answers 16 Garbage Collection Appreciating the importance of garbage collection Counting references to objects Adding reference counting to Jzero Generating code for heap allocation Modifying the generated code for the assignment operator Considering the drawbacks and limitations of reference counting Marking live data and sweeping the rest Organizing heap memory regions Traversing the basis to mark live data Reclaiming live memory and placing it into contiguous chunks Summary Questions Answers 17 Final Thoughts Reflecting on what was learned from writing this book Deciding where to go from here Studying programming language design Learning about implementing interpreters and bytecode machines Acquiring expertise in code optimization Monitoring and debugging program executions Designing and implementing IDEs and GUI builders Exploring references for further reading Studying programming language design Learning about implementing interpreters and bytecode machines Acquiring expertise in native code and code optimization Monitoring and debugging program executions Designing and implementing IDEs and GUI builders Summary Appendix AUnicon Essentials Running Unicon Using Unicon’s declarations and data types Declaring different kinds of program components Using atomic data types Organizing multiple values using structure types Evaluating expressions Forming basic expressions using operators Invoking procedures, functions, and methods Iterating and selecting what and how to execute Generators Debugging and environmental issues Learning the basics of the UDB debugger Environment variables Preprocessor Function mini-reference Selected keywords
Donate to keep this site alive
How to download source code?
1. Go to: https://github.com/PacktPublishing
2. In the Find a repository… box, search the book title: Build Your Own Programming Language: A programmer’s guide to designing compilers, interpreters, and DSLs for solving modern computing problems
, sometime you may not get the results, please search the main title.
3. Click the book title in the search results.
3. Click Code to download.
1. Disable the AdBlock plugin. Otherwise, you may not get any links.
2. Solve the CAPTCHA.
3. Click download link.
4. Lead to download server to download.