DSLs in Boo: Domain Specific Languages in .NET
By Oren Eini
About this ebook
DSLs in Boo shows you how to design, extend, and evolve DSLs for .NET by focusing on approaches and patterns. You learn to define an app in terms that match the domain, and to use Boo to build DSLs that generate efficient executables. And you won't deal with the awkward XML-laden syntax many DSLs require. The book concentrates on writing internal (textual) DSLs that allow easy extensibility of the application and framework. And if you don't know Boo, don't worry: you'll learn right here all the techniques you need.
Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.
Copyright
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact
Special Sales Department
Manning Publications Co.
Sound View Court 3B
Greenwich, CT 06830
Email: [email protected]
©2010 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 15 14 13 12 11 10
Dedication
For Mom who told me it would take longer than I expected
Brief Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About this Book
About the Author
About the Cover Illustration
Chapter 1. What are domain-specific languages?
Chapter 2. An overview of the Boo language
Chapter 3. The drive toward DSLs
Chapter 4. Building DSLs
Chapter 5. Integrating DSLs into your applications
Chapter 6. Advanced compiler extensibility approaches
Chapter 7. DSL infrastructure with Rhino DSL
Chapter 8. Testing DSLs
Chapter 9. Versioning DSLs
Chapter 10. Creating a professional UI for a DSL
Chapter 11. DSLs and documentation
Chapter 12. DSL implementation challenges
Chapter 13. A real-world DSL implementation
Appendix A. Boo basic reference
Appendix B. Boo language syntax
Index
List of Figures
List of Tables
List of Listings
Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About this Book
About the Author
About the Cover Illustration
Chapter 1. What are domain-specific languages?
1.1. Striving for simplicity
1.1.1. Creating simple code
1.1.2. Creating clear code
1.1.3. Creating intention-revealing code
1.2. Understanding domain-specific languages
1.2.1. Expressing intent
1.2.2. Creating your own languages
1.3. Distinguishing between DSL types
1.3.1. External DSLs
1.3.2. Graphical DSLs
1.3.3. Fluent interfaces
1.3.4. Internal or embedded DSLs
1.4. Why write DSLs?
1.4.1. Technical DSLs
1.4.2. Business DSLs
1.4.3. Automatic or extensible DSLs
1.5. Boo’s DSL capabilities
1.6. Examining DSL examples
1.6.1. Brail
1.6.2. Rhino ETL
1.6.3. Bake (Boo Build System)
1.6.4. Specter
1.7. Summary
Chapter 2. An overview of the Boo language
2.1. Why use Boo?
2.2. Exploring compiler extensibility
2.3. Basic Boo syntax
2.4. Boo’s built-in language-oriented features
2.4.1. String interpolation
2.4.2. Is, and, not, and or
2.4.3. Optional parentheses
2.4.4. Anonymous blocks
2.4.5. Statement modifiers
2.4.6. Naming conventions
2.4.7. Extension methods
2.4.8. Extension properties
2.4.9. The IQuackFu interface
2.5. Summary
Chapter 3. The drive toward DSLs
3.1. Choosing the DSL type to build
3.1.1. The difference between fluent interfaces and DSLs
3.1.2. Choosing between a fluent interface and a DSL
3.2. Building different types of DSLs
3.2.1. Building technical DSLs
3.2.2. Building business DSLs
3.2.3. Building Extensibility DSLs
3.3. Fleshing out the syntax
3.4. Choosing between imperative and declarative DSLs
3.5. Taking a DSL apart—what makes it tick?
3.6. Combining domain-driven design and DSLs
3.6.1. Language-oriented programming in DDD
3.6.2. Applying a DSL in a DDD application
3.7. Implementing the Scheduling DSL
3.8. Running the Scheduling DSL
3.9. Summary
Chapter 4. Building DSLs
4.1. Designing a system with DSLs
4.2. Creating the Message-Routing DSL
4.2.1. Designing the Message-Routing DSL
4.3. Creating the Authorization DSL
4.3.1. Exploring the Authorization DSL design
4.3.2. Building the Authorization DSL
4.4. The "dark side" of using a DSL
4.5. The Quote-Generation DSL
4.5.1. Building business-facing DSLs
4.5.2. Selecting the appropriate medium
4.6. Summary
Chapter 5. Integrating DSLs into your applications
5.1. Exploring DSL integration
5.2. Naming conventions
5.3. Ordering the execution of scripts
5.3.1. Handling ordering without order
5.3.2. Ordering by name
5.3.3. Prioritizing scripts
5.3.4. Ordering using external configuration
5.4. Managing reuse and dependencies
5.5. Performance considerations when using a DSL
5.5.1. Script compilation
5.5.2. Script execution
5.5.3. Script management
5.5.4. Memory pressure
5.6. Segregating the DSL from the application
5.6.1. Building your own security infrastructure
5.6.2. Segregating the DSL
5.6.3. Considerations for securing a DSL in your application
5.7. Handling DSL errors
5.7.1. Handling runtime errors
5.7.2. Handling compilation errors
5.7.3. Error-handling strategies
5.8. Administrating DSL integration
5.9. Summary
Chapter 6. Advanced compiler extensibility approaches
6.1. The compiler pipeline
6.2. Meta-methods
6.3. Quasi-quotation
6.4. AST macros
6.4.1. The unroll macro
6.4.2. Building macros with the MacroMacro
6.4.3. Analyzing the using macro
6.4.4. Building an SLA macro
6.4.5. Using nested macros
6.5. AST attributes
6.6. Compiler steps
6.6.1. Compiler structure
6.6.2. Building the implicit base class compiler step
6.7. Summary
Chapter 7. DSL infrastructure with Rhino DSL
7.1. Understanding a DSL infrastructure
7.2. The structure of Rhino DSL
7.2.1. The DslFactory
7.2.2. The DslEngine
7.2.3. Creating a custom IDslEngineStorage
7.3. Codifying DSL idioms
7.3.1. ImplicitBaseClassCompilerStep
7.3.2. AutoReferenceFilesCompilerStep
7.3.3. AutoImportCompilerStep
7.3.4. UseSymbolsStep
7.3.5. UnderscoreNamingConventionsToPascalCaseCompilerStep
7.3.6. GeneratePropertyMacro
7.4. Batch compilation and compilation caches
7.5. Supplying external dependencies to our DSL
7.6. Summary
Chapter 8. Testing DSLs
8.1. Building testable DSLs
8.2. Creating tests for a DSL
8.2.1. Testing the syntax
8.2.2. Testing the DSL API
8.2.3. Testing the DSL engine
8.3. Testing the DSL scripts
8.3.1. Testing DSL scripts using standard unit testing
8.3.2. Creating the Testing DSL
8.4. Integrating with a testing framework
8.5. Taking testing further
8.5.1. Building an application-testing DSL
8.5.2. Mandatory testing
8.6. Summary
Chapter 9. Versioning DSLs
9.1. Starting from a stable origin
9.2. Planning a DSL versioning story
9.2.1. Implications of modifying the DSL engine
9.2.2. Implications of modifying the DSL API and model
9.2.3. Implications of modifying the DSL syntax
9.2.4. Implications of modifying the DSL environment
9.3. Building a regression test suite
9.4. Choosing a versioning strategy
9.4.1. Abandon-ship strategy
9.4.2. Single-shot strategy
9.4.3. Additive-change strategy
9.4.4. Tower of Babel strategy
9.4.5. Adapter strategy
9.4.6. The great-migration strategy
9.5. Applying versioning strategies
9.5.1. Managing safe, additive changes
9.5.2. Handling required breaking change
9.6. DSL versioning in the real world
9.6.1. Versioning Brail
9.6.2. Versioning Binsor
9.6.3. Versioning Rhino ETL
9.7. When to version
9.8. Summary
Chapter 10. Creating a professional UI for a DSL
10.1. Creating an IDE for a DSL
10.1.1. Using Visual Studio as your DSL IDE
10.1.2. Using #develop as your DSL IDE
10.2. Integrating an IDE with a DSL application
10.2.1. Extending #develop highlighting for our DSLs
10.2.2. Adding code completion to our DSL
10.2.3. Adding contextual code completion support for our DSL
10.3. Creating a graphical representation for a textual DSL
10.3.1. Displaying DSL execution
10.3.2. Creating a UI dialect
10.3.3. Treating code as data
10.4. DSL code generation
10.4.1. The CodeDOM provider for Boo
10.4.2. Specific DSL writers
10.5. Handling errors and warnings
10.6. Summary
Chapter 11. DSLs and documentation
11.1. Types of documentation
11.2. Writing the Getting Started Guide
11.2.1. Begin with an introduction
11.2.2. Provide examples
11.3. Writing the User Guide
11.3.1. Explain the domain and model
11.3.2. Document the language syntax
11.3.3. Create the language reference
11.3.4. Explain debugging to business users
11.4. Creating the Developer Guide
11.4.1. Outline the prerequisites
11.4.2. Explore the DSL’s implementation
11.4.3. Document the syntax implementation
11.4.4. Documenting AST transformations
11.5. Creating executable documentation
11.6. Summary
Chapter 12. DSL implementation challenges
12.1. Scaling DSL usage
12.1.1. Technical—managing large numbers of scripts
12.1.2. Performing precompilation
12.1.3. Compiling in the background
12.1.4. Managing assembly leaks
12.2. Deployment—strategies for editing DSL scripts in production
12.3. Ensuring system transparency
12.3.1. Introducing transparency to the Order-Processing DSL
12.3.2. Capturing the script filename
12.3.3. Accessing the code at runtime
12.3.4. Processing the AST at runtime
12.4. Changing runtime behavior based on AST information
12.5. Data mining your scripts
12.6. Creating DSLs that span multiple files
12.7. Creating DSLs that span multiple languages
12.8. Creating user-extensible languages
12.8.1. The basics of user-extensible languages
12.8.2. Creating the Business-Condition DSL
12.9. Summary
Chapter 13. A real-world DSL implementation
13.1. Exploring the scenario
13.2. Designing the order-processing system
13.3. Thinking in tongues
13.4. Moving from an acceptable to an excellent language
13.5. Implementing the language
13.5.1. Exploring the treatment of statement’s implementation
13.5.2. Implementing the upon and when keywords
13.5.3. Tracking which file is the source of a policy
13.5.4. Bringing it all together
13.6. Using the language
13.7. Looking beyond the code
13.7.1. Testing our DSL
13.7.2. Integrating with the user interface
13.7.3. Limited DSL scope
13.8. Going beyond the limits of the language
13.9. Summary
Appendix A. Boo basic reference
A.1. Prerequisites
A.2. The Boo interactive shell, interpreter, and compiler
A.2.1. Expressions
A.2.2. Boolean values and Boolean expressions
A.3. Comments
A.4. Control statements
A.4.1. If statement
A.4.2. While statement
A.4.3. For statement
A.5. Types
A.5.1. Lists
A.5.2. Range
A.5.3. Arrays
A.5.4. Hashes
A.5.5. Strings
A.5.6. Slicing
A.5.7. Declaring types explicitly
A.6. Creating real programs
A.6.1. Methods
A.6.2. Classes and objects
A.6.3. Imports
A.7. Generators
Appendix B. Boo language syntax
B.1. Interesting keywords
B.2. Conditionals
B.3. Loops and iterations
B.4. Type declarations
B.5. Methods, properties, and control structures
B.6. Useful macros
Index
List of Figures
List of Tables
List of Listings
Preface
In 2007, I gave a talk about using Boo to build your own domain-specific languages (DSLs) at JAOO (http://jaoo.dk), a software conference in Denmark. I had been working with Boo and creating DSLs since 2005, but as I prepared for the talk, I was surprised to see just how easy it was to build DSLs with Boo. (I find that teaching something gives you a fresh perspective on it.)
That experience, and the audience’s response, convinced me that you don’t have to be a compiler expert or a parser wizard to build your own mini-languages. I realized that I needed to formalize the practices I had been using and make them publicly available.
One of the most challenging problems in the industry today is finding a way of clearly expressing intent in a particular domain. A lot of time and effort has been spent tackling that problem. A DSL is usually a good solution, but there is a strong perception in the community that writing your own language for a particular task is extremely difficult.
The truth is different from the perception. Creating a language from scratch would be a big task, but you don’t need to start from scratch. Today, there are lots of tools and plenty of support for creating languages. When you decide to make an internal DSL—one that is hosted inside an existing programming language (such as Boo)—the cost of building that language drops significantly.
I routinely build new languages during presentations (onstage, within 5 or 10 minutes), because once you understand the basic principles, it is easy. Easy enough that it deserves to be a standard part of your toolset, ready to be used whenever you spot a problem that is suitable for a DSL solution.
That 2007 JAOO talk was the start of the journey that led to the creation of this book. Finishing up this project took longer than expected, but I am very happy to say that I have been successful in what I set out to do.
This book is meant to be an actionable guide, not a theory book. I go over the theory in the relevant places, but my goal is that, by the time you are halfway through the book, you’ll be able to write your own DSLs.
Acknowledgments
Like most books, this wasn’t a solo effort. I would like to send my heartfelt gratitude to the people who made this book possible.
Thanks to Rodrigo B. de Oliveira, for creating the Boo language in the first place, and Cedric Vivier, Daniel Grunwald, Dmitry Malyshev, Greg Nagel, Joao Braganca, Martinho Fernandes, Paul Lang, and Avishay Lavie for helping to create such a wonderful language.
To the people who worked on and extended the Rhino DSL project, Simone Busoli, Nathan Stott, Jason Meckley, Craig Neuwirt, Tobias Hertkorn, Markus Zywitza, Adam Tybor, Paul Barriere, and Leonard Smith, thanks for making my job so much easier.
To everyone at Manning, especially publisher Marjan Bace and associate publisher Mike Stephens, thanks for your guidance, support, and patience. To development editor Tom Cirtin, copyeditor Andy Carroll, and proofreader Katie Tennant, thanks for being so patient with me, even when I took too long to get things done. Special thanks to technical proofreader Justin Chase for carefully reading the final manuscript once it was in production and for checking the code.
To the reviewers who read the manuscript numerous times during development, thanks for your comments and valuable feedback: Andrew Glover, Jon Skeet, Derik Whittaker, Freedom Dumlao, Justin Lee, Paul King, Matthew Pope, Craig Neuwirt, Mark Seemann, Steven Kelly, Robert Wenner, Garabed "Garo" Yeriazarian, and Avishay Lavie.
About this Book
This book is meant for intermediate to advanced .NET developers who are interested in using domain-specific languages in their applications.
If you are new to language-oriented programming, this book will teach you how to create, build, and maintain your own languages.
If you are experienced with language-oriented programming, this book will give you all the practical knowledge necessary to easily build DSLs using the Boo programming language.
Note, however, that this book is focused on the practical side of building DSLs. While I talk about the theory underlying this field, I focus on practical aspects. If you are interested in learning more about DSLs, I also recommend reading Martin Fowler’s forthcoming book on the topic: http://www.martinfowler.com/bliki/DomainSpecificLanguage.html. The book isn’t finished yet, but much of the content can already be found on his site.
Roadmap
This book has five main sections.
Chapters 1–2 discuss DSLs in general, introduce the Boo language, and explain why I chose to use it as the basis for my DSL adventures.
Chapters 3–5 walk through the implementation of several different DSLs, their integration into applications, and all the various concerns you’ll have to deal with when you add a DSL to your project.
Chapters 6–7 dive into advanced language manipulation and the infrastructure required to build an industry-strength DSL.
Chapters 8–11 go into the details surrounding a production-worthy DSL implementation: building testable languages and test languages, creating versionable DSLs, working with user interfaces for the languages, and documenting them.
Chapters 12–13 talk about implementation challenges for DSLs and walk through the steps of building a full real-world DSL example.
Two appendixes conclude the book. Appendix A is a basic Boo reference, familiarizing you with how to use Boo as a programming language, while appendix B covers the Boo language syntax.
Code conventions and downloads
All source code in listings or in text is in a fixed-width font like this to separate it from ordinary text. Source code for all working examples in this book is available for download from the publisher’s website at www.manning.com/DSLsinBoo.
You can download the binary distribution of Boo from the Boo website at http://boo.codehaus.org. For more information on using Boo once you’ve downloaded it, please see page 23.
Author Online
The purchase of DSLs in Boo includes free access to a private web forum run by Manning Publications, where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum and subscribe to it, point your web browser to www.manning.com/DSLsinBoo. This page provides information about how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct on the forum.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It’s not a commitment to any specific amount of participation on the part of the author, whose contribution to the book’s forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions, lest his interest stray!
The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
About the Author
Oren Eini is an independent consultant based in Israel. He is a frequent blogger at www.ayende.com/Blog/ under his pseudonym Ayende Rahien, and he’s an internationally known presenter, having spoken at conferences such as DevTeach, JAOO, Oredev, NDC, and Progressive.NET.
Oren’s main focus is on architecture and best practices that promote quality software and zero-friction development. He is the author of Rhino Mocks, one of the most popular mocking frameworks on the .NET platform, and he’s also a leading figure in other well-known open source projects, including the Castle project and NHibernate.
Oren’s hobbies include reading fantasy novels, reviewing code, and writing about himself in the third person. Oren is also a Microsoft MVP, a fact that he tends to forget when writing a bio.
About the Cover Illustration
The figure on the cover of DSLs in Boo is captioned "Le Dauber," which means art student. The illustration is taken from a 19th-century edition of Sylvain Maréchal’s four-volume compendium of regional dress customs published in France. Each illustration is finely drawn and colored by hand. The rich variety of Maréchal’s collection reminds us vividly of how culturally apart the world’s towns and regions were just 200 years ago. Isolated from each other, people spoke different dialects and languages. In the streets or in the countryside, it was easy to identify where they lived and what their trade or station in life was just by their dress.
Dress codes have changed since then and the diversity by region, so rich at the time, has faded away. It is now hard to tell apart the inhabitants of different continents, let alone different towns or regions. Perhaps we have traded cultural diversity for a more varied personal life, certainly for a more varied and fast-paced technological life.
At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Maréchal’s pictures.
Chapter 1. What are domain-specific languages?
In this chapter
Understanding domain-specific languages
Distinguishing between domain-specific language types
Why write a domain-specific language?
Why use Boo?
Examining domain-specific language examples
In the beginning, there was the bit. And the bit shifted left, and the bit shifted right, and there was the byte. The byte grew into a word, and then into a double word. And the developer saw the work, and it was good. And the evening and the morning were the first day. And on the next day, the developer came back to the work and spent the whole day trying to figure out what he had been thinking the day before.
If this story rings any bells, you’re familiar with one of the most fundamental problems in computer science. The computer does what it is told, not what the programmer meant to tell it. Often enough, what the programmer tells it to do is in direct contradiction to what the programmer meant it to do. And that’s a problem. I’ve experienced this myself many times, and I’m not particularly incompetent. How, then, did I reach that point?
1.1. Striving for simplicity
Take a look at this piece of code:
for (p = freelist, oldp = 0;
     p && p != (struct chunk *)brkval;
     oldp = p, p = p->next) {
    if (p->len > nelems) {
        p->len -= nelems;
        q = p + p->len;
        q->next = 0;
        q->len = nelems;
        q++;
        return (void *)q;
    }
    if (p->len == nelems) {
        if (oldp == 0)
            freelist = p->next;
        else
            oldp->next = p->next;
        p->next = 0;
        p++;
        return (void *)p;
    }
}
You’re among a decided minority if you can take a single glance at this code and deduce immediately what it’s doing. Most developers would have to decipher this piece of code.
How does this connect to my difficulty in telling the computer what I want it to do? The problem is the level at which I instruct the computer what to do. If I am working down at the assembly level (or near assembly), I have to instruct the machine what to do in excruciating detail. The preceding piece of code was taken from the FreeBSD boot loader’s malloc method, and there are good reasons it looks the way it does, but writing at this level has a big cost in productivity and flexibility.
Alternatively, I can instruct the computer to do things in higher-level terms, where it can better interpret what I want it to do.
As developers, we always want to achieve the simplest, clearest way to talk to the computer, regardless of the task at hand. Different tasks (low-level memory manipulation, for example) require us to work at different levels, but we always strive for readable, easily maintainable code. Within a given context, we may need to sacrifice those goals for other, more important goals (usually performance), but that should only be done very cautiously.
And when we are not working on low-level code, we will, at some point, have to leave general-purpose programming languages behind to get the desired level of clarity. Building our own languages, each focused specifically on a single task, is a great way to achieve this simplicity and clarity.
What we’d like to find are clear, concise, and simple ways to instruct the machine what we want it to do, rather than to laboriously micromanage it.
1.1.1. Creating simple code
Producing code that’s readable, maintainable, and simple is a great goal. But simple code is much harder to write than complex code. It’s easy to throw code at a problem until it goes away. Simple code, on the other hand, is what you get when you remove all the complexity from the code. That isn’t to say that it’s complicated to write simple code; it’s just that writing complex code is easy. The amount of effort it takes to decipher what a piece of code does is a good indication of how simple the code is.
Consider these two examples of getting the date in two weeks’ time. Which is more readable?
C# code: DateTime.Now.AddDays(14);
C code: time(NULL) + 1209600;
I don’t think there’s any question about which is more readable. In fact, an even better solution would be this:
DateTime.Now.AddWeeks(2);
But this isn’t part of .NET’s Base Class Library (BCL) DateTime API.
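The C version hides its intent behind the magic number 1209600, which is 14 × 24 × 60 × 60 seconds. As a rough sketch, naming the calculation recovers some of that readability even in C. The helper add_weeks and the two constants are hypothetical names invented for this illustration, and the code assumes that time_t counts seconds, which POSIX guarantees but the C standard doesn't.

```c
#include <time.h>

/* Named constants make the origin of 1209600 visible. */
enum {
    SECONDS_PER_DAY  = 24 * 60 * 60,
    SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY
};

/* Hypothetical helper (not part of the C standard library):
   returns the timestamp `weeks` weeks after `start`.
   Assumes time_t is measured in seconds, as on POSIX systems. */
time_t add_weeks(time_t start, int weeks)
{
    return start + (time_t)weeks * SECONDS_PER_WEEK;
}
```

With this in place, the call site reads add_weeks(time(NULL), 2), which states what is wanted rather than how many seconds it takes to get there.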
Using higher-level concepts means you can concentrate more on what you want to be done, and less on how it should be done at the machine level. When using .NET or Java, for instance, I rarely need to concern myself with memory allocation.
That’s helpful, but more often than not, you’ll need to do more interesting things than merely calculate the date two weeks from now. You’ll need to express concepts and algorithms in ways that make sense, and you’ll need to be able to use them in projects of significant size and complexity. Having clearer ways to express those concepts translates directly into a more maintainable code base, which means reduced maintenance costs and an easier time changing and growing the system.
Note
It’s considered polite to express intent in code in a manner that will make sense to the next developer who works with your code, particularly because that poor person may be you. A good suggestion that I take to heart is to assume that the next developer to touch your code will be an axe murderer who knows where you live and has a short fuse.
1.1.2. Creating clear code
Code may be clear about how it’s doing things, but it might not be clear about what it’s doing or why. Because we’re assuming that the next developer will be a vicious killer with a nasty temper, we should make it easy to figure out what we’ve done and what we meant.
We can make our code easier to understand by using intention-revealing programming and concepts taken from domain-driven design, a design approach that says that your API, code structure, and the code itself should express intent, be expressed in the language of the domain, and generally have a high correlation with the problem domain that the application is trying to solve.
Even then, we quickly reach a point where our ability to express intent is hampered by the syntax of the language that we’re using.
1.1.3. Creating intention-revealing code
Programming languages make it easy to tell the computer what it should do, but they can be less effective at expressing developer intent. For that matter, most general-purpose languages (such as C# or Java) are far less suited for a host of other tasks.
Let’s consider text processing, for example. Suppose you want to validate an Israeli phone number like this: 03-9876543. You might do this with the code shown in listing 1.1.
Listing 1.1. Validating a phone number
public bool ValidatePhoneNumber(string input)
{
    if (input.Length != 10)
        return false;
    for (int i = 0; i < input.Length; i++)
    {
        if (i == 2)
        {
            // Position 2 must hold the dash; every other position a digit.
            if (input[i] != '-')
                return false;
        }
        else if (char.IsDigit(input[i]) == false)
            return false;
    }
    return true;
}
Can you look at this code and understand what input it will accept without deciphering it? If you haven’t noticed by now, I consider the need to decipher code bad.
Now let’s look at a tool that’s dedicated to text processing: regular expressions. Validating the phone number using a regular expression is as simple as the one-liner in listing 1.2.
Listing 1.2. Validating a phone number using regular expressions
public bool ValidatePhoneNumber(string input)
{
    return Regex.IsMatch(input, @"^\d{2}-\d{7}$");
}
In this case, the use of a specialized tool for text processing has made the intent much easier to understand, but you need to understand the tool. Anyone who knows regular expressions can glance at the code and figure out what input it will accept.
Another approach is masked input, where you define a mask that the input must match. That would result in code like this:
Mask.Validate(input, "##-#######");
The challenges of specialized tools
Regular expressions are notorious for being write-only tools because the results can be difficult to read, particularly if you don’t write them carefully.
Using special tools to handle specialized tasks requires that you understand how to use the tools. If you don’t understand regular expressions, and I hand you listing 1.2, how will you deal with it?
We’ll touch on this topic later in the book; most of chapter 11 is dedicated to techniques that can help people come to grips with custom languages.
Assuming you know that # is the character for matching a numeral, the mask is even easier to understand than the regular expression. (The .NET framework doesn’t have any masked-input validation facilities beyond WinForms’ MaskedTextBox.)
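To make the idea concrete, here is a minimal sketch of what such a mask validator could look like. Mask.Validate is a hypothetical helper written for illustration, not a .NET framework API. The sketch assumes that # matches a single digit and that every other mask character must appear literally in the input.

```csharp
// Hypothetical helper: the .NET framework ships no class with this name.
public static class Mask
{
    // Returns true when input matches mask character for character:
    // '#' stands for any digit, every other mask character is a literal.
    public static bool Validate(string input, string mask)
    {
        if (input == null || mask == null || input.Length != mask.Length)
            return false;

        for (int i = 0; i < mask.Length; i++)
        {
            if (mask[i] == '#')
            {
                if (!char.IsDigit(input[i]))
                    return false;
            }
            else if (input[i] != mask[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, Mask.Validate("03-9876543", "##-#######") accepts the sample number, while an input with a missing dash or the wrong number of digits is rejected. The mask itself doubles as documentation of the expected format.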
Querying and filtering are other situations where code is no longer sufficient. Let’s say we want to retrieve data for all the customers in London. This isn’t a query that you’ll want to handle by yourself. Building an optimized query plan, instructing the data store which section of the data should be scanned, building manual filters for each individual query ... all of that can be quite tedious. It’s quite a complex task, particularly if you want to handle it efficiently and in a transaction-safe manner.
It’s far easier to send a SQL statement to the database and let it sort out how it wants to handle the request on its own. This allows us to speak at a much higher level of abstraction and ignore the details of how the data is retrieved.
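As a rough illustration, the London query might be issued through ADO.NET as shown here. The Customers table, its Name and City columns, and the CustomerQueries class are assumptions made for this sketch, and the @city parameter syntax depends on the database provider you use.

```csharp
using System.Collections.Generic;
using System.Data.Common;

public static class CustomerQueries
{
    // Hypothetical schema: a Customers table with Name and City columns.
    public const string CustomersByCitySql =
        "SELECT Name FROM Customers WHERE City = @city";

    // Assumes the caller has already opened the connection and that the
    // provider accepts @-prefixed parameter names.
    public static List<string> GetCustomerNames(DbConnection connection, string city)
    {
        var names = new List<string>();
        using (DbCommand command = connection.CreateCommand())
        {
            command.CommandText = CustomersByCitySql;

            // Pass the city as a parameter instead of concatenating it into the SQL.
            DbParameter parameter = command.CreateParameter();
            parameter.ParameterName = "@city";
            parameter.Value = city;
            command.Parameters.Add(parameter);

            using (DbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                    names.Add(reader.GetString(0));
            }
        }
        return names;
    }
}
```

Notice that the only part of this code that describes the request is the one-line SQL statement. Nothing here says which index to use or how to scan the data; those decisions stay with the database.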
So far, I have been consciously avoiding the use of the term domain-specific languages, but it’s time we started discussing it.
1.2. Understanding domain-specific languages
Martin Fowler defines a domain-specific language (DSL) as “a computer language that’s targeted to a particular kind of problem, rather than a general purpose language that’s aimed at any kind of software problem” (http://martinfowler.com/bliki/DomainSpecificLanguage.html).
Domain-specific languages aren’t a new idea by any means. DSLs have been around since long before the start of computing. People have always developed specialized vocabularies for specialized tasks. That’s why sailors use terms like port and starboard and are not particularly afraid of gallows. Doctors similarly have a vocabulary that is baffling to the uninitiated, and weather forecasters have specific terms for various types of clouds, winds, and storms.
Regular expressions and SQL are similarly specialized languages:
Both are languages designed for a narrow domain—text processing and database querying, respectively.
Both are focused on letting you express what you mean, not how the implementation should work—that’s left to some magic engine in the background.
The reason these languages are so successful is that the focus they offer is incredibly useful. They reduce the complexity that you need to handle, and they’re flexible in terms of what you can make them do.
1.2.1. Expressing intent
From the beginning of computer programming, it was recognized that trying to express what you mean in natural language isn’t a viable approach. A clearer, much more focused, way to express intent was needed—that’s why we have code, which is unambiguous (most of the time) and easy for the computer to understand.
But while code may be unambiguous to a computer, it can certainly be incomprehensible to people. Understanding code can be a big problem. You tend to write the code once, and read it many more times. Clarity is much more important than brevity. By ensuring that our code is readable, clear, and concise, we make an investment that will benefit us both in the immediate future (producing software that is simpler and easier to change) and in the long term (providing easier maintainability and a clearer path for extensibility and growth).
But, as we’ve seen, code isn’t always the clearest way to express intent. This is where intention-revealing programming comes into play, and one of the tools in that category is creating a DSL to clearly and efficiently express intent and meaning in code.
1.2.2. Creating your own languages
Most people assume that creating your own computer language is a fairly complex matter. This is because most of the literature out there assumes that you want to build a full-blown general-purpose language, which puts a lot of burden on you as the language author.
It isn’t simple to create a general-purpose language, but it’s certainly possible. It just isn’t something you’d want to do on a rainy afternoon or over a long weekend. The accumulated know-how is out there, but the initial cost remains nontrivial.
But you don’t always have to write your own language from scratch. You can utilize an existing language (called the host language or base language) to provide built-in language and runtime facilities, and then add more syntax and behavior on top of it. A popular example is Ruby on Rails, which is, in essence, a DSL for building web applications.
Building your own compiler
I stated that building your own compiler or interpreter isn’t hard. This is true, to some extent. The main difficulties in going that route are the scope of the work and the fact that most of the work is arcane at worst and tedious at best. This is particularly true if you want to write a full-fledged language.
Writing a general-purpose language is a big task. You need to deal with the details of the syntax and worry about creating an execution engine (for interpreted languages) or generating IL (Intermediate Language) or machine code (for compiled languages). I don’t consider it to be a complex task, but it is a big one.
Building a single-purpose language is a far easier (and smaller) task, because the scope is much reduced. A good example of that can be seen in RSpec, a Ruby library for creating behavior-driven specifications. One of its capabilities is a story runner that accepts specifications written in English (http://blog.davidchelimsky.net/articles/2007/10/21/story-runner-in-plain-english). I suggest looking at how it works. It’s quite ingenious in its simplicity.
The problem with that approach for natural language processing is that you hit its limits quickly. It works only when the statements follow a rigid format, so although it may look like natural language, it is, in fact, nothing of the sort. If you want to make the language more intelligent, you have to accept the additional complexity of building a more full-featured language.
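The rigid-format limitation is easy to see in a small sketch. The StepRunner class below is hypothetical and is not how RSpec is implemented; it only assumes that each supported sentence is registered as a regular expression pattern paired with an action.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of a plain-English step matcher.
public class StepRunner
{
    private readonly List<KeyValuePair<Regex, Action<Match>>> steps =
        new List<KeyValuePair<Regex, Action<Match>>>();

    // Registers a sentence pattern and the action to run when a line matches it.
    public void Define(string pattern, Action<Match> action)
    {
        // Anchor the pattern so that only whole sentences match.
        var regex = new Regex("^(?:" + pattern + ")$", RegexOptions.IgnoreCase);
        steps.Add(new KeyValuePair<Regex, Action<Match>>(regex, action));
    }

    // Runs the first matching step; returns false for an unrecognized sentence.
    public bool Run(string sentence)
    {
        foreach (var step in steps)
        {
            Match match = step.Key.Match(sentence.Trim());
            if (match.Success)
            {
                step.Value(match);
                return true;
            }
        }
        return false;
    }
}
```

If you register the pattern @"my account has (\d+) dollars", the sentence "My account has 100 dollars" is recognized and the captured amount is handed to the action. The equally natural "There are 100 dollars in my account" is not recognized, because no registered pattern has that shape. Every new phrasing needs its own pattern.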
I once consulted for a company that had built a DSL for defining business rules. They had over 100,000 lines of C++ code that they needed to maintain, and performance was a big concern. It became apparent that they could have switched the whole thing to an internal DSL (a DSL that’s hosted in an existing language, which we’ll talk about shortly) and saved quite a bit of time, effort, and pain.
The tools for language-oriented programming had been improving for quite a while, but it was the introduction of Ruby on Rails—a wildly popular DSL that was recognized as such—that really started to get things rolling.
1.3. Distinguishing between DSL types
In the world of DSLs, we often distinguish between several types:
External DSLs
Graphical DSLs
Fluent interfaces
Internal or embedded DSLs
We’ll discuss those types in turn, and look at their properties and uses.
1.3.1. External DSLs
When we talk about external DSLs, we’re discussing DSLs that exist outside the confines of an existing language. SQL and regular expressions are two examples of external DSLs.
Building an external DSL means starting work from a blank slate. You need to define the syntax and required capabilities, and start working from there. This means that you have a lot of power in your hands, but you also need to handle everything yourself. And by everything, I do mean everything, from defining operator precedence semantics to specifying how an if statement works.
Common tools for building external DSLs include Lex, Yacc, ANTLR, GOLD Parser, and Coco/R, among others. Those tools handle the first stage, translating text in a known syntax to a format that a computer program can consume to produce executable output. The part about “producing executable output” is usually left as an exercise for the reader. There are few tools to help you with that.
Note
One tool that comes to mind for producing executable output is the Dynamic Language Runtime (DLR), a Microsoft project that aims to give us dynamic languages in .NET. One basic underpinning of this project is a set of classes that specify the behavior of a program (the abstract syntax tree, or AST) that the DLR can turn into an executable. There are other such tools, for sure, but the DLR is the only one I know of in the .NET space.
Building rich external DSLs is similar to building a general purpose language. You need to understand compiler theory before starting on that path. If you’re interested in that, I recommend reading Compilers: Principles, Techniques, and Tools, by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, which is a classic book on the subject.
This book focuses on building languages on top of existing languages, not starting from scratch and going the whole way. Nevertheless, some background in compiler theory is certainly helpful, even when building a DSL that uses an existing language, so let’s take a quick look at the process of building a language from scratch.
First, the grammar and syntax are often defined using a notation such as BNF (Backus-Naur Form) or a derivative, and then you use a tool to generate a parser. Once you’ve done that, you can run the parser over a code string to produce an abstract syntax tree (AST): a tree representation of the original string, structured according to your definition of the language.
An example will make this clearer. Consider the code in listing 1.3, written in a fictional language.
Listing 1.3. An if statement in a fictional language
if 1 equals 2:
    print "1 = 2"
else:
    print "1 != 2"
The AST that was generated from the code in listing 1.3 is shown in figure 1.1.
Figure 1.1. A hierarchical representation of the AST generated from a simple if statement
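In code, such a tree is usually just a set of small node classes. The sketch below is one possible shape for the AST of listing 1.3, not a reproduction of figure 1.1. All the node types and the Listing13 class are hypothetical names chosen for illustration, and the sketch assumes that the text after print is a message to display.

```csharp
using System;

// Hypothetical node types for the fictional language in listing 1.3.
public abstract class Node { }

public class NumberLiteral : Node
{
    public readonly int Value;
    public NumberLiteral(int value) { Value = value; }
}

public class EqualsExpression : Node
{
    public readonly Node Left, Right;
    public EqualsExpression(Node left, Node right) { Left = left; Right = right; }
}

public class PrintStatement : Node
{
    public readonly string Text;
    public PrintStatement(string text) { Text = text; }
}

public class IfStatement : Node
{
    public readonly Node Condition, TrueBlock, FalseBlock;
    public IfStatement(Node condition, Node trueBlock, Node falseBlock)
    {
        Condition = condition; TrueBlock = trueBlock; FalseBlock = falseBlock;
    }
}

public static class Listing13
{
    // The tree a parser might build for listing 1.3.
    public static IfStatement BuildAst()
    {
        return new IfStatement(
            new EqualsExpression(new NumberLiteral(1), new NumberLiteral(2)),
            new PrintStatement("1 = 2"),
            new PrintStatement("1 != 2"));
    }

    // Renders the tree as indented text, one node per line.
    public static string Dump(Node node, string indent = "")
    {
        var number = node as NumberLiteral;
        if (number != null)
            return indent + "Number(" + number.Value + ")\n";

        var print = node as PrintStatement;
        if (print != null)
            return indent + "Print(\"" + print.Text + "\")\n";

        var equals = node as EqualsExpression;
        if (equals != null)
            return indent + "Equals\n"
                + Dump(equals.Left, indent + "  ")
                + Dump(equals.Right, indent + "  ");

        var ifNode = node as IfStatement;
        if (ifNode != null)
            return indent + "If\n"
                + Dump(ifNode.Condition, indent + "  ")
                + Dump(ifNode.TrueBlock, indent + "  ")
                + Dump(ifNode.FalseBlock, indent + "  ");

        throw new ArgumentException("Unknown node type");
    }
}
```

Calling Listing13.Dump(Listing13.BuildAst()) produces an If line, followed by an indented Equals node with its two Number children, and then the two Print nodes. The nesting of the objects mirrors the nesting of the source text, which is what makes the tree convenient for later processing.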
You can then either build an interpreter that understands this AST and can execute it, or output an executable from the AST. Another common approach is to transform the AST into a semantic model that’s easier