I have a dream: to build an object-oriented language. Last year, I built a tiny modeling language for Mealy and Moore automata. It felt great to develop my own little grammar with ANTLR and to see it working. You can find this project (AutomatonLang) in my GitHub repository. Now that I’m beginning to study computer science, I thought about a new project to work on in my spare time (after I finished my work for the abipage project), one that has more to do with my field of study than building little websites. Some thoughts popped up in my mind, and one of them was to realise one of my dreams: to build a programming language. That sounds annoying to most of my contemporaries, but to me it felt just great to think about designing and developing a new programming language with grammars, parsers, documentation and everything. It’s now my main spare-time project for as long as it takes (I assume several months or more). I hope it’ll be as much fun as I expect and that I’ll learn all sorts of skills (I’m optimistic). In this and the following blog posts, I’m going to write about my thoughts and ideas on programming languages and my attempt to build one on my own. I hope they are helpful to you if you’re planning the same type of project. So far I only have ideas about the design in my mind and haven’t written a single line of code or grammar; this post is about those ideas and the general decisions I’ve made.
General decisions I’ve made
These general decisions are important because they define the frame of my project. I’m skipping the decision whether to build a functional or an object-oriented language: I love object orientation (I learned programming with Java and don’t have much experience with functional programming languages), so there was no real decision to make.
It’s important to know the purpose of the language you’re going to create. As you may guess, the language I’m creating has only one purpose: to learn and improve skills in my field of study, especially in the field of parsers and compilers. I don’t plan to use the language in serious projects, so performance and usability aren’t as important as implementing my own ideas and algorithms (which will hopefully help me improve the performance enough to make the language usable for tiny scripts).
The base language
I’m going to use Java as the base language, that is, the language in which the parser and the compiler/interpreter are implemented. Why? Because:
- I know Java well
- I’ve written lots of Java code during the last three years, so I know it well enough to use it in a project that’s going to have several thousand lines of code.
- Java is fast
- The performance of the language I’m creating isn’t really important, as I’ve stated before, but it’s important for me to use a fast base language so that I have a chance of writing a language implementation with tolerable performance. Java is fast (in some cases it can even be faster than compiled C++ code, thanks to run-time optimizations).
- Java is platform independent
- My friends and I work on different devices and different operating systems (I don’t yet, but I’m going to use a Linux distribution), so it’s important to use a language which runs equally well on several platforms.
The parser generator
This decision is critical, as the parser (and the accompanying lexer) is the foundation of the project, and developing the grammar files is the first thing to work on. Some may say that I could also write the parser and lexer on my own, without a generator. That’s right: I could. But I’ve already written some tiny parsers by hand, and my experience is that it’s a mess to do it right. Maybe I’ll be capable of writing one when I’ve gained more experience in the field of language processing, but for this project I’m going to use a parser generator to get the parser done.
I’ve chosen ANTLR v3. Here are some reasons why and especially why I chose version 3 and not the current version 4:
- I have a little experience developing with ANTLR
- I used ANTLR before in my AutomatonLang project. It’s not a very strong argument for ANTLR, but it’s worth mentioning.
- The documentation is great
- Not the documentation you find on the Internet, but the two books about ANTLR and the task of creating a programming language: The Definitive ANTLR Reference and Language Implementation Patterns. Both books are written by the creator of ANTLR, Terence Parr. In the first book he does a great job explaining ANTLR and how to work with it (with lots of examples and in great detail); in the second he explains how to build a parser and lexer by hand and how to create a programming language. I bought both books: the first before creating the project named above, and the second some days ago, when I saw that I needed to know more about developing a language than I already knew from reading the first book.
- ANTLRWorks does a great job
- ANTLRWorks was an IDE for ANTLR with syntax highlighting, code completion, grammar visualizations and a grammar debugger with which you’re able to step through your grammar rules for a given input. The new version 2 is now not a standalone IDE but a plugin for NetBeans 7.3. This program really helps with developing your grammar and is much more than just a nice editor like Notepad++.
- It’s easy to use
- ANTLR lets you generate the lexer and the parser from one combined grammar. Together with the arguments noted before, this makes it simple to use (in comparison with other parser generators like YACC or JavaCC, which I thought about using before I stumbled upon ANTLR). Other parser generators also do a great job. I chose ANTLR because its concept, philosophy and documentation fit my needs, not because all the alternatives are bad, so it’s worth gathering your own experience, as I did.
- I chose ANTLR v3 because it allows the use of ASTs
- When I bought the book Language Implementation Patterns, which I mentioned above, I also bought The Definitive ANTLR 4 Reference, as I planned to use the new version of ANTLR (which was released just at the end of last year). ANTLR 4 has a new concept for creating language applications. Normally (and with ANTLR 3) you first create a parser which parses the text and emits an AST (Abstract Syntax Tree). This AST is then rewritten and walked by a tree parser, which contains the grammar actions and builds up the data structures of your application. With ANTLR 4 you also create an input parser, but it doesn’t emit a real AST. It emits only a parse tree, on which you then work with special, event-driven tree visitors and walkers (you write a method for each rule, and this method is called when the walker visits the subtree of this rule). The problem: nowhere in the whole book did I see how to, for example, rewrite ASTs, or even how to properly create some structure that abstracts from the input. ASTs are important, as they contain only the relevant information, the semantics, of the input. And as you have a separate tree grammar on which you build your application, you’re able to modify your parser grammar without changing the code relying on the tree grammar, as long as the parser emits the same AST.
- There’s another important disadvantage of this new concept: the book Language Implementation Patterns is based on ANTLR 3 and therefore covers the use of ASTs but not of listeners, so the book is almost useless if you try to develop with ANTLR 4. And as ASTs are commonly used by other parser generators and tools, you’re also more likely to find other literature on this topic than on ANTLR 4’s concept.
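To make the point about ASTs a bit more concrete, here is a minimal sketch in plain Java. It is not ANTLR-generated code and not part of the project; the names AstSketch, Node and Parser are made up for this illustration. A tiny hand-written parser for sums and products emits an AST that keeps only operators and operands, so the parentheses of the input disappear, and a separate evaluator works on that tree alone.

```java
import java.util.ArrayList;
import java.util.List;

class AstSketch {

    /** Homogeneous AST node: a token text plus child nodes (hypothetical design). */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) children.add(k);
        }

        /** LISP-style rendering: operator first, then the operands. */
        @Override
        public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    /**
     * Recursive-descent parser for:
     *   expr : term ('+' term)* ;  term : atom ('*' atom)* ;  atom : INT | '(' expr ')' ;
     */
    static final class Parser {
        private final String src;
        private int pos = 0;

        Parser(String src) { this.src = src.replaceAll("\\s+", ""); }

        Node parse() {
            Node n = expr();
            if (pos != src.length()) throw new IllegalArgumentException("Unexpected input at " + pos);
            return n;
        }

        private Node expr() {
            Node left = term();
            while (peek() == '+') { pos++; left = new Node("+", left, term()); }
            return left;
        }

        private Node term() {
            Node left = atom();
            while (peek() == '*') { pos++; left = new Node("*", left, atom()); }
            return left;
        }

        private Node atom() {
            if (peek() == '(') {
                pos++;
                Node inner = expr();
                if (peek() != ')') throw new IllegalArgumentException("Expected ')' at " + pos);
                pos++;
                return inner; // the parentheses themselves never reach the AST
            }
            int start = pos;
            while (Character.isDigit(peek())) pos++;
            if (start == pos) throw new IllegalArgumentException("Expected number at " + pos);
            return new Node(src.substring(start, pos));
        }

        private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }
    }

    static Node parse(String input) { return new Parser(input).parse(); }

    /** The "application": it depends only on the AST shape, not on the parser grammar. */
    static int eval(Node n) {
        switch (n.text) {
            case "+": return eval(n.children.get(0)) + eval(n.children.get(1));
            case "*": return eval(n.children.get(0)) * eval(n.children.get(1));
            default:  return Integer.parseInt(n.text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + (2 * 3)")); // (+ 1 (* 2 3))
        System.out.println(parse("1 + 2 * 3"));   // (+ 1 (* 2 3))
        System.out.println(eval(parse("1 + (2 * 3)"))); // 7
    }
}
```

Both inputs produce the same tree, so eval would not need to change if the parser rules were restructured, as long as the emitted AST stays the same. A parse tree, in contrast, would also contain a node for every matched rule and for the parentheses.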
At first, I thought about a language that is only pseudo object-oriented, hence the abbreviation POOL for Pseudo Object Oriented Language. But the more I thought about the design of the language, the clearer it became that the language will be purely object-oriented, with everything being an object (see below). So POOL is now probably the abbreviation for Purely Object Oriented Language, but the meaning of POOL may change during development.
Dynamically or statically typed?
The language is dynamically typed, for the following reasons:
- It’s easier to implement
- It’s easier to create a dynamically typed language than a statically typed one, as you don’t have to enforce types at parse time and don’t need to handle type annotations in the input parser and the tree parser.
- Dynamic typing is cool
- Dynamic typing allows me to implement cool features without caring about types. I could even skip implementing real types and classes altogether without losing the possibility of creating an object-oriented language. My language of choice, Ruby, is dynamically typed, and I like features of it such as monkey patching (extending a built-in class like String with your own functions). I hope to be able to implement some of them in my own dynamically typed language.
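As a rough illustration of why monkey patching comes almost for free in a dynamically typed interpreter, here is a minimal Java sketch. It is only an assumption about how such an object model could look, not the design of POOL; Proto, Obj, define and send are hypothetical names. Every object points to a method table that is looked up at run time, so adding a method to a "built-in class" is just another entry in a map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class DynamicSketch {

    /** A method gets the receiver and the argument list and returns any value. */
    interface Method extends BiFunction<Obj, List<Object>, Object> {}

    /** Class-like method table that stays open for changes at run time. */
    static final class Proto {
        final String name;
        final Map<String, Method> methods = new HashMap<>();

        Proto(String name) { this.name = name; }

        void define(String selector, Method m) { methods.put(selector, m); }
    }

    /** A run-time object: a reference to its method table plus a wrapped host value. */
    static final class Obj {
        final Proto proto;
        final Object value;

        Obj(Proto proto, Object value) {
            this.proto = proto;
            this.value = value;
        }

        /** Method lookup happens at call time, so nothing is checked before the call. */
        Object send(String selector, Object... args) {
            Method m = proto.methods.get(selector);
            if (m == null) {
                throw new IllegalStateException(proto.name + " does not understand '" + selector + "'");
            }
            return m.apply(this, Arrays.asList(args));
        }
    }

    public static void main(String[] args) {
        // A "built-in" String class of the hypothetical language.
        Proto string = new Proto("String");
        string.define("length", (self, a) -> ((String) self.value).length());

        Obj s = new Obj(string, "pool");
        System.out.println(s.send("length")); // 4

        // Monkey patching: extend the built-in class after objects already exist.
        string.define("shout", (self, a) -> ((String) self.value).toUpperCase() + "!");
        System.out.println(s.send("shout")); // POOL!
    }
}
```

The price is that a misspelled method name only shows up when the call is executed, which is exactly the trade-off between dynamic and static typing.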
I’m leaving my ideas for the grammar of the language for another post, as I’ve already written more about the decisions I made than I originally expected. A last remark: all code for this project will be on GitHub and licensed under the GPL v3.
(Sorry for my faulty English, I’m going to proofread this text as soon as possible…)