Monday, October 29, 2012

Xtext Corner Revived

It's been a long time since I wrote about Xtext tips and tricks. However, while preparing my Xtext Best Practices session for this year's EclipseCon, I collected a number of interesting ones that I want to share with you.



The talk starts with a short overview of how I personally like to tackle the task of implementing a language with Xtext. If the syntax is not yet carved in stone, I usually start off with some sketched sample files to get an idea of the different use cases. In doing so, it's quite important to find a concise notation for the more common cases and to be more verbose with the unusual patterns that are anticipated in the language. As soon as the first version of the syntax is settled, the obvious next step is the grammar declaration.

That's a task that I really like. The grammar language of Xtext is probably the most concise and information-rich DSL that I have ever worked with. With very few orthogonal concepts it's possible to describe how a text is parsed and, in the very same breath, to map the parsed information to an in-memory representation. This representation is called the abstract syntax tree (AST) and is often referred to as the model. The AST that Xtext yields is strongly typed and therefore heterogeneous, but it still provides generic traversal possibilities since it is based on the Eclipse Modeling Framework (EMF, also: Ed Merks Framework). So the grammar is about the concrete syntax and its mapping to the abstract syntax.
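
To make "strongly typed, yet generically traversable" a bit more tangible, here is a minimal plain-Java sketch. It does not use EMF or Xtext at all: `AstSketch`, `Node`, `Model`, `Entity`, `Feature` and `allContents` are names invented for this illustration. The traversal is only loosely comparable to what EMF offers with `eAllContents()`.

```java
import java.util.ArrayList;
import java.util.List;

final class AstSketch {

    /** Generic view that every node offers, regardless of its concrete type. */
    interface Node {
        List<Node> children();
    }

    static final class Feature implements Node {
        final String name;

        Feature(String name) {
            this.name = name;
        }

        @Override
        public List<Node> children() {
            return List.of();
        }
    }

    static final class Entity implements Node {
        final String name;
        final List<Feature> features; // typed access for clients that know the language

        Entity(String name, List<Feature> features) {
            this.name = name;
            this.features = features;
        }

        @Override
        public List<Node> children() {
            return new ArrayList<>(features);
        }
    }

    static final class Model implements Node {
        final List<Entity> entities;

        Model(List<Entity> entities) {
            this.entities = entities;
        }

        @Override
        public List<Node> children() {
            return new ArrayList<>(entities);
        }
    }

    /** Depth-first traversal that needs no knowledge of the concrete node types. */
    static List<Node> allContents(Node root) {
        List<Node> result = new ArrayList<>();
        for (Node child : root.children()) {
            result.add(child);
            result.addAll(allContents(child));
        }
        return result;
    }

    public static void main(String[] args) {
        Model model = new Model(List.of(
                new Entity("Person", List.of(new Feature("name"), new Feature("age"))),
                new Entity("Address", List.of(new Feature("street")))));
        System.out.println(allContents(model).size()); // prints 5
    }
}
```

Typed fields such as `features` serve code that knows the language, while `children()` serves generic tools that only need to walk the tree.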


As soon as the result of the parsing is satisfactory, the next step when implementing a language is scoping. Without it, any subsequent implementation work is largely wasted effort. Scoping is the utility that helps to enrich the information in the AST by creating a graph of objects (Abstract Syntax Graph, ASG). This process is often called cross linking: some nodes in the tree are linked with others that are not directly related to them in the first place. This is one of the most important aspects of a language implementation because once scoping and linking are done, the model is far more powerful from a client's perspective. Any code that is written on top of it can leverage and traverse the complete graph, even if the model is split across many files.
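
The idea behind cross linking can be sketched in a few lines of plain Java. This is not how Xtext implements it and none of the names below are Xtext API: `LinkingSketch`, `Entity` and `link` are hypothetical, and the single global name map is a deliberately naive stand-in for a real scope.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LinkingSketch {

    /** AST node as a parser would produce it: the super type is only a name so far. */
    static final class Entity {
        final String name;
        final String superTypeName; // raw text from the file, may be null
        Entity superType;           // cross reference, filled in by the linker

        Entity(String name, String superTypeName) {
            this.name = name;
            this.superTypeName = superTypeName;
        }
    }

    /** Resolves names against one global scope and returns the linking errors. */
    static List<String> link(List<List<Entity>> files) {
        // The "scope": every entity from every file, visible by its simple name.
        Map<String, Entity> scope = new HashMap<>();
        for (List<Entity> file : files) {
            for (Entity entity : file) {
                scope.put(entity.name, entity);
            }
        }
        List<String> errors = new ArrayList<>();
        for (List<Entity> file : files) {
            for (Entity entity : file) {
                if (entity.superTypeName == null) {
                    continue; // nothing to resolve
                }
                entity.superType = scope.get(entity.superTypeName);
                if (entity.superType == null) {
                    errors.add("Cannot resolve '" + entity.superTypeName + "' in " + entity.name);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Entity base = new Entity("Base", null);
        Entity person = new Entity("Person", "Base");
        Entity broken = new Entity("Broken", "Missing");
        // Two "files": the reference from Person to Base crosses a file boundary.
        List<String> errors = link(List.of(List.of(base), List.of(person, broken)));
        System.out.println(person.superType == base); // prints true
        System.out.println(errors);
    }
}
```

After `link` has run, a client can follow `person.superType` directly instead of looking up a name, which is exactly what makes the graph more useful than the tree.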

Validation is the next step and it is implemented on top of the linked ASG. While the parser and the linking algorithm have already produced some error annotations for invalid input sequences, it's the static constraint checking that finds the remaining semantic problems in the input. If the files were parsed and linked successfully and the static analysis does not reveal any problems, the model can be considered valid.
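
As an illustration of such a static constraint, here is a minimal plain-Java sketch of a check that only makes sense on a linked model: detecting cyclic inheritance. The names `ValidationSketch`, `Entity` and `checkNoInheritanceCycles` are invented for this example and are not part of Xtext, and the constraint itself is just an assumed rule for a hypothetical entity language.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ValidationSketch {

    /** Already linked node: superType points to another Entity or is null. */
    static final class Entity {
        final String name;
        Entity superType;

        Entity(String name) {
            this.name = name;
        }
    }

    /** Static check: an entity must not inherit from itself, directly or indirectly. */
    static List<String> checkNoInheritanceCycles(List<Entity> entities) {
        List<String> issues = new ArrayList<>();
        for (Entity start : entities) {
            Set<Entity> visited = new HashSet<>();
            Entity current = start.superType;
            // Walk up the hierarchy; stop at the root or when a node repeats.
            while (current != null && visited.add(current)) {
                if (current == start) {
                    issues.add("Cyclic inheritance involving " + start.name);
                    break;
                }
                current = current.superType;
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        Entity a = new Entity("A");
        Entity b = new Entity("B");
        Entity c = new Entity("C");
        a.superType = b;
        b.superType = a; // A and B form a cycle
        c.superType = a; // C merely inherits from the cycle
        System.out.println(checkNoInheritanceCycles(List.of(a, b, c)));
    }
}
```

Note that the check follows resolved references rather than names, so it can only be written this simply once linking has succeeded.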

Now that one can be sure that the ASG, as the in-memory representation of the files, fulfills the semantic constraints of the language, it's possible to implement the execution layer, which is often a compiler, a code generator or an interpreter. Actually those three are all very similar. You can think of a code generator as an interpreter which evaluates a model to a string. And of course a compiler is pretty much the same as a code generator, except that the output is not plain text but some sequence of bytes. The important thing is that the evaluation layer should (at least in the beginning) only consider valid input models. This dramatically simplifies the implementation, and that's the reason why I like to implement it on top of a checked ASG: you don't have to take all those possibly violated constraints into account.
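
The similarity between an interpreter and a code generator shows up nicely in a tiny plain-Java sketch. The expression model (`Literal`, `Add`) and the class `EvaluationSketch` are made up for this illustration and have nothing to do with Xtext's own APIs. Both functions assume a valid model, which is why there is no handling of missing operands.

```java
final class EvaluationSketch {

    interface Expression {
    }

    static final class Literal implements Expression {
        final int value;

        Literal(int value) {
            this.value = value;
        }
    }

    static final class Add implements Expression {
        final Expression left;
        final Expression right;

        Add(Expression left, Expression right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Interpreter: evaluates the model to a number. */
    static int interpret(Expression expression) {
        if (expression instanceof Literal) {
            return ((Literal) expression).value;
        }
        if (expression instanceof Add) {
            Add add = (Add) expression;
            return interpret(add.left) + interpret(add.right);
        }
        throw new IllegalArgumentException("Unknown expression: " + expression);
    }

    /** Code generator: the same traversal, but the result is source text. */
    static String generate(Expression expression) {
        if (expression instanceof Literal) {
            return Integer.toString(((Literal) expression).value);
        }
        if (expression instanceof Add) {
            Add add = (Add) expression;
            return "(" + generate(add.left) + " + " + generate(add.right) + ")";
        }
        throw new IllegalArgumentException("Unknown expression: " + expression);
    }

    public static void main(String[] args) {
        Expression model = new Add(new Literal(1), new Add(new Literal(2), new Literal(3)));
        System.out.println(generate(model) + " = " + interpret(model)); // prints (1 + (2 + 3)) = 6
    }
}
```

The two methods have the same shape and differ only in their result type, which is the point of the comparison above.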

Now there is of course still the huge field of the user interface that entwines around the editor and its services like content assist, navigation or syntax coloring. However, I would usually postpone that until the language runtime works at least to some extent.

The most important message in this intro is that this is not a waterfall process. All of this can be implemented in small iterations, each of which is accompanied by refined sample models, unit tests (!) and feedback from potential users.

Over the next few days I'll wrap up some of the main points of my presentation, covering grammar tips and some hints on scoping, validation and content assist. Stay tuned for those!

5 comments:

Hendy said...

Awesome stuff!

I wasn't aware there should be a separate ASG step involved.

Thank you! :)

Unknown said...

Thanks, Hendy.

Please note that the 'separate ASG' step is implicit with Xtext. It's often referred to as linking or reference resolution.

Regards,
Sebastian

Hendy said...

Thanks Sebastian.

Indeed, my comment is because I'm dealing with "pure Ecore models" lately.

Last time I used Xtext was about a year ago...

Do you think one should go straight to Xtext, or start with EMF first and only incorporate Xtext and create a grammar once the metamodel is "stable"? I was assuming that it's possible to use previously created EMF metamodels with Xtext (i.e. "just add grammar").

Right now we're using plain OSGi (not Eclipse) in Karaf with EMF, and even getting EMF to work well is already a chore... I'm not sure what the challenges of running Xtext under OSGi are, not to mention sharing another technology with team members.

For templating and generation we use a combination of Mustache and StringTemplate, in different parts. I also have used Xtend but I'm not sure how it applies to our use case now...

Also, do you have any guidelines or thoughts regarding StringTemplate vs Xtend?

Unknown said...

There was a question on SO regarding Xtext and Xtend: http://stackoverflow.com/questions/10917386/linking-xtext-with-stringtemplate-code-generator
I personally prefer to use Xtend over ST since it's much more powerful, especially for the aspects that are not directly related to code gen.

And yes, it's perfectly reasonable to start with an Ecore model and create the grammar for that afterwards. You can import the EPackage into your grammar in order to refer to its EClasses. Please refer to the docs for details.

Using Xtext without OSGi is possible, too, but you'll have to manage the dependencies manually, which may be cumbersome.

Hendy said...

Thank you very much for your helpful suggestions, Sebastian! :)