Generate EBNF-Descriptions for Xtext-based DSL

Creating DSLs in Xtext is easy and works out of the box. However, when it comes to documentation,  things get tricky. Your DSL should be used by other programmers to model a system that maybe didn’t take part in defining the language. So how can you describe the usage of your language to these guys?

Of course there is the possibility to provide code-snippets as examples and the tool support Xtext generated for your languages eases its use. However, sometimes a more complete documentation is useful. Using the original .xtext file for that purpose is not the best way, because its crowded with internal information for the parser that the user of the DSLs doesn’t care about. Furthermore, only few people are used to the Xtext-Language. The good old way is to provide a BNF or EBNF description of the language.

In this posts I will describe how to automatically generate EBNF descriptions for your DSL.

EBNF-Grammar defined using Xtext

In order to build the generator that creates the EBNF documentation of our DSL, the first thing we need to do is define a formal grammar of our target language ISO-EBNF. This can be done – of course – by means of Xtext.

There are a lots of different EBNF dialects out there In this example I picked the one according to ISO/IEC 13250-6. Its operators and their precendence is as follows (from highest to lowest):

  1. quotes ('...' and "..."), comments ((*...*)), groups ((...)), optional group ([...]), repetition ({...} or {...}-), special sequences (?...?)
  2. repetition-symbol (*)
  3. except-symbol (-)
  4. concatenate-symbol (,)
  5. definition-separator-symbol (|)
  6. defining-symbol (=)
  7. terminator-symbol (;)

That all we need to known to start creating our .xtext language definition. There we go:

grammar org.eclipse.xtext.Ebnf hidden (WS, COMMENT)
// Grammar definition of EBNF according to ISO/IEC 13250-6, see: http://www.iso.org/

generate ebnf "http://www.eclipse.org/xtext/Ebnf"

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

EbnfGrammar:
	{EbnfGrammar} // Enforce object creation
	rules+=ProductionRule*;

ProductionRule:
	name=NAME '=' expr=Expression ';';

Expression:
	Expression_Alternative;

// The operator | defines alternatives. It has got the lowest priority and is left associative
Expression_Alternative returns Expression:
	Expression_Concatenation ({Expression_Alternative.elements+=current} ('|' elements+=Expression_Concatenation)+)?;

// The operator , concatenates two or more symbols. It has got a higher priority than the alternative and is left associative, too
Expression_Concatenation returns Expression:
	Expression_Exception ({Expression_Concatenation.elements+=current} (',' elements+=Expression_Exception)+)?;

// The operator - defines an exception and substracts a symbol from a set. It has got the highest priority of all operators
Expression_Exception returns Expression:
	Expression_Repetition ({Expression_Exception.left=current} '-' right=Expression_Repetition)*;

// The operator * defines an exception and substracts a symbol from a set. It has got the highest priority of all operators
Expression_Repetition returns Expression:
	Expression_Terminal |
	({Expression_Repetition} repetitions=NUMBER '*' expr=Expression_Terminal);

// The parts of a rule can be
// - a reference pointing to another rule
// - a terminal symbol
// - a special sequence (non standardized)
// - a group (repetition, optional or simple group)
Expression_Terminal returns Expression:
	Expression_Rule_Reference |
	Expression_Terminal_Symbol |
	Expression_Repetition_Group |
	Expression_Optional_Group |
	Expression_Group |
	Expression_Special_Sequence;

// Reference another rule by its name
Expression_Rule_Reference returns Expression:
	{Expression_Rule_Reference} rule=[ProductionRule|NAME];

// A special sequence
Expression_Special_Sequence returns Expression:
	{Expression_Special_Sequence} text=SPECIAL_SEQUENCE;

// A terminal symbol is a string which can be enclosed within ' or "
Expression_Terminal_Symbol returns Expression:
	{Expression_Terminal_Symbol} text=TERMINAL_SYMBOL;

// An expression enclosed within curly brackets means that this part can be repeated 0...n times. If the brackets are followed by an - the minimum number of repetitions is 1
Expression_Repetition_Group returns Expression:
	{Expression_Repetition_Group} '{' expr=Expression '}' (atLeastOne?='-')?;

// An expression enclosed within brackets means that this part is optional
Expression_Optional_Group returns Expression:
	{Expression_Optional_Group} '[' expr=Expression ']';

// Parts of an expression can be put in paranthesis to influence the parsing order
Expression_Group returns Expression:
	{Expression_Group} '(' expr=Expression ')';

// Comments in EBNF are placed within (* and *)
terminal COMMENT returns ecore::EString:
	'(*' -> '*)';

// A number >0
terminal NUMBER returns ecore::EInt:
	'1'..'9' ('0'..'9')*;

// A name of a rules (need to start and end with a non whitespace character). This differs from the original EBNF as no blank are allowed in order to simplify this grammar definition
terminal NAME returns ecore::EString:
	('a'..'z' | 'A'..'Z' | '_')+;

// The contents of a terminal symbol, at least one character long
terminal TERMINAL_SYMBOL returns ecore::EString:
	( '"'  (!'"')+ '"'  ) |
	( '\'' (!'\'')+ '\'' );

// A (non-standardized) special sequence, at least one character long
terminal SPECIAL_SEQUENCE returns ecore::EString:
	( '?'  (!'?')+ '?'  );

// Whitespace - Anything that is ignored. Note that this grammar regards blanks as whitespace
terminal WS:
	(' '|'\t'|'\r'|'\n')+;

Please note that in contrast to regular ISO-EBNF this grammar implementation doesn’t allow to use blanks and other special characters in names of production rules. This simplifies the grammar definition in Xtext. I personally think that this is not a major drawback – I you do, you need to adapt the WS and the NAME rules to overcome this limitation.

Value Converter

In order to ease model transformations on EBFN-files we should add some value converters strips of unnescessary escape characters from the terminals SPECIAL_SEQUENCE, TERMINAL_SYMBOL and maybe even COMMENT.

What we need to do is to register a value converter first:

public class EbnfRuntimeModule extends org.eclipse.xtext.ebnf.AbstractEbnfRuntimeModule {
	...
	@Override
	public Class bindIValueConverterService() {
	  return EbnfValueConverterService.class;
	}
	...
}

The following snippets show the implementation of one of the value converters in the class EbnfValueConverterService:

public class EbnfValueConverterService extends AbstractDeclarativeValueConverterService {
	...
	public static class TerminalSymbolValueConverter extends AbstractValueConverter {
		public String toString(String value) {
			boolean containsDoubleQuots = value.contains("\"");
			boolean containsSingleQuots = value.contains("\'");
			if (containsDoubleQuots && containsSingleQuots) {
				throw new ValueConverterException("Cannot quote terminal symbol as it contains double and single quotes.", null, null);
			}
			if (containsDoubleQuots) {
				return '\'' + value + '\''; // Use single quotes
			} else {
				return '"' + value + '"'; // Use double quotes
			}
		}

		public String toValue(String string, AbstractNode node) {
			if (string == null || string.length() < 2) {
				throw new ValueConverterException("Cannot unquote string", node, null);
			} else {
				return string.substring(1, string.length() - 1);
			}
		}
	}

	@Inject
	private TerminalSymbolValueConverter terminalSymbolValueConverter;

	@ValueConverter(rule = "TERMINAL_SYMBOL")
	public IValueConverter TERMINAL_SYMBOL() {
		return terminalSymbolValueConverter;
	}

	...
}

Formatter

The last think we have to do is to provide a formatter. Such formatters can be used as post processors for generated files, too. Providing such a formatter therefore will be useful later on when we will generate EBNF files automatically. If we modify the existing one that the Xtext-new Project wizard created we do not need to register that formatter in the RuntimeModule (it is already registered in the generated AbstractRuntimeModule).

Here is the code:

public class EbnfFormatter extends AbstractDeclarativeFormatter {

	@Override
	protected void configureFormatting(FormattingConfig c) {
		EbnfGrammarAccess grammar = (EbnfGrammarAccess) getGrammarAccess();

		// Break lines automatically after 120 characters
		c.setAutoLinewrap(120);

		c.setNoSpace().before(grammar.getProductionRuleAccess().getSemicolonKeyword_3());
		c.setNoSpace().before(grammar.getExpression_Repetition_GroupAccess().getAtLeastOneHyphenMinusKeyword_4_0());
		c.setNoSpace().before(grammar.getExpression_ConcatenationAccess().getCommaKeyword_1_1_0());

		// Separate production rules from each other by a blank line
		c.setLinewrap(2).after(grammar.getProductionRuleRule());
		// Indent expression part of each production rule
		c.setLinewrap().after(grammar.getProductionRuleAccess().getEqualsSignKeyword_1());
		c.setIndentation(grammar.getProductionRuleAccess().getEqualsSignKeyword_1(), grammar.getProductionRuleAccess().getSemicolonKeyword_3());

		// Indent all grouped parts
		c.setIndentation(
				grammar.getExpression_GroupAccess().getLeftParenthesisKeyword_1(),
				grammar.getExpression_GroupAccess().getRightParenthesisKeyword_3()
				);
		c.setLinewrap().after(grammar.getExpression_GroupAccess().getLeftParenthesisKeyword_1());
		c.setLinewrap().before(grammar.getExpression_GroupAccess().getRightParenthesisKeyword_3());
		c.setIndentation(
				grammar.getExpression_Optional_GroupAccess().getLeftSquareBracketKeyword_1(),
				grammar.getExpression_Optional_GroupAccess().getRightSquareBracketKeyword_3()
				);
		c.setLinewrap().after(grammar.getExpression_Optional_GroupAccess().getLeftSquareBracketKeyword_1());
		c.setLinewrap().before(grammar.getExpression_Optional_GroupAccess().getRightSquareBracketKeyword_3());
		c.setIndentation(
				grammar.getExpression_Repetition_GroupAccess().getLeftCurlyBracketKeyword_1(),
				grammar.getExpression_Repetition_GroupAccess().getRightCurlyBracketKeyword_3()
				);
		c.setLinewrap().after(grammar.getExpression_Repetition_GroupAccess().getLeftCurlyBracketKeyword_1());
		c.setLinewrap().before(grammar.getExpression_Repetition_GroupAccess().getRightCurlyBracketKeyword_3());
	}
}

Convert Xtext-Grammars to EBFN

Now that we have created the grammar for EBNF we can start with the most important step. That’s what we are about to create:

Workflow

Workflow

This process is controlled by the MWE-Engine. The first thing we need is the .mwe2 file:

module workflow.EbnfGenerator

import org.eclipse.emf.mwe.utils.*

var modelPath = "src/model/"
var targetFilename = "src-gen/grammar.ebnf"

var xtextGrammarSlot = "xtextGrammar"
var ebnfGrammarSlot = "ebnfGrammar"

Workflow {
	// Load Xtext grammar
	component = org.eclipse.xtext.mwe.Reader {
		path = modelPath

		// register the EBNF-language and the Xtext-Language
		register = org.eclipse.xtext.EbnfStandaloneSetup {}
		register = org.eclipse.xtext.XtextStandaloneSetup {}

		load = {
			slot = xtextGrammarSlot
			type="Grammar"
		}
	}

	// Transform Xtext to EBNF
	component = org.eclipse.xtend.XtendComponent {
		metaModel = org.eclipse.xtend.typesystem.emf.EmfRegistryMetaModel {}
		metaModel = org.eclipse.xtend.typesystem.emf.EmfMetaModel {
			metaModelFile =  [Path to the ebnf.ecore file]
		}
		invoke = "templates::XtextToEbnf::transform(${xtextGrammarSlot})"
		outputSlot = ebnfGrammarSlot
	}

	// Write resulting EBNF-File
	component = org.eclipse.emf.mwe.utils.Writer {
		modelSlot = ebnfGrammarSlot
		uri = targetFilename
	}
}

The Model-to-Model Transformation

The first step in this workflow loads the .xtext file of the source language and represents it as a model in memory. The corresponding meta-model is provided by Xtext. Then the source model is converted to another model based on the EBNF meta-model. This model-to-model transformation is implemented in Xtend. Finally the converted model is written to a file in EBNF-syntax.

The model-to-model transformation performs the following steps:

  • Xtext Alternatives: a | b | c → EBNF: a | b | c
  • Xtext Unordered Groups: a & b & c → EBNF: ( (a,b,c) | (a,c,b) | (b,a,c) | (b,c,a) | (c,a,b) | (c,b,a) )
  • Xtext Groups: (...) → EBNF: (...)
  • Xtext Concatenation: a b c → EBNF: a,b,c
  • Xtext Actions: {...} → No EBNF element
  • Xtext Keywords: "keyword" → EBNF: "keyword"
  • Xtext Rule Call: a → EBNF: a
  • Xtext Assignment: a=b → EBNF: b
  • Xtext Cross Reference: [a|b] → EBNF: b
  • Xtext Negated Token: !a → EBNF: (?any character? - a)
  • Xtext Until Token: a->b → EBNF: a (?any character? - b) b
  • Xtext Wildcard: . → EBNF: ?any character?
  • Xtext Character Range: a..c → EBNF: (a|b|c)
  • Xtext Enum Literal Declaration: a b c → EBNF: a,b,c

This is the source of the model transformation:

extension org::eclipse::xtend::util::stdlib::io;

extension templates::PrintEbnfTree;
extension templates::OptimizeEbnf;

List[ebnf::EbnfGrammar] transform(List[xtext::Grammar] grammars) :
	info("Starting Xtext to EBNF transformation") ->
	grammars.collect(grammar|grammar.transform());

create ebnf::EbnfGrammar transform(xtext::Grammar grammar) :
	info("Transforming grammar: " + grammar.name + " (Rules: " + grammar.rules.size + ")") ->
	grammar.rules.collect(rule|rule.
		transformRule(this).
		register(this)) ->
	optimize();

// ---------------------------------------------------------------------
// HELPER
// ---------------------------------------------------------------------

// Registers a rule in the grammar if it is not already contained
// Returns whether the rule was not known before and was successfully registered
private boolean register(ebnf::ProductionRule rule, ebnf::EbnfGrammar ebnfGrammar) :
	if (!ebnfGrammar.rules.contains(rule)) then (
		ebnfGrammar.rules.add(rule) ->
		true
	) else (
		false
	);

// Transforms all elements of the given list and returns all non null results
private List[ebnf::Expression] transformAll(List[xtext::AbstractElement] elements, ebnf::EbnfGrammar ebnfGrammar) :
	elements.collect(element|element.transform(ebnfGrammar)). // transform all contained elements
		reject(e|e==null); // reject all null elements

// Transforms the given element and wraps it (if nescessary) into optional or repetition groups
private ebnf::Expression transform(xtext::AbstractElement xtextElement, ebnf::EbnfGrammar ebnfGrammar):
	let transformedElement = xtextElement.internalTransform(ebnfGrammar):
		if (xtextElement.cardinality=="*") then (
			let repetitionGroup = new ebnf::Expression_Repetition_Group :
				repetitionGroup.setExpr(transformedElement) ->
				repetitionGroup.setAtLeastOne(false) ->
				repetitionGroup

		) else ( if (xtextElement.cardinality=="+") then (
			let repetitionGroup = new ebnf::Expression_Repetition_Group :
				repetitionGroup.setExpr(transformedElement) ->
				repetitionGroup.setAtLeastOne(true) ->
				repetitionGroup

		) else ( if (xtextElement.cardinality=="?") then (
			let optionalGroup = new ebnf::Expression_Optional_Group :
				optionalGroup.setExpr(transformedElement) ->
				optionalGroup

		) else ( // default cardinality (1)
			transformedElement
		)));

// Creates a group that contains the given element
private ebnf::Expression_Group wrapIntoGroup(ebnf::Expression child) :
	let group = new ebnf::Expression_Group:
		group.setExpr(child) ->
		group;

// Create a concatenation that contains all the given elements
private ebnf::Expression_Concatenation createConcatenation(List[ebnf::Expression] elements) :
	let concatenation = new ebnf::Expression_Concatenation:
		concatenation.elements.addAll(elements) ->
		concatenation;

// Create alternatives that contains all the given elements
private ebnf::Expression_Alternative createAlternative(List[ebnf::Expression] elements) :
	let alternative = new ebnf::Expression_Alternative :
		alternative.elements.addAll(elements) ->
		alternative;

// A helper method that returns the unicode value of the given string's first character
private int unicodeValueForCharacter(String s):
	JAVA templates.Utils.unicodeValueForCharacter(java.lang.String);
// A helper method that returns a string containing the character whose's unicode value was given
private String characterFromUnicodeValue(int unicodeValue):
	JAVA templates.Utils.characterFromUnicodeValue(java.lang.Integer);

// ---------------------------------------------------------------------
// TRANSFORMATION
// ---------------------------------------------------------------------

// RULES

private Void transformRule(xtext::AbstractRule rule, ebnf::EbnfGrammar ebnfGrammar) :
	error("No transformation defined to rule type: " + rule.metaType.name) -> null;

private create ebnf::ProductionRule transformRule(xtext::EnumRule xtextRule, ebnf::EbnfGrammar ebnfGrammar) :
	info("Production rule: " + xtextRule.name) ->
	this.setName(xtextRule.name) ->
	this.setExpr(xtextRule.alternatives.transform(ebnfGrammar));

private create ebnf::ProductionRule transformRule(xtext::ParserRule xtextRule, ebnf::EbnfGrammar ebnfGrammar) :
	info("Parser rule: " + xtextRule.name) ->
	this.setName(xtextRule.name) ->
	this.setExpr(xtextRule.alternatives.transform(ebnfGrammar));

private create ebnf::ProductionRule transformRule(xtext::TerminalRule xtextRule, ebnf::EbnfGrammar ebnfGrammar) :
	info("Terminal rule: " + xtextRule.name) ->
	this.setName(xtextRule.name) ->
	this.setExpr(xtextRule.alternatives.transform(ebnfGrammar));

// FALLBACK ERROR CASES FOR DYNAMIC DISPATCH
private Void internalTransform(xtext::AbstractElement element, ebnf::EbnfGrammar ebnfGrammar) :
	error("No transformation defined for element type: " + element.metaType.name) -> null;

// An alternative in a terminal rule is is transformed to an alternative
private internalTransform(xtext::Alternatives xtextAlternatives, ebnf::EbnfGrammar ebnfGrammar) :
	createAlternative(xtextAlternatives.elements.transformAll(ebnfGrammar));

// There is no unordered group in EBNF so we need to roll out all possible orders
private internalTransform(xtext::UnorderedGroup xtextUnorderedGroup, ebnf::EbnfGrammar ebnfGrammar) :
	wrapIntoGroup(
		createAlternative(
			createGrouping({}, xtextUnorderedGroup.elements, ebnfGrammar)
		)
	);
private List[ebnf::Expression_Group] createGrouping(List[xtext::AbstractElement] processed, List[xtext::AbstractElement] unprocessed, ebnf::EbnfGrammar ebnfGrammar) :
	if (unprocessed.isEmpty) then (
		// Transform the elements in the list.
		let elements = processed.collect(e|e.transform(ebnfGrammar)) :
			// Return a list that contains the grouped concantenation of the transformed elements
			{wrapIntoGroup(createConcatenation(elements))}
	) else (
		unprocessed.collect(e|
			createGrouping(
				processed.select(e2|true).add(e).toList(), // Clone list and append e
				unprocessed.without({e}).toList(), // Clone list and reject e
				ebnfGrammar)
		).flatten()
	);

// A group in a parser rule is transformed to a concatenation
private internalTransform(xtext::Group xtextGroup, ebnf::EbnfGrammar ebnfGrammar) :
	if (xtextGroup.elements.size < 2) then ( 		error("GROUP IS SMALLER 2") 	) else ( 		wrapIntoGroup( 			createConcatenation( 				xtextGroup.elements.transformAll(ebnfGrammar) 			) 		) 	); // An action is not transformed private internalTransform(xtext::Action xtextAction, ebnf::EbnfGrammar ebnfGrammar) : 	null; // A keyword is transformed to a terminal symbol private ebnf::Expression_Terminal_Symbol internalTransform(xtext::Keyword xtextKeyword, ebnf::EbnfGrammar ebnfGrammar) : 	let terminalSymbol = new ebnf::Expression_Terminal_Symbol : 		terminalSymbol.setText(xtextKeyword.value) ->
		terminalSymbol;

// A rule call creates a ruleReference and registers the corresponding rule
private ebnf::Expression_Rule_Reference internalTransform(xtext::RuleCall xtextRuleCall, ebnf::EbnfGrammar ebnfGrammar) :
	// Transform the referenced rule
	let referencedRule = xtextRuleCall.rule.transformRule(ebnfGrammar) :
	let ruleReference = new ebnf::Expression_Rule_Reference:
		referencedRule.register(ebnfGrammar) ->
		ruleReference.setRule(referencedRule) ->
		ruleReference;

// The transformation of an assignment delegates to the one of the contained terminal
private internalTransform(xtext::Assignment xtextAssignment, ebnf::EbnfGrammar ebnfGrammar) :
	xtextAssignment.terminal.transform(ebnfGrammar);

// A cross reference is handled as a regular rule call (only the terminal is transformed)
private internalTransform(xtext::CrossReference xtextCrossReference, ebnf::EbnfGrammar ebnfGrammar) :
	if (xtextCrossReference.terminal == null) then (
		error("Cannot transform cross reference as no terminal is specified") ->
		null
	) else (
		transform(xtextCrossReference.terminal, ebnfGrammar)
	);

// The Abstract Negated Token is an abstract element that cannot be processed
private internalTransform(xtext::AbstractNegatedToken xtextAbstractNegatedToken, ebnf::EbnfGrammar ebnfGrammar) :
	error("Undefined input element type: " + xtextAbstractNegatedToken.metaType.name) ->
	null;

// The xtext NegatedToken is represented in EBNF as a wildcard substracted by the given character
private internalTransform(xtext::NegatedToken xtextNegatedToken, ebnf::EbnfGrammar ebnfGrammar) :
	let exception = new ebnf::Expression_Exception :
	let xtextWildcard = new xtext::Wildcard :
	let validTerminal = xtextNegatedToken.terminal.transform(ebnfGrammar) :
		exception.setLeft(internalTransform(xtextWildcard, ebnfGrammar)) ->
		exception.setRight(validTerminal) ->
		wrapIntoGroup(exception);

// The xtext UntilToken has no peer in EBNF but need to be expressed by more complex construct
private internalTransform(xtext::UntilToken xtextUntilToken, ebnf::EbnfGrammar ebnfGrammar) :
	let concatenation = new ebnf::Expression_Concatenation:
	let exception = new ebnf::Expression_Exception :
	let wildcard = internalTransform(new xtext::Wildcard, ebnfGrammar) :
	let validTerminal = xtextUntilToken.terminal.transform(ebnfGrammar) :
	let validTerminal2 = xtextUntilToken.terminal.transform(ebnfGrammar) :
		exception.setLeft(wildcard) ->
		exception.setRight(validTerminal) ->
		concatenation.elements.add(wrapIntoGroup(exception)) ->
		concatenation.elements.add(validTerminal2) ->
		wrapIntoGroup(concatenation);

// As no standardized wildcard symbol exists in ISO-EBNF use "?any character?" instead
private ebnf::Expression_Special_Sequence internalTransform(xtext::Wildcard xtextWildcard, ebnf::EbnfGrammar ebnfGrammar) :
	let specialSequence = new ebnf::Expression_Special_Sequence:
		specialSequence.setText("any character") ->
		specialSequence;

// A character range is rolled out to all contained alternatives
private internalTransform(xtext::CharacterRange xtextCharacterRange, ebnf::EbnfGrammar ebnfGrammar) :
	info("INTERNAL") ->
	if (xtextCharacterRange.left.value.length == 1 || xtextCharacterRange.right.value.length == 1) then (
		wrapIntoGroup(
			createAlternative(
				// Iterate from starting char to end char
				let fromUnicodeChar = unicodeValueForCharacter(xtextCharacterRange.left.value) :
				let toUnicodeChar = unicodeValueForCharacter(xtextCharacterRange.right.value) :
					fromUnicodeChar.upTo(toUnicodeChar).collect(char|
						let terminalSymbol = new ebnf::Expression_Terminal_Symbol :
							terminalSymbol.setText(characterFromUnicodeValue(char))
					)
			)
		)
	) else (
		error("Character range contains must contain exactly one character!") ->
		null
	);

private ebnf::Expression_Terminal_Symbol internalTransform(xtext::EnumLiteralDeclaration xtextEnumLiteralDeclaration, ebnf::EbnfGrammar ebnfGrammar) :
	let terminalSymbol = new ebnf::Expression_Terminal_Symbol:
		if (xtextEnumLiteralDeclaration.literal != null) then (
			terminalSymbol.setText(xtextEnumLiteralDeclaration.literal.value) ->
			terminalSymbol
		) else (
			terminalSymbol.setText(xtextEnumLiteralDeclaration.enumLiteral.name) ->
			terminalSymbol
		);

The helper methods unicodeValueForString and characterFromUnicodeValue refer to java methods.

Simplification of the Transformation’s Results

Maybe you noticed the call optimize() in transform(xtext::Grammar). This invokes another model-to-model transformation that removes unncescessary grammatical constructs from this transformation’s result model in order to improve its readability. The following code shows an example of such an optimization algorithm that performs the following steps:

  • Remove concatenations with one entry only.
  • Remove groupings that contain all elements of a production rule.
  • Replace ((...)) by (...), replace [[...]] by [...], and replace {{...}}) by {...}
  • Replace [(...)] and ([...]) by [...]
  • Replace {(...)} and ({...}) by {...}
  • Replace [{...}] and {[...]} by {...}

Of course, there are more optimizations possibilities. Complement the current implementation if nescessary.

optimize(ebnf::EbnfGrammar grammar) :
	info("Optimizing") ->

	// Remove concatenations with one entry only
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Concatenation).
		select(concatenation|concatenation.elements.size == 1).
		collect(concatenation|replace(concatenation, concatenation.elements.first())) ->

	// Remove top-level groups
	grammar.rules.expr.typeSelect(ebnf::Expression_Group).
		collect(group|replace(group, group.expr)) ->

	// Remove simple groups if they contain other groups, repetition groups or optional groups
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Group).
		select(group|
			group.expr.metaType == ebnf::Expression_Group ||
			group.expr.metaType == ebnf::Expression_Repetition_Group ||
			group.expr.metaType == ebnf::Expression_Optional_Group
		).
		collect(group|replace(group, group.expr)) ->

	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Optional_Group).
		select(group|group.expr.metaType == ebnf::Expression_Group).
		collect(group|replace(group.expr, ((ebnf::Expression_Group)group.expr).expr)) ->
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Repetition_Group).
		select(group|group.expr.metaType == ebnf::Expression_Group).
		collect(group|replace(group.expr, ((ebnf::Expression_Group)group.expr).expr)) ->

	// Remove groups that contain groups of the same type
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Group).
		select(group|group.expr.metaType == ebnf::Expression_Group).
		collect(group|replace(group, group.expr)) ->
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Optional_Group).
		select(group|group.expr.metaType == ebnf::Expression_Optional_Group).
		collect(group|replace(group, group.expr)) ->
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Repetition_Group).
		select(group|group.expr.metaType == ebnf::Expression_Repetition_Group).
		collect(group|replace(group, group.expr)) ->

	// Optional groups that contain repetition groups can be replaced by a repetition group with 0 min repetitions
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Optional_Group).
		select(group|group.expr.metaType == ebnf::Expression_Repetition_Group).
		collect(group|((ebnf::Expression_Repetition_Group)group.expr).setAtLeastOne(false) -> group).
		collect(group|replace(group, group.expr)) ->
	grammar.rules.eAllContents.typeSelect(ebnf::Expression_Repetition_Group).
		select(group|group.atLeastOne == false).
		select(group|group.expr.metaType == ebnf::Expression_Optional_Group).
		collect(group|replace(group.expr, ((ebnf::Expression_Optional_Group)group.expr).expr)) ->

	null;

Example

Now we created everything we need to generate EBNF description for any Xtext-based language. The following listing shows the generated description for our EBNF-language:

NAME = {
		(
			"a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
		) | (
			"A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
		) | "_"
	}-;
Expression_Rule_Reference = NAME;
TERMINAL_SYMBOL = (
		'"', {
			?any character? - '"'
		}-, '"'
	) | (
		"'", {
			?any character? - "'"
		}-, "'"
	);
Expression_Terminal_Symbol = TERMINAL_SYMBOL;
Expression = Expression_Alternative;
Expression_Repetition_Group = "{", Expression, "}", [
		"-"
	];
Expression_Optional_Group = "[", Expression, "]";
Expression_Group = "(", Expression, ")";
SPECIAL_SEQUENCE = "?", {
		?any character? - "?"
	}-, "?";
Expression_Special_Sequence = SPECIAL_SEQUENCE;
Expression_Terminal = Expression_Rule_Reference | Expression_Terminal_Symbol | Expression_Repetition_Group | Expression_Optional_Group | Expression_Group | Expression_Special_Sequence;
NUMBER = (
		"1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
	), {
		"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
	};
Expression_Repetition = Expression_Terminal | (
		NUMBER, "*", Expression_Terminal
	);
Expression_Exception = Expression_Repetition, {
		"-", Expression_Repetition
	};
Expression_Concatenation = Expression_Exception, {
		",", Expression_Exception
	};
Expression_Alternative = Expression_Concatenation, {
		"|", Expression_Concatenation
	};
ProductionRule = NAME, "=", Expression, ";";
EbnfGrammar = {
		ProductionRule
	};
COMMENT = "(*", (
		(
			?any character? - "*)"
		), "*)"
	);
WS = {
		" " | "	" | "
" | "
"
	}-;

That’s it folks. I hope that this post can help you complete your projects successfully!

Stay tuned for the next post.

The source code of this post is available here.

Advertisements
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: