Xtext-based Lua Editor

Lua is an imperative programming language designed to be embedded as a scripting language within other programs. It uses automatic memory management with garbage collection, is dynamically typed, and provides coroutines. The Lua interpreter has a low memory footprint, which makes it suitable even for embedded systems.

Lua uses a very simple grammar with specialized list operations. The complete syntax can be found in the Lua reference manual.

In this post we use Xtext to develop an IDE for this real-world programming language.

The Xtext grammar for Lua introduces all valid statements of the language, including all its control structures. Furthermore, it defines the expression syntax and the valid operators.

grammar org.xtext.eclipse.Lua

import 'http://www.eclipse.org/emf/2002/Ecore' as ecore

generate lua 'http://www.eclipse.org/xtext/Lua'

Chunk hidden (WS, COMMENT):
	Block;

Block hidden (WS, COMMENT):
	{Block}
	(statements+=Statement (';')? )*
	( returnValue=LastStatement (';')? )?;

// ****************************************************
// LAST STATEMENT
// ****************************************************
LastStatement: 
	LastStatement_Return | LastStatement_Break;

// The return statement is used to return values from a function or a chunk
LastStatement_Return: 
	'return' {LastStatement_ReturnWithValue} (returnValues+=Expression (',' returnValues+=Expression)*)?;

// The break statement is used to terminate the execution of a while, repeat, or for loop, skipping to the next statement after the loop
LastStatement_Break: 
	'break' {LastStatement_Break};

// ****************************************************
// STATEMENT
// ****************************************************
Statement hidden (WS, COMMENT):
	Statement_Block |
	Statement_While |
	Statement_Repeat |
	Statement_If_Then_Else |
	Statement_For_Numeric |
	Statement_For_Generic |
	Statement_GlobalFunction_Declaration |
	Statement_LocalFunction_Declaration |
	Statement_Local_Variable_Declaration |
	Statement_FunctioncallOrAssignment;

// A block can be explicitly delimited to produce a single statement. "do ... end"
Statement_Block: 
	'do' block=Block 'end';

// Control structure. "while ... do ... end"
Statement_While: 
	'while' expression=Expression 'do' block=Block 'end';

// Control structure. "repeat ... until ..."
Statement_Repeat: 
	'repeat' block=Block 'until' expression=Expression;

// Control structure. "if ... then ... elseif ... elseif ... else ... end"
Statement_If_Then_Else: 
	'if' ifExpression=Expression 'then' ifBlock=Block (elseIf+=Statement_If_Then_Else_ElseIfPart)* ('else' elseBlock=Block)? 'end';
Statement_If_Then_Else_ElseIfPart:
	'elseif' elseifExpression=Expression 'then' elseifBlock=Block;

// The numeric for loop repeats a block of code while a control variable runs through an arithmetic progression "for ...=..., ... [,...] do ... end"
Statement_For_Numeric: 
	'for' iteratorName=LUA_NAME '=' startExpr=Expression "," untilExpr=Expression ("," stepExpr=Expression)? 'do' block=Block 'end';

// The generic for statement works over functions, called iterators. On each iteration, the iterator function is called to produce a new value, stopping when this new value is nil "for ... in ... do ... end"
Statement_For_Generic: 
	'for' names+=LUA_NAME (',' names+=LUA_NAME)* 'in' expressions+=Expression (',' expressions+=Expression)* 'do' block=Block 'end';

Statement_GlobalFunction_Declaration:
	'function' prefix+=LUA_NAME ('.' prefix+=LUA_NAME)* (':' functionName=LUA_NAME)? function=Function 'end';

Statement_LocalFunction_Declaration:
	'local' 'function' functionName=LUA_NAME function=Function 'end';

// Local variables can be declared anywhere inside a block. The declaration can include an initial assignment "local ... [= ...]"
Statement_Local_Variable_Declaration: 
	'local' variableNames+=LUA_NAME (',' variableNames+=LUA_NAME)* ('=' initialValue+=Expression (',' initialValue+=Expression)*)?;

Statement_FunctioncallOrAssignment:
	Expression_AccessMemberOrArrayElement (
		// Assignment
		({Statement_Assignment.variable+=current} (',' variable+=Expression_AccessMemberOrArrayElement)* '=' values+=Expression (',' values+=Expression)*) |

		// Call of a member function
		(':' {Statement_CallMemberFunction.object=current} memberFunctionName=LUA_NAME arguments=Functioncall_Arguments ) |

		// Call of a function pointer
		({Statement_CallFunction.object=current} arguments=Functioncall_Arguments)
	);

// ****************************************************
// EXPRESSIONS
// ****************************************************
// Delegate to the priority chain of operators by calling the rule for the lowest priority operator
Expression hidden (WS, COMMENT): 
	Expression_Or;

// Or: left associative, priority 0
Expression_Or returns Expression: 
	Expression_And ('or' {Expression_Or.left=current} right=Expression_And)*;

// And: left associative, priority 1
Expression_And returns Expression: 
	Expression_Compare ('and' {Expression_And.left=current} right=Expression_Compare)*;

// Comparisons: left associative, priority 2
Expression_Compare returns Expression: 
	Expression_Concatenation (
		('>'  {Expression_Larger.left=current} right=Expression_Concatenation) |
		('>=' {Expression_Larger_Equal.left=current} right=Expression_Concatenation) |
		('<'  {Expression_Smaller.left=current} right=Expression_Concatenation) |
		('<=' {Expression_Smaller_Equal.left=current} right=Expression_Concatenation) |
		('==' {Expression_Equal.left=current} right=Expression_Concatenation) |
		('~=' {Expression_Not_Equal.left=current} right=Expression_Concatenation)
	)*;

// Concatenation: right associative, priority 3
Expression_Concatenation returns Expression: 
	Expression_PlusMinus ('..' {Expression_Concatenation.left=current} right=Expression_Concatenation)?;

// addition/subtraction: left associative, priority 4
Expression_PlusMinus returns Expression: 
	Expression_MultiplicationDivisionModulo (
		('+'  {Expression_Plus.left=current} right=Expression_MultiplicationDivisionModulo) |
		('-'  {Expression_Minus.left=current} right=Expression_MultiplicationDivisionModulo)
	)*;

// multiplication/division, left associative, priority 5
Expression_MultiplicationDivisionModulo returns Expression:
	Expression_Unary (
		('*'  {Expression_Multiplication.left=current} right=Expression_Unary) |
		('/'  {Expression_Division.left=current} right=Expression_Unary) |
		('%'  {Expression_Modulo.left=current} right=Expression_Unary)
	)*;

// Unary operators: right associative, priority 6
Expression_Unary returns Expression: 
	Expression_Exponentiation |
	('not' {Expression_Negate} exp=Expression_Unary) |
	('#'   {Expression_Length} exp=Expression_Unary) |
	('-'   {Expression_Invert} exp=Expression_Unary);

// exponentiation: right associative, priority 7
Expression_Exponentiation returns Expression: 
	Expression_Terminal
	( '^' {Expression_Exponentiation.left=current} right=Expression_Exponentiation )?;

Expression_Terminal returns Expression:
	Expression_Nil |
	Expression_True |
	Expression_False |
	Expression_Number |
	Expression_VarArgs |
	Expression_String |
	Expression_Function |
	Expression_TableConstructor |
	Expression_Functioncall;

Expression_Nil:
	'nil' {Expression_Nil};
Expression_True:
	'true' {Expression_True};
Expression_False:
	'false' {Expression_False};
Expression_Number:
	value=LUA_NUMBER;
Expression_VarArgs:
	'...' {Expression_VarArgs};
Expression_String:
	value=LUA_STRING;
Expression_Function:
	'function' function=Function 'end';
Expression_TableConstructor:
	'{' {Expression_TableConstructor} (fields+=Field ((','|';') fields+=Field)* (','|';')? )? '}';

// Function calls, left associative, single call only, priority 9
Expression_Functioncall returns Expression: 
	Expression_AccessMemberOrArrayElement (
		// Call of a member function
		(':' {Expression_CallMemberFunction.object=current} memberFunctionName=LUA_NAME arguments=Functioncall_Arguments) |

		// Call of a function pointer
		({Expression_CallFunction.object=current} arguments=Functioncall_Arguments)
	)?;

// Access a member or array element, left associative, chaining possible, priority 10
Expression_AccessMemberOrArrayElement returns Expression: 
	Expression_VariableName (
		// An expression accessing an element in a variable array
		('[' {Expression_AccessArray.array=current} index=Expression ']') |

		// Access a member variable using multiple parts separated by "."
		('.' {Expression_AccessMember.object=current} memberName=LUA_NAME)
	)*;

// access a variable, terminal expression, priority 11
// Delegate to top of expression rule chain for bracketed expressions
Expression_VariableName returns Expression: 
	('(' Expression ')') |
	({Expression_VariableName} variable=LUA_NAME);


// ****************************************************
// FUNCTIONS
// ****************************************************
Function:
	'(' (parameters+=LUA_NAME (',' parameters+=LUA_NAME)* ','?)? (varArgs?='...')? ')' body=Block;

// Some syntactic sugar: strings and field can be passed as parameters without bracket
Functioncall_Arguments: 
	{Functioncall_Arguments}
	(
		('(' (arguments+=Expression (',' arguments+=Expression)*)? ')' ) |
		(arguments+=Expression_TableConstructor) |
		(arguments+=Expression_String)
	);

// ****************************************************
// TABLES/FIELDS
// ****************************************************
Field:
	Field_AddEntryToTable_Brackets |
	Field_AddEntryToTable |
	Field_AppendEntryToTable;

// Each field of the form "[exp1] = exp2" adds to the new table an entry with key exp1 and value exp2
Field_AddEntryToTable_Brackets: 
	'[' indexExpression=Expression ']' '=' value=Expression;

// A field of the form "name = exp" is equivalent to ["name"] = exp
Field_AddEntryToTable: 
	key=LUA_NAME '=' value=Expression;

// fields of the form "exp" are equivalent to [i] = exp
Field_AppendEntryToTable: 
	value=Expression;

// ****************************************************
// TERMINALS
// ****************************************************
terminal COMMENT:
	'--' (
		('[['->']]') |
		(!'[' (!'\n')* '\n'?)
	);

terminal WS:
	(' '|'\t'|'\r'|'\n')+; // Consume all white space, tabs and new line characters

// Identifiers can be any string of letters, digits, and underscores, but mustn't begin with a digit.
terminal LUA_NAME returns ecore::EString:
	( 'a'..'z' | 'A'..'Z' | '_' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' )*;

terminal LUA_STRING returns ecore::EString:
	( "'" ( '\\' ('a'|'b'|'f'|'n'|'r'|'t'|'v'|'"'|"'"|'\\'|('0'..'9'('0'..'9'('0'..'9')))) | !('\\'|"'") )* "'" ) |
	( '"' ( '\\' ('a'|'b'|'f'|'n'|'r'|'t'|'v'|'"'|"'"|'\\'|('0'..'9'('0'..'9'('0'..'9')))) | !('\\'|'"') )* '"' ) |
	( '[[' -> ']]' );

terminal LUA_NUMBER returns ecore::EDouble:
	( ('0'..'9')+ ( '.' ('0'..'9')+ ( ('E'|'e') ('-')? ('0'..'9')+ )? )? ) |
	( '0x' ('0'..'9'|'a'..'f')+ );

Take a look at the delegation chain of rules that is used to parse expressions with respect to their operator precedence. Each individual rule in this chain is a variant of a generic pattern for parsing left- or right-associative expression parts. The order of the rules in the chain determines the priority when multiple rules can match: the rule called first (Expression_Or) binds weakest, and the rule at the end of the chain binds strongest.
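The same layering can be tried out by hand in plain Java. The following is only a minimal sketch and not part of the generated Xtext parser: the class PrecedenceChainSketch and its method names are hypothetical, and it covers just the arithmetic part of the chain. Left-associative rules loop and fold each new operand into the running result, whereas right-associative rules recurse into themselves for the right operand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a hand-written evaluator that mirrors the rule chain
// PlusMinus -> MultiplicationDivision -> Unary -> Exponentiation -> Terminal.
final class PrecedenceChainSketch {
	private final List<String> tokens = new ArrayList<>();
	private int pos = 0;

	private PrecedenceChainSketch(String input) {
		// Minimal tokenizer: decimal numbers and single-character operators
		int i = 0;
		while (i < input.length()) {
			char c = input.charAt(i);
			if (Character.isWhitespace(c)) { i++; continue; }
			if (Character.isDigit(c)) {
				int start = i;
				while (i < input.length() && (Character.isDigit(input.charAt(i)) || input.charAt(i) == '.')) i++;
				tokens.add(input.substring(start, i));
			} else {
				tokens.add(String.valueOf(c));
				i++;
			}
		}
	}

	static double evaluate(String input) {
		PrecedenceChainSketch parser = new PrecedenceChainSketch(input);
		double result = parser.plusMinus(); // enter at the lowest priority
		if (parser.pos != parser.tokens.size()) throw new IllegalArgumentException("Unexpected token: " + parser.peek());
		return result;
	}

	private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

	private boolean accept(String token) {
		if (peek().equals(token)) { pos++; return true; }
		return false;
	}

	// Left associative: loop and fold every further operand into 'left'
	private double plusMinus() {
		double left = multiplicationDivision();
		while (true) {
			if (accept("+")) left = left + multiplicationDivision();
			else if (accept("-")) left = left - multiplicationDivision();
			else return left;
		}
	}

	// Left associative, one priority level higher
	private double multiplicationDivision() {
		double left = unary();
		while (true) {
			if (accept("*")) left = left * unary();
			else if (accept("/")) left = left / unary();
			else return left;
		}
	}

	// Prefix operator: the rule calls itself for its operand
	private double unary() {
		if (accept("-")) return -unary();
		return exponentiation();
	}

	// Right associative: the right operand recurses into the same rule
	private double exponentiation() {
		double left = terminal();
		if (accept("^")) return Math.pow(left, exponentiation());
		return left;
	}

	// Bracketed expressions delegate back to the top of the chain
	private double terminal() {
		if (accept("(")) {
			double value = plusMinus();
			if (!accept(")")) throw new IllegalArgumentException("Missing ')'");
			return value;
		}
		if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
		return Double.parseDouble(tokens.get(pos++));
	}
}
```

With this sketch, PrecedenceChainSketch.evaluate("10 - 4 - 3") groups as (10 - 4) - 3 and yields 3, while evaluate("2 ^ 3 ^ 2") groups as 2 ^ (3 ^ 2) and yields 512.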

Value Converters


The following examples are all valid numbers in Lua:

  • 3
  • 3.0
  • 3.213
  • 314.16e-2
  • 0.31416E1
  • 0xff
  • 0x56

The terminal rule LUA_NUMBER matches all these examples. However, we need to create and register a specialized value converter that converts from a floating point value to such a textual representation and vice versa. This is an example of an implementation:

@ValueConverter(rule = "LUA_NUMBER")
class LuaNumberValueConverter extends AbstractNullSafeConverter<Double> {

	@Override
	protected Double internalToValue(String string, AbstractNode node) {
		try {
			if (string.startsWith("0x")) {
				// Hexadecimal literal: parse the digits after the "0x" prefix
				return (double) Long.parseLong(string.substring(2), 16);
			} else {
				// Split an optional exponent ("e" or "E") from the mantissa
				int indexOfExponent = string.indexOf("e");
				if (indexOfExponent == -1) {
					indexOfExponent = string.indexOf("E");
				}
				double multiplicator = 1.0;
				if (indexOfExponent != -1) {
					String exponentPart = string.substring(indexOfExponent + 1);
					int exponent = Integer.parseInt(exponentPart);
					multiplicator = Math.pow(10, exponent);
					// Keep everything before the exponent marker as the mantissa
					string = string.substring(0, indexOfExponent);
				}
				double base = Double.parseDouble(string);
				return base * multiplicator;
			}
		} catch(IllegalArgumentException e) {
			// NumberFormatException is a subclass of IllegalArgumentException
			throw new ValueConverterException(e.getMessage(), node, e);
		}
	}

	@Override
	protected String internalToString(Double value) {
		String result;
		if (value.equals(Math.floor(value))) {
			// Integral values are written without a fractional part
			result = Long.toString(value.longValue());
		} else {
			result = Double.toString(value);
		}
		return result;
	}
}
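The conversion logic can also be checked outside of Xtext. The following standalone sketch is an assumption-laden simplification, not the converter itself: the class LuaNumberSketch and its two methods are hypothetical names, and the sketch assumes the text has already been matched by the LUA_NUMBER terminal. It relies on the fact that Double.parseDouble accepts an optional exponent, so the exponent does not have to be split off by hand.

```java
// Hypothetical standalone sketch of the number conversion, without Xtext types.
final class LuaNumberSketch {

	// Converts the text of a Lua number literal into a double.
	// Assumes the text was already accepted by the LUA_NUMBER terminal rule.
	static double toValue(String literal) {
		if (literal.startsWith("0x")) {
			// Hexadecimal literal: parse the digits after the prefix
			return (double) Long.parseLong(literal.substring(2), 16);
		}
		// Decimal literal, with or without fraction and exponent
		return Double.parseDouble(literal);
	}

	// Converts a double back into a textual representation.
	// Integral values are written without a fractional part.
	static String toLiteral(double value) {
		if (value == Math.floor(value) && !Double.isInfinite(value)) {
			return Long.toString((long) value);
		}
		return Double.toString(value);
	}

	public static void main(String[] args) {
		String[] examples = { "3", "3.0", "3.213", "314.16e-2", "0.31416E1", "0xff", "0x56" };
		for (String example : examples) {
			System.out.println(example + " -> " + toLiteral(toValue(example)));
		}
	}
}
```

Letting Double.parseDouble handle the exponent also avoids the small rounding error that multiplying the mantissa by a power of ten can introduce.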

Lua uses a specific string syntax that differs from the one in Java. In Lua, strings can be enclosed in single or double quotes. Furthermore, it is possible to use the long string format. An opening long bracket of level n is an opening square bracket followed by n equal signs followed by another opening square bracket (e.g. [==[). A closing long bracket is defined similarly (e.g. ]==]). A long string can run over multiple lines and can contain arbitrary characters except for the closing long bracket of the same level.
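To illustrate how the level of a long bracket works, here is a small standalone Java sketch. It is not part of the grammar or the converters in this post; the class LongBracketSketch and the method readLongString are hypothetical names. The sketch counts the equal signs of the opening bracket and then searches for the closing bracket with the same number of equal signs. It ignores the detail that Lua skips a newline directly following the opening bracket.

```java
// Hypothetical sketch: reads a long string of arbitrary level from the start of a text.
final class LongBracketSketch {

	// Returns the content between the opening long bracket at the start of
	// 'text' and the first closing long bracket of the same level.
	static String readLongString(String text) {
		if (text.isEmpty() || text.charAt(0) != '[') {
			throw new IllegalArgumentException("Not a long string");
		}
		// The level is the number of equal signs between the two '['
		int level = 0;
		while (1 + level < text.length() && text.charAt(1 + level) == '=') {
			level++;
		}
		if (1 + level >= text.length() || text.charAt(1 + level) != '[') {
			throw new IllegalArgumentException("Incomplete opening long bracket");
		}
		int contentStart = level + 2;

		// Build the closing long bracket of the same level, e.g. "]==]"
		StringBuilder closing = new StringBuilder("]");
		for (int i = 0; i < level; i++) {
			closing.append('=');
		}
		closing.append(']');

		int contentEnd = text.indexOf(closing.toString(), contentStart);
		if (contentEnd < 0) {
			throw new IllegalArgumentException("Missing closing long bracket of level " + level);
		}
		return text.substring(contentStart, contentEnd);
	}

	public static void main(String[] args) {
		// Closing brackets of other levels are part of the content
		System.out.println(readLongString("[==[a]]b]=]c]==]"));
	}
}
```

Because only a closing bracket of the same level ends the string, a level 2 string can contain ]] and ]=] without any escaping.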

Except in the long string format, some characters need to be escaped in strings. These are ‘\a’ (bell), ‘\b’ (backspace), ‘\f’ (form feed), ‘\n’ (newline), ‘\r’ (carriage return), ‘\t’ (horizontal tab), ‘\v’ (vertical tab), ‘\\’ (backslash), ‘\"’ (quotation mark [double quote]) and ‘\'’ (apostrophe [single quote]). A character in a string can also be specified by its numerical value using the escape sequence \ddd, where ddd is a sequence of up to three decimal digits.

Note that the Lua grammar presented here requires exactly three digits in order to simplify parsing. Furthermore, it limits long strings to a single level, namely the plain [[ ... ]] brackets without equal signs.

In order to respect these particularities, a value converter for these strings is needed:

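class LuaStringValueConverter extends AbstractNullSafeConverter<String> {

	@Override
	protected String internalToValue(String string, AbstractNode node) {
		try {
			if (string.startsWith("'") && string.endsWith("'")) {
				// Single-quoted string: strip the quotes and resolve escapes
				return unquoteString(string.substring(1, string.length() - 1));

			} else if (string.startsWith("\"") && string.endsWith("\"")) {
				// Double-quoted string: strip the quotes and resolve escapes
				return unquoteString(string.substring(1, string.length() - 1));

			} else if (string.startsWith("[[") && string.endsWith("]]")) {
				// Long string: the content is taken verbatim
				return string.substring(2, string.length() - 2);

			} else throw new IllegalArgumentException("Unsupported string format");
		} catch(IllegalArgumentException e) {
			throw new ValueConverterException(e.getMessage(), node, e);
		}
	}

	@Override
	protected String internalToString(String value) {
		// Always serialize as a single-quoted string
		return "'" + quoteString(value) + "'";
	}

	// Resolves escape sequences in a single pass, so that an escaped backslash
	// is never mistaken for the start of another escape sequence
	private String unquoteString(String string) {
		StringBuilder result = new StringBuilder();
		int i = 0;
		while (i < string.length()) {
			char c = string.charAt(i++);
			if (c != '\\') {
				result.append(c);
				continue;
			}
			if (i >= string.length()) throw new IllegalArgumentException("Incomplete escape sequence");
			char escaped = string.charAt(i++);
			switch (escaped) {
				case 'a': result.append((char) 7); break;  // bell
				case 'b': result.append('\b'); break;      // backspace
				case 'f': result.append('\f'); break;
				case 'n': result.append('\n'); break;
				case 'r': result.append('\r'); break;
				case 't': result.append('\t'); break;
				case 'v': result.append((char) 11); break; // vertical tab
				case '\\': case '"': case '\'': result.append(escaped); break;
				default:
					// \ddd: the grammar guarantees exactly three decimal digits
					if (!Character.isDigit(escaped) || i + 2 > string.length())
						throw new IllegalArgumentException("Invalid escape sequence");
					result.append((char) Integer.parseInt(string.substring(i - 1, i + 2)));
					i += 2;
			}
		}
		return result.toString();
	}

	// Escapes special characters; the backslash must be handled first
	private String quoteString(String string) {
		string = string.replace("\\", "\\\\");
		string = string.replace(String.valueOf((char) 7), "\\a");
		string = string.replace("\b", "\\b");
		string = string.replace("\f", "\\f");
		string = string.replace("\n", "\\n");
		string = string.replace("\r", "\\r");
		string = string.replace("\t", "\\t");
		string = string.replace(String.valueOf((char) 11), "\\v");
		string = string.replace("'", "\\'");
		return string;
	}
}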

Name Provider


When parsing Lua files within an MWE workflow, one typically uses the component org.eclipse.xtext.mwe.Reader. The top-level element of a Lua file is the unnamed element Block. The reader component, however, ignores top-level elements without names. Without further action, we end up with an empty slot, and no further processing within the workflow is possible.

Therefore, we need to provide a name for the top-level block. This can be done by registering a name provider:

public class LuaQualifiedNameProvider extends DefaultDeclarativeQualifiedNameProvider {
	public String qualifiedName(Block block) {
		if (block.eContainer() == null) 
			return "root"; 
		else 
			return null;
	}
}

Enhancements to provide an IDE for Lua developers


In order to provide Lua developers with a full-fledged IDE, some more work has to be done. For example, the Lua compiler and interpreter should be integrated into the Eclipse environment. However, the code shown here already provides Eclipse-based editors with keyword completion. Furthermore, it provides parsers for Lua files, which can serve as a basis for developing more complete tooling.

    • Jean-Sebastien Leduc
    • July 28th, 2011

    Been checking your code and xtext 2.0.
    Really interesting! I’m learning xtext and trying to go further

    • I’m glad that I could help you.
      Please note, that the examples and the source code are built for Xtext 1.0. I don’t know yet, whether there are any compatibility issues with 2.0.

  1. Cool, Thanks for sharing this. Is it allowed to use that code in a Project? Any license?

    • Gijs
    • March 25th, 2014

    Thanks for that grammar! I’d like to use it in another project to include Lua function code in a DSL. However, when compiling, I get an error:
    [fatal] rule ruleExpression_Functioncall has non-LL(*) decision due to recursive rule invocations reachable from alts 2,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
    The following token definitions can never be matched because prior tokens match the same input: RULE_INT
    Have you seen this error before and know a solution? I’m using Xtext 2.5.3, so it could be a compatibility issue.

    • I used Xtext 0.7 and 1.0 and had the backtrack option enabled in the MWE script that runs the generation of the parser, which solved that issue for me. Backtracking, however, slows down parsing, which can be an issue with larger files. If this bothers you, you can try to left-factor the clashing rules (there is a brief description of how to do this on the Xtext website).

        • Gijs
        • March 27th, 2014

        Thanks for the clear answer! I thought I tried the backtracking option, but apparently I did something wrong. I tried again and now it works. Backtracking will be fine for now, since it’s gonna be used as a research prototype only.
