A Lua bytecode compiler written in Lua itself for didactic purposes or for new language implementations

q66/luajit-lang-toolkit

LuaJIT Language Toolkit

The LuaJIT Language Toolkit is an implementation of the Lua programming language written in Lua itself. It generates LuaJIT bytecode, complete with debug information. The generated bytecode can, in turn, be run by the LuaJIT virtual machine.

On its own this toolkit does not do anything useful, since LuaJIT can already generate and run the bytecode for any Lua program. The purpose of the language toolkit is to provide a starting point for implementing a programming language that targets the LuaJIT virtual machine.

With the LuaJIT Language Toolkit it is easy to create a new language or extend the Lua language, because the parser is cleanly separated from the bytecode generator and the virtual machine runtime environment.

The toolkit actually implements a complete pipeline: it parses a Lua program, builds an AST, and generates the bytecode.

Lexer

Its role is to recognize lexical elements in the program text. It takes the text of the program as input and produces a stream of "tokens".

Using the language toolkit you can run the lexer alone to examine the stream of tokens:

luajit run-lexer.lua tests/test-1.lua

For the following code fragment:

local x = {}
for k = 1, 10 do
    x[k] = k*k + 1
end

the command above prints the list of tokens:

TK_local
TK_name	x
=
{
}
TK_for
TK_name	k
=
TK_number	1
,
TK_number	10
TK_do
TK_name	x
[
TK_name	k
]
=
TK_name	k
*
TK_name	k
+
TK_number	1
TK_end

Each line represents a token: the first element is the kind of token and the second element is its value, if any.

The lexer's code is an almost literal translation of LuaJIT's lexer.
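
To make the token stream more concrete, here is a minimal tokenizer sketch written for illustration only. It is not the toolkit's lexer: the tokenize function and its pattern-based approach are assumptions, and it handles only names, integer literals, a handful of keywords and single-character symbols.

```lua
-- Illustrative tokenizer; NOT the toolkit's lexer (which follows LuaJIT's lexer).
-- Keywords recognized by this sketch (a small subset of Lua's reserved words).
local keywords = {
    ["local"] = true, ["for"] = true, ["do"] = true, ["end"] = true,
    ["return"] = true,
}

-- Split `src` into a list of { kind, value } pairs.
local function tokenize(src)
    local tokens, pos = {}, 1
    while pos <= #src do
        -- Skip whitespace.
        local s, e = src:find("^%s+", pos)
        if s then pos = e + 1 end
        if pos > #src then break end

        local word = src:match("^[%a_][%w_]*", pos)
        local num = src:match("^%d+", pos)
        if word then
            if keywords[word] then
                tokens[#tokens + 1] = { "TK_" .. word }
            else
                tokens[#tokens + 1] = { "TK_name", word }
            end
            pos = pos + #word
        elseif num then
            tokens[#tokens + 1] = { "TK_number", tonumber(num) }
            pos = pos + #num
        else
            -- Any other character is emitted as a single-character token.
            tokens[#tokens + 1] = { src:sub(pos, pos) }
            pos = pos + 1
        end
    end
    return tokens
end

-- Print one token per line, kind and value separated by a tab.
for _, tok in ipairs(tokenize("local x = {}")) do
    print(table.concat(tok, "\t"))
end
```

A real lexer also has to deal with strings, comments, multi-character operators and floating point literals, which this sketch deliberately leaves out.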

Parser

The parser takes the stream of tokens produced by the lexer and forms statements and expressions according to the language's grammar. The parser takes a set of user-supplied rules that are invoked each time a grammar rule is completed. A user rule can return a result, which is then passed on to the invocations of other rules.

For example, the grammar rule for the "return" statement is:

explist ::= {exp ','} exp

return_stmt ::= return [explist]

In this case the toolkit's parser will parse the optional expression list by calling the function expr_list. Then, once the expressions are parsed, the user's rule ast:return_stmt(exps, line) is invoked with the expression list obtained before.

local function parse_return(ast, ls, line)
    ls:next() -- Skip 'return'.
    ls.fs.has_return = true
    local exps
    if EndOfBlock[ls.token] or ls.token == ';' then -- Base return.
        exps = { }
    else -- Return with one or more values.
        exps = expr_list(ast, ls)
    end
    return ast:return_stmt(exps, line)
end

As you can see, the user's parsing rules are invoked through the ast object.

With the LuaJIT Language Toolkit, a set of rules is defined in "lua-ast.lua" to build the AST of the program.
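
As a rough illustration of what such a rule object could look like, the sketch below defines an object with a return_stmt rule that builds a plain table for the node. The names new_ast and number, and the node fields kind, arguments, value and line, are assumptions made for this example; they are not taken from "lua-ast.lua".

```lua
-- Hypothetical rule object; the field names below are assumptions,
-- not necessarily the node layout used by "lua-ast.lua".
local AST = {}
AST.__index = AST

local function new_ast()
    return setmetatable({}, AST)
end

-- Rule invoked by the parser when a "return" statement is complete.
function AST:return_stmt(exps, line)
    return { kind = "ReturnStatement", arguments = exps, line = line }
end

-- Rule for a numeric literal, used here only to build a sample argument.
function AST:number(value)
    return { kind = "Literal", value = value }
end

-- Simulate what the parser would do for "return 42" on line 3.
local ast = new_ast()
local node = ast:return_stmt({ ast:number(42) }, 3)
print(node.kind, #node.arguments, node.line)
```

Because the parser only calls methods on the object it is given, a different set of rules could build another tree shape, or emit code directly, without changes to the parser.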

In addition, the parser provides information about:

  • the function prototype
  • the syntactic scope

The first is used to keep track of some information about the function currently being parsed.

The syntactic scope rules tell the user's rules when a new syntactic block begins or ends. Currently this is not really used by the AST builder, but it can be useful for other implementations.
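
One way another implementation might use block begin and end notifications is to track which local variables are visible. The sketch below is only an assumption about how that could be done: the names scope_begin, scope_end, declare and lookup are hypothetical and are not the toolkit's API.

```lua
-- Hypothetical scope-tracking rules; `scope_begin`, `scope_end`, `declare`
-- and `lookup` are illustrative names, not the toolkit's API.
local Scopes = {}
Scopes.__index = Scopes

local function new_scopes()
    return setmetatable({ stack = {} }, Scopes)
end

-- Called when a new syntactic block begins.
function Scopes:scope_begin()
    self.stack[#self.stack + 1] = {}
end

-- Called when the current syntactic block ends.
function Scopes:scope_end()
    self.stack[#self.stack] = nil
end

-- Record a local variable in the innermost block.
function Scopes:declare(name)
    self.stack[#self.stack][name] = true
end

-- Return the depth (1 = outermost) of the block declaring `name`, or nil.
function Scopes:lookup(name)
    for depth = #self.stack, 1, -1 do
        if self.stack[depth][name] then return depth end
    end
    return nil
end

local scopes = new_scopes()
scopes:scope_begin()          -- outer block
scopes:declare("x")
scopes:scope_begin()          -- body of a "for" loop
scopes:declare("k")
print(scopes:lookup("x"), scopes:lookup("k"))
scopes:scope_end()            -- "k" goes out of scope here
print(scopes:lookup("k"))
```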

The Abstract Syntax Tree (AST)

The abstract syntax tree represents the whole Lua program with all its information. If you implement a new programming language, you can apply transformations to the AST as needed. Currently the language toolkit does not perform any transformation and just passes the AST to the bytecode generator module.
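
As an example of the kind of transformation a new language could apply, here is a small constant-folding pass over a tree of plain tables. The toolkit itself performs no such pass, and the node shapes used here (kind, operator, left, right, value, name) are assumptions chosen for this sketch.

```lua
-- Hypothetical node shapes: { kind = "Literal", value = n } and
-- { kind = "BinaryExpression", operator = op, left = node, right = node }.
local fold_ops = {
    ["+"] = function(a, b) return a + b end,
    ["-"] = function(a, b) return a - b end,
    ["*"] = function(a, b) return a * b end,
}

-- Recursively replace arithmetic on two numeric literals with one literal.
local function fold_constants(node)
    if node.kind ~= "BinaryExpression" then
        return node
    end
    local left = fold_constants(node.left)
    local right = fold_constants(node.right)
    local op = fold_ops[node.operator]
    if op and left.kind == "Literal" and right.kind == "Literal"
        and type(left.value) == "number" and type(right.value) == "number" then
        return { kind = "Literal", value = op(left.value, right.value) }
    end
    return { kind = "BinaryExpression", operator = node.operator,
             left = left, right = right }
end

-- The tree for  k * (2 + 3)  is rewritten to the tree for  k * 5.
local expr = {
    kind = "BinaryExpression", operator = "*",
    left = { kind = "Identifier", name = "k" },
    right = {
        kind = "BinaryExpression", operator = "+",
        left = { kind = "Literal", value = 2 },
        right = { kind = "Literal", value = 3 },
    },
}
local folded = fold_constants(expr)
print(folded.right.kind, folded.right.value)
```

The pass returns new nodes instead of mutating the input, so the original tree stays available for comparison or error reporting.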

Bytecode Generator

Once the AST is generated, it can be fed to the bytecode generator module, which will generate the corresponding LuaJIT bytecode.

The bytecode generator is based on the original work of Richard Hundt for the Nyanga programming language. I modified it extensively to produce optimized code similar to what LuaJIT itself generates.

Alternative Lua Code generator

Instead of passing the AST to the bytecode generator, an alternative module can be used to generate Lua code. The module is called "luacode-generator" and can be used exactly like the bytecode generator.

The Lua code generator has the advantage of being simpler and safer: the generated code is parsed directly by LuaJIT, which ensures complete compatibility of the bytecode from the beginning.

Currently the Lua Code Generator backend does not preserve the line numbers of the original source code. This is meant to be fixed in the future.

Use this backend instead of the bytecode generator if you prefer a safer backend for converting the Lua AST to code. The module can also be used to pretty-print a Lua AST, since the code itself is probably the most human-readable representation of the AST.
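
The idea of turning an AST back into Lua source can be sketched with a small recursive emitter. This is not the "luacode-generator" module: the emit function and the node shapes below are assumptions for illustration, and only four node kinds are supported.

```lua
-- Hypothetical node shapes, chosen for this sketch only.
local emit

local emitters = {
    Literal = function(node) return tostring(node.value) end,
    Identifier = function(node) return node.name end,
    BinaryExpression = function(node)
        -- Always parenthesize so operator precedence never changes the meaning.
        return "(" .. emit(node.left) .. " " .. node.operator .. " "
            .. emit(node.right) .. ")"
    end,
    ReturnStatement = function(node)
        local parts = {}
        for i, arg in ipairs(node.arguments) do
            parts[i] = emit(arg)
        end
        return "return " .. table.concat(parts, ", ")
    end,
}

-- Dispatch on the node kind and return the corresponding Lua source text.
function emit(node)
    local emitter = emitters[node.kind]
    if not emitter then
        error("unsupported node kind: " .. tostring(node.kind))
    end
    return emitter(node)
end

local tree = {
    kind = "ReturnStatement",
    arguments = {
        {
            kind = "BinaryExpression", operator = "+",
            left = { kind = "Identifier", name = "k" },
            right = { kind = "Literal", value = 1 },
        },
    },
}
print(emit(tree)) -- prints: return (k + 1)
```

The emitted text can then be handed to LuaJIT as ordinary source, which is what makes this kind of backend the safer choice.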

Running the Application

The application can be run with the following command:

luajit run.lua <filename>

The "run.lua" script simply invokes the complete pipeline of lexer, parser and bytecode generator, then passes the bytecode to LuaJIT with "loadstring".
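
The last step works because "loadstring" accepts precompiled bytecode as well as source text. The sketch below illustrates this with the standard string.dump function standing in for the toolkit's bytecode generator; it does not call any toolkit module, and the chunk name "=example" is arbitrary.

```lua
-- `loadstring` exists in Lua 5.1 and LuaJIT; newer Lua versions use `load`.
local load_chunk = loadstring or load

-- Compile a source string, then serialize the resulting function to bytecode.
-- Here string.dump plays the role of the toolkit's bytecode generator.
local source = "local x = {} for k = 1, 10 do x[k] = k*k + 1 end return x[10]"
local compiled = assert(load_chunk(source, "=example"))
local bytecode = string.dump(compiled)

-- Load the bytecode string back into a function and run it.
local chunk = assert(load_chunk(bytecode, "=example"))
local result = chunk()
print(result)
```

Bytecode is specific to the interpreter that produced it, so a string dumped by one Lua or LuaJIT version should only be loaded by the same version.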

The script "run.lua" can optionally show the generated bytecode using the "-bl" flag. For example:

luajit run.lua -bl tests/test-1.lua

will print on the screen:

-- BYTECODE -- "test-1.lua":0-7
0001    TNEW     0   0
0002    KSHORT   1   1
0003    KSHORT   2  10
0004    KSHORT   3   1
0005    FORI     1 => 0010
0006 => MULVV    5   4   4
0007    ADDVN    5   5   0  ; 1
0008    TSETV    5   0   4
0009    FORL     1 => 0006
0010 => KSHORT   1   1
0011    KSHORT   2  10
0012    KSHORT   3   1
0013    FORI     1 => 0018
0014 => GGET     5   0      ; "print"
0015    TGETV    6   0   4
0016    CALL     5   1   2
0017    FORL     1 => 0014
0018 => RET0     0   1

You can compare it with the bytecode generated natively by LuaJIT using the command:

luajit -bl tests/test-1.lua

In the example above the generated bytecode is identical to that generated by LuaJIT. This is not by chance, since the Language Toolkit's bytecode generator is designed to produce the same bytecode that LuaJIT itself would generate. In some cases the generated code will differ, but this is not considered a problem as long as the generated code is still correct.

Current Status

Currently the LuaJIT Language Toolkit should be considered beta software. The implementation is now complete in terms of features and well tested, even for the most complex cases, and a complete test suite is used to verify the correctness of the generated bytecode.
