High-performance native Lua compiler
clx is an ahead-of-time (AOT) native compiler that compiles Lua 5.5 source code into optimized machine code. By eliminating the need for a runtime interpreter or bytecode virtual machine, clx provides predictable execution time, minimal latency, and a reduced resource footprint.
clx transpiles Lua source code into optimized C++ code, which is then compiled by the system's C++ compiler to produce native machine code. According to the project's benchmarks, this architecture runs up to 60x faster than standard Lua interpreters, owing to static analysis and the optimizations described under Optimizations.
clx generates standalone binaries with zero external dependencies. The entire toolchain is fully open source and released under the permissive MIT License.
Due to the static compilation model inherent to an AOT compiler, load(), dofile(), loadfile(), string.dump(), and the debug library are not available. Dynamic code loading requires a runtime interpreter, and debug introspection requires runtime metadata that AOT compilation does not preserve.
Features
Installation
clx is available as source code (buildable on any platform) and as pre-built binaries for Linux (x86_64), macOS (ARM64), and Windows (x86_64). The binaries are published on GitHub Releases and built automatically via CI. Linux binaries require glibc ≥ 2.39. This page explains how to build and install the clx binary and runtime library.
Verify your installation:
Benchmarks
Performance comparison against popular Lua runtimes. Speedup is measured relative to the Lua 5.5 interpreter. Results may vary depending on the C++ toolchain and environment.
| Runtime | Time | Speedup |
|---|---|---|
Getting Started
Build clx
Clone the repository and build from source:
Alternatively, build manually with CMake:
Compile Your First Program
Create hello.lua:
Compile and run:
Your Second Program
Let's try something more interesting:
Compile it with the --fast flag for better performance:
Language Features
clx supports most Lua 5.5 features including variables, control flow, functions, tables, metatables, coroutines, standard libraries, and bitwise operations. Due to the AOT compilation model, load(), dofile(), loadfile(), string.dump(), and the debug library are not available.
Variables and Types
Control Flow
Functions
Tables and Metatables
Coroutines
String Module
Bitwise Operations
Performance Tips
Use local variables, prefer numeric for loops, and avoid mixing types for optimal performance.
Common Issues
Debugging Compilation Errors
If the C++ compiler reports an error, you can inspect the generated code:
Understanding Runtime Errors
Runtime errors show the Lua line where the error occurred:
The format is filename:line: message.
Next Steps
Read the Architecture, Optimizations, Runtime, and CLI documentation.
CLI Reference
Usage
Options starting with - that are not recognized by clx are automatically passed through to the C++ compiler.
Build Mode
Output Options
Compilation Options
Optimization Flags
Default flags are applied only when you provide no compiler options. If you provide any compiler options, no default flags are added and you get full control.
DCE flags (-ffunction-sections -fdata-sections -Wl,--gc-sections) are added only for executables.
--size (default) links against libclx_size.a and reduces binary size by 36–40% vs --fast. Compute-heavy code can be 80–320% slower; table/IO-heavy code sees negligible difference. Use --fast when throughput matters more than size.
Platform-Specific
Examples
Environment Variables
clx respects these environment variables:
Exit Codes
Build with CMake
If building from source:
Architecture
Overview
clx transpiles Lua source code into optimized C++, then compiles with the system C++ compiler to produce native machine code. This architecture provides performance up to 60x faster than standard Lua interpreters.
Compiler Pipeline
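Lua source → Lexer → Parser → AST → Optimizer → Code Generator → C++ source → C++ compiler (gcc / clang / cl) → Native binary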
Components
CLI — Handles argument parsing, file I/O, and invokes the C++ compiler. Auto-detects gcc, clang, or MSVC. Enables DCE via -ffunction-sections -Wl,--gc-sections (gcc/clang) or /Gy /link /OPT:REF /OPT:ICF (MSVC).
Lexer — Converts source to token stream (keywords, identifiers, literals, operators, delimiters).
Parser — Recursive descent parser builds an AST with statement, expression, block, and function nodes.
AST Nodes — Core node types: Block, Identifier, BinaryOp, UnaryOp, FunctionDef, TableConstructor, ForStatement, WhileStatement, IfStatement, CallExpression.
Optimizer — Analyzes the AST and annotates nodes with optimization hints: Numeric fast-path (direct C++ arithmetic), variable scope resolution, table purity analysis, constant folding preparation, shape version tracking for inline cache invalidation, yields_number analysis for numeric for loops.
Code Generator — Produces C++ code with fast-path for numeric operations, slow-path for dynamic operations, loop transformation, [[likely]] branch prediction hints, per-call-site CacheSlot inline caching, StringBuilder-based concatenation, and wyhash-based string hashing.
Runtime Library — Implements Lua semantics: core VM (GC, tables, metamethods, arithmetic, bitwise ops), base library, math library, coroutines, module loading, and string library.
Project Layout
Data Flow
Compilation: Lua source → Lexing → Parsing → AST → Optimization → Annotated AST → Codegen → C++ source → Compilation → Native binary
Runtime: Initialization (LState, standard libraries) → Execution (compiled code with Numeric fast-path) → Fallback (Lua value representation for dynamic types) → Cleanup (GC)
Key Runtime Components
StringPool
Open-addressed hash map for string interning. Each slot owns a baked allocation: [uint32_t hash][uint32_t len][char data...\0]. LValue stores a pointer to the char data (8 bytes past alloc start). One probe on hit, no std::string, no side map, no double lookup. Supports intern_preallocated() for zero-allocation string concatenation.
wyhash
Fast, high-quality hash function used for table keys and string interning. Uses compile-time constant secrets with 128-bit multiply for excellent avalanche. For interned strings, the hash is baked into the allocation header, making lvalue_hash() a single 4-byte load.
For strings ≤8 bytes, swar_hash_8() replaces wyhash_str — loads all bytes into one register with a single memcpy and mixes via wyhash64. Used consistently for both TAG_ISTR inline strings and short interned strings so cross-type hash compatibility is maintained.
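The short-string path can be illustrated with a small sketch. It is not clx's swar_hash_8(): the mix64 helper and the kSecret constant are hypothetical stand-ins for wyhash64 and its secrets, and unsigned __int128 assumes gcc or clang. What it shows is the single memcpy load and why hashing the raw bytes gives the same result whether they come from an inline value or from a heap string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mixer standing in for wyhash64: multiply to 128 bits, then
// fold the high and low halves together. Needs gcc/clang for __int128.
inline std::uint64_t mix64(std::uint64_t a, std::uint64_t b) {
    unsigned __int128 product = static_cast<unsigned __int128>(a) * b;
    return static_cast<std::uint64_t>(product) ^
           static_cast<std::uint64_t>(product >> 64);
}

// Hash a string of at most 8 bytes. One memcpy loads every byte into a
// single zero-padded register. The length is mixed in as well, because
// zero padding alone cannot distinguish "a" from "a\0".
inline std::uint32_t short_hash_sketch(const char* data, std::size_t len) {
    assert(len <= 8);
    std::uint64_t word = 0;
    std::memcpy(&word, data, len);
    constexpr std::uint64_t kSecret = 0x9E3779B97F4A7C15ull;  // illustrative constant
    return static_cast<std::uint32_t>(mix64(word ^ kSecret, len ^ kSecret));
}
```

Because the function only looks at the bytes and the length, a key packed into a 64-bit value and the same key stored on the heap produce identical hashes, which is the compatibility property described above.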
CacheSlot Inline Caching
Per-call-site cache slots for string-keyed table access. Each access site in the source gets one CacheSlot that caches the last table pointer and value. Uses shape_version to detect stale cached values after table writes. Only caches non-GC values to avoid dangling pointers after collection. States: valid/invalid based on table pointer and shape version match.
StringBuilder
O(n) string concatenation that avoids the O(n²) quadratic blow-up of repeated s = s .. part patterns. Uses inline storage for up to 8 parts, grows to heap allocation when needed. Produces a single interned string with baked hash on to_string().
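A minimal sketch of the idea, assuming a hypothetical ConcatSketch class rather than clx's real StringBuilder: parts are only recorded while appending, and the final string is built with one allocation and one copy per byte. Interning and hash baking are left out, and std::string stands in for the interned result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Records string parts and copies each byte exactly once in finish().
// The first 8 parts live inline; later parts spill to a heap vector.
// The viewed strings must outlive the builder.
class ConcatSketch {
public:
    void append(std::string_view part) {
        if (count_ < inline_parts_.size()) {
            inline_parts_[count_] = part;
        } else {
            overflow_.push_back(part);
        }
        ++count_;
        total_len_ += part.size();
    }

    // One allocation of the final size, then a single pass over the parts.
    std::string finish() const {
        std::string out;
        out.reserve(total_len_);
        const std::size_t inline_used =
            count_ < inline_parts_.size() ? count_ : inline_parts_.size();
        for (std::size_t i = 0; i < inline_used; ++i) out.append(inline_parts_[i]);
        for (std::string_view part : overflow_) out.append(part);
        assert(out.size() == total_len_);
        return out;
    }

private:
    std::array<std::string_view, 8> inline_parts_{};
    std::vector<std::string_view> overflow_;
    std::size_t count_ = 0;
    std::size_t total_len_ = 0;
};
```

Repeated s = s .. part copies the whole prefix on every step, whereas this approach touches each byte once, which is where the O(n) bound comes from.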
Shape Version Tracking
Tables track a shape_version that increments on every write. CacheSlots check the version to detect stale cached values, preventing incorrect reads after table mutations.
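The interaction between a cache slot and the shape version can be sketched as follows. TableSketch, CacheSlotSketch, and cached_get are hypothetical names for illustration, values are plain doubles (a non-GC type), and the hits counter exists only to make the behaviour observable.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical table: every write bumps shape_version.
struct TableSketch {
    std::unordered_map<std::string, double> fields;
    std::uint32_t shape_version = 0;

    void set(const std::string& key, double value) {
        fields[key] = value;
        ++shape_version;
    }
};

// One slot per access site: remembers the last table and the value read.
struct CacheSlotSketch {
    const TableSketch* table = nullptr;
    std::uint32_t shape_version = 0;
    double value = 0.0;
    int hits = 0;  // instrumentation for this example only
};

// Reads t[key], skipping the hash lookup while the slot is still valid.
inline double cached_get(CacheSlotSketch& slot, const TableSketch& t,
                         const std::string& key) {
    if (slot.table == &t && slot.shape_version == t.shape_version) {
        ++slot.hits;
        return slot.value;  // valid: same table, no writes since caching
    }
    auto it = t.fields.find(key);
    assert(it != t.fields.end());  // the sketch assumes the key exists
    slot.table = &t;
    slot.shape_version = t.shape_version;
    slot.value = it->second;
    return slot.value;
}
```

A write between two reads changes the version, so the second read misses and refreshes the slot instead of returning the stale value.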
Optimizations
Constant Folding
clx emits arithmetic expressions and delegates to the C++ compiler, which performs constant folding at compile time:
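As an illustration only, assume a Lua line such as local x = 2 * 3 + 4. The function name below is hypothetical and the exact shape of clx's output may differ; constexpr and static_assert are used here purely to demonstrate that the C++ compiler evaluates the emitted expression at compile time.

```cpp
#include <cstdint>

// Hypothetical output for the Lua line `local x = 2 * 3 + 4`: the expression
// is emitted as written and left for the C++ compiler to fold.
constexpr std::int64_t folded_example() {
    return std::int64_t{2} * 3 + 4;
}

// Compiles only if the expression can be evaluated at compile time.
static_assert(folded_example() == 10, "folded by the C++ compiler");
```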
Numeric Fast-Path
clx distinguishes between Integer and Number types in its NaN-boxed value representation. The runtime LValue arithmetic functions dispatch to native double or int64 operations internally.
Direct Arithmetic Fast-Path
When all operands are known to be numeric, clx generates direct C++ arithmetic instead of dynamic LValue dispatch:
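The difference between the two code shapes can be sketched like this. ValueSketch is a std::variant stand-in for a dynamic value (clx's real LValue is NaN-boxed), the function names are hypothetical, and the string-to-number slow path is a simplified version of Lua's coercion rules. The [[likely]] attribute requires C++20.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Stand-in for a dynamic Lua value.
using ValueSketch = std::variant<double, std::string>;

// Dynamic addition: type check first, coercion as the slow path.
inline double add_dynamic(const ValueSketch& a, const ValueSketch& b) {
    const double* x = std::get_if<double>(&a);
    const double* y = std::get_if<double>(&b);
    if (x && y) [[likely]] {
        return *x + *y;  // fast path: plain C++ arithmetic
    }
    // Slow path: numeric strings are converted before adding.
    auto to_number = [](const ValueSketch& v) -> double {
        if (const double* d = std::get_if<double>(&v)) return *d;
        return std::stod(std::get<std::string>(v));  // throws if not numeric
    };
    return to_number(a) + to_number(b);
}

// When both operands are statically known to be numbers, no check is needed.
inline double add_direct(double a, double b) { return a + b; }

// Usage: both forms agree on numeric inputs.
inline void add_example() {
    assert(add_dynamic(1.5, 2.0) == 3.5);
    assert(add_dynamic(std::string("10"), 1.0) == 11.0);
    assert(add_direct(1.5, 2.0) == 3.5);
}
```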
Local Variable Optimization
Local variables that hold numbers are stored as unboxed C++ doubles:
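A hedged sketch of what this can look like, using a hypothetical function name and a Lua loop given in the comment. It omits the overflow handling that Lua applies when the loop limit is near the integer maximum, so the precondition is asserted instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shape of the C++ generated for:
//   local sum = 0.0
//   for i = 1, n do sum = sum + i * 0.5 end
//   return sum
inline double sum_halves(std::int64_t n) {
    assert(n < INT64_MAX);                    // sketch skips Lua's overflow guard
    double sum = 0.0;                         // unboxed local, no LValue involved
    for (std::int64_t i = 1; i <= n; ++i) {   // numeric for as a C++ for loop
        sum = sum + static_cast<double>(i) * 0.5;
    }
    return sum;
}
```

With no boxing in the loop body, the C++ compiler is free to keep sum in a register and apply its usual loop optimizations.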
Inline Caching
Each string-keyed table access site gets a dedicated CacheSlot. On repeated access to the same table key, the cache skips the hash probe entirely. Shape version guards detect stale cached values after table mutations. Only non-GC values are cached to avoid dangling pointers.
String Optimizations
StringPool — Open-addressed hash map for string interning with one-probe-on-hit, baked hashes, and no std::string overhead.
Baked Hashes — For interned strings, the wyhash is baked into the allocation header. Reading the hash costs a single 4-byte load.
StringBuilder — O(n) string concatenation that avoids O(n²) quadratic blow-up. Produces a single interned string with baked hash.
wyhash — Fast, high-quality hash with compile-time constant secrets and 128-bit multiply (__uint128_t or _umul128 on MSVC).
Pre-Allocated Interning — intern_preallocated() adopts a pre-formatted buffer directly into the StringPool, cutting string concat from 3 heap allocations to 1 (or 0 on pool hit).
Code Generation Optimizations
Loop Transformations — Numeric for loops transform to C++ for loops. Generic for loops emit direct LCFunction pointer calls.
Branch Prediction Hints — Fast paths annotated with [[likely]] attributes.
Inlining — Small functions inlined at compile time. All arithmetic operators are CLX_INLINE with always_inline.
SIMD Vectorization — C++ compiler can vectorize simple loops with -O3 -march=native. Add -mavx2 manually for AVX2-specific builds.
Dead Code Elimination — -ffunction-sections -fdata-sections -Wl,--gc-sections (gcc/clang) or /Gy /link /OPT:REF /OPT:ICF (MSVC).
Link-Time Optimizations
When using -flto (gcc/clang) or /GL (MSVC), the compiler can inline across translation units, eliminate dead code across the entire program, and perform whole-program analysis. Enabled by default in release mode.
Runtime Optimizations
Table Pre-sizing — Tables with known structure are pre-allocated to the correct size.
Table Layout — Cache-line-optimized layout: all gettable fields fit in one 64-byte cache line.
Upvalue Fast-Path — Closure variables that aren't captured are stored directly, avoiding heap allocation.
Metamethod Caching — Frequently used metamethod strings are pre-interned at initialization.
Length Operator — String length read from baked header, avoiding strlen.
Optimization Levels
Lua source-level debugging. The generated C++ contains #line directives mapping each statement to the original .lua file and line, so GDB, LLDB, or MSVC debugger can step through the script.
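The effect of a #line directive can be shown with a tiny, hypothetical excerpt. The file name hello.lua and the line number are made up for illustration; the point is that the compiler attributes the following statement to the Lua file, and the debug information follows that.

```cpp
// Hypothetical excerpt of generated code for line 3 of hello.lua. After the
// directive, the compiler reports the next line as hello.lua:3 instead of a
// position in the generated .cpp file.
constexpr int reported_line() {
#line 3 "hello.lua"
    return __LINE__;
}

static_assert(reported_line() == 3, "statement is attributed to hello.lua:3");
```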
Compiler Remarks
GCC/Clang (Linux/macOS)
Default release flags:
MSVC (Windows)
Default release flags:
Key MSVC optimizations: /GL (Whole program optimization / LTO), /OPT:REF (Remove unused functions), /OPT:ICF (Identical COMDAT folding), /Gy (Function-level linking), /fp:fast (Fast floating-point semantics).
Cross-Platform Tips
Use -O3 -march=native -flto=auto on gcc/clang for maximum performance. On MSVC, /O2 is the primary optimization flag; /GL enables link-time optimization. Both compilers support SIMD vectorization when loops are simple enough. Dead code elimination requires function-level linking on both platforms.
Runtime
Value System
clx uses NaN-boxing to represent all Lua values in 64 bits, with distinct types for Number (double), Integer (native int64), TAG_ISTR inline strings (≤5 bytes, no heap allocation), and pointers for tables, functions, threads, and userdata.
- Inline string (TAG_ISTR): tag (0xFFF9 + len) + 48-bit immediate character sequence (≤ 5 bytes)
- Special values: nil, true, and false
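The inline-string encoding can be sketched as below. The 0xFFF9 + len tag and the 48-bit payload come from the description above; the byte order inside the payload, the function names, and the rule that anything below 0xFFF9 is a double are assumptions made for this sketch. Where clx places its other tags is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Plain doubles are stored as their own bit pattern.
inline std::uint64_t box_double(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Inline string: top 16 bits hold 0xFFF9 + len, low 48 bits hold the bytes.
inline std::uint64_t box_short_string(const char* data, std::size_t len) {
    assert(len <= 5);
    std::uint64_t payload = 0;
    for (std::size_t i = 0; i < len; ++i) {
        payload |= static_cast<std::uint64_t>(
                       static_cast<unsigned char>(data[i])) << (8 * i);
    }
    return ((0xFFF9ull + len) << 48) | payload;
}

// Assumption: every tag at or above 0xFFF9 is a boxed non-double value.
inline bool is_double(std::uint64_t bits) { return (bits >> 48) < 0xFFF9ull; }

inline bool is_short_string(std::uint64_t bits) {
    const std::uint64_t tag = bits >> 48;
    return tag >= 0xFFF9ull && tag <= 0xFFF9ull + 5;
}

// Recovers the characters without touching the heap representation.
inline std::string unbox_short_string(std::uint64_t bits) {
    assert(is_short_string(bits));
    const std::size_t len = static_cast<std::size_t>((bits >> 48) - 0xFFF9ull);
    std::string out(len, '\0');
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<char>((bits >> (8 * i)) & 0xFF);
    }
    return out;
}
```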
Garbage Collection
Stop-the-world mark-and-sweep collector: mark phase traverses reachable objects from roots, sweep phase deallocates unreachable objects. Finalizers (__gc) are called before collection. Uses a reusable worklist vector to avoid repeated allocations.
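A minimal sketch of a collector with this structure, using hypothetical ObjSketch and HeapSketch types. It keeps the marked flag and next pointer mentioned for the object header, reuses one worklist vector across collections, and leaves out finalizers entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap object on an intrusive all-objects list.
struct ObjSketch {
    bool marked = false;
    ObjSketch* next = nullptr;         // next object in the heap list
    std::vector<ObjSketch*> children;  // outgoing references
};

struct HeapSketch {
    ObjSketch* all = nullptr;          // head of the all-objects list
    std::vector<ObjSketch*> worklist;  // reused across collections
    std::size_t live = 0;

    ObjSketch* allocate() {
        ObjSketch* o = new ObjSketch;
        o->next = all;
        all = o;
        ++live;
        return o;
    }

    // Stop-the-world collection: mark from the roots, then sweep the list.
    void collect(const std::vector<ObjSketch*>& roots) {
        worklist.clear();  // keeps the capacity from earlier runs
        for (ObjSketch* r : roots) {
            if (r && !r->marked) { r->marked = true; worklist.push_back(r); }
        }
        while (!worklist.empty()) {
            ObjSketch* o = worklist.back();
            worklist.pop_back();
            for (ObjSketch* c : o->children) {
                if (c && !c->marked) { c->marked = true; worklist.push_back(c); }
            }
        }
        ObjSketch** link = &all;
        while (*link) {
            ObjSketch* o = *link;
            if (o->marked) {
                o->marked = false;  // reset for the next cycle
                link = &o->next;
            } else {
                *link = o->next;    // unlink and free the unreachable object
                assert(live > 0);
                delete o;
                --live;
            }
        }
    }
};
```

Marking with an explicit worklist instead of recursion also keeps deep object graphs from overflowing the native stack.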
Standard Libraries
Base Library — print, error, assert, type, tostring, tonumber, pairs, ipairs, next, pcall, xpcall, select, collectgarbage, setmetatable, getmetatable, rawequal, rawget, rawset, rawlen, warn, _VERSION
Math Library — Uses set_lazy_funcs for lazy registration via constexpr LazyReg[]. Functions created as LCFunction closures on first access, then cached on the table.
String Library — len, sub, reverse, lower, upper, rep, byte, char, format, find, match, gmatch, gsub, pack, unpack, packsize with full pattern matching support.
Coroutine Library — create, resume, yield, status, wrap using OS-level fibers/ucontext.
Table Library — insert, remove, concat, sort, unpack, pack, move.
Metamethods
CacheSlot Inline Caching
Per-call-site cache for string-keyed table access. Valid when table pointer matches, shape version hasn't changed, and cached value is not a GC object (avoiding dangling pointers). States: valid/invalid based on table pointer and shape version match.
Closures and Upvalues
clx supports lexical scoping with full closure capture:
- Local variables captured by inner functions become upvalues
- Shared upvalues (multiple closures sharing the same captured variable)
- Loop variable capture (closures created in a for loop each capture the correct iteration value)
- Triple nesting and arbitrary capture depth
- Tail call optimization (TCO) for recursive calls — no stack growth
Goto and Labels
Full goto / ::label:: support with proper lexical scoping:
- Forward and backward jumps
- Duplicate labels in different scopes resolve correctly
- Goto can create loops (backward jumps)
Memory Layout
Table Layout
| Region | Contents |
|---|---|
| LHeader (metadata) | |
| type, marked, next | GC metadata |
| Cache line 0 (64 bytes) — all gettable fields here | |
| array pointer | 8 bytes |
| array_size, array_cap | 16 bytes |
| bucket, hash_size | 24 bytes |
| metatable, hash_count | 40 bytes |
| padding | 56 bytes (to align next cache line) |
| Parallel arrays (cache-line optimized) | |
| keys | hash_size × 8 bytes |
| vals | hash_size × 8 bytes |
| nexts | hash_size × 2 bytes |
| shape_version | 4 bytes |
| free_head | 2 bytes |
String Layout
Interned strings are stored as [uint32_t hash][uint32_t len][char data...\0].
LValue stores a pointer to the char data (skipping the 8-byte header). The length is at ptr[-4]..ptr[-1] and the hash at ptr[-8]..ptr[-5].
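The header arithmetic can be made concrete with a short sketch. The function names are hypothetical, malloc stands in for whatever allocator the StringPool uses, and the hash is passed in by the caller rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Allocates [uint32_t hash][uint32_t len][char data...\0] and returns a
// pointer to the character data, 8 bytes past the allocation start.
inline char* make_baked_string(const char* src, std::uint32_t len,
                               std::uint32_t hash) {
    char* alloc = static_cast<char*>(
        std::malloc(8 + static_cast<std::size_t>(len) + 1));
    assert(alloc != nullptr);
    std::memcpy(alloc, &hash, 4);
    std::memcpy(alloc + 4, &len, 4);
    std::memcpy(alloc + 8, src, len);
    alloc[8 + len] = '\0';
    return alloc + 8;
}

// The hash sits in the 4 bytes starting at ptr - 8: a single 4-byte load.
inline std::uint32_t baked_hash(const char* ptr) {
    std::uint32_t h;
    std::memcpy(&h, ptr - 8, 4);
    return h;
}

// The length sits in the 4 bytes starting at ptr - 4, so no strlen is needed.
inline std::uint32_t baked_len(const char* ptr) {
    std::uint32_t n;
    std::memcpy(&n, ptr - 4, 4);
    return n;
}

// Frees the whole allocation, stepping back over the header first.
inline void free_baked_string(char* ptr) { std::free(ptr - 8); }
```

Because the data pointer is also a valid NUL-terminated C string, it can be handed to C APIs directly while the header stays reachable at negative offsets.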
Pre-interned Metamethods
To avoid repeated string interning on every metamethod dispatch, clx pre-interns common metamethod strings at LState initialization: str_index, str_newindex, str_gc, str_call, str_close, str_pairs, str_tostring. These are stored directly in LState and used for fast metamethod lookup.
GC Options
The collectgarbage() function accepts these options:
Lazy Function Registration
set_lazy_funcs(L, table, lazy_regs, count) attaches a __index metamethod that creates LCFunction closures lazily on first access and caches them on the table. Uses constexpr-friendly LazyReg arrays (raw function pointers, no std::function) so registration tables live in static read-only storage. Math library uses this pattern.
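The pattern can be sketched without the clx API. LazyRegSketch, LazyTableSketch, and the double(double) function type are hypothetical simplifications; the real LazyReg fields and function signature are not shown in this document. The sketch demonstrates a constexpr registration array of raw function pointers and closures that are created on first access and then cached.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using RawFn = double (*)(double);

// Plain aggregate of a name and a raw function pointer, so an array of these
// can be constexpr and live in static read-only storage.
struct LazyRegSketch {
    const char* name;
    RawFn fn;
};

inline double sketch_sqrt(double x) { return std::sqrt(x); }
inline double sketch_floor(double x) { return std::floor(x); }

constexpr LazyRegSketch kMathRegs[] = {
    {"sqrt", sketch_sqrt},
    {"floor", sketch_floor},
};

// Stand-in for a library table with an __index fallback.
struct LazyTableSketch {
    const LazyRegSketch* regs;
    std::size_t count;
    std::unordered_map<std::string, std::function<double(double)>> cached;
    int created = 0;  // number of closures built so far

    // Returns nullptr for unknown names; builds each closure at most once.
    const std::function<double(double)>* get(const std::string& name) {
        assert(regs != nullptr);
        auto hit = cached.find(name);
        if (hit != cached.end()) return &hit->second;
        for (std::size_t i = 0; i < count; ++i) {
            if (name == regs[i].name) {
                ++created;
                return &cached.emplace(name, regs[i].fn).first->second;
            }
        }
        return nullptr;
    }
};
```

Functions that a program never calls are never wrapped, so library initialization does no per-function work up front.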
Performance Characteristics
C++ API
All functions are in namespace clx. Include <clx.h> to use the full API. Values are LValue objects, not stack indices.
Lifecycle
Value Constructors
Type Queries
Type Names
State Queries
Lenient Conversions
Return a default value on failure (no exception):
Strict Conversions
Throw LRuntimeException on type mismatch:
Checked Field Access
Validate that a field has the expected type:
These produce messages like field 'x' (integer expected, got string).
Optional Conversions
Return default on nil, throw on type mismatch:
String Conversion (__tostring aware)
If v is already a string, returns v directly. If v has __tostring, calls it and returns the result. Otherwise falls back to to_string + intern.
Argument Validation
Table Operations
Module Registration
LReg struct
LazyReg struct
LazyReg uses raw function pointers instead of std::function, making the array constexpr. set_lazy_funcs stores the LazyReg* as light userdata on the metatable; the array must persist in static storage.
Globals & Metatables
Table Iteration
Function Calls
Error Helpers
Coroutines
Core Types
LValue Raw Constructors
MultiValue
Thread Status Constants
Module Registration
Each standard module has a luastd_* function that creates and sets the global table:
Call the individual luastd_* functions, or openlibs(L), after open() and before using any Lua features.
Coroutine Example
Complete Module Example
Compile into an object/library, then link with clx main.lua --modules mylib. clx's generated main() calls register_module("mylib", luaopen_mylib). The module becomes available via require("mylib") at runtime.
Modules
clx supports two ways to organize and load modules: Lua source modules compiled alongside your entry point (static preload), and statically linked C++ modules with --modules. Both are consumed via Lua's require().
Lua Source Modules
Pass multiple .lua files to clx — the first is the entry point, the rest become modules loadable via require:
Inside main.lua, require them by name (filename without .lua):
How it works
clx compiles each .lua file into a function luaopen_<module> with extern linkage. The generated main() calls register_module() for non-entry modules — this stores the function in package.preload[name] without calling it:
When Lua calls require("mymodule"):
- Checks package.loaded["mymodule"] — if present, returns it immediately
- Checks package.preload["mymodule"] — calls the registered function
- Stores the result in package.loaded["mymodule"] and returns it
- Subsequent calls return the cached value without re-executing
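The lookup order above can be sketched with two maps. PackageSketch is a hypothetical stand-in, not the clx runtime: module values are plain strings, loaders are std::function objects, and the register_module method here only mirrors the behaviour described for the real function of the same name.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

using ModuleValue = std::string;              // stand-in for a Lua value
using Loader = std::function<ModuleValue()>;  // stand-in for luaopen_<module>

struct PackageSketch {
    std::unordered_map<std::string, ModuleValue> loaded;  // package.loaded
    std::unordered_map<std::string, Loader> preload;      // package.preload

    // Stores the loader without calling it.
    void register_module(const std::string& name, Loader fn) {
        assert(static_cast<bool>(fn));  // loader must be callable
        preload[name] = std::move(fn);
    }

    ModuleValue require(const std::string& name) {
        auto done = loaded.find(name);
        if (done != loaded.end()) return done->second;  // cached result
        auto it = preload.find(name);
        if (it == preload.end()) {
            throw std::runtime_error("module '" + name + "' not found");
        }
        ModuleValue result = it->second();  // first require runs the loader
        loaded[name] = result;
        return result;
    }
};
```

Registering a module costs nothing at startup; the loader body runs only if some code path actually requires it.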
Linking
All builds link statically against libclx.a. No shared library is needed at runtime.
Module convention
A Lua source module should return a table (or any value) that becomes the result of require:
C++ Native Modules (Statically Linked)
You can link precompiled C++ code using --modules:
The function must use CLX_API (which provides extern linkage and proper symbol visibility):
The generated main() calls register_module, which stores the wrapper in package.preload — the function runs only on first require().
Writing a C++ native module
Compile to an object file:
Then link with your Lua script. clx looks for my_native_mod.a (or .lib on Windows) in the current directory, then in <clx-install-dir>/lib/clx/, and on POSIX also in /usr/local/lib/clx/:
If your module depends on external libraries, pass link flags directly:
Compiling Lua to Libraries
Static Library
Object File
All export luaopen_mylib. A host C++ program links against the static library:
Combining All Approaches
All modules are registered in package.preload and loaded via require.
Options Reference
API Reference
Lua 5.5 Compatibility
clx targets Lua 5.5 compatibility. The following table summarizes the current implementation status.
| Core Language | Status |
|---|---|
| Variables | ✅ |
| Arithmetic operators | ✅ |
| Logical operators | ✅ |
| Comparisons | ✅ |
| Functions | ✅ |
| Closures | ✅ |
| _ENV | ✅ |
| Varargs | ✅ |
| Multiple returns | ✅ |
| Local variables | ✅ |
| Global variables | ✅ |
| Control Flow | Status |
|---|---|
| if / elseif / else | ✅ |
| while | ✅ |
| repeat / until | ✅ |
| numeric for | ✅ |
| generic for | ✅ |
| break | ✅ |
| goto | ✅ |
| labels | ✅ |
| Tables | Status |
|---|---|
| Table constructors | ✅ |
| Array part | ✅ |
| Hash part | ✅ |
| Mixed tables | ✅ |
| Table iteration | ✅ |
| Metatables | Status |
|---|---|
| __index / __newindex | ✅ |
| __add / __sub / __mul / __div / __mod / __pow / __unm | ✅ |
| __len / __concat / __eq / __lt / __le | ✅ |
| __call / __tostring | ✅ |
| __ipairs / __pairs | ✅ |
| Coroutines | Status |
|---|---|
| coroutine.create / resume / yield | ✅ |
| coroutine.status / wrap | ✅ |
| Standard Libraries | Status |
|---|---|
| base | ✅ |
| math | ✅ |
| string | ✅ |
| table | ✅ |
| coroutine | ✅ |
| io | ✅ |
| os | ✅ |
| utf8 | ✅ |
| package | ✅ |
| debug | ❌ |
Unsupported — AOT Limitations
Due to the static compilation model, the following features are intentionally unsupported:
| Feature | Reason |
|---|---|
| load() | Runtime Lua code loading |
| loadfile() | Runtime Lua code loading from file |
| dofile() | Runtime Lua code execution from file |
| string.dump() | Lua code compilation |
| debug library | Runtime introspection |
Migration Guide: Lua C API → clx C++ API
This guide is for developers who have written Lua binary modules using the Lua C API and want to port them to clx. clx does not expose a VM stack — instead, values are passed as clx::LValue objects directly.
Key Differences
| Lua C API | clx C++ API |
|---|---|
| Stack-based (lua_push*, lua_to*) | Value-based (clx::LValue, no stack) |
| lua_State* L | clx::LState* L |
| lua_pushnumber(L, 3.14) | clx::number(3.14) |
| lua_tointeger(L, 1) | args[0].as_integer() |
| luaL_checknumber(L, 1) | clx::check_number(L, args[0]) |
| lua_settop(L, 0) | Not needed — no stack |
| lua_getfield(L, idx, "key") | clx::get_field(L, table, "key") |
| lua_setfield(L, idx, "key") | clx::set_field(L, table, "key", val) |
| lua_newtable(L) | clx::table(L) |
| return n (result count) | return MultiValue(values...) |
| lua_error(L) | throw clx::LRuntimeException(...) |
| lua_len(L, idx) | clx::len(L, v) |
| lua_concat(L, n) | clx::concat(L, a, b) |
| lua_pcall(L, nargs, nresults) | clx::pcall(L, func, args, count) |
| lua_next(L, idx) | clx::next(L, table, key) |
Function Signature
Lua C API:
clx C++ API:
Module Registration
Lua C API:
clx C++ API:
Complete Migration Example
Before (Lua C API):
After (clx C++ API):
Not supported in clx: lua_load, luaL_loadfile, luaL_loadstring, lua_dump, bytecode format, lua_pushcclosure, upvalue API, lua_newstate, lua_gc, luaL_Buffer.