18 KiB
Compiler front-end in a single-header-file C library
.. for a very simple statically typed language. I mostly designed it for personal use, it's small but complete - you can actually program in it and so on. It compiles to C with all the debug info so debuggers work. Currently I find that it's pretty good for metaprogramming and code generation, particularly when you have a workflow "metaprogram -> program". The library provides functions for traversing, printing the AST and other niceties:
- User has full control over compilation!
- No dependencies, permissive license, single file that compile both in C and C++!
- Simpler then C: core of the language is 4000 loc and entire package 11k loc.
- Statically typed, procedural and modern: it's a mix of Go/Odin/Jai with "return to C" as ideal.
- Complete: it supports conditional compilation, modularity via packages etc.
- State of art error handling techniques like AST poisoning, proper parsing recovery, catches tons of errors without misreporting!
- Great C integration: using C libraries feels native, the language compiles easily to C with great debug info.
Example or you can try the language online
import "raylib";
main :: proc(): int {
InitWindow(800, 600, "Thing");
SetTargetFPS(60);
for !WindowShouldClose() {
BeginDrawing();
ClearBackground(RAYWHITE);
DrawText("Congrats! You created your first window!", 190, 200, 20, LIGHTGRAY);
EndDrawing();
}
CloseWindow();
return 0;
}
Program examples:
Examples of hooking into the compilation:
- Generating type information
- Adding printf checks
- Adding instrumentation to every procedure
- Adding a generic dynamic array using AST modification
Other:
How to integrate
To create the implementation do this in one of your C or C++ files:
#define LIB_COMPILER_IMPLEMENTATION
#include "lib_compiler.h"
How to build the compiler executable
Simply compile one file with whatever compiler you want:
clang tools/lcompile.c -o lc.exe
Once you have the executable you can compile examples and stuff:
lc.exe examples/create_raylib_window
When in doubt: ./lc.exe --help.
How to run the test suite
clang++ src/build_tool/main.cpp -o bld.exe
./bld.exe
You only need to compile the build tool once. Afterwards just call ./bld.exe. There are multiple testing options so checkout: ./bld.exe --help.
Language overview
The language is strictly procedural, I have taken most of the inspiration from C, Golang, Ion, Jai and Odin. There are no classes, methods etc. only procedures and data types.
Packages
Package is a real directory on your disk that contains source files. It's also an aggregation of declarations - functions, constants, variables, types which you can import into your program.
import "libc";
import RL "raylib";
main :: proc(): int {
RL.SetClipboardText("**magic**");
printf("**magic**");
return 0;
}
Constants
By which I mean the literal values but also the named declarations. There are 3 constant types in the language: untyped ints, untyped floats and untyped strings. The "untyped" here means that the type is inferred from context. There are no suffixes like 'ull', no random overflows and the values are also bounds checked for safety. If you have ever used Go, I have a good news for you, you are already familiar with this concept.
binary :: 0b10000110;
hex1 :: 0xabcd3245;
hex2 :: 0xABCD3245;
decimal :: 1235204950;
floating_point :: 2342.44230;
boolean :: false;
nil_val :: nil;
string_val :: "Something";
raw_string :: `
{
"cool_json": "yes",
"cool_array": ["a", "b", 32],
}
`;
Types
The type system is strict, most conversions require explicit casts.
a: int = 10;
b: long = :long(a); // all of these assignments require a cast
c: float = :float(a);
d: short = :short(a);
e: llong = 1234; // untyped constants automatically fit into the context
// no matter the type
There is one exception which is the *void type. It has very permissive semantics.
i1: int;
v : *void = &i1;
i2: *int = v;
When using regular pointers, again, casts are necessary.
i1: *int = nil;
i2: *float = :*float(i1);
Compound types are read from left to right. This creates a nice mirroring between spoken and the symbolic language:
a: *int; // a: pointer '*' to an int
b: [32]*int; // an: array of 32 pointers to int
c: *proc(): int; // a: pointer to a proc type that returns int
Structs and unions
Node :: struct {
v1: Value;
v2: Value;
}
Value :: union {
i64: llong;
u64: ullong;
}
Basic types
The language uses exactly the same types as C for greater compatibility. I don't want to go against the current. I want the language to be as simple as possible.
c: char; uc: uchar; s: short; us: ushort;
i: int; ui: uint; l: long; ul: ulong;
ll: llong; ull: ullong; f: float; d: double;
b: bool;
Typedefs
Typedef allows you to create an alias to an existsing type but with additional layer of type safety. my_int is not an int, it's a different type and so it requires a cast.
my_int :: typedef int;
i: int = 10;
mi: my_int = :my_int(i); // requires a cast
A simple alias can be defined by adding a '@weak' note:
my_int: typedef int; @weak
i: int = 10;
mi: my_int = i;
Compound expressions
These are expressions meant for initializing structs and unions.
- Particular members can be targeted by name.
- Members that were not mentioned are initialized to zero.
Vector2 :: struct { x: float; y: float; }
Camera2D :: struct {
offset: Vector2;
target: Vector2;
rotation: float;
zoom: float;
}
Camera: Camera2D = {
offset = {100, 100},
zoom = 1.0,
};
SecondCamera := :Camera2D{{1, 1}, zoom = 1.0};
Arrays and compound expressions
array: [100]int; // array of 100 integers
inited_array: [100]int = {
0,
[4] = 4,
[50] = 50,
[99] = 99,
// remaining values are '0'ed
};
Named and default procedure arguments
RED :: 0xff0000ff;
DrawRectangle :: proc(x: int, y: int, w: int, h: int, color: uint = RED);
main :: proc(): int {
DrawRectangle(x = 100, y = 100, w = 50, h = 50);
return 0;
}
Simulating enums
The language doesn't have enums but similar feature can be easily simulated. Operator '^' allows you to generate a list of enumerated constants: 1, 2, 3 ... using typedef you can create a simple integer type with additional type safety.
TextKind :: typedef int;
TEXT_INVALID :: 0;
TEXT_BOLD :: ^; // 1
TEXT_CURSIVE :: ^; // 2
TEXT_NORMAL :: ^; // 3
Enumerating bit flags
Operator '<<' allows you to generate a list of enumerated flags: 0b1, 0b10, 0b100 ...
EntityProps :: typedef uint;
ENTITY_CAN_BURN :: 0b1;
ENTITY_CAN_MOVE :: <<; // 0b10
ENTITY_CAN_BE_SHOCKED :: <<; // 0b100
ENTITY_WALKING :: <<; // 0b1000
*char and String types
The language by default is very *char friendly. Strings default to the cstring type which functions in exactly the same way as in C. It functions as an array of bytes which is null terminated by a null 0.
But the language also has it's own String type:
String :: struct {
str: *char;
len: int;
}
Untyped strings work well with both:
a: String = "sized string";
b: *char = "c string";
c: = "this defaults to *char";
String is a structure so you need to index it using .str or .len.
test :: proc() {
a: *char = "C string";
c := a[1];
b: String = "Len string";
c = b.str[1];
len := b.len; @unused
}
A lengthof operator can be used to learn how big a constant string is, this also works for arrays:
str :: "String";
len :: lengthof(str);
Pointer arithmetic
import "libc";
test :: proc() {
a: *int = malloc(sizeof(:int) * 10);
c: *int = addptr(a, 4);
d: *int = &a[4];
// there is no: (a + 4)
}
Casting between types
a: float = :float(10);
b := &a;
c := :*int(b);
If statement
if_statement_showcase :: proc(cond: int) {
if cond == 1 {
//
} else if cond == 2 {
//
} else {
//
}
}
Switch statement
import "libc";
switch_statement_showcase :: proc(value: int)
{
switch value
{
case 0,4,8:
{
printf("0, 4, 8\n");
}
case 1, 2, 3:
{
printf("1, 2, 3\n");
} @fallthrough
case 5:
{
printf("this activates when 1, 2, 3 and 5\n");
}
default: printf("default case");
}
}
For loop
for_loop_showcase :: proc() {
// infinite loop
for {
/* .. */
}
cond := true;
for cond {
break;
}
for i := 0; i < 4; i += 1 {
//
}
i := 0;
for i < 4 {
i += 1;
}
i = 0;
for i < 4; i += 1 {
//
}
}
Labeled loops, breaks and continues
You can label a loop and use that label to control the outer loop.
test :: proc() {
cond := true;
outer_loop: for cond {
for cond {
break outer_loop; // continue works too
}
}
}
Arithmetic operators
arithmetic_operators :: proc() {
add := 1 + 2;
sub := 1 - 2;
mul := 1 * 2;
div := 1 / 2;
mod := 1 % 2;
and := 1 && 2;
or := 1 || 2;
equals := 1 == 2;
not_equals := 1 != 2;
greater_then := 1 > 2;
greater_then_or_equal := 1 >= 2;
lesser_then := 1 < 2;
lesser_then_or_equal := 1 <= 2;
left_shift := 1 >> 2;
right_shift := 1 << 2;
bit_or := 1 | 2;
bit_and := 1 & 2;
bit_xor := 1 ^ 2;
not := !1;
negate := ~1;
}
Arithmetic assignment operators
arithmetic_assign_operators :: proc() {
a := 1;
a /= 1;
a *= 1;
a %= 1;
a -= 1;
a += 1;
a &= 1;
a |= 1;
a ^= 1;
a <<= 1;
a >>= 1;
}
Defer statement
A defer statement allows to postpone a certain action to the end of scope. For example - you might allocate memory and use defer just bellow your allocation to free the memory. This free will be deferred, it will happen a bit later.
import "libc";
test :: proc() {
// it's useful to keep certain pairs of actions together:
file := fopen("data.txt", "rb");
if (file) {
defer fclose(file);
// .. do things
}
}
Notes
You can annotate declarations, statements etc. This puts metadata on the AST which you can access from a metaprogram.
// a note that starts with '#', doesn't bind to anything, it basically acts as a directive.
#do_something(a, b, c);
Data :: struct {
a: int;
b: *int;
} @serialize // binds to AST of Data
The language uses notes to implement a bunch of features, for example:
#static_assert(sizeof(:int) == 4);
test :: proc() {
// ...
} @build_if(LC_OS == OS_WINDOWS)
Conditional compilation
It's important for a low level language to have some form of platform awareness. We don't have includes and preprocessor here so I decided to implement a construct that works during parsing stage. It discards declarations and doesn't let them into typechecking based on the evaluation of the inner expression.
'#build_if' can appear at the top of the file, it decides whether to compile that particular file or not:
#build_if(LC_OS == OS_WINDOWS);
'@build_if' can be bound to any top level declaration.
LONG_MAX :: 0xFFFFFFFF; @build_if(LC_OS == OS_WINDOWS)
LONG_MAX :: 0xFFFFFFFFFFFFFFFF; @build_if(LC_OS == OS_MAC || LC_OS == OS_LINUX)
Builtins (like sizeof)
a := sizeof(:int);
b := alignof(:int);
A :: struct { a: int; b: int; }
c := offsetof(:A, b);
d := lengthof("asd");
e := typeof(d);
#static_assert(offsetof(:A, b) == 4);
Raw C code when targeting the C generator
#`#include <stdlib.h>`;
global_variable := 0;
main :: proc(): int {
a: *int = #`malloc(32)`;
#`a[0] = lc_package_global_variable`;
return a[0];
}
Any and typeof
Any :: struct {
type: int;
data: *void;
}
test :: proc() {
a: Any = 32;
b: Any = "thing";
c: Any = b;
i: int = 10;
d: Any = &i;
}
print :: proc(v: Any) {
switch(v.type) {
case typeof(:int): {
val := :*int(v.data);
// ..
}
case typeof(:float): {
val := :*float(v.data);
// ..
}
}
}
main :: proc() {
print(32);
print("asd");
}
Variadic arguments with Any promotion
import "libc";
any_vargs :: proc(fmt: *char, ...Any) {
va: va_list;
va_start(va, fmt);
for i := 0; fmt[i]; i += 1 {
if fmt[i] == '%' {
arg := va_arg_any(va);
if arg.type == typeof(:int) {
val := :*int(arg.data);
// ..
} else {
// ..
}
}
}
va_end(va);
}
main :: proc() {
any_vargs("testing % %", 32, "thing");
}
Comments
// I'm a line comment
/* I'm a block comment
/*
I can even nest!
*/
*/
Doc comments
/** package
This is a package doc comment,
there can only be 1 per package,
it appears at top of the file.
It binds to the package AST.
*/
/** file
This is a file doc comment,
there can only be 1 per file,
it appears at top of the file under package doc comment.
It binds to the file AST.
*/
/**
This is a top level declaration doc comment,
1 per declaration,
it appears above the declaration.
It binds to the declaration AST.
*/
A :: proc(): int {
return 0;
}
Packed structs
A :: struct {
a: short;
b: char;
c: int;
d: char;
e: llong;
f: uchar;
} @packed
Useful resources for compiler development
- https://c3.handmade.network/blog/p/8632-handling_parsing_and_semantic_errors_in_a_compiler - great article by Christoffer Lerno (C3 author) about handling errors in compilers.
- https://www.youtube.com/watch?v=bNJhtKPugSQ - Walter Bright (D author) talks about AST poisoning here.
- https://bitwise.handmade.network/ - series by Per Vognsen where he actually creates a C like language, very helpful, very hands on!
- https://hero.handmade.network/episode/code/day206/ - this episode of handmade hero started me on the entire compiler journey, a long, long time ago.
- https://www.youtube.com/watch?v=TH9VCN6UkyQ&list=PLmV5I2fxaiCKfxMBrNsU1kgKJXD3PkyxO - I have re-watched this playlist many this, searching for keywords and ideas. Jonathan Blow's compiler was a big inspiration of mine when learning programming languages.
- A Retargetable C Compiler: Design and Implementation by Christopher W. Fraser and David R. Hanson - sometimes looked at this as a reference to figure stuff out. Very helpful resource on compiler construction, the way it's written reads more like documentation but you use what you have.
- Compiler Construction by Niklaus Wirth - https://people.inf.ethz.ch/wirth/CompilerConstruction/index.html - have to learn to read Pascal / Oberon but he goes over implementing all the stages of the compiler. Wirth's project of implementing an entire hardware / software stack is really inspiring.
- https://github.com/rui314/chibicc - great resource for learning how to write a very dumb x64 backend. I like the format, you go over git commits, doesn't work well on github though, with Sublime Merge it's a pleasure to follow.
- https://go.dev/blog/constants - article on golang type system, untyped types, constants that kind of stuff.
- https://github.com/JoshuaManton/sif - looked at this as a reference from time to time, author seems like a Jonathan Blow fan so it was a good resource informed by similar resources as I used.
- https://github.com/gingerbill/odin - I sometimes peeked at the compiler to figure stuff out when I was confused. https://c3.handmade.network/blog - Christoffer Lerno (C3 author) blog.
- https://github.com/c3lang/c3c - I sometimes looked at C3 compiler as a reference, the author also let me use his bigint library, thanks a lot! :)