Files
lib_compiler/README.md
Krzosa Karol 62b15a1077 Update README
2024-06-14 09:07:55 +02:00

18 KiB

Compiler front-end in a single-header-file C library

For a very simple statically typed language. I mostly designed it for personal use, it's small but complete - you can actually program in it and so on. It compiles to C with all the debug info so debuggers work. Currently I find that it's pretty good for metaprogramming and code generation, particularly when you have a workflow "metaprogram -> program". The library provides functions for traversing, printing the AST and other niceties:

  • User has full control over compilation!
  • No dependencies, permissive license, single file that compile both in C and C++!
  • Simpler then C: core of the language is 4000 loc and entire package 11k loc.
  • Statically typed, procedural and modern: it's a mix of Go/Odin/Jai with "return to C" as ideal.
  • Complete: it supports conditional compilation, modularity via packages etc.
  • State of art error handling techniques like AST poisoning, proper parsing recovery, catches tons of errors without misreporting!
  • Great C integration: using C libraries feels native, the language compiles easily to C with great debug info.

Example or you can try the language online

import "raylib";

main :: proc(): int {
    InitWindow(800, 600, "Thing");
    SetTargetFPS(60);

    for !WindowShouldClose() {
        BeginDrawing();
        ClearBackground(RAYWHITE);
        DrawText("Congrats! You created your first window!", 190, 200, 20, LIGHTGRAY);
        EndDrawing();
    }
    CloseWindow();
    return 0;
}

Program examples:

Examples of hooking into the compilation:

Other:

How to integrate

To create the implementation do this in one of your C or C++ files:

#define LIB_COMPILER_IMPLEMENTATION
#include "lib_compiler.h"

How to build the compiler executable

Simply compile one file with whatever compiler you want:

clang tools/lcompile.c -o lc.exe

Once you have the executable you can compile examples and stuff:

lc.exe examples/create_raylib_window

When in doubt: ./lc.exe --help.

How to run the test suite

clang++ src/build_tool/main.cpp -o bld.exe
./bld.exe

You only need to compile the build tool once. Afterwards just call ./bld.exe. There are multiple testing options so checkout: ./bld.exe --help.

Language overview

The language is strictly procedural, I have taken most of the inspiration from C, Golang, Ion, Jai and Odin. There are no classes, methods etc. only procedures and data types.

Packages

Package is a real directory on your disk that contains source files. It's also an aggregation of declarations - functions, constants, variables, types which you can import into your program.

import "libc";
import RL "raylib";

main :: proc(): int {
    RL.SetClipboardText("**magic**");
    printf("**magic**");
    return 0;
}

Constants

By which I mean the literal values but also the named declarations. There are 3 constant types in the language: untyped ints, untyped floats and untyped strings. The "untyped" here means that the type is inferred from context. There are no suffixes like 'ull', no random overflows and the values are also bounds checked for safety. If you have ever used Go, I have a good news for you, you are already familiar with this concept.

binary         :: 0b10000110;
hex1           :: 0xabcd3245;
hex2           :: 0xABCD3245;
decimal        :: 1235204950;
floating_point :: 2342.44230;

boolean        :: false;
nil_val        :: nil;

string_val     :: "Something";
raw_string     :: `
{
    "cool_json": "yes",
    "cool_array": ["a", "b", 32],
}
`;

Types

The type system is strict, most conversions require explicit casts.

a: int   = 10;
b: long  = :long(a);  // all of these assignments require a cast
c: float = :float(a);
d: short = :short(a);
e: llong = 1234;      // untyped constants automatically fit into the context
                      // no matter the type

There is one exception which is the *void type. It has very permissive semantics.

i1: int;
v : *void = &i1;
i2: *int  = v;

When using regular pointers, again, casts are necessary.

i1: *int   = nil;
i2: *float = :*float(i1);

Compound types are read from left to right. This creates a nice mirroring between spoken and the symbolic language:

a:  *int;         // a:  pointer '*' to an int
b:  [32]*int;     // an: array of 32 pointers to int
c:  *proc(): int; // a:  pointer to a proc type that returns int

Structs and unions

Node :: struct {
    v1: Value;
    v2: Value;
}

Value :: union {
    i64: llong;
    u64: ullong;
}

Basic types

The language uses exactly the same types as C for greater compatibility. I don't want to go against the current. I want the language to be as simple as possible.

c: char;   uc: uchar;   s: short; us: ushort;
i: int;    ui: uint;    l: long;  ul: ulong;
ll: llong; ull: ullong; f: float; d: double;
b: bool;

Typedefs

Typedef allows you to create an alias to an existsing type but with additional layer of type safety. my_int is not an int, it's a different type and so it requires a cast.

my_int :: typedef int;

i: int = 10;
mi: my_int = :my_int(i); // requires a cast

A simple alias can be defined by adding a '@weak' note:

my_int: typedef int; @weak

i: int = 10;
mi: my_int = i;

Compound expressions

These are expressions meant for initializing structs and unions.

  • Particular members can be targeted by name.
  • Members that were not mentioned are initialized to zero.
Vector2  :: struct { x: float; y: float; }
Camera2D :: struct {
    offset:   Vector2;
    target:   Vector2;
    rotation: float;
    zoom:     float;
}

Camera: Camera2D = {
    offset = {100, 100},
    zoom   = 1.0,
};

SecondCamera := :Camera2D{{1, 1}, zoom = 1.0};

Arrays and compound expressions

array: [100]int; // array of 100 integers

inited_array: [100]int = {
    0,
    [4]  = 4,
    [50] = 50,
    [99] = 99,
    // remaining values are '0'ed
};

Named and default procedure arguments

RED           :: 0xff0000ff;
DrawRectangle :: proc(x: int, y: int, w: int, h: int, color: uint = RED);

main :: proc(): int {
    DrawRectangle(x = 100, y = 100, w = 50, h = 50);
    return 0;
}

Simulating enums

The language doesn't have enums but similar feature can be easily simulated. Operator '^' allows you to generate a list of enumerated constants: 1, 2, 3 ... using typedef you can create a simple integer type with additional type safety.

TextKind :: typedef int;
TEXT_INVALID :: 0;
TEXT_BOLD    :: ^; // 1
TEXT_CURSIVE :: ^; // 2
TEXT_NORMAL  :: ^; // 3

Enumerating bit flags

Operator '<<' allows you to generate a list of enumerated flags: 0b1, 0b10, 0b100 ...

EntityProps :: typedef uint;
ENTITY_CAN_BURN       :: 0b1;
ENTITY_CAN_MOVE       :: <<; // 0b10
ENTITY_CAN_BE_SHOCKED :: <<; // 0b100
ENTITY_WALKING        :: <<; // 0b1000

*char and String types

The language by default is very *char friendly. Strings default to the cstring type which functions in exactly the same way as in C. It functions as an array of bytes which is null terminated by a null 0.

But the language also has it's own String type:

String :: struct {
    str: *char;
    len:  int;
}

Untyped strings work well with both:

a: String = "sized string";
b: *char  = "c string";
c:        = "this defaults to *char";

String is a structure so you need to index it using .str or .len.

test :: proc() {
    a: *char = "C string";
    c := a[1];

    b: String = "Len string";
    c = b.str[1];
    len := b.len; @unused
}

A lengthof operator can be used to learn how big a constant string is, this also works for arrays:

str :: "String";
len :: lengthof(str);

Pointer arithmetic

import "libc";

test :: proc() {
    a: *int = malloc(sizeof(:int) * 10);
    c: *int = addptr(a, 4);
    d: *int = &a[4];
    // there is no: (a + 4)
}

Casting between types

a: float = :float(10);

b := &a;
c := :*int(b);

If statement

if_statement_showcase :: proc(cond: int) {
    if cond == 1 {
        //
    } else if cond == 2 {
        //
    } else {
        //
    }
}

Switch statement

import "libc";
switch_statement_showcase :: proc(value: int)
{
    switch value
    {
        case 0,4,8:
        {
            printf("0, 4, 8\n");
        }
        case 1, 2, 3:
        {
            printf("1, 2, 3\n");
        } @fallthrough
        case 5:
        {
            printf("this activates when 1, 2, 3 and 5\n");
        }
        default: printf("default case");
    }
}

For loop

for_loop_showcase :: proc() {

    // infinite loop
    for {
        /* .. */
    }

    cond := true;
    for cond {
        break;
    }

    for i := 0; i < 4; i += 1 {
        //
    }

    i := 0;
    for i < 4 {
        i += 1;
    }

    i = 0;
    for i < 4; i += 1 {
        //
    }
}

Labeled loops, breaks and continues

You can label a loop and use that label to control the outer loop.

test :: proc() {
    cond := true;
    outer_loop: for cond {
        for cond {
            break outer_loop; // continue works too
        }
    }
}

Arithmetic operators

arithmetic_operators :: proc() {
    add                   := 1 +  2;
    sub                   := 1 -  2;
    mul                   := 1 *  2;
    div                   := 1 /  2;
    mod                   := 1 %  2;

    and                   := 1 && 2;
    or                    := 1 || 2;
    equals                := 1 == 2;
    not_equals            := 1 != 2;
    greater_then          := 1 >  2;
    greater_then_or_equal := 1 >= 2;
    lesser_then           := 1 <  2;
    lesser_then_or_equal  := 1 <= 2;

    left_shift            := 1 >> 2;
    right_shift           := 1 << 2;
    bit_or                := 1 |  2;
    bit_and               := 1 &  2;
    bit_xor               := 1 ^  2;

    not                   := !1;
    negate                := ~1;
}

Arithmetic assignment operators

arithmetic_assign_operators :: proc() {
    a  := 1;
    a  /= 1;
    a  *= 1;
    a  %= 1;
    a  -= 1;
    a  += 1;
    a  &= 1;
    a  |= 1;
    a  ^= 1;
    a <<= 1;
    a >>= 1;
}

Defer statement

A defer statement allows to postpone a certain action to the end of scope. For example - you might allocate memory and use defer just bellow your allocation to free the memory. This free will be deferred, it will happen a bit later.

import "libc";
test :: proc() {
    // it's useful to keep certain pairs of actions together:
    file := fopen("data.txt", "rb");
    if (file) {
        defer fclose(file);
        // .. do things
    }
}

Notes

You can annotate declarations, statements etc. This puts metadata on the AST which you can access from a metaprogram.

// a note that starts with '#', doesn't bind to anything, it basically acts as a directive.
#do_something(a, b, c);


Data :: struct {
    a:  int;
    b: *int;
} @serialize // binds to AST of Data

The language uses notes to implement a bunch of features, for example:

#static_assert(sizeof(:int) == 4);


test :: proc() {
    // ...
} @build_if(LC_OS == OS_WINDOWS)

Conditional compilation

It's important for a low level language to have some form of platform awareness. We don't have includes and preprocessor here so I decided to implement a construct that works during parsing stage. It discards declarations and doesn't let them into typechecking based on the evaluation of the inner expression.

'#build_if' can appear at the top of the file, it decides whether to compile that particular file or not:

#build_if(LC_OS == OS_WINDOWS);

'@build_if' can be bound to any top level declaration.

LONG_MAX :: 0xFFFFFFFF;         @build_if(LC_OS == OS_WINDOWS)
LONG_MAX :: 0xFFFFFFFFFFFFFFFF; @build_if(LC_OS == OS_MAC || LC_OS == OS_LINUX)

Builtins (like sizeof)

a := sizeof(:int);
b := alignof(:int);

A :: struct { a: int; b: int; }
c := offsetof(:A, b);

d := lengthof("asd");
e := typeof(d);

#static_assert(offsetof(:A, b) == 4);

Raw C code when targeting the C generator

#`#include <stdlib.h>`;

global_variable := 0;

main :: proc(): int {
    a: *int = #`malloc(32)`;
    #`a[0] = lc_package_global_variable`;
    return a[0];
}

Any and typeof

Any :: struct {
    type: int;
    data: *void;
}
test :: proc() {
    a: Any = 32;
    b: Any = "thing";
    c: Any = b;

    i: int = 10;
    d: Any = &i;
}
print :: proc(v: Any) {
    switch(v.type) {
        case typeof(:int): {
            val := :*int(v.data);
            // ..
        }
        case typeof(:float): {
            val := :*float(v.data);
            // ..
        }
    }
}

main :: proc() {
    print(32);
    print("asd");
}

Variadic arguments with Any promotion

import "libc";

any_vargs :: proc(fmt: *char, ...Any) {
    va: va_list;
    va_start(va, fmt);

    for i := 0; fmt[i]; i += 1 {
        if fmt[i] == '%' {
            arg := va_arg_any(va);

            if arg.type == typeof(:int) {
                val := :*int(arg.data);
                // ..
            } else {
                // ..
            }
        }
    }

    va_end(va);
}

main :: proc() {
    any_vargs("testing % %", 32, "thing");
}

Comments

// I'm a line comment

/* I'm a block comment
    /*
        I can even nest!
    */
*/

Doc comments

/** package

This is a package doc comment,
    there can only be 1 per package,
    it appears at top of the file.
It binds to the package AST.

*/

/** file

This is a file doc comment,
    there can only be 1 per file,
    it appears at top of the file under package doc comment.
It binds to the file AST.

*/

/**

This is a top level declaration doc comment,
    1 per declaration,
    it appears above the declaration.
It binds to the declaration AST.

*/
A :: proc(): int {
    return 0;
}

Packed structs

A :: struct {
    a: short;
    b: char;
    c: int;
    d: char;
    e: llong;
    f: uchar;
} @packed

Useful resources for compiler development