U Internals

Compilation Process

Compilation is performed ahead of time on a host machine for any combination of targets. To understand U's unique design, we need to look at the compilation process.

Until now, most language compilation processes have been based on the same steps:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#ffffff', 'mainBkg': '#cde9f4'}}}%%
flowchart LR
  S((Source)) --> A(Lexer) --> B(Syntax Parser) -->C(Semantic Analyzer) --> D(Binary Generator)
  subgraph Dynamic Language: Python/Ruby/Javascript
    Z(Interpreter)
    style S fill:#FFFFFF,stroke:#333
  end
  subgraph Static Language: C/C++/Rust
    D --> E(Linker) --> F{{Executable}}
    style F fill:#FFFFFF,stroke:#333
    style S fill:#FFFFFF,stroke:#333
  end
  C --> Z(Interpreter)

In U, you drive each phase of the compilation process through Apertures:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#ffffff', 'mainBkg': '#cde9f4'}}}%%
flowchart LR
  S((Source)) --> A(Lexer) --> B(Syntax Parser) -->C(Semantic Analyzer) --> D(Binary Generator) --> E(Linker) --> F{{Executable}}
  %% Ulet --> A
  A -.-> aps(Apertures<br/>':', '%', '\') --> Ulet(Ulet)
  Ulet --> B
  Ulet --> C
  Ulet --> D
  Ulet --> E
  style F fill:#FFFFFF,stroke:#333
  style S fill:#FFFFFF,stroke:#333
  style Ulet fill:#00e676, stroke:#333,stroke-width:1px
  style aps fill:#E0DF1B,stroke:#333,stroke-width:1px, fontSize: 18

Compilation Element Naming

Each of the compilation steps above produces elements. Because the literature on programming languages keeps growing, the names of these elements are used inconsistently. For example, Token is often used to describe the elements a lexer outputs, yet a search engine returns so many different meanings for the word that newcomers struggle to get a clear view of how lexing works. The same goes for Node and other terms.

For clarity, we define new terms that are easier to remember and that simplify the documentation. To name these elements, we append the suffix -el to a shortened form of the name of the step that produces them, just as pixel comes from picture + element.

A Compel is any compilation element processed by the U compiler.

Adding this suffix gives us the following:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#ffffff', 'mainBkg': '#cde9f4'}}}%%
flowchart LR
  S((Source)) -->|inpels| A(Lexer) -->|Lexels| B(Syntax<br/>Parser) -->|Syntels| C(Semantic<br/>Analyzer) -->|Semels| D(Binary<br/>Generator) -->|Binels| E(Linker) --> F{{Executable}}
  style F fill:#FFFFFF,stroke:#333
  style S fill:#FFFFFF,stroke:#333

For developers with previous experience, here are the definitions, along with some equivalents found in other language implementations:

  • Inpel: input element, a Unicode source code character.

  • Lexel: lexical element, a group of inpels with a category: alphanumeric, text, numbers. For example, open as Identifier, "a string" as String, and 123 as Number are lexels. They are usually named 'token', 'lexeme', or 'lexical unit' in other languages.

  • Syntel: syntax element, a lexel's internal storage. For example, the U compiler stores "a string" in memory as a String object, and + as a link to the force add.

  • Semel: semantic element, a group of syntels that represents an action or plane in U semantics. For example, the semel "A" + 1 is not valid: U cannot convert a string to an int.

  • Binel: binary element, an output file stored following the U Binary Specification. In other compiled languages, the closest equivalent is an object file.
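
The first four definitions can be illustrated with a small Python sketch. It is not the U compiler's code: the class names `Lexel` and `Syntel`, the functions `lex`, `to_syntel` and `check_addition`, the `Operator` category and the regular expression are all hypothetical names chosen for this illustration. The semantic rule is also an assumption: the text only states that a String plus a Number is rejected, so the sketch accepts Number + Number and nothing else.

```python
import re
from dataclasses import dataclass

# Hypothetical lexel categories, named after the examples in the text.
# "Operator" is an extra category assumed here so that `+` can be lexed.
LEXEL_PATTERN = re.compile(
    r'\s*(?:(?P<String>"[^"]*")'
    r'|(?P<Number>\d+)'
    r'|(?P<Identifier>[A-Za-z_]\w*)'
    r'|(?P<Operator>\+))'
)


@dataclass(frozen=True)
class Lexel:
    category: str  # e.g. "Identifier", "String", "Number"
    text: str      # the inpels (characters) grouped into this lexel


@dataclass(frozen=True)
class Syntel:
    kind: str
    value: object  # in-memory storage of the lexel


def lex(source: str) -> list[Lexel]:
    """Group inpels into categorized lexels."""
    source = source.rstrip()
    lexels, pos = [], 0
    while pos < len(source):
        match = LEXEL_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected inpel {source[pos]!r} at offset {pos}")
        category = match.lastgroup
        lexels.append(Lexel(category, match.group(category)))
        pos = match.end()
    return lexels


def to_syntel(lexel: Lexel) -> Syntel:
    """Store a lexel in memory: strings lose their quotes, numbers become ints."""
    if lexel.category == "String":
        return Syntel("String", lexel.text[1:-1])
    if lexel.category == "Number":
        return Syntel("Number", int(lexel.text))
    return Syntel(lexel.category, lexel.text)


def check_addition(syntels: list[Syntel]) -> bool:
    """Validate a `left + right` semel (assumed rule: Number + Number only)."""
    left, op, right = syntels
    if op != Syntel("Operator", "+"):
        raise ValueError("this sketch only understands `left + right`")
    return left.kind == "Number" and right.kind == "Number"


lexels = lex('"A" + 1')
print([lexel.category for lexel in lexels])             # ['String', 'Operator', 'Number']
print(check_addition([to_syntel(l) for l in lexels]))   # False
```

The last two lines replay the example from the Semel definition: the three lexels are stored as syntels, and the resulting group is rejected because its operands are a String and a Number.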

To sum up, the U compiler:

  • reads inpels from an input file,
  • groups them into lexels,
  • stores them in memory as syntels,
  • groups syntels together following U's grammar to validate your source code,
  • compiles every file to a binel,
  • links all binels to produce a standalone executable file.
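
The shape of this flow, with five per-file steps followed by a single link step over all binels, can be sketched in Python. Every function name below is hypothetical and every stage body is a deliberately trivial placeholder (whitespace splitting, `repr` as a stand-in binary format). The sketch shows only how data moves between stages, not how U implements any of them.

```python
def read_inpels(text: str) -> list[str]:
    """Step 1: one inpel per Unicode character."""
    return list(text)


def group_lexels(inpels: list[str]) -> list[str]:
    """Step 2 (placeholder): split on whitespace instead of real lexing."""
    return "".join(inpels).split()


def store_syntels(lexels: list[str]) -> list[object]:
    """Step 3 (placeholder): decimal lexels become ints, the rest stay strings."""
    return [int(x) if x.isdecimal() else x for x in lexels]


def validate(syntels: list[object]) -> list[object]:
    """Step 4 (placeholder): reject empty input instead of applying a grammar."""
    if not syntels:
        raise ValueError("nothing to compile")
    return syntels


def emit_binel(semels: list[object]) -> bytes:
    """Step 5 (placeholder): `repr` stands in for a real binary format."""
    return repr(semels).encode("utf-8")


def link(binels: list[bytes]) -> bytes:
    """Step 6 (placeholder): join all binels into one output."""
    return b"\n".join(binels)


def build(sources: dict[str, str]) -> bytes:
    """Run steps 1 to 5 once per file, then link every binel in one pass."""
    binels = [
        emit_binel(validate(store_syntels(group_lexels(read_inpels(text)))))
        for text in sources.values()
    ]
    return link(binels)


# "first" and "second" are made-up file names for the example.
executable = build({"first": "open 123", "second": "x"})
print(executable)
```

Only `link` sees more than one file at a time; every earlier stage works on a single source in isolation, which is what lets each file be compiled to its own binel.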