This document contains preliminary information about U development. It's a live document subject to change.

U syntax is concise: it is defined by a small set of rules, each one necessary for a coherent grammar. Built-in literals can be tailored with ugo.

Source File¶

A U project consists of files in a directory structure managed by ugo, the U build system.

U source files have the following properties:

Source code is UTF-8 encoded
Comments, characters and string literals can contain Unicode characters
The default character value is as wide as a Unicode character: up to 4 bytes
Default String values are stored in UTF-8 but may be iterated by char, byte, or Unicode element.
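The difference between the byte view and the Unicode view of a string can be illustrated outside U. The following Python sketch is only an analogy, not U code: it shows how one UTF-8 string yields different lengths depending on whether it is counted in bytes or in Unicode code points.

```python
# Analogy in Python (not U code): one string, two views.
text = "丁满🐨"

code_points = list(text)            # one entry per Unicode code point
utf8_bytes = text.encode("utf-8")   # the UTF-8 storage form

print(len(code_points))  # 3
print(len(utf8_bytes))   # 10

# A single code point never needs more than 4 bytes in UTF-8.
print(len("🐨".encode("utf-8")))  # 4
```

The two CJK characters take 3 bytes each and the emoji takes 4, which is why a character type that covers any Unicode character must be up to 4 bytes wide.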

Main¶

U programs do not need a special function named main that is invoked to start the program. Instead, U uses the very last statement of the root file. The root file is either the one passed on the command line or the one specified in ugo's build file.

\< "Welcome" 
# This statement is executed without <main>.

But if you are building a large project, you may need to define a main function to better control your program's behavior. In U, main is denoted '\/', the V of U. The symbol consists of the escape aperture '\' followed by a slash '/'.

No keywords¶

U syntax is easy to learn and consistent. Unlike other languages, U has no bare keywords, and never will. All U keywords are guarded with the aperture ::, as in ::if. Therefore if and while are valid identifiers in U. Readability is key when choosing your identifiers.
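To make the distinction concrete, here is a minimal Python sketch of how a tokenizer could separate guarded keywords from ordinary identifiers. It is an illustration only, not U's actual lexer: the regular expression and the `classify` helper are hypothetical, and the name pattern is deliberately simplified to ASCII letters, digits, underscores, and hyphens.

```python
import re

# Hypothetical token pattern: '::name' is a keyword, a bare name is an identifier.
TOKEN = re.compile(
    r"(?P<keyword>::[A-Za-z][\w-]*)|(?P<identifier>[A-Za-z][\w-]*)"
)

def classify(source):
    """Return (kind, text) pairs for each keyword or identifier found."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(source)]

print(classify("::if if while"))
# [('keyword', '::if'), ('identifier', 'if'), ('identifier', 'while')]
```

Because the guard is part of the token, adding a new keyword later cannot collide with an identifier that already exists in legacy code.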

As U adoption increases, new needs will arise, either for new business models or for new hardware platforms: new iPhones, AI models, better web architecture, autonomous cars, ...

For example, suppose a new paradigm arises in five years. In that case, it will be easy to add %new-feature to your code with a simple user-defined syntax that implements your logic without breaking legacy code and, most critically, without changing U syntax.

# This will use a Valcon named new-feature
%new-feature ....

Invalid characters¶

U grammar is very permissive. Few character combinations are rejected, mainly when the code is ambiguous. Here are simple examples of invalid source code:

:a : (b,,c) # ',,' missing an item
:a : (b,c]  # Delimiter mismatch: Opening '(' and closing ']'
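The two rejected lines above could be caught by a check along the following lines. This Python sketch is an assumption about how such validation might work, not a description of the U compiler; the `check` function is hypothetical, and it ignores string literals and comments for brevity.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def check(source):
    """Return None if the line looks fine, else a short error message."""
    stack = []    # (opener, expected closer) for every delimiter still open
    prev = None   # previous non-space character
    for ch in source:
        if ch.isspace():
            continue
        if ch == "," and prev == ",":
            return "',,' missing an item"
        if ch in PAIRS:
            stack.append((ch, PAIRS[ch]))
        elif ch in CLOSERS:
            if not stack:
                return f"unexpected '{ch}'"
            opener, expected = stack.pop()
            if ch != expected:
                return f"delimiter mismatch: opening '{opener}' and closing '{ch}'"
        prev = ch
    if stack:
        return f"unclosed '{stack[-1][0]}'"
    return None

print(check(":a : (b,,c)"))  # ',,' missing an item
print(check(":a : (b,c]"))   # delimiter mismatch: opening '(' and closing ']'
print(check(":a : (b, c)"))  # None
```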

Statements¶

U is line-based with optional semicolons, like many modern programming languages (Python, Ruby, Swift), because it is visually pleasing. U doesn't require you to write semicolons; they are only necessary if you want to write multiple statements on a single line.

:a : 1
:b : 2
# In one line
:a : 1; :b : 2

A semicolon is also required when using delimiters with parameters in a single line:

[: meter-per-second!; ...]

See Delimiters.

Comments¶

Comments are ignored by the U compiler. They begin with a hash character #. The back-slash character is reserved instead for the escape aperture (as in \< and \/): it was chosen because it is not widely used in code bases outside strings, which makes it a perfect character to improve expressiveness by allowing you to use any character in your source code.

Single line comment¶

Single line comments start with a hash character # and extend to the end of the line.

# This is a comment
:a : 1; :b : 2

Multi-line comment¶

Multi-line comments start with a hash character followed by an open parenthesis #( and end with a hash character followed by a close parenthesis #):

#( 
  This is a comment
  written over 
  multiple lines. 
#)

Nested comment¶

Comments can be nested.

#(
  #( 
    # Inner line
  #)
#)
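Nesting means the comment scanner has to count depth rather than stop at the first #). The Python sketch below shows one way to do that. It is an illustration under simplifying assumptions, not U's implementation: the `strip_block_comments` helper is hypothetical, it leaves single line comments in place, and it does not account for #( or #) appearing inside string literals.

```python
def strip_block_comments(source):
    """Remove '#( ... #)' comments, honoring nesting."""
    out = []
    depth = 0   # how many '#(' are currently open
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "#(":
            depth += 1
            i += 2
        elif pair == "#)" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:      # keep text only outside comments
                out.append(source[i])
            i += 1
    return "".join(out)

code = ":a : 1\n#(\n  #(\n    # Inner line\n  #)\n#)\n:b : 2\n"
print(repr(strip_block_comments(code)))  # ':a : 1\n\n:b : 2\n'
```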

Whitespace¶

Usually, whitespace characters (spaces, tabs, unusual Unicode characters, carriage returns, line breaks, ...) are ignored by parsers as they have no syntactic value. But U is whitespace-sensitive for three main reasons:

Visually coherent: if you have never been taught that whitespace has no syntactic meaning in a programming language, U's behavior follows the Principle of Least Astonishment.
Expressiveness: if car:run is different from car : run, then more expressions are possible.
Productivity: a linter is not yet necessary in U. You don't need to format your code before sharing it.

If whitespace has meaning to humans, then it should have meaning to machines as well.

— The U team

In practice, any combination of space characters ' ', tabs '\t', or comments is treated as a separator. However, the number of spaces is not significant.

Whitespace sensitivity gives you better expressiveness and more operators (an operator can be prefix, infix, or postfix).

:a : [1, 2, 3]   
\< a.*  2  # Prints [2, 4, 6] ; multiply EACH element of array 'a' by 2
\< a .* 2  # Prints [[1, 2], [2, 4], [3, 6]] ; create pairs of 'a' values associated with their double

In the above example, whitespace sensitivity allows two distinct operators: one written without a leading space, .*, and another written with spaces around it, .*.
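A lexer can tell such operators apart by looking at the characters on either side. The Python sketch below classifies an operator occurrence by its surrounding whitespace. The mapping from spacing to the labels prefix, infix, and postfix is an assumption made for illustration, and the `operator_form` helper is hypothetical; the actual U rules may differ.

```python
def operator_form(source, op):
    """Classify the first occurrence of `op` by the whitespace around it."""
    i = source.index(op)
    end = i + len(op)
    # Treat the start and end of the line as whitespace.
    space_before = i == 0 or source[i - 1].isspace()
    space_after = end == len(source) or source[end].isspace()
    if space_before and space_after:
        return "infix"
    if space_before:
        return "prefix"    # attached to the operand on its right
    if space_after:
        return "postfix"   # attached to the operand on its left
    return "tight"         # no whitespace on either side

print(operator_form("a.*  2", ".*"))  # postfix
print(operator_form("a .* 2", ".*"))  # infix
```

Note that the two spaces after .* in the first call do not change the result, which matches the rule that the number of spaces is not significant.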

Names, Symbols, and Identifiers¶

Many languages are annoyingly strict about what characters are valid in names. U has very few limits on names.

A name is an interned string, i.e. a string associated with a number. The compiler uses names to store and retrieve Values. A name is not accessible from the source code (User dim).

The rules for names are:
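String interning can be sketched in a few lines of Python. This is a generic illustration of the technique, not the U compiler's data structure; the `NameTable` class and its methods are hypothetical.

```python
class NameTable:
    """Maps each distinct name to a small integer, and back."""

    def __init__(self):
        self._ids = {}     # name -> id
        self._names = []   # id -> name

    def intern(self, name):
        """Return the id for `name`, allocating one on first sight."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def lookup(self, name_id):
        """Return the name stored under `name_id`."""
        return self._names[name_id]

table = NameTable()
first = table.intern("server-index?")
second = table.intern("π")
again = table.intern("server-index?")

print(first, second, again)  # 0 1 0
print(table.lookup(1))       # π
```

Once names are interned, comparing two names is an integer comparison instead of a string comparison.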

Names can contain almost all non-space Unicode characters.
Names must start with a letter, followed by letters or numbers.
Unlike in other languages, hyphens - and operators may be part of a name when enabled in ugo: for example server-index?.

Using hyphens as separators in names increases readability and productivity:

You don't have to choose between camelCase and snake_case code styles.
Names with hyphens are easier to convert to values: login-screen can easily be converted to a class LoginScreen, an id login_screen, or a file name login/screen.png.
Hyphens are already allowed in some languages, such as CSS.
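The conversions mentioned for login-screen are simple string operations. The Python sketch below shows them; the helper names and the default png extension are hypothetical, chosen only to reproduce the examples in the text.

```python
def to_class_name(name):
    """login-screen -> LoginScreen"""
    return "".join(part.capitalize() for part in name.split("-"))

def to_id(name):
    """login-screen -> login_screen"""
    return name.replace("-", "_")

def to_file_name(name, extension="png"):
    """login-screen -> login/screen.png"""
    return name.replace("-", "/") + "." + extension

name = "login-screen"
print(to_class_name(name))  # LoginScreen
print(to_id(name))          # login_screen
print(to_file_name(name))   # login/screen.png
```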

In U, there are only two named Values:

Symbol: Symbols start with aperture ':' followed by a name.
Identifier: Identifiers are simply names in the source code. An identifier is a reference to a variable.

Here are some examples of names:

:var : 2   # ':var' is a Symbol named 'var'
f var - 3  # 'f', 'var' are Identifiers

# Unicode
:timon : "Timon"
:丁满 : "丁满 in Chinese"
:Тимон : "Тимон in Russian"
:🐨 : "koala that looks like Timon"
:π : 3.14159

# Operator in names
:is_enabled?
:function->>: 1
function->> a, b

:'+++' :> ...   # Define an operator '+++'
:result : 1 +++ 4

Delimiters¶

In U, delimiters are Values:

(...): defines a lexical-level grouping such as a sequence (a, b). Sequence elements are separated by a comma ','.
{...}: defines a function, closure, or any construct that needs to execute statements. See Functions.
[...]: defines a data model that needs storage at compile time or at run time, such as a set, an array, or a hash map. See Collections.

{} and [] denote function definitions in U. This means that they accept parameters and specialization with operators:

# A function with 2 parameters
{: a, b;  a + b }

# An Array that ensures every item pushed is in meter/second
[<< meter-per-second-!; ... ]

See Functions.

See Collections.

Values¶

See Values.

Documentation¶

U supports documentation written in a predefined format. For example, in Markdown:

#[md
    #Function
    function **move**
#]

See Documentation.

Tests¶

U has built-in test and validation constructs:

#?{
   # Expecting result == 4
   add 3, :>: input, result ; result == 4
#} 
:add :>: v ; v + 1

See Tests.