Unlike many other languages, U uses brackets [] for all its collections except sequences. This makes it easy to see that you are managing a group of values.
U has many built-in data types to hold a collection of items:
- sequences: (a, b)
- arrays: [a, b]
- lists: [- a, b -]
- hashes: [< :a: b >]
- sets: [$ a, b $]
- matrices: [| a, b |]
- tables: [+ a, b +]
Collections have different properties:
- element type: same or different types,
- element storage: random (scattered in memory) or continuous (contiguous),
- capacity: dynamic or static.
For example:
- Lists are dynamic and random: an element can be freely inserted or removed.
- Arrays are static and continuous: inserting an element means copying the array to another location.
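To make this trade-off concrete, here is a minimal Python sketch (not U code). A hand-written singly linked list stands in for U lists, and an immutable tuple stands in for a static array; Node, insert_after, insert_into_static and to_list are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def insert_after(node: Node, value: int) -> Node:
    """Link a new node after `node`: O(1), no existing element moves."""
    node.next = Node(value, node.next)
    return node.next


def insert_into_static(arr: tuple, index: int, value) -> tuple:
    """A static array cannot grow in place, so build a new copy with the element."""
    return arr[:index] + (value,) + arr[index:]


def to_list(head: Optional[Node]) -> list:
    """Walk the links and collect the values."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


head = Node(1, Node(3))
insert_after(head, 2)          # relinks nodes, nothing is copied
static = (1, 3)
grown = insert_into_static(static, 1, 2)  # allocates a new tuple
```

The linked version only rewrites one pointer, while the static version has to copy every element into new storage.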
Collections come in many flavors: String, Bool-Vector (bitset), ...
U converts between collections when needed. For example, if all items are of type int, a list is converted to an array when possible, and U warns you about the change.
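The conversion rule above can be sketched in Python, assuming "array" means a typed, contiguous buffer (here Python's array module with the signed 64-bit "q" type code). The helper maybe_to_array is hypothetical and only mirrors the described behavior: convert when every item is an int, and warn about it.

```python
import warnings
from array import array


def maybe_to_array(items: list):
    """Return a typed array if every item is an int, else the list unchanged."""
    # bool is a subclass of int in Python, so compare exact types
    if items and all(type(x) is int for x in items):
        warnings.warn("list converted to array of int", stacklevel=2)
        return array("q", items)
    return items
```

Mixed-type input such as [1, "a"] is returned as a plain list, since it cannot live in a single-type buffer.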
You can also restrict the element types allowed in a collection; see Specifiers.
Collections are 0-indexed, but you can switch to 1-based indexing by changing your ugo. The two conventions cannot be mixed within the same pod.
Sequences
Sequences are collections at the syntax level. They are typically needed when defining multiple variables or calling a function with parameters.
As parentheses also specify the order of evaluation in an expression, an expression like (2 + 1) is ambiguous. It could be:
- a sequence with 1 item: [3]
- a parenthesized value: 3
U resolves this ambiguity at compile time by looking at how the expression is used.
In U, sequences and arrays are similar; the implementation depends on how you use them.
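For comparison, Python faces the same ambiguity but resolves it syntactically rather than by usage: parentheses alone only group, and a trailing comma is what makes a one-item sequence (tuple). This is Python behavior, shown only as a contrast to U's usage-based resolution.

```python
# Parentheses alone only group: (2 + 1) is the int 3
grouped = (2 + 1)
# A trailing comma turns the same expression into a one-item tuple
one_item = (2 + 1,)
```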
Arrays
:a :! [1, 2]
:b : [1, "Alice"] # Elements of different types
a << 3 # 'a' <=> [1,2,3]
a.<< 1 # 'a' <=> [2,4,6]; shift all elements left by 1
:ints : [@int] # Array of ints
\< [:(:fill: 8, :count: 3)] # 3 elements, each set to 8
# Prints [8,8,8]
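The same steps can be traced in Python to check the commented results. This is only an analogue: list.append plays the role of <<, a comprehension applies the element-wise left shift, and list repetition plays the role of the :fill:/:count: specifier.

```python
a = [1, 2]
a.append(3)                  # like `a << 3`: a is now [1, 2, 3]
a = [x << 1 for x in a]      # like `a.<< 1`: shift each element left by 1
b = [1, "Alice"]             # a Python list may mix types, like :b above
filled = [8] * 3             # like [:(:fill: 8, :count: 3)]
print(filled)
```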
Lists
Tip
'[-' and '-]' denote lists; the hyphen '-' represents the link between nodes, as in a linked list.
Lists give you more functions to navigate between nodes. For example, only lists have prev and next.
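A doubly linked structure is what makes prev and next cheap: every node stores both neighbors. The Python sketch below is an illustration of that idea, not U's list implementation; DNode and link are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class DNode:
    value: object
    # repr=False avoids infinite recursion when printing a node with back-links
    prev: Optional["DNode"] = field(default=None, repr=False)
    next: Optional["DNode"] = field(default=None, repr=False)


def link(values):
    """Build a doubly linked chain and return its first node."""
    head = prev = None
    for v in values:
        node = DNode(v, prev)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head


head = link(["a", "b", "c"])
middle = head.next   # navigate forward; middle.prev navigates back
```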
Hashes
Tip
'[<' and '>]' denote hashes; the arrows '<' and '>' represent value associations.
A hashmap, or simply hash, is a hash table represented as a collection of key-value pairs.
:role : :developer
:status : "Status"
:h :! [<
:name: 'Megan',
:role # Same as :role: role
status: :online # Associate variable <status> and :online
>]
\< h[:name]
# Prints 'Megan'
h[:name] := 'Francis'
\< h[:name]
# Prints 'Francis'
Keys can be any value and end with a colon ':'. In the example above:
- :name: consists of the symbol :name and the key separator ':'.
- status: consists of the variable status (without a colon at the beginning) and the key separator ':'.
- :role is a shorthand that avoids repeating the key and the value.
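The three key forms map roughly onto a Python dict as follows. This sketch assumes that a variable key such as status: uses the variable's current value ("Status") as the key, and it represents U symbols as plain strings; neither detail is confirmed by the text above.

```python
role = "developer"        # stands in for `:role : :developer`
status = "Status"         # stands in for `:status : "Status"`
h = {
    "name": "Megan",      # like `:name: 'Megan'`: a symbol key
    "role": role,         # like the `:role` shorthand: key and value share a name
    status: "online",     # like `status: :online`: the key is the variable's value
}
h["name"] = "Francis"     # like `h[:name] := 'Francis'`
```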
Sets
Tip
'[$' and '$]' denote sets; '$' represents the 'S' of Set.
Sets are collections of unordered, unique items. They can be viewed as a special kind of map in which only keys are used.
For example, s_err raises an error because 'a' is duplicated, and $! means that only unique items are allowed.
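Python's built-in set silently drops duplicates, so the strict behavior described for $! has to be written by hand. The helper strict_set below is hypothetical and assumes the U error corresponds to rejecting the first repeated item.

```python
def strict_set(*items):
    """Build a set, raising ValueError on the first duplicate instead of dropping it."""
    result = set()
    for item in items:
        if item in result:
            raise ValueError(f"duplicate item: {item!r}")
        result.add(item)
    return result
```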
Matrices
Tip
'[|' and '|]' denote matrices; '|' represents the matrix columns.
In U, matrices are multidimensional arrays at the lexical level.
Matrices are rectangular arrays of values arranged in rows and columns; rectangular means all columns have the same size. They are well suited to processing large data sets and extracting patterns, as in AI. They let you organize complex data visually and generate efficient code for highly concurrent hardware such as GPUs.
The same code can also be written with arrays.
To stay as close as possible to mathematical notation, U provides a simpler, user-defined syntax.
With this syntax, commas are no longer needed, but there are other constraints:
- new lines are mandatory to separate rows,
- expressions containing spaces must be enclosed in parentheses ().
For example
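The two rules of the comma-free syntax (one row per line, whitespace separates elements unless it is inside parentheses) can be illustrated with a small tokenizer. This is a hypothetical Python sketch, not U's parser; it also checks that the result is rectangular, since matrices require equal-length rows.

```python
def parse_matrix(text: str) -> list:
    """Split comma-free matrix text into rows of element strings."""
    rows = []
    for line in text.strip().splitlines():      # a new line starts a new row
        tokens, current, depth = [], "", 0
        for ch in line:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            # whitespace separates elements only outside parentheses
            if ch.isspace() and depth == 0:
                if current:
                    tokens.append(current)
                    current = ""
            else:
                current += ch
        if current:
            tokens.append(current)
        rows.append(tokens)
    if len({len(r) for r in rows}) > 1:
        raise ValueError("matrix is not rectangular")
    return rows
```

For instance, parse_matrix("(1 + 1) 2\n3 4") keeps "(1 + 1)" as a single element because its spaces are inside parentheses.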
Matrices Access
Collections are accessed and modified through their methods.
To retrieve an element:
To change a mutable collection:
To append or remove an element:
Matrices Methods
Matrices can use all methods defined for arrays, as well as matrix-specific methods. For example:
X = [|1, 2 ; 3, 4|]
\< X.inverse # Call method <inverse> for matrices, not arrays
#(
Prints:
Mat(2x2) : [|
-2 1
3/2 -1/2
|]
#)
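As a sanity check of the printed result, this Python sketch (not U) computes the 2x2 inverse exactly with fractions, using the adjugate formula inverse = (1/det) [[d, -b], [-c, a]] with det = ad - bc. The helper inverse_2x2 is hypothetical.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


X_inv = inverse_2x2([[1, 2], [3, 4]])   # det = 1*4 - 2*3 = -2
```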
Matrices can be converted to raw arrays:
Tables
Tip
'[+' and '+]' denote tables; '+' represents the table grid separator ⊞.
Tables are collections of columns and their relationships. Columns can have different sizes. Tables are a generalization of maps: unlike maps, where keys and values are tightly coupled, tables let you freely reorganize columns and elements with dedicated methods. They behave more like a column-oriented store.
For example:
Or simply:
\< t.cols_names= :c1, :c2
#(
Prints
C1 | C2
---|---
a 4
b 3
a 2
d 1
#)
# In-place modifications
t.first.sort_asc!
t.last.sort_asc!
\< t
#(
Prints
C1 | C2
---|---
a 1
a 2
b 3
d 4
#)
\< t.@json
#(
Prints
{"a": 2, "b": 3, "d": 4}
# "a" with value 1 overwritten
#)
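Because tables behave like a column-oriented store, the example above can be mimicked in Python with one list per column. This is an illustration, not U's implementation: each column is sorted independently, the rows are re-zipped, and converting to a dict lets the later "a" entry overwrite the earlier one, which reproduces the JSON output shown.

```python
import json

c1 = ["a", "b", "a", "d"]     # column C1
c2 = [4, 3, 2, 1]             # column C2
c1.sort()                     # each column is sorted on its own,
c2.sort()                     # so rows are re-paired by position
rows = list(zip(c1, c2))      # [('a', 1), ('a', 2), ('b', 3), ('d', 4)]
as_json = json.dumps(dict(rows))  # duplicate key 'a': the last value (2) wins
```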
Element Access
All collections use the same syntax to access elements: either [...] or .[...].
To return a default value:
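The idea of a default value on failed access can be sketched in Python with one helper that works for both sequences and mappings. The name get_or_default is hypothetical; it simply catches the missing-index and missing-key errors.

```python
def get_or_default(collection, key, default):
    """Return collection[key], or `default` if the index or key is absent."""
    try:
        return collection[key]
    except (IndexError, KeyError):
        return default
```

Note that, unlike this sketch may suggest for U, Python accepts negative list indexes, so get_or_default([10, 20], -1, 0) returns 20.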
Iterators
Most collection processing can be done with iterators. Iterators help prevent loop errors by letting U check collection bounds and loop exits. The keywords ::for and ::while let you express loops in an imperative style.
In U, iterators are created by function juxtaposition or by using the dot notation .{
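The benefit of iterators over manual indexing can be seen in Python as well (an analogue, not U syntax): iterating yields each element exactly once, so there is no index to get wrong, and next() with a default ends cleanly when the collection is exhausted.

```python
scores = [3, 8, 5]
# Iterating yields each element once: no index variable, so no off-by-one errors
doubled = [s * 2 for s in scores]
# next() stops at the first match and falls back to None when nothing matches
first_above_4 = next((s for s in scores if s > 4), None)
none_above_10 = next((s for s in scores if s > 10), None)
```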
Specifiers
When you write [1,2,3], U interprets it as:
1. [ : create a collection
2. 1, 2, 3 : push elements
3. ] : end of the collection
The key point is that collection specifiers let you define a collection's behavior at different stages of its definition:
- before or after creation (point 1 or 3)
- before or after item insertion (point 2)
The most common use case is adding element types:
Specifiers let you add constraints as invariants and raise errors when they are not satisfied. They are powerful because you don't need to define special classes to use them; you can define them in place:
:a :! [:( # Specifiers start
@int, # All elements must be ints
:capacity: 2, # At most 2 elements
:before_create:> ...., # Before creating the collection
:before-<<:> e :: !? e.odd # Ensure that all items are even; 'before-<<' means 'before_pushing'
): # Specifiers end
# ... elements
]
a <<= 0 # OK
a <<= 1 # Error: only even items
a <<= "hello" # Error: only ints
a <<= 2 # OK
a <<= 4 # Error: max 2 elements
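The same three invariants (ints only, even items only, at most 2 elements) can be sketched in Python as checks run before each push. ConstrainedArray is a hypothetical class; U attaches such checks through specifiers instead of a dedicated class, and the order in which U evaluates them is an assumption here.

```python
class ConstrainedArray:
    """Hypothetical stand-in for a U array with specifiers."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []

    def push(self, e):
        # Checks run before the element is stored, like a 'before-<<' specifier
        if type(e) is not int:
            raise TypeError("only ints")
        if e % 2 != 0:
            raise ValueError("only even items")
        if len(self.items) >= self.capacity:
            raise OverflowError(f"max {self.capacity} elements")
        self.items.append(e)
        return self
```

Failed pushes leave the collection unchanged, so a later valid push still succeeds as long as capacity remains.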
U has many specifier types:
- constructor call:
:a :! [:
@int, # Array of ints
:capacity: 10, # Max 10 elements
:<<: (:meter_per_second!, :even!); # Only even elements in meter-per-second
x, y
]
# Or with a function
:a : [:
@int,
:capacity: 10,
:<<: {: item; item.meter-per-second!; item.even!} # Use a function
; x, y
]
# All the above could be done in an OOP style
# Call Array constructor with keyword '::an'
:a : @Array @int, :capacity: 10
a << x.meter-per-second!.even! # Ensure x matches the criteria
a << y.meter-per-second!.even! # Ensure y matches the criteria
- constructor function:
Specifiers are available for all collections by placing a colon ':' at the beginning of the definition, except sequences, which are only meant to group items:
- arrays: [: ... ; a, b]
- lists: [-: ... ; a, b -]
- sets: [$: ... ; a, b $]
- hashes: [<: ... ; a, b >]
- matrices: [|: ... ; a, b |]
- tables: [+: ... ; a, b +]
Collection Comprehension
Collection comprehensions are a special kind of specifier.
Fixed Size Collections
Unlike ordinary collections, fixed-size collections have a constant capacity. You can neither append elements to them nor shrink them; you can only modify their elements in place.
For example, to create a fixed 10-element array: [<= 10; 1,2]
Fixed-size arrays have many benefits:
- smaller memory storage,
- faster element access (data on the stack).
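A fixed-capacity container can be sketched in Python by exposing item access and assignment but no append or remove. FixedArray is hypothetical, and filling the unused slots with a default value (0 here) is an assumption about what [<= 10; 1,2] holds beyond its two initial elements.

```python
class FixedArray:
    """Fixed-capacity array: elements can change, the length cannot."""
    __slots__ = ("_data",)

    def __init__(self, capacity: int, *initial, fill=0):
        if len(initial) > capacity:
            raise ValueError("too many initial elements")
        # Assumption: unused slots hold `fill`
        self._data = list(initial) + [fill] * (capacity - len(initial))

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        self._data[i] = value  # in-place modification only; no append or remove
```

Usage: FixedArray(10, 1, 2) has length 10; writing to index 10 raises IndexError instead of growing the array.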
Slices
A slice is a structure that keeps references to the beginning and the end of a range within an array. Slices let U avoid copying array data, and U makes heavy use of them for better performance.
Slices can be created by accessing collections with ranges:
:a : [0, 1, 2, 3, 4, 5, 6]
:slice1 : a[1:.:4] # Slice from 1 to 4; Slice with [1,2,3,4]
:slice2 : a[0:+2:4] # From 0 to 4 by +2; Slice with [0,2,4]
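Python's memoryview behaves like the slices described above: slicing it creates a view over the same buffer rather than a copy. This is a Python analogue, not U's slice type. Note that Python slice ends are exclusive, whereas the U examples include the end index, so U's 1 to 4 becomes [1:5] here.

```python
from array import array

a = array("i", [0, 1, 2, 3, 4, 5, 6])
slice1 = memoryview(a)[1:5]     # elements 1..4, no copy
slice2 = memoryview(a)[0:5:2]   # elements 0, 2, 4 (step 2), no copy
a[2] = 20                       # both views share a's buffer and see the change
```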
Mutable Collections
By default, collections are immutable, but you can use mutable collections too. Mutability is essential when working on low-level systems; however, mutable collections cannot guarantee reliability in concurrent environments.
Collections Implementations
U always converts collections to one of two memory layouts and picks the most efficient one:
- linked lists (or lists): linked memory blocks; can contain different types; variable size; slower processing.
- vectors: continuous memory blocks; contain only one type; fixed size; fast processing. Vectors perform vector-wide operations at the hardware level and serve as an abstraction of SIMD parallel processing instructions.
For example, if all elements of a collection have the same type, the collection is implemented as a vector. You can force a collection to be of a certain type; see Collection Types.
By default, and for performance reasons, the U compiler tries to:
- replace expensive memory processing with raw code at compile time (loop unrolling),
- create collections at compile time as constant data within the executable,
- otherwise, create collections at run time.
Creating collections at run time is more expensive, as it requires storage and memory processing.