U supports three kinds of string quotes:

single quote '...'
double quote "..."
backtick quote `...`

:a : "Hello you!"
:b : 'Hello you!'
:c : `Hello you!`

All of the statements above create the same string.

Characters¶

A character literal has the syntax \'c' for the character c.

See Characters.

Extended String Syntax¶

Strings are so ubiquitous that U has a special aperture for string processing: .%.

'a'.%ascii        # Convert to ASCII
'a'.%ascii:code   # Get the ASCII code
'a'.%U:scalars    # Get the UTF-8 scalars => [...]

In U, a built-in string is a read-only array of bytes encoded in UTF-8 (see Unicode). String values are immutable, so you cannot modify their elements:

:text : 'peace'
text[0] : \'A' # Error
text[2:<.<:-1] # Returns the slice 'ace'

::mut :name : 'bob'
name[0] : \'B' # Ok, 'name' is mutable

To use a mutable string, use the keyword ::mut or the string constructor %!.

The aperture % reminds you that U needs to parse the content to build a value: a string, a regexp, or a Valcon.
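
As a rough analogy only, the immutable and mutable behavior described above can be sketched in Python rather than U. The sketch assumes that Python's `str` stands in for a default U string and that `bytearray` stands in for a `::mut` string; neither is a U type.

```python
# Python analogy only; this is not U code.
text = "peace"            # Python str is immutable, like a default U string

try:
    text[0] = "A"         # item assignment on an immutable string is rejected
    assignment_failed = False
except TypeError:
    assignment_failed = True

tail = text[2:]           # slicing still works and yields a new string

name = bytearray(b"bob")  # bytearray stands in for a mutable string
name[0] = ord("B")        # in-place change is allowed

print(assignment_failed, tail, name.decode("utf-8"))
```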

String Constructor¶

String constructors can be defined with apertures:

%: for immutable string
%!: for mutable string

Unlike in other languages, string constructors in U are applied at the lexer level. All operations are performed directly on the input characters, which makes parsing efficient by avoiding:

expensive memory access and,
useless processing.

For example, consider deleting the spaces at the beginning and the end of a string (stripping it). In most languages, you write:

# a string with 9 chars
"  hello  ".strip

Python, like most languages, performs the following steps:

1. allocate a string S in memory containing 10 chars (9 chars + '\0'),
2. create or use a working buffer B,
3. execute the string method 'strip' from S into B.

The four spaces around 'hello' are added to S in step 1 and discarded in step 3. U avoids the allocation of B and the processing of those 4 bytes.

In U, operations are applied before the string is created:

:%{strip}'  hello  '

U will:

meet the aperture :%,
look for a string constructor C with a 'strip' function,
allocate a string S,
feed the following characters through constructor C into S:
- 2 spaces: discard
- 5 chars: keep
- 2 spaces: discard

Only useful characters are kept. This kind of optimization could make a huge difference when processing large strings like web pages.
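
The idea of stripping while the characters are being read can be sketched in Python (not U). This is a minimal illustration under assumptions: `build_stripped` is a hypothetical helper, and it works on an ordinary character iterator rather than on a real lexer stream.

```python
# Python sketch only; build_stripped is a hypothetical name, not a U API.
def build_stripped(chars):
    """Build a stripped string directly from a stream of characters."""
    out = []      # characters that are definitely part of the result
    pending = []  # whitespace seen after content; kept only if content follows
    for ch in chars:
        if ch.isspace():
            if out:                # leading whitespace is dropped immediately
                pending.append(ch)
        else:
            out.extend(pending)    # inner whitespace turned out to be useful
            pending.clear()
            out.append(ch)
    # Whitespace still pending here was trailing, so it is never stored.
    return "".join(out)


print(repr(build_stripped(iter("  hello  "))))
```

The padded string is never materialized: leading and trailing spaces are dropped as they are read, and only the kept characters reach the result.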

Immutable String Constructor¶

:str : :%"peace"  
# => Same as :str : 'peace' 

:str : :%(max: 5)'peace'
# Immutable string with a maximum capacity of 5 characters

:str : :%{: lhs; ... }'peace'
# Immutable string with an inline constructor. 'lhs' (left-hand side) is 'peace'

Note that an immutable string is copied every time you pass it as an argument in a function call or assign it to a variable or constant, unless specified otherwise.

Mutable String Constructor¶

:str : :%!'peace'
::mut :str : 'peace'
# Mutable strings: both statements above are equivalent

:str : :%!(:max: 5)'peace'
# Mutable string with a maximum capacity of 5 characters

:str : :%!{: lhs; ... }'peace'
# Mutable string with an inline constructor. 'lhs' (left-hand side) is 'peace'

A mutating method call changes the string in place.

User-defined String Constructor¶

::A :CustomChar
::A :MutString, {
    ::init {: tok_stream
      # Return an iterator that converts each input char to a CustomChar
      tok_stream.{: char; ::a @CustomChar, char}
    }
}

:str : :%!@MutString 'peace'
# str contains only 'CustomChars', not built-in chars

Raw String¶

Strings are escaped by default:

\< "name\npassword"

prints two lines:

name
password

Strings defined with the \ operator are raw strings and are not escaped:

:%\"name\npassword"
# Immutable string. Prints "name\npassword"

:%!\"name\npassword"
# Mutable string. Prints "name\npassword"

See escape sequences
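
For readers who know Python, the same distinction can be sketched there. This is an analogy, not U code: the `r"..."` prefix plays the role that the \ operator plays above.

```python
# Python analogy only; r"..." stands in for U's raw string operator.
escaped = "name\npassword"   # \n is interpreted as a line break
raw = r"name\npassword"      # backslash and 'n' are kept as two characters

print(escaped)   # printed on two lines
print(raw)       # printed on one line, backslash included
```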

Multiline String¶

:%{
  this is a 
  multiline string
%}

With delimiters¶

Multiline strings can be defined with multiple delimiters:

:%( .. %)
:%[ .. %]
:%{ .. %}
:%< .. %>

Nested strings¶

Nested strings are allowed:

:a : :%{
    ...
%}
:b : :%[
    ...
%]
:c : :%(
    ...
%)

\< (a << b << c)::inspect
#(
Prints 
:%{
    %#1{
        %#2{

        %#2}
    %#1}
%}
#)

The numbers show the nesting level for better readability. Using a different delimiter for each level is usually a better option.

String Interpolation¶

String interpolation lets you build a string by concatenating strings and other values.

:name : 'Alice'
\< 'hello \:name'
# Prints 'hello Alice'

String interpolation syntax is pretty simple and is guarded by the escape char \:

Without spaces: use \:, as in \:var, where var is an identifier
With spaces: use one of the following quoted sequences to enclose an expression:
- \(' ... '\), as in 'Welcome \(' user.name '\), it is \:time', where user.name and time are expressions,
- \(" ... "\)
- \(` ... `\)

The quotes are required so that the expression can be parsed quickly and without ambiguity. Without them, it is not possible to know whether \) ends the string interpolation or is an escape aperture.

The enclosed expression must be explicitly converted to a string:

:last_name : 'Simpson'

:name : 'Lisa'
\< 'Welcome \:name' 

:mother : 'Marge'
\< 'Welcome \(' mother '\) \:last_name!'

Formatting¶

To format values, use formatting methods starting with the aperture .%=

\< 'number: \(' 0xfacade.%=08X '\)'

# Prints: "number: 00FACADE"

See String Formatting Methods
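
The 08X format above (zero-padded to 8 digits, uppercase hexadecimal) can be cross-checked with the equivalent Python format specifier. This is a Python sketch, not U's formatting API.

```python
# Python analogy only; the format spec 08X mirrors the U example above.
value = 0xFACADE
formatted = f"number: {value:08X}"   # pad with zeros to width 8, uppercase hex

print(formatted)
```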

Unicode¶

U's default character encoding is UTF-8. U can support other encodings through user-defined syntax.

For most use cases, you don’t need to worry about characters. But if you need finer control over characters, import the 'characters' Valcon. It contains useful methods to handle encodings and lets you process them with confidence.

For example:

:pirate : '🏴‍☠️'  # String with 1 character: The pirate flag
pirate[0]       # Return 1 character
pirate.length   # 1

pirate.%U-code-point       # Return 4 code points: U+1F3F4, U+200D, U+2620, U+FE0F
pirate.%+       # Same as above
                # '+' is a reminder of 'U+'
pirate.%+[0]    # Return first code point U+1F3F4
pirate.%+.length # 4

pirate.%bytes   # Return 13 bytes: F0 9F 8F B4 E2 80 8D E2 98 A0 EF B8 8F
pirate.%*       # Same as above
pirate.%*.length # 13

%U8(F0 9F 8F B4 E2 80 8D E2 98 A0 EF B8 8F) # Create '🏴‍☠️'
%U+(1F3F4, 200D, 2620, FE0F) # Create '🏴‍☠️'

See Unicode for more details about the difference between encoding, Unicode, code points, code units, scalars, graphemes...
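
The code point and byte counts in the example above can be checked independently with a short Python sketch. Python is used here only as a calculator: unlike the U example, where length counts the flag as 1 character, Python's `len` on a `str` counts code points.

```python
# Python cross-check of the pirate flag example; not U code.
pirate = "\U0001F3F4\u200D\u2620\uFE0F"   # flag, zero-width joiner, skull, variation selector

code_points = [f"U+{ord(ch):04X}" for ch in pirate]
utf8_bytes = pirate.encode("utf-8")
hex_bytes = utf8_bytes.hex(" ").upper()

# Rebuild the string from its UTF-8 bytes, as the %U8(...) constructor does in U.
rebuilt = bytes.fromhex(hex_bytes).decode("utf-8")

print(code_points)       # 4 code points
print(len(utf8_bytes))   # 13 bytes
print(hex_bytes)
```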

Encodings Conversion¶

Strings can be converted between Supported Encodings. However, some conversions are not possible.

"An 🍎".%encode-to "ASCII"
# Undefined Conversion Error: UTF-8 -> ASCII
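
The same kind of failure can be reproduced in Python, shown here as an analogy rather than as U behavior. The fallback with `errors="replace"` is a Python feature; whether U offers a comparable option is not stated in this document.

```python
# Python analogy only; error names and options are Python's, not U's.
apple = "An \U0001F34E"

try:
    apple.encode("ascii")          # the apple has no ASCII representation
    conversion_error = None
except UnicodeEncodeError as err:
    conversion_error = err

# A lossy conversion is possible if replacement characters are acceptable.
lossy = apple.encode("ascii", errors="replace")

print(type(conversion_error).__name__, lossy)
```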

String manipulation¶

U supports common string manipulations. Range indexes can be negative to denote an index counted from the end of the string, as in s[0:.:-5].

Most string processing can be done with immutable strings and iterators. However, many mutating methods are also defined, so you can pick the paradigm that suits your task.

Iterate¶

By default, strings let you iterate over characters:

:apple : "An 🍎"

apple.{: char;
  \< char 
  # Prints: 'A', 'n', ' ', and '🍎'
}

You can also iterate over bytes:

apple.%bytes.{: byte;
  \< byte 
  # '🍎' has 4 bytes: 240, 159, 141, 142
  # Prints: 65('A'), 110('n'), 32(' '), 240, 159, 141, 142
}

Bytes can be formatted for better printing:

\< apple.%hex-fmt
# Prints: 0x416E20F09F8D8E
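
The byte values and the hexadecimal form quoted above can be verified with a small Python sketch. It is a cross-check in Python, not U code; the apple is a single code point, so Python's per-code-point iteration yields the same four characters here.

```python
# Python cross-check of the iteration example; not U code.
apple = "An \U0001F34E"

chars = list(apple)                        # iterate over characters
byte_values = list(apple.encode("utf-8"))  # iterate over UTF-8 bytes
hex_form = "0x" + apple.encode("utf-8").hex().upper()

print(chars)
print(byte_values)
print(hex_form)
```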

Inserting¶

The simplest way to insert a character or a sequence of characters at a specific index is to use the operator <<= or the method insert:

:a : :%!"Sunny day"
\< a.append('!', 6)
# Same as: a.%+<('!', 6)

# Prints "Sunny day!!!"

The arrow shows from which end the operation occurs.

Removing¶

The mutating method remove removes characters at an index or range:

:a : :%!"sunny day"
\< a.%-<(0)
# Prints: "unny day"

a.%-<(0:<.<:5)
# Prints: "day"

# Above statements are the same as 
# (%!"sunny day").remove(0:<.<:6)

The arrow shows from which end the operation occurs.

Replacing¶

The mutating method replace replaces characters at an index or range:

:a : :%!"red color"
\< a %<->(4) 'car'
# Prints: "red car"

# Same as (%!"red color").replace 4, 'car'

Slicing¶

In U, you can easily extract any substring based on a range of indexes.

:a : "green tree"[ :<.>:-5 ]

Pattern Matching¶

The simplest way to check whether a substring exists in a string is to call the contains method:

:a : "green button"
\< a.%? 'green'
# Prints: true

\< a.%? :%/^green/
# Prints: true
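
For comparison, the two checks above (a literal substring and an anchored regular expression) look like this in Python. This is an analogy only; the `in` operator and the `re` module are Python's, not U's.

```python
# Python analogy only; not U code.
import re

a = "green button"

has_substring = "green" in a                      # literal substring test
has_prefix = re.search(r"^green", a) is not None  # regexp anchored at the start

print(has_substring, has_prefix)
```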

Appending¶

The simplest way to append to a string is to use the operators += or ++, or the append method.

You can append an entire string to the original one:

:name : 'Bob'
:a : :%! "Welcome" ++ " "
a += name
a.append \'!'
\< a  # Prints: 'Welcome Bob!'