U supports strings with three kinds of quotes:
- single quote
'...'
- double quote
"..."
- backtick quote
`...`
All of the statements above create the same string.
Characters¶
A character literal has the syntax \'c' for a character c.
See Characters.
Extended String Syntax¶
Strings are so ubiquitous that U has a special aperture, .%, for string processing:
'a'.%ascii # Convert to ascii
'a'.%ascii:code # Means get ascii code
'a'.%U:scalars # Means get utf8 scalars => [...]
In U, a built-in string is a read-only array of bytes encoded in UTF-8 (see Unicode). String values are immutable, so you cannot modify their elements:
:text : 'peace'
text[0] : \'A' # Error
text[2:<.<:-1] # Return slice 'ace'
::mut :name : 'bob'
name[0] : \'B' # Ok, 'name' is mutable
To create a mutable string, use the keyword ::mut or the string constructor %!.
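For readers coming from Python, the same distinction can be sketched there. This is only an analogy, not U semantics: Python's `str` rejects item assignment, while `bytearray` stands in here for a mutable string.

```python
# Analogy only: Python's immutable str versus a mutable bytearray.
text = "peace"
error = None
try:
    text[0] = "A"            # str does not support item assignment
except TypeError as err:
    error = err              # the immutable string is left untouched

name = bytearray(b"bob")     # mutable sequence of bytes
name[0] = ord("B")           # in-place modification is allowed
print(text, name.decode("utf-8"))
```

The immutable value survives the failed assignment unchanged, while the mutable one is updated in place.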
The aperture % reminds you that U needs to parse content to build a Value: a string, a regexp, or a Valcon.
String Constructor¶
String constructors can be defined with apertures:
- %: for immutable strings
- %!: for mutable strings
Unlike in other languages, string constructors in U are created at the lexer level. All operations are performed directly on the input characters. This allows an efficient parsing process by avoiding:
- expensive memory access and,
- useless processing.
For example, consider deleting the spaces at the beginning and the end of a string (stripping it). In most languages, you write:
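A minimal Python sketch of such a call, assuming the 9-character input that the surrounding description implies (two spaces on each side of 'hello'):

```python
# Assumed input: 2 spaces + 'hello' + 2 spaces = 9 characters.
s = "  hello  "
stripped = s.strip()   # produces a second, shorter string
print(repr(s), "->", repr(stripped))
```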
The Python compiler, like most compilers, performs the following steps:
- allocate a string S in memory containing 10 chars: 9 chars + '\0' => 10 chars in memory,
- create or use a working buffer B,
- execute the string method 'strip' from S into B.
Four spaces around 'hello' have been added to S in step 1, and discarded in B in step 3. U avoids the allocation of B and the processing of 4 bytes.
In U, operations are done before the string is created:
U will:
- meet the aperture :%,
- look for a string constructor C with a 'strip' function,
- allocate a string S,
- run constructor C with S, which will meet the following characters:
  - 2 spaces: discard
  - 5 chars: keep
  - 2 spaces: discard
Only useful characters are kept. This kind of optimization could make a huge difference when processing large strings like web pages.
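The idea of filtering characters before the string is built can be sketched in Python with a generator. This is an illustration under stated assumptions, not U's implementation: `strip_stream` and `build_stripped` are hypothetical names, the input iterable stands in for the lexer's character stream, and only the space character is treated as strippable.

```python
from typing import Iterable, Iterator

def strip_stream(chars: Iterable[str]) -> Iterator[str]:
    """Yield the input characters without leading and trailing spaces, in one pass."""
    pending = 0        # spaces seen since the last kept character
    started = False    # True once the first non-space character was kept
    for ch in chars:
        if ch == " ":
            if started:
                pending += 1       # interior or trailing: decide later
            continue               # leading spaces are dropped immediately
        if pending:
            yield " " * pending    # spaces were interior, so emit them now
            pending = 0
        started = True
        yield ch
    # Any spaces still counted in `pending` are trailing and never emitted.

def build_stripped(chars: Iterable[str]) -> str:
    """Build the final string from the filtered stream only."""
    return "".join(strip_stream(chars))

print(repr(build_stripped(iter("  hello  "))))
```

Only the kept characters reach the final string, so no intermediate unstripped copy is built by this function.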
Immutable String Constructor¶
:str : :%"peace"
# => Same as :str : 'peace'
:str : :%(max: 5)'peace'
# Immutable string with a maximum capacity of 5 characters
:str : :%{: lhs; ... }'peace'
# Immutable string with an inline constructor. 'lhs' (left-hand side) is 'peace'
Note that whether you pass an immutable string as an argument in a function call or assign it to a variable or constant, a copy of the original string is created each time, unless specified otherwise.
Mutable String Constructor¶
:str : :%!'peace'
::mut :str : 'peace'
# Mutable strings: all statements above are the same
:str : :%!(:max: 5)'peace'
# Mutable string with a maximum capacity of 5 characters
:str : :%!{: lhs; ... }'peace'
# Mutable string with an inline constructor. 'lhs' (left-hand side) is 'peace'
A mutating method call changes the string in place.
User-Defined String Constructor¶
::A :CustomChar
::A :MutString, {
::init {: tok_stream
# Return an iterator that converts each input char to a CustomChar
tok_stream.{: char; ::a @CustomChar, char}
}
}
:str : :%!@MutString 'peace'
# str contains only 'CustomChars', not built-in chars
Raw String¶
Strings are escaped by default: a string such as "name\npassword" prints on 2 lines.
Strings defined with the \ operator are raw strings and are not escaped:
:%\"name\npassword"
# Immutable string. Prints "name\npassword"
:%!\"name\npassword"
# Mutable string. Prints "name\npassword"
See escape sequences
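For comparison, Python draws the same line between escaped and raw strings with its `r` prefix. The sketch below is an analogy only and says nothing about how U implements the \ operator.

```python
# Analogy only: escaped versus raw string literals in Python.
escaped = "name\npassword"    # \n is interpreted as a line break
raw = r"name\npassword"       # backslash and 'n' are kept as two characters

print(escaped)                # printed on 2 lines
print(raw)                    # printed on 1 line, with a literal \n
```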
Multiline String¶
With delimiters¶
Multiline strings can be defined with multiple delimiters:
:%(
..%)
:%[
..%]
:%{
..%}
:%<
..%>
Nested strings¶
Nested strings are allowed:
:a : :%{
...
%}
:b : :%[
...
%]
:c : :%(
...
%)
\< (a << b << c)::inspect
#(
Prints
:%{
%#1{
%#2{
%#2}
%#1}
%}
#)
The numbers show the nesting level for better readability. Choosing other delimiters is a better option.
String Interpolation¶
String interpolation allows you to build a string by concatenating strings and other values.
String interpolation syntax is simple and is guarded by the escape char \:
- Without spaces: use \:, like \:var where var is an identifier.
- With spaces: enclose the expression in one of the quoted sequences \(' ... '\), \(" ... "\) or \(` ... `\), like 'Welcome \(' user.name '\), it's\:time', where user.name and time are expressions.
Note that the quotes are required to parse these sequences quickly and without ambiguity. Without them, it is not possible to know whether \) is the end of a string interpolation or an escape aperture.
The enclosed expression must be explicitly converted to a string:
:last_name : 'Simpson'
:name : 'Lisa'
\< 'Welcome \:name'
:mother : 'Marge'
\< 'Welcome \(' mother '\) \:last_name!'
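As a point of comparison, the same two greetings can be sketched with Python f-strings. This is an analogy rather than U behaviour: Python uses braces for both the identifier form and the expression form, and converts values to strings implicitly.

```python
# Analogy only: the greetings above written as Python f-strings.
last_name = "Simpson"
name = "Lisa"
mother = "Marge"

greeting_child = f"Welcome {name}"
greeting_mother = f"Welcome {mother} {last_name}!"
print(greeting_child)
print(greeting_mother)
```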
Formatting¶
To format values, use formatting methods starting with the aperture .%=
Unicode¶
U's default character encoding is UTF-8. U can support other encodings through user-defined syntax.
For most use cases, you don't need to worry about characters. If you need finer control over characters, import the 'characters' Valcon. It contains useful methods for handling encodings and lets you process them with confidence.
For example:
:pirate : '🏴‍☠️' # String with 1 character: The pirate flag
pirate[0] # Return 1 character
pirate.length # 1
pirate.%U-code-point # Return 4 code points: U+1F3F4, U+200D, U+2620, U+FE0F
pirate.%+ # Same as above
# '+' is a reminder of 'U+'
pirate.%+[0] # Return first code point U+1F3F4
pirate.%+.length # 4
pirate.%bytes # Return 13 bytes: F0 9F 8F B4 E2 80 8D E2 98 A0 EF B8 8F
pirate.%* # Same as above
pirate.%*.length # 13
%U8(F0 9F 8F B4 E2 80 8D E2 98 A0 EF B8 8F) # Create '🏴‍☠️'
%U+(1F3F4, 200D, 2620, FE0F) # Create '🏴‍☠️'
See Unicode for more details about the difference between encoding, Unicode, code points, code units, scalars, graphemes...
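The code point and byte counts for the pirate flag can be cross-checked in Python. Note one difference that this sketch makes visible: Python's `len` on a `str` counts code points (4 here), whereas the U example reports a length of 1 character.

```python
# Cross-check of the pirate flag's code points and UTF-8 bytes.
pirate = "\U0001F3F4\u200D\u2620\uFE0F"

code_points = [f"U+{ord(ch):04X}" for ch in pirate]
utf8_bytes = pirate.encode("utf-8")
utf8_hex = utf8_bytes.hex(" ").upper()

# Rebuild the same string from raw bytes and from code points.
from_bytes = bytes.fromhex("F0 9F 8F B4 E2 80 8D E2 98 A0 EF B8 8F").decode("utf-8")
from_points = "".join(chr(cp) for cp in (0x1F3F4, 0x200D, 0x2620, 0xFE0F))

print(code_points)
print(len(utf8_bytes), utf8_hex)
```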
Encodings Conversion¶
Strings can be converted between Supported Encodings. However, some conversions are not possible.
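Why some conversions fail can be illustrated in Python, used here only as a stand-in since U's conversion API is not shown: a string containing an emoji round-trips through UTF-16 but cannot be encoded as ASCII.

```python
# Illustration in Python: one conversion succeeds, another is impossible.
text = "An \U0001F34E"                  # 'An ' followed by a red apple emoji

utf16 = text.encode("utf-16-le")        # every code point is representable
round_trip = utf16.decode("utf-16-le")

try:
    text.encode("ascii")                # ASCII has no code for the emoji
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(utf16), round_trip == text, ascii_ok)
```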
String manipulation¶
U supports common string manipulation. Range indexes can be negative to denote an index counted from the end of the string, as in s[0:.:-5].
Most string processing can be done with immutable strings and iterators. However, many mutating methods are already defined. You can pick the paradigm that suits your task.
Iterate¶
By default, strings let you iterate over characters:
You can also iterate over bytes:
apple.%bytes.{|byte|
\< byte
# '🍎' has 4 bytes: 240, 159, 141, 142
# Prints: 65('A'), 110('n'), 32(' '), 240, 159, 141, 142
}
Bytes can be formatted for better printing:
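For comparison, the same byte iteration can be sketched in Python, assuming the string is 'An ' followed by the red apple emoji (as the printed byte values suggest). The hexadecimal rendering is just one possible formatting, not necessarily the one U provides.

```python
# Assumed input: 'An ' plus the red apple emoji (U+1F34E).
apple = "An \U0001F34E"

byte_values = list(apple.encode("utf-8"))
for b in byte_values:
    print(b)

# One possible formatting: two-digit uppercase hexadecimal.
hex_line = " ".join(f"{b:02X}" for b in byte_values)
print(hex_line)
```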
Inserting¶
The simplest way to insert a character or a sequence of characters at a specific index is to use the operator <<= or the method insert:
The arrow shows from which end the operation occurs.
Removing¶
The mutating method remove removes characters at an index or range:
:a : :%!"sunny day"
\< a.%-<(0)
# Prints: "unny day"
a.%-<(0:<.<:5)
# Prints: "day"
# Above statements are the same as
# (%!"sunny day").remove(0:<.<:6)
The arrow shows from which end the operation occurs.
Replacing¶
The mutating method replace replaces characters at an index or range:
:a : :%!"red color"
\< a %<->(4) 'car'
# Prints: "red car"
# Same as (%!"red color").replace 4, 'car'
Slicing¶
In U, you can easily extract any substring based on a range of indexes.
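Range-based extraction with negative indexes can be sketched in Python for comparison. Python slices are half-open and use a different notation, so this shows the idea only and should not be read as the meaning of U's range operators.

```python
# Analogy only: Python slices with positive and negative indexes.
text = "peace"
tail = text[2:]             # from index 2 to the end

day = "sunny day"
first_word = day[0:-4]      # stop 4 characters before the end
last_word = day[-3:]        # the last 3 characters

print(tail, first_word, last_word)
```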
Pattern Matching¶
The simplest way to check whether a substring exists in a string is to call the contains method:
Appending¶
The simplest way to append to a string is to use the operators += or ++, or the append method.
You can append an entire string to the original one: