https://rainsoft.io/mastering-swift-essential-details-about-strings/
Mastering Swift: essential details about strings
The string type is an important component of any programming language. The most useful information that a user reads in the window of an iOS application is plain text.
To reach a higher number of users, the iOS application must be internationalised and support a lot of modern languages. The Unicode standard solves this problem, but creates additional complexity when working with strings.
On one hand, the language should provide a good balance between the Unicode complexity and the performance when processing strings. On the other hand, it should provide the developer with comfortable structures to handle strings.
In my opinion, Swift does a great job on both counts.
Fortunately, Swift's string is not a simple sequence of UTF-16 code units, like in JavaScript or Java.
With a sequence of UTF-16 code units it's a pain to do Unicode-aware string manipulations: you might break a surrogate pair or a combining character sequence.
Swift implements a better approach. The string itself is not a collection; instead it provides views over the string content that may be applied according to the situation. And one particular view, String.CharacterView, is fully Unicode-aware.
For let myStr = "Hello, world" you can access the following string views:
- myStr.characters is String.CharacterView. Valuable to access graphemes, which are visually rendered as a single symbol. The most used view.
- myStr.unicodeScalars is String.UnicodeScalarView. Valuable to access the Unicode code point numbers as 21-bit integers.
- myStr.utf16 is String.UTF16View. Useful to access the code unit values encoded in UTF-16.
- myStr.utf8 is String.UTF8View. Valuable to access the code unit values encoded in UTF-8.
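To make the difference between the views concrete, here is a small sketch that counts the elements each view exposes for the same string. The string value is only an illustration. The code assumes a current Swift 5 toolchain, where the string itself plays the role of the character view; under Swift 3, which this article describes, the first count would be written myStr.characters.count.

```swift
// "café" written with a combining mark: e followed by U+0301
let myStr = "cafe\u{0301}"

// Swift 5 assumption: the string itself is the collection of graphemes
print(myStr.count)                // => 4 (graphemes)
print(myStr.unicodeScalars.count) // => 5 (Unicode scalars)
print(myStr.utf16.count)          // => 5 (UTF-16 code units)
print(myStr.utf8.count)           // => 6 (UTF-8 code units)
```

The same text gives a different length in each view, which is why the choice of view matters.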
Most of the time a developer deals with plain string characters, without diving into details like encoding or code units.
CharacterView works nicely for most string-related tasks: iterating over the characters, counting the number of characters, verifying substring existence, accessing by index, various manipulations and so on.
Let's see in more detail how these tasks are accomplished in Swift.
1. Character and CharacterView structures
The String.CharacterView structure is a view over the string content that is a collection of Character.
To access the view from a string, use the characters string property:
let message = "Hello, world"
let characters = message.characters
print(type(of: characters)) // => "CharacterView"
message.characters returns the CharacterView structure.
The character view is a collection of Character structures. For example, let's access the first character in a string view:
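A minimal sketch of that access follows. It assumes Swift 5, where first is read from the string directly; in Swift 3 the same property is reached through message.characters.first. The firstCharacter name is chosen for illustration.

```swift
let message = "Hello, world"

// Swift 3 (the article's version): message.characters.first
// The force unwrap is safe here because the literal is not empty.
let firstCharacter = message.first!

print(firstCharacter)           // => H
print(type(of: firstCharacter)) // => Character
```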
message.characters.first returns an optional that is the first character "H".
The character instance represents a single symbol H.
In Unicode terms, H is LATIN CAPITAL LETTER H, the code point U+0048.
Let's go beyond ASCII and see how Swift handles composite symbols. Such characters are rendered as a single visual symbol, but are composed of a sequence of two or more Unicode scalars. Strictly speaking, such characters are named grapheme clusters.
Important: CharacterView is a collection of grapheme clusters of the string.
Let's take a closer look at the ç grapheme. It may be represented in two ways:
- Using U+00E7 LATIN SMALL LETTER C WITH CEDILLA: rendered as ç
- Or using a combining character sequence: U+0063 LATIN SMALL LETTER C plus the combining mark U+0327 COMBINING CEDILLA. The grapheme is composite: c + ◌̧ = ç
Let's pick the second option and see how Swift handles it:
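The following sketch builds a string whose first grapheme uses the combining sequence and then inspects it. The sample text is illustrative, and the code assumes Swift 5 (message.first instead of Swift 3's message.characters.first).

```swift
// c followed by U+0327 COMBINING CEDILLA
let message = "c\u{0327}a va bien" // renders as "ça va bien"

let firstCharacter = message.first!
print(firstCharacter)                      // => ç
print(firstCharacter.unicodeScalars.count) // => 2
print(message.count)                       // => 10
```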
The first character of the string is a single grapheme ç that is rendered using two Unicode scalars, U+0063 and U+0327.
The Character structure accepts multiple Unicode scalars as long as they create a single grapheme. If you try to add more graphemes into a single Character, Swift triggers an error:
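A sketch of both cases. The failing declaration is left commented out so that the snippet still compiles; the exact scalar sequence for the valid character is an assumption chosen to produce three scalars.

```swift
// c + COMBINING CEDILLA + COMBINING ACUTE ACCENT: three scalars, one grapheme
let singleGrapheme: Character = "c\u{0327}\u{0301}"
print(singleGrapheme)                      // => ḉ
print(singleGrapheme.unicodeScalars.count) // => 3

// Two graphemes do not fit into one Character.
// Uncommenting the next line is rejected by the compiler:
// let multipleGraphemes: Character = "ab"
```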
Even if singleGrapheme is composed of 3 Unicode scalars, it creates a single grapheme ḉ.
multipleGraphemes tries to create a Character from 2 Unicode scalars. This creates 2 separate graphemes a and b in a single Character structure, which is not allowed.
2. Iterating over characters in a string
The CharacterView collection conforms to the Sequence protocol. This allows iterating over the view characters in a for-in loop:
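A minimal sketch of the loop, assuming Swift 5, where the string is iterated directly. In Swift 3 the loop runs over weather.characters.

```swift
let weather = "rain"

// Swift 3 (the article's version): for char in weather.characters
for char in weather {
    print(char)
}
// => r
// => a
// => i
// => n
```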
Each character from weather.characters is accessed using the for-in loop. On every iteration the char variable is assigned a character from the weather string: "r", "a", "i" and "n".
As an alternative, you can iterate over the characters using the forEach(_:) method, passing a closure as the first argument:
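The same iteration written with a closure could look like this. The sketch assumes Swift 5; under Swift 3 the call is weather.characters.forEach.

```swift
let weather = "rain"

// The closure runs once per character, in order
weather.forEach { char in
    print(char)
}
// => r
// => a
// => i
// => n
```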
Iteration using the forEach(_:) method is almost the same as for-in, except that you cannot use continue or break statements.
To access the index of the current character in the loop, CharacterView provides the enumerated() method. The method returns a sequence of tuples (index, character):
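A short sketch of enumerated() in a loop, assuming Swift 5 (Swift 3 would call weather.characters.enumerated()).

```swift
let weather = "rain"

for (index, char) in weather.enumerated() {
    print("index: \(index), char: \(char)")
}
// => index: 0, char: r
// => index: 1, char: a
// => index: 2, char: i
// => index: 3, char: n
```

Note that the index produced here is a plain integer counter, not a String.Index, so it cannot be used to subscript the string directly.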
On each iteration the enumerated() method returns a tuple (index, char). The index variable contains the character index at the current loop step. Correspondingly, the char variable contains the character.
3. Counting characters
Simply use the count property of the CharacterView to get the number of characters:
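For instance (a sketch with an illustrative value, assuming Swift 5, where count is read from the string itself):

```swift
let weather = "sunny"

// Swift 3 (the article's version): weather.characters.count
print(weather.count) // => 5
```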
weather.characters.count contains the number of characters in the string.
Each character in the view holds a grapheme. When an adjacent character (for example a combining mark) is appended to the string, you may find that the count property is not increased.
It happens because an adjacent character does not create a new grapheme in the string; instead it modifies an existing base Unicode character. Let's see an example:
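A sketch of this effect, assuming Swift 5 (drink.count stands in for Swift 3's drink.characters.count):

```swift
var drink = "cafe"
print(drink.count) // => 4

// Append U+0301 COMBINING ACUTE ACCENT
drink += "\u{0301}"
print(drink)       // => café
print(drink.count) // => 4

// The scalar view does grow, the grapheme view does not
print(drink.unicodeScalars.count) // => 5
```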
Initially drink has 4 characters.
When the combining mark U+0301 COMBINING ACUTE ACCENT is appended to the string, it modifies the previous base character e and creates a new grapheme é. The count property is not increased, because the number of graphemes is still the same.
4. Accessing character by index
Swift doesn't know the number of characters in the string view until it actually evaluates the graphemes in it. As a result, a subscript that accesses a character directly by an integer index does not exist.
You can access the characters by a special type, String.Index.
If you need to access the first or last characters in the string, the character view structure has first and last properties:
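A sketch of both properties on a non-empty and an empty string. The season value is illustrative, and the code assumes Swift 5 (Swift 3 reads them from the characters view).

```swift
let season = "summer"
print(season.first!) // => s
print(season.last!)  // => r

let empty = ""
print(empty.first == nil) // => true
print(empty.last == nil)  // => true
```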
Notice that the first and last properties are of the optional type Character?.
In the empty string empty these properties are nil.
To get a character at a specific position, you have to use the String.Index type (actually an alias of String.CharacterView.Index). String offers a subscript that accepts String.Index to access the character, as well as the pre-defined indexes myString.startIndex and myString.endIndex.
Using the string index type, let's access the first and last characters:
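A minimal sketch with the pre-defined indexes; the beforeEndIndex name is chosen for illustration.

```swift
let color = "green"

let startIndex = color.startIndex
// endIndex is past the end, so step back by one to reach the last character
let beforeEndIndex = color.index(before: color.endIndex)

print(color[startIndex])     // => g
print(color[beforeEndIndex]) // => n
```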
color.startIndex is the first character index, so color[startIndex] evaluates to g.
color.endIndex indicates the past-the-end position, or simply the position one greater than the last valid subscript argument. To access the last character, you must calculate the index right before the string's end index: color.index(before: color.endIndex).
To access a character at a position given by an offset, use the offsetBy argument of the index(theIndex, offsetBy: theOffset) method:
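For example (a sketch; the index names are illustrative):

```swift
let color = "green"

let secondCharIndex = color.index(color.startIndex, offsetBy: 1)
let thirdCharIndex = color.index(color.startIndex, offsetBy: 2)

print(color[secondCharIndex]) // => r
print(color[thirdCharIndex])  // => e
```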
By passing the offsetBy argument, you can access the character at a specific offset.
Of course, the offsetBy argument jumps over string graphemes, i.e. the offset counts Character instances of the string's CharacterView.
If the index is out of range, Swift generates a runtime error.
To prevent such situations, pass an additional argument limitedBy to limit the offset: index(theIndex, offsetBy: theOffset, limitedBy: theLimit). The function returns an optional, which is nil for an out-of-bounds index:
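A sketch of the guarded variant. The offset of 100 and the printed messages are illustrative. The extra comparison with endIndex is my own addition: limiting by endIndex still lets endIndex itself through, and that position is not a valid subscript argument.

```swift
let color = "green"

// nil, because offset 100 would jump past the limit
let oops = color.index(color.startIndex, offsetBy: 100, limitedBy: color.endIndex)

if let charIndex = oops, charIndex < color.endIndex {
    print("Correct index: \(color[charIndex])")
} else {
    print("Incorrect index")
}
// => Incorrect index
```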
oops is an optional String.Index?. The optional unwrap verifies that the index didn't jump out of the string.
5. Checking substring existence
The simplest way to verify substring existence is to call the contains(_ other: String) string method:
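A minimal sketch of the check. The import of Foundation is an assumption made to be safe, since substring search on strings has historically been provided by Foundation rather than by the standard library.

```swift
import Foundation

let animal = "white rabbit"

print(animal.contains("rabbit")) // => true
print(animal.contains("cat"))    // => false
```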
animal.contains("rabbit") returns true because animal contains the "rabbit" substring.
Correspondingly, animal.contains("cat") evaluates to false for a non-existing substring.
To verify whether the string has a specific prefix or suffix, the methods hasPrefix(_:) and hasSuffix(_:) are available. Let's use them in an example:
"white" is a prefix and "rabbit" is a suffix of "white rabbit". So the corresponding method calls animal.hasPrefix("white") and animal.hasSuffix("rabbit") return true.
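The prefix and suffix checks described above can be sketched as:

```swift
let animal = "white rabbit"

print(animal.hasPrefix("white"))  // => true
print(animal.hasSuffix("rabbit")) // => true
print(animal.hasPrefix("rabbit")) // => false
```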
When you need to search for a particular character, it makes sense to query the character view directly. For example:
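A sketch of both the element form and the closure form. It assumes Swift 5, where the calls go to the string directly (Swift 3: animal.characters.contains). The aChar, hasH and hasZ names are illustrative.

```swift
let animal = "white rabbit"
let aChar: Character = "h"

// Element form: is this exact character present?
let hasH = animal.contains(aChar)

// Closure form: does any character satisfy the predicate?
let hasZ = animal.contains { char in char == "z" }

print(hasH) // => true
print(hasZ) // => false
```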
contains(_:) verifies whether the character view has a particular character.
The second function form accepts a closure, contains(where predicate: (Character) -> Bool), and performs the same verification.
6. String manipulation
The string in Swift is a value type. Whether you pass a string as a function argument or assign it to a variable or constant, a copy of the original string is created every time.
A mutating method call changes the string in place.
This chapter covers the common manipulations over strings.
Append to string a character or another string
The simplest way to append to a string is the += operator. You can append an entire string to the original one:
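A minimal sketch with illustrative values:

```swift
var bird = "pigeon"

bird += " sparrow"
print(bird) // => pigeon sparrow
```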
The String structure provides a mutating method append(). The method accepts a string, a character or even a sequence of characters, and appends it to the original string. For instance:
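A sketch of the three forms, with illustrative values. The sequence form is shown with an array of characters, which is one possible sequence among others.

```swift
var bird = "pigeon"

// Append a single character
let sChar: Character = "s"
bird.append(sChar)
print(bird) // => pigeons

// Append another string
bird.append(" and sparrows")
print(bird) // => pigeons and sparrows

// Append a sequence of characters
let suffix: [Character] = [" ", "f", "l", "y"]
bird.append(contentsOf: suffix)
print(bird) // => pigeons and sparrows fly
```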
Extract a substring from string
The method substring() allows extracting substrings:
- from a specific index up to the end of the string
- from the start up to a specific index
- or based on a range of indexes.
Let's see how it works:
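The three cases can be sketched as follows. Note the swap: the substring(from:), substring(to:) and substring(with:) methods belong to Swift 3 and were deprecated in Swift 4, so this sketch, which assumes Swift 5, uses the range subscripts that replaced them. The values and names are illustrative.

```swift
let plant = "red flower"
let strIndex = plant.index(plant.startIndex, offsetBy: 4)

// Swift 3: plant.substring(from: strIndex)
print(plant[strIndex...]) // => flower

// Swift 3: plant.substring(to: strIndex)
print(plant[..<strIndex]) // => "red " (with a trailing space)

// Swift 3: plant.substring(with: index..<endIndex)
if let index = plant.firstIndex(of: "f") {
    let endIndex = plant.index(index, offsetBy: 2)
    print(plant[index..<endIndex]) // => fl
}
```

In Swift 5 these subscripts return a Substring; wrap the result in String(...) when a standalone string is needed.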
The string subscript accepts a range or closed range of string indexes. This helps extract substrings based on ranges of indexes:
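A sketch with a half-open range and a closed range. The values and range names are illustrative.

```swift
let plant = "green tree"

// Half-open range: everything except the first character
let excludeFirstRange = plant.index(plant.startIndex, offsetBy: 1)..<plant.endIndex
print(plant[excludeFirstRange]) // => reen tree

// Half-open range: the last two characters
let lastTwoRange = plant.index(plant.endIndex, offsetBy: -2)..<plant.endIndex
print(plant[lastTwoRange]) // => ee

// Closed range: the first five characters, upper bound included
let firstWordRange = plant.startIndex...plant.index(plant.startIndex, offsetBy: 4)
print(plant[firstWordRange]) // => green
```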
Insert into string
The string type provides the mutating method insert(). The method allows inserting a character or a sequence of characters at a specific index.
The new character or sequence is inserted before the element currently at the specified index.
See the following sample:
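A sketch of both forms with illustrative values. It assumes Swift 5, where a string can be passed as the sequence of characters; in Swift 3 that argument would be "nice ".characters.

```swift
var plant = "green tree"

// Insert a single character at the end
plant.insert("s", at: plant.endIndex)
print(plant) // => green trees

// Insert a sequence of characters at the start
plant.insert(contentsOf: "nice ", at: plant.startIndex)
print(plant) // => nice green trees
```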
Remove from string
The mutating method remove(at:) removes the character at an index:
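A sketch that removes the space from an illustrative string. firstIndex(of:) is the Swift 5 spelling; Swift 3 would look the index up with weather.characters.index(of:).

```swift
var weather = "sunny day"

if let index = weather.firstIndex(of: " ") {
    weather.remove(at: index)
    print(weather) // => sunnyday
}
```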
You can remove the characters of the string that fall in a range of indexes using removeSubrange(_:):
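For example, a sketch that drops everything from the space onwards (values are illustrative):

```swift
var weather = "sunny day"

// Index of the space between the two words
let index = weather.index(weather.startIndex, offsetBy: 5)
let range = index..<weather.endIndex

weather.removeSubrange(range)
print(weather) // => sunny
```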
Replace in string
The method replaceSubrange(_:with:) accepts a range of indexes that should be replaced with a particular string. The method mutates the string.
Let's see a sample:
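A sketch that replaces the first word of an illustrative string. firstIndex(of:) is the Swift 5 spelling of the index lookup.

```swift
var weather = "sunny day"

if let index = weather.firstIndex(of: " ") {
    // Range covering the first word, without the space
    let range = weather.startIndex..<index
    weather.replaceSubrange(range, with: "rainy")
    print(weather) // => rainy day
}
```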
The character view mutation alternative
Many of the string manipulations described above may be applied directly on the string's character view.
It is a good alternative if you find it more comfortable to work directly with a collection of characters.
For example, you can remove the character at a specific index, or directly the first or last characters:
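A sketch of these removals with an illustrative value. It assumes Swift 5, where the collection methods are called on the string itself; in Swift 3 each call would go through the view, for example fruit.characters.removeFirst().

```swift
var fruit = "apple"

// Remove the character at a specific index
fruit.remove(at: fruit.startIndex)
print(fruit) // => pple

// Remove the first character
fruit.removeFirst()
print(fruit) // => ple

// Remove the last character
fruit.removeLast()
print(fruit) // => pl
```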
To reverse a word, use the reversed() method of the character view:
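A minimal sketch, assuming Swift 5 (Swift 3: fruit.characters.reversed()). The reversed characters are wrapped back into a String for printing.

```swift
let fruit = "peach"

let reversed = String(fruit.reversed())
print(reversed) // => hcaep
```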
You can easily filter the string:
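For instance, a sketch that drops the asterisks from an illustrative string. It assumes Swift 5, where filter on a string returns a string; in Swift 3 the call would be made on fruit.characters and the result converted back to a String.

```swift
let fruit = "or*an*ge"

let filtered = fruit.filter { char in
    char != "*"
}
print(filtered) // => orange
```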
Map the string content by applying a transformer closure:
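A sketch that swaps every asterisk for a plus sign. The value is illustrative and the code assumes Swift 5 (Swift 3 would map over fruit.characters). map produces an array of characters, so the result is converted back to a String.

```swift
let fruit = "or*an*ge"

let mapped = fruit.map { char -> Character in
    char == "*" ? "+" : char
}
print(String(mapped)) // => or+an+ge
```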
Or reduce the string content to an accumulator value:
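A sketch that counts the asterisks in an illustrative string, assuming Swift 5 (Swift 3 would reduce over fruit.characters). The accumulator starts at 0 and grows by one for each match.

```swift
let fruit = "or*an*ge"

let numberOfStars = fruit.reduce(0) { countStars, char in
    char == "*" ? countStars + 1 : countStars
}
print(numberOfStars) // => 2
```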
7. Final words
At first sight, the idea of different types of views over a string's content may seem overcomplicated.
In my opinion it is a great implementation. Strings can be viewed from different angles: as a collection of graphemes, UTF-8 or UTF-16 code units, or simple Unicode scalars.
Just pick the view depending on your task. In most cases it is CharacterView.
The character view deals with graphemes that may be composed of one or more Unicode scalars. As a result, the string cannot be indexed by integers (like arrays). Instead a special type of index is applicable: String.Index.
The special index type adds a bit of complexity when accessing individual characters or manipulating strings. I agree to pay this price, because having truly Unicode-aware operations on strings is awesome!
Do you find string views comfortable to use? Write a comment below and let's discuss!
P.S. You might be interested to read my detailed overview of array and dictionary literals in Swift.