Swift 5.0.x String Performance

It turns out that in Swift 5.0.x, String(utf8String:) and String(bytes:encoding:) are considerably slower than String(validatingUTF8:), because they have not yet been optimised to avoid bridging.

Which String initializer are you using, specifically?

String(utf8String:) is a Foundation extension and looking at the commit history, it wasn’t optimized to avoid bridging until 5.1 (commit a088e1322495).

String(validatingUTF8:) should be a ton faster in 5.0.x.
— David Smith (@Catfish_Man) May 16, 2019
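
To make the comparison concrete, here is a minimal sketch that calls both initialisers on the same null-terminated buffer. It only shows that they accept the same input and agree on the result; it does not measure the 5.0.x performance difference. The sample text and the variable names are my own assumptions, and newer toolchains may flag String(validatingUTF8:) as deprecated.

```swift
import Foundation

// "héllo" as a null-terminated UTF-8 C string (illustrative sample input).
let cString: ContiguousArray<CChar> = "héllo".utf8CString

let (viaStdlib, viaFoundation) = cString.withUnsafeBufferPointer { buffer -> (String?, String?) in
    // utf8CString always contains at least the terminator, so baseAddress is non-nil.
    let pointer = buffer.baseAddress!
    return (
        String(validatingUTF8: pointer), // standard library initialiser
        String(utf8String: pointer)      // Foundation initialiser
    )
}

print(viaStdlib == viaFoundation)
```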

In other words, in Swift 5.0.x any String.append(_:) call can potentially take a performance hit.

It’s a bridged NSString, so in the current model either reserveCapacity or the actual append will have to convert to a native Swift String, which will (in 5.0.x) incur O(n) CFStringGetCharacterAtIndex calls for some specific types of NSString.
— David Smith (@Catfish_Man) May 16, 2019

Similarly, any NSTextView.string.append(_:) operation will also suffer.

This is what that performance hit looks like in my case, where a DispatchSourceRead feeds an NSTextView every time there are bytes available to read.

        let fileDescriptor = fileHandleForReading.fileDescriptor
        let readSource = DispatchSource.makeReadSource(fileDescriptor: fileDescriptor, queue: self.queue)
        
        readSource.setEventHandler { [weak readSource = readSource] in
            guard let data = readSource?.data else {
                return
            }
            
            let estimatedBytesAvailableToRead = Int(data)
            
            //before: var buffer = [UInt8](repeating: 0, count: estimatedBytesAvailableToRead)
            var buffer = [CChar](repeating: 0, count: estimatedBytesAvailableToRead)
            let bytesRead = Darwin.read(fileDescriptor, &buffer, estimatedBytesAvailableToRead)
            /*
            Making sure that the buffer ends with a 0 since the bytes read are not guaranteed to.
            `String(validatingUTF8:)` has a requirement that the "cString is A pointer to a null-terminated UTF-8 code sequence."
            */
            buffer.append(0) 
            
            //https://twitter.com/Catfish_Man/status/1128934439096971264
            guard bytesRead > 0, let availableString = String(validatingUTF8: buffer) else {
                return
            }
        
            (completion ?? DispatchQueue.main).async {
                self.output(part: availableString, count: availableString.utf8.count)
            }
        }

Instead, use NSTextView.textStorage.append(_:) and convert your String to an NSAttributedString using NSAttributedString(string:).

//before: textView.string.append(output)
//textStorage is optional on NSTextView, hence the optional chaining
textView.textStorage?.append(NSAttributedString(string: output))

This is what the performance looks like after the code change.

Couldn’t be more grateful for David Smith responding to my request as well as Marcin Krzyzanowski and Paul Goracke for nudging me to take a look at NSTextStorage.

This had the potential to be a huge time sink.

Swift 5.1 will be released sometime after March 18, 2019.

Update

I started noticing some crashes in String(validatingUTF8:) with a Fatal error: UnsafeMutablePointer.initialize overlapping range when using it with an UnsafePointer<CChar>. I am now using the implementation described in the Discussion section of the String documentation instead. Will report back.

Update 2

Re the Fatal error: UnsafeMutablePointer.initialize overlapping range. I hate having bad code around, even if it’s bad sample code. It’s been a learning experience for sure.

String(validatingUTF8:) has a requirement that the “cString is A pointer to a null-terminated UTF-8 code sequence.” This code, on the other hand, makes no guarantee that this will be the case.

var buffer = [CChar](repeating: 0, count: estimatedBytesAvailableToRead)
let bytesRead = Darwin.read(fileDescriptor, &buffer, estimatedBytesAvailableToRead)

Sure, it initialises the character array with 0 (i.e. null¹) but read may fill that array and not end with a 0.
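
One way to sidestep the terminator requirement altogether is to decode only the bytes that were actually read. The sketch below is an assumption on my part, not necessarily the implementation from the String documentation mentioned in the first update. It stands in a completely full buffer for the result of read and uses String(decoding:as:), which takes an explicit collection of code units instead of scanning for a 0. The buffer contents and the variable names are illustrative.

```swift
// Stand-in for the bytes `read` wrote: the buffer is completely full,
// so there is no trailing 0 for a strlen-style scan to find.
let buffer: [UInt8] = Array("full buffer".utf8)
let bytesRead = buffer.count

// Decode only the bytes that were actually read; no terminator is needed.
// Unlike String(validatingUTF8:), invalid UTF-8 is repaired with U+FFFD
// instead of producing nil.
let text = String(decoding: buffer[..<bytesRead], as: UTF8.self)
print(text)
```

The trade-off is that malformed input no longer fails loudly, so this fits best where replacement characters in the output are acceptable.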

Back to the String(validatingUTF8:) initialiser. If you look at the source, it uses UTF8._nullCodeUnitOffset(in:), which “Is an equivalent of strlen for C-strings” and so determines the length of the string from the position of the null terminator (we are going deep into C now). My guess was that, without a terminator, this scan takes a trip down memory lane© looking for one and ends up with a length way beyond what is “acceptable”. What is acceptable, you say?

For that we have to take a look at the source code of UnsafeMutablePointer:

public func initialize(from source: UnsafePointer<Pointee>, count: Int) {
    _debugPrecondition(count >= 0,
        "UnsafeMutablePointer.initialize with negative count")
    _debugPrecondition(UnsafePointer(self) + count <= source || source + count <= UnsafePointer(self),
        "UnsafeMutablePointer.initialize overlapping range")

    Builtin.copyArray(
        Pointee.self, self._rawValue, source._rawValue, count._builtinWordValue)
    // This builtin is equivalent to:
    // for i in 0..<count {
    //   (self + i).initialize(to: source[i])
    // }
}
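
The second precondition is just an interval check: the destination range must end before the source begins, or the other way around. Here is a small sketch of that same logic using a hypothetical helper, isDisjoint, which is my own name and not part of the standard library. If the count is far larger than the real data, as it would be when a length scan runs past the end of a buffer, the two ranges are much more likely to collide.

```swift
// Hypothetical helper mirroring the logic of the precondition shown above.
func isDisjoint<T>(_ destination: UnsafeMutablePointer<T>,
                   _ source: UnsafePointer<T>,
                   count: Int) -> Bool {
    let dest = UnsafePointer(destination)
    return dest + count <= source || source + count <= dest
}

// Eight elements of scratch memory to carve ranges out of.
let storage = UnsafeMutablePointer<CChar>.allocate(capacity: 8)
storage.initialize(repeating: 0, count: 8)

// Destination [4, 8) and source [0, 4) do not touch.
let disjoint = isDisjoint(storage + 4, UnsafePointer(storage), count: 4)
// Destination [2, 6) and source [0, 4) share elements 2 and 3.
let overlapping = isDisjoint(storage + 2, UnsafePointer(storage), count: 4)

print(disjoint, overlapping)

storage.deinitialize(count: 8)
storage.deallocate()
```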

So I decided to take the red pill, go down the rabbit hole and see for myself.

for _ in 1...100_000 {
    // 4096 non-zero bytes, so the buffer contains no null terminator
    let buffer = [CChar](repeating: 1, count: 4096)
    let string = OverlappingRange(validUTF8: buffer).string
    print(string)
}

Welcome to the Matrix².

“Remember…all I’m offering is the truth. Nothing more.”

¹ Brought back memories of learning C++ at university in a Borland C++ editor.
² One run took 2 seconds to hit the UnsafeMutablePointer.initialize overlapping range error, but that wasn’t very exciting.
