Thread 1: Fatal error: Index out of range when index is less than array count - ios

I am getting the error Thread 1: Fatal error: Index out of range in the following callback:
func ReaderConverterCallback(_ converter: AudioConverterRef,
_ packetCount: UnsafeMutablePointer<UInt32>,
_ ioData: UnsafeMutablePointer<AudioBufferList>,
_ outPacketDescriptions: UnsafeMutablePointer<UnsafeMutablePointer<AudioStreamPacketDescription>?>?,
_ context: UnsafeMutableRawPointer?) -> OSStatus {
let reader = Unmanaged<Reader>.fromOpaque(context!).takeUnretainedValue()
//
// Make sure we have a valid source format so we know the data format of the parser's audio packets
//
guard let sourceFormat = reader.parser.dataFormat else {
return ReaderMissingSourceFormatError
}
//
// Check if we've reached the end of the packets. We have two scenarios:
// 1. We've reached the end of the packet data and the file has been completely parsed
// 2. We've reached the end of the data we currently have downloaded, but not the file
//
let packetIndex = Int(reader.currentPacket)
let packets = reader.parser.packets
let isEndOfData = packetIndex >= packets.count - 1
if isEndOfData {
if reader.parser.isParsingComplete {
packetCount.pointee = 0
return ReaderReachedEndOfDataError
} else {
return ReaderNotEnoughDataError
}
}
//
// Copy data over (note we're only processing a single packet of data at a time)
//
let packet = packets[packetIndex] // <--- Thread 1: Fatal error: Index out of range
var data = packet.0
let dataCount = data.count
ioData.pointee.mNumberBuffers = 1
ioData.pointee.mBuffers.mData = UnsafeMutableRawPointer.allocate(byteCount: dataCount, alignment: 0)
_ = data.withUnsafeMutableBytes { (bytes: UnsafeMutablePointer<UInt8>) in
memcpy((ioData.pointee.mBuffers.mData?.assumingMemoryBound(to: UInt8.self))!, bytes, dataCount)
}
ioData.pointee.mBuffers.mDataByteSize = UInt32(dataCount)
//
// Handle packet descriptions for compressed formats (MP3, AAC, etc)
//
let sourceFormatDescription = sourceFormat.streamDescription.pointee
if sourceFormatDescription.mFormatID != kAudioFormatLinearPCM {
if outPacketDescriptions?.pointee == nil {
outPacketDescriptions?.pointee = UnsafeMutablePointer<AudioStreamPacketDescription>.allocate(capacity: 1)
}
outPacketDescriptions?.pointee?.pointee.mDataByteSize = UInt32(dataCount)
outPacketDescriptions?.pointee?.pointee.mStartOffset = 0
outPacketDescriptions?.pointee?.pointee.mVariableFramesInPacket = 0
}
packetCount.pointee = 1
reader.currentPacket = reader.currentPacket + 1
return noErr;
}
The crash occurs even though packetIndex is less than packets.count.
Note: Please compare both questions before marking this as a duplicate. The referenced possible duplicate does not cover the case where the array index is less than the array count.
I am using the https://github.com/syedhali/AudioStreamer/ library to play audio from a URL.

This looks like a multithreading problem. According to the printed logs the index is fine, but another thread may be modifying 'packets', which causes the crash. Consider locking the data-related operations that are shared between threads.
Additional analysis: according to the following lines, packets may be shared between threads.
let reader = Unmanaged<Reader>.fromOpaque(context!).takeUnretainedValue()
//......
let packets = reader.parser.packets
Suggestion: check whether any code that reaches the Reader through the Unmanaged<Reader> reference modifies parser.packets, and introduce a locking strategy.
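A minimal sketch of such a locking strategy, assuming the packets can be moved behind a small wrapper. PacketStore and its methods are hypothetical names and are not part of AudioStreamer; the point is only that the bounds check and the read happen under the same lock as the append.

```swift
import Foundation

/// Hypothetical stand-in for the parser's packet storage (not AudioStreamer API).
final class PacketStore {
    private var packets: [Data] = []
    private let lock = NSLock()

    /// Called from the thread that parses incoming data.
    func append(_ packet: Data) {
        lock.lock()
        defer { lock.unlock() }
        packets.append(packet)
    }

    /// Called from the converter callback. Returns nil instead of trapping
    /// when the index is not (yet) available.
    func packet(at index: Int) -> Data? {
        lock.lock()
        defer { lock.unlock() }
        return packets.indices.contains(index) ? packets[index] : nil
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return packets.count
    }
}

// Simulate many threads appending at once.
let store = PacketStore()
DispatchQueue.concurrentPerform(iterations: 100) { i in
    store.append(Data([UInt8(i)]))
}
let storedCount = store.count
let missingPacket = store.packet(at: 100)
```

Whether a lock is acceptable inside an audio callback depends on the latency budget; a serial queue or a lock-free buffer are alternatives with the same goal.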

Related

Using AudioKit's AVAudioPCMBuffer normalize function to normalize multiple audio files

I've got an array of audio files that I want to normalize so they all have similar perceived loudness. For testing purposes, I decided to adapt the AVAudioPCMBuffer.normalize method from AudioKit to suit my purposes. See here for implementation: https://github.com/AudioKit/AudioKit/blob/main/Sources/AudioKit/Audio%20Files/AVAudioPCMBuffer%2BProcessing.swift
I am converting each file into an AVAudioPCMBuffer, and then performing a reduce on that array of buffers to get the highest peak across all of the buffers. Then I created a new version of normalize, normalize(with peakAmplitude: Float) -> AVAudioPCMBuffer, that takes that peak amplitude, calculates a gainFactor, and then iterates through the floatData for each channel and multiplies each sample by the gainFactor. I then call my new flavor of normalize with the peak.amplitude that I get from the reduce operation on all the audio buffers.
This produces useful results, sometimes.
Here's the actual code in question:
extension AVAudioPCMBuffer {
public func normalize(with peakAmplitude: Float) -> AVAudioPCMBuffer {
guard let floatData = floatChannelData else { return self }
let gainFactor: Float = 1 / peakAmplitude
let length: AVAudioFrameCount = frameLength
let channelCount = Int(format.channelCount)
// i is the index in the buffer
for i in 0 ..< Int(length) {
// n is the channel
for n in 0 ..< channelCount {
let sample = floatData[n][i] * gainFactor
self.floatChannelData?[n][i] = sample
}
}
self.frameLength = length
return self
}
}
extension Array where Element == AVAudioPCMBuffer {
public func normalized() -> [AVAudioPCMBuffer] {
var minPeak = AVAudioPCMBuffer.Peak()
minPeak.amplitude = AVAudioPCMBuffer.Peak.min
let maxPeakForAllBuffers: AVAudioPCMBuffer.Peak = reduce(minPeak) { result, buffer in
guard
let currentBufferPeak = buffer.peak(),
currentBufferPeak.amplitude > result.amplitude
else {
return result
}
return currentBufferPeak
}
return map { $0.normalize(with: maxPeakForAllBuffers.amplitude) }
}
}
Three questions:
Is my approach reasonable for multiple files?
This appears to be using "peak normalization" rather than RMS or EBU R128 normalization. Is that why, when I give it a batch of 3 audio files, 2 of them are correctly made louder but the third is also made louder, even though ffmpeg-normalize on the same batch makes that third file significantly quieter?
Any other suggestions on ways to alter the floatData across multiple AVAudioPCMBuffers in order to make them have similar perceived loudness?
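To make the peak-versus-RMS distinction concrete, here is a small sketch on plain [Float] arrays rather than AVAudioPCMBuffer. The helper names peak and rms and the sample values are made up for illustration. It shows that a single gain derived from the shared peak scales every file by the same factor, so the relative loudness between files stays unchanged.

```swift
/// Largest absolute sample value.
func peak(_ samples: [Float]) -> Float {
    return samples.map { abs($0) }.max() ?? 0
}

/// Root mean square, a rough proxy for average level.
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

// Two made-up "files": one steady signal, one single loud click.
let files: [[Float]] = [
    [0.5, -0.5, 0.5, -0.5],
    [0.8, 0.0, 0.0, 0.0],
]

// One gain for the whole batch, derived from the highest peak.
let sharedPeak = files.map(peak).max() ?? 0
let gain: Float = sharedPeak > 0 ? 1 / sharedPeak : 1
let scaled = files.map { file in file.map { $0 * gain } }

let rmsBefore = files.map(rms)
let rmsAfter = scaled.map(rms)
```

Because the gain is at least 1 whenever the shared peak is at most 1, this scheme can only raise levels or leave them alone. Making one file quieter relative to the others requires a per-file gain based on a loudness measure such as RMS or EBU R128.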

Occasional crashes while read from NSDataAsset

Some of my users reported a crash. I'm not sure if I am doing the right thing, so I am showing the code and the part of the crash report with the error description; maybe someone has a helpful hint for me.
UseCase: A file in the Asset Catalog contains the coordinates of several border lines of countries. There are also some strings in it. This code fragment reads the file and stores the converted data into variables.
SOMETIMES the app crashes (I receive 1 or 2 reports a week) at the last line of the code fragment (let numberOfCountries = Int(dataReadBufferUInt64)). I have a group of beta testers and they have no problem at all. So it is a frustrating issue.
I think it has something to do HOW I read the data out of the file. Maybe I have to use another way?
Any help/comment is welcomed!
DataStructure:
The Asset is a programmatically produced file. The code fragment is dealing with the header of this file.
The first element is a copyright string with some information about the version of the file, creation date, etc.
The string has two elements:
1) the length of the string as a UInt16 value (range 0 ... UInt16.max)
2) the string elements (if length > 0), each "character" as one UInt16 value
Second element is the number of the countries in that file. It is a UInt64 value.
The rest of the data structure is quite complex, but is not relevant here, as the file is read sequentially and IF it crashes, it crashes when reading / converting the UInt64.
There is only one version of the file "out in the wild". Every app reads this file every time the user interface is built. This is why I do not understand the OCCASIONAL crashes ...
The code fragment:
// calculate the size of a record, we use "size" as this is the number of bytes used to store one record
// other possibilities:
// .stride = number of bytes used to store one record and the added nul bytes to align to next memory bounds
// .alignment = number of bytes of alignment bounds
let sizeUInt16 : UInt64 = UInt64(MemoryLayout<UInt16>.size)
let sizeUInt64 : UInt64 = UInt64(MemoryLayout<UInt64>.size)
// get the data out of the asset catalog
if let countryBorderLineData = NSDataAsset(name: "CountryBorderLine data", bundle: Bundle.main)?.data {
// the read buffers, one for each expected data type
var dataReadBufferUInt16 : UInt16 = 0
var dataReadBufferUInt64 : UInt64 = 0
// read the string with the entry comment
// read the length of the string
(countryBorderLineData as NSData).getBytes(&dataReadBufferUInt16, range: NSRange(location: nextLocation,
length: Int(sizeUInt16) ))
// advance the pointer
nextLocation += Int(sizeUInt16)
// take the number of items we should read
var numberOfItemsToRead : Int = Int(dataReadBufferUInt16)
// check if this is not an empty string
if numberOfItemsToRead > 0 {
// target buffer of the string
var UTF16Array : [UInt16] = []
// loop to read all content
for _ in 0 ..< numberOfItemsToRead {
// read next string element
(countryBorderLineData as NSData).getBytes(&dataReadBufferUInt16, range: NSRange(location: nextLocation,
length: Int(sizeUInt16) ))
// advance the pointer
nextLocation += Int(sizeUInt16)
// append read string element to the array
UTF16Array.append(dataReadBufferUInt16)
}
// convert the read array into a string
let resultString = String(utf16CodeUnits: UTF16Array, count: UTF16Array.count)
}
// read the number of countries
(countryBorderLineData as NSData).getBytes(&dataReadBufferUInt64, range: NSRange(location: nextLocation,
length: Int(sizeUInt64) ))
// advance the pointer
nextLocation += Int(sizeUInt64)
// This line SOMETIMES crashes (see crash subset of crash report)
let numberOfCountries = Int(dataReadBufferUInt64)
...
}
This part of the crash report shows the error
Date/Time: 2019-08-28 22:00:06.5042 +0200
Launch Time: 2019-08-28 22:00:02.2638 +0200
OS Version: iPhone OS 12.4 (16G77)
Baseband Version: 1.06.02
Report Version: 104
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x8000000000000010
VM Region Info: 0x8000000000000010 is not in any region. Bytes after previous region: 9223372025580486673
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
MALLOC_NANO 0000000280000000-00000002a0000000 [512.0M] rw-/rwx SM=PRV
--->
UNUSED SPACE AT END
Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [22974]
Triggered by Thread: 3
All methods of this class are called within the following GCD queue:
let myQueueForBorderLines : DispatchQueue = DispatchQueue(
label: "appName.myQueueForBorderLines", qos: .userInitiated)
The used data structures etc. are only read and managed by this class methods, so I think this is not a multithreading issue.
UPDATE
Now that we know numberOfBytesToRead in the old code cannot be the cause of the issue, for now I can only show the recommended usage of Data.
let sizeUInt16 = MemoryLayout<UInt16>.size
let sizeUInt64 = MemoryLayout<UInt64>.size
// get the data out of the asset catalog
if let countryBorderLineData = NSDataAsset(name: "CountryBorderLine data", bundle: Bundle.main)?.data {
var nextLocation = 0
// the read buffers, one for each expected data type
var dataReadBufferUInt16: UInt16 = 0
var dataReadBufferUInt64: UInt64 = 0
// read the string with the entry comment
// read the length of the string
_ = withUnsafeMutableBytes(of: &dataReadBufferUInt16) {bufPtr in
countryBorderLineData.copyBytes(to: bufPtr, from: nextLocation...)
}
// advance the pointer
nextLocation += Int(sizeUInt16)
// take the number of items we should read
let numberOfItemsToRead = Int(dataReadBufferUInt16)
// check if this is not an empty string
if numberOfItemsToRead > 0 {
// target buffer of the string
var utf16Array: [UInt16] = Array(repeating: 0, count: numberOfItemsToRead)
utf16Array.withUnsafeMutableBufferPointer {bufPtr in
countryBorderLineData.copyBytes(to: bufPtr, from: nextLocation...)
}
// advance the pointer
nextLocation += numberOfItemsToRead * sizeUInt16
// convert the read array into a string
let resultString = String(utf16CodeUnits: utf16Array, count: utf16Array.count)
print(resultString)
}
// read the number of countries
_ = withUnsafeMutableBytes(of: &dataReadBufferUInt64) {bufPtr in
countryBorderLineData.copyBytes(to: bufPtr, from: nextLocation...)
}
// advance the pointer
nextLocation += sizeUInt64
// This line SOMETIMES crashes (see crash subset of crash report)
let numberOfCountries = Int(dataReadBufferUInt64)
//...
}
The implementation of Swift.Data has changed recently, so it might be causing some issues, but that is unlikely.
If your code runs in a multithreaded context, that may lead to occasional crashes.
In any case, when you show more context, I will check it and update my answer again.
OLD ANSWER
It depends on how your data is organized (you should show the spec of the NSDataAsset), but your code consumes numberOfBytesToRead * sizeUInt16 bytes with this loop:
// loop to read all content
for _ in 0 ..< numberOfBytesToRead {
//...
// advance the pointer
nextLocation += Int(sizeUInt16)
//...
}
After this loop, nextLocation may be pointing to some unknown position, which may:
- exceed the valid range of your countryBorderLineData
- cause Int(dataReadBufferUInt64) to overflow
- ...
Also, looping over each UTF-16 code unit is not an efficient way to read a UTF-16 string.
I would rewrite your code as:
let sizeUInt16 = MemoryLayout<UInt16>.size
let sizeUInt64 = MemoryLayout<UInt64>.size
if let countryBorderLineData = data {
var nextLocation = 0
// the read buffers, one for each expected data type
var dataReadBufferUInt16: UInt16 = 0
var dataReadBufferUInt64: UInt64 = 0
// read the string with the entry comment
// read the length of the string
_ = withUnsafeMutableBytes(of: &dataReadBufferUInt16) {bufPtr in
countryBorderLineData.copyBytes(to: bufPtr, from: nextLocation...)
}
// advance the pointer
nextLocation += sizeUInt16
// take the number of bytes we should read
let numberOfBytesToRead = Int(dataReadBufferUInt16)
// check if this is not an empty string
if numberOfBytesToRead > 0 {
assert(numberOfBytesToRead.isMultiple(of: sizeUInt16))
// target buffer of the string
var utf16Array: [UInt16] = Array(repeating: 0, count: numberOfBytesToRead/sizeUInt16)
utf16Array.withUnsafeMutableBufferPointer {bufPtr in
countryBorderLineData.copyBytes(to: bufPtr, from: nextLocation...)
}
// advance the pointer
nextLocation += numberOfBytesToRead
// convert the read array into a string
let resultString = String(utf16CodeUnits: utf16Array, count: utf16Array.count)
print(resultString)
}
// read the number of countries
_ = withUnsafeMutableBytes(of: &dataReadBufferUInt64) {bufPtr in
countryBorderLineData.copyBytes(to: bufPtr, from: nextLocation...)
}
// advance the pointer
nextLocation += sizeUInt64
// This line SOMETIMES crashes (see crash subset of crash report)
let numberOfCountries = Int(dataReadBufferUInt64)
//...
}
My guess might be wrong and the code above might not solve your issue. In that case, please show more info about your data and I will correct my answer.
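Both failure modes listed in this answer (reading past the end of the data, and a UInt64 that does not fit into Int) can be turned into a recoverable nil instead of a crash. The following is only a sketch under stated assumptions: ByteReader and parseHeader are hypothetical names, and the file is assumed to store its integers in little-endian order, which the question does not state.

```swift
import Foundation

/// Hypothetical bounds-checked reader; assumes little-endian integers.
struct ByteReader {
    private let bytes: [UInt8]
    private(set) var offset = 0

    init(_ data: Data) { bytes = Array(data) }

    /// Reads one integer, or returns nil if not enough bytes are left.
    mutating func read<T: FixedWidthInteger>(_ type: T.Type) -> T? {
        let size = MemoryLayout<T>.size
        guard size <= bytes.count - offset else { return nil }
        var value: T = 0
        // Assemble from the most significant byte down (little-endian input).
        for byte in bytes[offset..<(offset + size)].reversed() {
            value = (value << 8) | T(truncatingIfNeeded: byte)
        }
        offset += size
        return value
    }
}

/// Parses the header described in the question: UInt16 length,
/// that many UTF-16 code units, then a UInt64 country count.
func parseHeader(_ data: Data) -> (comment: String, countryCount: Int)? {
    var reader = ByteReader(data)
    guard let length = reader.read(UInt16.self) else { return nil }
    var units: [UInt16] = []
    for _ in 0..<Int(length) {
        guard let unit = reader.read(UInt16.self) else { return nil }
        units.append(unit)
    }
    // Int(exactly:) returns nil instead of trapping on out-of-range values.
    guard let rawCount = reader.read(UInt64.self),
          let countryCount = Int(exactly: rawCount) else { return nil }
    return (String(decoding: units, as: UTF16.self), countryCount)
}

// Made-up header: length 2, "Hi", 3 countries.
var header = Data([2, 0, 0x48, 0, 0x69, 0])
header.append(contentsOf: [3, 0, 0, 0, 0, 0, 0, 0])
let parsed = parseHeader(header)
// A truncated header claims 5 code units but contains only 1.
let truncated = parseHeader(Data([5, 0, 0x48, 0]))
```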

Overlapping accesses to "result", but modification requires exclusive access; consider copying to a local variable in xcode 10

open static func PBKDF2(_ password: String, salt: Data,
prf: PRFAlg, rounds: UInt32) throws -> Data {
var result = Data(count:prf.cc.digestLength)
let passwData = password.data(using: String.Encoding.utf8)!
let status = result.withUnsafeMutableBytes { (passwDataBytes: UnsafeMutablePointer<UInt8>) -> CCCryptorStatus in
return CCKeyDerivationPBKDF!(
PBKDFAlgorithm.pbkdf2.rawValue,
(passwData as NSData).bytes,
passwData.count,
(salt as NSData).bytes,
salt.count,
prf.rawValue,
rounds,
passwDataBytes,
result.count)
}
guard status == noErr else { throw CCError(status) }
return result
}
result.withUnsafeMutableBytes gives an error in Xcode 10; in Xcode 9 it is a warning.
That is a consequence of SE-0176 Enforce Exclusive Access to Memory, which was implemented in Swift 4
and is more strict in Swift 4.2.
The solution is to assign the count to a separate variable:
let count = result.count
let status = result.withUnsafeMutableBytes {
(passwDataBytes: UnsafeMutablePointer<UInt8>) -> CCCryptorStatus in
return CCKeyDerivationPBKDF(..., count)
}
or to capture the count in a capture list:
let status = result.withUnsafeMutableBytes { [ count = result.count ]
(passwDataBytes: UnsafeMutablePointer<UInt8>) -> CCCryptorStatus in
return CCKeyDerivationPBKDF(..., count)
}
See also Overlapping access warning in withUnsafeMutableBytes in the Swift forum
hamishknight:
withUnsafeMutableBytes(_:) is a mutating method on Data, therefore it requires write access to data for the duration of the call. By accessing data.count in the closure, you’re starting a new read on data which conflicts with the current write access.
Joe_Groff:
You could also capture a copy of data or its count in the closure’s capture list: ...
The capture list gets evaluated and the results captured by value when the closure is formed, before the exclusive access begins, avoiding the overlapping access.
and this comment in
the corresponding pull request:
Here an appropriate fix is to copy the count of buffer outside of the closure and pass that to inet_ntop() instead.
Update for Swift 5:
let status = result.withUnsafeMutableBytes { [ count = result.count ]
(bufferPointer) -> CCCryptorStatus in
let passwDataBytes = bufferPointer.bindMemory(to: UInt8.self).baseAddress
return CCKeyDerivationPBKDF(..., passwDataBytes, count)
}
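The same pattern can be tried without CommonCrypto. The sketch below is a made-up stand-in (filledBuffer is not from the question) that writes into a Data value inside withUnsafeMutableBytes while using a count that was read before the exclusive access started.

```swift
import Foundation

/// Hypothetical helper: fills a Data value with 0, 1, 2, ...
func filledBuffer(length: Int) -> Data {
    var result = Data(count: length)
    // Read the count BEFORE the mutating call begins its write access.
    let count = result.count
    result.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) in
        // Using `result.count` here would overlap with the write access.
        for i in 0..<count {
            buffer[i] = UInt8(truncatingIfNeeded: i)
        }
    }
    return result
}

let filled = Array(filledBuffer(length: 4))
```

Inside the closure, buffer.count carries the same value, so with the Swift 5 raw-buffer API the separate variable is often not needed at all.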

H.264 Decoder not working properly?

I've gone over the code for this decoder for elementary h.264 bitstreams a hundred times, tweaking things along the way, with no luck. When I send the output CMSampleBuffers to an AVSampleBufferDisplayLayer, they don't appear, presumably because there's something wrong with how I'm decoding them.
I get no error messages anywhere; the AVSampleBufferDisplayLayer has no error and "status" is "1" (aka .rendering), CMSampleBufferIsValid() returns "true" on the outputted CMSampleBuffers, and I encounter no errors in my decoder either.
I'm stumped and my hope is that a more experienced developer can catch something that I'm missing.
I input raw bytes here (typealias FrameData = [UInt8])
func interpretRawFrameData(_ frameData: inout FrameData) -> CMSampleBuffer? {
let size = UInt32(frameData.count)
var naluType = frameData[4] & 0x1F
var frame: CMSampleBuffer?
// The start indices for nested packets. Default to 0.
var ppsStartIndex = 0
var frameStartIndex = 0
switch naluType {
// SPS
case 7:
print("===== NALU type SPS")
for i in 4..<40 {
if frameData[i] == 0 && frameData[i+1] == 0 && frameData[i+2] == 0 && frameData[i+3] == 1 {
ppsStartIndex = i
spsSize = i - 4 // Does not include the size of the header
sps = Array(frameData[4..<i])
// Set naluType to the nested PPS packet's NALU type
naluType = frameData[i + 4] & 0x1F
break
}
}
// If nested frame was found, fallthrough
if ppsStartIndex != 0 { fallthrough }
// PPS
case 8:
print("===== NALU type PPS")
for i in ppsStartIndex+4..<ppsStartIndex+34 {
if frameData[i] == 0 && frameData[i+1] == 0 && frameData[i+2] == 0 && frameData[i+3] == 1 {
frameStartIndex = i
ppsSize = i - spsSize - 8 // Does not include the size of the header. Subtract 8 to account for both the SPS and PPS headers
pps = Array(frameData[ppsStartIndex+4..<i])
// Set naluType to the nested packet's NALU type
naluType = frameData[i+4] & 0x1F
break
}
}
// If nested frame was found, fallthrough
if frameStartIndex != 0 { fallthrough }
// IDR frame
case 5:
print("===== NALU type IDR frame");
// Replace start code with size
let adjustedSize = size - UInt32(ppsSize) - UInt32(spsSize) - 8
var blockSize = CFSwapInt32HostToBig(adjustedSize)
memcpy(&frameData[frameStartIndex], &blockSize, 4)
if createFormatDescription() {
frame = decodeFrameData(Array(frameData[frameStartIndex...]))
}
// B/P frame
default:
print("===== NALU type B/P frame");
// Replace start code with size
var blockSize = CFSwapInt32HostToBig(size)
memcpy(&frameData[frameStartIndex], &blockSize, 4)
frame = decodeFrameData(Array(frameData[frameStartIndex...]))
break;
}
return frame
}
And this is where the actual decoding happens:
private func decodeFrameData(_ frameData: FrameData) -> CMSampleBuffer? {
let bufferPointer = UnsafeMutablePointer<UInt8>(mutating: frameData)
let size = frameData.count
var blockBuffer: CMBlockBuffer?
var status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
bufferPointer,
size,
kCFAllocatorNull,
nil, 0, frameData.count,
0, &blockBuffer)
if status != kCMBlockBufferNoErr { return nil }
var sampleBuffer: CMSampleBuffer?
let sampleSizeArray = [size]
status = CMSampleBufferCreateReady(kCFAllocatorDefault,
blockBuffer,
formatDesc,
1, 0, &sampleTimingInfo,
1, sampleSizeArray,
&sampleBuffer)
if let buffer = sampleBuffer, status == kCMBlockBufferNoErr {
let attachments: CFArray? = CMSampleBufferGetSampleAttachmentsArray(buffer, true)
if let attachmentArray = attachments {
let dic = unsafeBitCast(CFArrayGetValueAtIndex(attachmentArray, 0), to: CFMutableDictionary.self)
let key = Unmanaged.passUnretained(kCMSampleAttachmentKey_DisplayImmediately).toOpaque()
let val = Unmanaged.passUnretained(kCFBooleanTrue).toOpaque()
CFDictionarySetValue(dic,
key,
val)
}
print("===== Successfully created sample buffer")
return buffer
}
return nil
}
Other things to note:
The formatDescription contains the correct information (mediaType = "vide", mediaSubType = "avc1", dimensions = "640x480")
The bitstream I'm decoding always groups the SPS, PPS, and IDR frames together and sends them as one big packet every 20 or so frames. Every other time, an individual B/P frame is sent.
Thanks!
That code was pretty ugly so I decided to touch it up a little bit. Turned out that did the trick. Something must have been wrong in there.
Anyways, here's a working version. It sends the decoded & decompressed frame to its delegate. Ideally, whoever calls interpretRawFrameData would be returned a displayable frame, and I'll work on that, but in the meantime this works too.
https://github.com/philipshen/H264Decoder
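For readers following the "replace start code with size" step in the question, here is a sketch of that conversion on plain byte arrays. It is not taken from the linked repository; splitAnnexB and lengthPrefixed are hypothetical helpers. It splits a packet on 00 00 01 / 00 00 00 01 start codes and prefixes each NAL unit with its own 4-byte big-endian length, so each unit's size is computed independently of the SPS and PPS sizes.

```swift
/// Splits an Annex B packet into NAL units (start codes removed).
func splitAnnexB(_ bytes: [UInt8]) -> [[UInt8]] {
    var units: [[UInt8]] = []
    var unitStart: Int? = nil
    var i = 0
    while i + 3 <= bytes.count {
        // A 4-byte start code 00 00 00 01 ends with the 3-byte code 00 00 01.
        if bytes[i] == 0 && bytes[i + 1] == 0 && bytes[i + 2] == 1 {
            if let start = unitStart {
                var end = i
                // Drop zero bytes that belong to the next start code.
                while end > start && bytes[end - 1] == 0 { end -= 1 }
                if end > start { units.append(Array(bytes[start..<end])) }
            }
            unitStart = i + 3
            i += 3
        } else {
            i += 1
        }
    }
    if let start = unitStart, start < bytes.count {
        units.append(Array(bytes[start...]))
    }
    return units
}

/// Prefixes a NAL unit with its length as a 4-byte big-endian integer.
func lengthPrefixed(_ unit: [UInt8]) -> [UInt8] {
    let length = UInt32(unit.count).bigEndian
    return withUnsafeBytes(of: length) { Array($0) } + unit
}

// Made-up packet: SPS (type 7), PPS (type 8), IDR (type 5).
let packet: [UInt8] = [0, 0, 0, 1, 0x67, 0xAA,
                       0, 0, 0, 1, 0x68, 0xBB,
                       0, 0, 1, 0x65, 0xCC]
let nalUnits = splitAnnexB(packet)
let nalTypes = nalUnits.map { $0[0] & 0x1F }
let framedIDR = lengthPrefixed(nalUnits[2])
```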

What is a safe way to turn streamed (utf8) data into a string?

Suppose I'm a server written in objc/swift. The client is sending me a large amount of data, which is really a large utf8 encoded string. As the server, I have my NSInputStream firing events to say it has data to read. I grab the data and build up a string with it.
However what if the next chunk of data I get falls on an unfortunate position in the utf8 data? Like on a composed character. It seems like it would mess the string up if you try to append a chunk of non compliant utf8 to it.
What is a suitable way to deal with this? I was thinking I could just keep the data as an NSData, but then I don't have any way to know when the data has finished being received (think HTTP where the length of data is in the header).
Thanks for any ideas.
The tool you probably want to use here is UTF8. It will handle all the state issues for you. See How to cast decrypted UInt8 to String? for a simple example that you can likely adapt.
The major concern in building up a string from UTF-8 data isn't composed characters, but rather multi-byte characters. "LATIN SMALL LETTER A" + "COMBINING GRAVE ACCENT" works fine even if you decode each of those characters separately. What doesn't work is gathering the first byte of 你, decoding it, and then appending the decoded second byte. The UTF8 type will handle this for you, though. All you need to do is bridge your NSInputStream to a GeneratorType.
Here's a basic (not fully production-ready) example of what I'm talking about. First, we need a way to convert an NSInputStream into a generator. That's probably the hardest part:
final class StreamGenerator {
static let bufferSize = 1024
let stream: NSInputStream
var buffer = [UInt8](count: StreamGenerator.bufferSize, repeatedValue: 0)
var buffGen = IndexingGenerator<ArraySlice<UInt8>>([])
init(stream: NSInputStream) {
self.stream = stream
stream.open()
}
}
extension StreamGenerator: GeneratorType {
func next() -> UInt8? {
// Check the stream status
switch stream.streamStatus {
case .NotOpen:
assertionFailure("Cannot read unopened stream")
return nil
case .Writing:
preconditionFailure("Impossible status")
case .AtEnd, .Closed, .Error:
return nil // FIXME: May want a closure to post errors
case .Opening, .Open, .Reading:
break
}
// First see if we can feed from our buffer
if let result = buffGen.next() {
return result
}
// Our buffer is empty. Block until there is at least one byte available
let count = stream.read(&buffer, maxLength: buffer.count)
if count <= 0 { // FIXME: Probably want a closure or something to handle error cases
stream.close()
return nil
}
buffGen = buffer.prefix(count).generate()
return buffGen.next()
}
}
Calls to next() can block here, so it should not be called on the main queue, but other than that, it's a standard Generator that spits out bytes. (This is also the piece that probably has lots of little corner cases that I'm not handling, so you want to think this through pretty carefully. Still, it's not that complicated.)
With that, creating a UTF-8 decoding generator is almost trivial:
final class UnicodeScalarGenerator<ByteGenerator: GeneratorType where ByteGenerator.Element == UInt8> {
var byteGenerator: ByteGenerator
var utf8 = UTF8()
init(byteGenerator: ByteGenerator) {
self.byteGenerator = byteGenerator
}
}
extension UnicodeScalarGenerator: GeneratorType {
func next() -> UnicodeScalar? {
switch utf8.decode(&byteGenerator) {
case .Result(let scalar): return scalar
case .EmptyInput: return nil
case .Error: return nil // FIXME: Probably want a closure or something to handle error cases
}
}
}
You could of course trivially turn this into a CharacterGenerator instead (using Character(_:UnicodeScalar)).
The last problem is if you want to combine all combining marks, such that "LATIN SMALL LETTER A" followed by "COMBINING GRAVE ACCENT" would always be returned together (rather than as the two characters they are). That's actually a bit trickier than it sounds. First, you'd need to generate Strings, not Characters. And then you'd need a good way to know what all the combining characters are. That's certainly knowable, but I'm having a little trouble deriving a simple algorithm. There's no "combiningMarkCharacterSet" in Cocoa. I'm still thinking about it. Getting something that "mostly works" is easy, but I'm not sure yet how to build it so that it's correct for all of Unicode.
Here's a little sample program to try it out:
let textPath = NSBundle.mainBundle().pathForResource("text.txt", ofType: nil)!
let inputStream = NSInputStream(fileAtPath: textPath)!
inputStream.open()
dispatch_async(dispatch_get_global_queue(0, 0)) {
let streamGen = StreamGenerator(stream: inputStream)
let unicodeGen = UnicodeScalarGenerator(byteGenerator: streamGen)
var string = ""
for c in GeneratorSequence(unicodeGen) {
print(c)
string += String(c)
}
print(string)
}
And a little text to read:
Here is some normalish álfa你好 text
And some Zalgo i̝̲̲̗̹̼n͕͓̘v͇̠͈͕̻̹̫͡o̷͚͍̙͖ke̛̘̜̘͓̖̱̬ composed stuff
And one more line with no newline
(That second line is some Zalgo encoded text, which is nice for testing.)
I haven't done any testing with this in a real blocking situation, like reading from the network, but it should work based on how NSInputStream works (i.e. it should block until there's at least one byte to read, but then should just fill the buffer with whatever's available).
I've made all of this match GeneratorType so that it plugs into other things easily, but error handling might work better if you didn't use GeneratorType and instead created your own protocol with next() throws -> Self.Element instead. Throwing would make it easier to propagate errors up the stack, but would make it harder to plug into for...in loops.
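The code above uses Swift 2 names. As a rough sketch of the same idea in current Swift, where GeneratorType became IteratorProtocol and the decoder result case is .scalarValue: decodeScalars is a hypothetical helper, and the chunked input is simulated with an array of byte arrays instead of a real stream. Because the decoder pulls bytes from a single iterator, it never sees the chunk boundary.

```swift
/// Decodes UTF-8 from any byte iterator; stops at end of input or on error.
func decodeScalars<I: IteratorProtocol>(from bytes: I) -> String
    where I.Element == UInt8 {
    var bytes = bytes
    var decoder = UTF8()
    var result = ""
    while case .scalarValue(let scalar) = decoder.decode(&bytes) {
        result.unicodeScalars.append(scalar)
    }
    return result
}

// "a你" split in the middle of the three-byte sequence for 你 (E4 BD A0).
let chunks: [[UInt8]] = [[0x61, 0xE4], [0xBD, 0xA0]]
let decoded = decodeScalars(from: chunks.joined().makeIterator())
```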
I'm revisiting this question because I've had the very same problem to solve.
My solution uses UTF8.ForwardParser, so it works with chunks of UInt8 values and keeps around the bytes of a scalar that may be split across two consecutive chunks.
// This class generates chunks of bytes from the given InputStream
final class ChunksGenerator {
static let bSize = 1024
let stream: InputStream
var buffer = Array<UInt8>(repeating: 0, count: bSize)
init(_ stream: InputStream) {
self.stream = stream
self.stream.open()
}
// Pull a chunk of bytes from the stream
func pull() throws -> ArraySlice<UInt8> {
switch stream.streamStatus {
// We've got to read the stream
case .opening: fallthrough
case .open: fallthrough
case .reading: break
// We're either done reading or having an error
case .error: fallthrough
case .atEnd:
stream.close()
if let error = stream.streamError {
throw error
} else {
fallthrough
}
case .closed: fallthrough
case .notOpen: return []
// Let's also address other status
case .writing: fallthrough
@unknown default: preconditionFailure("status: \(stream.streamStatus) not manageable for InputStream")
}
// read from stream in buffer
let length = stream.read(&buffer, maxLength: Self.bSize)
guard
length > 0
else {
// Either stream.read(_:maxLength:) returned 0 or -1
defer {
stream.close()
}
if length == 0 {
return []
}
throw stream.streamError!
}
return buffer.prefix(length)
}
}
// This Iterator returns Character from an InputStream
struct CharacterParser: IteratorProtocol {
typealias Element = Character
let chunksGenerator: ChunksGenerator
var chunk: Array<UInt8> = []
var chunkIterator: IndexingIterator<Array<UInt8>> = [].makeIterator()
var error: Swift.Error? = nil
var utf8Parser = UTF8.ForwardParser()
init(inputStream: InputStream) {
self.chunksGenerator = ChunksGenerator(inputStream)
self.chunk = _pulledChunk ?? []
self.chunkIterator = chunk.makeIterator()
}
mutating func next() -> Character? {
switch utf8Parser.parseScalar(from: &chunkIterator) {
case .valid(let encoded):
// We've parsed a scalar encoded in UTF8,
// let's decode it and return the Character:
let scalar = UTF8.decode(encoded)
return Character(scalar)
case .emptyInput:
// We've consumed this chunk of bytes,
// let's pull another one from the stream
// and update the iterator's underlying data:
chunk = _pulledChunk ?? []
self.chunkIterator = chunk.makeIterator()
guard
// In case we've pulled an empty one then
// we're done and we return nil
!chunk.isEmpty
else { return nil }
return next()
case .error(length: let length):
// We've gotten a parsing error, therefore
// the suffix of the actual chunk up to the
// length from the parse error contains bytes
// of a potential encoded scalar spanning
// across two chunks:
let remaninigChunk = chunk.suffix(length)
let pulledChunk = _pulledChunk ?? []
chunk = remaninigChunk + pulledChunk
chunkIterator = chunk.makeIterator()
guard
chunk.count > remaninigChunk.count
else {
// No more data could be pulled from the stream:
// this is it, the stream ends with bytes that aren't an UTF8 scalar, thus we set the error and return nil.
self.error = DecodingError.dataCorrupted(DecodingError.Context(codingPath: [], debugDescription: "Parse error. Bytes: \(remaninigChunk) cannot be parsed into a valid UTF8 scalar", underlyingError: self.error))
return nil
}
return next()
}
}
// Attempt to pull a chunk from the stream,
// in case there was an error we set it in this
// iterator and return nil.
private var _pulledChunk: Array<UInt8>? {
mutating get {
do {
let pulled = try chunksGenerator.pull()
return Array(pulled)
} catch let e {
self.error = e
return nil
}
}
}
}
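The carry-over idea used here (hold back the bytes of a scalar that is split between two chunks) can also be sketched without the parser API. IncrementalUTF8Decoder below is a hypothetical type, not part of the answer's code: it inspects only the tail of the pending bytes, decodes everything before an incomplete trailing sequence, and keeps the rest for the next chunk.

```swift
/// Hypothetical chunk-wise decoder that buffers an incomplete trailing sequence.
struct IncrementalUTF8Decoder {
    private var pending: [UInt8] = []

    /// Appends a chunk and returns the text that is complete so far.
    mutating func append(_ chunk: [UInt8]) -> String {
        pending += chunk
        let cut = completePrefixLength()
        // Invalid sequences inside the prefix become U+FFFD.
        let text = String(decoding: pending[..<cut], as: UTF8.self)
        pending.removeFirst(cut)
        return text
    }

    /// Number of leading bytes that do not end in a cut-off sequence.
    private func completePrefixLength() -> Int {
        var i = pending.count
        var seen = 0
        while i > 0 && seen < 4 {
            i -= 1
            seen += 1
            let byte = pending[i]
            // Continuation bytes (10xxxxxx): keep looking for the lead byte.
            if byte & 0b1100_0000 == 0b1000_0000 { continue }
            let needed: Int
            if byte & 0b1000_0000 == 0 { needed = 1 }
            else if byte & 0b1110_0000 == 0b1100_0000 { needed = 2 }
            else if byte & 0b1111_0000 == 0b1110_0000 { needed = 3 }
            else if byte & 0b1111_1000 == 0b1111_0000 { needed = 4 }
            else { needed = 1 } // invalid lead byte: let String repair it
            return needed > seen ? i : pending.count
        }
        return pending.count
    }
}

// "a你b" with 你 (E4 BD A0) split across the two chunks.
var chunkDecoder = IncrementalUTF8Decoder()
let firstPart = chunkDecoder.append([0x61, 0xE4])
let secondPart = chunkDecoder.append([0xBD, 0xA0, 0x62])
```

Unlike the iterator above, this variant reports malformed input as replacement characters rather than through an error property, which may or may not be what a server wants.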
