How to efficiently write large files to disk on background thread (Swift) - ios

Update
I have resolved and removed the distracting error. Please read the entire post and feel free to leave comments if any questions remain.
Background
I am attempting to write relatively large files (video) to disk on iOS using Swift 2.0, GCD, and a completion handler. I would like to know if there is a more efficient way to perform this task. The task needs to be done without blocking the main UI, use completion logic, and run as quickly as possible. I have custom objects with an NSData property, so I am currently experimenting with an extension on NSData. As an example, an alternate solution might use NSFileHandle or NSStream coupled with some form of thread-safe behavior that yields much faster throughput than the NSData writeToURL function on which I base the current solution.
What's wrong with NSData Anyway?
Please note the following discussion taken from the NSData Class Reference (Saving Data). I do perform writes to my temp directory; however, the main reason I am having an issue is that I can see a noticeable lag in the UI when dealing with large files. This lag occurs precisely because the NSData write methods are synchronous (and the Apple docs note that atomic writes can cause performance issues on "large" files, ~ > 1 MB). So when dealing with large files one is at the mercy of whatever internal mechanism is at work within the NSData methods.
I did some more digging and found this info from Apple: "This method is ideal for converting data:// URLs to NSData objects, and can also be used for reading short files synchronously. If you need to read potentially large files, use inputStreamWithURL: to open a stream, then read the file a piece at a time." (NSData Class Reference, Objective-C, +dataWithContentsOfURL). This info seems to imply that I could try using streams to write the file out on a background thread if moving the writeToURL call to the background thread (as suggested by @jtbandes) is not sufficient.
The NSData class and its subclasses provide methods to quickly and
easily save their contents to disk. To minimize the risk of data loss,
these methods provide the option of saving the data atomically. Atomic
writes guarantee that the data is either saved in its entirety, or it
fails completely. The atomic write begins by writing the data to a
temporary file. If this write succeeds, then the method moves the
temporary file to its final location.
While atomic write operations minimize the risk of data loss due to
corrupt or partially-written files, they may not be appropriate when
writing to a temporary directory, the user’s home directory or other
publicly accessible directories. Any time you work with a publicly
accessible file, you should treat that file as an untrusted and
potentially dangerous resource. An attacker may compromise or corrupt
these files. The attacker can also replace the files with hard or
symbolic links, causing your write operations to overwrite or corrupt
other system resources.
Avoid using the writeToURL:atomically: method (and the related
methods) when working inside a publicly accessible directory. Instead
initialize an NSFileHandle object with an existing file descriptor and
use the NSFileHandle methods to securely write the file.
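For completeness, here is a minimal sketch of that NSFileHandle approach (Swift 2; the path and someData are placeholders of my own, not code from my app):

import Foundation

//Minimal NSFileHandle write (sketch); path and someData are placeholders
let someData = NSData()
let path = NSTemporaryDirectory() + "example.bin"
NSFileManager.defaultManager().createFileAtPath(path, contents: nil, attributes: nil)
if let handle = NSFileHandle(forWritingAtPath: path) {
    handle.writeData(someData)
    handle.closeFile()
}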
Other Alternatives
One article on Concurrent Programming at objc.io provides interesting options on "Advanced: File I/O in the Background". Some of the options involve use of an InputStream as well. Apple also has some older references to reading and writing files asynchronously. I am posting this question in anticipation of Swift alternatives.
Example of an appropriate answer
Here is an example of an appropriate answer that might satisfy this type of question. (Taken from the Stream Programming Guide, Writing To Output Streams)
Using an NSOutputStream instance to write to an output stream requires several steps (a sketch follows the list):
1. Create and initialize an instance of NSOutputStream with a repository for the written data. Also set a delegate.
2. Schedule the stream object on a run loop and open the stream.
3. Handle the events that the stream object reports to its delegate.
4. If the stream object has written data to memory, obtain the data by requesting the NSStreamDataWrittenToMemoryStreamKey property.
5. When there is no more data to write, dispose of the stream object.
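To make those steps concrete, here is a bare-bones Swift 2 sketch of a delegate-driven writer. The class name and the in-memory destination are illustrative only; this is not working production code:

import Foundation

class MemoryStreamWriter: NSObject, NSStreamDelegate {

    let output = NSOutputStream(toMemory: ())   //step 1: a stream with a repository

    func start() {
        output.delegate = self                  //step 1: set a delegate
        output.scheduleInRunLoop(NSRunLoop.currentRunLoop(), forMode: NSDefaultRunLoopMode)   //step 2: schedule
        output.open()                           //step 2: open the stream
    }

    func stream(aStream: NSStream, handleEvent eventCode: NSStreamEvent) {
        switch eventCode {                      //step 3: handle delegate events
        case NSStreamEvent.HasSpaceAvailable:
            //write pending bytes here with output.write(_:maxLength:)
            break
        case NSStreamEvent.EndEncountered:
            //step 4: recover anything written to memory
            let written = output.propertyForKey(NSStreamDataWrittenToMemoryStreamKey) as? NSData
            _ = written
            output.close()                      //step 5: dispose of the stream
        default:
            break
        }
    }
}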
I am looking for the most proficient algorithm for writing extremely large files to disk on iOS. Answers using Swift or existing APIs are preferred, but C/Obj-C would suffice; I can transpose the algorithm into appropriate Swift 2.0 compatible constructs.
Nota Bene
I understand the informational error below; it is included for completeness. This question asks whether or not there is a better algorithm for writing large files to disk with a guaranteed dependency sequence (e.g. NSOperation dependencies). If there is, please provide enough information (a description or sample for me to reconstruct pertinent Swift 2.0 compatible code). Please advise if I am missing any information that would help answer the question.
Note on the extension
I've added a completion handler to the base writeToURL to ensure that
no unintended resource sharing occurs. My dependent tasks that use the file
should never face a race condition.
extension NSData {
    func writeToURL(named: String, completion: (result: Bool, url: NSURL?) -> Void) {
        let filePath = NSTemporaryDirectory() + named
        let tmpURL = NSURL(fileURLWithPath: filePath)
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) { [weak self] in
            //write to URL atomically; report failure as well as success
            if self?.writeToURL(tmpURL, atomically: true) == true
                && NSFileManager.defaultManager().fileExistsAtPath(filePath) {
                completion(result: true, url: tmpURL)
            } else {
                completion(result: false, url: tmpURL)
            }
        }
    }
}
This method is used to process the custom object's data from a controller using:
var items = [AnyObject]()
if let video = myCustomClass.data {
    //video is of type NSData
    video.writeToURL("shared.mp4", completion: { (result, url) -> Void in
        if result {
            items.append(url!)
            if items.count > 0 {
                //hop back to the main queue before presenting UI
                dispatch_async(dispatch_get_main_queue()) {
                    let sharedActivityView = UIActivityViewController(activityItems: items, applicationActivities: nil)
                    self.presentViewController(sharedActivityView, animated: true) { () -> Void in
                        //finished
                    }
                }
            }
        }
    })
}
Conclusion
The Apple docs on Core Data Performance provide some good advice on dealing with memory pressure and managing BLOBs. It is really one heck of an article, with a lot of clues about behavior and how to mitigate the issue of large files within your app. Although it is specific to Core Data rather than files, the warning on atomic writing tells me that I ought to implement methods that write atomically with great care.
With large files, the only safe way to manage writing seems to be adding in a completion handler (to the write method) and showing an activity view on the main thread. Whether one does that with a stream or by modifying an existing API to add completion logic is up to the reader. I've done both in the past and am in the midst of testing for best performance.
Until then, I'm changing the solution to remove all binary data properties from Core Data and replacing them with strings to hold asset URLs on disk. I am also leveraging the built in functionality from Assets Library and PHAsset to grab and store all related asset URLs. When or if I need to copy any assets I will use standard API methods (export methods on PHAsset/Asset Library) with completion handlers to notify user of finished state on the main thread.
(Really useful snippets from the Core Data Performance article)
Reducing Memory Overhead
It is sometimes the case that you want to use managed objects on a
temporary basis, for example to calculate an average value for a
particular attribute. This causes your object graph, and memory
consumption, to grow. You can reduce the memory overhead by
re-faulting individual managed objects that you no longer need, or you
can reset a managed object context to clear an entire object graph.
You can also use patterns that apply to Cocoa programming in general.
You can re-fault an individual managed object using
NSManagedObjectContext’s refreshObject:mergeChanges: method. This has
the effect of clearing its in-memory property values thereby reducing
its memory overhead. (Note that this is not the same as setting the
property values to nil—the values will be retrieved on demand if the
fault is fired—see Faulting and Uniquing.)
When you create a fetch request you can set includesPropertyValues to NO to reduce memory overhead by avoiding creation of objects to represent the property values. You should typically only do so, however, if you are sure that either you will not need the actual property data or you already have the information in the row cache, otherwise you will incur multiple trips to the persistent store.
You can use the reset method of NSManagedObjectContext to remove all managed objects associated with a context and "start over" as if you'd just created it. Note that any managed object associated with that context will be invalidated, and so you will need to discard any references to and re-fetch any objects associated with that context in which you are still interested. If you iterate over a lot of objects, you may need to use local autorelease pool blocks to ensure temporary objects are deallocated as soon as possible.
If you do not intend to use Core Data’s undo functionality,
you can reduce your application's resource requirements by setting the
context’s undo manager to nil. This may be especially beneficial for
background worker threads, as well as for large import or batch
operations.
Finally, Core Data does not by default keep strong
references to managed objects (unless they have unsaved changes). If
you have lots of objects in memory, you should determine the owning
references. Managed objects maintain strong references to each other
through relationships, which can easily create strong reference
cycles. You can break cycles by re-faulting objects (again by using
the refreshObject:mergeChanges: method of NSManagedObjectContext).
Large Data Objects (BLOBs)
If your application uses large BLOBs ("Binary Large OBjects" such as
image and sound data), you need to take care to minimize overheads.
The exact definition of “small”, “modest”, and “large” is fluid and
depends on an application’s usage. A loose rule of thumb is that
objects in the order of kilobytes in size are of a “modest” sized and
those in the order of megabytes in size are “large” sized. Some
developers have achieved good performance with 10MB BLOBs in a
database. On the other hand, if an application has millions of rows in
a table, even 128 bytes might be a "modest" sized CLOB (Character
Large OBject) that needs to be normalized into a separate table.
In general, if you need to store BLOBs in a persistent store, you
should use an SQLite store. The XML and binary stores require that the
whole object graph reside in memory, and store writes are atomic (see
Persistent Store Features) which means that they do not efficiently
deal with large data objects. SQLite can scale to handle extremely
large databases. Properly used, SQLite provides good performance for
databases up to 100GB, and a single row can hold up to 1GB (although
of course reading 1GB of data into memory is an expensive operation no
matter how efficient the repository).
A BLOB often represents an attribute of an entity—for example, a
photograph might be an attribute of an Employee entity. For small to
modest sized BLOBs (and CLOBs), you should create a separate entity
for the data and create a to-one relationship in place of the
attribute. For example, you might create Employee and Photograph
entities with a one-to-one relationship between them, where the
relationship from Employee to Photograph replaces the Employee's
photograph attribute. This pattern maximizes the benefits of object
faulting (see Faulting and Uniquing). Any given photograph is only
retrieved if it is actually needed (if the relationship is traversed).
It is better, however, if you are able to store BLOBs as resources on
the filesystem, and to maintain links (such as URLs or paths) to those
resources. You can then load a BLOB as and when necessary.
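That last recommendation is essentially the pattern I am adopting: keep the payload on disk and store only a link. A hedged Swift 2 sketch (the "Asset" entity, its urlString attribute, and the context variable are assumptions of mine, not part of the quoted guidance):

import CoreData

//write the BLOB to disk, keep only its URL string in Core Data
//`context` is an NSManagedObjectContext assumed to be in scope
let video = NSData()   //placeholder payload
let fileURL = NSURL(fileURLWithPath: NSTemporaryDirectory()).URLByAppendingPathComponent("clip.mp4")
if video.writeToURL(fileURL, atomically: true) {
    let asset = NSEntityDescription.insertNewObjectForEntityForName("Asset", inManagedObjectContext: context)
    asset.setValue(fileURL.absoluteString, forKey: "urlString")   //link, not BLOB
    _ = try? context.save()
}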
Note:
I've moved the logic below into the completion handler (see the code
above) and I no longer see any error. As mentioned before this
question is about whether or not there is a more performant way to
process large files in iOS using Swift.
When attempting to process the resulting items array to pass to a UIActvityViewController, using the following logic:
if items.count > 0 {
    let sharedActivityView = UIActivityViewController(activityItems: items, applicationActivities: nil)
    self.presentViewController(sharedActivityView, animated: true) { () -> Void in
        //finished
    }
}
I am seeing the following error: Communications error: { count = 1,
contents = "XPCErrorDescription" => { length =
22, contents = "Connection interrupted" } }> (please note, I am looking for a better design, not an answer to this error message)

Performance depends on whether or not the data fits in RAM. If it does, then you should use NSData writeToURL with the atomically feature turned on, which is what you're doing.
Apple's notes about this being dangerous when "writing to a public directory" are completely irrelevant on iOS because there are no public directories. That section only applies to OS X. And frankly it's not really important there either.
So, the code you've written is as efficient as possible as long as the video fits in RAM (about 100MB would be a safe limit).
For files that don't fit in RAM, you need to use a stream or your app will crash while holding the video in memory. To download a large video from a server and write it to disk, you should use NSURLSessionDownloadTask.
In general, streaming (including NSURLSessionDownloadTask) will be orders of magnitude slower than NSData.writeToURL(). So don't use a stream unless you need to. All operations on NSData are extremely fast, it is perfectly capable of dealing with files that are multiple terabytes in size with excellent performance on OS X (iOS obviously can't have files that large, but it's the same class with the same performance).
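For the download case mentioned above, a hedged Swift 2 sketch of NSURLSessionDownloadTask (the URL and file names are placeholders):

import Foundation

//NSURLSessionDownloadTask streams straight to a temp file on disk,
//so the payload never has to fit in RAM; URL and names are placeholders
let url = NSURL(string: "https://example.com/large.mp4")!
let task = NSURLSession.sharedSession().downloadTaskWithURL(url) { tempURL, response, error in
    guard let tempURL = tempURL where error == nil else { return }
    let destURL = NSURL(fileURLWithPath: NSTemporaryDirectory()).URLByAppendingPathComponent("large.mp4")
    _ = try? NSFileManager.defaultManager().moveItemAtURL(tempURL, toURL: destURL)
}
task.resume()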
There are a few issues in your code.
This is wrong:
let filePath = NSTemporaryDirectory() + named
Instead always do:
let filePath = (NSTemporaryDirectory() as NSString).stringByAppendingPathComponent(named)
But that's not ideal either, you should avoid using paths (they are buggy and slow). Instead use a URL like this:
let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory())
let fileURL = tmpDir.URLByAppendingPathComponent(named)
Also, you're using a path to check if the file exists... don't do this:
if NSFileManager.defaultManager().fileExistsAtPath( filePath ) {
Instead use NSURL to check if it exists:
if fileURL.checkResourceIsReachableAndReturnError(nil) {
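Folding both fixes back into the question's extension might look like this (a hedged Swift 2 sketch, not a definitive implementation):

import Foundation

extension NSData {
    func writeToURL(named: String, completion: (result: Bool, url: NSURL?) -> Void) {
        let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory())
        let fileURL = tmpDir.URLByAppendingPathComponent(named)
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
            //write, then verify reachability via the URL, not a path
            let ok = self.writeToURL(fileURL, atomically: true)
                && fileURL.checkResourceIsReachableAndReturnError(nil)
            completion(result: ok, url: ok ? fileURL : nil)
        }
    }
}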

Latest Solution (2018)
Another useful possibility might be a pair of closures: one fired whenever the buffer is filled (or a timed length of recording elapses) to append the data, and one to announce the end of the stream of data. In combination with some of the Photos APIs this could lead to good outcomes. Some declarative code like the below could be fired during processing:
var dataSpoolingFinished: ((URL?, Error?) -> Void)?
var dataSpooling: ((Data?, Error?) -> Void)?
Handling these closures in your management object may allow you to succinctly handle data of any size while keeping the memory under control.
Couple that idea with the use of a recursive method that aggregates pieces of work into a single dispatch group, and there could be some exciting possibilities.
Apple docs state:
DispatchGroup allows for aggregate synchronization of work. You can
use them to submit multiple different work items and track when they
all complete, even though they might run on different queues. This
behavior can be helpful when progress can’t be made until all of the
specified tasks are complete.
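A hedged sketch of how those pieces might compose inside the management object that declares the closures above (modern Swift; the spool file, queue label, and chunk source are assumptions of mine):

import Foundation

//wiring the closures above to a DispatchGroup; destination file is a placeholder
let group = DispatchGroup()
let writeQueue = DispatchQueue(label: "com.example.spooler")
let spoolURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("spool.bin")
FileManager.default.createFile(atPath: spoolURL.path, contents: nil)
let fileHandle = try! FileHandle(forWritingTo: spoolURL)

dataSpooling = { data, error in
    guard let data = data, error == nil else { return }
    group.enter()
    writeQueue.async {
        fileHandle.write(data)   //append each chunk as it arrives
        group.leave()
    }
}

dataSpoolingFinished = { url, error in
    group.notify(queue: .main) {   //fires once every outstanding chunk write has landed
        fileHandle.closeFile()
        //safe to hand `url` to the Photos APIs or the UI here
    }
}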
Other Noteworthy Solutions (~2016)
I have no doubt that I will refine this some more but the topic is complex enough to warrant a separate self-answer. I decided to take some advice from the other answers and leverage the NSStream subclasses. This solution is based on an Obj-C sample (NSInputStream inputStreamWithURL example ios, 2013, May 12) posted over on the SampleCodeBank blog.
Apple documentation notes that with an NSStream subclass you do NOT have to load all data into memory at once. That is the key to being able to manage multimedia files of any size (not exceeding available disk or RAM space).
NSStream is an abstract class for objects representing streams. Its
interface is common to all Cocoa stream classes, including its
concrete subclasses NSInputStream and NSOutputStream.
NSStream objects provide an easy way to read and write data to and
from a variety of media in a device-independent way. You can create
stream objects for data located in memory, in a file, or on a network
(using sockets), and you can use stream objects without loading all of
the data into memory at once.
File System Programming Guide
Apple's Processing an Entire File Linearly Using Streams article in the FSPG also provided the notion that NSInputStream and NSOutputStream should be inherently thread safe.
Further Refinements
This object doesn't use the stream delegation methods. There is plenty of room for other refinements as well, but this is the basic approach I will take. The main focus on the iPhone is enabling large-file management while constraining memory via a buffer (TBD - leverage the outputStream's in-memory buffer). To be clear, Apple does mention that its convenience writeToURL functions are only for smaller file sizes, which makes me wonder why they don't take care of larger files - these are not edge cases (note - I will file a question as a bug).
Conclusion
I will have to test further for integrating on a background thread as I don't want to interfere with any NSStream internal queuing. I have some other objects that use similar ideas to manage extremely large data files over the wire. The best method is to keep file sizes as small as possible in iOS to conserve memory and prevent app crashes. The APIs are built with these constraints in mind (which is why attempting unlimited video is not a good idea), so I will have to adapt expectations overall.
(Gist Source, Check gist for latest changes)
import Foundation
import Darwin.Mach.mach_time
class MNGStreamReaderWriter: NSObject {

    var copyOutput: NSOutputStream?
    var fileInput: NSInputStream?
    var outputStream: NSOutputStream? = NSOutputStream(toMemory: ())
    var urlInput: NSURL?

    convenience init(srcURL: NSURL, targetURL: NSURL) {
        self.init()
        self.fileInput = NSInputStream(URL: srcURL)
        self.copyOutput = NSOutputStream(URL: targetURL, append: false)
        self.urlInput = srcURL
    }

    func copyFileURLToURL(destURL: NSURL, withProgressBlock block: (fileSize: Double, percent: Double, estimatedTimeRemaining: Double) -> ()) {
        guard let copyOutput = self.copyOutput, let fileInput = self.fileInput, let urlInput = self.urlInput else { return }

        let fileSize = sizeOfInputFile(urlInput)
        let bufferSize = 4096
        let buffer = UnsafeMutablePointer<UInt8>.alloc(bufferSize)
        defer { buffer.dealloc(bufferSize) }   //release the buffer on every exit path

        var counter = 0
        var copySize = 0

        fileInput.open()
        copyOutput.open()

        //start time
        let time0 = mach_absolute_time()

        copyLoop: while fileInput.hasBytesAvailable {
            //read one chunk...
            let bytesRead = fileInput.read(buffer, maxLength: bufferSize)
            if bytesRead < 0 {
                print(fileInput.streamStatus.rawValue)
                break
            }
            if bytesRead == 0 { break }   //end of input

            //...then keep writing until the whole chunk is on disk
            var chunkBytesWritten = 0
            while chunkBytesWritten < bytesRead {
                let bytesWritten = copyOutput.write(buffer + chunkBytesWritten, maxLength: bytesRead - chunkBytesWritten)
                if bytesWritten == -1 {
                    print(copyOutput.streamStatus.rawValue)
                    break copyLoop
                }
                chunkBytesWritten += bytesWritten
            }
            copySize += chunkBytesWritten

            counter += 1
            if fileSize != nil && (counter % 10 == 0) {
                //passback a progress tuple
                let percent = Double(copySize) / Double(fileSize!)
                let time1 = mach_absolute_time()
                let elapsed = Double(time1 - time0) / Double(NSEC_PER_SEC)
                let estTimeLeft = ((1 - percent) / percent) * elapsed
                block(fileSize: Double(copySize), percent: percent, estimatedTimeRemaining: estTimeLeft)
            }
        }

        //send final progress tuple
        block(fileSize: Double(copySize), percent: 1, estimatedTimeRemaining: 0)

        //close streams
        if fileInput.streamStatus == .AtEnd {
            fileInput.close()
        }
        if copyOutput.streamStatus != .Writing && copyOutput.streamStatus != .Error {
            copyOutput.close()
        }
    }

    func sizeOfInputFile(src: NSURL) -> Int? {
        do {
            let fileAttributes = try NSFileManager.defaultManager().attributesOfItemAtPath(src.path!)
            return fileAttributes[NSFileSize] as? Int   //NSFileSize is the documented attribute key
        } catch let inputFileError as NSError {
            print(inputFileError.localizedDescription, inputFileError.localizedRecoverySuggestion)
        }
        return nil
    }
}
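Hypothetical usage, dispatched off the main queue (srcURL and destURL are assumed to point at real files):

let copier = MNGStreamReaderWriter(srcURL: srcURL, targetURL: destURL)
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
    copier.copyFileURLToURL(destURL) { fileSize, percent, estimatedTimeRemaining in
        dispatch_async(dispatch_get_main_queue()) {
            //marshal progress back to the main queue for any UI updates
            print("\(Int(percent * 100))% copied")
        }
    }
}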
Delegation
Here's a similar object that I rewrote from an article on Advanced File I/O in the Background (Eidhof, C., objc.io). With just a few tweaks this could be made to emulate the behavior above. Simply redirect the data to an NSOutputStream in the processDataChunk method.
(Gist Source - Check gist for latest changes)
import Foundation
class MNGStreamReader: NSObject, NSStreamDelegate {

    var callback: ((lineNumber: UInt, stringValue: String) -> ())?
    var completion: ((Int) -> Void)?
    var fileURL: NSURL?
    var inputData: NSData?
    var inputStream: NSInputStream?
    var lineNumber: UInt = 0
    var queue: NSOperationQueue?
    var remainder: NSMutableData?
    var delimiter: NSData?

    func enumerateLinesWithBlock(block: (UInt, String) -> (), completionHandler completion: (numberOfLines: Int) -> Void) {
        if self.queue == nil {
            self.queue = NSOperationQueue()
            self.queue!.maxConcurrentOperationCount = 1
        }
        assert(self.queue!.maxConcurrentOperationCount == 1, "Queue can't be concurrent.")
        assert(self.inputStream == nil, "Cannot process multiple input streams in parallel")

        self.callback = block
        self.completion = completion

        if self.fileURL != nil {
            self.inputStream = NSInputStream(URL: self.fileURL!)
        } else if self.inputData != nil {
            self.inputStream = NSInputStream(data: self.inputData!)
        }

        self.inputStream!.delegate = self
        self.inputStream!.scheduleInRunLoop(NSRunLoop.currentRunLoop(), forMode: NSDefaultRunLoopMode)
        self.inputStream!.open()
    }

    convenience init?(withData inbound: NSData) {
        self.init()
        self.inputData = inbound
        self.delimiter = "\n".dataUsingEncoding(NSUTF8StringEncoding)
    }

    convenience init?(withFileAtURL fileURL: NSURL) {
        guard fileURL.fileURL else { return nil }   //only file URLs can be streamed from disk
        self.init()
        self.fileURL = fileURL
        self.delimiter = "\n".dataUsingEncoding(NSUTF8StringEncoding)
    }

    @objc func stream(aStream: NSStream, handleEvent eventCode: NSStreamEvent) {
        switch eventCode {
        case NSStreamEvent.OpenCompleted:
            break   //nothing to do on open; teardown happens on EndEncountered
        case NSStreamEvent.EndEncountered:
            if let remainder = self.remainder {
                self.emitLineWithData(remainder)
            }
            self.remainder = nil
            self.inputStream!.close()
            self.inputStream = nil
            self.queue!.addOperationWithBlock({ () -> Void in
                self.completion!(Int(self.lineNumber) + 1)
            })
        case NSStreamEvent.ErrorOccurred:
            NSLog("error")
        case NSStreamEvent.HasSpaceAvailable:
            NSLog("HasSpaceAvailable")
        case NSStreamEvent.HasBytesAvailable:
            NSLog("HasBytesAvailable")
            if let buffer = NSMutableData(length: 4096) {   //length (not capacity) so the read has room
                let length = self.inputStream!.read(UnsafeMutablePointer<UInt8>(buffer.mutableBytes), maxLength: buffer.length)
                if 0 < length {
                    buffer.length = length
                    self.queue!.addOperationWithBlock({ [weak self] () -> Void in
                        self?.processDataChunk(buffer)
                    })
                }
            }
        default:
            break
        }
    }

    func processDataChunk(buffer: NSMutableData) {
        if self.remainder != nil {
            self.remainder!.appendData(buffer)
        } else {
            self.remainder = buffer
        }
        self.remainder!.mng_enumerateComponentsSeparatedBy(self.delimiter!, block: { (component: NSData, last: Bool) in
            if !last {
                self.emitLineWithData(component)
            } else {
                if 0 < component.length {
                    self.remainder = (component.mutableCopy() as! NSMutableData)
                } else {
                    self.remainder = nil
                }
            }
        })
    }

    func emitLineWithData(data: NSData) {
        let lineNumber = self.lineNumber
        self.lineNumber = lineNumber + 1
        if 0 < data.length {
            if let line = NSString(data: data, encoding: NSUTF8StringEncoding) {
                callback!(lineNumber: lineNumber, stringValue: line as String)
            }
        }
    }
}

You should consider using NSStream (NSOutputStream/NSInputStream). If you choose this approach, keep in mind that the background thread's run loop will need to be started (run) explicitly.
NSOutputStream has a method called outputStreamToFileAtPath:append: which is what you might be looking for.
Similar question:
Writing a String to an NSOutputStream in Swift

Related

Does UserDefaults avoid redundant writes to disk?

I am developing a Mac application that is currently configured with UserDefaults. I would like to be able to swap it out with JSON configuration in the future.
Here's the problem - UserDefaults reads/writes on a per-key basis, whereas with JSON you read/write the configuration in bulk (as a single file).
The simple solution would be create a protocol with the lowest common denominator - write/read disk in bulk:
protocol PreferencesRepository {
    func read() -> Preferences
    func readLive() -> Observable<Preferences>
    func write(_ preferences: Preferences)
}

struct Preferences {
    let configA: String
    let configB: String
    ...
}
The implementation of PreferencesRepository when using UserDefaults would look like this:
func read() -> Preferences {
    let a = userDefaults.string(forKey: "keyA")
    let b = userDefaults.string(forKey: "keyB")
    ...
    return Preferences(configA: a, configB: b, ...)
}

func write(_ preferences: Preferences) {
    userDefaults.set(preferences.configA, forKey: "keyA")
    userDefaults.set(preferences.configB, forKey: "keyB")
    ...
}
You can see how if just one property of Preferences changes, all properties will be written to UserDefaults.
I understand that UserDefaults has optimizations to avoid reading from disk as much as possible by keeping an in-memory cache in sync with the disk. Does UserDefaults use that same cache to avoid redundant writes?

How do I release cached images when using UIImage(data:)?

I'm noticing heavy memory usage from my image cache in my collection view and need to understand how to release it. I understand the difference between UIImage(named:) and UIImage(contentsOfFile:). However, I'm using UIImage(data:) and I can't seem to find any documentation on releasing image caches in this instance. Any help appreciated. Here's my code snippet:
if let setImage = cell?.viewWithTag(101) as? UIImageView {
    if let url = URL(string: imageURLs[indexPath.item]) {
        let task = URLSession.shared.dataTask(with: url, completionHandler: { data, _, error in
            guard let data = data, error == nil else {
                print("No data detected: \(String(describing: error))")
                return
            }
            DispatchQueue.main.async {
                let newImageData = UIImage(data: data)
                self.imageData[indexPath.item] = newImageData!
                setImage.image = self.imageData[indexPath.item] as? UIImage
            }
        })
        task.resume()
        URLSession.shared.finishTasksAndInvalidate()
    }
}
UIImage(data:) doesn’t store in the system image cache. So, if you remove all of the references to the associated images from your imageData, make sure the image views are getting released, etc., you should be good.
If imageData is a simple collection, consider making it an NSCache, where you can constrain the total count or cost. Also, we often employ two-tier caching mechanisms (with smaller in-memory limits, larger persistent storage caches).
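A rough illustration of that NSCache suggestion (names and limits here are arbitrary):

import UIKit

let imageCache = NSCache<NSString, UIImage>()
imageCache.countLimit = 200                    //cap the number of cached images
imageCache.totalCostLimit = 50 * 1024 * 1024   //~50 MB when cost is given in bytes

func image(for urlString: String, data: Data?) -> UIImage? {
    if let cached = imageCache.object(forKey: urlString as NSString) {
        return cached   //cache hit, no re-decode
    }
    guard let data = data, let decoded = UIImage(data: data) else { return nil }
    imageCache.setObject(decoded, forKey: urlString as NSString, cost: data.count)
    return decoded      //entries are evicted automatically under memory pressure
}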
You might consider using one of the many third party libraries (AlamofireImage, KingFisher, SDWebImage, etc.) as they tend to employ decent caching strategies, getting you out of the weeds of this. They all offer nice “asynchronous image” extensions to UIImageView, too. (For example, the implementation you’ve shared with us is going to suffer from backlogging issues if you scroll quickly through a big collection view, something that is easily tackled with these UIImageView extensions.) Your UICollectionViewDataSource really should not be burdened with this sort of code.
I was under the impression that Firestore auto-caching applied to cloud Storage, but it only applies to cloud Database. Once I implemented local caching with NSCache, my problem was solved.

Swift Dictionary Absurd Memory Usage

I ran into an interesting problem in one of my applications. When accessing a Dictionary many times, the memory usage of my application skyrockets to over a gigabyte in seconds. Here is some sample code to show the problem.
override func viewDidLoad() {
    let dictionary = ["key1":"value1"]
    let nsKey: NSString = "key1"
    let swiftKey = nsKey as String
    for _ in 0 ... 10000000 {
        _ = dictionary[swiftKey]
    }
}
Repeatedly accessing the dictionary causes memory to climb until the loop finishes. I looked at instruments and saw tons of string allocations. Turns out using an NSString is the issue.
Changing the nsKey to a swift String like so fixes the issue:
let nsKey = "key1"
Also changing the dictionary to an NSDictionary fixes the issue:
let dictionary: NSDictionary = ["key1":"value1"]
Does anyone know why accessing the dictionary using a casted NSString causes so much heap allocation, and are there any other fixes besides the ones described above?
Looking at the Instruments data, it appears behind-the-scenes strings are being allocated and set to autorelease (or am I reading it wrong?). Could this be why memory usage continuously climbs and then drains at a later point? If this is true, should this be considered a "bug"? This issue occurs on OS X as well as iOS.
The best solution is to not bridge to NSString here. Just use Swift types. Or, as you discovered, you can just use Foundation types (NSString and NSDictionary). Bridging can require making temporary copies.
In any case, though, in loops like this it's very common to create temporary copies for one reason or another (even if you avoided this particular problem). To address that, you need to drain your autorelease pool in the loop. For instance:
let dictionary = ["key1":"value1"]
let nsKey: NSString = "key1"
let swiftKey = nsKey as String
for _ in 0 ... 10000000 {
    autoreleasepool { // <=== the scope of the current pool
        _ = dictionary[swiftKey]
    }
}
Adding that will keep your memory steady. This is a very common thing to do in large loops in Cocoa. Otherwise the pool won't be drained until you return from your top-level method.

Cloudkit fetch data (strings and image asset) take a long time to appear after call

I was hoping that someone can help a coding newbie with what might be considered a stupid question. I'm making a blog type app for a community organization and it's pretty basic. It'll have tabs where each tab may be weekly updates, a table view with past updates and a tab with general information.
I setup cloudkit to store strings and pictures, and then created a fetchData method to query cloud kit. In terms of the code (sample below) it works and gets the data/picture. My problem is that it takes almost 5-10 seconds before the text and image update when I run the app. I'm wondering if that's normal, and I should just add an activity overlay for 10 seconds, or is there a way to decrease the time it takes to update.
override func viewDidLoad() {
    fetchUpcoming()
}

func fetchUpcoming() {
    let container = CKContainer.defaultContainer()
    let publicData = container.publicCloudDatabase
    let query = CKQuery(recordType: "Upcoming", predicate: NSPredicate(format: "TRUEPREDICATE", argumentArray: nil))
    publicData.performQuery(query, inZoneWithID: nil) { results, error in
        if error == nil { // There is no error
            println(results)
            for entry in results {
                self.articleTitle.text = entry["Title"] as? String
                self.articleBody.text = entry["Description"] as? String
                let imageAsset: CKAsset = entry["CoverPhoto"] as! CKAsset
                self.articlePicture.image = UIImage(contentsOfFile: imageAsset.fileURL.path!)
                self.articleBody.sizeToFit()
                self.articleBody.textAlignment = NSTextAlignment.Justified
                self.articleTitle.adjustsFontSizeToFitWidth = true
            }
        } else {
            println(error)
        }
    }
}
Another question I had is about string content being stored on CloudKit. If I want to add multiple paragraphs to a blog entry (for example), is there a way to put it in one record, or do I have to separate the blog entry content into separate paragraphs? I may be mistaken, but it seems like CloudKit records don't recognize line breaks. If you can help answer my questions, I'd be really appreciative.
It looks like you might be issuing a query after creating the data, which isn't necessary. When you save data, as soon as your completion block succeeds (with no errors) then you can be sure the data is stored on the server and you can go ahead and render it to the user.
For example, let's say you're using a CKModifyRecordsOperation to save the data and you assign a block of code to the modifyRecordsCompletionBlock property. As soon as that block runs and no errors are passed in, then you can render your data and images to your user. You have the data (strings, images, etc.) locally because you just sent them to the server, so there's no need to go request them again.
This provides a quicker experience for the user and reduces the amount of network requests and battery you're using on their device.
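A hedged sketch of that save-then-render flow (the record type and field mirror the question; the values and the omitted error handling are placeholders):

import CloudKit

let record = CKRecord(recordType: "Upcoming")
record.setObject("My title" as NSString, forKey: "Title")   //placeholder value
let op = CKModifyRecordsOperation(recordsToSave: [record], recordIDsToDelete: nil)
op.modifyRecordsCompletionBlock = { savedRecords, deletedRecordIDs, error in
    if error == nil {
        dispatch_async(dispatch_get_main_queue()) {
            //update labels/images from the local copies just saved;
            //no follow-up query is needed
        }
    }
}
CKContainer.defaultContainer().publicCloudDatabase.addOperation(op)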
If you are just issuing normal queries when your app boots up, then that amount of time does seem long but there can be a lot of factors: your local network, the size of the image you're downloading, etc. so it's hard to say without more information.
Regarding the storage of paragraphs of text, you should consider using a CKAsset. Here is a quote from the CKRecord's documentation about string data:
Use strings to store relatively small amounts of text. Although
strings themselves can be any length, you should use an asset to store
large amounts of text.
You'll need to make sure you're properly storing and rendering line break characters between the user input and what you send to CloudKit.

Exporting large amounts of data from coredata to json

I'm trying to export some data from core data to JSON. While the record count isn't particularly large (around 5000-15000 records), my data model is complex and there is a large amount of data in each record, so when I export this I exceed the allowable memory and iOS kills my app.
The steps I currently take are:
1. I have a method that extracts all the data from Core Data and stores it in an `NSDictionary`.
2. I then write it to a file using an `NSOutputStream` and `NSJSONSerialization`.
3. I then zip up the file and send it via email.
I'm pretty sure that steps 2 and 3 are fine from a max memory perspective as I stream the data. But the problem is that it gets killed in step 1 because I'm effectively pulling all the data out of CD and putting it in memory so I can pass it through NSOutputStream to NSJSONSerialization.
Anyone know how to not have to pull everything into memory, but still write to a single tree JSON file?
Update - More Detail
My data structure (simplified for clarity) looks like the outline below.
Given it's not just a flat set of records but a hierarchical structure of objects with relationships, I can't figure out how to pull the data out of Core Data in batches and feed it to the JSON streamer, rather than holding it all in memory to construct the JSON. Step 1 above is actually a collection of recursive methods that pull the data out of the Core Data entities and construct the `NSDictionary`.
Folder {
    Folder {
        Word {
            details type 1
            details type 2
        }
        Word {
            details type 1
            details type 2
        }
    }
    Folder {
        Word {
            details type 1
            details type 2
        }
        Word {
            details type 1
            details type 2
        }
    }
    Word {
        details type 1
        details type 2
    }
}
[UPDATED TO IMPLEMENT LOW MEMORY SERIAL OUTPUT OF NESTED FOLDER HIERARCHY AS NESTED JSON OBJECT FILE]
Now you have provided more detail, it's clear the original problem statement lacked sufficient detail for anyone to provide an answer. Your issue is actually an age-old problem of how to traverse hierarchies in a memory-efficient way, combined with the fact that the iOS JSON library is quite light and doesn't easily support streamed writing of deep hierarchies.
The best approach is to use a technique known as the visitor pattern. Have each of your NSManagedObject types shown above adopt a protocol called Visitable; the interface line for each object should look something like this:
@interface Folder : NSManagedObject <Visitable>
@interface Word : NSManagedObject <Visitable>
The Visitable protocol should define a method call for all objects that comply with the protocol.
@protocol Visitable <NSObject>
- (void)acceptVisitor:(id<Visitor>)visitor;
@end
You are going to define a visitor object, which itself implements a visitor protocol.
@protocol Visitor <NSObject>
- (void)visitFolder:(Folder*)folder;
- (void)visitWord:(Word*)word;
@end

@interface JSONVisitor : NSObject <Visitor>
@property (nonatomic, strong) NSURL *streamURL;
- (void)startVisiting:(id<Visitable>)visitableObject;
@end
@interface JSONVisitor ()
@property (nonatomic, strong) NSOutputStream *outputStream;
@end

@implementation JSONVisitor

- (void)startVisiting:(id<Visitable>)visitableObject
{
    if ([visitableObject respondsToSelector:@selector(acceptVisitor:)])
    {
        if (_outputStream == nil)
        {
            // more code required to set up your output stream
            // specifically as a JSON output stream.
            // add code to either set the stream URL here,
            // or set it when the visitor object is instantiated.
            _outputStream = [NSOutputStream outputStreamWithURL:_streamURL append:YES];
        }
        [_outputStream open];
        // Note 1a Bypass Apple JSON API which doesn't support
        // writing of partial objects (doing so is very easy anyway).
        // Write opening root object fragment text string to stream
        // such as:
        // {
        //   "$schema" : "http://myschema.com/draft-01/schema#Folder1",
        //   "name" : "Folder export",
        //   "created" : "2013-07-16T19:20:30.45+01:00",
        //   "Folders" : [
        [visitableObject acceptVisitor:self];
        // Note 1b write closing JSON root object
        // e.g.
        //   ]
        // }
        [_outputStream close];
    }
}
- (void)visitFolder:(Folder*)folder
{
    // Note 2a Bypass Apple JSON API which doesn't appear to support
    // writing of partial objects (writing JSON is very easy anyway).
    // This next step would be best done with a proper templating system,
    // but for simplicity of illustration I'm suggesting writing out raw
    // JSON object text fragments.
    // Write opening JSON Folder object fragment text string to stream
    // e.g.
    // "Folder" : {
    if ([folder.folders count] > 0) {
        // Write opening folder array fragment to stream e.g.
        // "Folders" : [
        // loop through folder member NSManagedObjects here
        // (note defensive checks for nulls not included).
        NSUInteger count = 0;
        for (Folder *nestedFolder in folder.folders)
        {
            if (count > 0) {
                // print a comma separator to the output stream
            }
            [nestedFolder acceptVisitor:self];
            count++;
        }
        // write closing folders array to stream
        // ]
    }
    if ([folder.words count] > 0) {
        // Write opening words array fragment to stream e.g.
        // "Words" : [
        // loop through Word member NSManagedObjects here
        // (note defensive checks for nulls not included).
        NSUInteger count = 0;
        for (Word *nestedWord in folder.words)
        {
            if (count > 0) {
                // print a comma separator to the output stream
            }
            [nestedWord acceptVisitor:self];
            count++;
        }
        // write closing Words array to stream
        // ]
    }
    // Print closing Folder object brace to stream (should only be followed
    // by a comma if there are more members in the folder this object is contained by)
    // e.g.
    // },
    // Note 2b Next object determination code here.
}
- (void)visitWord:(Word*)word
{
    // Write to JSON stream. Note: NSJSONSerialization cannot serialize an
    // NSManagedObject directly; `word` would first need converting to a
    // JSON-compatible NSDictionary of its attribute values.
    [NSJSONSerialization writeJSONObject:word toStream:_outputStream options:NSJSONWritingPrettyPrinted error:nil];
}

@end
This object is able to "visit" each object in your hierarchy and do some work with it (in your case, write it to a JSON stream). Note you don't need to extract to a dictionary first. You just work directly with the Core Data objects, making them visitable. Core Data contains its own memory management, with faulting, so you don't have to worry about excessive memory usage.
This is the process. You instantiate the visitor object and then call its startVisiting: method, passing in the root Folder object of your hierarchy above. In that method, the visitor object "knocks on the door" of the first object to be visited by calling - (void)acceptVisitor:(id<Visitor>)visitor on the object to be visited. The root Folder then "welcomes the visitor in" by calling a method back on the visitor object matching its own object type, e.g.:
- (void)acceptVisitor:(id<Visitor>)visitor
{
    if ([visitor respondsToSelector:@selector(visitFolder:)]) {
        [visitor visitFolder:self];
    }
}
This in turn calls the visitFolder: method on the visitor object, which opens the stream, writes the object as JSON, and closes the stream. This is the important thing. This pattern may appear complex at first, but I guarantee, if you are working with hierarchies, once you have implemented it you will find it powerful and easy to manage.
To support low memory serial output of a deep hierarchy, I'm suggesting you write your own JSON Folder object to the output stream. Since JSON is so simple, this is much easier than it might at first appear. The alternative is to look for a JSON Library which supports low memory serialised writing of nested objects (I haven't used JSON much so don't know if such exists and is easy to use on iOS). The visitor pattern ensures you need have no more than one NSManagedObject instantiated to work on for each level of the hierarchy (though of course more will inevitably need to be instantiated as you implement hierarchy traversal logic) so this is light on memory usage.
I have given examples of the text string that needs to be written to the output stream. Best practice would dictate using a templating system for this rather than directly writing statically allocated strings. But personally I wouldn't worry about adopting the quick and dirty approach if your deadline is tight.
I've assumed your folder objects contain a folders property providing a set of additional folders. I have also assumed your Folders NSManagedObject class contains a words property containing a set of Words NSManagedObjects. Remember if you stay working in Core Data it will look after ensuring you keep a low memory footprint.
At the end of the visitFolder: method, you can use the following logic.
Check if the Folder contains any folders and visit each in turn if it does.
If it contains no more folders, check if it contains any Words, and visit each in turn if it does.
Note the above code is the simplest construct for minimising the memory footprint. You may want to optimise it for performance by e.g. only doing an auto-release when a certain batch size is exceeded. However given the problem you have described, it will be best to implement the most memory efficient method first.
If you have polymorphic hierarchies - you're on your own :) - get a book out and do some study; managing them is a grad degree in itself.
Clearly this code is untested!
Check the NSFetchRequest documentation. You will see two properties:
- (NSUInteger)fetchOffset;
- (NSUInteger)fetchBatchSize;
With use of these two properties you can restrict the number of returned NSManagedObjects to a given batch size.
Open a stream you can write to. Set up a loop to execute a fetch request. But set a batch size (x) and then update the fetch offset of the fetch request at the end of the loop code for the next iteration of the loop.
myFetchRequestObject.fetchOffset += x;
Process the batch of data objects writing the JSON data to your open stream before starting the next iteration of the loop.
When either no more objects are returned or the number of objects returned by the fetch are less than the batch size, exit your loop.
Close your stream.
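A hedged Swift sketch of that loop (the entity name and batch size are placeholders, and `context` is assumed in scope; note it bounds each pass with fetchLimit plus fetchOffset, a slight variation on the fetchBatchSize suggestion):

import CoreData

let request = NSFetchRequest(entityName: "Word")
let batchSize = 500
request.fetchLimit = batchSize

var offset = 0
while true {
    request.fetchOffset = offset
    guard let batch = try? context.executeFetchRequest(request) where !batch.isEmpty else { break }
    //write this batch's JSON to the open output stream here
    offset += batch.count
    context.reset()   //drop the batch from memory before the next pass
    if batch.count < batchSize { break }
}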
The problem was that I had Enable Zombie Objects turned on in the project scheme.
For some reason this carried through to the release build too.
Turning it off fixed all my problems.
I also ended up using TheBasicMind's design pattern because it's a cool design pattern...
