Massive text file, NSString and Core Data

Massive text file, NSString and Core Data - ios

I've scratched my head about this issue for a long, long time now, but I still haven't figured out a way to do this efficiently and without using too much memory at once (on iOS, where memory is very limited).
I essentially have very large plain text files (15MB on average), which I then need to parse and import to Core Data.
My current implementation is to have an Article entity on Core Data, that has a "many" relationship with a Page entity.
I am also using a slightly modified version of this line reader library: https://github.com/johnjohndoe/LineReader
Naturally, the more Page entities I create, the more memory overhead I create (on top of the actual NSString lines).
No matter how much I tweak the amount of lines per page, or the amount of characters per line, the memory usage goes absolutely crazy (~300MB+), while just importing the whole text file as a single string quickly peaks at ~180MB and finished in a matter of seconds, with the former taking a couple of minutes.
The line reader itself might be at fault here, since I am refreshing the pages in the managed context after they're done, which to my knowledge should release them from memory.
At any rate, does anyone have any notes, techniques or ideas on how I should go about implementing this? Ideally I'd like to have support for pages, since I need to be able to navigate the text anyway, and later loading the entire text to memory doesn't sound like much fun.
Note: The Article & Page entity method works fine after the import, but the importing itself is going way overboard with memory usage.
EDIT: For some reason, Core Data is also consuming ~300MB of memory when removing an Article entity from the context. Any ideas on why that might be happening, or how that could be remedied?

Related

Core Data refuses to clear external data references from memory

I am loading large amounts of data into Core Data on a background thread with a background NSManagedObjectContext. I frequently reset this background context after it's saved in order to clear the object graph from memory. The context is also disposed of once the operation is complete.
The problem is that no matter what I do, Core Data refuses to release large chunks of data that are stored as external references. I've verified this in the Allocations instrument. Once the app restarts the memory footprint stays extremely low as these external references are only unfaulted when accessed by the user. I need to be able to remove these BLOBS from memory after the initial download and import since they take up too much space collectively. On average they are just html so most are less than 1MB.
I have tried refreshObject:mergeChanges: with the flag set to NO on pretty much everything. I've even tried reseting my main NSManagedObjectContext too. I have plenty of autorelease pools, there are no memory leaks, and zombies isn't enabled. How can I reduce my Core Data memory footprint when external references are initially created?
I've reviewed all of Apple's documentation and can't find anything about the life cycle of external BLOBS. I've also searched the many similar questions on this site with no solution: Core Data Import - Not releasing memory
Everything works fine after the app first reboots, but I need this first run to be stable too. Anyone else been able to successfully fault NSData BLOBS with Core Data?

I'm assuming the "clear from memory" means "cause the objects to be deallocated" and not "return the address space to the system". The former is under your control. The latter is not.
If you can see the allocations in the Allocations instrument, have you turned on tracking of reference count events and balanced the retains and releases? There should be an indicative extra retain (or more).
If you can provide a simple example project, it would be easier to figure out what is going on.

iOS SQLite or 1000 loose files?

Suppose I have 1000 records of variable size, ranging from around 256 bytes to a few K. I wonder is there any advantage of putting them into a sqlite database versus just reading/writing 1000 loose files on iOS? I don't need to do any operations other than access by a single key, which I can use as the filename. Seems like the file system would be the winner unless the number of records grows very large.

If your system were read-only, I would say that the file system is the clear winner: a simple binary file and perhaps a small index to know where each record starts would be all that you need. You could read the entire index into memory, and then grab your records from the file system as needed, for a performance that would be extremely tough to match for any RDBMS.
However, since you are planning on writing data back, I would suggest going with SQLite because of potential data integrity issues.
Performance concerns should not be underestimated, too: since your records are of variable size, writing the data back may prove to be difficult in cases when records need to expand. Moreover, since you are on a mobile platform, you would need to build something in to avoid data corruption when the program is killed unexpectedly in the middle of a write. SQLite takes care of this; your code would have to build something comparable to it, or risk data corruption problems.

Air for iOS - how to handle crashes due to limited memory?

We're currently developing an iPad application using Air for iOS and from time to time experience crashes (on iPad1 with ios 5 only) which seem to be because the application is using up too much memory.
How to catch/handle such errors in the application? how to be notified when memory is low? trying to catch flash.errors.MemoryError doesn't seem to work. Any tips?

I've done some work in this area and here are some tips that I can give you.
Get Flash Builder 4.6 Premium.
Get it if only for the profiler alone. It has one of the best profilers available for diagnosing things like this. With that said, there are other Flash profilers around, that have varying degrees of usefulness.
This alone will help you find and diagnose where most of your memory is going in terms of raw memory usage, but also help you find how many objects you are creating and destroying and how long they are hanging around before the garbage collector is finally getting around to letting them go.
Pool smaller trivial objects
Rather than constantaly creating and destorying smaller objects, create object pools. This will save you the cost of spinning up new objects constantly, and keep you from having to wait until the garbage collector to run before releasing the memory.
There are a lot of examples and patterns to look at for creating object pools in actionscript. It would be easier if AS supported generics, but even without them its still pretty straight forward.
Eagerly dispose of huge objects
This goes directly against the advice in the previous point, but for huge objects, you don't want them hanging around in memory forever. I'm referring to things like BitmapData, when you are done with them (for the foreseeable future), tear them down and null them out, and let the garbage collector clean it up.
When you need them again, rebuild them. Yes, you will take a slight performance hit, but memory on mobile devices is precious and don't waste it by keeping around a 2mb bitmapdata object that only ever appears on the loading screen. Throw it away.
Null out references you don't need anymore
Take some time and try to really understand what the garbage collector needs to do its work, and how its decides which objects can and cannot be thrown away. Try to avoid self referential objects/circular references, while the CG can normally figure it out, sometimes it might need a litle hand holding.
Evaluate every time you use new [Related to 2]
Again using a memory profiler will help for this step, but make sure that every time you instantiate a new object, you need to instantiate a new object. It can be very easy to get lazy when developing for a PC, just throwing new objects into the pool and letting the CG sort it out. See if there are good caching strategies (object pooling, or just reference caching) if its small. And if its a HUGE object that you are building up and tearing down often, it might be time to try to come up with a better architectural solution.
As far as I know, if you get to the point where iOS thinks the memory is low, its already too late. Last time I checked, the framework will try to run the CG when it thinks its running out of memory, and if it can't free up enough memory to continue, it fails out. Do your best to try to avoid getting to the point where the operating system thinks the only safe option is to terminate your thread.

Should I use the Core Text API or draw text API in Quartz?

I am new to iOS, and I am going to write a book reader(like Stanza) for iOS. But now I am confused by the text related APIs offered by Apple, there's quite a lot of them. I took a look at one of them - Core Text, which seems quite convenient to use. But the problem is that Core Text just does not reuse memory, for drawing a page of text, I have to create an NSAttributedString, a CTFramesetterRef and a CTFrameRef. And after the text is drawn on the screen, those objects are supposed to be released, the memory they occupied just couldn't be reused(or there's a way to reuse those memory?).
So, it looks like memory use of those APIs is not efficient, memory allocation and deallocation happen too frequently.
What I am wishing to do is that I am able cache some context settings, setting them on the context in drawRect: when I want to draw some text, and the text is cached in a plain NSMutableString*(not an NSAttributedString*), later I can append text to the NSMutableString* cache.
P.S.: The reason why I use an NSMutableString* for cache is that the book, a txt file can be too big to be kept in memory(as far as effective use of memory is concerned), I will always keep a block of text, say several pages in memory, and when the user turns pages, I will read more pages from the txt file and append the text to the cache, of course I will cut those text in the start of the cache when some requirements are met.
The question is: Which Text API should I use and why? or If I have to use Core Text, is it possible to reuse those memory?
Thank you in advance!

You can reuse the memory. CoreText is good a framework to work with text. You will have to use CFRelease() to release the memory acquired. Have a look at the CoreFoudation memory management guide.

How many 'screens' of data could a game store before having to delete some?

Assuming I was making a Temporal-esque time travel game, and wanted a to save the current state of the screen (location of player and enemies, whether or not destructible objects are destroyed, et cetera) every second to an array, how much data would I be able to store on this array before the game would start to lag considerably and I would have to either delete the array or save it to a file out of the game (ie: a .bin).
On a similar note, is it faster to constantly save every screen to a .bin, or to only do this when it is necessary (start saving when the array is halfway 'full', for example).
And I know that the computer it is being run on matters, but assume it is being run on a reasonably modern computer (not old, but not a nasa supercompeter either), particularily because I have no way of telling exactly what the people who play the game will be using.

Depending on how you use the data afterwards, you could consider storing the changes between states instead of the actual states.
You should use a buffer to reduce the number of I/O-operations. Put data in main memory and write a larger amount of data to disk when needed.

It would depend on the amount of objects you needs to save and how much memory is taken up by each object.
Hypothetically, let's take a vastly oversimplified and naive example, and say that your game contains an average of 40 objects, each of which has 20 properties that take up two bytes of storage. That's 1600 bytes per second, if you're saving each second.

Well it is impossible to give you an answer that will definitely work for your scenario. You'll need to try a few things.
Assuming you are loading large images, sounds, or graphics from disk it may not be good to write to disk with high frequency due to contention. I say may because it really depends on th computer and everything that is going on. So how do you deal with this issue? One way is to run a background thread that watches a queue for items that need to be written to disk. The thread can monitor the queue for a certain number of items before writing to disk. The alternative is to wait for certain other events to happen in the game where I/O is happening and save it then. You may need to analyse the size of events that you are saving and try different options.

You would want to get an estimate as to how much data is saved per screen, then decide how much of someone's memory you want to use, and then just divide, as you will have huge variances. I am using a 64 bit OS so how much you can store on my machine is different than on a 32-bit machine.
You may want to decide what is actually needed. For example, if you just save the properties of each object into a json array you may save yourself some space, but you want to limit how much you write to a disk, as that will need to be done on a separate thread that only writes to this file, so that you don't have two threads trying to access the same resource, queue up the writes.
Saving the music, for example, may not be useful, as that will change anyway, I expect.
Be very judicious about what you will save, try it and see if you are saving enough.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart