I have an app that uses Realm as a staging database. It receives information from a bluetooth device, processes it, and sends the processed result to a server.
The incoming data from bluetooth gets stored in a Realm table (table1). A separate thread reads data from the Realm database, processes it, and stores it into a second table (table2) for uploading to a server. Once it has pulled and successfully processed that data, it deletes it from table1.
The third thread pulls data from table2, and when it successfully sends, removes it from table2.
I'm using a database here so that if, for whatever reason, the app is killed, data won't be lost... it will just resume where it left off when the app is restarted. But as you can see, the database is not something that hangs around (it's not like an address book or something... it is just temporary staging).
What I notice is that no matter what the heck I do, the Realm database file just increases in size over time. I'll end up with a database that, if I open it, has one record in it, but the file on disk could be tens of MB in size if the app has been running long enough.
Data is being processed on different background queues so as not to block the UI (one of the reasons I'm using Realm instead of Core Data). And I'm using things like autoreleasepool blocks and invalidate() to avoid keeping copies of the objects that are read (as suggested by many Realm questions/answers).
What gives? I know I don't have a code sample here, but this just seems like a basic garbage-collection problem in Realm. I've seen other questions related to this where people ask "why is my database so huge?", and the answers suggest things like "writeCopyToPath", but that feels like an incredible hack, and regardless, it would be very difficult here: this app is meant to be constantly connected to and monitoring a bluetooth device, so compacting that way would mean stopping, making sure all threads that might alter the database are quiesced, doing the copy to compact the db, and then starting everything back up again. That just seems nonsensical to me. I might interrupt user operations, for example. I don't want a user to be unable to do something because I decided it was time for database maintenance.
I feel like I'm either missing some incredibly fundamental point in how to make Realm not keep junk around, or Realm is just the completely wrong solution for my problem. I've never seen this problem with databases - adding and deleting lots of records... quickly... seems like something a database should just be able to do without exploding in size.
Are you making sure that the background thread is not holding on to old versions of the Realm, preventing the space from being reused?
Quote from the docs (https://realm.io/docs/swift/latest/#seeing-changes-from-other-threads):
If a thread has no runloop (which is generally the case in a background thread), then Realm.refresh() must be called manually in order to advance the transaction to the most recent state.
Failing to refresh Realms on a regular basis could lead to some transaction versions becoming “pinned”, preventing Realm from reusing the disk space used by that version, leading to larger file sizes.
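To make that concrete, here is a minimal sketch (Swift, with hypothetical IncomingRecord/ProcessedRecord models standing in for table1/table2) of a background pass that opens the Realm inside an autorelease pool, refreshes it, and lets everything go when the pass ends, so no old transaction version stays pinned:

```swift
import RealmSwift

// Hypothetical models standing in for table1 / table2.
class IncomingRecord: Object { @objc dynamic var payload = Data() }
class ProcessedRecord: Object { @objc dynamic var body = Data() }

func drainIncomingRecords() {
    DispatchQueue.global(qos: .utility).async {
        autoreleasepool {
            do {
                let realm = try Realm()
                realm.refresh() // advance this runloop-less thread to the latest version
                let pending = realm.objects(IncomingRecord.self)
                try realm.write {
                    for record in pending {
                        let processed = ProcessedRecord()
                        processed.body = record.payload   // placeholder for the real processing
                        realm.add(processed)
                    }
                    realm.delete(pending)                 // remove the table1 rows in the same transaction
                }
            } catch {
                print("Realm error: \(error)")
            }
        } // the Realm, Results and objects all go out of scope here, so nothing stays pinned
    }
}
```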
Related
My question centers on one single point: data management in a mobile application. I have created a mobile application where the data comes from a server. The data includes both text and images. These are the steps I am following:
First launch :
1. Get server data.
2. Save server data in Sqlite database.
3. Show Sqlite data.
Next launches :
1. Show Sqlite data.
2. Get server data in background.
3. Delete previous Sqlite data.
4. Save new server data in Sqlite database.
5. Show Sqlite data.
I have a couple of questions about these steps:
1. Is this the right approach? The other way would be to show data from the server every time, but that would not display the data on screen immediately (depending on internet speed).
2. I also thought of comparing the Sqlite data with the new server data, but I faced a big challenge: the new server data might have new records or deleted records. Also, I could not find an appropriate approach to compare each database field with the JSON data.
So what is the best approach to compare local Sqlite data with new server data?
3. Each time I delete the Sqlite data, insert the new data, and then refresh the screen (which has a UITableView), it blinks for a second, which is to be expected. How can I avoid this issue if steps 3, 4 and 5 are followed?
4. How should I handle data updates when I come back to the screen each time, or when the application becomes active? I am well aware of NSOperationQueue, or GCD for that matter. But what if I go crazy and move back and forth to the screen again and again? There will be a pile of NSOperations in the queue.
Synchronising server data is a challenge. I've done it before, and if you can spend the time on it, I'd say it's the best solution.
You may need creation and modification dates on both server and local objects, to compare them - this will let you decide which objects to add, update and delete.
If the server sends you only the recently updated objects you can save a lot of traffic and improve performance (but deleted objects will be harder to detect).
If the data is only changed on the server it's easier; when the app can change the data too, it becomes more complicated (but it seems that's not your case). It also depends on how complex the database is, of course.
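For illustration, a rough sketch (Swift, with hypothetical ServerItem/LocalItem types and an updatedAt field) of using those dates plus an ID set to decide what to add, update and delete:

```swift
import Foundation

struct ServerItem: Decodable { let id: Int; let updatedAt: Date; let title: String }
struct LocalItem { let id: Int; var updatedAt: Date; var title: String }

// Merge a fresh server payload into the local store (here just a dictionary keyed by id).
func merge(server: [ServerItem], into local: inout [Int: LocalItem]) {
    let serverIDs = Set(server.map { $0.id })

    // Delete: anything we have locally that the server no longer returns.
    let staleIDs = local.keys.filter { !serverIDs.contains($0) }
    for id in staleIDs { local.removeValue(forKey: id) }

    for item in server {
        if var existing = local[item.id] {
            // Update: only touch records whose server copy is newer than what we stored.
            if item.updatedAt > existing.updatedAt {
                existing.updatedAt = item.updatedAt
                existing.title = item.title
                local[item.id] = existing
            }
        } else {
            // Add: a record we haven't seen before.
            local[item.id] = LocalItem(id: item.id, updatedAt: item.updatedAt, title: item.title)
        }
    }
}
```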
If you don't want to invest the time in doing this, just fetching all the data every time works too, even if it is not ideal! Instead of showing the old data and having it blink, you can just make the user wait 2-3 seconds when entering, while you get the new data. Or you can fetch the data only when starting the app, so that by the time you get to that view controller it is already there.
It's a complex problem that everyone faces at some point, so I'm curious to see what other people will suggest :)
This is a good question.
I personally think downloading data, storing it locally and later trying to sync is a dangerous scenario. It's easy to introduce bugs and master <-> slave issues (which data should be the master if multiple devices are used, etc.).
I think something like this could be a working approach:
1. I would look at possibilities to lazy-load the data from the server on demand. That is, when a user opens a view that should display data, load that specific data as that specific view is created. This ensures the data is always in sync.
2. To avoid reloading data from the server for every view, you could simply store the downloaded data as objects in memory (not using SQLite). The view would try to load the needed data through your cache manager, which would serve it from memory if available. If it's not in memory, simply get the data from your server and add it to your memory cache.
The memory cache could be a home-made data manager wrapping a Dictionary stored on your AppDelegate, or some global singleton that wraps the cache management/storage and data loading (see the sketch after this list).
3. With lazy-loaded data and a memory cache you would need to make sure any updates (changes, new records, deleted records) update your in-memory data model, as well as push those changes to the server as soon as possible. Depending on the data size etc. you could force the user to wait, or do it as a background process.
4. To ensure the data is in sync, you should make sure that you periodically invalidate (delete) the local records in the memory cache and thereby force data updates from the server. The best approach would probably be to have a last-updated timestamp for each record in the memory cache, so the periodic invalidator would only delete "old" records from the memory cache (once again, not from the server).
To spare the server unnecessary load, the data should still be loaded on demand when the user needs it in a view, and not as part of "cache invalidation".
5. Depending on the data size you might need to look at "cache invalidation" from a size perspective too. It could be as simple as: once xx records are stored, start deleting the oldest objects from the memory cache (not from the server, only locally on the device).
6. If data sync is absolutely critical you might want to look at refreshing your memory cache for a record just before you allow the user to change data. E.g. when the user taps "Edit" or similar, you grab the latest data from the server for that record. This is just to make sure the user is not going to update a record using outdated data and thereby accidentally override changes made remotely or on another device, etc.
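A rough sketch (Swift, hypothetical names and payloads) of the kind of singleton cache manager described in points 2 and 4: records live in a dictionary keyed by identifier, each with a fetch timestamp, so stale entries can be purged from memory without touching the server.

```swift
import Foundation

// Hypothetical singleton cache manager; keys, max age and Data payloads are placeholders.
final class CacheManager {
    static let shared = CacheManager()

    private struct Entry { let data: Data; let fetchedAt: Date }
    private var entries: [String: Entry] = [:]
    private let maxAge: TimeInterval = 5 * 60                  // treat entries older than 5 minutes as stale
    private let queue = DispatchQueue(label: "cache.manager")  // serialize access to the dictionary

    /// Returns a cached value only if it exists and is still fresh.
    func cachedData(for key: String) -> Data? {
        return queue.sync {
            guard let entry = entries[key],
                  Date().timeIntervalSince(entry.fetchedAt) < maxAge else { return nil }
            return entry.data
        }
    }

    /// Called after an on-demand server fetch so the next view gets the data from memory.
    func store(_ data: Data, for key: String) {
        queue.sync { entries[key] = Entry(data: data, fetchedAt: Date()) }
    }

    /// The periodic invalidator from point 4: drops old records from memory only, never from the server.
    func purgeStaleEntries() {
        queue.sync {
            entries = entries.filter { Date().timeIntervalSince($0.value.fetchedAt) < maxAge }
        }
    }
}
```

A view would first ask CacheManager.shared.cachedData(for:) and only hit the server (then call store) when that returns nil, which keeps loading on demand rather than as part of invalidation.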
--
My take on it. I do not believe there is a "perfect right way" to do this. But this would be what I would try to do.
Hope this will help with some ideas and inspiration.
How about this:
1. If data exists in SQLite, load it into an "in-memory" copy and show it.
2. In the background, load the new server data.
3. Delete the old SQLite data if it exists (note that the in-memory copy remains).
4. Save the new server data to SQLite.
5. Load the new SQLite data into the "in-memory" copy and show it.
If no data was found in step 1, display a "loading" screen to the user during step 2.
I'm making the assumption that the data from SQLite is small enough to keep a copy in memory to show in your UITableView (the table view would always show the in-memory data).
It may be possible to combine steps 4 and 5 if the data is small enough to hold two copies in memory at the same time (you would create a new in-memory copy and swap with the visible copy when complete).
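As an illustration, a rough sketch (Swift, with hypothetical SQLite/network helper functions) of that flow; the table view only ever renders the in-memory array, which is swapped once the new data has been saved, so nothing blinks while the old rows are deleted:

```swift
import UIKit

class ItemsViewController: UITableViewController {
    private var items: [String] = []   // the "in-memory" copy the table view shows

    override func viewDidLoad() {
        super.viewDidLoad()
        items = loadItemsFromSQLite()                 // step 1: show whatever is cached locally
        tableView.reloadData()
        fetchItemsFromServer { [weak self] fresh in   // step 2: refresh in the background
            replaceItemsInSQLite(with: fresh)         // steps 3 + 4: delete old rows, insert new ones
            DispatchQueue.main.async {
                self?.items = fresh                   // step 5: swap the visible copy once it is ready
                self?.tableView.reloadData()
            }
        }
    }

    override func tableView(_ tableView: UITableView, numberOfRowsInSection section: Int) -> Int {
        return items.count
    }

    override func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell {
        let cell = tableView.dequeueReusableCell(withIdentifier: "Cell")
            ?? UITableViewCell(style: .default, reuseIdentifier: "Cell")
        cell.textLabel?.text = items[indexPath.row]
        return cell
    }
}

// Hypothetical persistence/network helpers; the real app would use its own SQLite layer here.
func loadItemsFromSQLite() -> [String] { return [] }
func replaceItemsInSQLite(with items: [String]) { }
func fetchItemsFromServer(completion: @escaping ([String]) -> Void) { completion([]) }
```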
Note:
I don't talk about error handling here, but I would suggest that you don't delete the sqlite data until you have new data to replace it with.
This approach also eliminates the need to determine if this is the first launch or not. The logic always remains the same which should make it a little easier to implement.
Hope this is useful.
You can do the same thing more efficiently with Multiversion Concurrency Control (MVCC), which keeps a counter (a sort of very simple "timestamp") on every data record and updates it whenever the record is changed. This means you only need to fetch the data that was updated after the last sync call, which cuts out a lot of redundant data and bandwidth.
Source: MultiVersion Concurrency Control
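A rough sketch (Swift, with a hypothetical endpoint and field names) of that idea: keep the highest version counter you have seen and ask the server only for records changed after it.

```swift
import Foundation

struct ChangedRecord: Decodable { let id: Int; let version: Int; let payload: String }

var lastSyncVersion = UserDefaults.standard.integer(forKey: "lastSyncVersion")

func fetchChanges(completion: @escaping ([ChangedRecord]) -> Void) {
    // Hypothetical API that filters server-side on the per-record version counter.
    let url = URL(string: "https://api.example.com/records?changedAfter=\(lastSyncVersion)")!
    URLSession.shared.dataTask(with: url) { data, _, _ in
        guard let data = data,
              let changes = try? JSONDecoder().decode([ChangedRecord].self, from: data) else {
            completion([])
            return
        }
        // Remember the newest counter so the next sync only pulls what changed after it.
        if let maxVersion = changes.map({ $0.version }).max() {
            lastSyncVersion = maxVersion
            UserDefaults.standard.set(maxVersion, forKey: "lastSyncVersion")
        }
        completion(changes)
    }.resume()
}
```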
We have a core data DB (sqlite store) which, for some users, is about 100-150 MB. I wouldn't think that would be too big for a storage system to deal with (even on a mobile device), but we've found that with that size core data DB, ANY lightweight migration takes ~10+ seconds to complete. Even something as simple as adding a completely new independent entity (not related to any other entity). With raw sqlite this would be a create table statement. So, my question is whether anyone else has seen this and, if so, have they found a workaround to make such simple migrations faster? Specifically, I'm looking for a way to handle adding a new independent entity to an existing core data DB that's ~100-150 MB and have it be quick (i.e., under 5 seconds).
I believe that core data migrations ALWAYS have to read all of the data from the source and write it all to a destination for a migration (which is terrible BTW), but I'm hoping someone can prove me wrong. :) I couldn't find any way to do it with a custom migration either.
I've considered munging the DB with sql directly to basically make the model look like what CoreData would expect (I've done stuff like this to manually "downgrade" a core data DB for debugging purposes), but we really want to avoid doing something like that in production.
UPDATE:
For reference, this is the current approach we are taking. This is not a generic solution, but will work for our use case. Unless I get a better answer I'll add this as an answer at some point in the future and accept it.
We're going to deal with this by essentially making the DB smaller. There are 2 out of 15 entities that take up the bulk of the space in the DB (~95%). We're going to create completely separate data models, each with one of those entities. This is done without changing the main model at all (hence, no Core Data migration). We'll then make a task that runs with background priority in GCD and, if any of those entities are found in the main DB, moves them to the appropriate separate DB and removes them from the main DB. This is done in batches with some sleeps between batches so it's less resource-intensive and doesn't affect normal app operation. We'll modify the code that accesses those entities to try to get them from the new DB and fall back to the main DB if they're not in there.
In a future update after we find that all, or at least most, of our users have updated their data in the new DBs we'll drop those entities from the main DB.
This leaves us with a small main DB that can have migrations applied quickly and two large DBs that have migrations done more slowly. Those large DBs, in our case, should change less often (maybe never?) and even if they do change there are limited places in the app that require them so we can work around it in the UI (e.g., report some feature as unavailable until we move data).
A 10-20 second delay for an update to a huge dataset seems perfectly reasonable to me. Just don't do it on the main thread.
This means you'll have to modify the boilerplate Core Data stack setup that you get in the usual Xcode templates. Instead of always setting up the stack on the main thread at launch time, check to see if migration is needed. If so, put up appropriate UI, do the migration in a background thread, and be ready to invoke beginBackgroundTaskWithExpirationHandler: if needed.
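As a minimal sketch (Swift rather than the Objective-C of the era, with an assumed store URL and completion shape), the check-then-migrate-off-the-main-thread flow could look something like this:

```swift
import CoreData
import UIKit

func openStore(at storeURL: URL,
               model: NSManagedObjectModel,
               completion: @escaping (NSPersistentStoreCoordinator?) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Keep the app alive if it gets backgrounded mid-migration.
        let taskID = UIApplication.shared.beginBackgroundTask(expirationHandler: nil)
        defer { UIApplication.shared.endBackgroundTask(taskID) }

        // Compare the on-disk metadata with the current model to see whether migration is needed.
        if let metadata = try? NSPersistentStoreCoordinator.metadataForPersistentStore(
                ofType: NSSQLiteStoreType, at: storeURL, options: nil),
           !model.isConfiguration(withName: nil, compatibleWithStoreMetadata: metadata) {
            DispatchQueue.main.async { /* put up "updating database" UI here */ }
        }

        // Lightweight migration happens inside addPersistentStore when these options are set.
        let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
        let options = [NSMigratePersistentStoresAutomaticallyOption: true,
                       NSInferMappingModelAutomaticallyOption: true]
        do {
            try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                               configurationName: nil,
                                               at: storeURL,
                                               options: options)
            DispatchQueue.main.async { completion(coordinator) }
        } catch {
            DispatchQueue.main.async { completion(nil) }
        }
    }
}
```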
I recently modified my iOS app to enable serialized mode for both a database encrypted using SQLCipher and a non-encrypted database (also SQLite). I also maintain a static sqlite3 connection for each database, and each is only opened once (by simply checking for null values) and shared throughout the lifetime of the app.
The app is required to have a sync-like behavior which will download a ton of records from a remote database at regular intervals using a soap request and update the contents of the local encrypted database. Of course, the person using the app may or may not be updating or reading from the database, depending on what they're doing, so I made the changes mentioned in the above paragraph.
When doing short-term testing there doesn't appear to be any issue with how things work, and I have yet to experience any problems myself.
However, some users are reporting that they've lost access to the encrypted database, and I'm trying to figure out why.
My thoughts are as follows: methods written by another developer declared all sqlite3_stmt's as static (I believe this code was in the problematic release). In the past I've noticed crashes when two threads run a particular method simultaneously: one thread finalizes, modifies or replaces a sqlite3_stmt while another thread is using it. A crash doesn't always occur because he wrapped most of his SQLite code in try/catch blocks. If it's true that SQLite uses prepare and finalize to implement locking, could the orphaning of sqlite3_stmt's that occurs due to their static nature in this context be putting the database into an inoperable state? For example, when a statement that has acquired an exclusive lock after being stepped is replaced by an assignment in the same method running on another thread?
I realize that this doesn't necessarily mean that the database will become permanently unusable, but, consider this scenario:
At some point during the app's lifetime it will re-key the encrypted database and that key is stored in another database. Suppose that it successfully re-keys the encrypted database, but then the new key is not stored in the other database because of what I mentioned above.
Provided that the database hasn't become corrupted at some point (I'm not really counting on this being the case), this is the only explanation I can come up with for why the user may not be able to use the encrypted database after restarting the iOS app, seeing as the app would be the only one to access the database file.
Being that I can't recreate this issue, I can only speculate about what the reasoning might be. What thoughts do you have? Does this seem like a plausible scenario for something that happens rarely? Do you have another idea of something to look into?
If the database is rekeyed, and the key for the database is not successfully stored in the other database, then it could certainly cause the problem.
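On the static sqlite3_stmt theory, here is a minimal sketch (Swift, hypothetical class; with SQLCipher you would import its module rather than the system SQLite3) of keeping every statement local to a call and funnelling each connection's work through one serial queue, so no thread can finalize or replace a statement another thread is stepping:

```swift
import Foundation
import SQLite3

final class SerializedDatabase {
    private let db: OpaquePointer
    private let queue = DispatchQueue(label: "com.example.db") // one serial queue per connection

    init?(path: String) {
        var handle: OpaquePointer?
        guard sqlite3_open_v2(path, &handle,
                              SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX,
                              nil) == SQLITE_OK,
              let opened = handle else { return nil }
        db = opened
    }

    deinit { sqlite3_close(db) }

    /// Runs a statement to completion; the statement never outlives this call,
    /// so no other thread can finalize or replace it mid-step.
    func execute(_ sql: String) -> Bool {
        return queue.sync {
            var stmt: OpaquePointer?                     // local, never static or shared
            guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else { return false }
            defer { sqlite3_finalize(stmt) }             // finalized by the same thread that stepped it
            return sqlite3_step(stmt) == SQLITE_DONE
        }
    }
}
```

The rekey concern could be handled the same way: run the rekey and the write that persists the new key on that same serial queue, and only persist the new key after the rekey call has reported success.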
I have inherited an app that generates a large array for every user that visits the app. I recently discovered that it is identical for nearly all of the users!!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea but a few engineers at work feel as though it does not deserve to be a full model. 97% of the times users will see the same exact thing but 3% of the time users will get a slightly different array (a few elements will have changed).
Are there any other approaches that I should consider? Thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex (require 'thread') when retrieving/updating the cached data, in case your server setup uses multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.
I am having issues with a CoreData-based iOS app when it tries to build the initial DB from data sent from the server. Basically, the server sends down 1MB chunks of objects (about 3,000 per chunk), and the iOS client deserializes them and writes them into disk.
What I'm seeing is that everything is going pretty well for about the first 8 chunks (out of 44), then performance drops off dramatically and each chunk starts taking longer and longer, as in the image below. Pretty much all the time is consumed in [NSManagedObjectContext save] as you can see in the Instruments profiling data, but also it appears that the app is no longer running at 100% of CPU for some reason, like it's waiting on disk I/O or something.
A few important facts about how I'm doing this:
Each chunk is processed in its own NSManagedObjectContext with its own NSAutoreleasePool, so there is no object build-up in a non-flushed context between processing of chunks.
There is no NSUndoManager set on any of the contexts.
There is no mergeChangesFromContextDidSaveNotification: going on (i.e. the chunk contexts aren't pushing their changes into a "master" context)
I'm using a SQLite-based datastore on iOS 4.3.
The records being written do have indexes on them.
The entire sync job is processed on a single GCD background thread (i.e. dispatch_queue_create() and dispatch_async()).
I have no idea why the performance suddenly drops off like that or what can be done to address it. I have poked around and read the following, but nothing has jumped out at me yet:
http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html
Does the performance of saving a ManagedObjectContext depend on the number of contained (unchanged) objects?
Any ideas or pointers for making this app scale up to 100,000 records in the database would be much appreciated.
Edit - extra stats
This Instruments graph shows the same simulation as above (on iPad2), but includes the disk activity stats and you can see pretty plainly that all of the "not running at 100% CPU" time seems to be taken up with writing to disk.
I also ran the same sync attempt on the iOS Simulator. Overall memory usage is more or less constant for each chunk, except for a dictionary containing object IDs that grows slightly over time (but these are not Core Data objects or anything that would affect saves; they are just NSNumbers). This dict is a small amount of memory compared to the total heap, so the problem is not running out of memory.
What is interesting about this test is that the CoreData Save instrument reports that the successive saves take roughly the same amount of time, which obviously conflicts with the CPU profiling information from the first set of results. It seems like CoreData thinks it is taking the same amount of time to push changes to the DB, but the DB itself (i.e. SQLite) suddenly takes a lot longer to actually stream those changes to disk.
I know this is an old issue, so this is probably no longer relevant for you, but it may be to someone else.
I've seen performance issues seeding a Core Data database over iCloud and discovered that if you have inverse relationships on the data model, you can be hurt incredibly badly performance-wise. The way iCloud transaction logging has been implemented, it actually seems to be an inevitable problem. Each transaction sent to iCloud (have a look at them on developer.icloud.com - they're just zipped-up plists) records every relationship affected by a change. Unlike in Core Data itself, where you modify one end of a relationship and it takes care of the inverse end, the transaction log ends up recording the changes at BOTH ends rather than working it out.
So if you have a one-to-many relationship and you create another record that will end up hanging off the 'many' end - well, the record at the 'one' end will also be updated to reflect the fact that a new record is now hanging off it. If you have an architecture where lots of 'data' objects hang off a 'type' object, then every time you add a new data object a transaction is written for the type object as well - but here's the kicker: because the iCloud Core Data transactions record the ENTIRE state of edited entities, not just the changes, EVERY relationship already recorded against that entity is also added to the log, not just the one indicating the new subordinate record. This can quickly spiral out of control, as the amount of data written grows with the number of relationships between entities - it ends up taking longer and longer to save batches.
I've answered a question a bit like this before here on the Apple dev forums which might be useful as I never seem to be able to describe this succinctly.
The easiest option to improve seeding performance if this scenario is what is impacting you is to switch inverse relationships off, but this isn't always an option.
More information about your implementation would help. For example, do you run this on the main thread or are you implementing background threads? However, I have seen this behavior before. When performing extensive batch operations using Core Data, it can slow down if memory is not managed properly. Have you checked memory usage? Have you checked for leaks? Another thing to try is to make sure you are using NSAutoreleasePool correctly if needed. Draining the pool periodically may help performance.
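As an illustration of the pool-draining advice, here is a minimal sketch (Swift, with a hypothetical "Item" entity and field mapping) of saving in sub-batches, draining an autorelease pool, and resetting the context after each batch so the already-saved objects can be released before the next one:

```swift
import CoreData

func insertChunk(_ records: [[String: Any]], into context: NSManagedObjectContext) throws {
    let batchSize = 500
    for batchStart in stride(from: 0, to: records.count, by: batchSize) {
        try autoreleasepool {
            let batch = records[batchStart..<min(batchStart + batchSize, records.count)]
            for record in batch {
                // "Item" is a placeholder entity; map your deserialized fields here.
                let object = NSEntityDescription.insertNewObject(forEntityName: "Item", into: context)
                object.setValuesForKeys(record)
            }
            try context.save()
            context.reset() // let the now-saved objects be deallocated before the next batch
        }
    }
}
```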