Autosave to Firestore lagging - iOS

I am autosaving to Firestore every time the user modifies anything in the text field. My code is:
self.collection.document(noteID!).updateData(["note.title": noteTitle.text, "note.lastUpdatedTimestamp": time])
It has started lagging very heavily all of a sudden. Is there a way to batch writes within a document to avoid this heavy lagging?

There is nothing terribly inefficient about that line of code that could be improved. Splitting it up into one update per field would not help. It's more likely that your network connection is slower than normal, or that one of the fields has become much longer than usual, which increases the time the update takes.
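If the write fires on every keystroke, one way to get the "batching" effect the question asks about is to coalesce rapid edits on the client and only write after the user pauses. A rough sketch of that idea, assuming a UIKit text field wired to an editing-changed action; the "notes" collection name and the 0.5-second interval are placeholders, and FieldValue.serverTimestamp() stands in for the question's time variable:

import FirebaseFirestore
import UIKit

class NoteEditorViewController: UIViewController {
    private lazy var collection = Firestore.firestore().collection("notes")
    var noteID: String?
    @IBOutlet weak var noteTitle: UITextField!

    private var saveTimer: Timer?

    // Wired to the text field's .editingChanged event; restarting the timer
    // means only the last edit in a burst of typing triggers a write.
    @objc func textChanged(_ sender: UITextField) {
        saveTimer?.invalidate()
        saveTimer = Timer.scheduledTimer(withTimeInterval: 0.5, repeats: false) { [weak self] _ in
            self?.saveNote()
        }
    }

    private func saveNote() {
        guard let noteID = noteID else { return }
        collection.document(noteID).updateData([
            "note.title": noteTitle.text ?? "",
            "note.lastUpdatedTimestamp": FieldValue.serverTimestamp()
        ])
    }
}

This doesn't make any single write faster, but it does cut the number of writes in flight on a slow connection.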

Related

Realm file size in iOS app

I have an app that uses Realm as a staging database. It receives information from a bluetooth device, processes it, and sends the processed result to a server.
The incoming data from bluetooth gets stored in a Realm table (table1). A separate thread reads data from the Realm database, processes it, and stores it into a second table (table2) for uploading to a server. Once it has pulled this data and successfully processed it, it deletes it from table1.
A third thread pulls data from table2 and, once a record has been successfully sent, removes it from table2.
I'm using a database here so that if, for whatever reason, the app is killed, data won't be lost... it will just resume where it left off when the app is restarted. But as you can see, the database is not something that hangs around (it's not like an address book or something... it is just temporary staging).
What I notice is that no matter what the heck I do, the Realm database file just increases in size over time. I'll end up with a database that, if I open it, has one record in it, but the file on disk could be tens of MB in size if the app has been running long enough.
Data is being processed on different background queues so as not to block any UX (one of the reasons I'm using Realm instead of Core Data). But I'm using things like autoreleasepools and invalidate() to avoid keeping copies of the objects I read (as suggested by many Realm questions/answers).
What gives? I know I don't have a code sample here, but this just seems like a basic garbage collection problem in Realm. I've seen other questions related to this where people ask "why is my database so huge", and the answers suggest doing things like writeCopyToPath, but that feels like an incredible hack, and regardless, it would be very difficult: this app is meant to be constantly connected and monitoring a bluetooth device, so doing this would mean stopping, making sure all threads that might alter the database are quiesced, doing the copy to compact the db, and then starting everything back up again. That just seems nonsensical to me. It might interrupt user operations, for example. I don't want a user to be unable to do something because I decided it was time to do database maintenance.
I feel like I'm either missing some incredibly fundamental point in how to make Realm not keep junk around, or Realm is just the completely wrong solution for my problem. I've never seen this problem with databases - adding and deleting lots of records... quickly... seems like something a database should just be able to do without exploding in size.
Are you making sure that the background thread is not holding on to old versions of the Realm, preventing the space from being reused?
Quote from the docs (https://realm.io/docs/swift/latest/#seeing-changes-from-other-threads):
If a thread has no runloop (which is generally the case in a background thread), then Realm.refresh() must be called manually in order to advance the transaction to the most recent state.
Failing to refresh Realms on a regular basis could lead to some transaction versions becoming “pinned”, preventing Realm from reusing the disk space used by that version, leading to larger file sizes.
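A minimal sketch of what keeping a run-loop-less background Realm fresh can look like, assuming RealmSwift, a dedicated dispatch queue, and a hypothetical Reading model for the staged records:

import RealmSwift

let stagingQueue = DispatchQueue(label: "realm.staging")

stagingQueue.async {
    // Open the Realm inside an autoreleasepool so the instance (and the
    // transaction version it pins) is released as soon as the block ends.
    autoreleasepool {
        let realm = try! Realm()
        // No run loop on this queue, so advance to the latest version manually;
        // otherwise this thread can keep old versions pinned on disk.
        realm.refresh()

        let processed = realm.objects(Reading.self).filter("processed == true")
        try! realm.write {
            realm.delete(processed)
        }
    }
}

A long-lived background thread that holds a single stale Realm instance open can force the file to retain every version written since that instance last advanced, which matches the ever-growing file described in the question.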

Entity Framework / Glimpse duration disparity

I'm using the latest ASP.NET MVC and Entity Framework (MVC 5.2.2, EF 6.1.2), and the latest Glimpse. I'm working on improving query times when eagerly loading an entity with several nested child objects, and have reduced the number of queries by using .Include("Object.Child") to bring in navigation properties. At first, I thought I was getting a good result, seeing the "Total query execution time" in the SQL tab of Glimpse drop significantly. Yet the "Total connection open time" stays high, and is very long for the resulting combined mega-query. See screenshot below.
I'm wondering if anyone can help me understand the difference between the two durations? Glimpse says my command takes <100 ms, but that the SQL connection takes >5 seconds. The query in this case is really messy with lots of joins, etc.; however, it's not clear where the time goes if the query itself really does finish in 100 ms.
Note: I've seen the answer about why two durations here, but it doesn't explain the nature of each.
Thanks for asking the question. The timer for the connection duration starts when the connection is opened and finishes when it is closed. To work this out further: how are you using your context/connection? Are you sharing it, keeping it around, etc.?
After further testing, I think I've figured out what was happening. I saw another question which suggested that the .Include() approach to eager loading hierarchical entities in Entity Framework can result in complex queries with many joins and duplication of data in the result set. I had a long XML string as one of my properties, so if this was duplicated many times at the database, it would take a long time to return/process.
As a test, I cleared the data-heavy field and reran the query, getting a far shorter "connection" duration (the one listed on the right in Glimpse). It went from 9 seconds to under 200 ms total. Based on this I assume the data size was the culprit, and I learned my lesson about using large data properties this way.
I'd still be interested to know whether Glimpse could show you the raw data being returned from a query, or even show the size in bytes, along with the record count. This would have likely made this problem evident.
A little late to this question, but I encountered the same problem and was also trying to understand the disparity between my query execution time and connection open time.
FWIW, I discovered that I was passing an enumerable in my view model to the view, rather than a concrete list. Thus, the view was triggering evaluation of the query and prolonging the amount of time that the connection remained open. By passing lists of the items (call .ToList() on the enumerable), I drastically reduced the amount of time for which the connection remained open.

SQLite iPad performance problems during mass insert and select

I've been working on an iPad app and all is working fine besides SQLite performance. Now, this app needs to handle a lot of data.
At the moment I'm having two issues; the first is when I'm populating the database. The current test is 710 records, each with 20 columns, and the app can't handle that. This is the main issue: I'm not sure it could ever process any more than this amount, or even anywhere near it, but it's what I'm aiming for. My question is: is SQLite even capable of handling this much data on an iPad?
The second is when pulling data from the database to populate a table view - each row calls for 4 records, and the time it takes to fetch all of these is causing the table to lag slightly while it's scrolling. Could I get away with running the queries on a separate thread? I have tried something along those lines, but had no luck.
Any help would be amazing, thanks a lot.
In my past project experience, I have seen that indexes on the tables slow down inserts. I dropped the indexes just before the bulk insert, inserted the records, and recreated the indexes afterwards; I saw a significant difference. Hope this helps.
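A rough sketch of that drop/insert/recreate sequence using the SQLite C API from Swift; the records table, the idx_records_name index, and the already-open db handle are hypothetical names, and wrapping the inserts in a single transaction also avoids paying for a commit on every row:

import SQLite3

let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

func bulkInsert(_ rows: [(id: Int32, name: String)], into db: OpaquePointer?) {
    // Drop the index so SQLite doesn't rebalance it on every insert.
    sqlite3_exec(db, "DROP INDEX IF EXISTS idx_records_name;", nil, nil, nil)
    sqlite3_exec(db, "BEGIN TRANSACTION;", nil, nil, nil)

    // Prepare one statement and reuse it for every row.
    var stmt: OpaquePointer?
    sqlite3_prepare_v2(db, "INSERT INTO records (id, name) VALUES (?, ?);", -1, &stmt, nil)
    for row in rows {
        sqlite3_bind_int(stmt, 1, row.id)
        sqlite3_bind_text(stmt, 2, row.name, -1, SQLITE_TRANSIENT)
        sqlite3_step(stmt)
        sqlite3_reset(stmt)
    }
    sqlite3_finalize(stmt)

    sqlite3_exec(db, "COMMIT;", nil, nil, nil)
    // Recreate the index once, after all rows are in.
    sqlite3_exec(db, "CREATE INDEX idx_records_name ON records(name);", nil, nil, nil)
}

For scale, 710 rows is well within what SQLite can handle on an iPad; the usual culprit for slow bulk inserts is running each insert as its own implicit transaction.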

Performance of NSManagedObjectContext save degrades dramatically

I am having issues with a CoreData-based iOS app when it tries to build the initial DB from data sent from the server. Basically, the server sends down 1MB chunks of objects (about 3,000 per chunk), and the iOS client deserializes them and writes them to disk.
What I'm seeing is that everything goes pretty well for about the first 8 chunks (out of 44), then performance drops off dramatically and each chunk starts taking longer and longer, as in the image below. Pretty much all the time is consumed in [NSManagedObjectContext save], as you can see in the Instruments profiling data, but it also appears that the app is no longer running at 100% CPU for some reason, as if it's waiting on disk I/O or something.
A few important facts about how I'm doing this:
Each chunk is processed in its own NSManagedObjectContext with its own NSAutoreleasePool, so there is no object build-up in a non-flushed context between processing of chunks.
There is no NSUndoManager set on any of the contexts.
There is no mergeChangesFromContextDidSaveNotification: going on (i.e. the chunk contexts aren't pushing their changes into a "master" context)
I'm using a SQLite-based datastore on iOS 4.3.
The records being written do have indexes on them.
The entire sync job is processed on a single GCD background thread (i.e. dispatch_queue_create() and dispatch_async()).
I have no idea why the performance suddenly drops off like that or what can be done to address it. I have poked around and read the following, but nothing has jumped out at me yet:
http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html
Does the performance of saving a ManagedObjectContext depend on the number of contained (unchanged) objects?
Any ideas or pointers for making this app scale up to 100,000 records in the database would be much appreciated.
Edit - extra stats
This Instruments graph shows the same simulation as above (on iPad2), but includes the disk activity stats and you can see pretty plainly that all of the "not running at 100% CPU" time seems to be taken up with writing to disk.
I also ran the same sync attempt on the iOS simulator. Overall memory usage is more or less constant for each chunk, except for a dictionary of object IDs that grows slightly over time (but these are not CoreData objects or anything that would affect saves, they are just NSNumbers). This dict is a small amount of memory compared to the total heap, so the problem is not running out of memory.
What is interesting about this test is that the CoreData Save instrument reports that the successive saves take roughly the same amount of time, which obviously conflicts with the CPU profiling information from the first set of results. It seems like CoreData thinks it is taking the same amount of time to push changes to the DB, but the DB itself (i.e. SQLite) suddenly takes a lot longer to actually stream those changes to disk.
I know this is an old issue, so this is probably no longer relevant for you, but it may be to someone else.
I've seen performance issues seeding a Core Data database over iCloud and discovered that if you have inverse relationships on the data model you can be hurt incredibly badly, performance-wise. The way iCloud transaction logging has been implemented, it actually seems to be an unavoidable problem. Each transaction sent to iCloud (have a look at them on developer.icloud.com - they're just zipped-up plists) records every relationship that is affected by a change. Unlike in Core Data itself, where you modify one end of a relationship and it takes care of the inverse end, the Core Data transaction log ends up recording the changes at BOTH ends rather than working them out.
So if you have a one-to-many relationship and you create another record that will end up hanging off the 'many' end, the record at the 'one' end will also be updated to reflect the fact that a new record is now hanging off it. If your architecture has a 'type' object that lots of 'data' objects hang off, then every time you add a new data object, the type object gets a transaction written for it as well. But here's the kicker: because the iCloud Core Data transactions record the ENTIRE state of edited entities, not just the changes, EVERY relationship already recorded against it is also added to the log, not just the one indicating the new subordinate record. This can quickly spiral out of control, as the amount of data written grows with the number of relationships between entities, and it ends up taking longer and longer to save batches.
I've answered a similar question before on the Apple dev forums, which might be useful, as I never seem to be able to describe this succinctly.
The easiest option to improve seeding performance if this scenario is what is impacting you is to switch inverse relationships off, but this isn't always an option.
More information about your implementation would help. For example, do you run this on the main thread or on background threads? That said, I have seen this behavior before. Extensive batch operations in Core Data can slow down if memory is not managed properly. Have you checked memory usage? Have you checked for leaks? Another thing to try is to make sure you are using NSAutoreleasePool correctly if needed; draining the pool periodically may help performance.
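A minimal sketch of what the per-chunk, pool-drained import described above can look like in code, written against the modern NSPersistentContainer API; the Record entity name and the shape of the decoded chunks are assumptions:

import CoreData

func importChunks(_ chunks: [[[String: Any]]], into container: NSPersistentContainer) {
    let context = container.newBackgroundContext()
    context.undoManager = nil  // no undo bookkeeping during a bulk import

    for chunk in chunks {
        autoreleasepool {
            context.performAndWait {
                for fields in chunk {
                    let record = NSEntityDescription.insertNewObject(forEntityName: "Record",
                                                                     into: context)
                    record.setValuesForKeys(fields)
                }
                try? context.save()
                context.reset()  // drop the saved objects so memory stays flat per chunk
            }
        }
    }
}

Resetting (or discarding) the context after each save keeps the row cache and registered objects from accumulating, which is one of the usual causes of saves that get progressively slower.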

Storing Data In Memory: Session vs Cache vs Static

A bit of backstory: I am working on a web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request takes ~15-20 secs to complete and a couple of secs to process. Once there, the user can manipulate values on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call 1 time (the 15 sec hit) and then wanting to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So, the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 secs or so (I hope).
In order to make this efficient, I am moving the initial data into memory and using Ajax calls back to the server so that I can reduce the processing time needed to handle the recalculation that occurs w/ this user's updates.
Here is my question: with performance in mind, what would be the best way to store this data, assuming that only one user will be working w/ this data at any given moment?
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event, which will meet my failsafe requirements. Cons: Slowest perf of my current options. The Session End event is sometimes tricky to get to fire reliably.
Caching - Pros: Good Perf. Has access to dependencies which could be a bonus later down the line but not really useful in current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best perf. Easiest to maintain, as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on which option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method for storing the data across page loads. You can name the cache key you store the data under to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save changes back to DB. If you name the files based on the user/account or some other session-unique indicator then there's no issue with conflict and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the writes more: save changes to Session, then write that to a file at intervals, then sweep the file at larger intervals. You can tune it for performance and choose what level of user-change loss is acceptable.
Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You cannot rely on the session simply because it is (typically) tied to the user's browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), then they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much slower?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
Perhaps a bit off topic, but I'd recommend putting those long processing calls in asynchronous (not to be confused with AJAX's asynchronous) pages.
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
I suggest creating a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
One possible alternative to what the others mentioned, is to store the data on the client.
Assuming the dataset is not too large and the code that manipulates it can run client side, you could store the data as an XML data island or a JSON object. This data could then be manipulated/processed entirely client side with no round trips to the server. If you need to persist this data back to the server, the resulting data could be posted via AJAX or a standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.
