I've done a fair amount of research over the past few days, but I'm not sure what the current best practice is for concurrent Core Data. The most relevant post seems to be this blog post, but in light of this analysis of the performance of different concurrency methods, it seems that the modern way with parent contexts might not be the best. Also, this example from Apple doesn't follow the best practice from Apple's own concurrency guide, which recommends NOT using the default NSConfinementConcurrencyType.
In light of all of this, what is the simplest and best way to implement concurrency with Core Data? All I need is a background thread to do some long writes to Core Data without hanging up the UI. Code examples are appreciated.
As always, it really depends on what you are trying to accomplish.
"Long writes" will hang your UI no matter what architecture you implement.
A write operation locks the DB file at both the OS level and the SQLite engine level (if you use this kind of store); all pending read operations have to wait for the write to end before they can complete.
One of the most common optimisation methods is to segment the database "load" process into multiple smaller save operations (you shouldn't mind, as this happens in the background); see the sketch at the end of this answer.
So, to answer the question:
The simplest way for you would probably be to use the architecture described in the blog post you mentioned (the parent-child hierarchy). If you notice that this causes too much "stutter" in your UI, try to optimise your data-load process or try a different architecture.
Use Instruments to find bottlenecks in your application's execution.
Core Data has quirks/bugs in every architecture that I know of, and you will find them gradually, depending on how you use it.
My recommendation is to go with the parent/child context pattern. Given the sparse details you provide (e.g. number of records, total volume of data, latency of delivery, etc.), this seems to be the most flexible and proven solution, one that can also accommodate very large datasets.
Contrary to other claims, you can have a smoothly operating UI regardless of how long the "writes" to your database take. Obviously, that is what background threads are for. The mechanism that keeps the UI fluid is notifications about data changes, which you can react to gracefully without disturbing the user experience.
Your remark about NSConfinementConcurrencyType is correct. As stated in your source, it is there for backward compatibility, so you can just forget about it. For concurrency, you want to use NSPrivateQueueConcurrencyType when creating your context.
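A minimal sketch of that setup, assuming you already have a configured NSPersistentStoreCoordinator named coordinator:

    // Main-queue context: drives the UI.
    NSManagedObjectContext *mainContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    mainContext.persistentStoreCoordinator = coordinator;

    // Private-queue child context: does the long writes off the main thread.
    NSManagedObjectContext *backgroundContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    backgroundContext.parentContext = mainContext;

    [backgroundContext performBlock:^{
        // ... create or modify NSManagedObject instances here ...
        NSError *error = nil;
        if ([backgroundContext save:&error]) {   // pushes changes up to mainContext
            [mainContext performBlock:^{
                NSError *saveError = nil;
                [mainContext save:&saveError];   // persists them to the store
            }];
        }
    }];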
I am building a HealthKit-based app and am wondering how best to save HealthKit data persistently.
My current approach is to get the data, save it as attributes of a custom class object, and then save that object in Core Data as NSData.
In terms of performance, is Realm faster than Core Data?
According to http://qiita.com/moriyaman/items/1a2916f4c2b79e934370, Core Data is apparently slower than FMDB, which is slower than Realm. Can someone confirm whether this is true even after taking faults and indexes into account?
Disclaimer: I work at Realm
The question of which persistence product will perform best among the solutions you mentioned is highly dependent on the type/amount/frequency of your data. Since Core Data and FMDB are layers over SQLite, they can't be faster than SQLite by design, but they provide enough convenience to be worthwhile to many users. On the other hand, Realm isn't based on SQLite, but rather on its own custom database engine that was designed specifically for modern smartphones. It was designed to strike a better balance between powerful features and a simple API, without adding a large performance hit.
You can see public benchmarks comparing Realm/SQLite/Core Data/FMDB in Realm's launch blog post here: http://realm.io/news/introducing-realm#fast
Finally, your approach of serializing HealthKit information into NSData using something like NSCoding is going to be terribly inefficient. No matter which persistence solution you choose, you'll be better served by using the serialization built into those products rather than storing an already-serialized data blob.
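To make the difference concrete, here is a hedged Core Data sketch; the workout object, the managedObject instance, the lastWeek date, and the "Workout" entity with its attributes are all hypothetical:

    // Inefficient: archive the whole object into an opaque NSData blob.
    NSData *blob = [NSKeyedArchiver archivedDataWithRootObject:workout];
    [managedObject setValue:blob forKey:@"payload"]; // opaque: the store can't query or index it

    // Better: map each piece of data to a typed, queryable attribute.
    [managedObject setValue:workout.startDate forKey:@"startDate"];
    [managedObject setValue:@(workout.totalEnergyBurned) forKey:@"totalEnergyBurned"];

    // Now the store can filter and sort without unarchiving every record:
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Workout"];
    request.predicate = [NSPredicate predicateWithFormat:@"startDate > %@", lastWeek];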
As I commented to @jpsim, it is difficult to simply compare the performance of Core Data to lower-level frameworks like FMDB, or to differently abstracted frameworks like Realm. Which approach you choose will dramatically impact how you build your program, which will tend to shuffle the performance problems around to different places.
Core Data and SQLite solve very different problems. SQLite is a relational database. Core Data is an object persistence engine. I'm not an expert on Realm, but it seems to be trying to strike a balance between those two approaches, with more low-level control than Core Data affords but a closer tie-in to the object model than SQLite. The fact that Realm (at least in my impression of it) gives you more low-level control opens up opportunities for you to optimize things, or to mess things up, IMO. That's neither good nor bad; it just makes it hard to compare them apples-to-apples, and it particularly makes generic "performance benchmarks" problematic. The question isn't whether it's possible for someone to write faster code using engine A vs. engine B. The question is whether you will likely write acceptably performant code in each engine, while avoiding bugs and minimizing development time.
In general, I believe HealthKit data is supposed to be stored in HealthKit in order to protect privacy. You should be careful about storing this data in your own storage anyway. Be particularly aware of the guidelines about iCloud:
Apps using the HealthKit framework that store users’ health information in iCloud will be rejected
I don't know how this will impact documents that you store that are then backed up to iCloud. Just leaving the data in HealthKit is the best way to avoid having to worry about such problems.
In any case, though, performance is just one axis to consider. You didn't indicate anything to suggest that you have special performance problems (for example, that you're handling tens of thousands of records, or real-time data, or something like that). So I would focus first on which tool meets your general needs best, then do some basic experiments to make sure the performance is reasonable, and then optimize as you find issues.
Okay, first off, before anyone attempts to flag this as a "duplicate" question: I have reviewed most of the posts on SO regarding similar questions, but even combining all that has been said, I am still somewhat at a loss as to the definitive, or maybe I should say unanimous, agreement on this.
I can, however, say that I have (based on the posts) conclusively determined that the answer depends on the scope of the requirement. But even taking that into consideration, the opinions seem too diverse for me to decide how I should handle this.
My immediate requirement is that I need to persist variable data from one controller across many views. More specifically, I have a controller and corresponding view that handle shopping-cart item counts, and I would like to persist that data across multiple views. I am thinking that the _layout view is the most logical choice for this.
Now, I have successfully accomplished this by assigning the value to a Session variable which is retrieved from my _layout view; so even when the user navigates anywhere within the site, the number of items in the shopping cart persists until they either leave the site or complete the checkout, at which point the variable is cleared in code.
The posts I've read seemed biased either toward staying away from Session variables in favor of cookies and storing data in a database, or toward stating that for the purpose I propose, Session variables are perfectly fine to use.
The other thing I've read suggests that Session variables can potentially impede overall performance if there is high traffic on the site, since the information is stored on the server.
I personally cannot justify storing this type of information in a database and repeatedly hitting the database for it, as I'd imagine this could also affect site performance and seems a bit of an overkill for temporary data. TempData, ViewData and ViewBag do not persist the data across requests, so they are not logical choices for the requirement, IMO.
If there is another well-suited alternative to the Session variable (which is working for me), I would like to know what it is.
Two posts that seem contradictory in their efforts to provide best recommendations leave me a bit confused.
Cons: Is it a good practice to avoid using Session State in ASP.NET MVC? If yes, why and how?
Pros: Still ok to use Session variables in ASP.NET mvc, or is there a better alternative for some things (like a cart)
It seems that this question (although presented in many different variations) has no definitive answer that I can settle on.
If there is a more preferable way to accomplish this without overkill, then that is the answer I'm in search of.
I read somewhere about using MVC filters in tandem with the Global.asax application-start section as well, but this does not seem appropriate for variables set at the controller level so much as, perhaps, for static variables.
Can someone maybe squash (for lack of a better word) the many diverse opinions on the topic and provide a more definitive answer to the question? I'm sure the diverse opinions have their place, and I'm not attempting to discredit them. But having a definitive, and possibly unanimous, answer would be better; then I could sort through the other posts to determine what is best for my application.
Of course, if this question has no definitive answer; just tell me that and I'll attempt to derive my own answer from the other posts.
Thanks
===========================================================
UPDATED RESPONSE TO ANSWERS PROVIDED
Caching and cookies seem to be the general preference in the responses; however, I've also noted the statement that caching is not an ideal candidate to use across multiple web servers, because synchronization can be a potential issue.
Giving credit to Tim, it was stated that database storage is optimized and gives users the option to return at a later time and continue where they left off.
That is an excellent point, but keeping probabilities in mind, it is a reasonable given that some users may not return, leaving unnecessary data in the database.
So keeping the DB optimized and clean (which, to me, is of equal relevance) would require implementing a maintenance task to automatically expire those records after a set threshold of time. Although a maintenance task is not out of the question, I still think this adds a bit more work to the job simply for the purpose of serving as temporary storage.
Nonetheless, I do respect Tim's recommendation and believe it deserves merit for countering, to a degree, my initial opinion that a database would not seem to be a viable option for storing temporary data. I think the compromise would be to store the data in a database (in the scenario of a shopping cart or similar) perhaps after checkout. This way, as previously stated, the data can be tracked persistently upon subsequent visits, so you have a record of transactions. More importantly, it would be the data from those transactions that has real relevance to persist to the database.
It was also stated that although Session is faster than the database, it has its caveats, which can to some degree be mitigated by other mechanisms, such as leveraging the SessionStateBehavior attribute, to give one example.
BUT... I think Erik kind of drove the point home with the Dunning-Kruger effect. From the content and explanations in the answers given here, I seriously doubt that the expertise of any of the individuals who responded is in any way questionable. Nonetheless, I tend to agree that expecting a unanimous opinion may be a higher-than-reasonable expectation on my part.
What I was more specifically looking for was a general consensus on a technique that would comfortably accommodate a diverse number of scenarios. In other words, something that would accommodate not only my particular scenario but also provide scalability to larger environments with potentially heavier traffic. This way, a change in the programming would be either avoided altogether or minimal at best.
==================================================
Summary based on the feedback:
Session variables seem to accommodate smaller-scale scenarios, when applicable, but they have some potential persistence concerns, among other notable shortcomings, as stated very thoroughly by Erik. So this option obviously will not fit a scalable model.
Caching is preferable to Session variables, but again not necessarily the "best" scalable option, due among other things to the potential synchronization complexities in web-server-farm environments, as previously pointed out. But it is an option nonetheless.
Database storage is scalable, but for the purpose of temporary, volatile storage it is probably not the most elegant option from a database perspective, as it would require periodic cleanup. Personally, having built a strong foundation in database concepts earlier in my career, this is probably not something many developers will agree with; using the database for this purpose may suffice from a web programmer's perspective, but from the perspective of the DAL and DB development it (to me) has the potential of mandating an additional DB task to keep the backend efficient.
Cookies seem to be a nice option, having the combined "desirable" elements of Session variables and caching.
==================================================
CONCLUSION
Based on the answers, I think COOKIES and CACHING are the most well-rounded proposals for best practice across the board, in combination with database storage when continued persistence is required after the fact; they are potentially good candidates for scalability among the options presented.
The ultimate choice between the two would seem to be based on the amount and type of data requiring storage (e.g. sensitive vs. non-sensitive, and whether or not there is any concern that the client may alter the data on their end), in addition to the special consideration that COOKIES may be disabled by clients.
Obviously, there is no one-size-fits-all solution, as clearly pointed out and concluded from the answers provided; but in terms of scalability, I may be wrong, but these seem to be the BEST choices available.
Because all the responses are good, I'm going to fairly credit all the posts as useful, and I'm going to accept Erik's answer as a well-rounded, scalable overall solution. I wish I could select more than one accepted answer, as I believe Tim's response was also very well laid out and concise.
Gupta's response was good as well, but I wanted more elaboration on the proposed answer and not a repeat of previous posts.
Thanks Guys!
You will never get a unanimous opinion on anything in any large group of people. That's just human nature. Part of that stems from the Dunning-Kruger effect, which states that the less someone knows about a subject, the more likely they are to overvalue their expertise in it. In other words, lots of people think they know something, but only because they don't know what they don't know. Part of it is simply that people have different experiences: some have found no problems with session, while others have in various situations, or vice versa.
So, to back up your research, which suggests that the answer depends heavily on the requirements, we need to understand what your requirements are. If this is to be a high-traffic site with load-balanced servers in a web farm, then stay as far away from session as you can. Sure, it's possible to share session in various ways in a server-farm environment (session server, distributed cache server, etc.), but avoiding session will almost always be faster if you can help it.
If your site runs on a single server and is unlikely to ever grow beyond that, and your traffic patterns are relatively low, then session may be a useful option. However, you should always be aware that session is unreliable storage and can disappear on you at any time. If the app pool is recycled, the session is gone. If an uncaught exception bubbles up to the worker process, the session may be gone. If IIS thinks there's not enough memory, your session may be gone, regardless of any timeout values configured. You also can't always get reliable notification that a session has ended, since terminated sessions do not fire the Session_End event.
Another issue is that session access is serialized. In other words, IIS prevents more than one thread from writing to the session at a time, and it typically does this by locking the session while a thread is running, unless that thread has opted out of writable session locking. This can cause severe problems in some cases, and merely poor performance in others. You can mitigate it by marking various methods with a read-only session attribute if you aren't going to be modifying session in them.
Ultimately, if you do choose to use session, then try to use it only for small, short-lived things if at all possible, and if that's not possible, then build in a way to "regenerate" the data if the session is lost. For instance, using your number-of-items-in-cart example, you could write a method that first checks to see if the value is there, and if not, goes out and loads it from the database. Always use this method to access the variable, rather than accessing it directly from session; this way, if the session is lost, it will just be reloaded.
However, having said this... for the number of items in a cart, I would generally prefer to use a cookie, since cookies get sent to the server on every request anyway, and this is a small, discrete unit of data. Generally, prefer Session for sensitive data that you want to prevent the user from being able to change; the number of items in the cart simply doesn't fit that rule.
When
Databases are highly optimized. A simple value like a shopping cart count is a good candidate for caching by the database and (hopefully) cheap to compute outright. It may be a non-issue.
However, if you have ruled out other mechanisms, small, user-by-user values are viable candidates for session.
Cache is fine for site-wide values, or for user-specific values with unique keys. However, synchronizing caches across multiple web servers can be difficult. Out-of-process session state will stay synchronized because it is stored in a single location (a database or a state server).
Of course, there are many 3rd party caching alternatives with various options to keep them synchronized.
Regardless of where the count is temporarily stored, I'm of the opinion that shopping carts themselves should be stored in the database so that users have the option to return later and continue where they left off.
Performance
If you use out-of-process session state (e.g. in a load-balanced environment, and/or to make session more durable), it will hit a database or call an out-of-process service, but the call is relatively cheap unless you are serializing large object graphs.
Session is loaded once per request. Subsequent read access is very fast.
Writing to session can be detrimental to performance even when there is no load. Why? Most modern applications use asynchronous calls, and when multiple async calls hit an HTTP handler (page, controller, etc.) that reads/writes session, ASP.NET will lock the session to serialize access. To avoid this, you can decorate your controllers with [SessionState( SessionStateBehavior.ReadOnly )]
Design
Now, I have successfully accomplished this by assigning the value to a Session variable which is retrieved from my _layout view;
This seems like mixing concerns, i.e. making the view aware of the underlying storage mechanism. From a purist standpoint, I would set this value on a view model, or at least put it in the ViewBag. From a practical standpoint, one or two values retrieved in this manner probably won't hurt anything, but beware of letting it grow much further.
I read somewhere about using MVC filters in tandem with the Global.asax application-start section as well, but this does not seem appropriate for variables set at the controller level so much as, perhaps, for static variables.
Static variables have perfectly legitimate uses, but you must understand them thoroughly or risk serious problems.
See my answers pertaining to static variables in ASP.Net:
does aspx provide special treatment for c# static variables
Static fields vs Session variables
Session alternatives, from a different perspective:
When you keep something in session, it works against the stateless principle of ASP.NET MVC. You can use the following options as alternatives to session.
If your ASP.NET (MVC) session does boxing/unboxing of the object, it puts a little extra load on the server. Consider these ideas:
Caching: storing a list or other large data in session is better handled with caching. You have control over when you want it to expire, rather than being tied to the user's session.
If your app depends on JSON/Ajax data, then you can use some of the client-side storage provided in HTML5 (like WebSQL or IndexedDB). This doesn't use cookies, so you can save some workload on the server.
I'm trying to write an iOS note-taking app that is blazingly fast for a large number of notes and that syncs without ever blocking the UI. (Don't worry, it's just a learning project; I know there are a billion note apps for iOS.) I have decided to use Core Data (mostly because of the excellent posts by Brent Simmons about Vesper). I know UIManagedDocument can do async reads and writes and has a lot of functionality built in, so I'm wondering if there is any information on which would be faster for a fairly simple notes app. I can't really find much information about people using UIManagedDocument for anything other than a centralized, basically singleton, persistent store. Is it suitable for thousands of documents? Would it be faster or slower than just a database of NSManagedObjects? It seems like most of the information I can find about Core Data is oriented toward people using NSManagedObject, so any information about UIManagedDocument being used in production apps would be really helpful. At this point, the only thing I can think of is to write the whole app both ways, load 10,000 notes into it, and see what happens.
Update
To clarify: I'm not new to iOS development or Objective-C; the "learning project" mostly means that I've never used Core Data and would like to learn how to write a really performant Core Data application.
UIManagedDocument is designed/intended for document based applications. One UIManagedDocument instance per document. If you are not building a document based application then you should not be using UIManagedDocument.
Everything that people like about UIManagedDocument can be accomplished with very little effort using the Core Data stack directly (see the sketch below). UIManagedDocument abstracts you away from what your persistence layer is doing; that is something you really do not want.
If you want a high performance Core Data application you do not want to be using UIManagedDocument. You will run into issues with it. It will do things at random times and cause performance issues.
You are far, far better off learning the framework properly.
In the case of Vesper, those notes are not documents; they are too small. Think of documents as Word files or Excel files: large, complicated data structures that are 100% isolated from each other.
Also, whether you use a UIManagedDocument or not, you will be using NSManagedObject instances. NSManagedObject, NSManagedObjectContext, NSPersistentStoreCoordinator are all foundational objects in Core Data. UIManagedDocument is just an abstraction layer on top.
Finally, Core Data is not a database. Thinking of it that way will get you into a jam. Core Data is an object model that can persist to disk and one of the persistence formats happens to be SQLite.
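For reference, standing up the primary stack directly is only a few lines; this sketch assumes a model file named "Notes.momd" in the app bundle:

    NSURL *modelURL = [[NSBundle mainBundle] URLForResource:@"Notes" withExtension:@"momd"];
    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] initWithContentsOfURL:modelURL];

    NSPersistentStoreCoordinator *coordinator =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];

    NSURL *documentsURL = [[[NSFileManager defaultManager] URLsForDirectory:NSDocumentDirectory
                                                                  inDomains:NSUserDomainMask] lastObject];
    NSURL *storeURL = [documentsURL URLByAppendingPathComponent:@"Notes.sqlite"];

    NSError *error = nil;
    if (![coordinator addPersistentStoreWithType:NSSQLiteStoreType
                                   configuration:nil
                                             URL:storeURL
                                         options:nil
                                           error:&error]) {
        NSLog(@"Failed to add store: %@", error);
    }

    NSManagedObjectContext *context =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    context.persistentStoreCoordinator = coordinator;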
Update (Running Into Problems)
UIManagedDocument is an abstraction on top of Core Data. To use UIManagedDocument you actually need to learn more about Core Data than if you just used the primary Core Data stack.
UIManagedDocument uses a parent/child context internally. Don't understand that yet? See the point above. It also means that your requests for it to save are "taken under advisement" as opposed to being performed right then and there. This can lead to unexpected results if you don't understand the point of it or don't want it to save when it feels like it.
UIManagedDocument saves asynchronously; at most you can request that it save (see the sketch after these points). That doesn't mean it is going to save now, nor does it mean you can easily stop and wait for the save to complete. You need to trust that it will complete. In addition, it may decide to save at an inopportune moment.
When you start looking for performance gains with Core Data, you tend to want to build the stack in a very specific way to maximize the benefit to your application. That is application-dependent, and with the abstractions in UIManagedDocument you get limited very quickly.
Even if I were building a document-based application, I would still not use UIManagedDocument. There is just too much going on behind the curtain.
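To illustrate the asynchronous save point above, this is roughly what requesting a save looks like; document here is a hypothetical UIManagedDocument instance:

    [document saveToURL:document.fileURL
       forSaveOperation:UIDocumentSaveForOverwriting
      completionHandler:^(BOOL success) {
          // Runs whenever the asynchronous save actually finishes,
          // not at the moment you requested it.
          NSLog(@"Save finished: %d", success);
      }];
    // Execution continues here immediately; the data may not be on disk yet.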
Performance is likely to come down to how the rest of your code is implemented, and not necessarily to the difference between a UIManagedDocument and an NSManagedObject.
Think of UIManagedDocument as a specific niche implementation of Core Data that already has the parts of Core Data you want baked into its structure, to save you (the developer) a bit of time writing code. It's been purpose-built for handling UIDocuments and multi-threading.
Under the hood, it's likely that UIManagedDocument uses Core Data as well as you would (assuming you know what you're doing), but it's theoretically possible you could cut some corners by knowing the exact and intimate details of your implementation.
If you're new to Core Data, or to GCD and NSOperationQueue, then you'll likely save a ton of developer time by leveraging UIManagedDocument instead of rolling your own.
A very apt analogy would be using an NSFetchedResultsController to run your UITableView, rather than rolling your own Core Data and UITableView implementation.
If you're new to Objective-C, I'd recommend you bang out something functional with UIManagedDocument first. Later, you can get lost in the weeds of dispatch_async() and NSFetchRequest, and pick up a few milliseconds of performance here and there.
Cheers!
Just store a plain text file on the disk for each note.
Keep a separate database (using SQLite, or perhaps just -[NSDictionary writeToFile:atomically:]) to store metadata, such as the last user-modification date and the last server-sync date for each note.
Periodically talk to the server, asking it for a list of stuff that has changed since the last time you did a sync, and sending any data on your end that has changed in the same time period.
This should be perfectly fast as long as you have fewer than a million note files. Do all network and filesystem operations on an NSOperationQueue.
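A hedged sketch of those pieces put together; noteText, notePath, metadataPath and noteIdentifier are all hypothetical names:

    NSOperationQueue *noteQueue = [[NSOperationQueue alloc] init];
    noteQueue.maxConcurrentOperationCount = 1;   // serialize disk access

    [noteQueue addOperationWithBlock:^{
        // 1. Write the note body as a plain text file.
        NSError *error = nil;
        [noteText writeToFile:notePath
                   atomically:YES
                     encoding:NSUTF8StringEncoding
                        error:&error];

        // 2. Record metadata in a simple plist "database".
        NSMutableDictionary *metadata =
            [[NSMutableDictionary alloc] initWithContentsOfFile:metadataPath]
                ?: [NSMutableDictionary dictionary];
        metadata[noteIdentifier] = @{ @"modified" : [NSDate date] };
        [metadata writeToFile:metadataPath atomically:YES];
    }];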
I've been working on implementing an automated trading system in Grails based on Interactive Brokers' API (brief overview here: Grails - asynchronous communication with 3rd party API) for longer than I care to admit. This is a high-frequency trading strategy, so it's not as simple as placing an order for 100 shares and getting filled. There's a lot of R&D involved, so my architecture and design have been and continue to morph and evolve over time.
What has become clear over the past month or so is that the asynchronous nature of the API is killing me. I have a model of my intended position within Grails, but this does not automatically reflect the actual state at the brokerage. There is a process of creating orders, some of which get filled now, some later, and some never. There can be partial fills, canceled or rejected orders, or any number of other errors. And the asynchronous updates have turned into a nightmare of pessimistic locks and ugly relationships and dependencies between Positions, Intents, Orders, Transactions, etc. And still, with all that inelegant, smelly code, there are times when my internal model gets out of sync with the actual state of the brokerage account. That is a very dangerous situation.
So, I'm realizing that I need some kind of async framework that will allow Grails and the IB API to maintain precisely the same state without fail. I am somewhat familiar with GPars, Akka, Promises, and Actors, but only on the surface; I have no hands-on experience with any of them. Just recently, I saw the Tasks in Parse's Bolts Framework and wondered whether that might be a good fit. My need is not really for parallelism or multi-threading of computations or collections. All I am trying to do is make sure that the async callbacks from IB are properly reflected in the Grails domain classes at all times. And I'm hoping that the right framework will allow me to delete tons of ugly spaghetti code that I've written trying to solve this problem.
What I need is a recommendation on the right framework, model, or architecture that addresses this problem. I welcome any recommendations, whether or not I mentioned them above.
Background:
I'm a software engineering student, and I have been checking out several algorithms for recommendation systems. One of these algorithms, collaborative filtering, has a lot of loops in it; it has to go through all of the users and, for each user, all of the ratings he has made on movies or other rateable items.
I was thinking of implementing it in Ruby for a Rails app.
The point is, there is a lot of data to be processed, so:
Should this be done in the database, using regular queries or PL/SQL or something similar? (Testing DBs is extremely time-consuming and hard, especially for these kinds of algorithms.)
Should I run a background job that caches the results of the algorithm? (If so, the data is processed in memory; if there are millions of users, how well does this scale?)
Should I run the algorithm every time there is a request, or every x requests? (Again, the data is processed in memory.)
The Question:
I know there are things that do this, like Apache Mahout, but they rely on Hadoop for scaling. Is there another way out? Is there a Mahout or machine-learning equivalent for Ruby, and if so, where does the computation take place?
Here are my thoughts on each of the methods:
No, it should not. Some calculations would be much faster to run in your database and some would not. However, it would be hard and time-consuming to test exactly which calculations should be run in your DB, and you would probably find that some part of the algorithm is slow in PostgreSQL or whatever you use.
More importantly: the database is not the right place to run this logic. As you say yourself, it would be hard to test, and it's overall bad practice. It would also affect the performance of your requests in general every time the DB has to calculate the algorithm. Also, the DB would still use a lot of memory processing this, so that isn't an advantage.
By far the best solution. See below for more explanation.
This is a much better solution than number one. However, it would mean that your app's performance would be very unstable: sometimes all resources would be free for normal requests, and sometimes you would be using all your resources on your calculations.
Option 2 is the best solution, as it doesn't interfere with the performance of the rest of your app and is much easier to scale, since it works in isolation. If, for example, you find that your worker can't keep up, you can just add some more running processes.
More importantly, you would be able to run the background processes on a separate server and thereby easily monitor memory and resource usage, and scale your server as necessary.
Even for real-time updates, a background job will be the best solution (if, of course, the calculation is not small enough to be done within the request). You could create a "high priority" queue that has enough resources to almost always be empty. If you need to show the result to the user without a reload, you would have to add some kind of push notification after a background job completes. This notification could then trigger an update on the page through JavaScript (you can also check out the new live-streaming feature of Rails 4).
I would recommend something like Sidekiq with Redis. You could then cache the results in Memcached, or you could recalculate the result each time; that really depends on how often you need to calculate it. With this solution, however, it would be much easier to set up a stable cache if you want one.
Where I work, we have an application that runs some heavy queries with a lot of calculations like this. Each night these jobs are queued and then run on an isolated server over the next few hours. This scales really well and is also easy to monitor with New Relic.
Hope this helps and makes sense (I know my English isn't perfect), but please feel free to ask if I misunderstood something or if you have more questions.