I hope this hasn't been asked before, I searched for a similar question but couldn't find something that's close enough to my case.
Basically, my question is if CoreData can handle a big amount of relationships (by big I mean maybe up to 30 on some entities) while not being slowed down too much.
I have these entities (Just as a demonstration):
Resources (contains all Resources for an object)
ProcessA
ProcessB
ProcessC
ProcessD
ProcessE
... (many more)
These processes are needed for various tasks but nearly all of them need to have a to-one (or to-many) relationship to Resources.
Sometimes, they even need multiple relationships to Resources (currentResources and oldResources or whatever). That's why I need to add an inverse relationship (it actually causes problems if I don't have inverses. I don't know why).
Due to the inverses, the Relationships entity starts to look like this:
Resources
someAttribute
...
---
-> inverseOfProcessA
-> inverseOfProcessB
-> inverseOfProcessC
-> inverseOfProcessD
-> inverseOfProcessE
-> ...
It's getting filled with inverse relationships that I will never need to access from this side (they just have to be there to prevent some strange things from happening).
I read into the concept of faulting. Is my understanding correct that when I fetch a Resources object, it doesn't actually load all the relationships? If so, how efficient is faulting? Because it does have to fill all of the relationships with fault-objects. That's the part I'm not really sure about. Does this impact performance when having many relationships on an entity?
I am sorry if this is a trivial question. The answer might be obvious, but I've just started worrying about performance and this interests me (and even though I read about faults in the documentation, this is not 100% clear to me)
Actually, if you're asking yourself if Core Data is going to struggle with your model, you have to ask yourself if your choices are good.
I don't know if Core Data can handle this, the answer is probably yes, but it depends on various other factors : are we talking about few entries or thousands of entries ? how many fetch requests ?
About your model : why not create a single entity Process with a way to distinguish a Process A from a Process B (with an identifier being an integer or a string) ?
About faulting : by default, when you create an NSFetchRequest, returnsObjectsAsFaults is set to true allowing you to work with large datasets without materializing all objects in memory. If you know that you're going to access all relationships, you can set this property to false in order to work with fully materialized objects, saving you bunch of round trips to the persistent layer.
Remember that Core Data is just a wrapper around a relational database so keep this aspect in mind to optimize your model and your requests.
I'm using a parent-child MOC architecture as described by Marcus Zarra in his blog post and talk.
It's generally working great, but I have an ordered one-to-many relationship where the "many" accumulates a lot of records over time. The issue is, in the process of saving the private context to disk, CoreData runs a select query for what appears to be every single object in the association, one at a time, even if it hasn't been touched. As you can imagine, this is incredibly slow.
Any ideas on how to eliminate this or at least make it batch it into one query?
Ordered relationships are problematic for a number of reasons, but this is out of scope for this question.
One obvious solution attempt is to replicate the ordered property yourself by introducing your own attribute to keep track of the order. This is the path I have taken in all my projects where I had this use case. Your own ordering logic gives you much more granular control over the expensive processes, such as inserting an element into the series (rather than just appending it at the end).
Please note that in many applications the ordered property can be modeled differently, e.g. with a time stamp, which in some cases can spare you the necessity of modifying the whole chain.
As for the problem of using "one query": You could fetch all objects needing to be reordered, modifying their order (e.g. by adding them one by one to the parent object), and save.
I am building an app with Objective-C and I would like to persist data. I am hesitating between NSKeyedArchiver and core Data. I am aware there are plenty of ressources about this on the web (including Objective-C best choice for saving data) but I am still doubtful about the one I should use. Here are the two things that make me wonder :
(1) I am assuming I will have around 1000-10000 objects to handle for a data volume of 1-10 Mb. I will do standard database queries on these objects. I would like to be able to load all these objects on launching and to save them from time to time -- a 1 second processing time for loading or saving would be fine by me.
(2) For the moment my model is rather intricate : for instance classA contains among other properties an array of classB which is itself formed by (among other) a property of type classC and a property of type classD. And class D itself contains properties of type classE.
Am I right to assume that (1) means that NSKeyedArchiver will still work fine and that (2) means that using core Data may not be very simple ? I have tried to look for cases where core Data was used with complex object graph structure like my case (2) on the web but haven't found that many ressources. This is for the moment what refrains me the most from using it.
The two things you identify both make me lean towards using CoreData rather than NSKeyedArchiver:
CoreData is well able to cope with 10,000 objects (if not considerably more), and it can support relatively straight-forward "database-like" queries of the data (sorting with NSSortDescriptors, filtering with NSPredicate). There are limitations on what can be achieved, but worst case you can load all the data into memory - which is what you would have to do with the NSKeyedArchiver solution.
Loading in sub-second times should be achievable (I've just tested with 10,000 objects, totalling 14Mb, in 0.17 secs in the simulator), particularly if you optimise to load only essential data initially, and let CoreData's faulting process bring in the additional data when necessary. Again, this will be better than NSKeyedArchiver.
Although most demos/tutorials opt for relatively straight forward data models (enough to demonstrate attributes and relationships), CoreData can cope with much more sophisticated data models. Below is a mock-up of the relationships that you describe, which took a few minutes to put together:
If you generate subclasses for all those entities, then traversing those relationships is simple (both forwards and backwards - inverse relationships are managed automatically for you). Again, there are limitations (CoreData does the SQL work for you, but in so doing it is less flexible than using a relational database directly).
Hope that helps.
I'm giving up traditional DDD, which is often a massive timewaster, and forces me to do endless mapping: data layer <--> domain layer <--> presentation layer.
For even a small change I must change data models, domain models, presentation models / viewmodels, then the repositories, manager/service classes, and of course the AutoMapper maps, and then test the whole thing! Each call requires calling a layer which calls a layer which calls the underlying code. And I don't get anything in return other than "you might need it in the future". Meh.
My current approach is more pragmatic:
I don't worry about the difference between the "data layer" and "domain layer" any longer, as there's no point - the terms are interchangeable. I let EF do it's thing, and add interfaces and repositories on top when needed.
I've merged my "data" and "domain" projects (into "core", boring name, I know), and I could almost swear that Visual Studio is actually running faster.
I allow EF entities to go up and down the stack, but, I still map them to presentation models / viewmodels as usual.
For simple operations I call repositories directly from controllers, for complex operations I use domain managers/services as usual; the repositories never expose IQueryable.
I define entities/POCOs as partial classes, so I can add domain behavior separately in corresponding partial classes.
The problem: I now use the entities all over the place, so client code can see their navigation properties. And the models are always materialized after they leave a repository, so those navigation properties are often null.
Possible solutions:
1. Live with it. It's ugly but preferable to the problems explained above.
2. For each entity, define an interface which hides the navigation properties; and make client code use the interfaces. But ironically, this means another layer (albeit thin and manageable).
3. What else?
I'm not used to this sort of fast-and-loose programming style, so maybe I'm missing some obvious tricks. Is there anything else I should take into account? I'm sure there are other problems I will encounter soon.
EDIT:
This question is not about DDD. And note that many struggle with a traditional DDD approach -- Seemann appears to arrive at the same conclusion, Rahien speaks about the "Useless Abstraction For The Sake Of Abstraction Anti Pattern", and Evans himself said DDD is only truly useful in 5% of cases. Also see this thread. Some of the comments/answers are predictably about how I'm doing DDD wrong, or how I can tweak my system to do it right. However, I'm not asking about DDD or bashing it for the cases where it is suitable, rather I'd like to know what others are doing in line with the thinking I've described above. It's not as if DDD is a panacea to all design ills, every decade a new process comes out (RUP anyone? XP, Agile, Booch, blah...). DDD is just the shiniest new one, and the most well known and used. But pragmatism should come first as I'm trying to build salable products that ship on time and are easy to maintain. The most useful programming axiom I've learned, by far, is YAGNI. What I want is to change my system to a sort of "DDD-lite", where I get it's strong design/OOP/pattern philosophy, but without the fat.
A typical persistence approach with DDD is to map the domain model directly to corresponding tables. Technically, the mappings are still there (and are usually declared in code), but there is no explicit data model, as pointed out by lazyberezovsky.
The problem with navigation properties can be resolved in a few different ways, regardless of whether you are employing DDD or not. I dislike approach 1 because it makes it more difficult to reason about your code - you never know which properties will be set and which won't. Approach 2 is much better in theory, because it makes it very explicit what that a given query requires and making things explicit is a good practice in general. A similar, but simpler and less brittle approach is to use read-models, which are just objects designed to fulfill requirements of a given query of set of queries. Within the context of DDD, they allow you to decouple behavior rich entities from queries, which are quite often at odds. Now proponents of DRY may scream heresy and come at you with torches and pitchforks, but in practice it is often much easier to maintain a read-model and an entity then to try to coerce entities to fulfill query requirements by way of interfaces or complex mapping strategies. Additionally, the responsibilities of a read-model and a behavior model are quite different, therefore DRY isn't applicable.
This is not to say that DDD is applicable in your scenario. It is often a wise decision to avoid full fledged DDD, especially in scenarios that are mostly CRUD. You are correct to be cautious, a good example of KISS and YAGNI. DDD reaps benefits when your domain consists of complex behavior, not just data. At any rate, the read-model pattern applies.
UPDATE
For implementations that don't employ a read-model, take a look at Fetching Strategy Design where the notion of a fetching strategy allows the specification of exactly what is needed from the database which mitigates issues with navigational properties. The material referenced in the linked post is also of interest. Overall, this attempts to avoid the a layer of indirection present in other approaches. However, in my opinion, using the proposed fetching strategy is more complex than using a read-model while the net result is the same.
Some thoughts about this point:
... the repositories never expose IQueryable ... the models are always
materialized after they leave a repository ...
Your question is tagged with "asp.net-mvc", so you have a web application in mind. 90% or more of all requests will be GET requests that are supposed to fetch some data from the database and show those data in a web view. How often are those needed data really entities rather than only bags of properties (a selection of properties of an entity type or perhaps composed of properties from multiple entities)?
Say, your application has 100 views. Only a minority of these will show complete entities:
50 of them are list views that show selected data (a customer with ID and address, but without the customer's contact person, phone number and sales volume)
20 of them contain autocomplete text boxes to select a reference (the customer for an order, but only the customer's name and city is shown in the autocomplete list, not the rest of the address nor contact person, phone number and sales volume and only the first 5 hits are displayed)
1 is an edit view for a customer that shows everything, but not the sales volume
1 is a details view for a customer with his last five orders
1 is a details view for an order including order items including product of each item but without the product's supplier name
1 is the same view but specialized for the purchasing department that wants to see the supplier for each item and item's product with average supplier's lead time for the last three months.
1 is a view for the service department that shows the order with only the order items of product category "repair service"
1 view for the Human Resources department shows employees including a photo stored as a big blob
1 view for personnel planning department shows a short version of the employee without photo
etc., etc.
As a UI programmer I would have all kinds of data requirements to render a view with the examples above:
I need only a selection of properties
I need even different selections of the same entity's properties for different views
I need an order including all items but without a reference to a product
I need an order including all items (but not all properties of the items) and including a reference to a product and to a supplier (but not all supplier's properties)
I need an order including only a filtered list of order items
I need a customer including the last five orders, not all 3000 orders he ever had
I need an employee but please without the big blob image
etc., etc.
How to fulfill these requirements as a data access/repository/service developer?
I only provide a handful of methods and materialize entities: load order header, load order header with items, load order header with items and product, load order header with items and product and supplier, load customer header (throw 15 of the 20 properties away, dear UI developer, if you only need five properties), load customer header with all 3000 orders (throw 2995 away, dear UI developer, if you only need five), etc., etc. I return interfaces from the repositories that hide not loaded navigation properties.
I care about every detail that the UI needs: I create repository/service methods like GetFiveCustomerPropertiesForAutoComplete, GetCustomerWithLastFiveOrders, etc. etc. I return interfaces from the repositories that hide the properties (also scalar) I haven't loaded. Or I return "DTOs" that contain the requested properties. I change the repository/services and create new DTOs every day when a UI developer calls with a data requirement for the next view.
I return IQueryable<TEntity> from the repositories and tell the UI developer "create the LINQ query yourself to fetch the data you need for your views". (Next morning the DBA is complaining about hundreds of terrible performing database queries.)
I return "prepared" IQueryable<TEntity>s from the repositories/services that cover - for example - security concerns like applying Where clauses for the user's access rights or append a Where clause for a search term or apply a NoTracking option to the query. I tell the UI developer: "You are allowed to extend the query with a) projections (Select), b) paging (Take and Skip) and perhaps c) sorting (OrderBy) because I consider those three query parts as UI concerns. All other query requirements (filtering, joining, grouping, etc.) have to be implemented in the repository/service layer and are forbidden in the UI layer." The most important piece here are projections that materialize ViewModels directly through the LINQ/SQL query without intermediate mapping layer and without the overhead to load more than the needed columns/properties.
These are only some thoughts. Every approach has its benefits and downsides. Working in small teams where at least one or a few developers have an overview what is happening in both the repository/service and the UI/"projection" layer the last option works fine for me in my experience although it doesn't always work with the strict rules decribed (for example, the filter by product category for included order items of an order requires to apply a Where clause inside of the projection, i.e. in the UI layer). For POST requests and data modifications I would use DTOs that send to data collected from a view back to a service to be processed there.
For stricter separation of "query layer" and UI layer I would probably prefer something close to the second option, maybe not with an interface/DTO for every UI requirement, but somehow reduced to a set of DTOs for the most common requirements (with the price of a little overhead of sometimes unnecessarily loaded properties). However, I expect that to be more work than the last option due to the larger amount of necessary repository/service methods, the additional maintenance of (perhaps many) DTOs and the intermediate mapping between DTOs and ViewModels.
Personally I am concerned about materializing full entities, especially complex object graphs, when I don't need them 90% of the time. But my concern is not verified by extensive performance measurements proving that this approach is really a problem for a "normal" application that doesn't have special high performance needs.
How can anyone give you sound advice when we have no clue as to what it is you are building? In the grand scheme of things, you might be building the wrong solution (not saying you are). So do realize all we can relate to is technical design issues and similar past experiences.
Many people face your problem, indeed. The mapping is loose coupling tax in the land of static typing. Maybe a more dynamic language could solve some of your pain. Or maybe you might find virtue in automating more (DSL, MDA). You could also switch to client server instead.
Interfaces are not layers, rather abstractions. Use them wisely.
Personally, I'd never take these shortcuts. Been bitten too many times trying to skip steps. Logic starts popping up in odd places. If I have a data driven app to develop simple datasets come to mind, EF as well. But I don't call the objects aggregate or entity in the DDD sense, just entity in the ERD sense. Transactionscript might be a better fit than doing the partial method sprinkeling. As for read model objects, these are not layers of indirection.
Overall, I get the feeling, and it is just that, you're making a mess of things because you fight the mapping friction by taking on a dependency on objects that don't reveal the required shape (navigation properties that are null) thereby causing problems in a different area.
I'll just try to be short - we went for the method 2 - ie, add layer of interfaces that you use on the client. You can have EF generate them for you, just a little tweak of the .tt templates.
Yes, it creates (yet) another layer, but it's logic-free and adds no complexity. Of course, if your client needs to deserialize entities, you have to add (yet) another layer that will handle deserialization and reference both the entities definitions and the interfaces that he'll return to the client. But it's also thin, so we learned to live with it, because it turned out to work just fine, and the client really stays clean...
The problem: I now use the entities all over the place, so client code
can see their navigation properties.
I don't quite get why this is a problem and how it's related to EF entities in particular. By client code do you mean presentation layer code or any code consuming your entities ?
For UI code a simple solution is to define ViewModels that just don't expose these navigation properties (or only expose a few of them depending on the object graph depth your GUIs need).
For other code it's only normal to be able to see the navigation properties of entities. They are public for a reason. You can end up breaking the Law of Demeter if you abuse them, but it's a matter of developer discipline not to fall into that trap.
An entity contains its own contract - all code that has access to the entity is supposed to be able to use any part of this contract. If you feel like your entities are exposing too much and that you need to put interfaces on top of them to restrain access to certain parts, maybe it's just a different entity.
I don't worry about the difference between the "data layer" and "domain layer" any longer, as there's no point - the terms are
interchangeable. I let EF do it's thing, and add interfaces and
repositories on top when needed.
I've merged my "data" and "domain" projects (into "core", boring name, I know), and I could almost swear that Visual Studio is
actually running faster.
I allow EF entities to go up and down the stack, but, I still map them to presentation models / viewmodels as usual.
For simple operations I call repositories directly from controllers, for complex operations I use domain managers/services as
usual; the repositories never expose IQueryable.
I define entities/POCOs as partial classes, so I can add domain behavior separately in corresponding partial classes.
None of these things seems to be fundamentally anti-DDD to me, except data/domain separation.
Especially if you do database-first EF -DDD is clearly a domain-centric approach and you shouldn't define your tables before defining your entities. It's also not clear whether some of your domain entities talk to the database or EF directly (not DDD - and more generally, layered-architecture - compliant) or you systematically have data access objects in between (DDD compliant).
I have a coredata db running on an ipad with iOS 5.1.1. My database has around 50,000 companies in it. I have created an index on the company name attribute. I am searching on this attribute and on occasion get several thousand records returned with a fetchRequest.
When several thousand records are returned then it can take a couple of seconds to return from the fetch. This makes type-ahead searching pretty clunky.
I anticipate having much larger databases in the future. What are my options for implementing a really fast search function?
Thanks.
I recommend watching the core-data performance videos from the last few WWDCs. They often talk about strategies for improving this kind of bottleneck. Some suggestions from the videos :
De-normalise the name field into a separate "case and diacritic insensitive" searchString field and search on that field instead, using <, <= or BEGINSWITH. Avoid MATCHES and wildcards.
Limit the number of results returned by the NSFetchRequest using fetchLimit and fetchBatchSize
If your company object is large you can extract some of the key data items off onto a separate smaller "header" object that is used just for the search interface. Then add a relationship back to the main object when the user makes a selection.
Some pointers to a couple of videos (there are more from other years also):
WWDC 2012: Session 214 - Core Data Best Practices : 45:00
WWDC 2010: Session 137 - Optimizing Core Data Performance on iPhone OS: 34:00
While Core Data is the correct tool in many cases, it's not a silver bullet by any means.
Check out this post which discusses a few optimizing strategies and also cases when an SQL database is your better choice instead of Core Data:
http://inessential.com/2010/02/26/on_switching_away_from_core_data
You may honestly be better off using an SQL database instead of Core Data in this case because when you try to access attribute values on Core Data entities, it typically results in a fault and pulls the object into active memory... this can definitely have performance and speed costs. Using a true database - Core Data is not a true database, see http://cocoawithlove.com/2010/02/differences-between-core-data-and.html - you can query the database without creating objects from it.