If I have a grain (or a client) that is going to send messages to another grain often (several times per minute for hours on end), what is the best practice for accessing that grain? Do I get it from the factory, use it, and discard it - getting a new grain reference every time? Or in this case should I "hold" a reference to the grain, only getting it from the factory once?
You can use the factory. Getting a grain reference from the factory is a completely local operation, and the references are cached internally, so there is no need to hold on to and reuse them; you can simply get one from the factory every time you need one.
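For illustration, here is a minimal sketch of that pattern. It assumes the GrainFactory-style API (GrainFactory.GetGrain&lt;T&gt;) available inside grains and on the client; older Orleans versions used generated factory classes such as PlayerGrainFactory.GetGrain(id) instead, and the interfaces and AddScore method below are invented for the example.

using System.Threading.Tasks;
using Orleans;

// Both interfaces and the AddScore method are invented for this example.
public interface IPlayerGrain : IGrainWithIntegerKey
{
    Task AddScore(int points);
}

public interface IScoreboardGrain : IGrainWithIntegerKey
{
    Task RecordHit(long playerId, int points);
}

public class ScoreboardGrain : Grain, IScoreboardGrain
{
    public async Task RecordHit(long playerId, int points)
    {
        // Resolving the reference is a cheap, local, internally cached operation,
        // so it is fine to do this on every call rather than holding the reference.
        var player = GrainFactory.GetGrain<IPlayerGrain>(playerId);
        await player.AddScore(points);
    }
}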
I have a grain in Orleans for the players of a game. A player has several properties that I want to access directly in the client. Is it possible, is it efficient and does it make sense to have these as public properties on the grain? Or, should I have a GetAllState method that returns a DTO with the current value of these properties in the grain?
public interface IPlayerGrain : IGrainWithIntegerKey
{
    // Individual public properties to access grain state?
    string Name { get; }
    int Score { get; }
    int Health { get; }

    // Or, get all the current grain state as a DTO?
    Task<PlayerState> GetAllState();
}
From my current understanding, I think I will need to use GetAllState, as I believe any communication with the grain needs to be via a method call, which may pass between silos. So you probably want to minimise the number of messages passed and wouldn't want to send three messages just to get Name, Score and Health. Or is message passing pretty cheap and not something I should worry about doing too much? In my example I've only included 3 properties, but in my real game there will be many more.
However, I don't really like the idea of having an anemic DTO model that is just a copy of the grain's internal properties.
So I was wondering if there was a better way, or a preferred pattern for this sort of thing in Orleans?
I think this depends a lot on the life cycle and access patterns of the properties. Do the properties tend to change independently or together? (At first glance, they seem to change independently and at quite different rates; I assume that Score and Health can change very frequently, but Name would almost never change.) And given that Name changes very infrequently, would it fit your access patterns to retrieve it every time you wanted an updated Score or Health value? Maybe Score and Health are frequently accessed together, while Name belongs with other, more static properties.
If you let these kinds of questions drive your API before you think about the cost of message passing, you will probably find some good sweet spots (probably not the whole state of the grain). You might also consider having a Player grain and a PlayerStats grain that have different life cycles and that correspond more closely to the change rate of various pieces of data.
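As a rough illustration of that kind of grouping (the DTO and method names here are invented, not taken from Orleans):

using System.Threading.Tasks;
using Orleans;

// Fast-changing values grouped together, separate from rarely-changing profile data.
public class PlayerVitals
{
    public int Score { get; set; }
    public int Health { get; set; }
}

public class PlayerProfile
{
    public string Name { get; set; }
    // ...other properties that change rarely
}

public interface IPlayerGrain : IGrainWithIntegerKey
{
    Task<PlayerVitals> GetVitals();   // called frequently, small payload
    Task<PlayerProfile> GetProfile(); // called rarely, e.g. once per session
}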
Introduction of a complex return type to minimize roundtrips is a valid solution.
However, I wouldn't return the whole internal state, as I assume that not all the clients need all the data all the time. It may also be a sign that you have business logic implemented outside of the grains that you should move into the grain.
You might also consider exposing Health and Score, which are likely to change frequently, via a stream.
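A rough sketch of how that might look with Orleans streams (the exact stream API differs between Orleans versions; the provider name, stream namespace and the Guid-keyed grain below are assumptions, and PlayerVitals is the DTO from the earlier sketch):

using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Streams;

public interface IPlayerVitalsGrain : IGrainWithGuidKey
{
    Task AddScore(int points);
}

public class PlayerVitalsGrain : Grain, IPlayerVitalsGrain
{
    private IAsyncStream<PlayerVitals> vitalsStream;
    private int score;
    private int health = 100;

    public override Task OnActivateAsync()
    {
        // The grain's own Guid key doubles as the stream id, so subscribers can find it.
        vitalsStream = this.GetStreamProvider("PlayerStream")
            .GetStream<PlayerVitals>(this.GetPrimaryKey(), "vitals");
        return base.OnActivateAsync();
    }

    public Task AddScore(int points)
    {
        score += points;
        // Push the updated values to subscribers instead of having clients poll for them.
        return vitalsStream.OnNextAsync(new PlayerVitals { Score = score, Health = health });
    }
}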
Having figured out most of my data-model for a new iOS app, I'm now stuck with a problem that I've been thinking about for a while.
An 'Experiment' has a name, description and owner. It also has one 'Action' and one 'Event'.
An 'Event' could be different things: Time, Location or Speed.
Depending on what the 'Event' is, it can have a different 'Type'. For example, Time could be one-off, interval, date-range, repeating or random. Location could be area or exact location.
Each 'Type' then has a value that has a data type unique to itself. The Time One-Off could be a date value of 12:15pm and the Location Exact could be a GeoPoint value of (30.0, -20.0).
The Problem
How do I design the data model so that the database is not riddled with NULL values?
How do I design the data model to be extensible if I add more 'Events' and 'Types'?
Thoughts
As an Experiment only has one Action and one Event, it would seem wrong to separate these into different tables; however, not doing so would leave the Experiment table full of NULL values, as I'd need columns for Event, Event Type and Event Type Value to cover all of the possible data types one could enter for an Event Type Value (date, int, string, geopoint, etc.).
Separating the Event and Event Type into a separate table would probably fix the NULL value issue, but I'd be left with repeating data, especially in the case of time: an Event with Type One-Off at 12:00pm would exist in other experiments, not just one. (Unless I create EVERY possibility and populate a separate table with them in advance; how could I easily do that, though?)
Maybe I'm over complicating things, maybe I'm missing something so simple that I'm going to kick myself when I see it.
You need to think about your data model in terms of objects, not tables. Core Data works with object graphs, so everything in Core Data is an object, and in Objective-C you work with objects. This is why you don't need an ORM tool. If you think in terms of objects, then I think the model below (which obviously needs work, but you should get the point) makes sense.

The advantage of separating your concepts out into objects like this is that you can look at your problem from multiple angles; in other words, you can look at it from the Experiment angle or from the Event angle. I suspect you will want to do something with the data, such as use your Time object in your code to show on a calendar or set a reminder, or fetch all the events for all experiments of a specific type. By encapsulating these data items in objects in Core Data, everything is ready for you to leverage, manipulate and modify in your code.

It also removes the NULL value issue you identified, because you won't be creating objects for null values, only for values that are relevant to your experiment. That said, you might want to break the model down even further depending upon the specifics of your program.

You also would not have the repeating data issue you mention if you design this properly. Again, you're not dealing with rows in a table; you are dealing with objects. If you create an Event Type object with "one-off 12:00pm", you can assign that Event Type object, through its relationship, to as many Events as you wish. You don't create the object again, you simply reference it.

When you think of the relationships, think "X can be associated with Y". For example: "An Experiment can be associated with only 1 Event", "An Event Type can be associated with many Events", "An Event can be associated with only 1 Event Type". Taking this approach sets you up for extensibility down the road. Imagine you want to add a new Event Type: you simply create a new entity and associate it with your Event Type entity.
My suggestion is to think about your object model relative to how you anticipate using the objects in your code (and how you anticipate accessing them via queries). That should help drive how you construct it (e.g. if you need a Time object, make sure it is in your object model; if you need an Alert object, make sure that is there too). Let the model do the work for you, and try not to write a lot of code to assemble the equivalent of an object model within Objective-C, or to start creating objects in code and populating them with data from your data store.
(EDIT: Replace the "event" relationship in the diagram under time, location & speed with "event types")
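To make the shape of that object graph concrete, here is a rough sketch expressed as plain C# classes purely to show the relationships (in the actual app these would be Core Data entities; the names mirror the ones discussed above, and ExperimentAction stands in for 'Action'):

using System.Collections.Generic;

public class Experiment
{
    public string Name { get; set; }
    public string Description { get; set; }
    public string Owner { get; set; }
    public ExperimentAction Action { get; set; } // exactly one Action
    public Event Event { get; set; }             // exactly one Event
}

public class ExperimentAction
{
    public string Name { get; set; }
}

public class Event
{
    public string Kind { get; set; }         // Time, Location or Speed
    public EventType Type { get; set; }      // an Event has exactly one Event Type
}

public class EventType
{
    public string Name { get; set; }         // e.g. "one-off", "exact location"
    public object Value { get; set; }        // the type-specific value, e.g. a date or a geo point
    public List<Event> Events { get; set; }  // one Event Type can be shared by many Events
}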
I was wondering how people who follow DDD get around potential performance issues with using EF and the repository pattern with returning an aggregate root with children.
e.g.
Parent
----- Child A

Or even:
Parent
----- Child A
---------- Child A2
If I bring back the aggregate root's data from the repository and then use a navigation property, EF fires off another query because it is utilising lazy loading. This is a problem because we are seeing 100+ queries when we are in a loop.
If I bring back the aggregate root's data from the repository together with the children's data by using 'Include' statements, the children's data comes back from the repository with its parent. Then, when I use the navigation properties, no queries fire because that data is already in memory.
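To make the two approaches concrete, here is a rough sketch using classic Entity Framework (the Parent/Child entities and MyDbContext are placeholders, not anything from the actual project):

using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

public class Child
{
    public int Id { get; set; }
    public int ParentId { get; set; }
}

public class Parent
{
    public int Id { get; set; }
    public virtual ICollection<Child> Children { get; set; } // virtual enables lazy loading
}

public class MyDbContext : DbContext
{
    public DbSet<Parent> Parents { get; set; }
    public DbSet<Child> Children { get; set; }
}

public class ParentRepository
{
    private readonly MyDbContext context;

    public ParentRepository(MyDbContext context) { this.context = context; }

    // Lazy loading: touching parent.Children later fires an extra query per parent.
    public Parent Get(int id)
    {
        return context.Parents.SingleOrDefault(p => p.Id == id);
    }

    // Eager loading: the parent and all of its children come back in a single query.
    public Parent GetWithChildren(int id)
    {
        return context.Parents
            .Include(p => p.Children)
            .SingleOrDefault(p => p.Id == id);
    }
}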
The problem with the second approach is that the data for a child collection can be quite big, e.g. 100,000+ records.
Obviously I don't want to hold 100,000+ child records in memory, so we decided to use paging and select 10 at a time. But that creates another issue when we try to run calculations on the children, like sum or total count: we can only do that in memory on the 10 records we have pulled back.
I know the DDD way is to pull back the object graph with all of its data in memory and then traverse the objects for the data you need to display.
There is a split in our team: some believe we should pull back the aggregate root and its children together, and some feel we should have a method on the aggregate root's repository that queries the children's data directly and pulls back the child objects.
I just wondered how other people have solved the performance issues of having large amounts of parent/child data in memory.
If you have to deal with performance, you must use the second approach, with a special method exposed on the repository. That is the point of the repository: to provide you with such methods; otherwise you could use the EF context / set directly.
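For example, the repository might expose purpose-built methods so the counting and paging run in the database rather than in memory (a sketch reusing the placeholder Parent/Child/MyDbContext types from the question above):

using System.Collections.Generic;
using System.Linq;

public class ParentChildQueries
{
    private readonly MyDbContext context;

    public ParentChildQueries(MyDbContext context) { this.context = context; }

    // Translates to SELECT COUNT(*) in SQL; no child rows are loaded into memory.
    public int CountChildren(int parentId)
    {
        return context.Children.Count(c => c.ParentId == parentId);
    }

    // Returns one page of children at a time for display.
    public List<Child> GetChildrenPage(int parentId, int page, int pageSize)
    {
        return context.Children
            .Where(c => c.ParentId == parentId)
            .OrderBy(c => c.Id)
            .Skip(page * pageSize)
            .Take(pageSize)
            .ToList();
    }
}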
Theory is nice if you work with theoretical data; once you have real data, you must tweak the theory to work in real-world scenarios.
You can also check this article (there are three follow-up articles on the blog). It does it the second way but pretends to be the first way. It works for Count, but maybe you can use the idea for some other scenarios as well.
The DDD way isn't always to pull back all the data that is required. One technique we use is a pattern called double dispatching: you call your aggregate root's method (or domain service) with all the parameters it requires, but you also pass in a 'query only' repository-style interface as a parameter. This lets the root or its children decide what extra data is required, and when, by simply calling methods on this injected interface.
This approach adheres to the DDD principle that aggregate roots should not be aware of the repository implementation, while still giving you testable and highly performant domain code.
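A minimal sketch of the double-dispatch idea (all names here are invented for illustration):

// The domain only sees this query-only interface; its EF-backed implementation
// lives in the data layer, so the aggregate stays ignorant of repository details.
public interface IChildStatsQuery
{
    int CountChildren(int parentId);
    decimal SumChildAmounts(int parentId);
}

public class Parent
{
    public int Id { get; set; }

    // Double dispatch: the aggregate root decides which extra data it needs and when,
    // by calling the injected query interface passed into the method.
    public string Summarise(IChildStatsQuery stats)
    {
        var count = stats.CountChildren(Id);
        var total = stats.SumChildAmounts(Id);
        return count + " children, total " + total;
    }
}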
I have a table in my database that stores event totals, something like:
event1_count
event2_count
event3_count
I would like to transition from these simple aggregates to more of a time series, so the user can see on which days these events actually happened (like how Stack Overflow shows daily reputation gains).
Elsewhere in my system I have already done this by creating a separate table with one record for each daily value - but then, in order to collect a time series, you end up with a huge database table and the need to query tens or hundreds of records. It works, but I'm not convinced it's the best way.
What is the best way of storing these individual events along with their dates so I can do a daily plot for any of my users?
When building tables like this, the real key is having effective indexes. Test your queries with the EXPLAIN statement or the equivalent in your database of choice.
If you want summary tables you can query, build a view that represents the query, or roll the daily results up into a new table on a regular schedule. Summary tables are often the best way to go, as they are quick to query.
The best way to implement this is to use Redis. If you haven't worked with Redis before, I suggest you start; you will be surprised how fast this can get :). The way I would do it is to use the Hash data structure Redis provides. Assign every user their own Hash (making a unique key for every user, like "user:23:counters"). Inside this Hash you can store a daily timestamp such as "05/06/2011" as the field and increment its counter every time an event happens, or whatever else you want to do with it!
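A rough sketch of that idea using the StackExchange.Redis client (any client exposing HINCRBY/HGETALL would do; the key and date formats are just examples):

using System;
using StackExchange.Redis;

public class EventCounters
{
    private readonly IDatabase db;

    public EventCounters(ConnectionMultiplexer redis)
    {
        db = redis.GetDatabase();
    }

    // HINCRBY user:23:counters 2011-06-05 1 - one hash per user, one field per day.
    public void RecordEvent(int userId, DateTime when)
    {
        db.HashIncrement("user:" + userId + ":counters", when.ToString("yyyy-MM-dd"));
    }

    // Returns every (day, count) pair needed for the user's daily plot.
    public HashEntry[] GetDailyCounts(int userId)
    {
        return db.HashGetAll("user:" + userId + ":counters");
    }
}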
A good start would be this thread; it has a simple, beginner-level solution: Time Series Starter. If you are OK with Rails models, this is a way it could work for a so-called "irregular" time series - an event here and there, but not at a regular interval, like a sensor that sends data when your door is opened.
The other thing - and this is what I was looking for in this thread - is a regular time series DB: values come at a fixed interval, say 60 per minute, i.e. one per second, for example from a temperature sensor. This all boils down to datasets with "buckets", as you suspect: a time series table gets long, indexes suffer at some point, etc. Here is one "bucket" approach using Postgres arrays that would be a feasible idea.
It's not available as "plug and play", as far as I could find on the web.
I'm calling an update SPROC from my DAL, passing in all(!) fields of the table as parameters. For the biggest table this is a total of 78.
I pass all these parameters even if maybe only one value changed.
This seems rather inefficient to me, and I wondered how to do it better.
I could define all parameters as optional and only pass the ones that changed, but my DAL does not know which values changed, because I'm just passing it the model object.
I could do a select on the table before updating and compare the values to find out which ones changed, but that is probably way too much overhead, too(?)
I'm kinda stuck here ... I'm very interested what you think of this.
edit: forgot to mention: I'm using C# (Express Edition) with SQL 2008 (also Express). The DAL I wrote "myself" (using this article).
It's maybe not the latest state-of-the-art way of doing it (since it's from 2006, "pre-LINQ" so to speak - but LINQ works only for local SQL instances in Express anyway), but my main goal was learning C#, so I guess this isn't too bad.
If you can change the DAL (without the changes being discarded once the layer is "regenerated" from the new schema when changes are made), I would recommend passing one structure containing the changed columns and their values, and another structure containing the key columns and values for the update.
This can be done using hashtables, and if the schema is known, it should be fairly easy to handle this in the "new" update function.
If this is an automatically generated DAL, this is one of the drawbacks of using such DALs.
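A rough sketch of that idea: build a parameterised UPDATE from a dictionary of changed columns plus a dictionary of key columns (the table and column names must come from a trusted source, since identifiers cannot be passed as SQL parameters):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static class DynamicUpdater
{
    // changedColumns: only the columns whose values actually changed.
    // keyColumns: the primary key columns identifying the row to update.
    public static void Update(SqlConnection connection, string table,
        IDictionary<string, object> changedColumns,
        IDictionary<string, object> keyColumns)
    {
        var setClause = string.Join(", ",
            changedColumns.Keys.Select(c => "[" + c + "] = @set_" + c));
        var whereClause = string.Join(" AND ",
            keyColumns.Keys.Select(c => "[" + c + "] = @key_" + c));

        var sql = "UPDATE [" + table + "] SET " + setClause + " WHERE " + whereClause;

        using (var cmd = new SqlCommand(sql, connection))
        {
            foreach (var col in changedColumns)
                cmd.Parameters.AddWithValue("@set_" + col.Key, col.Value);
            foreach (var col in keyColumns)
                cmd.Parameters.AddWithValue("@key_" + col.Key, col.Value);
            cmd.ExecuteNonQuery();
        }
    }
}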
You could implement journalized change tracking in your model objects. This way you can keep track of any changes in your objects by saving the previous value of a property every time a new value is set. This information could be stored in one of two ways:
As part of each object's own private state
Centrally in a "manager" class.
In the first solution, you could easily implement this functionality in a base class and have it run in all model objects through inheritance.
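A rough sketch of what that base class might look like (the names and the property-setter pattern here are just one possible approach):

using System.Collections.Generic;

public abstract class TrackedModel
{
    private readonly Dictionary<string, object> changed = new Dictionary<string, object>();

    // Record the new value whenever a property actually changes.
    protected void SetProperty<T>(ref T field, T value, string propertyName)
    {
        if (!EqualityComparer<T>.Default.Equals(field, value))
        {
            field = value;
            changed[propertyName] = value;
        }
    }

    // The DAL can read this to build an update touching only the changed columns.
    public IDictionary<string, object> ChangedProperties
    {
        get { return changed; }
    }

    public void AcceptChanges()
    {
        changed.Clear();
    }
}

public class Customer : TrackedModel
{
    private string name;
    public string Name
    {
        get { return name; }
        set { SetProperty(ref name, value, "Name"); }
    }
}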
In the second solution, you need to create some kind of container class that keeps a reference and a unique identifier for any model object that is created, and records all changes to its state in a central store. This is similar to the way many ORM (Object-Relational Mapping) frameworks achieve this kind of functionality.
There are off the shelf ORMs that support these kinds of scenarios relatively well. Writing your own ORM will leave you without many features like this.
I find the "object.Save()" pattern leads to this kind of behavior, but there is no reason you need to follow that pattern (while I'm not personally a fan of object.Save(), I feel like I'm in the minority).
There are multiple ways your data layer can know what changed, and most of them are supported by off-the-shelf ORMs. You could also potentially make the UI and/or business layers smart enough to pass that knowledge into the data layer.
Two options that I prefer:
Generating or hand-coding update methods that only take the set of parameters that tend to change.
Generating the update statements completely on the fly.