One actor per simulated object, or a manager actor? - erlang

I'm developing a simulation that will feature many entities constantly updating, perhaps 30 times a second. Let's imagine we have 1000 entities, each of which has a velocity, and consequently a position that must be updated every tick.
So, how would you implement this using the actor model? I'm not necessarily using Erlang for this project, but for the sake of argument, let's just say I am. Would you have an actor for each of these entities? Or would you have a "manager" actor that maintains and updates a list of these entities?
Learn You Some Erlang says:
It is true that Erlang processes are very light: you can have hundreds
of thousands of them existing at the same time, but this doesn't mean
you have to use it that way just because you can. For example,
creating a shooter game where everything including bullets is its own
actor is madness. The only thing you'll shoot with a game like this is
your own foot. There is still a small cost in sending a message from
actor to actor, and if you divide tasks too much, you will make things
slower!
So that seems to suggest that managers would be better. Or is there a third option that I'm not seeing?

You said it yourself: there is no single good solution.
Now, to be more helpful, and with the little background I have, I think you should look at these aspects of your project:
You say simulation. If you need to refresh a collection of entities roughly every 33 ms (30 times a second), first work on simplifying the operations and the data model, and only then think about how to traverse the collection of data efficiently.
On the other hand, if you have a huge and/or evolving collection of objects with a trivial algorithm/data model, then look at smarter data structures than lists, and watch out for data copying.
If you run on a multi-core machine (or a cluster), consider grouping your entities into several super-entities, each managed in a separate process, in order to take advantage of parallelism.
Next, think about whether these groups can help you reduce the number of evaluations (an adaptive time slice? evaluation on demand? ...).
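As a rough illustration of the grouping idea, here is a minimal sketch in Python (used only for illustration; in Erlang each group would live in its own process). The names `Entity`, `partition` and `step_group` are hypothetical, and the update is a plain Euler step with a fixed time slice.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    x: float
    y: float
    vx: float
    vy: float


def partition(entities, n_groups):
    """Split the entities into n_groups roughly equal groups (round-robin)."""
    return [entities[i::n_groups] for i in range(n_groups)]


def step_group(group, dt):
    """Advance every entity in one group by dt seconds (simple Euler step)."""
    for e in group:
        e.x += e.vx * dt
        e.y += e.vy * dt


# 1000 entities split into 4 groups; each group could be owned by one worker.
entities = [Entity(0.0, 0.0, float(i), 1.0) for i in range(1000)]
groups = partition(entities, 4)

dt = 1.0 / 30.0  # one tick at 30 updates per second
for group in groups:
    step_group(group, dt)
```

Each group is independent of the others within a tick, which is what makes it a natural unit to hand to a separate process.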
Last, I think that, generally speaking, Erlang is compact and easy to refactor, so take advantage of this and define some functional steps, and for each of them:
make it work, make it right, and make it fast (Kent Beck?).
For the last step you can get some help from profiling tools such as fprof.
Courage :o)

I think Learn You Some Erlang is making a bit of a premature optimization blunder here. You should use which ever abstraction makes the most sense to you, measure any problems, and refactor if necessary. Personally, I believe modeling each particle as its own actor would be the easiest to deal with, and is also the most idiomatic approach for the Actor model. Practically, however, you should do whatever floats your boat.

EDW Kimball vs Inmon

I've been tasked with coming up with a recommendation of how to proceed with an EDW and am looking for clarification on what I'm seeing. Everything that I am learning states that Kimball's approach will bring value to the business quicker than Inmon's. I get that Kimball's approach is a dimensional model from the get-go and that different data marts (star schemas) are integrated through conformed dimensions... thus the theory is I can simply come up with my immediate DM to solve a business need and go on from there.
What I'm learning states that Inmon's model suggests that I have an EDW designed in 3NF. The EDW is not defined by source system but instead by the structure of the business, Corporate Factory (Orders, HR, etc.). So data from disparate systems maps into this structure. Once the data is in this form, ETLs are then created to produce DMs.
Personally I feel Inmon's approach is a better way. I believe this way is going to ensure that data is going to be consistent and it feels like you can do more with this data. What holds me back with this approach though is everything I'm reading says it's going to take much more time to deliver something but I'm not seeing how that is true. From my narrow view, it feels like no matter what the end result is we need a DM. Regardless of using Kimball's or Inmon's approach the end result is the same.
So then the question becomes: how do we get there? In Kimball's approach we will create ETLs to some staging location and generally from there create a DM. In Inmon's approach I feel we just add in another layer... that is, from the staging area we load this data into another database in 3NF organized by function. What I'm missing is how this step adds so much time.
I feel I can look at the end DM that needs to be made, map it back to a DW in 3NF, and then, as more DMs are requested, keep building up the DW in 3NF with more and more data. However, if I create a DM in Kimball's model, that DM is going to be built around the grain decided for that DM. What if the next DM requested wants reporting at an even finer grain? To me it feels like Kimball's methodology would take more work, whereas with Inmon's it doesn't matter: I have everything at the transactional level, so if DMs of varying grains are requested, I have the data, I just ETL it to a DM, and all DMs will report the same since they are sourced from the same data.
I dunno... just looking for others' views. Everything I read says Kimball's is quicker... I say sure, maybe a little bit, but there is certainly a cost attributed to going the quicker route. And for the sake of argument... let's say it takes a week to get a DM up and running through Kimball's methodology... to me it feels like it should only take 10%, maybe 20%, longer utilizing Inmon's.
If anyone has any real-world experience with the different models, and with whether one really takes so much longer than the other... please share. Or if I have this all backwards, tell me that too!
For context: I look after a 3 billion record data warehouse for a large multinational. Our data makes its way from the various source systems through staging and into a 3NF db. From here our ELT processes move the data into a dimensionally modelled, star schema db.
If I could start again I would definitely drop the 3NF step. When I first built that layer I thought it would add real value. I felt sure that normalisation would protect the integrity of my data. I was equally confident the 3NF db would be the best place to run large/complex queries.
But in practice, it has slowed our development. Most changes require an update to the stage, 3NF and star schema db.
The extra layer also increases the amount of time it takes to publish our data. The additional transformations, checks and reconciliations all add up.
The promised improvement in integrity never materialised. I realise now that because I control the ETL, and the validation processes within, I can ensure my data is both denormalised and accurate. In reporting data we control every cell in every table. The more I think about that, the more I see it as a real opportunity.
The need for large and complex queries was another myth that has been busted by experience. I now see the need to write complex reporting queries as a failing of my star db. When this occurs I always ask myself: why isn't this question easy to answer? The answer is most often bad table design. The heavy lifting is best carried out when transforming the data.
Running a 3NF and star also creates an opportunity for the two systems to disagree. When this happens it is often a very subtle difference. Neither is wrong, per se. Instead, it is possible the 3NF and star query are asking slightly different questions, and therefore returning different results. Although technically correct, this can be hard to explain. Even minor and explainable differences can erode confidence, over time.
In defence of our 3NF db, it does make loading into the star easier. But I would happily trade more complex SSIS packages for one less layer.
Having said all of this, it is very hard to recommend an approach to anyone without a deep understanding of their systems, requirements, culture, skills, etc. Having read your question I am sure you have wrestled with all these issues, and many more no doubt! In the end, only you can decide what the best approach for your situation is. Once you've made your mind up, stick with it. Consistency, clarity and a well-defined methodology are more important than anything else.
Dimensions and measures are a well proven method for presenting and simplifying data to end users.
If you present an end user with a schema based on the source system (3NF) and with a dimensionally modelled star schema (Kimball), they will be able to make much more sense of the dimensionally modelled one.
I've never really looked into an Inmon decision support system, but to me it seems to be just the ODS portion of a full data warehouse.
You are right in saying "The EDW is not defined by source system but instead the structure of the business". A star schema reflects this, but an ODS (a copy of the source system) doesn't.
A star schema takes longer to build than just an ODS but gives many benefits, including:
- Slowly changing dimensions can track changes over time
- Denormalisation simplifies joins and improves performance
- Surrogate keys allow you to disconnect from source systems
- Conformed dimensions let you report across business units (e.g. profit per headcount)
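To make the first and third benefits concrete, here is a minimal Python sketch of a type 2 slowly changing dimension that uses warehouse-owned surrogate keys. Everything in it (the `CustomerRow` shape, the `apply_change` helper, the sample keys and dates) is hypothetical and only meant to show the mechanism, not a production ETL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int               # owned by the warehouse
    business_key: str                # key carried over from the source system
    city: str
    valid_from: str
    valid_to: Optional[str] = None   # None marks the current version


def apply_change(dim: List[CustomerRow], business_key: str,
                 new_city: str, change_date: str) -> CustomerRow:
    """Close the current row (if the attribute changed) and append a new version."""
    current = next(
        (r for r in dim if r.business_key == business_key and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return current           # nothing changed, keep the current version
        current.valid_to = change_date
    new_row = CustomerRow(len(dim) + 1, business_key, new_city, change_date)
    dim.append(new_row)
    return new_row


dim: List[CustomerRow] = []
apply_change(dim, "C-42", "Leeds", "2014-01-01")
apply_change(dim, "C-42", "York", "2015-06-01")
```

Fact rows would reference `surrogate_key`, so history is preserved and the source system's own key can change without touching the facts.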
If your Inmon 3NF database is not just an ODS (replica of source systems), but some kind of actual business model then you have two layers to model: the 3NF layer and the star schema layer.
It's difficult nowadays to sell the benefit of even one layer of data modelling when everyone thinks they can just do it all in a 'self service' tool (which I believe is a fallacy)! Your system should be no more complicated than it needs to be, because all that complexity adds up to maintenance, and that's the real issue: introducing changes 12 months into the build when you have to change many layers.
To paraphrase @destination-data: your source system to star schema transformation (and separation) is already achieved through ETL, so the 3NF layer seems redundant to me. You design your star schema to be independent from source systems by correctly implementing surrogate keys and business keys, and modelling it on the business, not on the source system.
With ETL and back-end data wrangling taking up about 70% of the project time for this kind of endeavour, an extra layer makes a big difference. It's an extra layer of transforming from source to target, to agree with the business and to test. It all adds up.
Whilst I'm not saying that dimensional models (the Kimball kind) are always easy to change, you've got a whole lot more inflexibility should you have to always change lots of layers when you want to change your BI.
In fact, where I've been consulting in places that have data warehouses that are considered to be inflexible and expensive to develop for, and not keeping pace with changes to the business, they have without exception included the 3NF layer prior to the DMs. As Nick mentioned, it is hard nowadays to sell the idea of a 'proper' data warehouse as opposed to a Data Discovery BI tool, and the appeal of these is often driven by DWs being seen to be slow and expensive to develop.
Kimball isn't against having a 3NF layer prior to his DW if it makes sense for a situation, he just doesn't agree with Inmon that there's a point.
One common misunderstanding is that Kimball proposes distinct data marts, so that you'd have to change them each time there is a different reporting request. Instead, Kimball's DMs are based on real-life business processes and modelled accordingly. Although it's true you will then try to make them suitable for reporting, you try to make them so they can answer foreseeable queries. You don't aggregate and store just the aggregates: you work with the transactional data in a Kimball dimensional model.
So no need to be reluctant from that perspective.
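The point about grain can be sketched in a few lines of Python. The fact rows and the `roll_up` helper below are hypothetical; the idea is only that a fact table kept at transaction grain can be rolled up to any coarser grain later, without rebuilding it.

```python
from collections import defaultdict

# Hypothetical fact rows at transaction grain:
# (date, product_key, store_key, amount)
fact_sales = [
    ("2015-01-01", 1, 10, 5.0),
    ("2015-01-01", 1, 11, 7.5),
    ("2015-01-02", 2, 10, 3.0),
    ("2015-01-02", 1, 10, 4.5),
]


def roll_up(rows, key_fn):
    """Sum the amount measure over whatever grain key_fn selects."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)


# Two different reporting grains served from the same transactional facts.
by_day = roll_up(fact_sales, lambda r: r[0])
by_product_and_store = roll_up(fact_sales, lambda r: (r[1], r[2]))
```

Because both results come from the same rows, the two reports cannot disagree with each other.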
If an ODS works for you, then go for it, but a Kimball DW will meet the majority of requirements.

Rails 4: Complex Many to Many Relationships

I have the following models..
Provider
Patient
Clinic
Like an eco-system, all models should have many-to-many relationships with one another. It's very important that I am able to query data from all directions.
After intensive research on Active Record associations, I find many blogs warning against has_and_belongs_to_many and recommending has_many :through instead. The only issue is that it requires a table to act as the "middle-man", for lack of better words, and I'm unsure how that would work with three models.
The other option is a polymorphic association but I'm unsure if I should invest the time in understanding that method if it's inapplicable for this particular situation.
Any advice on how to create these relationships for maximum flexibility and efficiency?
I have the following models
Tables, if they were modelled for a Relational database, files otherwise.
It's very important that I am able to query data from all directions.
Understood. That is simple and easy in a Relational Database.
Like an eco-system, all models should have many-to-many relationships with one another.
That is not correct.
If each of the three can be related to the other two, then your data is not modelled. There are basic dependencies that you have not identified. Eg: I can imagine that:
Providers provide services at Clinics
Clinics provide services to Patients
Patients visit Clinics to obtain services
Therefore, any relationship that a Patient may have with a Provider is via a Clinic, and not privately, without a Clinic.
Straighten those rules and dependencies out first; only then settle the Associative tables, of which there will be fewer than three. Something like this:
[Diagram: Clinic Provider Patient Table Relation Diagram]
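The original diagram is not reproduced here, so the following is only a guess at its spirit: a minimal sketch using Python's built-in sqlite3 module, with two associative tables (both hypothetical names) hanging off Clinic. It assumes, as a simplification, that a Patient is related to every Provider working at a Clinic the Patient attends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provider (provider_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE clinic   (clinic_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE patient  (patient_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Associative table: Providers provide services at Clinics.
CREATE TABLE clinic_provider (
    clinic_id   INTEGER NOT NULL REFERENCES clinic(clinic_id),
    provider_id INTEGER NOT NULL REFERENCES provider(provider_id),
    PRIMARY KEY (clinic_id, provider_id)
);

-- Associative table: Patients visit Clinics.
CREATE TABLE clinic_patient (
    clinic_id  INTEGER NOT NULL REFERENCES clinic(clinic_id),
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    PRIMARY KEY (clinic_id, patient_id)
);

INSERT INTO provider VALUES (1, 'Dr Adams'), (2, 'Dr Baker');
INSERT INTO clinic   VALUES (1, 'North'), (2, 'South');
INSERT INTO patient  VALUES (1, 'Pat'), (2, 'Quinn');
INSERT INTO clinic_provider VALUES (1, 1), (2, 2);
INSERT INTO clinic_patient  VALUES (1, 1), (2, 2), (1, 2);
""")


def providers_for_patient(patient_id):
    """Patient -> Clinic -> Provider, using the two binary relations."""
    rows = conn.execute("""
        SELECT DISTINCT p.name
        FROM clinic_patient cp
        JOIN clinic_provider cpr ON cpr.clinic_id = cp.clinic_id
        JOIN provider p ON p.provider_id = cpr.provider_id
        WHERE cp.patient_id = ?
        ORDER BY p.name
    """, (patient_id,)).fetchall()
    return [name for (name,) in rows]


def patients_for_provider(provider_id):
    """The same relations queried from the opposite direction."""
    rows = conn.execute("""
        SELECT DISTINCT pt.name
        FROM clinic_provider cpr
        JOIN clinic_patient cp ON cp.clinic_id = cpr.clinic_id
        JOIN patient pt ON pt.patient_id = cp.patient_id
        WHERE cpr.provider_id = ?
        ORDER BY pt.name
    """, (provider_id,)).fetchall()
    return [name for (name,) in rows]
```

Two binary relations are enough to query from every direction; no direct Provider-to-Patient table is needed under the stated assumption.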
Response to Comment
Any advice on how to create these relationships for maximum flexibility and efficiency?
Well, Dr E F Codd's Relational Model is strongly established as the model, the method for organising data, such that it has (a) complete integrity (objects can't) (b) maximum flexibility (c) maximum speed. In the 45 years since its advent, there have been no other contenders. That is the model I am using. The principles that underpin that model are the principles that I rely upon, when I make my proposals.
All that has been, of course, confirmed and reinforced during my 34 years of database implementations. As well as thousands of other high-end implementations.
Data Independence
I've taken your post with great consideration and the difficulty is that there are many unique nuances with my app that won't allow me to simplify it to mirror the real world.
It is the other way around. The fact is, you are writing an app+db that deals with the real world. Therefore the db has to reflect that real world (limited to the scope of the enterprise) that you wish to engage with. Thousands of modellers have done that successfully (millions have done it incorrectly).
To the extent that you have "nuances" and complexities in the app, you have not modelled the data, as data, and the result will be a complex app that engages with an incorrectly modelled database, or worse, a non-database. All those "nuances" and complexities are in fact data; facts about data; rules about the data; and relationships between data. But you have an established view that the "nuances" and complexities are in the app, in your "models", not in the data.
Therefore your notion is false.
Every rule, constraint, control, nuance, complexity re the data, must be implemented in the data. That is simply data definition as per the RM. Otherwise you have no data independence, no database (just a bunch of files for data storage), no integrity, no relational power, no speed. And worse, you will be forever fixing up the "nuances" and complexities in the app layers, the object stack.
Data Definition
Let me start from first principles.
A database is a collection of Facts, including relationships between those Facts.
Assuming you understand that, there is an important next point.
There is no Fact that cannot be declared.
This is First Order Logic, which the RM is based on. Therefore there is no such thing as a "model" that is too complex (has "nuances" or too many "nuances") to be declared in terms of FOL. The scientific exercise that is called for is to reduce that complexity to FOL Facts.
To the extent that those Facts are Facts about the real world, and they reflect the truth, your database will be isolated from the effect of change (you can extend the database and the app easily). Eg. Provider, Service, Specialisation are separate Facts.
To the extent that those "facts" are "facts" that you have chosen to store (as being relevant, from the perspective of your app, object design, etc), eg. Provider, Service and Specialisation as a single complex "fact", not discrete Facts, and not a reflection of the real world, your database and objects (a) will be hard to change, and (b) will keep changing, forever, until they are elevated, such that they do reflect the real world. You will have to "re-factor" the "database" every quarter.
Confidentiality
The data is very confidential so I'm reluctant to get too far into the matter.
We have been working with confidential data and/or systems for over 45 years without breaching confidentiality.
There is nothing new under the Sun.
We are dealing with structure, not content, it is not reasonable to suggest that your structure is so new and unique that it cannot be (i) discussed or (ii) modelled.
But most important, if you cannot describe it (in FOL terms), you cannot model it. If you cannot model it, you cannot write an app to engage with it (you can try, but as evidenced here, you will be stuck in that unresolved position).
Noting that the OO/ORM literature teaches people to obsess about the data content, in order to avoid dealing with the structure, meaning, relevance, etc, please note that I do not want to know, and the exercise does not need to know, the content. Describe only the data in terms of meaning, relevance, relationships, and we will model the required structure.
My question was more about how to create a cycle of many-to-many relationships with 3 models than whether my models "should" have them or not.
I think I understand that. That would be adding complexity to an article of which the complexity has been determined to be the fundamental problem. If you ask me to build an airplane without wings, and I tell you that your approach is incorrect, that you need wings, there is no point in telling me that you are seeking someone who can tell you HOW to build an airplane without wings, you have missed the point.
Reasoning
I would love to hear your reasoning if you believe there should never be a situation like this in any database.
Again, you have that the wrong way around. It is not that there should never be a situation like this in any database, it is that if there is a situation like this, it raises a red flag (to qualified and experienced modellers) that the data is not yet Normalised, not yet organised into discrete Facts. That means you need to take a step back and deal with the complexity in your "models", first. Then the relationships that are your current focus will be simplified.
Then, yes, there will not be a situation like this in the database.
My reasoning is Codd's RM, and the principles behind it. It has been the subject of many papers. (As well as many "papers" that are non-relational or anti-relational, such as those that support the OO/ORM "model".)
Specifically, here: if you have an n-ary relation (technical term for the three-way relationship that you are seeking), it can be, and should be, resolved into [multiple] binary relations (two-way relationships). Eg. the TRD I have suggested.
OO/ORM Mythology
In the context of "love to hear your reasoning", there are two sides, I have given the what you should be doing side, above. This is the what you should not be doing or why your method is broken side. Where do I start.
The OO/ORM model is that the "database" is merely a storage location to make the objects "persistent", a slave of the objects, and that constructing a monolith, layers of object classifiers, complexity, is the way to solve any problem.
The OO/ORM model is a total, abject failure. It has no scientific basis whatsoever.
(Noting that due to the destruction of the education system, these days "mathematicians" and "theoreticians" write "papers" that "prove" complete and utter nonsense. It is nonsense because it contradicts established science. They are not the class of mathematicians and theoreticians of the old school, who reject contradiction; non-science; nonsensical proofs. The only way that they can write such absurd "proofs" is to maintain a state of ignorance, a state of pathological denial, of other sciences.)
Specifically, they are in denial of the Relational Model (whilst referring to it, to give their papers some credibility, which is a fraud); its prescriptions (such as Data Independence, FOL); its prohibitions, they are in denial of Relational data models (UML cannot model data like IDEF1X can), thus they produce non-relational files, which have no Relational integrity, power, or speed.
They employ the Hammer philosophy (ie. if one only knows a hammer, then every problem looks like a nail), in staggering denial that Maslow destroyed it, scientifically, over a century ago. Which leads them to pile more layers, more complexity, into the monolith. The converse is of course, to use the right tool for the job, which means define data according to the standard for data, the RM, and separately, objects according to whatever OO philosophy you choose.
They attempt to do everything in objects, to model everything in UML (which is not a standard by any means; nor adequate [one symbol plus a million notations], it has no decomposition, it is in fact a free-for-all in which everyone does their own thing).
The model exists in denial of the fact that since the 1980's, in the software industry, we architect, write, and deploy components. Database components in the database, program components in the objects. Not monolithic Towers of Babel; that is pre-1970's technology.
Since 1984, we have had Client/Server and Open Architecture Standards. We have had OLTP Transaction Standards since 1960, restated in the C/S context in 1984. The OO/ORM crowd are in flagrant rebellion of each and every one of those Standards, they build monolithic object stacks sans architecture, sans components, sans Transactions, sans everything. (Apologies to the Bard.)
You might consider what everyone (even cartoonists) knows about the OO/ORM stack, the monolith, the non-architecture and compare it with the deployment of components in the Open Architecture diagram, given above.
Further, they are in denial that every implementation of the "model" is a massive failure, they deny the evidenced facts, and keep adding complexity to the already complex and unmanageable object layers. A "model" that has failed due to its non-architecture, part of which is precisely that failed complexity.
In case their evidenced pathological denial of the reality does not stand, in and of itself, as evidence of insanity, there is more, much more. The Twelve Steppers have an interesting definition of insanity: doing the same thing over and over again, being aware that it produces the same result, every time, but expecting a different result the next time. That doesn't stop them from adding more complexity to the complex model, or from marketing their pre-1970's technology, as "modern" "science".
But that doesn't stop them from writing yet more books and marketing their failed "model".
The OO/ORM crowd exists in isolation from, in pathological denial of, reality.
Put another way, it is insane, in 2015, to implement software in an un-architected monolith that "does everything", rather than to architect; design; build; and deploy software components in the correct architectural position.
OO/ORM "Model" as Data
The fact that you keep calling your tables "model" is another red flag. That confirms that you have way too much complexity in them. A database consists of simple tables, each reflecting a discrete Fact, not models. To the extent that you consider them "models", you have (usually due to fixed notions re complex objects and classifiers) objects+data+complexity combined, not discrete data and discrete objects. That is the precise problem that will cause the app+db to fail.
So the next step is to (a) shelve the current focus re HOW to relate un-normalised complex "models", and to (b) normalise those "models" such that they are defined in terms of a Relational Database, such that they are discrete Facts. Following which, (c) the relating of the then Normalised tables will be straight-forward.
In one instance, each model acts as a type of 'User' but in another, they don't.
That is exactly the type of mashed-up concept that has to be rationalised, Normalised, such that it is (a) absolutely clear [when it is a user, and when it isn't] (b) defined in data, in FOL terms, as discrete Facts (c) such that you can confidently write code against it, build objects from it. Conversely, the absence of that clarity, the retention of complexity in the object layers, will result in complex objects that fail, and more important, data stored in files that have no integrity, power, or speed.
Self-Contradiction
Consider this. Since you are seeking Relational integrity, power, and speed, you cannot at the same time be
- seeking to retain unresolved complexity that is well-known to destroy integrity, power, and speed, or
- refusing to implement the requirements that host the integrity, power, and speed that you seek.
It is a massive, and double, contradiction, on your side. It is a philosophical and reasoning issue, that you have to consider and resolve for yourself. The OO/ORM seduces people into believing in magicke, into such crazy self-contradictions.
Regardless, I was very impressed with your answer and really appreciate the time you took to make the diagram.
Thank you. You are most welcome.
That took me all of five minutes. Because I have clarity. Because I follow scientific principles and standards. Because we have had a science and a methodology since 1970. Because we have had a modelling methodology and full notation for modelling Relational Data since 1987 (as a standard, IDEF1X, since 1993). The point is, it is nothing special, the sad fact is, it is not common, and it should be. The second sad fact is, it is unknown in the OO/ORM world.
Further Reading
You may be interested in this Question and Answer. The Answer covers many aspects of Relational Databases, that you will most certainly have to deal with, if not now, then at some point in the future. The minimum reading I draw your attention to right now is:
- Response to Update 3, pages 1, 2 and 6,
- specifically including the embedded link to Predicates
That might give you an idea of the reasoning; the depth of data definition that the RM affords; that all Facts can be declared in terms of FOL Predicates, that the OO/ORM crowd is totally ignorant of.
You may choose to add an Update to your question and ask how to declare one or the other "model", in terms of the RM, as discrete Facts, or open a new Question (and ping me).
Conversely, if you choose to stick to your position, the original question, then I think I have answered it (but the answer raises issues that you must address and resolve).
Please study carefully and comment or ask questions.

Massive data operations in the stored proc to DDD

Let's take an example of product classification. All the products need to be classified as vegetable or not. The business logic is: the product can be classified as a vegetable if it is from company A, B or C. If the product is not from one of those companies, it is not a vegetable. There are millions of products. This can be done in a stored proc with a few lines of code. The operation may take only a few seconds if it is done synchronously.
As I understand it, DDD goes against the idea of putting the logic in a stored procedure. The logic can be put as a behavior on the product, which can classify itself based on its source. To do this, all the million products need to be read into memory, processed, and then saved back to the database.
The problem here is the large amount of memory this operation needs. If the operation is done in chunks of, say, 50,000, the repository first has to figure out how many products need to be classified and then tell the domain that the long-running operation has to proceed in chunks. Surely this approach is going to take more time, and gives a bad experience for the user, who has to wait longer than a stored procedure would take.
What is a reasonable approach to DDD when it comes to long-running processes? Is the delay expected, so that the app has to inform the user that the classification is going to take time and will let them know when it is complete? And should I avoid the stored procedure and keep the logic as part of the domain?
UPDATE
Just to add some clarity, this classification process is done quite often. The application itself has to support the classification process; it is not an ETL job, and it can't wait long. That's why I'm trying to find the trade-offs between using a stored procedure and DDD.
Also note that it is not a Query, but a Command. The command can be called ClassifyAllProductsCommand(). When this command is run, there was no classification before. After the classification, other users of the system should see the new classification. For example, the product A is classified as Unavailable, and after the classification it can be Vegetable or Meat.
Classification is an interesting thing. It is a separate thing. Classification should never be implemented as structure... but that is another story :)
Your classification may even be regarded as a bounded context in the same way that reporting may be a bounded context. As such you may wish to handle classification separately. Your classification is not an aggregate root. It plays an auxiliary role. If it has no impact on the consistency in your domain modelling it may not even necessarily be part of your Product aggregate. It may be added and it may even be changed independently (not as bulk) but if it is used to determine the validity of your aggregate then your classification sub-system is going to have to take that into account.
Please bear in mind that it isn't a matter of DDD vs a stored procedure. You are executing queries against your data store. Whether that is done via a stored procedure or dynamically should not affect your decision. There is nothing preventing, say, a ProductRepository from calling a stored procedure.
You can have your classification sub-system still execute your SP or use DML directly. However, this isn't necessarily going to be part of your domain. You most certainly do not want to classify each product individually if it is something that happens quite often and as a bulk operation. If your current design dictates that these are bulk operations then keep them as such and don't force them into a DDD structure that is going to be prohibitive.
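As a rough sketch of that idea, the Python example below has a repository expose the bulk classification as one set-based statement. SQLite has no stored procedures, so a single UPDATE stands in for the SP call here; the table layout, the `classify_all` method and the 'NotVegetable' label are all hypothetical.

```python
import sqlite3

# Rule from the question: products from companies A, B and C are vegetables.
VEGETABLE_COMPANIES = ("A", "B", "C")


class ProductRepository:
    """Repository that runs the bulk classification inside the data store."""

    def __init__(self, conn):
        self._conn = conn

    def classify_all(self):
        """Classify every product with one set-based UPDATE; return rows touched."""
        placeholders = ",".join("?" for _ in VEGETABLE_COMPANIES)
        with self._conn:  # commits on success, rolls back on error
            cur = self._conn.execute(
                f"""UPDATE product
                    SET classification = CASE
                        WHEN company IN ({placeholders}) THEN 'Vegetable'
                        ELSE 'NotVegetable'
                    END""",
                VEGETABLE_COMPANIES,
            )
        return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, company TEXT, classification TEXT)"
)
conn.executemany("INSERT INTO product (company) VALUES (?)", [("A",), ("D",), ("C",)])

repo = ProductRepository(conn)
updated = repo.classify_all()
```

The caller (for example a handler for the ClassifyAllProductsCommand mentioned in the question) only sees `classify_all()`; whether it runs an SP or inline DML stays an implementation detail of the repository.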
It is a design choice and sometimes making changes to individual items does not make sense. It should certainly be your aim to work on a single aggregate at a time but things like reporting or classification are another animal that don't always fit cleanly into the Domain-Driven Design thinking.
I think you're confusing DDD. If you were looking for Vegetable type Products, you would call a service that would retrieve Products for a particular Company. There would be no need to load all the products into memory.
Application or domain-centric design, just means designing your application around the business domain and not from a collection of database tables upwards (like a data-centric approach).
In contrast you end up with more data associations (joins) being done in your application and less in monolithic stored procedures. Which moves all your business logic into the application and not in the persistence device (the database), which kinda makes a lot of sense.
Also, if you deny yourself huge table joins then you also think carefully about things that traditionally cause massive overhead on your database and end up moving towards better design, like creating a separate reporting database, message buses, asynchronous tasks, etc.
EDIT
It seems like a common phrase in DDD but "it depends on your specific domain".
Without knowing the detail, I would want to know how often these classifications occur. Can they be done as the Products are created? Are they done often or rarely, planned or unpredictably?
If the classifications are common and must be done across all one million products, it might be best to create a smaller model for the Product, maybe something with just SmallProduct.Id and SmallProduct.CompanyId (probably naming it something better). Then cache this smaller collection in memory and perform operations against it.
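To make the "smaller model" idea concrete, here is a minimal sketch in Python. SmallProduct comes from the paragraph above; build_company_index, classify_by_company and the rule "a product is a Vegetable when its company is in a given set" are assumptions made up for illustration, not anything from the original design.

```python
from collections import defaultdict
from dataclasses import dataclass


# Slim projection of Product: only the fields the classification needs.
@dataclass(frozen=True)
class SmallProduct:
    id: int
    company_id: int


def build_company_index(products):
    """Group product ids by company id so lookups avoid scanning every product."""
    index = defaultdict(list)
    for product in products:
        index[product.company_id].append(product.id)
    return index


def classify_by_company(index, company_ids):
    """Return the ids of products whose company is in company_ids (illustrative rule)."""
    return sorted(
        product_id
        for company_id in company_ids
        for product_id in index.get(company_id, [])
    )


# In a real system this list would be loaded once from the data store and cached.
cache = [SmallProduct(1, 10), SmallProduct(2, 20), SmallProduct(3, 10)]
index = build_company_index(cache)
vegetable_ids = classify_by_company(index, {10})
```

Two integers per product keeps even a million products small in memory, which is the point of the slim model.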
If the check to see if the product is a Vegetable is common and only one of a few possible classifications, it might be best to have Classifications in their own table and a linking table to link them to Products. Then the problem becomes more of a one time data setup issue.
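A rough sketch of the classification table plus linking table, using Python's built-in sqlite3 module with an in-memory database. The table and column names (product, classification, product_classification) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE classification (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Linking table: one row per (product, classification) pair.
    CREATE TABLE product_classification (
        product_id INTEGER NOT NULL REFERENCES product(id),
        classification_id INTEGER NOT NULL REFERENCES classification(id),
        PRIMARY KEY (product_id, classification_id)
    );
""")

# One-time data setup (sample rows are invented).
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, "Carrot"), (2, "Hammer"), (3, "Leek")])
conn.executemany("INSERT INTO classification VALUES (?, ?)",
                 [(1, "Vegetable"), (2, "Tool")])
conn.executemany("INSERT INTO product_classification VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 1)])


def products_in(conn, classification_name):
    """Return product names carrying the given classification, ordered by id."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM product p
        JOIN product_classification pc ON pc.product_id = p.id
        JOIN classification c ON c.id = pc.classification_id
        WHERE c.name = ?
        ORDER BY p.id
        """,
        (classification_name,),
    )
    return [name for (name,) in rows]


vegetables = products_in(conn, "Vegetable")
```

Once the links exist, "is this product a Vegetable?" is a plain join rather than a computation over every product.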
On the rare chance that you're using a Document Database, you could just store these classifications in a collection on the Product object itself.
It seems you are interpreting "classification" as your aggregate root, containing products (as entities).
Honestly, it does not feel like a good design decision (I might be wrong, depends on the requirements specifics).
What if you think of the product as the aggregate root (containing suppliers, discounts, etc.)? In that case, you'll need to load only one product at a time.
If the classification/supplier has a complex domain, you should consider having a separate bounded context for that.
Also, in your comment:
Just to add some clarity, this classification process is done quite often. The application has to support the classification process, not an ETL or can't wait longer. That's why I'm trying to find the trade offs between using a stored procedure versus DDD.
REALLY? You can't fire an event and have the product service update the classification when there's an update on the supplier? The user will see an inconsistent state (say, an "undefined" category) for a few seconds or minutes. It is not that bad, is it?
But, if you are talking about a batch job, then, by all means, go with the stored procedure.
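For the event-driven option, a minimal in-process sketch in Python might look like the following. EventBus, ProductService, the "supplier_updated" event name and the payload fields are all hypothetical. A real system would more likely use a message broker, with the queue here only mimicking asynchronous delivery.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = []

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        # Events are queued, not handled inline, to mimic asynchronous delivery.
        self._queue.append((event_name, payload))

    def drain(self):
        # Deliver every queued event to its subscribers, oldest first.
        while self._queue:
            event_name, payload = self._queue.pop(0)
            for handler in self._handlers[event_name]:
                handler(payload)


class ProductService:
    def __init__(self, bus):
        # product id -> {"supplier_id": ..., "category": ...}
        self.products = {}
        bus.subscribe("supplier_updated", self.on_supplier_updated)

    def add(self, product_id, supplier_id):
        self.products[product_id] = {"supplier_id": supplier_id, "category": "undefined"}

    def on_supplier_updated(self, event):
        # Reclassify only the products that belong to the changed supplier.
        for product in self.products.values():
            if product["supplier_id"] == event["supplier_id"]:
                product["category"] = event["category"]


bus = EventBus()
service = ProductService(bus)
service.add(1, supplier_id=7)
service.add(2, supplier_id=8)

bus.publish("supplier_updated", {"supplier_id": 7, "category": "Vegetable"})
before = service.products[1]["category"]  # event not handled yet
bus.drain()
after = service.products[1]["category"]
```

Between publish and drain the product still shows the old category, which is exactly the short window of inconsistency discussed above.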

Persisting Game Actor Objects

This question pertains to a game I have been developing, but I believe it is a pretty generic concept for which I have not been able to find a clear answer.
I have been trying to figure out how to serialize actors (objects in a game world) to a file, dynamically and at arbitrary times.
Context
To understand my question you need to know how the world is generally constructed. The game is a cell-based world with 3 dimensions divided into smaller, more manageable sections that I'll refer to as chunks. The terrain info is all fixed known length, and I can serialize that information just fine, simply writing/reading to/from a world file with the appropriate offset whenever that chunk needs to be loaded into memory (say a player gets near it). That's all well and good until I have to deal with actors and writing them to a single file.
The Problem
I know that ISerializable is an incredibly useful resource for actually obtaining the data from the actors, but the problem I'm having is committing that to disk dynamically. By that I mean inserting/removing actors from the middle of a big file containing all actors. It would be a lot easier if I could serialize the entire game state and actor tree, but I need to be able to do this on small sections of the world at a time. Some sections will have no actors, some will have many (say up to a couple hundred). These sections are being loaded and saved as the players move around the world. Furthermore, the number of actors and size of their data will change over the course of the game, so I cannot handle it like I do the terrain. I need a way of committing the actor quickly, where I can find it quickly later and am not wasting a lot of file space. One thing that may be of use is that all actors in a chunk are serialized/de-serialized at once, never individually.
Note: These worlds can get very large (16k x 16k x 6) and therefore easily have millions of actors in all.
The Question
Is a database really the best way to do this? I am not opposed to implementing one, but that is an involved process and I want to be sure it is a recommended course of actions before I continue. It seems like there might be serious performance implications.
A traditional database (RDBMS) is not always the right way to go. But alas, you ARE trying to persist data.
Most IT professionals will likely guide you towards a traditional database, simply because for us it ISN'T involved. It is our bread and butter. Furthermore, there are hundreds of libraries that make our lives easier, the latest generation of which are the full-blown ORMs.
However, as you have noted, a full-blown RDBMS is a little heavyweight for your application (depending on your particular scaling needs). So I'll suggest a few alternatives.
MongoDB
RavenDB
CouchDB
Cassandra
Redis
Now, it IS true that in many ways these are much lighter weight than RDBMSs. However, these so-called NoSQL databases (I picked mostly document stores, since they seem to be the closest match to your requirements; Cassandra and Redis are a wide-column store and a key-value store respectively) are somewhat immature. That is not to say they are buggy and unreliable (they have higher reliability than RDBMSs), but people don't really know how to work with them.
Again, I need to qualify that statement. RDBMSs have several decades of research and best practices behind them. There are vast swathes of plug-ins to the tool chains of each implementation. Every single contributor on SO knows how to use a DB well. But none of those things is true of NoSQL.
TLDR
So it really boils down to this. YES, RDBMSs (traditional DBs) are complex, like a modern road car. But as with a road car (which you buy), there exists the infrastructure to support them.
The alternative is a NoSQL database, which is like building a small electric go scooter. Yes, it's simpler. But take it to a car shop and they'll have no clue.
Finally
My advice: use an off-the-shelf ORM with an RDBMS. The current generation of ORMs can pretty much hide your database from you. The setup won't be very performant (you won't be doing microsecond algo trading with it), but it should be enough for your needs.
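To give a feel for how little code the database route can take, here is a sketch in Python using the built-in sqlite3 module directly (not an ORM, and not the asker's C#). It leans on the detail from the question that all actors in a chunk are saved and loaded together, so each chunk becomes one row keyed by its coordinates. ChunkActorStore, the chunk_actors table and the JSON encoding (standing in for whatever ISerializable would produce) are all assumptions for illustration.

```python
import json
import sqlite3


class ChunkActorStore:
    """Stores all actors of one chunk as a single blob, keyed by chunk coordinates."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS chunk_actors (
                cx INTEGER NOT NULL,
                cy INTEGER NOT NULL,
                cz INTEGER NOT NULL,
                actors BLOB NOT NULL,
                PRIMARY KEY (cx, cy, cz)
            )
            """
        )

    def save(self, coords, actors):
        cx, cy, cz = coords
        if not actors:
            # Chunks without actors get no row, so they cost no file space.
            self.conn.execute(
                "DELETE FROM chunk_actors WHERE cx = ? AND cy = ? AND cz = ?",
                (cx, cy, cz),
            )
        else:
            blob = json.dumps(actors).encode("utf-8")
            self.conn.execute(
                "INSERT OR REPLACE INTO chunk_actors VALUES (?, ?, ?, ?)",
                (cx, cy, cz, blob),
            )
        self.conn.commit()

    def load(self, coords):
        row = self.conn.execute(
            "SELECT actors FROM chunk_actors WHERE cx = ? AND cy = ? AND cz = ?",
            tuple(coords),
        ).fetchone()
        return json.loads(row[0].decode("utf-8")) if row else []


store = ChunkActorStore()
actors = [{"type": "goblin", "hp": 12}, {"type": "chest", "items": ["rope"]}]
store.save((3, 4, 0), actors)
loaded = store.load((3, 4, 0))
empty = store.load((0, 0, 0))
```

The primary key on the chunk coordinates gives the quick lookup, and variable-sized blobs avoid the fixed-offset layout that works for terrain but not for actors. Whether this performs well enough at millions of actors would need measuring.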

How crazy should I get with turning things into objects?

I'm still new to OOP, and the way I initially perceived it was to throw a lot of procedural-looking code inside of objects, and think I'd done my job. But as I've spent the last few weeks doing a lot of thinking, reading, and coding (and looking at good code, which is a hugely under-rated resource), I believe I'm starting to grasp the different outlook. It's really just a matter of clarity, simplicity, and organization once you get down to it.
But now I'm starting to look at things as objects that are not as black and white a slamdunk case for being an object. For example, I have a parser, and usually the parser returns some strings that I have to deal with. But it has one specialized case where it has to return an array, and what goes in that array and how it's formatted has specialized rules. This only amounts to two lines plus one method of code, but this code sticks out to me as not being cleanly fitting in the Parser class, and I want to turn it into its own "ActionArray" object.
But is it going too far? Has OOP become a hammer that is making me look at everything like a nail? Is it possible to go too far with turning things into objects?
It's your call, but you should think of objects as real life objects.
Take for example a car. You could describe a car with different objects:
Engine
Wheels
Chassis
Or you could describe a car with just one object:
Car
You can keep it simple and stupid or you can spread the dependency to different objects.
As a general guideline, I think Sesame Street says it best: you need a new object when "one of these things is not like the others".
Listen to your code. If it is telling you that your objects are becoming polluted with non-essential state and behavior (and thus violating the "Single Responsibility Principle"), or that one part of your object has a rate of change that is different from the rest, and so on, it is telling you that you are missing an object.
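As a small illustration of "you are missing an object", here is roughly what pulling the special-case array out of the parser could look like in Python. The ActionArray name comes from the question. The "actions: a, b, c" syntax and the formatting rules (trim, lower-case, drop blanks) are invented for the example, since the real rules aren't given.

```python
class ActionArray:
    """Owns the formatting rules for the special array result (rules are made up)."""

    def __init__(self, raw):
        # Hypothetical rule: comma-separated, trimmed, lower-cased, blanks dropped.
        self.items = [part.strip().lower() for part in raw.split(",") if part.strip()]

    def to_list(self):
        return list(self.items)


class Parser:
    def parse(self, line):
        # Hypothetical syntax: "actions: a, b, c" is the one special case.
        key, _, value = line.partition(":")
        if key.strip() == "actions":
            # The parser only decides *that* it is an action array;
            # how the array is built lives in ActionArray.
            return ActionArray(value)
        return value.strip()


parser = Parser()
special = parser.parse("actions: Jump, , Run ")
plain = parser.parse("name: Bob")
```

If the array rules change, only ActionArray changes. If they never grow beyond two lines, leaving them in Parser would also be a defensible choice.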
Do the simplest thing that could possibly work. When that no longer works, do the next simplest thing. And so on. In general, this means that a system tends to move from fewer, larger objects to more, smaller objects; but not always.
There are a number of great resources for OO design. In addition to the ones already mentioned, I highly recommend Smalltalk Best Practice Patterns and Implementation Patterns by Kent Beck. They use Smalltalk and Java examples, respectively, but I find the principles translate quite well to other OO languages.
Design patterns are your friend. A class rarely exists in a vacuum. It interacts with other classes, and the mechanisms by which your classes are coupled together is going to directly affect your ability to modify your code in the future. With poor class design, a change that you make in one class may ripple down and force changes in other classes, which cause you to have to change other classes, etc.
Design patterns force you to think about how classes relate to each other. For example, your Parser class might choose to implement the Strategy design pattern to abstract out the mechanism for parsing. Or you might decide to build your Parser around the Template Method pattern, and then have each concrete Parser subclass fill in the steps of the template.
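A minimal sketch of the Strategy idea in Python, assuming the parsing mechanism can be expressed as a plain function passed to the Parser. The two strategies (parse_csv_line and parse_json_array) are invented for illustration.

```python
import json


def parse_csv_line(text):
    """Strategy 1: split on commas and trim whitespace around each field."""
    return [field.strip() for field in text.split(",")]


def parse_json_array(text):
    """Strategy 2: treat the input as a JSON array."""
    return json.loads(text)


class Parser:
    """Delegates the actual parsing to whichever strategy it was given."""

    def __init__(self, strategy):
        self._strategy = strategy

    def parse(self, text):
        return self._strategy(text)


csv_result = Parser(parse_csv_line).parse("a, b ,c")
json_result = Parser(parse_json_array).parse('["a", 1]')
```

The Parser never needs to know which format it is handling, so adding a new format means writing one new function and touching nothing else.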
The original book on design patterns (Design Patterns: Elements of Reusable Object-Oriented Software) is excellent, but can be dense and intimidating reading if you are new to OOP. A more accessible book (and specific to Ruby) might be Design Patterns in Ruby, which has a nice introduction to design patterns, and talks about the Ruby way of implementing those patterns.
Object oriented programming is a pretty tricky tool. Many people today are getting into the same conflict, by forgetting the fundamental OOP purpose, which is improving code maintainability.
You can always brainstorm about the future reusability and maintainability of your OO code, and decide for yourself if it's the best way to go. Take a look at this interesting study:
Potok, Thomas; Mladen Vouk, Andy Rindos (1999). "Productivity Analysis of Object-Oriented Software Developed in a Commercial Environment"

Resources