I have an implementation of a server that stores its information in a single data actor. The problem is that this is not very scalable. What I want to do is distribute the load over multiple data actors, thereby sacrificing some data consistency for my clients.
I want to be able to dynamically add nodes to my server in order to scale my application up, so I suppose I should have one data actor per node.
The problem is deciding which node to invoke when. Routing everything through a "main server node" would introduce a single point of failure, and that's not what I want. I am new to Erlang and have read "Learn You Some Erlang for Great Good", but it doesn't really give me an idea of how to design this kind of application.
I am looking for pointers to educational example implementations/designs of the kind of system I want or ideas on how to get started.
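For concreteness, the node-selection scheme I've been considering is consistent hashing, where every client computes the target node for a key locally, so there is no central router. A minimal, language-neutral sketch (written in Java only for illustration; the hash function and node names are placeholders, and a real ring would use a stronger hash plus virtual nodes per physical node for smoother balancing):

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Sketch of a consistent-hash ring. Every client holds a copy of
    // the ring and picks the data node for a key itself, so there is
    // no central "main server node" to fail.
    public class HashRing {
        private final SortedMap<Integer, String> ring = new TreeMap<>();

        // Adding a node at runtime only remaps the keys between the new
        // node's position and its predecessor on the ring.
        public void addNode(String node) {
            ring.put(hash(node), node);
        }

        public String nodeFor(String key) {
            if (ring.isEmpty()) {
                throw new IllegalStateException("no nodes registered");
            }
            SortedMap<Integer, String> tail = ring.tailMap(hash(key));
            Integer slot = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
            return ring.get(slot);
        }

        private static int hash(String s) {
            return s.hashCode() & 0x7fffffff; // keep ring positions non-negative
        }
    }

Each node and client would keep its own copy of the ring, so growing the cluster means broadcasting the new member to everyone rather than asking a coordinator.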
P.S.: I know about mnesia, riak, ... but I can't use these systems.
Using Rails and PostgreSQL.
I wrote my app without a master/slave configuration in mind.
Now I've set up master/slave in the app, and I'm running into some technical debt. The same process in my app writes to the db and then immediately reads from it. The read now takes place on the read db, but the data isn't there yet. Before, this wasn't efficient, but it didn't cause any problems because both dbs were the same. Now it's blowing up in my face.
The problem for me is that it's difficult to find all the places in the code where this problem exists. Can someone please suggest a technique to run my tests so that reads and writes use different dbs, with no replication between them, so that I can figure out where my issues are?
Other solutions are also welcome!
I strongly recommend you rethink your master/slave configuration or whether master/slave is even right for your application.
It's not "tech debt" to build a system that assumes data written to persistent store can be read back immediately. It's normal and correct. While you might reasonably be able to avoid the pattern
write A, ..., look up A.key
with various simple cache schemes, trying to code around e.g.
write A, ..., complex query that *might* fetch A
requires you to retain a copy of A and determine whether it would satisfy the WHERE clause of the query in separate code, simply because you can't rely on the query results. Unless your system is very small and simple, trying to do this system-wide will produce a super-complex, fragile, expensive, and ugly code base. I strongly recommend you don't try it.
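For the first pattern only, a simple cache scheme can work. A minimal sketch, assuming a hypothetical key-value Store interface (in Java rather than Rails/ActiveRecord, purely to show the shape of the idea): recent writes are remembered locally and served back by key, while everything else still reads from the slave.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical read-your-writes wrapper: lookups by primary key hit
    // a local cache of recent writes first, so replication lag on the
    // slave is invisible for the "write A, ..., look up A.key" pattern.
    // (Eviction/TTL is omitted for brevity; Store is an assumed
    // interface, not a real library API.)
    class ReadYourWritesStore<K, V> {
        private final Map<K, V> recentWrites = new ConcurrentHashMap<>();
        private final Store<K, V> master;
        private final Store<K, V> slave;

        ReadYourWritesStore(Store<K, V> master, Store<K, V> slave) {
            this.master = master;
            this.slave = slave;
        }

        void write(K key, V value) {
            master.put(key, value);
            recentWrites.put(key, value); // remember until replication catches up
        }

        V readByKey(K key) {
            V cached = recentWrites.get(key);
            return cached != null ? cached : slave.get(key);
        }

        interface Store<K, V> {
            void put(K key, V value);
            V get(K key);
        }
    }

Note this only rescues lookups by key; for the complex-query pattern you would be back to re-implementing the WHERE clause by hand, which is exactly the trap described above.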
The usual purpose of a master/slave persistent store organization is to offload read traffic that's not time-dependent on writes. For example, if your system mines data to produce summaries accessible to users, you'd offload the metric computation and have it mine the slave. This prevents mining queries from drawing resources away from user request handling. The small delay between the write on the master and the copy to the slave is no problem.
If your app is struggling because there's too much load on persistent store, you probably want partitioned data (sometimes called sharding), not master/slave. Partitioning can expose you to a different kind of problem: no cross-partition transactions. But this is usually easier to work through than what you're attempting.
After studying this area, I agree with Gene that master/slave should only be used for reads of data that was written a significant time before the read.
My ORIGINAL concept was that it's better to use a functional programming style, whereby the process carries all the information it needs in its parameters and never goes back to the database. The downside of this approach is that the human mind has a hard time with functional programming, and in a massive computer program it makes sense not to insist on this added complication.
If you want to write a functional method or process, that is great and very efficient, but nothing in the code should insist on it.
I'm doing some research on workflow concepts, specifically the BPMN standard, and I'm mostly interested in the available software on the subject.
I've already studied software like Activiti and jBPM, both of which are implemented in Java. As great as they are, I'm looking for something else. Even though such software calls itself a BPM Engine, I would rather call them BPM Engine Servers. They are standalone servers (with web-based GUIs), which makes it really hard to embed them in other servers.
Now my question is: is there such a concept as a BPM engine that only executes the given BPMN process with the given data, one step at a time? Without any GUI or direct user interaction (something like a library)? What should I search for? What is it called? Are my expectations valid?
[UPDATE]
I've spent the last few hours studying Activiti's user guide. I'm still not sure if I can use it the way I want to! I'd be grateful if someone could confirm it.
I'm interested in a console-like application that I can run whenever I like, handing it the previously running process (most likely serialized as a string). The engine should reconstruct the process from the given history.
Once the process is reconstructed, I would like to move it forward one step by telling it what has happened. It should then inform me of the next tasks to be performed and shut down.
Finally, I'll store the updated process after getting it back as a string (the engine should serialize it in a way that lets it deserialize it later).
I don't want the engine to have its own database or in-memory storage. I want it to shut down completely once it's done. This is what I mean by engine: no user interaction, no storage access.
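In code, the hypothetical engine I have in mind would look something like this (all the names here are mine, not from Activiti or any other real product):

    // Hypothetical API for the stateless, one-shot engine described
    // above. None of these types exist in Activiti, jBPM, or any
    // real product; this is just the contract I am asking about.
    public interface OneShotProcessEngine {

        /** Rebuild an in-memory process instance from its serialized history. */
        ProcessState deserialize(String serializedState);

        /** Advance one step by reporting what happened; returns the next tasks. */
        StepResult advance(ProcessState state, String completedTaskId);

        /** Serialize the updated instance so it can be stored externally. */
        String serialize(ProcessState state);

        interface ProcessState { /* opaque reconstructed instance */ }

        interface StepResult {
            java.util.List<String> nextTaskIds();
        }
    }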
Can any of the BPM engines perform in such a way?
Perhaps I am missing your point, but Activiti is really nothing more than a jar file that can be embedded in any other Java application. Certainly, in order to run Activiti in any meaningful way you need a backing datastore (database) and one or more process definitions, but as you can see from the unit tests that are part of Activiti, the database can be in memory and the process definition can be included in the WAR. There are many examples of Activiti (and likely jBPM) being used as simply an embedded state machine with no exposed UI or user interaction.
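For instance, minimal embedded usage looks roughly like this (a sketch against the Activiti 5 API; the resource name and process key are placeholders, and it assumes the process starts with a user task):

    import org.activiti.engine.ProcessEngine;
    import org.activiti.engine.ProcessEngineConfiguration;
    import org.activiti.engine.runtime.ProcessInstance;
    import org.activiti.engine.task.Task;

    public class EmbeddedActivitiDemo {
        public static void main(String[] args) {
            // Build an engine backed by an in-memory database: no
            // server, no GUI, just a library call inside your own app.
            ProcessEngine engine = ProcessEngineConfiguration
                    .createStandaloneInMemProcessEngineConfiguration()
                    .buildProcessEngine();

            // Deploy a BPMN 2.0 definition from the classpath
            // ("myProcess.bpmn20.xml" and "myProcess" are placeholders).
            engine.getRepositoryService().createDeployment()
                    .addClasspathResource("myProcess.bpmn20.xml")
                    .deploy();

            // Start an instance and complete its first user task.
            ProcessInstance pi = engine.getRuntimeService()
                    .startProcessInstanceByKey("myProcess");
            Task task = engine.getTaskService().createTaskQuery()
                    .processInstanceId(pi.getId())
                    .singleResult();
            engine.getTaskService().complete(task.getId());

            engine.close();
        }
    }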
My company has implemented a number of such solutions for different organizations.
If I have missed your point, feel free to give me an example of your requirement; I'm sure we have addressed it at one time or another.
You might be interested in Bonita BPM.
This open source BPM solution offers an execution engine that can be used standalone.
Just like its competitors, it also offers an optional GUI in the form of a web-based application: Bonita Portal.
I think the challenge for what you want to do is that most of the BPM Engines separate the definition of the process from the execution. So for most of them you need someplace that will allow you to store the definition long term (typically a database) and then they track the state of a given instance of that definition for you.
If you wanted a truly stateless BPMN "interpretation" engine, then your serialized data would have to include not only the current state of the process, but the process definition as well. I'm sure this can be done, but I don't think any of the engines have taken this approach, as doing so would add complexity to the solution and solve a problem that not many people seem to be asking about.
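As a sketch, the serialized payload for such an engine would have to carry something like the following (a hypothetical structure, not any product's actual format):

    // Hypothetical self-contained state for a stateless BPMN
    // interpreter: the definition travels with the instance, so the
    // engine needs no database of its own between invocations.
    public class SerializedProcess {
        public String bpmnXml;                           // the full process definition
        public java.util.List<String> activeTokens;      // ids of the current activities
        public java.util.Map<String, Object> variables;  // process variables
    }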
Additionally, it raises the question: "given that we now have a process that knows what task it is on, how does that task actually get executed?" In most of the solutions I've seen, the execution of the task takes place in the same server as the engine. In some, where the execution is in a different technology, the "executor" doesn't understand the process much at all except to make a call signaling "okay, this thing is done", and the engine handles what happens next. Since you want to have this data in a serialized data structure of some sort, the question arises: "if we have this stateless BPMN engine, would the executor of the task have to update the serialized data to indicate the state change for the task?"
There are other requirements of the BPMN specification that I think would make your approach very difficult, such as how to handle intermediate events that wait for a specific time, or for a message, before moving the process forward. While all of these are potentially solvable, it would certainly take significant re-engineering of current approaches.
I've learned a lot about Erlang over the last couple of days and am familiar with component entity systems.
Given the process-centric approach of Erlang, I would suggest that each entity would be an Erlang process instance. As for the CES (Component Entity System) approach, I would have a process like a "MovementSystem" for entities that own a MovementComponent (for example). I would then "iterate" over all registered entities with tail recursion and send them messages to let them update their own process state rather than doing that as batch-processing by the MovementSystem itself... (though I then wouldn't call it an entity system anymore, because in my understanding a CES has all the information about all the entities and components it is processing, which would mean "shared memory", which is by concept not part of Erlang)
Are those two approaches/paradigms - Erlang and "Component Entity System" - mutually exclusive, or am I missing something?
(I wouldn't even call this prototype on GitHub (https://gist.github.com/sntran/2986790) a real Component Entity System. It looks to me more like a gen_event-based MQ approach, for which I would probably use RabbitMQ instead... but that's not relevant here...)
Right now I don't see how these two concepts could even be combined...
Okay, I did further research...
-> https://stackoverflow.com/a/1637134/3850640
This answer to another Erlang question explained it pretty well to me:
One thing Erlang isn't really good at: processing big blocks of data.
A CES, on the other hand, by nature handles a lot of data at once...
So my answer would be "Yes, it is possible, but not a very good choice"...
I do not know about CES, but I do think that you are missing some things.
each entity would be an Erlang process instance
...
let them update their own process state rather than doing that as batch-processing by the MovementSystem itself
...
which would mean "shared memory", which is by concept not part of Erlang
It sounds as if you want to hold all your state in one place. The simplest way to do this is to use one process and have that process keep its own state. However, there are other ways: you could have a "global state" process that everyone can talk to. You can think of ETS as an example of this. Putting the shared state in a separate process makes synchronization much easier.
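For illustration, here is a minimal sketch of that "state-owning process" idea (in Java, since the concept is language-neutral; this is not Erlang or the ETS API): one thread owns the state, and everyone else sends it messages through a queue, so the state itself never needs locking.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy analogue of a "global state" process: a single thread owns
    // the map, and other threads send it messages instead of sharing
    // memory. (A sketch of the concept, not any real framework API.)
    public class StateOwner implements Runnable {
        private final Map<String, Object> state = new HashMap<>();
        private final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();

        public void send(Message msg) { inbox.add(msg); }

        @Override public void run() {
            try {
                while (true) {
                    Message msg = inbox.take();    // one message at a time
                    state.put(msg.key, msg.value); // no locks needed: only this
                }                                  // thread ever touches `state`
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        public static final class Message {
            final String key;
            final Object value;
            public Message(String key, Object value) {
                this.key = key;
                this.value = value;
            }
        }
    }

This is essentially what a gen_server gives you for free in Erlang: serialized access to state through a mailbox.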
If you want to do parallel processing, there are many ways to arrange your code: you could have MovementSystem gen_server:cast to all MovementComponents and have them handle things. This probably works best if the different components of an entity interact and you need to know if something is trying to move and talk at the same time. If components are more independent, you might want to just spawn one-off jobs to handle the movement. Finally, it might be the case that running all movement code in serial is cheap and you just want one process per system.
I'm facing difficulties with finding the best CEP product for our problem. We need a distributed CEP solution with shared memory. The main reason for distribution isn't speeding up the process, but having a fallback in case of hardware or software problems on nodes. Because of that, all nodes should keep their own copy of the event-history.
Some less important requirements for the CEP product are:
- Open source is a big plus.
- It should run on a Linux system.
- Running in a Java environment would be nice.
Which CEP products are recommended?
A number of commercial, non-open-source products employ a distributed data grid to store the stateful event processing data in a fault-tolerant manner. My personal experience is with TIBCO BusinessEvents, which internally uses TIBCO ActiveSpaces. Other products claim to do similar things, e.g., Oracle Event Processing uses Oracle Coherence.
As for open source solutions, I'm not aware of any that offer functionality like this out of the box. With the right skills you might be able to use them in conjunction with a data grid (I've seen people try to use Drools Fusion together with Infinispan), but there are quite a number of complexities that you need to think about which a pre-integrated product would take care of for you (transaction boundaries, data access, keeping track of changes, data modeling).
An alternative you might consider if performance doesn't dictate a distributed/load-balanced setup could be to just run a hot standby, i.e., two engines performing the same CEP logic, but only one engine (the active one) actually triggering outgoing actions. The hot-standby engine would be just evaluating the CEP logic to have the data in its memory ready to take over in case of failure but not trigger outgoing actions as long as the other engine is running.
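A minimal sketch of that gating idea (engine-agnostic Java; the wiring is hypothetical, standing in for whatever rule-match callback your CEP engine offers):

    import java.util.concurrent.atomic.AtomicBoolean;

    // Both engines evaluate the same CEP logic; this gate suppresses
    // outgoing actions on the standby node. (Hypothetical wiring, not
    // an API of any particular CEP product.)
    public class HotStandbyGate {
        private final AtomicBoolean active;

        public HotStandbyGate(boolean startActive) {
            this.active = new AtomicBoolean(startActive);
        }

        /** Called by the CEP engine whenever a rule matches. */
        public void onMatch(Runnable outgoingAction) {
            if (active.get()) {
                outgoingAction.run(); // only the active engine acts;
            }                         // the standby stays warm but silent
        }

        /** Promote this node once the peer is confirmed down. */
        public void promote() { active.set(true); }
    }

The promote() trigger is the hard part in practice: you need a failure detector or leader election between the two engines so that exactly one is active at a time.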
I know there is a lot of talk about BPM these days, and I am conscious that some may see it as a craze rather than a fundamentally important piece of software.
As someone from what most would call 'The Business', I have been doing my best to learn about BPM to ensure we continue to make decisions that make sense not only to the business, but to IT as well.
I have noticed while reading that application workflow is sometimes mentioned when discussing BPM. I hadn't given this much thought until recently.
Therefore, what is the difference? When would you use one and not the other?
BPM is about the process and improving it, which takes into account users and potentially more than one application, e.g. an ERP system may have more than one application to it, though there may be other uses of the term. Note that the process can be described without reference to which applications or technologies are used.
Application workflow is how an application is used to get from a to b. Here a specific set of code is used, and the focus is on what happens over the course of the application getting from a to b. In this case, the application is front and center rather than the process.
Does that provide an answer? Another way to think of it is that multiple application workflows can make up a system which is used in a process that can have BPM applied to it.
Late to the game, but workflow is to database as BPMS is to DBMS. (Convenient how the letters line up, huh?)
IOW, BPM(S) is traditionally meant to refer to a particular framework/application that allows you to manage business processes: defining them, storing them, versioning them, measuring them, etc. This is similar to how a DBMS manages databases.
Now, a workflow is a definition, much like a database is a definition. In the former case, it is a definition of operations/work (Fulfill Order), the steps thereof (Send Invoice), and rules/constraints on the work (If no stock, send notice). In the latter, similar case, it is a definition of data structure (CREATE TABLE) and constraints (InvoiceTotal must be > $0.00).
I think this is a potentially confusing subject, particularly as some development environments use a type of process flow model to generate user-facing applications (I'm thinking about Outsystems here, for example).
But, for me, the distinction is crystal clear. Application workflow, as people talk about it, refers to a user's path through an application, i.e. the pages they complete/visit, the data they enter, etc. on their way to completing a transaction of some sort. "Application workflow" is a poor term for this, though; I think "application flow" would be more meaningful.
BPM, on the other hand, is about modelling and executing a workflow process. By workflow, in this context, I mean a series of discrete steps (or tasks) that have to be completed (either programmatically or via human interaction) in a certain order to complete a process. These tasks can be implemented as individual application modules (each with their own "application workflow", see above). The job of the workflow engine is to make sure that these separate steps are assigned to the right people (or groups of people) in the right sequence, and that overall the process completes in an orderly way.
I don't think there's a clear answer to this at all. These are words, as opposed to theoretical concepts. If you add the word "checklist" into the mix - that just turns out to be a linear version of a process (though you can have conditionals in checklists, making them a workflow).
I am not sure how to help in reframing this question, but it's almost as if no answer can ever be possible. My own thoughts are at https://tallyfy.com/improving-efficiency-workflow-vs-business-process-management/