Orleans parallel reads

I am new to orleans so forgive me if this is a dumb question.
I recently watched a tutorial where the presenter created an aggregate actor that was used to query the state of a collection of other individual actors.
Since actors cannot execute actions in parallel, is it really an anti-pattern to do reads this way? Or is Orleans so fast, because it can query state in memory, that it doesn't really matter that the reads aren't parallel?

Related

master slave exposes technical debt

Using Rails and PostgreSQL.
I wrote my app without having a master/slave configuration in mind.
Now I've gotten master/slave set up in the app, and I'm running into some technical debt. The same process in my app writes to the db and then immediately reads from it. The read is now taking place on the read db, but the data isn't there yet. Before, this wasn't efficient, but it didn't cause any problems because both dbs were the same. Now, this is blowing up in my face.
The problem for me is that it's difficult to find all the places in the code where this problem exists. Can someone please suggest a technique to get my tests to run in such a way that the reads and the writes use different dbs that aren't synchronized, so that I can figure out where my issues are?
Other solutions will also be welcomed!
I strongly recommend you rethink your master/slave configuration or whether master/slave is even right for your application.
It's not "tech debt" to build a system that assumes data written to persistent store can be read back immediately. It's normal and correct. While you might reasonably be able to avoid the pattern
write A, ..., look up A.key
with various simple cache schemes, trying to code around e.g.
write A, ..., complex query that *might* fetch A
requires you to retain a copy of A and determine whether it would satisfy the WHERE clause of the query in separate code, simply because you can't rely on the query results. Unless your system is very small and simple, trying to do this system-wide will produce a super-complex, fragile, expensive, and ugly code base. I strongly recommend you don't try it.
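The simple cache scheme for the "write A, ..., look up A.key" case can be sketched briefly. The question is about Rails, but the idea is language-agnostic; this is a rough Java illustration, and the UserRepository, Database, and User names are invented for the example: the process keeps its own copy of rows it has just written, keyed by primary key, and serves an immediate lookup-by-key from that copy instead of trusting the lagging replica.

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical repository wrapper: reads that immediately follow our own
// writes are served from a local copy instead of the (possibly lagging) replica.
class UserRepository {
    private final Database primary;   // connection to the master (writes)
    private final Database replica;   // connection to the slave (reads)
    // A real version would evict entries once replication has caught up.
    private final Map<Long, User> recentlyWritten = new ConcurrentHashMap<>();

    UserRepository(Database primary, Database replica) {
        this.primary = primary;
        this.replica = replica;
    }

    void save(User user) {
        primary.write(user);                      // write goes to the master
        recentlyWritten.put(user.getId(), user);  // remember what we just wrote
    }

    Optional<User> findById(long id) {
        User local = recentlyWritten.get(id);     // read-your-own-writes by key
        if (local != null) {
            return Optional.of(local);
        }
        return replica.readUser(id);              // otherwise the replica is fine
    }

    // Stand-in types so the sketch is self-contained.
    interface Database {
        void write(User user);
        Optional<User> readUser(long id);
    }

    static class User {
        private final long id;
        User(long id) { this.id = id; }
        long getId() { return id; }
    }
}

As the rest of this answer argues, this only works for exact-key lookups; trying to apply the same trick to arbitrary queries means re-implementing the WHERE clause in application code.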
The usual purpose of a master/slave persistent store organization is to off-line read traffic that's not time-dependent on writes. For example, if your system mines data to produce summaries accessible to users, you'd offline the metric computation and have it mine the slave. This prevents mining queries from drawing resources away from user request handling. The small delay between write on master and copy to slave is no problem.
If your app is struggling because there's too much load on persistent store, you probably want partitioned data (sometimes called sharding), not master/slave. Partitioning can expose you to a different kind of problem: no cross-partition transactions. But this is usually easier to work through than what you're attempting.
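If partitioning is the direction you take, the routing itself is usually a small amount of code; here is a minimal, hypothetical Java sketch of key-based routing (ShardRouter is an invented name, and javax.sql.DataSource just stands in for a per-shard connection pool). The real cost is the one noted above: a transaction can no longer span two shards.

import java.util.List;
import javax.sql.DataSource;

// Hypothetical shard router: pick a partition based on the row's key.
class ShardRouter {
    private final List<DataSource> shards;   // one connection pool per partition

    ShardRouter(List<DataSource> shards) {
        this.shards = shards;
    }

    DataSource shardFor(long userId) {
        // Stable hash of the key decides which partition owns the row.
        int index = Math.floorMod(Long.hashCode(userId), shards.size());
        return shards.get(index);
    }
}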
After studying this area, I agree with Gene that master/slave should only be used for reads of data that was written a significant time before the read.
My ORIGINAL concept was that it's better to use a functional programming style, whereby the process retains all the information in its parameters and doesn't make recourse to the database. The downside of this approach is that the human mind has a hard time with functional programming, and in a massive program it makes sense not to insist on this added complication.
If you want to write a functional method or process, that's great and very efficient, but there shouldn't be anything in the code that insists on it.

Neo4j uses only one core when running a Cypher query

When I run a Cypher query in the UI, only one core on the server spikes, and the query gets stuck or responds very slowly.
I'm using Neo4j 3.0.7 Community.
Does anyone have an idea of what I can tune to use all cores?
A single Cypher query is limited to a single thread. See this tweet from late 2015 by Stefan Armbruster:
A cypher statement is (in most cases) one transaction and therefore only on one thread.
If your query is slow, you can use various tricks for optimizing it: this blog post is a good starting point.

Neo4j Traversal API vs. Cypher

When should I choose Neo4j’s traversal framework over Cypher?
For example, for a friend-of-a-friend query I would write a Cypher query as follows:
MATCH (p:Person {pid:'56'})-[:FRIEND*2..2]->(fof)
WHERE NOT (p)-[:FRIEND]->(fof)
RETURN fof.pid
And the corresponding Traversal implementation would require two traversals, for friends_at_depth_1 and friends_at_depth_2 (or a Core API call to get the relationships), and then finding the difference of these two sets using plain Java constructs, outside of the traversal description. Correct me if I'm wrong here.
Any thoughts?
The key thing to remember about Cypher vs. the traversal API is that the traversal API is an imperative way of accessing a graph, and Cypher is a declarative way of accessing a graph. You can read more about that difference here but the short version is that in imperative access, you're telling the database exactly how to go get the graph. (E.g. I want to do a depth first search, prune these branches, stop when I hit certain nodes, etc). In declarative graph query, you're instead specifying what you want, and you're delegating all aspects of how to get it to the Cypher implementation.
In your query, I'd slightly revise it:
MATCH (p:Person {pid:'56'})-[:FRIEND*2..2]->(fof)
WHERE NOT (p)-[:FRIEND]->(fof) AND
p <> fof
RETURN fof.pid
(I added making sure that p<>fof because friend links might go back to the original person)
To do this in a traverser, you wouldn't need two traversers, just one. You'd traverse only FRIEND relationships, stop at depth 2, and accumulate a set of results.
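As a rough sketch of that single traverser (using the Neo4j 3.x embedded Java API; the class and method names here are invented, and it assumes the same Person label, pid property, and FRIEND relationship type as the Cypher above):

import java.util.HashSet;
import java.util.Set;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.traversal.Evaluators;
import org.neo4j.graphdb.traversal.TraversalDescription;
import org.neo4j.graphdb.traversal.Uniqueness;

class FriendOfFriendTraversal {

    // Friends-of-friends at exactly depth 2, excluding direct friends and the
    // starting person, with a single traverser.
    static Set<Object> friendOfFriendPids(GraphDatabaseService db, String pid) {
        Set<Object> fofPids = new HashSet<>();
        try (Transaction tx = db.beginTx()) {
            Node p = db.findNode(Label.label("Person"), "pid", pid);

            // Direct friends, collected so the Cypher WHERE NOT (p)-[:FRIEND]->(fof)
            // filter can be applied explicitly to the depth-2 results.
            Set<Node> directFriends = new HashSet<>();
            for (Relationship r : p.getRelationships(
                    Direction.OUTGOING, RelationshipType.withName("FRIEND"))) {
                directFriends.add(r.getEndNode());
            }

            TraversalDescription fofTraversal = db.traversalDescription()
                    .breadthFirst()
                    .relationships(RelationshipType.withName("FRIEND"), Direction.OUTGOING)
                    .evaluator(Evaluators.atDepth(2))       // only emit nodes at depth 2
                    .uniqueness(Uniqueness.NODE_GLOBAL);    // visit each node at most once

            for (Node fof : fofTraversal.traverse(p).nodes()) {
                if (!fof.equals(p) && !directFriends.contains(fof)) {
                    fofPids.add(fof.getProperty("pid"));
                }
            }
            tx.success();
        }
        return fofPids;
    }
}

Even for this small query, the difference in volume of code compared to the four-line Cypher statement is the point made in the rest of this answer.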
Now, I'm going to attempt to argue that you should almost always use Cypher, and never use the traversal API unless you have very specific circumstances. Here are my reasons:
Declarative query is very powerful, in that it frees you from thinking about the how. All you need to know is what you want. This means you spend more time focusing on what your code is supposed to do, and less time in implementation detail.
The cypher query executor is getting better all the time (version 2.2 will have a cost-based planner), and of course they put a lot of effort into making sure cypher exploits all available indexes. It's possible that for many queries, cypher would do a better job of finding your data than your traversal, unless you were very careful in coding the traversal.
Cypher is just way less code than writing your own traversal, which will frequently require you to implement certain classes to do specialized stop conditions, etc.
At present, cypher can run in embedded databases, or on the server. If you want to run a traversal, you can't send that remotely to a server to be executed; maybe at best you could write a server extension that did the traversal. So I think cypher is more flexible at present.
OK, so when should you use traversal? Two key cases that I know of (others may suggest more):
Sometimes you need to execute a complex custom java code operation on everything you traverse. In this case, you're using the traverser as a "visitor function" of sorts, and sometimes traversals are more convenient to use than cypher, depending on the nature of the java you're running on the nodes.
Sometimes your performance requirements are so intense, you need to hand-traverse the graph, because there's some aspect of graph structure that you can exploit in the traverser to make it go faster that Cypher can't take advantage of. This does happen, but going to this first usually isn't a good idea.
An excerpt from the book
Core API, Traversal Framework or Cypher?
The Core API allows developers to fine-tune their queries so that they exhibit high affinity with the underlying graph. A well-written Core API query is often faster than any other approach. The downside is that such queries can be verbose, requiring considerable developer effort. Moreover, their high affinity with the underlying graph makes them tightly coupled to its structure. When the graph structure changes, they can often break. Cypher can be more tolerant of structural changes—things such as variable-length paths help mitigate variation and change.
The Traversal Framework is both more loosely coupled than the Core API (because it allows the developer to declare informational goals), and less verbose, and as a result a query written using the Traversal Framework typically requires less developer effort than the equivalent written using the Core API. Because it is a general-purpose framework, however, the Traversal Framework tends to perform marginally less well than a well-written Core API query.
If we find ourselves in the unusual situation of coding with the Core API or Traversal Framework (and thus eschewing Cypher and its affordances), it's because we are working on an edge case where we need to finely craft an algorithm that cannot be expressed effectively using Cypher's pattern matching. Choosing between the Core API and the Traversal Framework is a matter of deciding whether the higher abstraction/lower coupling of the Traversal Framework is sufficient, or whether the close-to-the-metal/higher coupling of the Core API is in fact necessary for implementing an algorithm correctly and in accordance with our performance requirements.
Ref: Graph Databases, New Opportunities for Connected Data, p161
What is Cypher?
The developer documentation defines it as follows: Cypher is a declarative, SQL-inspired language for describing patterns in graphs visually using an ASCII-art syntax.
You can find more about it here.
What is the Core API, practically?
I found this page with the following sentence:
Besides an object-oriented API to the graph database, working with Node, Relationship, and Path objects, it also offers highly customizable, high-speed traversal- and graph-algorithm implementations.
So practically speaking, the Core API deals with basic objects such as Node and Relationship, which belong to the org.neo4j.graphdb package.
You can find more at its developer guide.
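To give a practical feel for working with Node and Relationship objects, here is a rough Core API version of the friend-of-a-friend lookup from the question, written as plain nested loops (again the Neo4j 3.x embedded API; the class and method names are invented for illustration):

import java.util.HashSet;
import java.util.Set;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;

class FriendOfFriendCoreApi {

    // The same friend-of-a-friend lookup, expressed directly against Node and
    // Relationship objects: two nested loops over outgoing FRIEND relationships.
    static Set<Object> friendOfFriendPids(GraphDatabaseService db, String pid) {
        RelationshipType friend = RelationshipType.withName("FRIEND");
        Set<Object> fofPids = new HashSet<>();
        try (Transaction tx = db.beginTx()) {
            Node p = db.findNode(Label.label("Person"), "pid", pid);

            Set<Node> directFriends = new HashSet<>();
            for (Relationship r : p.getRelationships(Direction.OUTGOING, friend)) {
                directFriends.add(r.getEndNode());
            }

            for (Node f : directFriends) {
                for (Relationship r : f.getRelationships(Direction.OUTGOING, friend)) {
                    Node fof = r.getEndNode();
                    // Keep only nodes that are not the start person and not direct friends.
                    if (!fof.equals(p) && !directFriends.contains(fof)) {
                        fofPids.add(fof.getProperty("pid"));
                    }
                }
            }
            tx.success();
        }
        return fofPids;
    }
}

As the book excerpt above notes, this kind of code can be very fast when tuned well, but it is tightly coupled to the graph structure: change the model and the loops break.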
What is the Traversal API, practically?
The Traversal API adds more interfaces on top of the Core API that help us perform traversals conveniently, instead of writing the whole traversal logic from scratch. These interfaces are contained in the org.neo4j.graphdb.traversal package.
You can find more at its developer guide.
The relation between all three
According to this answer:
The Traversal API is built on the Core API, and Cypher is built on the Traversal API; so anything you can do in Cypher can be done with the other two.
Same example done with all three
This tutorial from 2012 shows all three in action performing the same task, with the Core API being the fastest. It includes a quote from Andres Taylor:
Cypher is just over a year old. Since we are very constrained on developers, we have had to be very picky about what we work on; the focus in this first phase has been to explore the language, to learn about how our users use the query language, and to expand the feature set to a reasonable level.
I believe that Cypher is our future API. I know you can very easily outperform Cypher by handwriting queries. Like every language ever created, in the beginning you can always do better than the compiler by writing by hand, but eventually the compiler catches up.
Article's conclusion:
So far I have only been using the Java Core API when working with Neo4j, and I will continue to do so.
If you are in a high-speed scenario (and I believe every web application is one), you should really think about switching to the Neo4j Java Core API for writing your queries. It might not be as nice-looking as Cypher or the Traversal Framework, but the gain in speed pays off.
Also, I personally like the amount of control you have when traversing over the core yourself.

Is a Component Entity System implemented in Erlang even possible?

I've learned a lot about Erlang over the last couple of days and am familiar with component entity systems.
With Erlang's process-centric approach, I would suggest that each entity be an Erlang process instance. For the CES (Component Entity System) approach, I would have a process such as a "MovementSystem" for entities that own a MovementComponent (for example). I would then use tail recursion to "iterate" over all registered entities and send them messages so they update their own process state, rather than having the MovementSystem do that as batch processing... (which I then wouldn't call an entity system anymore, because in my understanding a CES holds all the information about all the entities and components it processes, which would mean "shared memory", which is by concept not part of Erlang).
Are those two approaches/paradigms, Erlang and "Component Entity System", mutually exclusive, or am I missing something?
(I wouldn't even call this prototype on GitHub (https://gist.github.com/sntran/2986790) a real Component Entity System. The approach there looks less like an entity system and more like a gen_event-based MQ approach, for which I would probably use RabbitMQ instead... but that's not relevant here...)
Right now I don't see how these two concepts can even be combined...
Okay, I did further research...
-> https://stackoverflow.com/a/1637134/3850640
This answer to another Erlang question explained it pretty well to me:
One thing Erlang isn't really good at: processing big blocks of data.
Whereas a CES by nature handles a lot of data at once...
So, my answer would be: "Yes, it is possible, but not a very good choice"...
I do not know about CES, but I do think that you are missing some things.
each entity would be an Erlang process instance
...
let them update their own process state rather than doing that as batch-processing by the MovementSystem itself
...
which would mean "shared memory", which is by concept not part of Erlang
It sounds as if you want to hold all your state in one place. The simplest way to do this is to use one process and have that process keep its own state. However, there are other ways: you could have a "global state" process that everyone can talk to. You can think of ETS as an example of this. Putting the shared state in a separate process makes synchronization much easier.
If you want to do parallel processing, there are many ways to arrange your code: you could have MovementSystem gen_server:cast to all MovementComponents and have them handle things. This probably works best if the different components of an entity interact and you need to know if something is trying to move and talk at the same time. If components are more independent, you might want to just spawn one-off jobs to handle the movement. Finally, it might be the case that running all movement code in serial is cheap and you just want one process per system.

One actor per simulated object, or a manager actor?

I'm developing a simulation that will feature many entities constantly updating, perhaps 30 times a second. Let's imagine we have 1000 entities, each of which has a velocity, and consequently a position that must be updated every tick.
So, how would you implement this using the actor model? I'm not necessarily using Erlang for this project, but for the sake of argument, let's just say I am. Would you have an actor for each of these entities? Or would you have a "manager" actor that maintains and updates a list of these entities?
Learn You Some Erlang says:
It is true that Erlang processes are very light: you can have hundreds of thousands of them existing at the same time, but this doesn't mean you have to use it that way just because you can. For example, creating a shooter game where everything including bullets is its own actor is madness. The only thing you'll shoot with a game like this is your own foot. There is still a small cost in sending a message from actor to actor, and if you divide tasks too much, you will make things slower!
So that seems to suggest that managers would be better. Or is there a third option that I'm not seeing?
You said it! There is not one single good solution.
Now, to be more helpful, and with the little background I have, I think you should look at these aspects of your project:
You say simulation. If you need to refresh a collection of entities every 30 ms, first work to simplify the operations and the data model, and only then think about how you can traverse the collection of data efficiently.
On the other hand, if you have a huge and/or evolving collection of objects with a trivial algorithm/data model, then look at smarter data structures than lists, and watch out for data copying...
If you use a multi-core machine (or a cluster), then think about grouping your entities into several super-entities in order to take advantage of parallelism, managing them in separate processes (see the sketch below).
Next, think about whether these groups can help you reduce the number of evaluations (an adaptive time slice? evaluation on demand? ...).
Last, I think that, generally speaking, Erlang is compact and easy to refactor, so take advantage of this and define some functional steps, and for each of them:
make them work, make them right, and make them fast (Kent Beck?).
For the last step you can get some help from profiling tools such as fprof.
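To make the "grouped super-entities" point concrete, here is a minimal sketch (in Java rather than Erlang, with all names invented for illustration): each group of entities is updated as a single batch per tick, so per-message and per-task overhead is paid once per group rather than once per entity, which is exactly the cost the Learn You Some Erlang quote warns about.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One worker per group of entities ("super entities"); a tick updates every
// group in parallel rather than messaging each entity individually.
class Simulation {

    static final class Entity {
        double x, y;    // position
        double vx, vy;  // velocity

        void step(double dt) {
            x += vx * dt;
            y += vy * dt;
        }
    }

    private final List<List<Entity>> groups = new ArrayList<>();
    private final ExecutorService pool;

    Simulation(List<Entity> entities, int groupCount) {
        for (int g = 0; g < groupCount; g++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < entities.size(); i++) {
            groups.get(i % groupCount).add(entities.get(i));  // round-robin split
        }
        pool = Executors.newFixedThreadPool(groupCount);
    }

    // One simulation tick: each group is updated as a single batch.
    void tick(double dt) {
        List<CompletableFuture<Void>> updates = new ArrayList<>();
        for (List<Entity> group : groups) {
            updates.add(CompletableFuture.runAsync(() -> {
                for (Entity e : group) {
                    e.step(dt);
                }
            }, pool));
        }
        updates.forEach(CompletableFuture::join);  // wait for the whole tick
    }

    void shutdown() {
        pool.shutdown();
    }
}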
Courage :o)
I think Learn You Some Erlang is making a bit of a premature-optimization blunder here. You should use whichever abstraction makes the most sense to you, measure any problems, and refactor if necessary. Personally, I believe modeling each particle as its own actor would be the easiest to deal with, and it is also the most idiomatic approach for the actor model. Practically, however, you should do whatever floats your boat.

Resources