Neo4j batching using REST interface locking database? - neo4j

When batching several queries in an HTTP request for Neo4j, does the graph database perform all the queries in that request before moving on to the next request?
Could this potentially mean that a large enough batch would lock the whole database for the time it takes to perform all queries in the batch? Or are they somehow run in parallel?
Is batching via the REST interface (and py2neo) using the batch inserter (so it's non-transactional), or normal transactional insertion?
Thanks

It performs all queries in the batch request, but other queries can come in concurrently and are executed on other threads. Other queries are only affected if your batch request consumes all available CPU, memory, or I/O.
From Neo4j 2.x onwards, I would use the transactional API.
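A minimal sketch of what a batch against the 2.x transactional Cypher endpoint looks like, assuming the standard `/db/data/transaction/commit` URL and default local host/port; here we only build the JSON body rather than actually sending it:

```python
import json

def build_tx_payload(statements):
    """Build the JSON body for POST /db/data/transaction/commit.

    statements: list of (cypher, parameters) pairs, all executed in
    one transaction on the server.
    """
    return {
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    }

payload = build_tx_payload([
    # {name} is the Neo4j 2.x Cypher parameter syntax
    ("CREATE (n:Person {name: {name}})", {"name": "Alice"}),
    ("CREATE (n:Person {name: {name}})", {"name": "Bob"}),
])

body = json.dumps(payload)
# A real client would then POST it, e.g.:
#   requests.post("http://localhost:7474/db/data/transaction/commit",
#                 data=body, headers={"Content-Type": "application/json"})
```

All statements in one body are applied atomically in a single transaction, which is what makes this preferable to the older non-transactional batch endpoint.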

Related

Do requests in a batch run in parallel?

As long as I do not use the dependsOn property, do the requests in a batch request run in parallel?
I have heard that they may not, and for performance reasons, it may be better to send individual requests in parallel from my code, so I'm wondering if that's truly the case.
It really depends on which requests are in the batch and which entities they are touching. But in general, yes, requests run in parallel if you don't add a dependsOn property. Implementation details may vary, though: it is possible that large batches are sliced into subsets of requests and one subset is executed at a time (with all requests in the subset being executed in parallel).
No matter what, you'll save HTTP connection handshakes, HTTP request headers and more when using a batch versus lots of individual requests.
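A hypothetical sketch of such a batch body (the shape is loosely modelled on JSON batching formats like Microsoft Graph's `$batch`; the `/items` URLs and ids are made up for illustration). Requests without `dependsOn` are eligible to run in parallel; request "3" is forced to wait for "1":

```python
batch = {
    "requests": [
        {"id": "1", "method": "POST", "url": "/items",
         "headers": {"Content-Type": "application/json"},
         "body": {"name": "a"}},
        {"id": "2", "method": "GET", "url": "/items/42"},
        {"id": "3", "method": "GET", "url": "/items/latest",
         "dependsOn": ["1"]},  # must run after request "1" completes
    ]
}

# Requests with no dependency declared may be scheduled concurrently.
parallel_ids = [r["id"] for r in batch["requests"]
                if not r.get("dependsOn")]
```

So in this batch only "1" and "2" can start immediately, while the server holds back "3" until "1" has finished.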

How to perform parallel processing on a "single" rails API in server side?

There are a lot of methods for handling "multiple" API requests on the server side, where parallel processing can be applied.
But I would like to know how a single API request can be processed in parallel.
For example:
If an API request executes a method say method1.
def method1
  # ... multiple loops and database queries ...
end
If method1 is a long method that may take a long time to process (including multiple loops and database queries), instead of processing it sequentially, is there scope for parallel processing there?
One way would be using Resque to create background jobs. But is there any other way to do it, and if so, how should the code be written to accommodate the requirement?
And is there any server-side method to do it that is not Ruby-specific?
Note that there is a huge difference between event based servers and background jobs.
An event based server often runs on a single thread and uses callbacks for non-blocking IO. The most famous example is Node.js. For ruby there is the eventmachine library and various frameworks and simple HTTP servers.
An evented server can start processing one request and then switch to another request while the first is waiting for a database call for example.
Note that even with an event-based server you can't afford to be slow at processing requests. The user experience will suffer and clients will drop the connection.
That's where background jobs (workers) come in: they let your web process finish fast, so that it can send the response and start dealing with the next request. Slow processes that don't require user feedback or concurrency, like sending out emails or cleanup, are farmed out to workers.
So in conclusion: if your application is slow, then parallel processing is not going to save you. It's not a silver bullet. Instead you should invest in optimising database queries and leveraging caching so that responses are fast.
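The background-job pattern above is not Ruby-specific; a language-agnostic sketch in Python, using an in-process queue and worker thread as a stand-in for what Resque or Sidekiq do with Redis and separate worker processes:

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    """Drain the queue on its own thread; the web process never waits."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut the worker down
            break
        results.append("processed " + job)  # stand-in for the slow work
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# "Web request" handler: enqueue the slow work and respond immediately.
jobs.put("send_email:user@example.com")
jobs.put("cleanup:tmp_files")

jobs.join()                      # wait for the worker (demo only)
jobs.put(None)
t.join()
```

In a real deployment the queue lives outside the web process (e.g. in Redis), so the worker survives restarts and can run on another machine.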
While you could potentially run database queries or other operations in parallel in Rails, the greatly added complexity is probably not worth the performance gains.
What I mean is that concurrency is not really applicable to what you usually do in Rails: you're fetching something from the DB and using it to render JSON or HTML, and you can't really start rendering until you have the results back anyway. While you could in principle fetch data and render partials concurrently, Rails does not support this out of the box, since it would greatly increase complexity while not offering much to the majority of the framework's users.
As always - don't optimise prematurely.

Under what conditions does Jena run in parallel?

I've run queries on Jena TDB and the Jena in-memory triple store, both on a single-core machine and on a machine with 16 CPUs. I observe that on the 16-core machine Jena spawns a lot of threads to handle both query and inference operations.
So I wonder: does Jena behave in parallel by default? Or is there a way to force or avoid parallelism?
No, Jena in and of itself uses minimal parallelism. In particular, the query engine is entirely non-parallel in its design and implementation. Jena may spawn an additional thread per query in order to monitor and kill the query if it exceeds a timeout, but that depends on the app configuration.
You've left out a lot of detail about your set up but I assume you are using Fuseki as a server for your tests?
Fuseki uses the Jetty web-server framework, which inherently has parallelism built in to service multiple requests. So the parallelism you observe is likely just a side effect of the web server's parallelism: if you send multiple requests to the server in parallel, you will see parallel request processing on the server side.
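A small sketch of what produces that server-side parallelism: several clients (here, threads) each issuing their own SPARQL query. `execute_query` is a placeholder; a real client would POST each query to your Fuseki endpoint (e.g. `http://localhost:3030/ds/sparql`, a hypothetical dataset name) instead of returning a dummy string:

```python
from concurrent.futures import ThreadPoolExecutor

queries = [
    "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
    "SELECT ?s WHERE { ?s a ?type } LIMIT 10",
    "ASK { ?s ?p ?o }",
]

def execute_query(q):
    # Placeholder for an HTTP round trip to the SPARQL endpoint.
    return "result of: " + q

# Each query runs on its own thread, so the server sees concurrent
# requests -- the parallelism comes from the clients, not from Jena.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(execute_query, queries))
```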

neo4j REST API slow

I am using Neo4j 2.0.0M4 community edition with Node.js with https://github.com/thingdom/node-neo4j to access the Neo4j DB server over REST API by passing Cypher queries.
I have observed that the data returned by Neo4j, both from the Neo4j webadmin and from the REST API, is pretty slow. For example:
a query returning 900 records takes 1.2 s, and subsequent runs take around 200 ms.
Similarly, if the number of records goes up to 27000, the query in the webadmin browser takes 21 seconds.
I am wondering what's causing the REST API to be so slow, and how to go about improving the performance. Is it:
a) the Cypher execution or the JSON parsing, or
b) the HTTP overhead itself? A similar query returning 27000 records in MySQL takes 11 ms.
Any help is highly appreciated.
Neo4j 2.0 is currently a milestone build that is not yet performance optimized.
Consider enabling streaming and make sure you use parameterized Cypher.
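A sketch of both suggestions for the 2.0 REST API, assuming the legacy `/db/data/cypher` endpoint: the `X-Stream` header asks the server to stream results instead of buffering them, and passing `params` separately keeps the query text constant so its plan can be cached. Again only the request is built here, not sent:

```python
import json

headers = {
    "Content-Type": "application/json",
    "X-Stream": "true",          # stream results as they are produced
}

payload = {
    # {group} is a Cypher parameter, not string-concatenated into the
    # query, so repeated calls reuse the same query text.
    "query": "MATCH (n:Record) WHERE n.group = {group} RETURN n",
    "params": {"group": 42},
}

body = json.dumps(payload)
# A real client would POST this to http://localhost:7474/db/data/cypher
# (or use the transactional endpoint /db/data/transaction/commit).
```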
For large result sets the browser consumes a lot of time for rendering. You might try the same query using cURL to see a difference.

Mnesia Replication and Large Numbers of Dirty Operations

Some applications require a really fast response to meet users' expectations. I am building one such application, and I am using Mnesia. When we bypass the Mnesia transaction manager, we get good performance. However, here is the problem: we need to replicate this database as part of load balancing; after all, Mnesia does the replication for us. We are using ONLY dirty operations in this application, with a few parts using the async_dirty context. I am wondering: would Mnesia replication be affected if we are not using the transaction context at this scale? Frequent dirty operations are occurring on records all the time, so I wonder whether a request made on the side-B replica would find the changes that have just been made by the side-A replica via a dirty operation.
According to the Mnesia User's Guide:
async_dirty activities "will wait for the operation to be performed on one node but not the others".
For sync_dirty activities: "The caller will wait for the updates to be performed on all active replicas".
In other words, with async_dirty a read on replica B immediately after a write on replica A may not yet see the change, while sync_dirty does not return until all active replicas have applied it.
