Why is a query 10 times slower through code as opposed to the Neo4j Client? - neo4j

Recently I have been given the task to research and deliver a proof of concept of the effects of a NoSQL database. I have chosen Neo4j as the NoSQL database for this .NET application.
The thing is... When I execute a query using the Neo4j Client, it runs in 10-20 ms which is fantastic. Whenever I execute that query through code it takes 150-200 ms, which is huge difference.
The query is the following (a Bank is the dutch equivalent of a database)
The goal I want to achieve is get every Bank with their Children (To get the whole hierarchy):
MATCH (bank:Bank)-[:PARENT_OF]->(bank2:Bank)
Return (bank.id),collect(bank2.id)
This is the code I have used to execute the query.
var query = client.Cypher.Match("(bank:Bank)-[:PARENT_OF]->(child:Bank)")
.Return((bank, child) => new
{
Bank = bank.As<Bank>(),
Children = child.CollectAs<Bank>()
});
var list = query.Results
My question is: Why is the query 10 times slower through code as opposed to the Neo4j Client?

I presume you're comparing to the web version?
There are lots of overheads with using the client - things like the OGM (Object Graph Mapping) sap up some performance, and then you have to add the overhead of things like the actual HTTP calls.
The client on the web (localhost:7474) doesn't have to contend with this.
You might also notice that the web client displays different things - things which the API is not returning. I imagine you get a graph showing the Banks all joined together with nice relationships - if you run the query and look at the REST response - you'll notice there is no relationship data there, so it must be calling something else.
I know this isn't an ideal answer :/

Related

Neo4j Cypher optimization of complex paginated query

I have a rather long and complex paginated query. I'm trying to optimize it. In the worst case - first, I have to execute the data query in a one call to Neo4j, and then I have to execute pretty much the same query for the count. Of course, I do everything in one transaction. Anyway, I don't like the overall execution time, so I extracted the most common part for both - data and count queries and execute it on the first call. This common query returns the IDs of nodes, which I then pass as parameters to the rest of data and count queries. Now, everything works much faster. One thing I don't like is that a common query can sometimes return quite a large set of IDs.. it can be 20k..50k Long IDs.
So my question is - because I'm doing this in a one transaction - is there a way to preserve such Set of IDs somewhere in Neo4j between common query and data/count query calls and just refer them somehow in the subsequent data/count queries without moving between app JVM and Neo4j?
Also, am I crazy for doing this, or is this a good approach to optimize a complex paginated query?
Only with a custom procedure.
Otherwise you'd need to return them.
But usually it's uncommon to both provide counts (even google doesn't provide "real" counts) and data.
One way is to just stream the results with the reactive driver as long as the user scrolls.
Otherwise I would just query for pageSize+1 and return "more than pageSize results".
If you just stream the id's back (and don't collect them as an aggregation) you can start using the id's received already to issue your new queries (even in parallel).

Is it possible to execute read only cypher queries from java?

I'd like to know just what the title says.
The reason I'd want this is to permit constrained read-only cypher queries to be executed; the data results would later be interpreted and serialized by a separate API layer.
I've seen code that makes basic assumptions in an attempt to mimic this behavior, e.g. the code might filter out any Cypher query that contains certain special words associated with write query structures (merge, create, delete, set, and so on).
This approach tends to be limited and naive though; if it very simply looks for those tokens, it would prevent a query like MATCH n WHERE n.label =~ '.*create.*' RETURN n even though it's a read-only query.
I'd really prefer not to do a full parse on a candidate query and then descend through the AST trying to figure out whether something is read-only or not (although I would gladly accept an answer that shows how to do this easily in java)
EDIT - I'm aware it's possible to start the entire database in read-only mode via the configuration property read_only=true, but this would be undesirable; no other aspect of the java API would be able to change the database.
EDIT 2 - I found another possible strategy, but I'm not sure of its advisability. Comments welcome on this, and potential downsides:
try (Transaction ignore = graphDb.beginTx()) {
ExecutionResult result = executionEngine.execute(query);
// Do nifty stuff with result, then...
// Force transaction to fail.
ignore.failure();
}
The idea here is that if queries happen within transactions and the transaction is always force-failed, then nothing can ever be written to the DB no matter what the result.
Read-only Cypher is (not yet) directly supported. However I can think of two workarounds for that:
1) assuming you're running a Neo4j enterprise cluster: you can set read_only=true on one instance. That instance is then used for the read only queries where the other cluster instances are used for r/w. A load balancer in front of the cluster can be set up to send the requests to the right instance.
2) Use a TransactionEventHandler that vetos a transaction if its TransactionData contains write operations. Just for fun I've invested some minutes to implement that, see https://github.com/sarmbruster/read-only-cypher - feedback is appreciated.

oData - applying filters to SQL queries

I am relatively new to oData service and I am trying to explore if oData is feasible for my project.
From all the examples / demos that I have come across,every demo always loads up all data into the repository and then oData filters are applied over the data.
Is there a way to not load up all data (apply the filters to SQL from oData) from SQL which will obviously be highly inefficient for N number of requests coming in /second ?
So for example if I had a movies service :
localhost:4502/OdataService/movies(55)
The above example is actually just filtering for movie id 55 from an "entire" set of movies.Is there a way to make this filter happen at SQL level instead of bloating the memory first with all movies and then allowing oData to filter it?
Can anyone guide me in the right direction?
I found out after doing a small POC that Entity framework takes care of building dynamic query based on the request.

Linq-to-SQL query - Need to filter by IDs returned by Full-Text Search sql functions - Hitting limit for Contains

My objective:
I have built a working controller action in MVC which takes user input for various filter criteria and, using PredicateBuilder (part of LinqKit - sorry, I'm not allowed enough links yet) builds the appropriate LINQ query to return rows from a "master" table in SQL with a couple hundred thousand records. My implementation of the predicates is totally inelegant, as I'm new to a lot of this, and under a very tight deadline, but it did make life easier. The page operates perfectly as-is.
To this, I need to add a Full-Text search filter. Understanding the way LINQ translates Contains to LIKE(%%), using the advice in Simon Blog: LINQ-to-SQL - Enabling Full-Text Searching, I've already prepared Table Functions in SQL to run Freetext queries on the relevant columns. I have 4 functions, to match the query against 4 separate tables.
My approach:
At the moment, I'm building the predicates (I'll spare you) for the initial IQueryable data object, running a LINQ command to return them, like so:
var MyData = DB.Master_Items.Where(outer);
Then, I'm attempting to further filter MyData on the Keys returned by my full-text search functions:
var FTS_Matches_Subtable_1 = (from tbl in DB.Subtable_1
join fts in DB.udf_Subtable_1_FTSearch(KeywordTerms)
on tbl.ID equals fts.ID
select tbl.ForeignKey);
... I have 4 of those sets of matches which I've tried to use to filter my original dataset in several ways with no success. For instance:
MyNewData = MyData.Where(d => FTS_Matches_Subtable_1.Contains(d.Key) ||
FTS_Matches_Subtable_2.Contains(d.Key) ||
FTS_Matches_Subtable_3.Contains(d.Key) ||
FTS_Matches_Subtable_4.Contains(d.Key));
I just get the error: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
I get that it's because I'm trying to pass a relatively large set of data into the Contains function and LINQ is converting each record into a separate parameter, exceeding the limit.
I just don't know how to get around it.
I found another post linq expression to return property value which seemed SO promising. I tried ifwdev's solution (2nd highest ranked answer): using LinqKit to build an extension that will break up the queries into manageable chunks. But I can't figure out how to implement it. Out of my depth right now maybe?
Is there another approach that I'm missing? Some simpler way to accomplish this that I've overlooked?
Sorry for the long post. But thank you for any help you can provide!
This is a perfect time to go back to raw ado.net.
Twisting things around just to use linq to sql is probably just as time consuming if you wrote the query and hydration by hand.

Is a full list returned first and then filtered when using linq to sql to filter data from a database or just the filtered list?

This is probably a very simple question that I am working through in an MVC project. Here's an example of what I am talking about.
I have an rdml file linked to a database with a table called Users that has 500,000 rows. But I only want to find the Users who were entered on 5/7/2010. So let's say I do this in my UserRepository:
from u in db.GetUsers() where u.CreatedDate = "5/7/2010" select u
(doing this from memory so don't kill me if my syntax is a little off, it's the concept I am looking for)
Does this statement first return all 500,000 rows and then filter it or does it only bring back the filtered list?
It filters in the database since your building your expression atop of an ITable returning a IQueryable<T> data source.
Linq to SQL translates your query into SQL before sending it to the database, so only the filtered list is returned.
When the query is executed it will create SQL to return the filtered set only.
One thing to be aware of is that if you do nothing with the results of that query nothing will be queried at all.
The query will be deferred until you enumerate the result set.
These folks are right and one recommendation I would have is to monitor the queries that LinqToSql is creating. LinqToSql is a great tool but it's not perfect. I've noticed a number of little inefficiencies by monitoring the queries that it creates and tweaking it a bit where needed.
The DataContext has a "Log" property that you can work with to view the queries created. I created a simple HttpModule that outputs the DataContext's Log (formatted for sweetness) to my output window. That way I can see the SQL it used and adjust if need be. It's been worth its weight in gold.
Side note - I don't mean to be negative about the SQL that LinqToSql creates as it's very good and efficient almost every time. Another good side effect of monitoring the queries is you can show your friends that are die-hard ADO.NET - Stored Proc people how efficient LinqToSql really is.

Resources