In my scenario I have a few dozens of Cypher queries executed one after another. If any of them returns some data (reveals some knowledge), at the end of the loop the graph is changed accordingly and all the queries are executed again.
Currently I store all the queries as Strings. There are never more than 20 loops, but still having to parse all the queries every time seems a an overhead. Is there a way to optimize it, like by storing the queries in some precompiled state? Or there's nothing to worry about?
Any other hints that would make the above scenario work faster?
As others have pointed out in the comments, you should use query parameters where possible. This has two benefits:
You can reuse the queries in your code without having to parse / construct the strings given whatever values you want to include.
Performance. The cypher compiler caches the execution plan for Cypher queries (ie queries it has seen before). If you use query parameters you will not incur the overhead of generating the query plan when executing the Cypher query again.
http://neo4j.com/docs/stable/cypher-parameters.html
http://neo4j.com/docs/stable/tutorials-cypher-parameters-java.html
Related
I have a rather long and complex paginated query. I'm trying to optimize it. In the worst case - first, I have to execute the data query in a one call to Neo4j, and then I have to execute pretty much the same query for the count. Of course, I do everything in one transaction. Anyway, I don't like the overall execution time, so I extracted the most common part for both - data and count queries and execute it on the first call. This common query returns the IDs of nodes, which I then pass as parameters to the rest of data and count queries. Now, everything works much faster. One thing I don't like is that a common query can sometimes return quite a large set of IDs.. it can be 20k..50k Long IDs.
So my question is - because I'm doing this in a one transaction - is there a way to preserve such Set of IDs somewhere in Neo4j between common query and data/count query calls and just refer them somehow in the subsequent data/count queries without moving between app JVM and Neo4j?
Also, am I crazy for doing this, or is this a good approach to optimize a complex paginated query?
Only with a custom procedure.
Otherwise you'd need to return them.
But usually it's uncommon to both provide counts (even google doesn't provide "real" counts) and data.
One way is to just stream the results with the reactive driver as long as the user scrolls.
Otherwise I would just query for pageSize+1 and return "more than pageSize results".
If you just stream the id's back (and don't collect them as an aggregation) you can start using the id's received already to issue your new queries (even in parallel).
I've always worked with relational DBs and recently decided to migrate a performance-critial service from SQL Server to Tarantool with a hope to take advantage of the fast in-memory search and processing. I've got a couple of questions while planning for the migration.
I've got a table with about one million records containing pricing information which means I'm dealing mostly with numbers and uuids. First, I need to run a select containing multiple conditions to get a subset of the data, like
SELECT * FROM rates WHERE SupplierId = #SupplierId AND ProductId = #ProductId AND (LocalDistributionZoneId = #LocalDistributionZoneId OR LocalDistributionZoneId IS NULL)
Q1: What is the strategy of running such a query in Lua? Do I create an index for each field in the predicate or I can go along with one secondary composite index?
Q2: Will it be more covenient to run such a query in SQL (box.sql.execute) rather than in pure Lua? Will it be considerably slower than running the same query in pure Lua?
Q3: If I use SQL, is it possible to review the execusion plan to make sure that the query I run really uses the indexes I've defined in the space?
Ok, after I've get the results from the first query I need to analyse the data and then based on the results of analysis, run one more query on the dataset returned by the first query.
Q4: Can Tarantool help me in dealing with the intermediate dataset? More specifically, may I somehow run more queries against the intermediate subset of tuples leveraging the indexes created in the space? Or, I would need to implement alternative strategies like re-add the intrim results to a temporary space with pre-defined indexes and then do another select, or implement further search myself?
Thank you!
Don't. Use SQL, it's faster: it doesn't create garbage collected objects for intermediate execution results.
Yes, please use our SQL features for that.
Use EXPLAIN statement.
I don't know what you exactly mean by "help". You could try to whatever strategy works best: create a more complex query, save the original query in a view to use in the resulting query, create a temporary table and work with it. To give more details let's look if the execution plan Tarantool chooses is good enough or you have to manually optimize it.
I've got a complicated query that I need to run, and it can potentially yield a large result set. I need to iterate linearly through this result set in order to crunch some numbers.
I'm executing the query like so:
ActiveRecord::Base.connection.select_all(query)
find_in_batches Won't work for my use case, as it's critical that I get the records in a custom order. Also, my query returns some fields that aren't part of any models, so I need to get the records as hashes.
The problem is, select_all is not lazy (from what I can tell). It loads all of the records into memory. Does Rails have a way to lazily get the results for a custom SQL query? .lazy doesn't seem applicable here, as I need custom ordering of the results.
This is possible in other languages (C#, Haskell, JavaScript), so it seems like it would be possible in Ruby.
Not sure but maybe you're asking for eager_load or preload.
http://blog.arkency.com/2013/12/rails4-preloading/
Hope this can help you.
You can try find_each or find_in_batches ActiveRecord methods.
Both query database in configurable-sized batches.
The difference it that find_each yields objects one-by-one to block (they are lazy initialized).
find_in_batches yields whole batch group.
If you can't use above methods due to custom sorting, what you can do is query the database using limit and offset. This way you will deal with data in portions. Memory consumption will decrease, but number of queries will increase.
Other solution may be to let database engine perform arithmetic operations, that you need and return calculated result.
I have a very long Cypher request in my app (running on Node.Js and Neo4j 2.0.1), which creates at once about 16 nodes and 307 relationships between them. It is about 50K long.
The high number of relationships is determined by the data model, which I probably want to change later, but nevertheless, if I decide to keep everything as it is, two questions:
1) What would be the maximum size of each single Cypher request I send to Neo4J?
2) What would be the best strategy to deal with a request that is too long? Split it into the smaller ones and then batch them in a transaction? I wouldn't like to do that because in this case I lose the consistency that I had resulting from a combination of MERGE and CREATE commands (the request automatically recognized some nodes that did not exist yet, create them, and then I could make relations between them using their indices that I already got through the MERGE).
Thank you!
I usually recommend to
Use smaller statements, so that the query plan cache can kick in and execute your query immediately without compiling, for this you also need
parameters, e.g. {context} or {user}
I think a statement size of up to 10-15 elements is easy to handle.
You can still execute all of them in a single tx with the transactional cypher endpoint, which allows batching of statements and their parameters.
I have 7000 objects in my Db4o database.
When i retrieve all of the objects it's almost instant..
When i add a where constrain ie Name = "Chris" it takes 6-8 seconds.
What's going on?
Also i've seen a couple of comments about using Lucene for search type of queries does anyone have any good links for this?
There are two things to check.
Have you added the 'Db4objects.Db4o.NativeQueries'-assembly? Without this assembly, a native query cannot be optimized.
Have set an index on the field which represents the Name? A index should make query a lot faster
Index:
cfg.Common.ObjectClass(typeof(YourObject)).ObjectField("fieldName").Indexed(true);
This question is kinda old, but perhaps this is of any use:
When using native queries, try to set a breakpoint on the lambda expression. If the breakpoint is actually invoked, you're in trouble because the optimization failed. To invoke the lambda, each of the objects will have to be instantiated which is very costly.
If optimization worked, the lambda expression tree will be analyzed and the actual code won't be needed, thus breakpoints won't be triggered.
Also note that settings indexes on fields must be performed before opening the connection.
Last, I have a test case of simple objects. When I started without query optimization and indexing (and worse, using a server that was forced to use the GenericReflector because I failed to provide the model .dlls), it too 600s for a three-criteria query on about 100,000 objects. Now it takes 6s for the same query on 2.5M objects so there is really a HUGE gain.