A single Azure SQL query is consuming almost all query_stats.total_worker_time and query_stats.execution_count (Entity Framework 4)

I've been running a production website on Azure SQL for four years.
With the help of the 'Top Slow Request' query from alexsorokoletov on GitHub, I found one extremely slow query according to the Azure query stats.
The query at the top of that list is the one using most of the CPU.
Looking at the LINQ query and the execution plan / live query statistics (screenshots omitted here), I can't find the bottleneck yet.
The join from Results to Project is not direct; there is a ProjectSession table in between that is not visible in the query, but is presumably there under the hood of Entity Framework.
Might I be affected by parameter sniffing? Can I reset the cached plan? Maybe the query plan was optimized back in 2014, and now that the Results table holds about 4 million rows the plan is far from optimal?
If I run this query in Management Studio, it's very fast!
Is it just the stats that are wrong?
Regards
Vincent - The Netherlands.

I would suggest you try adding OPTION (HASH JOIN) at the end of the query, if possible. Once you start getting into large cardinalities, a nested loops join is not particularly efficient. That would prove out whether there is a more efficient plan (likely yes).
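For illustration, a sketch of what that looks like in T-SQL (the actual EF-generated statement isn't shown in the question, so the table and column names here are placeholders):

SELECT r.*
FROM dbo.Results r
JOIN dbo.ProjectSessions ps ON ps.Id = r.ProjectSessionId
JOIN dbo.Projects p ON p.Id = ps.ProjectId
WHERE p.Id = @projectId
OPTION (HASH JOIN);  -- the hint applies to every join in the statement

If that version runs quickly, it confirms that a more efficient plan exists than the loops-join plan currently in cache.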
Without seeing more of the details (your screenshots are helpful but cut off whether auto-param or forced parameterization has kicked in and auto-parameterized your query), it is hard to confirm/deny this explicitly. You can read more about parameter sniffing in a blog post I wrote a bit longer ago than I care to admit ;) :
https://blogs.msdn.microsoft.com/queryoptteam/2006/03/31/i-smell-a-parameter/
Ultimately, if you update statistics, run DBCC FREEPROCCACHE, or otherwise cause this plan to recompile, your odds of getting a faster plan into the cache are higher if this particular query + parameter values is executed often enough to be sniffed during plan compilation. Your other option is to add an OPTIMIZE FOR UNKNOWN hint, which disables sniffing and directs the optimizer to use an average value for the frequency of any filters over parameter values. This will likely encourage more hash or merge joins instead of loops joins, since the cardinality estimates of the operators in the tree will likely increase. Both options are sketched below.
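A hedged sketch of both options (object names are placeholders; note that on Azure SQL Database the database-scoped configuration below is the supported way to clear the plan cache, rather than a server-wide DBCC FREEPROCCACHE):

-- Refresh statistics so a recompile sees the current ~4M row count
UPDATE STATISTICS dbo.Results;

-- Azure SQL Database: drop cached plans for this database so the query recompiles
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

-- Or disable sniffing for this statement only
SELECT r.*
FROM dbo.Results r
JOIN dbo.ProjectSessions ps ON ps.Id = r.ProjectSessionId
WHERE ps.ProjectId = @projectId
OPTION (OPTIMIZE FOR UNKNOWN);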

Related

Query optimization that collects and orders nodes on very large graph

I have a decently large graph (1.8 billion nodes and roughly the same number of relationships) on which I am performing the following query:
MATCH (n:Article)
WHERE n.id IN $pmids
MATCH (n)-[:HAS_MENTION]->(m:Mention)
WITH n, collect(m) as mentions
RETURN n.id as pmid, mentions
ORDER BY pmid
where $pmids is a list of strings, e.g. ["1234", "4567"], whose length varies from 100 to 500.
I am currently holding the data in a Neo4j Docker community instance with the following conf modifications: NEO4J_dbms_memory_pagecache_size=32G, NEO4J_dbms_memory_heap_max__size=32G. An index has been created for Article.id.
This query has been quite slow to run (roughly 5 seconds) and I would like to optimize it for a faster runtime. Through work I have access to Neo4j Enterprise, so one approach would be to ingest this data into a Neo4j Enterprise instance where I can tweak advanced configuration settings.
In general, does anyone have any tips on how I might improve performance, whether by optimizing the Cypher query itself, increasing workers, or changing other settings in neo4j.conf?
Thanks in advance.
For anyone interested: I posted this question in the Neo4j forums as well, and there have already been some interesting optimization suggestions (especially around the "type hint" to trigger backward indexing, and using pattern comprehension instead of collect()).
Initial thoughts
You are using a string field to store the PMID, but PMIDs are numeric; storing them as ints (and indexing and searching them as ints) might reduce the database size and possibly perform better.
If the PMID list is usually large and the server has more than half a dozen cores, it might be worth looking into the APOC parallel Cypher functions.
Do you really need every property of the Mention nodes? If not, try gathering just what you need.
What is the size of the database in GB (some context is needed for the memory settings), and what did neo4j-admin memrec recommend?
If this is how the database is always used, all the time, a SQL database might be better; when building that SQL database, collect the mentions into one field, once and done (a sketch follows below).
Note: Go PubMed!
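A minimal SQL sketch of that last "once and done" idea (the table and column names are made up, and the mention list is stored pre-aggregated, e.g. as a JSON array):

CREATE TABLE article_mentions (
    pmid     BIGINT PRIMARY KEY,   -- PMIDs are numeric, so store them as integers
    mentions TEXT                  -- mention list collected once at load time
);

-- The per-request lookup then becomes a simple indexed batch fetch:
SELECT pmid, mentions
FROM article_mentions
WHERE pmid IN (1234, 4567)
ORDER BY pmid;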

How to run complex queries in Tarantool

I've always worked with relational DBs and recently decided to migrate a performance-critical service from SQL Server to Tarantool in the hope of taking advantage of fast in-memory search and processing. I've got a couple of questions while planning the migration.
I've got a table with about one million records containing pricing information, which means I'm dealing mostly with numbers and UUIDs. First, I need to run a SELECT with multiple conditions to get a subset of the data, like:
SELECT * FROM rates WHERE SupplierId = #SupplierId AND ProductId = #ProductId AND (LocalDistributionZoneId = #LocalDistributionZoneId OR LocalDistributionZoneId IS NULL)
Q1: What is the strategy for running such a query in Lua? Do I create an index for each field in the predicate, or can I get by with one secondary composite index?
Q2: Will it be more convenient to run such a query in SQL (box.sql.execute) rather than in pure Lua? Will it be considerably slower than running the same query in pure Lua?
Q3: If I use SQL, is it possible to review the execution plan to make sure that the query I run really uses the indexes I've defined in the space?
OK, after I get the results from the first query, I need to analyse the data and then, based on the results of that analysis, run one more query on the dataset returned by the first query.
Q4: Can Tarantool help me deal with the intermediate dataset? More specifically, can I somehow run more queries against the intermediate subset of tuples, leveraging the indexes created in the space? Or would I need to implement alternative strategies, like re-adding the interim results to a temporary space with pre-defined indexes and then doing another select, or implementing the further search myself?
Thank you!
Q1: Don't. Use SQL; it's faster, because it doesn't create garbage-collected objects for intermediate execution results.
Q2: Yes, please use our SQL features for that.
Q3: Use the EXPLAIN statement (sketched below).
Q4: I don't know exactly what you mean by "help". You could try whatever strategy works best: create a more complex query, save the original query in a view to use in the resulting query, or create a temporary table and work with it. To give more details, let's look at whether the execution plan Tarantool chooses is good enough or whether you have to optimize it manually.
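As a sketch of Q1-Q3 (the schema below is assumed, not taken from the question; all of these statements can be passed to box.sql.execute(), or box.execute() in newer Tarantool versions, from Lua):

-- Assumed schema for the rates example; Tarantool SQL requires a primary key
CREATE TABLE rates (
    id STRING PRIMARY KEY,
    supplierid STRING,
    productid STRING,
    localdistributionzoneid STRING,
    price DOUBLE
);

-- One composite secondary index covers the two equality predicates;
-- the OR / IS NULL condition is then applied as a residual filter
CREATE INDEX rates_supplier_product
ON rates (supplierid, productid, localdistributionzoneid);

-- Review which index the planner actually picks
EXPLAIN QUERY PLAN
SELECT * FROM rates
WHERE supplierid = 'S1' AND productid = 'P1'
  AND (localdistributionzoneid = 'Z1' OR localdistributionzoneid IS NULL);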

Neo4J using properties on relationships for quicker lookup?

I am currently trying to use Neo4j to perform a complex query (similar to a shortest-path search, except that I have unusual conditions applied to the search, like a minimum path length in terms of the number of nodes traversed).
My dataset contains around 2.5M nodes of one single type and around 1.5 billion edges (also of a single type). Each node has on average 1,000 directed relationships to a "next" node.
So far I have a query that retrieves this shortest path given all of my conditions, but the only way I have found to get a decent response time (under one second) is to limit the number of results after each new node added to the path, filter them, order them, and then proceed to the next node (a kind of greedy algorithm, I suppose).
I'd like to limit a lot less than I do in order to yield more paths as a result, but the problem is the exponential complexity of this search, which makes going from LIMIT 40 to LIMIT 60 usually a matter of 10x to 100x the processing time.
That said, I am currently evaluating several solutions to increase the speed of the request, but I'm quite unsure of the results they will yield, as I'm not sure how Neo4j really stores my data internally.
The solution I am considering is to add a property to my relationships, an integer between 1 and 15, because I usually query only relationships that have at most one or two different values for this property (for example, only relationships with this property set to 8 or 9).
As far as I can guess, for each relationship Neo4j currently has to fetch the node's properties and use them to apply my further filters, which takes a very long time when crossing a 4-node-long path with 1,000 relationships each (I guess O(1000^4)). Am I right?
With relationship properties, will it have direct access to them without further data fetching? Is there any chance this will make my queries faster? How are Neo4j edge properties stored?
UPDATE
Following @logisima's advice, I wrote a procedure directly with the Java traversal API of Neo4j. I then switched to the raw Java procedure API of Neo4j to get even more power and flexibility, as my use case required it.
The results are really good: the lower-bound complexity is overall a little lower than before, but the upper bound is about ten times faster, and when at least some of the nodes used for the traversal are already in Neo4j's cache, performance becomes astonishing (depth 20 in less than a second in one of my tests, when I usually only need depth 4).
But that's not all. Procedures are very easily customizable while keeping performance at its best and letting me optimize every single operation. The result is that I can use far more powerful filters in far less computing time and can easily update my procedure to add new features. Last but not least, procedures are very easy to plug into spring-data for Neo4j (which I use to connect Neo4j to my HTTP API), whereas with Cypher I would have had to auto-generate the queries (they are so complex that it took about 30 Java classes to do it properly) and I would have had to use JDBC for Neo4j while handling a separate connection pool just for this request. I cannot recommend the awesome Neo4j Java API enough.
Thanks again @logisima
If you're trying to implement a custom shortest-path algorithm, you should write a Cypher procedure with the traversal API.
The principle of Cypher is pattern matching, whereas you want to traverse the graph in a specific way to find your solution.
The response time should be much faster for your use case!

Will Gremlin graph queries always perform operations in the client's address space?

Admittedly, most of my database experience is relational. One of the tenets in that space is to avoid moving data over the network. This manifests itself in using something like:
select * from person order by last_name limit 10
which will presumably order and limit within the database engine, versus using something like:
select * from person
and subsequently ordering and taking the top 10 at the client, which could have disastrous effects if there are a million person records.
So, with Gremlin (from Groovy), if I do something like:
g.V().has('#class', 'Person').order{println('!'); it.a.last_name <=> it.b.last_name}[0..9]
I see the ! printed, so I assume this brings all Person records into the address space of my client prior to the order and limit steps, which is not the desired effect.
Do my options for processing queries entirely in the database engine become product-specific (e.g. for OrientDB, perhaps submit the query in their flavor of SQL), or is there something about Gremlin that I am missing?
If you want the implementer's query optimizer to kick in, you need to use as many Gremlin steps as possible and avoid pure Groovy/in-memory processing of your graph traversals.
You're most likely looking for something like this (as of TinkerPop v3.2.0):
g.V().has('#class', 'Person').order().by('last_name', incr).limit(10)
If you find yourself using lambdas, chances are often high that this could be done with pure Gremlin steps. Favor Gremlin steps over lambdas.
See TinkerPop v3.2.0 documentation:
Order By step
Limit step

How to improve the performance of a big table join?

Please help me out with this big data problem.
I have a very large table (500 GB) that stores cookie information collected from one website, and I am trying to provide a service to many other clients. Each client has their own cookies, so in the end I need to query 500 GB + 300 GB (client data).
Since some queries use both my cookie data and the client's cookie data, I may need to join my table with theirs, and that join performs badly. To solve this problem, I put the entire 800 GB of data into one giant table. Since there is no join, the performance is good. But when I expand my service to multiple clients, it takes too much storage.
Currently I am using Vertica as my data source and use bitmaps to store my information.
Any suggestions that can maintain my current performance but also support around 40 clients? My storage is about 12 TB, and each client in the current solution takes 1.5 TB.
What I want is either a replacement for Vertica that can support bitmap operations and quick table joins, or a better way to represent my data.
"My storage is about 12 TB and each client in the current solution takes 1.5 TB."
If you have 40 × 1.5 TB worth of non-duplicated cookie data to store, there's no magic to make that fit into 12 TB.
This will be an imprecise answer due to the lack of details about definitions, etc. But I would add the following about performance:
Look at your projection definitions. You may be able to get performance gains depending on what you put in the order by clause of the projection.
You have a few ways forward, depending on the specifics of your case. Points 1 and 3 are the easiest to deal with:
1. You can set the projections up properly, to make sure that both tables are identically segmented (see the sketch after this list): https://my.vertica.com/docs/6.1.x/HTML/index.htm#12549.htm
2. You can set up pre-join projections, where the join cost is paid during data load rather than during data retrieval; see https://my.vertica.com/docs/6.1.x/HTML/index.htm#1299.htm
3. Make sure that your data types are the best possible: matching on ints is faster than matching on strings, and matching columns with low cardinality is faster than matching columns with high cardinality.
If points 1 and 3 are set up well, Vertica can actually apply filters before decompression, speeding up your query a lot and using a lot less memory.
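A rough sketch of points 1 and 2 in Vertica SQL (the real schema isn't shown in the question, so the table and column names are placeholders): segmenting both projections identically on the join key lets the join run locally on each node, and putting the join/filter columns first in the ORDER BY helps merge joins and filtering on encoded data.

CREATE PROJECTION my_cookies_join_p
AS SELECT cookie_id, site_attributes
   FROM my_cookies
   ORDER BY cookie_id
   SEGMENTED BY HASH(cookie_id) ALL NODES;

CREATE PROJECTION client_cookies_join_p
AS SELECT cookie_id, client_id, client_attributes
   FROM client_cookies
   ORDER BY cookie_id
   SEGMENTED BY HASH(cookie_id) ALL NODES;

-- Populate the new projections
SELECT START_REFRESH();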
