Subtract two query results - InfluxDB

I cannot find anything in the docs, and there is an old issue related to this.
Is it possible to subtract the results of two queries?
Something like this in SQL:
SELECT (SELECT COUNT(*) FROM test) - (SELECT COUNT(*) FROM test WHERE …);

As far as I know, it is currently not possible (at least not with InfluxQL).
It should, however, be possible with their new query language called Flux (as of InfluxDB 1.7):
https://docs.influxdata.com/flux/v0.24/introduction/getting-started/
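As a rough illustration, subtracting the two counts in Flux could look something like this sketch (the bucket, measurement, and error filter are placeholder assumptions, and join/map details may vary between Flux versions):

total = from(bucket: "db/rp")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "test")
  |> count()

subset = from(bucket: "db/rp")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "test" and r.status == "error")
  |> count()

// Join the two single-row results and subtract the counts
join(tables: {t: total, s: subset}, on: ["_start", "_stop"])
  |> map(fn: (r) => ({_value: r._value_t - r._value_s}))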

Related

What does the WITH clause do? Neo4j

I don't understand what the WITH clause does in Neo4j. I read The Neo4j Manual v2.2.2, but it is not very clear about the WITH clause, and there are not many examples. For example, I have a graph where the blue nodes are football teams and the yellow ones are their stadiums.
I want to find stadiums where two or more teams play. I found this query, and it works:
match (n:Team) -[r1:PLAYS]->(a:Stadium)
with a, count(*) as foaf
where foaf > 1
return a
count(*) gives us the number of matching rows, but I don't understand what the WITH clause does.
WITH allows you to pass on data from one part of the query to the next. Whatever you list in WITH will be available in the next query part.
You can use aggregation, SKIP, LIMIT, ORDER BY with WITH much like in RETURN.
The only difference is that your expressions have to be given an alias with AS so that later query parts can access them.
That means you can chain query parts, where one part computes some data and the next part uses that computed data. In your case it plays the role that GROUP BY and HAVING would play in SQL, but WITH is much more powerful than that.
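For comparison, a rough SQL equivalent of the stadium query above (using hypothetical plays and stadiums tables) would be:

SELECT s.id, COUNT(*) AS foaf
FROM plays p
JOIN stadiums s ON p.stadium_id = s.id
GROUP BY s.id
HAVING COUNT(*) > 1;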
Here is another example:
match (n:Team) -[r1:PLAYS]->(a:Stadium)
with distinct a
order by a.name limit 10
match (a)-[:IN_CITY]->(c:City)
return c.name

How to paginate query results with Cypher?

Is it possible to paginate a Cypher query? For instance, for a list of products, I don't want to display/retrieve/cache all the results, as there can be a lot of them.
I'm looking for something similar to OFFSET/LIMIT in SQL.
Are Cypher's SKIP + LIMIT + ORDER BY a good option? http://docs.neo4j.org/chunked/stable/query-skip.html
SKIP and LIMIT combined is indeed the way to go. Using ORDER BY inevitably makes Cypher scan every node that is relevant to your query; the same goes for a WHERE clause. Performance should not be that bad, though.
It's like normal SQL; the syntax is as follows:
MATCH (user:USER_PROFILE)-[:USAGE]->(uUsage)
WHERE HAS(uUsage.impressionsPerHour) AND uUsage.impressionsPerHour > 100
RETURN user
ORDER BY user.hashID
SKIP 10
LIMIT 10;
This syntax applies to the latest version (2.x).
Neo4j apparently uses "index-backed ORDER BY" nowadays, which means that if you use an alphabetical ORDER BY on indexed node properties within your SKIP/LIMIT query, Neo4j will not perform a full scan of all "relevant nodes" as others have mentioned (their responses are from long ago, so keep that in mind). The index allows Neo4j to take advantage of the fact that it already stores indexed properties in ORDER BY (alphabetical) order, so your pagination will be even faster than without the index.
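For example, a minimal sketch (the Product label, its name property, and the index are assumptions; index syntax varies by Neo4j version):

CREATE INDEX ON :Product(name);

MATCH (p:Product)
RETURN p
ORDER BY p.name
SKIP 20
LIMIT 10;

With the index in place, the ORDER BY can read nodes in index order instead of sorting the whole result set before applying SKIP/LIMIT.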

Performance of generated T-SQL from Entity Framework

I recently used Entity Framework for a project, despite my DBA's strong disapproval. One day he came to my office complaining about the generated T-SQL reaching his database.
For instance, when I want to select a product based on the id, I write something like this:
context.Products.FirstOrDefault(p=>p.Id==id);
Which translates to
SELECT ... FROM (SELECT TOP 1 ... FROM PRODUCTS WHERE ID = @id)
So he is shouting, "Why on earth would you write a SELECT * FROM (SELECT TOP 1)"
So I changed my code to
context.Products.Where(p=>p.Id==id).ToList().FirstOrDefault()
and this produces a much cleaner T-SQL:
SELECT ... FROM PRODUCTS WHERE ID = @id
The inner query and the TOP 1 disappeared. Enough rambling; my question is this: does the first query really add overhead for SQL Server? Is it harder to parse than the second method? The Id column has a clustered index on it. I want a good answer so I can rub it in his face (or mine).
Thanks,
Themos
Have you tried running the queries manually and comparing the execution plans?
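For example, you could run both shapes in SSMS with statistics enabled and compare the output (a sketch; the parameter value is a placeholder):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @id int = 1;

-- The EF-generated shape: a nested SELECT TOP 1
SELECT * FROM (SELECT TOP 1 * FROM Products WHERE Id = @id) AS t;

-- The flat equivalent
SELECT * FROM Products WHERE Id = @id;

If the optimizer collapses the derived table, both should show identical plans and reads.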
The biggest problem here isn't that the SQL isn't perfectly formed to your DBA's standards (although I'm fairly certain that the query engine will optimize out the extra select). The second query materializes every matching row via ToList() before taking the first one in memory, and narrowing the result down to a single row is definitely a task that should be performed by the DB and not the application layer.
In short, he's being a pedant; leave it the way it was.

Fetch latest rows grouped by unique field value

I have a table of Books with an author_id field.
I'd like to fetch an array of Books containing only one Book per author: the one with the latest updated_at field.
The problem with a straightforward approach like Books.all.group('author_id') on Postgres is that it needs all requested fields in its GROUP BY block. (See https://stackoverflow.com/a/6106195/1245302)
But I need to get whole Book objects, one per author (the most recent one), without grouping by all the other fields.
It seems to me that there's enough data for the DBMS to find exactly the rows I want; at least I could do that myself without any other fields in the GROUP BY block. :)
Is there any simple Rails 3 + Postgres (version < 9) or SQL-implementation-independent way to get that?
UPDATE
Nice solution for Postgres:
books.unscoped.select('DISTINCT ON(author_id) *').order('author_id').order('updated_at DESC')
BUT! one problem still remains: results are sorted by author_id first, but I need to sort by updated_at within that result (to find, say, the top 10 most recently updated authors).
And Postgres doesn't allow you to change the order of ORDER BY arguments in DISTINCT ON queries :(
I don't know Rails, but hopefully showing you the SQL for what you want will help you find a way to generate the right SQL.
SELECT DISTINCT ON (author_id) *
FROM Books
ORDER BY author_id, updated_at DESC;
The DISTINCT ON (author_id) portion should not be confused with part of the result column list -- it just says that there will be one row per author_id. The list in a DISTINCT ON clause must be the leading portion of the ORDER BY clause in such a query, and the row which is kept is the one which sorts first based on the rest of the ORDER BY clause.
With a large number of rows this way of writing the query is usually much faster than any solution based on GROUP BY or window functions, often by an order of magnitude or more. It is a PostgreSQL extension, though; so it should not be used in code which is intended to be portable.
If you want to use this result set inside another query (for example, to find the 10 most recently updated authors), there are two ways to do that. You can use a subquery, like this:
SELECT *
FROM (SELECT DISTINCT ON (author_id) *
      FROM Books
      ORDER BY author_id, updated_at DESC) w
ORDER BY updated_at DESC
LIMIT 10;
You could also use a CTE, like this:
WITH w AS (
  SELECT DISTINCT ON (author_id) *
  FROM Books
  ORDER BY author_id, updated_at DESC)
SELECT * FROM w
ORDER BY updated_at DESC
LIMIT 10;
The usual advice about CTEs holds here: use them only where there isn't another way to write the query or if needed to coerce the planner by introducing an optimization barrier. The plans are very similar, but passing the intermediate results through the CTE scan adds a little overhead. On my small test set the CTE form is 17% slower.
This is belated, but in response to questions about overriding/resetting a default order, use .reorder(nil).order(:whatever_you_want_instead)
(I can't comment, so posting as an answer for now)
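A small sketch of that reorder trick (the Book model is hypothetical):

# reorder(nil) clears any previously applied ORDER BY (e.g. from a default scope),
# so only the new ordering is sent to Postgres.
Book.order(:author_id).reorder(nil).order('updated_at DESC').limit(10)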

LINQ to SQL Pagination and COUNT(*)

I'm using the PagedList class in my web application that many of you might be familiar with if you have been doing anything with ASP.NET MVC and LINQ to SQL. It has been blogged about by Rob Conery, and a similar incarnation was included in things like Nerd Dinner, etc. It works great, but my DBA has raised concerns about potential future performance problems.
His issue is around the SELECT COUNT(*) that gets issued as a result of this line:
TotalCount = source.Count();
Any action that has paged data will fire off an additional query (like below) as a result of the IQueryable.Count() method call:
SELECT COUNT(*) AS [value] FROM [dbo].[Products] AS [t0]
Is there a better way to handle this? I considered using the Count property of the PagedList class to get the item count, but realized that this won't work because it's only counting the number of items currently displayed (not the total count).
How much of a performance hit will this cause to my application when there's a lot of data in the database?
IIRC this is part of the index stats and should be very efficient; you should ask your DBA to substantiate his concerns rather than prematurely optimising.
Actually, this is a pretty common issue with LINQ.
Yes, index stats will get used if the statement were only SELECT COUNT(*) AS [value] FROM [dbo].[Products] AS [t0], but 99% of the time it's going to contain WHERE clauses as well.
So basically two SQL statements are executed:
SELECT COUNT(*) AS [value] FROM [dbo].[Products] AS [t0] WHERE blah=blah and someint=500
SELECT blah, someint FROM [dbo].[Products] AS [t0] WHERE blah=blah and someint=500
You start running into problems if the table is updated often, as the COUNT(*) returned by the first statement won't equal the row count of the second statement... this may produce the error message 'Row not found or changed.'
Some databases (Oracle, PostgreSQL, and I think SQL Server) keep a record of row counts in the system tables, though these are sometimes only accurate to the point at which the statistics were last refreshed (Oracle). You could use this approach if you only need a fairly-accurate-but-not-exact metric.
Which database are you using, or does that vary?
(PS I know that you are talking about MsSQL however)
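For SQL Server, a sketch of that metadata approach (the table name is a placeholder):

-- Approximate row count without scanning the table
SELECT SUM(p.rows) AS approx_rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.Products')
  AND p.index_id IN (0, 1); -- 0 = heap, 1 = clustered index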
I am no DBA, but COUNT(*) in MySQL is a real performance hit. Simply changing this to COUNT(id) really does improve the speed.
I came across this when I was querying a table with very large BLOB (image) data. The query took around 15 seconds to load. Changing the query to COUNT(id) reduced it to 0.02 seconds. Still a little slow, but a hell of a lot better.
I think this is what the DBA is getting at. I have noticed that when debugging LINQ, the counting statement takes a very long time (1 second) to jump to the next statement.
Based on my findings, I have to agree with the DBA's concerns...
