Duplicates in the result of a subquery

Duplicates in the result of a subquery - influxdb

I am trying to count distinct sessionIds from a measurement. sessionId being a tag, I count the distinct entries in a "parent" query, since distinct() doesn't works on tags.
In the subquery, I use a group by sessionId limit 1 to still benefit from the index (if there is a more efficient technique, I have ears wide open but I'd still like to understand what's going on).
I have those two variants:
> select count(distinct(sessionId)) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 3757
> select count(sessionId) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 4206
To my understanding, those should return the same number, since group by sessionId limit 1 already returns distinct sessionIds (in the form of groups).
And indeed, if I execute:
select * from UserSession group by sessionId limit 1
I have 3757 results (groups), not 4206.
In fact, as soon as I put this in a subquery and re-select fields in a parent query, some sessionIds have multiple occurrences in the final result. Not always, since there is 17549 rows in total, but some are.
This is the sign that the limit 1 is somewhat working, but some sessionId still get multiple entries when re-selected. Maybe some kind of undefined behaviour?

I can confirm that I get the same result.
In my experience using nested queries does not always deliver what you expect/want.
Depending on how you use this you could retrieve a list of all values for a tag with:
SHOW TAG VALUES FROM UserSession WITH KEY=sessionId
Or to get the cardinality (number of distinct values for a tag):
SHOW TAG VALUES EXACT CARDINALITY FROM UserSession WITH KEY=sessionId.
Which will return a single row with a single column count, containing a number. You can remove the EXACT modifier if you don't need to be exact about the result: SHOW TAG VALUES CARDINALITY on Influx Documentation.

Related

how to get subset of activerecord objects after performing .limit()?

I want to be able to limit the activerecord objects to 20 being returned, then perform a where() that returns a subset of the limited objects which I currently know only 10 will fulfil the second columns criteria.
e.g. of ideal behaviour:
o = Object.limit(20)
o.where(column: criteria).count
=> 10
But instead, activerecord still looks for 20 objects that fulfil the where() criteria, but looks outside of the original 20 objects that the limit() would have returned on its own.
How can I get the desired response?

One way to decrease the search space is to use a nested query. You should search the first N records rather than all records which match a specific condition. In SQL this would be done like this:
select * from (select * from table order by ORDERING_FIELD limit 20) where column = value;
The query above will only search for the condition in 20 rows from the table. Notice how I have added a ORDERING_FIELD, this is required because each query could give you a different order each time you run it.
To do something similar in Rails, you could try the following:
Object.where(id: Object.order(:id).limit(20).select(:id)).where(column: criteria)
This will execute a query similar to the following:
SELECT [objects].* FROM [objects] WHERE [objects].[id] IN (SELECT TOP (20) [objects].[id] FROM [objects] ORDER BY [objects].id ASC) AND [objects].[column] = criteria

Esper - concatenate values from multiple rows to a list

I have an Esper query that returns multiple rows, but I'd like to instead get one row, where that row has a list (or concatenated string) of all of the values from the (corresponding columns of the) matching rows that my current query returns.
For example:
SELECT Name, avg(latency) as avgLatency
FROM MyStream.win:time(5 min)
GROUP BY Name
HAVING avgLatency / 1000 > 60
OUTPUT last every 5 min
Returns:
Name avgLatency
---- ----------
A 65
B 70
C 75
What I'd really like:
Name
----
{A, B, C}
Is this possible to do via the query itself? I tried to make this work using subqueries, but I'm not working with multiple streams. I can't find any aggregation functions or enumeration functions in the Esper documentation that fits what I'm trying to do either.
Thanks to anybody that has any insight or direction for me here.
EDIT:
If this can't be done via the query, I'm open to changing the subscriber, or anything else, if necessary.

You can have a subscriber or listener do the concat. There is a "Multi-Row Delivery" for subscribers. Or use a table like below.
// create table to hold aggregation result
create table LatencyTable(name string primary key, avgLatency avg(double));
// update aggregations in table from events coming in
into LatencyTable select name, avg(latency) as avgLatency from MyStream#time(5 min) group by name;
// do a select with the "aggregate" enumeration method
select (select * from LatencyTable where avgLatency > x).aggregate(....) from pattern[every timer:interval(5 min)]

Order By String Property Value and Pagination in Neo4j

I am using neo4j to create a social network application. The data model has a FRIEND relationship between two USER nodes. I need to get all the friends of mine ordered by displayName (Unique Indexed).
I need pagination for this query. I will send the last name from the list I got from the previous query results. And I want to limit each page to 20 names.
MATCH (u:USER{displayName:{id}})-[:FRIEND]-(f:USER)
RETURN f
ORDER BY f.displayName
LIMIT 20;
What is the best way to do this? Will SKIP work here, sending SKIP 0, SKIP 1*20, SKIP 2*20, ...

You can use the query in this way i think :
ORDER BY f.displayName LIMIT START_POSITION , LAST_POSITION;
For example:
ORDER BY f.displayName LIMIT 0 , 20;
ORDER BY f.displayName LIMIT 21 , 40;

Yes, you can use the SKIP clause to do what you want. In the following, I assume that you provide the page value (starting at 0) as a parameter.
MATCH (u:USER{displayName:{id}})-[:FRIEND]-(f:USER)
RETURN f
ORDER BY f.displayName
SKIP {page} * 20
LIMIT 20;
Note that this technique is not foolproof if the list of friends can change during paging.

How to get the number of entries in a measurement

I am a newbie to influxdb. I just started to read the influx documentation.
I cant seem to get the equivalent of 'select count(*) from table' to work in influx db.
I have a measurement called cart:
time status cartid
1456116106077429261 0 A
1456116106090573178 0 B
1456116106095765618 0 C
1456116106101532429 0 D
but when I try to do
select count(cartid) from cart
I get the error
ERR: statement must have at least one field in select clause

I suppose cartId is a tag rather than a field value? count() currently can't be used on tag and time columns. So if your status is a non-tag column (a field), do the count on that.
EDIT:
Reference

This works as long as no field or tag exists with the name count:
SELECT SUM(count) FROM (SELECT *,count::INTEGER FROM MyMeasurement GROUP BY count FILL(1))
If it does use some other name for the count field. This works by first selecting all entries including an unpopulated field (count) then groups by the unpopulated field which does nothing but allows us to use the fill operator to assign 1 to each entry for count. Then we select the sum of the count fields in a super query. The result should look like this:
name: MyMeasurement
----------------
time sum
0 47799
It's a bit hacky but it's the only way to guarantee a count of all entries when no field exists that is always present in all entries.

How do I query on a subset of ActiveModel records?

I've rewritten this question as my previous explanation was causing confusion.
In the SQL world, you have an initial record set that you apply a query to. The output of this query is the result set. Generally, the initial record set is an entire table of records and the result set is the records from the initial record set that match the query ruleset.
I have a use case where I need my application to occasionally operate on only a subset of records in a table. If a table has 10,000 records in it, I'd like my application to behave like only the first 1,000 records exist. These should be the same 1,000 records each time. In other words, I want the initial record set to be the first 1,000 devices in a table (when ordered by primary key), and the result set the resulting records from these first 1,000 devices.
Some solutions have been proposed, and it's revealed that my initial description was not very clear. To be more explicit, I am not trying to implement pagination. I'm also not trying to limit the number of results I receive (which .limit(1,000) would indeed achieve).
Thanks!

This is the line in your question that I don't understand:
This causes issues though with both of the calls, as limit limits the results of the query, not the database rows that the query is performed on.
This is not a Rails thing, this is a SQL thing.
Device.limit(n) runs SELECT * FROM device LIMIT n
Limit always returns a subset of the queried result set.
Would first(n) accomplish what you want? It will both order the result set ascending by the PK and limit the number of results returned.

SQL Statements can be chained together. So if you have your subset, you can then perform additional queries with it.
my_subset = Device.where(family: "Phone")
# SQL: SELECT * FROM Device WHERE `family` = "Phone"
my_results = my_subset.where(style: "Touchscreen")
# SQL: SELECT * FROM Device WHERE `family` = "Phone" AND `style` = "Touchscreen"
Which can also be written as:
my_results = Device.where(family: "Phone").where(style: "Touchscreen")
my_results = Device.where(family: "Phone", style: "Touchscreen")
# SQL: SELECT * FROM Device WHERE `family` = "Phone" AND `style` = "Touchscreen"
From your question, if you'd like to select the first 1,000 rows (ordered by primary key, pkey) and then query against that, you'll need to do:
my_results = Device.find_by_sql("SELECT *
FROM (SELECT * FROM devices ORDER BY pkey ASC LIMIT 1000)
WHERE `more_searching` = 'happens here'")

You could specifically ask for a set of IDs:
Device.where(id: (1..4).to_a)
That will construct a WHERE clause like:
WHERE id IN (1,2,3,4)

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Duplicates in the result of a subquery - influxdb

Related

how to get subset of activerecord objects after performing .limit()?

Esper - concatenate values from multiple rows to a list

Order By String Property Value and Pagination in Neo4j

How to get the number of entries in a measurement

How do I query on a subset of ActiveModel records?

Categories

Resources