Divide count by count with InfluxDB

I am trying to calculate the percentage of values within a threshold.
How would I do something like this with InfluxDB?
SELECT
(
SELECT count(*) FROM "durations" WHERE "duration" < 500
)
/
(
SELECT count(*) FROM "durations"
)

That is not possible in plain InfluxQL.
Every data point has to carry the dimension it belongs to explicitly in your measurement (and the range a value falls into IS a sort of dimension, one identified by tags).
See examples of something similar (without aggregation, though) elsewhere, and pay attention to how the sample dataset looks.
That said, what you're asking for could probably be done in Kapacitor (most likely with Join and/or Union nodes).
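As a rough sketch of that tagging idea (the bucket tag and its values are assumptions, not part of the original question): if the writer assigned each point a tag such as bucket=under_500 or bucket=over_500 before storing it, both counts come back from a single grouped query, and the final division would still have to happen client-side or in Kapacitor:
SELECT count("duration") FROM "durations" GROUP BY "bucket"
Dividing the two returned counts then gives the percentage of durations under the threshold.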

Related

Select from multiple measurements

I have a bunch of measurements, all starting with task_runtime.
e.g.
task_runtime.task_a
task_runtime.task_b
task_runtime.task_c
Is there a way to select all of them by a partial measurement name?
I'm using grafana on top of influxdb and I want to display all of these measurements in a single graph, but I don't have a closed list of these measurements.
I thought about something like
select * from (select table_name from all_tables where table_name like "task_runtime.*")
But I'm not sure of the InfluxDB syntax for this.
You can use a regular expression when specifying measurements in the FROM clause as described in the InfluxDB documentation.
For example, in your case:
SELECT * FROM /^task_runtime.*/
Grafana also supports this and will display all measurements separately.
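For instance (a sketch, assuming the measurements store their values in a field named value), the same regex can be combined with an aggregate and a time filter, which is the typical shape of a Grafana panel query:
SELECT mean("value") FROM /^task_runtime\./ WHERE time > now() - 1h GROUP BY time(1m)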

Google Big Query: How to select subset of smaller table using nested fake join?

I would like to solve the problem from How can I join two tables using intervals in Google Big Query? by selecting a subset of the smaller table.
I wanted to use the solution by @FelipeHoffa based on the row_number function (Row number in BigQuery?).
I have created a nested query as follows:
SELECT a.DevID DeviceId,
       a.device_make OS
FROM
  (SELECT device_id DevID, device_make, A, lat, long, is_gps
   FROM [Data.PlacesMaster]
   WHERE NOT device_id IS NULL AND is_gps IS TRUE) a
JOIN
  (SELECT ROW_NUMBER() OVER() row_number,
          top_left_lat, top_left_long, bottom_right_lat, bottom_right_long, A, count
   FROM (SELECT top_left_lat, top_left_long, bottom_right_lat, bottom_right_long, A, COUNT(*) count
         FROM [Karol.fast_food_box]
         GROUP BY (....?)
         ORDER BY COUNT DESC,
         WHERE row_number BETWEEN 1000 AND 2000)) b ON a.A = b.A
WHERE (a.lat BETWEEN b.bottom_right_lat AND b.top_left_lat)
  AND (a.long BETWEEN b.top_left_long AND b.bottom_right_long)
GROUP EACH BY DeviceId, OS
Could you help me finalise it, please? I cannot break the smaller table up with GROUP BY; I need consistency between the two tables and to select only the items from the MASTER table whose lat,long fall into a given bounding box of the smaller table. Really I need to match lat,long to a box. My solution from How can I join two tables using intervals in Google Big Query? works only for small tables (approx. 1000 to 2000 rows), hence this question. Thank you in advance.
It looks like you're applying two approaches at once: 1) split a table into chunks of rows, and run on each, and 2) include a field, "A", tagging your boxes and your points into 'regions', that you can equi-join on. Approach (1) just does the same total work in more pieces (also, it's adding complication), so I would suggest focusing on approach (2), which cuts the work down to be ~quadratic in each 'region' rather than quadratic in the size of the whole world.
So the key thing is what values your A takes on, and how many points and boxes carry each A value. For example, if A is a country code, that has the right logical structure, but it probably doesn't help enough once you get a lot of data in any one country. If it goes to the state or province code, that gets you one step farther. Quantized lat/long grid cells generalize better. Sooner or later you do have to deal with falling across a region edge, which can be somewhat tricky. I would use a lat/long grid myself.
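A minimal sketch of that grid-cell idea, in legacy BigQuery SQL to match the [dataset.table] syntax above (the 0.1-degree cell size and the exact column list are assumptions): compute a cell key for every point, do the same for every box, and use that key as the A value to equi-join on. A box that straddles a cell edge would need one row per cell it overlaps.
SELECT device_id, lat, long,
       CONCAT(STRING(INTEGER(FLOOR(lat / 0.1))), ':',
              STRING(INTEGER(FLOOR(long / 0.1)))) AS A
FROM [Data.PlacesMaster]
WHERE NOT device_id IS NULL AND is_gps IS TRUE
Joining on equal A values then confines the quadratic bounding-box comparison to the points and boxes that share a cell.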
What A values are you using? In your data, what is the A value with the maximum (number of points * number of boxes)?

Fetch latest rows grouped by unique field value

I have a table of Books with an author_id field.
I'd like to fetch an array of Books containing only one Book per author: the one with the latest updated_at field.
The problem with the straightforward approach like Books.all.group('author_id') on Postgres is that it needs all requested fields in its GROUP BY block. (See https://stackoverflow.com/a/6106195/1245302)
But I need to get whole Book objects, one per author, the most recent one, ignoring all other fields.
It seems to me that there's enough data for the DBMS to find exactly the rows I want;
at least I could do that myself without any other fields in the GROUP BY block. :)
Is there any simple Rails 3 + Postgres (version < 9) or SQL-implementation-independent way to get that?
UPDATE
Nice solution for Postgres:
books.unscoped.select('DISTINCT ON(author_id) *').order('author_id').order('updated_at DESC')
BUT! One problem still remains: the results are sorted by author_id first, and I need them sorted by updated_at (to find, say, the top 10 most recent book authors).
And Postgres doesn't allow you to change the order of the ORDER BY arguments in DISTINCT ON queries :(
I don't know Rails, but hopefully showing you the SQL for what you want will help get you to a way to generate the right SQL.
SELECT DISTINCT ON (author_id) *
FROM Books
ORDER BY author_id, updated_at DESC;
The DISTINCT ON (author_id) portion should not be confused with part of the result column list -- it just says that there will be one row per author_id. The list in a DISTINCT ON clause must be the leading portion of the ORDER BY clause in such a query, and the row which is kept is the one which sorts first based on the rest of the ORDER BY clause.
With a large number of rows this way of writing the query is usually much faster than any solution based on GROUP BY or window functions, often by an order of magnitude or more. It is a PostgreSQL extension, though; so it should not be used in code which is intended to be portable.
If you want to use this result set inside another query (for example, to find the 10 most recently updated authors), there are two ways to do that. You can use a subquery, like this:
SELECT *
FROM (SELECT DISTINCT ON (author_id) *
      FROM Books
      ORDER BY author_id, updated_at DESC) w
ORDER BY updated_at DESC
LIMIT 10;
You could also use a CTE, like this:
WITH w AS (
    SELECT DISTINCT ON (author_id) *
    FROM Books
    ORDER BY author_id, updated_at DESC)
SELECT * FROM w
ORDER BY updated_at DESC
LIMIT 10;
The usual advice about CTEs holds here: use them only where there isn't another way to write the query or if needed to coerce the planner by introducing an optimization barrier. The plans are very similar, but passing the intermediate results through the CTE scan adds a little overhead. On my small test set the CTE form is 17% slower.
This is belated, but in response to questions about overriding/resetting a default order, use .reorder(nil).order(:whatever_you_want_instead)
(I can't comment, so posting as an answer for now)

How to randomize (and paginate) large set of results?

I am creating a contest application that requires the main index page of entries to be randomized. As it will potentially be a large set of entries (maybe up to 5000), I will also need to paginate them.
Here are the challenges:
I have read that using a database's 'random()' function on a large set can perform poorly.
I would like for things to not be re-randomized when the pagination links are clicked. In other words, it should return a random set upon first load and then keep the same order while someone uses the pagination.
The second challenge seems potentially unrealistic, but perhaps there are some creative solutions out there?
Thanks for any input.
A simple way I suggest is writing your own random-ordering function into the SQL query; the more complicated the function, the more random the result. For example,
you already know:
select * from your_table order by rand() limit 0, 10
Assume your_table has a primary key "id"; now replace "rand()" with "MOD(id, 13)":
select * from your_table order by MOD(id, 13) limit 0,10
If your_table has a datetime column, the result will be even better; try this query:
select * from your_table order by MOD(id, 13), updated_at limit 0,10
And if you still don't think it's random enough, here is one I bet you'll love:
select * from your_table order by MD5(id) limit 0, 10
I would just use a random number generator to select IDs, and store the seed in the session so a user will see the same ordering while paginating. I would probably also use a hash to make sure each ID is picked only once.
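A sketch of how that seed idea could look in MySQL syntax, matching the queries above (the literal 42 stands in for whatever seed you generate on first load and keep in the session): MySQL's RAND() accepts an integer seed, so reusing the same seed on every request keeps the shuffle stable while only the offset changes.
select * from your_table order by rand(42) limit 20, 10
Here limit 20, 10 would be the third page of 10-row pages; the ordering is identical on every page because the seed is.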

How to Sum calculated fields

I'd like to ask a question here that I think will be easy for some people.
OK, I have a query that returns records from two related tables (one to many).
In this query I have about 3 or 4 calculated fields that are based on fields from the two tables.
Now I want to add a GROUP BY clause for names and a SUM clause to sum the calculated fields, but it ends in an error message saying:
"You tried to execute a query that is not part of an aggregate function"
So I decided to just run the query without the totals (i.e. no GROUP BY, SUM etc.).
Then I created another query that totals my previous query (i.e. using a GROUP BY clause for names and SUM for the calculated fields; no calculation there). This works (I used to do this), but I don't like having two queries just to get a summary total. Is there any other way of doing this in the design view and creating only one query?
I would very much appreciate it.
Thank you,
JM
Sounds like the query thinks the calculated fields need to be part of the grouping or something. You might need to look into sub-querying.
Can you post the SQL (before and after)? It would help in getting an understanding of what the issue is.
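A minimal sketch of that sub-querying idea (the table and field names here are made up, since the original SQL wasn't posted): do the row-level calculation in an inner query, then group and SUM it in the outer one, so Access never has to aggregate an expression it is also being asked to group on.
SELECT t.CustomerName, SUM(t.LineTotal) AS TotalAmount
FROM (
    SELECT o.CustomerName, d.Qty * d.UnitPrice AS LineTotal
    FROM Orders AS o
    INNER JOIN OrderDetails AS d ON o.OrderID = d.OrderID
) AS t
GROUP BY t.CustomerName;
Written this way it stays a single saved query, even though the calculation and the totals are logically two steps.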
