Multiple Selects With InfluxDB - time-series

I am trying to figure out if it's possible to run multiple SELECT statements on InfluxDB data. I've looked at continuous queries, but I'm not sure whether they're what I need, or if it even makes sense to use them.
I would like to run:
select * from series group by work_id limit 1;
Then take that data, and run
select * from new_series_from_prior_query where state = 'error'
First question: is this even possible? Second, if not, is there another way to get the desired result using InfluxDB? Basically I need to filter all work items by their work_id and most recent state. Then, depending on what filters are passed in, check if they match and return that data.
Any help is greatly appreciated. If I cannot get it to work, I will most likely have to switch out the database; I would love to stick with InfluxDB though.

InfluxDB just released 1.2 today, which has subqueries that solve this issue:
SELECT * FROM (SELECT * FROM workflows GROUP BY work_id LIMIT 1) WHERE state = 'processed'
This is what I was looking for.
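Applied to the original question, the same subquery pattern would presumably look like this (reusing the series name and filter from the question above; untested):
SELECT * FROM (SELECT * FROM series GROUP BY work_id LIMIT 1) WHERE state = 'error'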

Related

COUNT() on pull queries in ksqlDB

I have explored different ways to obtain the count of records in a materialized view in ksqlDB. Note: the materialized view is created with CREATE SOURCE TABLE. It does not have an output topic, and that's on purpose.
So far the only approach that seemed to work was to use a push query, which supports GROUP BY, which is necessary for a count. The trick was to use SELECT COUNT(*) FROM table GROUP BY 1.
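For reference, that push query looks roughly like this (my_view stands in for the actual materialized view name; push queries in ksqlDB require the EMIT CHANGES clause):
SELECT COUNT(*) AS cnt
FROM my_view
GROUP BY 1
EMIT CHANGES;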
This works, but not fully as expected.
Indeed, although we issue the query when the materialized view is done pulling from the topic, we sometimes get 1 result, sometimes 2 or 3, but ultimately the final result is always the right count.
This is rather strange given that the push query is supposed to do a full scan at first.
For our workflow we do not want to work with such behavior. We need a firm count, once.
Hence I was thinking about using a UDAF.
However, before I jump into it, I wanted to know if pull queries support them, and whether there is some known limitation here. Indeed, given that a count via a pull query sounds like something that should be provided out of the box, I wonder why it is not? Maybe there is some limitation to providing such a function with pull queries?

Why is the Neo4j index not working with ORDER BY?

Why is Neo4j ORDER BY so slow for a large database? :(
Here is the example query:
PROFILE MATCH (n:Item) RETURN n ORDER BY n.name DESC LIMIT 25
In the result it reads all records, even though I already have an index on the name property.
Here is the result (PROFILE screenshot not included): it reads all nodes, which is a real mess for a large number of records.
Is there any solution for this, or is Neo4j not a good choice for us either? :(
Also, is there any way to get the last record from nodes?
Your question and problem are not very clear.
1) Are you sure that you added the index correctly?
CREATE INDEX ON :Item(name)
In the Neo4j browser execute :schema to see all your indexes.
2) How many Items does your database hold and what running time are you expecting and achieving?
3) What do you mean by 'last record from nodes'?
Indexes are currently only used to find entry points into the graph, but not for other uses including ordering of results.
Index-backed ORDER BY operations have been a highly requested feature for a while, and while we've been tracking and ordering its priority, we've had several other features that took priority over this work.
I believe index-backed ORDER BY operations are currently scheduled for our 3.5 release, coming in the last few months of 2018.
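For what it's worth, once index-backed ORDER BY lands (Neo4j 3.5+), a query shaped roughly like the one below should be able to stream results from the :Item(name) index already in sorted order. This is only a sketch under that assumption; the exists() predicate on the indexed property is there so the planner knows every matched node actually has the property:
PROFILE
MATCH (n:Item)
WHERE exists(n.name)
RETURN n
ORDER BY n.name DESC
LIMIT 25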

Need advice: how to handle huge data to summarise a report in PHP

I am looking for advice on how to handle the following situation.
I have a report which shows a list of products; each product has the number of times it has been viewed and also the number of times an order has been requested for it.
Looking into the DB, I feel it's not good. There are three tables participating:
product
product_view
order_item
The following SELECT query is executed
SELECT product_title,
       (SELECT COUNT(views) FROM product_view pv WHERE p.pid = pv.pid) AS product_view,
       (SELECT COUNT(placed) FROM order_item o WHERE p.pid = o.pid) AS product_request_count
FROM product p
ORDER BY product_title
LIMIT 0, 10
This query returns 10 records successfully; however, it is very time-consuming to load. Also, when the user uses the export functionality, approximately 2,000,000 records would be returned, and I get a memory-exhausted error.
I am not able to find the most suitable solution for this in ZF2 [PHP + MySQL].
Can someone suggest a good strategy to deal with this?
How about using background processes? It doesn't have to be purely ZF2.
And once the background process is done, the system will notify the user via email that the export is done. :)
You can:
call set_time_limit(0) to remove the execution time limit.
loop through the whole result set in chunks of, say, 1,000 records, and output the results to the user sequentially.
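A minimal sketch of that chunked loop on the SQL side, assuming MySQL and reusing the query from the question; the batch size of 1,000 and the offsets are only illustrative (the application keeps increasing the offset until fewer than 1,000 rows come back):
SELECT product_title,
       (SELECT COUNT(views) FROM product_view pv WHERE p.pid = pv.pid) AS product_view,
       (SELECT COUNT(placed) FROM order_item o WHERE p.pid = o.pid) AS product_request_count
FROM product p
ORDER BY product_title
LIMIT 1000 OFFSET 0;   -- next chunk: LIMIT 1000 OFFSET 1000, and so on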

Caching paginated data for scrolling interface and avoid client side duplicates

Basically here is the set up:
You have a number of marketplace items and you want to sort them by price. If the cache expires while someone is browsing, they will suddenly be presented with potential duplicate entries. This seems like a really terrible public API experience, and we are looking to avoid this problem.
Some basic philosophies I have seen include:
Reddit's, in which they track the last id seen by the client, but they still handle duplicates.
Will Paginate, which is a simple implementation that basically returns results based on a multiple of items you want returned and an offset
Then there are many varied solutions that involve Redis sorted sets, etc., but these also don't really solve the problem of how to remove the duplicate entries.
Does anyone have a fairly reliable way to deal with paginating sorted, dynamic lists without duplicates?
If the items you need to paginate are sorted properly (on unique values), then the only thing you need to do is to select the results by that value instead of by offset.
Simple SQL example:
SELECT * FROM items ORDER BY id DESC LIMIT 10; /* page 1 */
Let's say row #10 has id = 42 (and id is the primary key).
SELECT * FROM items WHERE id < 42 ORDER BY id DESC LIMIT 10; /* page 2 */
If you are using PostgreSQL (MySQL probably has the same problem), this also solves the problem that OFFSET performs poorly (OFFSET N LIMIT M needs to scan N rows!).
If sorting is not unique (e.g. sorting on a creation timestamp can lead to multiple items created at the same time), you are going to have the duplication problem.
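One common way to handle that case, not from the original answer but sketched here for completeness: make the sort unique by adding the primary key as a tiebreaker and paginate on the pair. Assuming PostgreSQL, where a row-value comparison like this works and can be served by a composite index, and assuming the last row of the previous page had price = 42.50 and id = 1234:
SELECT * FROM items
WHERE (price, id) < (42.50, 1234)
ORDER BY price DESC, id DESC
LIMIT 10;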

How to efficiently fetch n most recent rows with GROUP BY in sqlite?

I have a table of event results, and I need to fetch the most recent n events per player for a given list of players.
This is on iOS so it needs to be fast. I've looked at a lot of top-n-per-group solutions that use subqueries or joins, but these run slowly for my 100k-row dataset even on a MacBook Pro. So far my dumb solution, since I will only run this with a maximum of 6 players, is to do 6 separate queries. It isn't terribly slow, but there has to be a better way, right? Here's the gist of what I'm doing now:
results_by_pid = {}
player_ids = [1, 2, 3, 4, 5, 6]
n_results = 6
for pid in player_ids:
    results_by_pid[pid] = exec_sql(f"""SELECT *
                                       FROM results
                                       WHERE player_id = {pid}
                                       ORDER BY event_date DESC
                                       LIMIT {n_results}""")
And then I go on my merry way. But how can I turn this into a single fast query?
There is no better way.
SQL window functions, which might help, are not implemented in SQLite.
SQLite is designed as an embedded database where most of the logic stays in the application.
In contrast to client/server databases where network communication should be avoided, there is no performance disadvantage to mixing SQL commands and program logic.
A less dumb solution requires you to do some SELECT player_id FROM somewhere beforehand, which should be no trouble.
To make the individual queries efficient, ensure you have one index on the two columns player_id and event_date.
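A minimal sketch of that composite index (the index name is only illustrative):
CREATE INDEX IF NOT EXISTS idx_results_player_date
ON results(player_id, event_date);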
This won't be much of an answer, but here goes...
I have found that making things really quick can involve ideas from the nature of the data and schema themselves. For example, searching an ordered list is faster than searching an unordered list, but you have to pay a cost up front - both in design and execution.
So ask yourself if there are any natural partitions on your data that may reduce the number of records SQLite must search. You might ask whether the latest n events fall within a particular time period. Will they all be from the last seven days? The last month? If so then you can construct the query to rule out whole chunks of data before performing more complex searches.
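As a sketch of that idea in SQLite, assuming the results table from the question and a seven-day window (the window and the player_id value are only illustrative):
SELECT *
FROM results
WHERE player_id = 3
  AND event_date >= date('now', '-7 days')
ORDER BY event_date DESC
LIMIT 6;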
Also, if you just can't get the thing to work quickly, you can consider UX trickery! Soooooo many engineers don't get clever with their UX. Will your query be run as the result of a view controller push? Then set the thing going in a background thread from the PREVIOUS view controller, and let it work while iOS animates. How long does a push animation take? .2 seconds? At what point does your user indicate to the app (via some UX control) which playerids are going to be queried? As soon as he touches that button or TVCell, you can prefetch some data. So if the total work you have to do is O(n log n), that means you can probably break it up into O(n) and O(log n) pieces.
Just some thoughts while I avoid doing my own hard work.
More thoughts
How about a separate table that contains the ids of the previous n inserts? You could add a trigger to delete old ids if the size of the table grows above n. Say..
CREATE TABLE IF NOT EXISTS recent_results
(result_id INTEGER PRIMARY KEY, event_date DATE);
-- is DATE a type? I don't know. You get the point.
CREATE TRIGGER IF NOT EXISTS optimizer
AFTER INSERT ON recent_results
WHEN (SELECT COUNT(*) FROM recent_results) > N
BEGIN
  DELETE FROM recent_results
  WHERE result_id = (SELECT result_id
                     FROM recent_results
                     ORDER BY event_date ASC
                     LIMIT 1);
END;
-- or something like that. I have no idea if this will work,
-- I just threw it together.
Or you could just create a temporary memory-based table that you populate at app load and keep up to date as you perform transactions during app execution. That way you only pay the steep price once!
Just a few more thoughts for you. Be creative, and remember that you can usually define what you want as a data structure as well as an algorithm. Good luck!
