InfluxQL Continuous Query which deletes data - influxdb

Is there a way to have a continuous query which deletes data, something like:
CREATE CONTINUOUS QUERY "some_name" ON "mydb"
BEGIN
DELETE FROM "some_measurement" WHERE something = 'bad'
END
When this is run against InfluxDB 1.8, it fails with an error:
ERR: error parsing query: found DELETE, expected SELECT at line 1, char 53

Not that I'm aware of. However, you're not necessarily out of luck: this is exactly the sort of workflow that retention policies were designed to handle in concert with continuous queries.
Assuming your use case is something like "downsample this data into a different measurement, then delete the high-granularity data", the official documentation provides an excellent overview.
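As a rough sketch of that combination (InfluxQL 1.x syntax; the policy names, durations, and the mean("value") rollup are placeholders for your own schema):
-- Raw points expire automatically after two days:
CREATE RETENTION POLICY "two_days" ON "mydb" DURATION 2d REPLICATION 1 DEFAULT
-- The downsampled series is kept for a year:
CREATE RETENTION POLICY "one_year" ON "mydb" DURATION 52w REPLICATION 1
-- Continuously roll the raw series up into the long-lived policy:
CREATE CONTINUOUS QUERY "cq_downsample" ON "mydb" BEGIN
  SELECT mean("value") AS "value"
  INTO "mydb"."one_year"."some_measurement_1h"
  FROM "some_measurement"
  GROUP BY time(1h), *
END
With this in place, the high-granularity data in "some_measurement" simply ages out after two days; no explicit DELETE is needed.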

Related

Filtering MS Graph query for Planner Tasks

I am querying MS Graph for Planner tasks: https://graph.microsoft.com/v1.0/Planner/Plans/PlanID/tasks
This returns all the tasks in the planner. I am hoping to filter these tasks by 'createdDateTime' greater than or equal to last month, OR 'percentComplete' less than 100.
I'm new to querying data, so I am unsure what syntax to use. I was hoping $top=x would be based on createdDateTime, but if I use this
https://graph.microsoft.com/v1.0/Planner/Plans/planID/tasks?$Top=20
it still returns all of the tasks.
Thank you
Unfortunately, Planner doesn't support filters at this time. The recommended approach is for the client to read the data and filter on the client side.
For general filtering and query parameters, this documentation should help: https://learn.microsoft.com/en-us/graph/query-parameters?view=graph-rest-1.0
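As an illustration of the client-side approach, here is a minimal Python sketch (ACCESS_TOKEN and PLAN_ID are placeholders; createdDateTime and percentComplete are fields of the v1.0 plannerTask resource):
from datetime import datetime, timedelta, timezone

import requests

ACCESS_TOKEN = "..."  # placeholder: obtain via your usual auth flow
PLAN_ID = "..."       # placeholder plan id

resp = requests.get(
    f"https://graph.microsoft.com/v1.0/planner/plans/{PLAN_ID}/tasks",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
# Large result sets may be paged; follow @odata.nextLink for further pages.
tasks = resp.json()["value"]

one_month_ago = datetime.now(timezone.utc) - timedelta(days=30)

def keep(task):
    # createdDateTime is an ISO-8601 UTC timestamp such as "2021-03-01T12:00:00Z"
    created = datetime.fromisoformat(task["createdDateTime"].replace("Z", "+00:00"))
    return created >= one_month_ago or task["percentComplete"] < 100

filtered = [t for t in tasks if keep(t)]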

How to run complex queries in Tarantool

I've always worked with relational DBs and recently decided to migrate a performance-critical service from SQL Server to Tarantool, hoping to take advantage of fast in-memory search and processing. I've got a couple of questions while planning for the migration.
I've got a table with about one million records containing pricing information which means I'm dealing mostly with numbers and uuids. First, I need to run a select containing multiple conditions to get a subset of the data, like
SELECT * FROM rates WHERE SupplierId = #SupplierId AND ProductId = #ProductId AND (LocalDistributionZoneId = #LocalDistributionZoneId OR LocalDistributionZoneId IS NULL)
Q1: What is the strategy for running such a query in Lua? Do I create an index for each field in the predicate, or can I get by with one secondary composite index?
Q2: Will it be more convenient to run such a query in SQL (box.sql.execute) rather than in pure Lua? Will it be considerably slower than running the same query in pure Lua?
Q3: If I use SQL, is it possible to review the execution plan to make sure that the query I run really uses the indexes I've defined in the space?
OK, after I get the results from the first query, I need to analyse the data and then, based on the results of the analysis, run one more query on the dataset returned by the first query.
Q4: Can Tarantool help me deal with the intermediate dataset? More specifically, can I somehow run more queries against the intermediate subset of tuples, leveraging the indexes created in the space? Or would I need to implement alternative strategies, like re-adding the interim results to a temporary space with pre-defined indexes and then doing another select, or implementing the further search myself?
Thank you!
Q1: Don't run it in pure Lua. Use SQL; it's faster because it doesn't create garbage-collected objects for intermediate execution results.
Q2: Yes, please use our SQL features for that.
Q3: Use the EXPLAIN statement.
Q4: I don't know what exactly you mean by "help". You could try whatever strategy works best: create a more complex query, save the original query in a view to use in the resulting query, or create a temporary table and work with it. To give more details, let's look at whether the execution plan Tarantool chooses is good enough, or whether you have to optimize it manually.
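As a concrete starting point, a minimal sketch (assuming Tarantool 2.x, where box.execute replaced box.sql.execute; the space needs a format defined for named index parts, and all names here simply mirror the question):
-- One composite secondary index covering the equality part of the predicate:
box.space.rates:create_index('supplier_product', {
    unique = false,
    parts = {{'SupplierId', 'uuid'}, {'ProductId', 'uuid'}},
})

-- Run the query through the SQL engine and inspect the chosen plan
-- (supplier_id, product_id, zone_id are placeholder Lua variables):
box.execute([[EXPLAIN QUERY PLAN
    SELECT * FROM "rates"
    WHERE "SupplierId" = ? AND "ProductId" = ?
      AND ("LocalDistributionZoneId" = ? OR "LocalDistributionZoneId" IS NULL);]],
    {supplier_id, product_id, zone_id})
If the plan shows a full scan instead of the supplier_product index, that is the point at which to start manual optimization.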

One single Azure SQL query is consuming almost all query_stats.total_worker_time and query_stats.execution_count

I've been running a production website on Azure SQL for four years.
With the help of the 'Top Slow Request' query from alexsorokoletov on GitHub, I have found one super slow query according to the Azure query stats.
The one on top is the one that uses a lot of CPU.
When looking at the LINQ query and the execution plans / live stats, I can't find the bottleneck yet.
[screenshots of the execution plan and the live query statistics]
The join from results to project is not direct; there is a projectsession table in between, not visible in the query but perhaps under the hood of Entity Framework.
Might I be affected by parameter sniffing? Can I reset a hash? Maybe the optimized query plan dates from 2014, and now the result table holds about 4 million rows, so the plan is far from optimal?
If I run this query in Management Studio, it's very fast!
Is it just the stats that are wrong?
Regards
Vincent - The Netherlands.
I would suggest you try adding OPTION (HASH JOIN) at the end of the query, if possible. Once you start getting into large arity, a loops join is not particularly efficient. That would prove out whether there is a more efficient plan (likely yes).
Without seeing more of the details (your screenshots are helpful, but they cut off whether auto-parameterization or forced parameterization has kicked in and auto-parameterized your query), it is hard to confirm or deny this explicitly. You can read more about parameter sniffing in a blog post I wrote a bit longer ago than I care to admit ;):
https://blogs.msdn.microsoft.com/queryoptteam/2006/03/31/i-smell-a-parameter/
Ultimately, if you update stats, run DBCC FREEPROCCACHE, or otherwise cause this plan to recompile, your odds of getting a faster plan in the cache are higher if this particular query and parameter values are executed often enough to be sniffed during plan compilation. Your other option is to add an OPTIMIZE FOR UNKNOWN hint, which will disable sniffing and direct the optimizer to use an average value for the frequency of any filters over parameter values. This will likely encourage more hash or merge joins instead of loops joins, since the cardinality estimates of the operators in the tree will likely increase.
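To make both suggestions concrete, here is a hedged T-SQL sketch (the table and column names are hypothetical stand-ins for the EF-generated query):
DECLARE @ProjectId int = 42;  -- hypothetical parameter value

-- Force a hash join to test whether the cached loops-join plan is the problem:
SELECT r.*
FROM dbo.Results AS r
JOIN dbo.ProjectSessions AS ps ON ps.Id = r.ProjectSessionId
JOIN dbo.Projects AS p ON p.Id = ps.ProjectId
WHERE p.Id = @ProjectId
OPTION (HASH JOIN);

-- Alternatively, disable sniffing for this statement so the optimizer plans
-- for average parameter values instead of the first sniffed ones:
-- ... same query ... OPTION (OPTIMIZE FOR UNKNOWN);

-- To evict the cached plan and force a recompile:
DBCC FREEPROCCACHE;  -- on Azure SQL Database this clears the current database's plan cache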

Dynamic query usage while streaming with Google Dataflow?

I have a Dataflow pipeline that is set up to receive information (JSON) and transform it into a DTO, which is then inserted into my database. This works great for inserts, but where I am running into issues is handling delete records. The information I am receiving includes a deleted tag in the JSON that specifies when a record is actually being deleted. After some research/experimenting, I am at a loss as to whether or not this is possible.
My question: Is there a way to dynamically choose (or change) which SQL statement the pipeline uses while streaming?
To achieve this with Dataflow you need to think more in terms of water flowing through pipes than in terms of if-then-else coding.
You need to classify your records into INSERTs and DELETEs and route each set to a different sink that will do what you tell it to. You can use tags for that.
In this pipeline design example, instead of startsWithATag and startsWithBTag you can use tags for Insert and Delete, as in the sketch below.
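A minimal sketch of that routing in Java with Apache Beam (the tag names, the string-matching on the deleted flag, and the surrounding pipeline are hypothetical simplifications):
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Two tags: the main output carries inserts, the additional output carries deletes.
final TupleTag<String> insertTag = new TupleTag<String>() {};
final TupleTag<String> deleteTag = new TupleTag<String>() {};

// "records" is assumed to be a PCollection<String> of the incoming JSON payloads.
PCollectionTuple routed = records.apply("RouteByDeletedFlag",
    ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // Simplification: a real pipeline would parse the JSON rather than
        // string-match the deleted flag.
        if (c.element().contains("\"deleted\":true")) {
          c.output(deleteTag, c.element());
        } else {
          c.output(c.element()); // main output: inserts
        }
      }
    }).withOutputTags(insertTag, TupleTagList.of(deleteTag)));

PCollection<String> inserts = routed.get(insertTag);
PCollection<String> deletes = routed.get(deleteTag);
// Each branch now goes to its own sink, e.g. a JdbcIO.write() configured with
// an INSERT statement for "inserts" and a DELETE statement for "deletes".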

grafana with sharded influxdbs?

I've been investigating the use of influxdb for storing our metrics.
Seeing that InfluxDB does not offer a clustered version for free, there is an 'alternative': influxdb-relay, which can handle both replication and 'write-sharding'.
But the relay does not handle read queries.
In Grafana you can define multiple data sources, each pointing to a specific shard, but from what I can tell it cannot combine the results from the shards into one data series. (I know Graphite has 'web-master', which does just this: it queries multiple graphite-web instances and combines the results prior to rendering.) Is there such a beast out there for Grafana/InfluxDB?
If not, how difficult do folks think it would be to update the influxdb relays to accept queries, query all shards matching the query, merge the results, and so on? Yeah, I know that depending on the query, the merge actions would likely require query interpretation, which is where it gets difficult.
Thoughts?
