First of all, I think that createDailyTimeSeriesEngine is very simple and effective.
Ticks from different exchanges are saved in the same tick stream table, and different exchanges have different trading hours.
How should I aggregate the ticks, and how should I use createDailyTimeSeriesEngine?
What is your aggregation frequency? If it is not too high (meaning the filtered result will not be too large), setting the maximum time range may meet your needs.
Alternatively, you can set the exchange column as the filter column with setStreamTableFilterColumn, and create a separate subscription and calculation engine for each exchange, each configured with that exchange's trading hours.
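To make the second option concrete, here is a minimal Python sketch of the idea. It is not DolphinDB code: the shared tick stream is split by exchange (the role setStreamTableFilterColumn plays), and each exchange gets its own aggregator that only accepts ticks inside that exchange's sessions. The exchange names EX_A/EX_B, their session times, and the class and function names are hypothetical, and ticks are assumed to arrive in time order.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Tick:
    exchange: str
    ts: datetime
    price: float
    volume: int


class SessionBarEngine:
    """Hypothetical per-exchange engine: 1-minute OHLCV bars from in-session ticks only."""

    def __init__(self, sessions):
        self.sessions = sessions  # list of (begin, end) times; end is exclusive
        self.bars = {}            # minute start -> [open, high, low, close, volume]

    def in_session(self, t):
        return any(begin <= t < end for begin, end in self.sessions)

    def push(self, tick):
        if not self.in_session(tick.ts.time()):
            return  # outside this exchange's trading hours: ignore the tick
        key = tick.ts.replace(second=0, microsecond=0)
        bar = self.bars.get(key)
        if bar is None:
            self.bars[key] = [tick.price, tick.price, tick.price, tick.price, tick.volume]
        else:
            bar[1] = max(bar[1], tick.price)
            bar[2] = min(bar[2], tick.price)
            bar[3] = tick.price  # close = latest tick (assumes time-ordered input)
            bar[4] += tick.volume


def route(ticks, engines):
    """Send each tick only to its own exchange's engine (one 'subscription' per exchange)."""
    for tick in ticks:
        engine = engines.get(tick.exchange)
        if engine is not None:
            engine.push(tick)


# Usage with made-up sessions and ticks.
engines = {
    "EX_A": SessionBarEngine([(time(9, 30), time(11, 30)), (time(13, 0), time(15, 0))]),
    "EX_B": SessionBarEngine([(time(9, 0), time(10, 15))]),
}
day = datetime(2022, 5, 19)
route(
    [
        Tick("EX_A", day.replace(hour=9, minute=15), 9.9, 10),  # before EX_A opens: dropped
        Tick("EX_A", day.replace(hour=9, minute=30, second=5), 10.0, 100),
        Tick("EX_A", day.replace(hour=9, minute=30, second=40), 11.0, 50),
        Tick("EX_B", day.replace(hour=9, minute=15, second=10), 5.0, 10),  # EX_B is open
    ],
    engines,
)
```

The same 09:15 timestamp is dropped for one exchange and aggregated for the other, which is exactly what a single shared session setting cannot express.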
I write sensor data every second to an InfluxDB database. Displaying weekly, monthly, or yearly summaries in Grafana is quite slow, since it needs to query many thousands of values.
To speed things up, I was thinking about using a cron job to run queries like:
select mean(sensor1) into data_avg_1h from data where time > start and time <= end group by time(1h)
select mean(sensor1) into data_avg_1d from data where time > start and time <= end group by time(1d)
select mean(sensor1) into data_avg_1w from data where time > start and time <= end group by time(1w)
This would mean I need more storage, but queries would run much faster.
Is this a bodge job, or is it acceptable? And is there a cleverer way to do something like this?
Yes. It is perfectly OK, and in fact recommended, to downsample the data as you describe in the question.
However, instead of a cron job, it is better to use InfluxDB's Continuous Query feature to achieve the same result.
Downsampling & Continuous Query Documentation.
Please be aware that if you store the average value for short periods and later want to calculate the average for a longer period from this downsampled data, you will have to calculate a weighted average. Otherwise, you will be calculating an average of averages, which may not equal the average calculated from the original data.
This is because each downsampled average may be based on a different number of data points.
So when calculating the mean for each interval, also store the number of data points received in that interval (e.g. with COUNT()). That way you will be able to calculate the weighted average.
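A minimal Python sketch of the weighted average, assuming each downsampled row stores both the MEAN() and the COUNT() for its interval. The function name and the sample numbers are illustrative only.

```python
def weighted_mean(buckets):
    """Combine per-interval (mean, count) pairs into the mean of the underlying raw data.

    Each bucket is assumed to hold the MEAN() and COUNT() stored for one
    downsampled interval; weighting each mean by its count undoes the averaging.
    """
    total_count = sum(count for _, count in buckets)
    if total_count == 0:
        raise ValueError("no data points in any bucket")
    return sum(mean * count for mean, count in buckets) / total_count


# Two hourly buckets: raw readings were [10] and [20, 20, 20].
hourly = [(10.0, 1), (20.0, 3)]
naive = sum(m for m, _ in hourly) / len(hourly)  # average of averages
weighted = weighted_mean(hourly)                 # matches the raw-data mean
```

Here the average of averages gives 15.0, while the weighted result and the true raw mean are both 17.5.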
Below is the scenario behind this question.
Requirement:
Pre-aggregate time series data within InfluxDB at granularities of seconds, minutes, hours, days, and weeks for each sensor in a device.
Current Proposal:
When a device is onboarded, create five Continuous Queries (one per granularity level, i.e. seconds, minutes, ...) for each of its sensors, writing into a retention policy different from that of the raw time series data.
Limitation with Current Proposal:
As the number of devices/sensors (time series data sources) grows, InfluxDB will get bloated with too many Continuous Queries, which is not recommended and will take a toll on the InfluxDB instance itself.
Question:
To avoid the above problems, is it possible to create Continuous Queries on the same source measurement (i.e. the raw time series measurement), writing the aggregates back into that measurement with new tags that distinguish the Continuous Query results from the raw time series data?
Example:
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."raw"."strain" FROM "database"."raw"."strain" GROUP BY time(1s),*
END
As far as I know, and from what I have seen in the docs, it's not possible to add new tags in continuous queries.
If I've understood the requirements correctly, this is one way you could approach it.
CREATE CONTINUOUS QUERY "strain_seconds" ON "database"
RESAMPLE EVERY 5s FOR 1m
BEGIN
SELECT MEAN("strain_top") AS "STRAIN_TOP_MEAN" INTO "database"."strain_seconds_retention_policy"."strain" FROM "database"."raw"."strain" GROUP BY time(1s),*
END
This would save the data in the same measurement but in a different retention policy, strain_seconds_retention_policy. When you run a SELECT, you specify the retention policy to read from.
Note that it is not possible to select from several retention policies at the same time. If you don't specify one, the default retention policy is used (not all of them). If that is something you need, a different approach would be required.
I don't quite see why you'd need to define a continuous query per device and per sensor. You only need to define five (one each for seconds, minutes, hours, days, and weeks) and GROUP BY * (all tags), which you already do. As long as the source data point has tags with the IDs of the corresponding device and sensor, the resampled data point will have them too. Data from newly added devices will be processed automatically by those five queries and saved into the corresponding retention policies.
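A minimal Python model (not InfluxDB's implementation) of why five generic queries suffice: grouping by every tag yields one output series per device/sensor tag set, so a newly onboarded device is picked up without any new query. The tag names, the function name, and the data are hypothetical.

```python
from collections import defaultdict


def resample_mean(points, interval_s):
    """Mean per (tag set, time bucket), roughly what GROUP BY time(interval), * yields.

    points: iterable of (tags dict, epoch seconds, value).
    """
    acc = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for tags, ts, value in points:
        key = (tuple(sorted(tags.items())), ts - ts % interval_s)
        acc[key][0] += value
        acc[key][1] += 1
    return {key: total / n for key, (total, n) in acc.items()}


points = [
    ({"device": "d1", "sensor": "s1"}, 0, 1.0),
    ({"device": "d1", "sensor": "s1"}, 30, 3.0),
    ({"device": "d2", "sensor": "s1"}, 10, 5.0),  # newly onboarded device, no new query needed
]
per_minute = resample_mean(points, 60)
```

The output keeps the device and sensor tags in each key, which mirrors how the resampled points inherit the source tags.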
If you do want to apply additional tags, instead of using continuous queries you could process the data outside the database in a custom script and write it back with whatever additional tags you need.
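As a sketch of that script approach, the Python function below formats an aggregated point as InfluxDB line protocol with an extra tag attached. The "granularity" tag is a hypothetical example, only float fields are handled, backslashes in names are not escaped, and sending the lines to InfluxDB (for example via its HTTP write endpoint or a client library) is not shown.

```python
def _escape(s, chars):
    """Backslash-escape characters that are special at this position in line protocol."""
    for ch in chars:
        s = s.replace(ch, "\\" + ch)
    return s


def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as line protocol: measurement,tags fields timestamp."""
    # Tags sorted by key, which InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(v, ', =')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape(k, ', =')}={float(v)!r}" for k, v in fields.items())
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {ts_ns}"


# A per-second mean computed by the script, tagged with its (hypothetical) granularity.
line = to_line_protocol(
    "strain",
    {"device": "d1", "sensor": "s1", "granularity": "1s"},
    {"STRAIN_TOP_MEAN": 2.5},
    1652950188000000000,
)
```

This yields `strain,device=d1,granularity=1s,sensor=s1 STRAIN_TOP_MEAN=2.5 1652950188000000000`, so raw and aggregated points can live in one measurement and be told apart by the extra tag.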
Quotes are not sourced from all markets and may be delayed up to 20 minutes. Information is provided 'as is' and solely for informational purposes, not for trading purposes or advice.
This disclaimer appears when I use the GOOGLEFINANCE() function in my spreadsheet. Unfortunately, the data is delayed by up to 20 minutes.
What is the best way to get real-time stock prices? Suppose my budget is around $50 per month.
Be aware that I trade only US equities, i.e. no bonds, no cryptocurrencies, and so on.
UPDATE
Here is a sample version of my portfolio spreadsheet: https://docs.google.com/spreadsheets/d/1hIfCuupmc_OZ6514DXFe_NrDCX1Ix6tcvySP_VolppI/edit#gid=42667785. It is important for me to get prices in real time, not delayed by up to 20 minutes.
Is there a way to fix that?
The GOOGLEFINANCE formula is not consistent in its delays: different stocks can be delayed by different amounts. You can get an estimate of the delay with GOOGLEFINANCE("TICKER","DATADELAY").
This is at least somewhat helpful, but not ideal, because you'll have a price on your sheet without knowing exactly when it is from, only an estimate of how old it might be. And forget about pre-market or after-hours. Once the market closes, all bets are off: you'll have no idea when the price is from (i.e. an after-hours quote or the regular session close).
If you want accurate real-time quotes, you're going to need an add-on. You said your budget is $50 per month, which doesn't leave you a lot of options. For $9 per month you can use the Market Data Add-on and get real-time stock prices along with historical intraday prices. There is also a free tier that gives you 100 free daily prices.
Market Data's STOCKDATA formula is a drop-in replacement for GOOGLEFINANCE, so it follows the same syntax. It will accomplish what you need. For example, STOCKDATA("SPY","ALL") will produce an output like this:
Date               Bid     Bid Size  Mid     Ask     Ask Size  Last    Volume
5/19/2022 9:09:48  388.36  1400      388.38  388.41  1400      388.37  2715229
Note that the date and time of the quote are included in the output, so you know exactly when the quote was fetched. There is no doubt as to whether the quote is from the previous day or from the pre-market session (as in this example). If you compare it to the current time using NOW(), you'll find the Market Data quotes are delayed by about 1-2 seconds, which is due to network latency between your Google Sheet and the servers.
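To make the NOW() comparison concrete, here is a small Python sketch that computes a quote's age from a Date value in the format of the sample output. It assumes the quote timestamp and the current time are in the same time zone (both naive), and the function name is illustrative; in the sheet itself, NOW() minus the Date cell gives the same age as a fraction of a day.

```python
from datetime import datetime

QUOTE_FORMAT = "%m/%d/%Y %H:%M:%S"  # matches a Date value like "5/19/2022 9:09:48"


def quote_age_seconds(quote_date, now):
    """Seconds elapsed between the quote's timestamp and `now` (same time zone assumed)."""
    return (now - datetime.strptime(quote_date, QUOTE_FORMAT)).total_seconds()


# Hypothetical "current time" two seconds after the sample quote.
age = quote_age_seconds("5/19/2022 9:09:48", datetime(2022, 5, 19, 9, 9, 50))
```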
It's important to notice the word "may" in the first sentence:
...and may be delayed up to 20 minutes...
Usually the delay is well under 20 minutes (around 1 to 1.5 minutes), but there may be times when longer delays occur.
And to answer your question: no, it's not possible to force it under 1 minute.
If you want to go full pro mode with Google Sheets, try =CRYPTOFINANCE().
The documentation links from player0 indicate that ONLY crypto exchanges are supported. Data is NOT available from stock exchanges (NASDAQ, NYSE, etc).
Is it possible to aggregate measurements or create custom queries beyond the standard dateFrom/dateTo queries?
As an example, I have measurements with a time delta of 1 minute (2015-01-01T05:05:00, 2015-01-01T05:06:00, 2015-01-01T05:07:00, ...), and I would like to query the measurements at 15-minute intervals (2015-01-01T05:15:00, 2015-01-01T05:30:00, 2015-01-01T05:45:00, ...).
So far I have only come up with these solutions:
1. Using the standard API request, as in
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05
and then throwing away most of the data, which takes a massive amount of time loading the data.
2. Using CEP (Cumulocity Event Language) to generate a new measurement every 15 minutes from the nearest 1-minute measurement, which seems like a bit of overkill and not very elegant.
3. Batch requesting the exact minute, as in
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-11-05T05:15:00%2B01:00&dateTo=2015-11-05T05:16:00%2B01:00
which would result in a massive number of API requests and also does not seem very efficient.
4. Using the /measurements/series endpoint, which gives me all series, even those I do not want, and only offers hourly and daily aggregation (as far as I can tell).
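To illustrate the cost of option 3, here is a Python sketch that generates the one-minute request windows for each 15-minute mark. It reuses the placeholder tenant URL from the question and assumes a +01:00 offset; the function name is hypothetical. At one request per quarter hour, a single day already needs 96 requests.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote_plus, urlencode

BASE = "https://tenant.cumulocity.com/measurement/measurements"  # placeholder tenant


def minute_window_urls(start, end, step=timedelta(minutes=15)):
    """One request per 15-minute mark, each covering exactly one minute."""
    urls = []
    t = start
    while t <= end:
        params = {
            "dateFrom": t.isoformat(),
            "dateTo": (t + timedelta(minutes=1)).isoformat(),
        }
        # Keep ':' readable; '+' in the UTC offset must be sent as %2B.
        urls.append(BASE + "?" + urlencode(params, safe=":", quote_via=quote_plus))
        t += step
    return urls


tz = timezone(timedelta(hours=1))
urls = minute_window_urls(
    datetime(2015, 11, 5, 5, 15, tzinfo=tz), datetime(2015, 11, 5, 5, 45, tzinfo=tz)
)
```

The first generated URL is identical to the example request in point 3.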
Is there a better way of doing this?
You have captured nearly all of the mechanisms that are currently available. There is one more possibility, though I'm not sure whether it is an option for you:
Mark every fifteenth measurement when sending it from the device, e.g. using a different type.
I would normally use option 2. It's actually quite efficient; it's similar to a materialized view in traditional SQL, and you can use the data everywhere, in all widgets.
Good luck :-)
Cheers,
André
I would prefer the CEP solution. The rule wouldn't be that complicated. You would of course then store these measurements twice, which is not that nice, but having your desired measurement with a specific type or fragment gives you the fastest way to query it.
Instead of copying the measurement, you could just add a special fragment to one measurement every 15 minutes in the CEP rule. Measurements cannot be updated, so you would have to delete the incoming measurement every 15 minutes and then create a new measurement with exactly the same values plus the added fragment (e.g. "aggregatedMeasurement": {}).
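A sketch of the per-measurement logic such a rule would apply, written in Python rather than in Cumulocity's CEP language. The measurement is a plain dict loosely shaped like a measurement JSON object; the "sensorReading" fragment and the function name are hypothetical, timestamps are assumed to fall on whole minutes with an explicit UTC offset, and the actual delete/create API calls are left out.

```python
import copy
from datetime import datetime


def tag_if_quarter_hour(measurement):
    """Return (delete_original, replacement) for one incoming measurement.

    Measurements cannot be updated, so on a 15-minute mark the original is
    deleted and re-created with an extra, empty marker fragment.
    """
    ts = datetime.fromisoformat(measurement["time"])
    if ts.minute % 15 != 0 or ts.second != 0:
        return False, None  # not a quarter-hour mark: leave it alone
    replacement = copy.deepcopy(measurement)
    replacement.pop("id", None)  # the re-created measurement gets a new id
    replacement["aggregatedMeasurement"] = {}
    return True, replacement


incoming = {
    "id": "42",
    "time": "2015-11-05T05:15:00+01:00",
    "type": "sensorReading",
    "sensorReading": {"value": 21.5},
}
delete_original, replacement = tag_if_quarter_hour(incoming)
```

Only the quarter-hour measurements carry the marker fragment, which is what lets a fragmentType query return just those.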
Your query then looks like this:
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05&fragmentType=aggregatedMeasurement
One more idea for point 3:
You could use SmartREST to create a template with the query string and leave the dateFrom and dateTo as placeholders.
From the client side, you would then have to make only one request, using the bulk feature in SmartREST.
On the server side, this would still be transformed into the individual requests, so you wouldn't gain anything in speed.
Is there an example of a time-varying numerical quantity that might be in a data warehouse that cannot be meaningfully aggregated over time? If so, why?
Stock levels cannot, because they represent a value that is already an aggregation at a particular moment in time.
If you have ten items in stock today and ten yesterday, and ten in stock every day this week, you cannot add them up to "70" meaningfully for the whole week, unless you are measuring something like space utilisation efficiency.
Other examples: a bank balance, the speed of a flywheel, or the time since overhaul.
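A small Python sketch of the stock-level example, with made-up numbers: stock on hand is a snapshot, so over time you would report its average or its closing value rather than its sum, whereas a flow such as daily sales can be summed.

```python
from statistics import mean

# Hypothetical data for one item over one week.
daily_stock = [10, 10, 10, 10, 10, 10, 10]  # units on hand at each day's close (a level)
daily_sales = [3, 1, 4, 1, 5, 9, 2]         # units sold during each day (a flow)

meaningless_total = sum(daily_stock)  # 70 "item-days": only useful for e.g. space utilisation
average_stock = mean(daily_stock)     # average level over the week
closing_stock = daily_stock[-1]       # level at the end of the week
weekly_sales = sum(daily_sales)       # flows can be added up over time
```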
Many subatomic processes can be observed using our notion of "time", but their measurements probably wouldn't make much sense when aggregated. This is because our notion of "time" doesn't make much sense at the quantum level.