I am familiar with HKStatisticsCollectionQuery in HealthKit, but I am not clear how to run a query for the total number of minutes of activity in a specified time frame, rather than the total number of some QuantityType units. Is this covered somewhere in the documentation?
Perform an HKStatisticsQuery specifying the HKQuantityTypeIdentifierAppleExerciseTime quantity type to get the total number of exercise minutes for a given time period.
Related
I'm using the GoogleFinance() functions on a Google spreadsheet to keep track of my stocks. With the "datadelay" attribute I can check how long ago the data has been updated for the last time. But it only returns a raw number, like "54000" for one ticker and "15" for another. What time unit is that supposed to be? minutes? seconds? milliseconds?
When I check the documentation for the Google Finance I saw that there is a page explains there might be delay up to 20 minutes. They also mentioned that they are using different exchanges to retrieve market data and all this different exchanges might have different data delay. It can explain the differences in the "datadelay" column.
For the unit of this column, my assumption is it should be shorter than seconds since 54000 seconds = 900 min, which is far higher than the maximum delay defined in the help page. But I am not sure what would be value for this column when you query in the not-trading days.
The page shows delays for each exchange.
GoogleFinance() function updates every minute (if it is set so), but keep in mind that results may be delayed up to 20 minutes. so the answer is between 1-20 minutes
If my interpretation is correct, according to the documentation provided here:InfluxDB Downsampling when we down-sample data using a Continuous Query running every 30 minutes, it runs only for the previous 30 minutes data.
Relevant part of the document:
Use the CREATE CONTINUOUS QUERY statement to generate a CQ:
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
INTO "a_year"."downsampled_orders"
FROM "orders"
GROUP BY time(30m)
END
That query creates a CQ called cq_30m in the database food_data.
cq_30m tells InfluxDB to calculate the 30-minute average of the two
fields website and phone in the measurement orders and in the DEFAULT
RP two_hours. It also tells InfluxDB to write those results to the
measurement downsampled_orders in the retention policy a_year with the
field keys mean_website and mean_phone. InfluxDB will run this query
every 30 minutes for the previous 30 minutes.
When I create a Continuous Query it actually runs on the entire dataset, and not on the previous 30 minutes. My question is, does this happen only the first time after which it runs on the previous 30 minutes of data instead of the entire dataset?
I understand that the query itself uses GROUP BY time(30m) which means it'll return all data grouped together but does this also hold true for the Continuous Query? If so, should I then include a filter to only process the last 30 minutes of data in the Continuous Query?
What you have described is expected functionality.
Schedule and coverage
Continuous queries operate on real-time data. They use the local server’s timestamp, the GROUP BY time() interval, and InfluxDB database’s preset time boundaries to determine when to execute and what time range to cover in the query.
CQs execute at the same interval as the cq_query’s GROUP BY time() interval, and they run at the start of the InfluxDB database’s preset time boundaries. If the GROUP BY time() interval is one hour, the CQ executes at the start of every hour.
When the CQ executes, it runs a single query for the time range between now() and now() minus the GROUP BY time() interval. If the GROUP BY time() interval is one hour and the current time is 17:00, the query’s time range is between 16:00 and 16:59.999999999.
So it should only process the last 30 minutes.
Its a good point about the first run.
I did manage to find a snippet from an old document
Backfilling Data
In the event that the source time series already has data in it when you create a new downsampled continuous query, InfluxDB will go back in time and calculate the values for all intervals up to the present. The continuous query will then continue running in the background for all current and future intervals.
https://influxdbcom.readthedocs.io/en/latest/content/docs/v0.8/api/continuous_queries/#backfilling-data
Which would explain the behaviour you have found
I am storing Amps, Volts and Watts in influxdb in a measurement/table called "Power". The frequency of update is approx every second. I can use the integral function to get power usage (in Amp Hour or Watt Hour) on an hourly basis. This is working very nicely, so I can get a graph of power used each hour over a 24 hour period. My SQL is below.
The issue is if there is a gap in the data then I get a huge spike in the result when it returns. eg if data was missing from 3pm to 5.45 pm, then the 5 pm result shows a huge spike. Reason I can see is there is close to 3 hours gap, so it just calculates the area under the graph and lumps it into the 5 PM value. Can I avoid that?
SELECT INTEGRAL(Watts) FROM Power WHERE time > now() - 24h GROUP BY time(1h)
I had a similar issue with Influx. It turns out that integral() doesn't support fill() as noted by Yuri Lachin in the comments.
Since you're grouping by hours anyway, then the average value of the power (watts) for the hour happens to be equal to the energy consumption for the hour (watt-hours), so you can use the mean() value here and you should get the correct result.
The query I'm using is:
SELECT mean("load_power") AS "load"
FROM "power_readings"
WHERE $timeFilter
GROUP BY time(1h) fill(0)
For daily numbers I can go back to using integral() because I rarely have gaps of data that span multiple days, so there's no filler needed.
Since you can use the fill() function in this query, you can decide what makes the most sense of the various options (see https://docs.influxdata.com/influxdb/v1.7/query_language/data_exploration/#group-by-time-intervals-and-fill)
You need to use fill() in group by section of query (see docs for Influx fill() usage).
In your case fill(none) or fill(0) should do the job.
Dataset: I'm given the number of minutes individual customers use a product each day and am trying to cluster this data in order to find common usage patterns.
My question: How can I format the data so that, for example, a power user with high levels of use for a year looks the same as a different power user who has only been able to use the device for a month before I ended data collection?
So far I've turned each customer into an array where each cell is the number of minutes used that day. This array starts when the user first uses the product and ends after the user's first year of use. All entries in the cells must be double values (e.x. 200.0 minutes used) for the clustering model. I've considered either setting all cells/days after the last day of data collection to either -1.0 or NULL. Are either of these a valid approach? If not what would you suggest?
For the problem where you want both users (one that used the product a lot every day for a year, and the other used it a lot for one month), create a new entry where it's values are:
avg_usage per time_bin
time_bin can be a month, a day or another time bin which best fits your needs.
This way, a user which use a product, let's say 200 minutes per day for one year, will get:
200 * 30 * 12 / 12 = 6000 minutes per month
and the other user, which joined just last month, will also get, with the exact same usage will get:
200 * 30 * 1 / 1 = 6000 minutes per month.
This way, it doesn't matter when you have started to use the product, the only thing that matter, is the usage rate.
An important thing you might take into consideration, that products, may be forgotten for some time. for example, a computer, and I'm away for a vacation. Those days I didn't use my computer, doesn't have (maybe) an effect of my general usage of this product. So, based on your data, product and intuition you might consider removing gaps like the one I mentioned, and not take it into account inside the calculation.
The amount of time a user has used your product could be a signal of something, but if indeed he only started some time ago, and still using it until today, it may be something you need to take into consideration, and for that use, this average binning technique may help.
I'm measuring how long users are logged into a service. Every minute, for each user, their new total online time is sent to InfluxDB. I'd like to graph, in Grafana, the cumulative online time for all users.
What kind of query would I need to do that? I initially thought that I'd want sum(onlineTime) and group by time(1m), but I realized that's summing the values within that timeframe, not summing the totals of all users, so when a user wasn't logged in, the total would drop, because there were not data points for them.
I'm a bit confused about what I'm doing now. If I'm sending the wrong data, I can change that too.
So this depends on the time data you send back to InfluxDB
The time data is equal to the total time spent till that instant of time
In this case you would have to take the "last" value and add it up for all the users
The time is equal to the small increments
In this case you would have to add this multiple incremental value for a period of time.