Timezone-aware queries in Azure Data Explorer

I use Azure Data Explorer to store temperature sensor values. The timestamps are in UTC. I want to aggregate these values by day for the last 7 days. However, I want to use the local time of the place the values came from and aggregate by the timestamps in local time (e.g. local midnight at UTC+2 corresponds to 22:00 UTC of the previous day). How can I do this with the Kusto Query Language in ADX?

E.g. if you want to apply the timezone UTC+1, you can extend your Kusto query like this:
| extend Timestamp = Timestamp + 3600s
Your filters for a time range would still need to be provided in UTC though.

Offsets work, but you cannot simply use a fixed offset if you care about daylight saving time or want a truly general solution. If you're doing something that regularly produces reports AND the time zone must be correct, read on.
It felt like a bit of a hack, but the way we achieved something along these lines was to create a time zone table with columns like this:
BeginOfDay: datetime(2020-01-01 00:00:00)
Timezone: "Africa/Addis_Ababa"
UTCStart: datetime(2020-01-01 00:00:00)-3h
UTCEnd: datetime(2020-01-02 00:00:00)-3h
There should be one row for every combination of time zone and day of interest. We populated something like ten years into the future. If you're worried about storage space or speed you only need to include the date range and time zones you care about, but even with 'everything' it was not a very large table.
Each row contains the 'day' BeginOfDay, which is always local midnight (read it as "the first of January, 2020"), and then the start and end of that local day in UTC time. We wrote a program to generate the contents of the table, of course.
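A minimal sketch of such a generator in Python (not our actual program; it assumes Python 3.9+ for the standard-library zoneinfo module, and the CSV output format is illustrative):

from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo
import csv, sys

UTC = ZoneInfo("UTC")

def timezone_days(tz_name, first_day, last_day):
    # Yield one (BeginOfDay, Timezone, UTCStart, UTCEnd) row per local day, DST-aware
    tz = ZoneInfo(tz_name)
    day = first_day
    while day <= last_day:
        start = datetime.combine(day, time.min, tzinfo=tz)                    # local midnight
        end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)  # next local midnight
        yield (day.isoformat(), tz_name,
               start.astimezone(UTC).strftime("%Y-%m-%d %H:%M:%S"),
               end.astimezone(UTC).strftime("%Y-%m-%d %H:%M:%S"))
        day += timedelta(days=1)

csv.writer(sys.stdout).writerows(
    timezone_days("Africa/Addis_Ababa", date(2020, 1, 1), date(2020, 1, 3)))

The three rows it prints match the datatable used below.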
After that, you can do something like:
let TimezoneDay = datatable (BeginOfDay:datetime, Timezone:string, UTCStart:datetime, UTCEnd:datetime)
[datetime(2020-01-01), "Africa/Addis_Ababa", datetime(2019-12-31 21:00:00), datetime(2020-01-01 21:00:00),
datetime(2020-01-02), "Africa/Addis_Ababa", datetime(2020-01-01 21:00:00), datetime(2020-01-02 21:00:00),
datetime(2020-01-03), "Africa/Addis_Ababa", datetime(2020-01-02 21:00:00), datetime(2020-01-03 21:00:00)
];
let TemperatureEvents = datatable (Timestamp:datetime, Device:string, Temperature:real)
[datetime(2020-01-01 05:00:00), "Device 1", 10.5,
datetime(2020-01-01 07:00:00), "Device 1", 30.5,
datetime(2020-01-02 01:50:00), "Device 1", 24.0,
datetime(2020-01-02 20:00:00), "Device 1", 20.5,
datetime(2020-01-02 23:50:00), "Device 1", 19.5,
datetime(2020-01-01 10:20:00), "Device 2", 0.5
];
TimezoneDay
| where Timezone == "Africa/Addis_Ababa"
// Use a dummy column to emulate a cross join
| extend dummy=1
| join kind=inner (TemperatureEvents | extend dummy = 1) on dummy
// Keep only events inside each local day; half-open bounds avoid
// double-counting an event landing exactly on a day boundary ('between' is inclusive at both ends)
| where Timestamp >= UTCStart and Timestamp < UTCEnd
| summarize AverageTemp=avg(Temperature) by BeginOfDay, Timezone, Device
The cross join may be a little expensive if you have a large dataset, but this is a starting point - you can also do a time window join to restrict the number of events you consider for each 'day'.

Azure Data Explorer doesn't have any built-in functions for converting between time zones.
The documentation recommends:
... Should time zone values be required to be kept as a part of the data, a separate column should be used (providing offset information relative to UTC).
Thus, you should store two values: the original UTC-based timestamp so you can properly order the data, and the date in the local time zone so you can aggregate by local day.
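If you control ingestion, computing that second column up front is straightforward. A minimal Python sketch, assuming zoneinfo (Python 3.9+); the helper and column names are illustrative:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def enrich(utc_ts, tz_name):
    # Keep the original UTC timestamp and add the local calendar day
    local = utc_ts.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tz_name))
    return {"Timestamp": utc_ts.isoformat(), "LocalDay": local.date().isoformat()}

# 22:00 UTC on Jan 1 is already past midnight in Addis Ababa (UTC+3)
print(enrich(datetime(2020, 1, 1, 22, 0), "Africa/Addis_Ababa"))
# {'Timestamp': '2020-01-01T22:00:00', 'LocalDay': '2020-01-02'}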

Related

DynamoDB Timeseries: Querying large timespans of data

I have a simple timeseries table:
{
"n": "EXAMPLE", # Name, Hash Key
"t": 1640893628, # Unix Timestamp, Range Key
"v": 10 # Value being stored
}
Every 15 minutes I will poll data and insert into the table. If I want to query values between a 24-hour period, this works well - this would equate to a total of 96 records.
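For concreteness, that 24-hour query might look like this in boto3 (the table name is illustrative):

import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("timeseries")  # illustrative name

now = int(time.time())
resp = table.query(
    KeyConditionExpression=Key("n").eq("EXAMPLE") & Key("t").between(now - 86400, now)
)
items = resp["Items"]  # ~96 records at one sample per 15 minutes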
Now, say I want to query a larger timespan - 1 or 2 years. That is tens of thousands of records, which is (in my opinion) impractical to fetch regularly. Retrieving larger time ranges would require multiple queries, hurting response times and costing considerably more.
I have thought of a couple of potential solutions to this problem:
1. Replicate data in another table, with larger increments. A table with a single record every 6 hours, for example.
2. Have another table to store common query results, such as records for "EXAMPLE" for the past week, month, and year (respectively). I would periodically update records in the new table to hold every Nth record from the main table (a total of 100). Something like:
{
"n": "EXAMPLE#WEEKLY",
"v": [
{
"t": 1640893628,
"v": 10
},
{
"t": 1640993628,
"v": 15
},
... 98 more.
]
}
I believe #2 is a solid approach. It seems to me like this would be a common enough problem, so I would love to hear about how other people have approached it.
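For concreteness, maintaining the #2 rollup could look something like this in boto3 (a sketch only; the singleton range-key value and the helper name are hypothetical):

import boto3

table = boto3.resource("dynamodb").Table("timeseries")  # illustrative name

def update_weekly_rollup(name, samples):
    # Cache every Nth sample (~100 points) under a synthetic hash key
    step = max(1, len(samples) // 100)
    table.put_item(Item={
        "n": f"{name}#WEEKLY",
        "t": 0,                      # placeholder range key for the singleton item
        "v": samples[::step][:100],  # list of {"t": ..., "v": ...} maps
    })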
More options present themselves if you can convert your unix timestamps into ISO 8601-type strings like 2021-12-31T09:27:58+00:00.
If so, DynamoDB's begins_with key condition expression lets us query for discrete calendar time buckets. December 2021, for example, is queryable using n = id1 AND begins_with(t, "2021-12"). The same goes for days and hours. We can take this one step further by adding other periods in indexes.
Some rolling windows are possible, too: n = id1 AND t > [24 hours ago] gives us last 24h.
n (PK) | t (SK)           | hour_bucket (LSI1 SK) | week (LSI2 SK)
-------|------------------|-----------------------|---------------
id1    | 2021-12-31T10:45 | 2021-12-31T09-12      | 2021-52
id1    | 2021-12-31T13:00 | 2021-12-31T13-15      | 2021-52
id1    | 2022-06-01T22:00 | 2022-06-01T22-24      | 2022-22
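In boto3 terms, those bucket queries might look like this (a sketch; the table name is illustrative):

from datetime import datetime, timedelta, timezone
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("timeseries")  # illustrative name

# All of December 2021 for one series, straight off the string sort key
december = table.query(
    KeyConditionExpression=Key("n").eq("id1") & Key("t").begins_with("2021-12")
)

# A rolling last-24h window is a plain lexicographic comparison
since = (datetime.now(timezone.utc) - timedelta(hours=24)).strftime("%Y-%m-%dT%H:%M")
last_24h = table.query(
    KeyConditionExpression=Key("n").eq("id1") & Key("t").gt(since)
)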
If you are looking for arbitrary time-series queries, you might consider Athena, as the other answer suggested, or AWS's serverless Timestream, which is a "purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day."
You could export the table to Amazon S3 and run Amazon Athena on the exported data. Here’s a blog post describing the process: https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/

Google sheets find from named range a specified date&time display description from that named range

I'm writing a schedule for my wife and me in Google Sheets that seemed to be working, but it is missing some of our scheduled meetings/appointments. A link to an exact copy is found here.
The idea is that within the "Event Scheduler" I can list DATE | TIME | DESCRIPTION, and it generates a UNIQUE ID using =V14&"|"&COUNTIF(V$14:V14,V14) as well as a DATE&TIME key in DATE&TIME format. I highlight each person's (or animal's) scheduled items in named ranges called "MolliesEvents", "AydensEvents" and "DogsEvents". On the main page, "Daily + Weekly", I compare the current date TODAY() and the time in column E against the DATE&TIME column of the "Event Scheduler" sheet. Weirdly, this works for some of the scheduled items, but a significant minority is not captured by the formula I've used. Weirder still, when I manually compare the time in the "Event Scheduler" with the time and date on the "Daily + Weekly" page, I get a positive result; when I automate the comparison, it does not match.
You are losing events due to rounding.
For example, consider the event "Monday, March 22, 2021 11:30:00 Doctor calling".
The cell 'Event Scheduler'!F18 contains 442770.479166666666667.
In cell 'Daily + Weekly'!F14, your formula computes TODAY()&E14 = 442770.479166666666666.
Because 442770.479166666666667 does not equal 442770.479166666666666, this event never shows up on the sheet. The same applies to the other lost events.
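This is the classic floating-point equality trap; the same effect is easy to reproduce in Python, for example:

print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004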
Possible Solution.
Delete the 'Time Intervals' sheet and enter the time manually on the 'Daily + Weekly' sheet in column E.
Also, I would change the formula like this:
=IFERROR(INDEX(MolliesEvents, MATCH(TODAY()&E14, 'Event Scheduler'!$F$14:$F$1000,0),3),"") (this goes in F14; copy it to the other cells).
Here's an example of a spreadsheet that works well.

PostgreSQL Combine columns and convert to timestamp with local time zone

I'm creating a time slot table in Rails with PostgreSQL that contains columns like
slots
name | type
-----|-----
day | date
hour | int
min | int
hour would be like 11, 12, 13, 14 ...
min would be like 0, 5, 10, 15 ...
I'm trying to use these three columns to create a timestamp that I can compare against Time.now to pull upcoming records.
Since PG's to_timestamp function creates a timestamp with UTC as the default time zone, I want the time created from the three columns to use the server's time zone. My attempt is below.
Slot.select("
to_timestamp(
concat_ws(
' ',
day::text,
concat_ws(
':',
hour::text,
min::text),
'#{Time.now.zone}'),
'YYYY-MM-DD HH24:MI (TZ)')
AS t")
And it gives me the error:
PG::FeatureNotSupported: ERROR: "TZ"/"tz"/"OF" format patterns are not supported in to_date
Any suggestions or thoughts would be great.
Thanks
The to_timestamp() function returns a timestamp with time zone value. If you do not explicitly specify a time zone, then the time zone of the server is used. That seems to be all that you need, so you can safely forget about specifying anything beyond the simple date and time.
Seeing what you are trying to do, however, it would be much easier to use the make_time() function and add the resulting time to the day date to get the timestamp you need. This saves you lots of conversions to text and then back to a timestamp:
Slot.select("day + make_time(hour, min, 0.0::float) AS t");

Store the day of the week and time?

I have a two-part question about storing days of the week and time in a database. I'm using Rails 4.0, Ruby 2.0.0, and Postgres.
I have certain events, and those events have a schedule. For the event "Skydiving", for example, I might have Tuesday and Wednesday and 3 pm.
Is there a way for me to store the record for Tuesday and Wednesday in one row or should I have two records?
What is the best way to store the day and time? Is there a way to store day of week and time (not datetime) or should these be separate columns? If they should be separate, how would I store the day of the week? I was thinking of storing them as integer values, 0 for Sunday, 1 for Monday, since that's how the wday method for the Time class does it.
Any suggestions would be super helpful.
Is there a way for me to store the record for Tuesday and
Wednesday in one row or should I have two records?
There are several ways to store multiple time ranges in a single row. #bma already provided a couple of them. That might be useful to save disk space with very simple time patterns. The clean, flexible and "normalized" approach is to store one row per time range.
What is the best way to store the day and time?
Use a timestamp (or timestamptz if multiple time zones may be involved). Pick an arbitrary "staging" week and just ignore the date part while using the day and time aspect of the timestamp. Simplest and fastest in my experience, and all date and time related sanity-checks are built-in automatically. I use a range starting with 1996-01-01 00:00 for several similar applications for two reasons:
The first 7 days of the week coincide with the day of the month (for sun = 7).
It's also the most recent leap year that starts on a Monday (providing Feb. 29 for yearly patterns).
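To make the trick concrete, a small Python sketch mapping an ISO day-of-week plus wall-clock time onto that staging week (the helper name is mine):

from datetime import datetime, time, timedelta

STAGING_MONDAY = datetime(1996, 1, 1)  # a Monday; 1996 is also a leap year

def staging_ts(isodow, t):
    # Map day of week (Mon=1 .. Sun=7) and a time onto the staging week
    return STAGING_MONDAY + timedelta(days=isodow - 1,
                                      hours=t.hour, minutes=t.minute)

print(staging_ts(2, time(15, 0)))               # Tuesday 3 pm -> 1996-01-02 15:00:00
print(staging_ts(2, time(15, 0)).isoweekday())  # 2, the weekday is recoverable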
Range type
Since you are actually dealing with time ranges (not just "day and time"), I suggest using the built-in range type tsrange (or tstzrange). A major advantage: you can use the arsenal of built-in range functions and operators. Requires Postgres 9.2 or later.
For instance, you can have an exclusion constraint building on that (implemented internally by way of a fully functional GiST index that may provide additional benefit), to rule out overlapping time ranges. Consider this related answer for details:
Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL
For this particular exclusion constraint (no overlapping ranges per event), you need to include the integer column event_id in the constraint, so you need to install the additional module btree_gist. Install once per database with:
CREATE EXTENSION btree_gist; -- once per db
Or you can have one simple CHECK constraint to restrict the allowed time period using the "range is contained by" operator <@.
Could look like this:
CREATE TABLE event (event_id serial PRIMARY KEY, ...);
CREATE TABLE schedule (
event_id integer NOT NULL REFERENCES event(event_id)
ON DELETE CASCADE ON UPDATE CASCADE
, t_range tsrange
, PRIMARY KEY (event_id, t_range)
, CHECK (t_range <@ '[1996-01-01 00:00, 1996-01-09 00:00)'::tsrange) -- restrict period
, EXCLUDE USING gist (event_id WITH =, t_range WITH &&) -- disallow overlap
);
For a weekly schedule use the first seven days, Mon-Sun, or whatever suits you. Monthly or yearly schedules in a similar fashion.
How to extract day of week, time, etc?
#CDub provided a module to deal with it on the Ruby end. I can't comment on that, but you can do everything in Postgres as well, with impeccable performance.
SELECT ts::time AS t_time -- get the time (practically no cost)
SELECT EXTRACT(DOW FROM ts) AS dow -- get day of week (very cheap)
Or in similar fashion for range types:
SELECT EXTRACT(DOW FROM lower(t_range)) AS dow_from -- day of week lower bound
, EXTRACT(DOW FROM upper(t_range)) AS dow_to -- same for upper
, lower(t_range)::time AS time_from -- start time
, upper(t_range)::time AS time_to -- end time
FROM schedule;
db<>fiddle here
Old sqlfiddle
ISODOW instead of DOW in EXTRACT() returns 7 instead of 0 for Sundays. There is a long list of what you can extract.
This related answer demonstrates how to use range type operator to compute a total duration for time ranges (last chapter):
Calculate working hours between 2 dates in PostgreSQL
Check out the ice_cube gem (link).
It can create a schedule object for you which you can persist to your database. You need not create two separate records. For the second part, you can create schedule based on any rule and you need not worry on how that will be saved in the database. You can use the methods provided by the gem to get whatever information you want from the persisted schedule object.
Depending how complex your scheduling needs are, you might want to have a look at RFC 5545, the iCalendar scheduling data format, for ideas on how to store the data.
If your needs are pretty simple, then that is probably overkill. PostgreSQL has many functions to convert date and time to whatever format you need.
For a simple way to store relative dates and times, you could store the day of week as an integer as you suggested, and the time as a TIME datatype. If you can have multiple days of the week that are valid, you might want to use an ARRAY.
Eg.
ARRAY[2,3]::INTEGER[] = Tues, Wed as Day of Week
'15:00:00'::TIME = 3pm
[EDIT: Add some simple examples]
/* Custom range type over the built-in time type */
CREATE TYPE timerange AS RANGE (subtype = time);
--drop table if exists schedule;
create table schedule (
event_id integer not null, /* should be an FK to "events" table */
day_of_week integer[],
time_of_day time,
time_range timerange,
recurring text CHECK (recurring IN ('DAILY','WEEKLY','MONTHLY','YEARLY'))
);
insert into schedule (event_id, day_of_week, time_of_day, time_range, recurring)
values
(1, ARRAY[1,2,3,4,5]::INTEGER[], '15:00:00'::TIME, NULL, 'WEEKLY'),
(2, ARRAY[6,0]::INTEGER[], NULL, '(08:00:00,17:00:00]'::timerange, 'WEEKLY');
select * from schedule;
event_id | day_of_week | time_of_day | time_range | recurring
----------+-------------+-------------+---------------------+-----------
1 | {1,2,3,4,5} | 15:00:00 | | WEEKLY
2 | {6,0} | | (08:00:00,17:00:00] | WEEKLY
The first entry could be read as: the event is valid at 3pm Mon - Fri, with this schedule occurring every week.
The second entry could be read as: the event is valid Saturday and Sunday between 8am and 5pm, occurring every week.
The custom range type "timerange" is used to denote the lower and upper boundaries of your time range.
The leading '(' means "exclusive" and the trailing ']' means "inclusive", or in other words "later than 8am and up to and including 5pm".
Why not just store the datestamp, then use the built-in functionality of Date to get the day of the week?
2.0.0p247 :139 > Date.today
=> Sun, 10 Nov 2013
2.0.0p247 :140 > Date.today.strftime("%A")
=> "Sunday"
strftime sounds like it can do everything for you. Here are the specific docs for it.
Specifically for what you're talking about, it sounds like you'd need an Event table that has_many :schedules, where a Schedule would have a start_date timestamp...

SQLite strange timestamp and iOS

I'm trying to display a simple table view in iOS with data from SQLite. The dates in my database are stored as timestamps. I thought they were Unix timestamps, but when I try to use dateWithTimeIntervalSince1970 I get really strange results.
Examples of date rows stored:
1352208510267
1352208512266
1352208514266
1352208516266
1352208530266
1352208532265
Use a query like this
SELECT datetime(timestamp, 'unixepoch') from YOURTABLENAME
WHERE id = someId;
This should convert it to some readable value.
Have a look here
I found the answer here. I compared the results with the previous answers:
SELECT strftime('%Y-%m-%d %H:%M:%S', datetime(ZDATE+978307200, 'unixepoch', 'localtime')), datetime(ZDATE, 'unixepoch', 'localtime') FROM ZTABLE
The query with the adjustment for Apple's epoch (Jan 1 2001) gives me the correct date:
"2015-09-29 20:50:51", "1984-09-28 20:50:51"
"2015-09-29 21:03:10", "1984-09-28 21:03:10"
"2015-09-29 21:25:30", "1984-09-28 21:25:30"
Unix timestamps are defined as the number of seconds since Jan 1 1970.
Just now, this would be about 1365525702.
Your values are one thousand times larger, i.e., they are measured in milliseconds.
Decide whether you actually need the millisecond precision, and then add * 1000 or / 1000 at the appropriate places.
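For instance, converting the first sample value in Python, assuming milliseconds:

from datetime import datetime, timezone

raw = 1352208510267  # sample value from the question
print(datetime.fromtimestamp(raw / 1000, tz=timezone.utc))
# 2012-11-06 13:28:30.267000+00:00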
