AWS SimpleDB where clause 'and' operator behaving unexpectedly - amazon-simpledb

The following simpledb query returns 51 results:
select * from logger where time > '2011-07-29 17:45:10.540284+00:00'
This query returns 20534 results:
select * from logger where time < '2011-07-29 17:50:08.615626'
These two queries both return 0 results!!?:
select * from logger where time between '2011-07-29 17:45:10.540284+00:00' and '2011-07-29 17:50:08.615626'
select * from logger where time > '2011-07-29 17:45:10.540284+00:00' and time < '2011-07-29 17:50:08.615626'
What am I missing here?

But are any of your 51 results returned from the first query actually within the time span you are searching? If they are all later than 17:50:08.615626 then your queries are performing as expected.
I am also suspicious of the fact that you are being inconsistent in how you are representing the time. You should really be using ISO 8601 timestamps if you want consistent lexicographic matching of times with SDB.
The other option is that the queries are taking longer than the query timeout to run, are you checking for errors?
Finally - perhaps SDB is having a bad day and the query is just a bit slow - in those circumstances you can find you get 0 results but DO get a next token - and the actual results follow in the next batch.
Does any of that help?

Related

InfluxDB: How to create a continuous query to calculate delta values?

I'd like to calculate the delta values for a series of measurements stored in an InfluxDB. The values are readings from an electricity meter taken every 5 minutes. The values increase over time. Here is subset of the data to give you an idea (commands shown below are executed in the InfluxDB CLI):
> SELECT "Haushaltstromzaehler - cnt" FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time Haushaltstromzaehler - cnt
---- --------------------------
2018-02-02T10:00:12.610811904Z 11725.638
2018-02-02T10:05:11.242021888Z 11725.673
2018-02-02T10:10:10.689827072Z 11725.707
2018-02-02T10:15:12.143326976Z 11725.736
2018-02-02T10:20:10.753357056Z 11725.768
2018-02-02T10:25:11.18448512Z 11725.803
2018-02-02T10:30:12.922032896Z 11725.837
2018-02-02T10:35:10.618788096Z 11725.867
2018-02-02T10:40:11.820355072Z 11725.9
2018-02-02T10:45:11.634203904Z 11725.928
2018-02-02T10:50:11.10436096Z 11725.95
2018-02-02T10:55:10.753853952Z 11725.973
Calculating the differences in the InfluxDB CLI is pretty straightforward with the difference() function. This gives me the electricity consumed within the 5 minutes intervals:
> SELECT difference("Haushaltstromzaehler - cnt") FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time difference
---- ----------
2018-02-02T10:05:11.242021888Z 0.03499999999985448
2018-02-02T10:10:10.689827072Z 0.033999999999650754
2018-02-02T10:15:12.143326976Z 0.02900000000045111
2018-02-02T10:20:10.753357056Z 0.0319999999992433
2018-02-02T10:25:11.18448512Z 0.03499999999985448
2018-02-02T10:30:12.922032896Z 0.033999999999650754
2018-02-02T10:35:10.618788096Z 0.030000000000654836
2018-02-02T10:40:11.820355072Z 0.03299999999944703
2018-02-02T10:45:11.634203904Z 0.028000000000247383
2018-02-02T10:50:11.10436096Z 0.02200000000084401
2018-02-02T10:55:10.753853952Z 0.02299999999922875
Where I struggle is getting this to work in a continuous query. Here is the command I used to setup the continuous query:
CREATE CONTINUOUS QUERY cq_Haushaltstromzaehler_cnt ON myhomedb
BEGIN
SELECT difference(sum("Haushaltstromzaehler - cnt")) AS "delta" INTO "Haushaltstromzaehler_delta" FROM "myhome_measurements" GROUP BY time(1h)
END
Looking in the InfluxDB log file I see that no data is written in the new 'delta' measurement from the continuous query execution:
...finished continuous query cq_Haushaltstromzaehler_cnt, 0 points(s) written...
After much troubleshooting and experimenting I now understand why no data is generated. Setting up a continuous query requires to use the GROUP BY time() statement. This in turn requires to use an aggregate function within the differences() function. The problem now is that the aggregate function returns only one value for the time period specified by GROUP BY time(). Obviously, the differences() function cannot calculate a difference from just one value. Essentially, continuous query executes a command like this:
> SELECT difference(sum("Haushaltstromzaehler - cnt")) FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z' GROUP BY time(1h)
>
I'm now somewhat clueless as to how to make this work and appreciate any advice you might have.
Does it help using the last aggregate function? Not tested this as a cq yet.
Select difference(last(T1_Consumed)) AS T1_Delta, difference(last(T2_Consumed)) AS T2_Delta
from P1Data
where time >= 1551648871000000000 group by time(1h)
DIFFERENCE() would calculate delta from the "aggregated" value taken from previous group, not within current group.
So fill free to use selector function there - since your counters seemed to be cumulative, LAST() should be working well.

Selecting greatest date range count in a rails array

I have a database with a bunch of deviceapi entries, that have a start_date and end_date (datetime in the schema) . Typically these entries no more than 20 seconds long (end_date - start_date). I have the following setup:
data = Deviceapi.all.where("start_date > ?", DateTime.now - 2.weeks)
I need to get the hour within data that had the highest number of Deviceapi entries. To make it a bit clearer, this was my latest try on it (code is approximated, don't mind typos):
runningtotal = 0
(2.weeks / 1.hour).to_i.times do |interval|
current = data.select{ |d| d.start_time > (start_date + (1.hour * (interval - 1))) }.select{ |d| d.end_time < (start_date + (1.hour * interval)) }.count
if current > runningtotal
runningtotal = current
end
The problem: this code works just fine. So did about a dozen other incarnations of it, using .where, .select, SQL queries, etc. But it is too slow. Waaaaay too slow. Because it has to loop through every hour within 2 weeks. Then this method might need to be called itself dozens of times.
There has to be a faster way to do this, maybe a sort? I'm stumped, and I've been searching for hours with no luck. Any ideas?
To get adequate performance, you'll want to do everything in a single query, which will mean avoiding ActiveRecord functionality and doing a raw query (e.g. via ActiveRecord::Base.connection.execute).
I have no way to test it, since I have neither your data nor schema, but I think something along these lines will do what you are looking for:
select y.starting_hour, max(y.num_entries) as max_entries
from
(
select x.starting_hour, count(*) as num_entries
from
(
select date_trunc('hour', start_time) starting_hour
from deviceapi as d
) as x
group by x.starting_hour
) as y
where y.num_entries = max(y.num_entries);
The logic of this is as follows, from the inner-most query out:
"Bucket" each starting time to the hour
From the resulting table of buckets, get the total number of entries in each bucket
Get the maximum number of entries from that table, and then use that number to match back to get the starting_hour itself.
If there happen to be more than one bucket with the same number of entries, you could determine a consistent way to pick one -- say the min(starting_hour) or similar (since that would stay the same even as data gets added, assuming you are not deleting items).
If you wanted to limit the initial time slice -- I see 2 weeks referenced in your post -- you could do that in the inner-most query with a where clause bracketing the date range.

Neo4j Cypher Unknown Error

So I created a date dimension from this article
a link
I modified it and added datestamp to Day node which is Month/Day/Year (string)
I added indexes on Year.year, Month.month, Day.day && day.datestamp
When I run this query:
MATCH p=(day2:Day {datestamp:'1/1/2015'})-[:NEXT*]->(day {day:2})
return length(p)
limit 5
It takes 1667 ms to execute
When I modify the query to this:
MATCH p=(day2:Day {datestamp:'1/1/2015'})-[:NEXT*]->(day {datestamp:'1/2/2015'})
return length(p)
After it runs for about a minute, it ends in the Unknown Error message.
My schema is:
Indexes
ON :Day(day) ONLINE
ON :Day(datestamp) ONLINE
ON :Month(month) ONLINE
ON :Year(year) ONLINE
No constraints
Any ideas what I'm doing wrong?
I think I figured it out.
Looks like the 1st query that runs 1667ms only runs and completes because of limit 5, it finds 5 records and stops further execution.
While the other keeps going and going until it runs out of juice.
I think solution in this case is constraint that indicates datestamp is unique which should prevent further execution.
Still interesting, considering there's about 2600+ records connected with HAS_NEXT so traveling through those relationships shouldn't be taking this long to find out that there's only 1 record that matches that query.

RoR sum column of datetime type

I'm calculating total "walk time" for dog walking app. The Walks table has two cols, start_time and end_time. Since I want to display total time out for ALL walks for a particular dog, I should just be able to sum the two columns, subtract end_times_total from start_time_totals and result will be my total time out. However I'm getting strange results. When I sum the columns thusly,
start_times = dog.walks.sum('start_time')
end_times = dog.walks.sum('end_time')
BOTH start_times and end_times return the same value. Doing a sanity check I see that my start and end times in the db are indeed set as I would expect them to be (start times in the morning, end times in the afternoon), so the sum should definitely return a different value for each of the columns. Additionally, the value is different for each dog and in line with the relative values I would expect, so dogs with more walks return larger values than dogs with fewer walks. So, it looks like the sum is probably working, only somehow returning the same value for each column.
Btw, running this in dev Rails 3.2.3, ruby 2.0, SQLite.
Don't think that summing datetimes is a good idea. What you need is calculate duration of each single walk and sum them. You can do it in 2 ways:
1. DB-dependent, but more efficient:
# sqlite in dev and test modes
sql = "strftime('%s',end_time) - strftime('%s',start_time)" if !Rails.env.production?
# production with postgres
sql = "extract(epoch from end_time - start_time)" if Rails.env.production?
total = dog.walks.sum(sql)
2. DB-agnostic, but less efficient in case of hundreds record for each dog:
total = dog.walks.all.inject(0) {|tot,w| tot+=w.end_time-w.start_time}
I don't know, how sqlite handles datetime and operations on this data type, but while playing in sqlite console, I noticed that I could get reliable effects when converting datetime to seconds.
I would write it like:
dog.walks.sum("strftime('%s', end_time) - strftime('%s', start_time)")
Query should look like:
select sum(strftime('%s', end_time) - strftime('%s', start_time)) from walks;

Dynamic time based finder for ActiveRecord

I have a slightly complex time arithmetic problem.
I have a reminder system where the user can set "how many x before event" duration. For example: If I set '5 minutes' - I need to get reminder before 5 minutes of the event schedule.
In my reminder system, I have a cron which runs every minute and sends reminder mails. So far so good. I want to find all calendar events which are eligible for reminder (calendar entry whose scheduled time is between "5.minutes.from_now and 6.minutes.from_now"
I am trying the write the following where clause :
conds = "'when' >= '#{eval("#{cal.remind_before.to_s}.#{cal.remind_before_what.downcase}.from_now").to_s(:db)}' AND 'when' < '#{eval("#{cal.remind_before.to_s}.#{cal.remind_before_what.downcase}.from_now + 1.minutes").to_s(:db)}'"
#mail_calendar_for_reminder= Calendar.find(:all, :conditions=> conds)
Here cal.reminder_before = '5', cal.remind_before_what.downcase='minutes'
so the eval would be evaluating (5.minutes.from_now) and (6.minutes.from_now)
The resulting SQL statement is :
SELECT "calendars".* FROM "calendars" WHERE ('when' >= '2011-01-11 14:44:54' AND 'when' < '2011-01-11 14:45:54')
This SQL is syntactically and logically correct because it gets a time range of 5.minutes.from_now and 6.minutes.from_now. But it is not selecting eligible records. I suspect two things:
1. The SQL above is doing string comparisons rather than time comparisons.
2. The database entry for calendar's scheduled time has the following format :
2011-01-11 14:45:09.000000 --the 0's the end might be messing teh date comparisons.
I tried almost all sorts of date range arithmetic but could not get the eligible records in this query.
Depending on your server and its load, a one-minute window for cron might be a little optimistic.
What happens if you login to the dbms server and execute that SQL statement? Any rows returned? Any error messages?
You can try an explicit type cast. So
'when' >= CAST('2011-01-11 14:44:54' AS DATETIME) ...
Your dbms might require a different syntax for type casting and conversion. Search your docs.
Are your column names case sensitive? Is the column 'when' or 'When'? (Or wHen?)
This query returns your test event. Note the double quotes around the column name.
SELECT "calendars".*
FROM "calendars"
WHERE ("when" >= '2011-01-10 15:56'
AND "when" < '2011-01-10 15:57')

Resources