See the query below:
DROP TABLE IF EXISTS rd_rt_date_integer;
CREATE TABLE rd_rt_date_integer
(
run_date DATE NOT NULL,
run_time INTEGER NOT NULL
CHECK (run_time >= 0 AND run_time < 2400 AND MOD(run_time, 100) < 60),
PRIMARY KEY(run_date, run_time)
);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 0);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 100);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 200);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 300);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 400);
SELECT run_date, run_time,
EXTEND(run_date, YEAR TO MINUTE) +
MOD(run_time, 100) UNITS MINUTE +
(run_time / 100) UNITS HOUR AS run_date_time
FROM rd_rt_date_integer;
Question: How can we apply a condition in the WHERE clause to fetch data from a certain time onwards?
SELECT run_date, run_time,
EXTEND(run_date, YEAR TO MINUTE) +
MOD(run_time, 100) UNITS MINUTE +
(run_time / 100) UNITS HOUR AS run_date_time
FROM rd_rt_date_integer
where EXTEND(run_date, YEAR TO MINUTE) +
MOD(run_time, 100) UNITS MINUTE +
(run_time / 100) UNITS HOUR >='2017-05-22 02:00'
I just want to understand the best way to do the manipulation in the WHERE clause itself, where I'm combining run_date and run_time...
It would be simpler if you stored the time as an INTERVAL HOUR TO MINUTE.
Then you could simplify the first query by using:
DROP TABLE IF EXISTS rd_rt_date_integer;
CREATE TABLE rd_rt_date_integer
(
run_date DATE NOT NULL,
run_time INTERVAL HOUR TO MINUTE NOT NULL
CHECK (run_time >= INTERVAL(0:0) HOUR TO MINUTE AND run_time < INTERVAL(24:00) HOUR TO MINUTE),
PRIMARY KEY(run_date, run_time)
);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', '0:0');
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', '1:00');
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', '2:00');
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', '3:00');
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', '4:00');
SELECT run_date, run_time,
EXTEND(run_date, YEAR TO MINUTE) + run_time AS run_date_time
FROM rd_rt_date_integer;
However, you presumably have a reason for using the integer run_time, even though it makes time calculations hellish.
This code works — I recommend using the stored procedure:
DROP TABLE IF EXISTS rd_rt_date_integer;
CREATE TABLE rd_rt_date_integer
(
run_date DATE NOT NULL,
run_time INTEGER NOT NULL
CHECK (run_time >= 0 AND run_time < 2400 AND MOD(run_time, 100) < 60),
PRIMARY KEY(run_date, run_time)
);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 0);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 100);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 200);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 300);
INSERT INTO rd_rt_date_integer VALUES('2017-05-22', 400);
SELECT run_date, run_time,
EXTEND(run_date, YEAR TO MINUTE) +
MOD(run_time, 100) UNITS MINUTE +
(run_time / 100) UNITS HOUR AS run_date_time
FROM rd_rt_date_integer
WHERE EXTEND(run_date, YEAR TO MINUTE) +
MOD(run_time, 100) UNITS MINUTE +
(run_time / 100) UNITS HOUR >= DATETIME(2017-05-22 02:00) YEAR TO MINUTE
OR EXTEND(run_date, YEAR TO MINUTE) +
MOD(run_time, 100) UNITS MINUTE +
(run_time / 100) UNITS HOUR >= EXTEND('2017-05-22 02:00', YEAR TO MINUTE)
;
DROP FUNCTION IF EXISTS run_date_time;
CREATE FUNCTION run_date_time(rd DATE, rt INTEGER)
RETURNING DATETIME YEAR TO MINUTE;
DEFINE rv DATETIME YEAR TO MINUTE;
LET rv = EXTEND(rd, YEAR TO MINUTE) + MOD(rt, 100) UNITS MINUTE + (rt / 100) UNITS HOUR;
RETURN rv;
END FUNCTION;
SELECT run_date, run_time,
run_date_time(run_date, run_time) AS run_date_time
FROM rd_rt_date_integer
WHERE run_date_time(run_date, run_time) >= DATETIME(2017-05-22 02:00) YEAR TO MINUTE
OR run_date_time(run_date, run_time) >= EXTEND('2017-05-22 02:00', YEAR TO MINUTE)
OR run_date_time(run_date, run_time) >= run_date_time('2017-05-22', 200)
;
The OR conditions in the later SELECT statements show different ways of writing the same condition. They're otherwise identical, and only one of them is needed; the others are superfluous.
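For example, keeping just the first form:
SELECT run_date, run_time,
       run_date_time(run_date, run_time) AS run_date_time
FROM rd_rt_date_integer
WHERE run_date_time(run_date, run_time) >= DATETIME(2017-05-22 02:00) YEAR TO MINUTE
;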
If you simply stored the minutes since midnight in the run_time column, rather than encoding it as 100 * hours + minutes, then you could write expressions like:
DATETIME(2017-06-12 00:00) YEAR TO MINUTE + 245 UNITS MINUTE
That expression evaluates to 2017-06-12 04:05. You could easily arrange to update the table to this encoding. There are obvious variants of this, such as:
EXTEND(TODAY, YEAR TO MINUTE) + 245 UNITS MINUTE
EXTEND(run_date, YEAR TO MINUTE) + run_time UNITS MINUTE
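For illustration, a hypothetical one-off migration from the HHMM encoding to minutes since midnight could look like the statement below, assuming the CHECK constraint is first replaced with something along the lines of run_time >= 0 AND run_time < 1440:
-- e.g. 230 (02:30) becomes 150 minutes
UPDATE rd_rt_date_integer
SET run_time = ((run_time - MOD(run_time, 100)) / 100) * 60 + MOD(run_time, 100);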
I've encountered this error:
Execution error in stored procedure: SQL execution internal error: Processing aborted due to error at Snowflake.execute
when running this script:
CREATE OR REPLACE PROCEDURE DATES_TABLE (INITIALDATE VARCHAR, FINALDATE VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
var DATESDIFF = (Date.parse(formatDate(FINALDATE)) - Date.parse(formatDate(INITIALDATE)))/ (1000 * 3600 * 24);
snowflake.execute(
{
sqlText: ` CREATE OR REPLACE TEMPORARY TABLE TEMP_DATE_RANGE AS SELECT DATE FROM (
SELECT
CAST(DATEADD (DAY, DatesDiff.n, :1) AS DATE) AS DATE
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY 1) - 1
FROM
TABLE (generator (rowcount => :3))) DatesDiff (n)
); `,
binds: [formatDate(INITIALDATE), formatDate(FINALDATE), DATESDIFF]
}
);
function formatDate(date) {
var d = new Date(date),
month = '' + (d.getMonth() + 1),
day = '' + d.getDate(),
year = d.getFullYear();
if (month.length < 2)
month = '0' + month;
if (day.length < 2)
day = '0' + day;
return [year, month, day].join('-');
}
$$
;
CALL DATES_TABLE('2021-04-01','2021-05-24');
When run outside of the stored procedure, this creates a table with dates within the input range.
Any idea why this is happening and how to sort it out?
The problem is in binding a variable to TABLE (generator (rowcount => :3)), as Snowflake expects a constant there.
Instead, you could do something like:
SELECT ROW_NUMBER() OVER (ORDER BY 1) - 1 AS rn
FROM TABLE (generator (rowcount => 1000))
QUALIFY rn < :2
I did some cleanup, and this works:
CREATE OR REPLACE PROCEDURE DATES_TABLE (INITIALDATE VARCHAR, FINALDATE VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
var DATESDIFF = (Date.parse(formatDate(FINALDATE)) - Date.parse(formatDate(INITIALDATE)))/ (1000 * 3600 * 24);
snowflake.execute(
{
sqlText: `
CREATE OR REPLACE TEMPORARY TABLE TEMP_DATE_RANGE AS
SELECT CAST(DATEADD(DAY, DatesDiff.rn, :1) AS DATE) AS DATE
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY 1) - 1 AS rn
FROM TABLE (generator (rowcount => 1000))
QUALIFY rn < :2
)
;`
, binds: [formatDate(INITIALDATE), DATESDIFF]
}
);
function formatDate(date) {
var d = new Date(date),
month = '' + (d.getMonth() + 1),
day = '' + d.getDate(),
year = d.getFullYear();
if (month.length < 2)
month = '0' + month;
if (day.length < 2)
day = '0' + day;
return [year, month, day].join('-');
}
$$
;
CALL DATES_TABLE('2021-04-01','2021-05-24');
select * from TEMP_DATE_RANGE;
For a shorter way of generating a sequence of dates, see my answer to https://stackoverflow.com/a/66449068/132438.
I require an output that shows the total number of hours worked in a rolling 24-hour window. The data is currently stored such that each row is one hourly slot (for example 7-8am on Jan 2nd) per person, with how much they worked in that hour stored as "Hour". What I need to create is another field that is the sum of the most recent 24 hourly slots (inclusive) for each row. So for the 7-8am example above, I would want the sum of "Hour" across the 24 rows: Jan 1st 8-9am, Jan 1st 9-10am... Jan 2nd 6-7am, Jan 2nd 7-8am.
Rinse and repeat for each hourly slot.
There are 6000 people, and we have 6 months of data, which means the table has 6000 * 183 days * 24 hours = 26.3m rows.
I am currently doing this using the code below, which works on a sample of 50 people very easily, but grinds to a halt when I try it on the full table, somewhat understandably.
Does anyone have any other ideas? All date/time variables are in datetime format.
proc sql;
create table want as
select x.*
, case when Hours_Wrkd_In_Window > 16 then 1 else 0 end as Correct
from (
select a.ID
, a.Start_DTTM
, a.End_DTTM
, sum(b.hours) as Hours_Wrkd_In_Window
from have a
left join have b
on a.ID = b.ID
and b.start_dttm > a.start_dttm - (24 * 60 * 60)
and b.start_dttm <= a.start_dttm
where datepart(a.Start_dttm) >= &report_start_date.
and datepart(a.Start_dttm) < &report_end_date.
group by ID
, a.Start_DTTM
, a.End_DTTM
) x
order by x.ID
, x.Start_DTTM
;quit;
The most performant DATA step solution most likely involves a ring-array to track the 1hr time slots and hours worked within. The ring will allow a rolling aggregate (sum and count) to be computed based on what goes into and out of the ring.
If you have a wide SAS license, look into the procedures in SAS/ETS (Econometrics and Time Series). Proc EXPAND might have some rolling aggregate capability.
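As a rough, untested sketch (assuming SAS/ETS is licensed, the data are sorted by id and start_dt, and every hourly slot is present so that 24 observations cover exactly 24 hours; the column names follow the simulated HAVE data created below), Proc EXPAND might be used like this:
proc expand data=have out=want method=none;
by id;
id start_dt;
* 24-term backward moving sum, including the current slot;
convert hours = hours_rolling_sum / transformout=(movsum 24);
run;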
This sample DATA step code took <10s (WORK folder on SSD) to run on simulated data for 6k people with 6 months of complete coverage of 1hr time slots.
data have(keep=id start_dt end_dt hours);
do id = 1 to 6000;
do start_dt
= intnx('dtmonth', datetime(), -12)
to intnx('dtmonth', datetime(), -6)
by dhms(0,1,0,0)
;
end_dt = start_dt + dhms(0,1,0,0);
hours = 0.25 * floor (5 * ranuni(123)); * 0, 1/4, 1/2, 3/4 or 1 hour;
output;
end;
end;
format hours 5.2;
run;
/* %let log= ; options obs=50 linesize=200; * submit this (instead of next) if you want to log the logic; */
%let log=*; options obs=max;
data want2(keep=id start_dt end_dt hours hours_rolling_sum hours_rolling_cnt hours_out_:);
array dt_ring(24) _temporary_;
array hr_ring(24) _temporary_;
call missing (of dt_ring(*));
call missing (of hr_ring(*));
if 0 then set have; * prep pdv column order;
hours_rolling_sum = 0;
hours_rolling_cnt = 0;
label hours_rolling_sum = 'Hours worked in prior 24 hours';
index = 0;
do until (last.id);
set have;
by id start_dt;
index + 1;
if index > 24 then index = 1;
hours_out_sum = 0;
hours_out_cnt = 0;
do clear = 1 by 1 until (clear=0);
if sum (dt_ring(index), 0) = 0 then do;
* index is first go through ring array, or hit a zeroed slot;
&log putlog 'NOTE: ' index= 'clear for empty ring item. ';
clear = 0;
end;
else
if start_dt - dt_ring(index) >= %sysfunc(dhms(0,24,0,0)) then do;
&log putlog / 'NOTE: ' index= 'deducting and zeroing.' /;
hours_out_sum + hr_ring(index);
hours_out_cnt + 1;
hours_rolling_sum = hours_rolling_sum - hr_ring(index);
hours_rolling_cnt = hours_rolling_cnt - 1;
dt_ring(index) = 0;
hr_ring(index) = 0;
* advance item to next item, that might also be more than 24 hours ago;
index = index + 1;
if index > 24 then index = 1;
end;
else do;
&log putlog / 'NOTE: ' index= 'back off !' /;
* index was advanced to an item within 24 hours, back off one;
index = index - 1;
if index < 1 then index = 24;
clear = 0;
end;
end; /* do clear */
dt_ring(index) = start_dt;
hr_ring(index) = hours;
hours_rolling_sum + hours;
hours_rolling_cnt + 1;
&log putlog 'NOTE: ' index= 'overlaying and aggregating.' / 'NOTE: ' start_dt= hours= hours_rolling_sum= hours_rolling_cnt=;
output;
end; /* do until */
format hours_rolling_sum 5.2 hours_rolling_cnt 2.;
format hours_out_sum 5.2 hours_out_cnt 2.;
run;
options obs=max;
When reviewing the results, you should notice that the change in hours_rolling_sum at each row is +(hours in the current slot) - (hours_out_sum, i.e. the hours removed from the ring).
If you must use SQL, I would suggest following @jspascal and indexing the table, but rearranging the query to left join the original data to an inner-joined subselect (so that SQL will do an index-assisted hash join on the ids). For the same small number of people it should be faster than the original query, but still too slow for doing all 6K.
proc sql;
create index id on have (id);
create index id_slot on have (id, start_dt);
quit;
proc sql _method;
reset inobs=50; * limit data so you can see the _method;
create table want as
select
have.*
, case
when ROLLING.HOURS_WORKED_24_HOUR_PRIOR > 16
then 1
else 0
end as REVIEW_TIME_CLOCKING_FLAG
from
have
left join
(
select
EACH_SLOT.id
, EACH_SLOT.start_dt
, count(*) as SLOT_COUNT_24_HOUR_PRIOR
, sum(PRIOR_SLOT.hours) as HOURS_WORKED_24_HOUR_PRIOR
from
have as EACH_SLOT
join
have as PRIOR_SLOT
on
EACH_SLOT.ID = PRIOR_SLOT.ID
and EACH_SLOT.start_dt - PRIOR_SLOT.start_dt between 0 and %sysfunc(dhms(0,24,0,0))-0.1
group by
EACH_SLOT.id, EACH_SLOT.start_dt
) as ROLLING
on
have.ID = ROLLING.ID
and have.start_dt = ROLLING.start_dt
order by
id, start_dt
;
%put NOTE: SQLOOPS = &SQLOOPS;
quit;
The inner join is pyramid-like and still involves a lot of internal looping.
A compound index on the columns being accessed in the joined table - id + start_dttm + hours - would be useful if there isn't one already.
Using msglevel=i will print some diagnostics about how the query is executed. It may give some additional hints.
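For instance (only a sketch; the index name is made up, and the columns follow the original question's table):
options msglevel=i;
proc sql;
  create index comp_idx on have (id, start_dttm, hours);
quit;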
How can I show the datetime in the Highcharts navigator as '%b, %Y' instead of milliseconds?
SQL code:
SELECT DATE_FORMAT(a1.date_time, '%d %b, %Y') AS dt_mon_yr,
(a4.ph1_active_energy)*10 - a1.ph1_active_energy AS 'ph1',
(a4.ph2_active_energy)*10 - a1.ph2_active_energy AS 'ph2',
(a4.ph3_active_energy)*10 - a1.ph3_active_energy AS 'ph3'
FROM powerpro a1
JOIN (SELECT DATE(date_time) date, MIN(date_time) AS min
FROM powerpro GROUP BY DATE(date_time)
) a2 ON a1.date_time = a2.min
JOIN (SELECT DATE(date_time) date, MIN(date_time) AS min
FROM powerpro GROUP BY DATE(date_time)
) a3 ON DATE(a1.date_time) = a3.date - INTERVAL 1 DAY
JOIN powerpro a4
ON a4.date_time = a3.min
WHERE DATE(a1.date_time) BETWEEN DATE_SUB(NOW(), INTERVAL 7 DAY) AND NOW() ORDER BY a1.date_time
You can use the axis labels formatter together with Highcharts.dateFormat.
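For example, a minimal, untested sketch for the navigator axis (adjust to fit your chart configuration):
navigator: {
    xAxis: {
        labels: {
            formatter: function () {
                // this.value is the tick timestamp in milliseconds
                return Highcharts.dateFormat('%b, %Y', this.value);
            }
        }
    }
}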
How do I write a DB2 stored procedure to calculate the DPM (days per month) logic?
For example, between Jan 1, 2014 and Sept 30, 2014, 9 records will be displayed (one record per month). For each record, DPM (days per month) needs to be calculated dynamically at runtime and used in the formula:
select (((dpm*24) / 2400) * 123) / dpm
from xxx-table
where date between '2014-01-01' and '2014-09-30'
How do I ensure that the correct DPM gets used in the above-mentioned formula?
Expected Output
Jan - (((31*24) / 2400) * 123) / 31
Feb - (((28*24) / 2400) * 123) / 28
March - (((31*24) / 2400) * 123) / 31
April - (((30*24) / 2400) * 123) / 30
May - (((31*24) / 2400) * 123) / 31
June - (((30*24) / 2400) * 123) / 30
July - (((31*24) / 2400) * 123) / 31
August - (((31*24) / 2400) * 123) / 31
September - (((30*24) / 2400) * 123) / 30
As @mustaccio said, your formula always simplifies down to 1.23. Move the first dpm multiplication "out" of the first fraction:
(dpm * (24/2400) * 123) / dpm
Then the dpm terms cancel each other out, leaving only
(24/2400) * 123
which is the constant 1.23.
However, if you really do need a days-per-month calculation, you could use a recursive query to build out your list of months, and then use the LAST_DAY and DAY scalar functions together. I'm assuming you're using DB2 for Linux/Unix/Windows, and I think LAST_DAY was added in 9.7.
WITH DATES (DTE, LVL) AS (
SELECT CAST(#beginDate AS DATE), 0
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DTE + 1 MONTHS, LVL +1
FROM SYSIBM.SYSDUMMY1, DATES
WHERE DTE + 1 MONTHS <= #endDate
)
SELECT DAY(LAST_DAY(DTE))
FROM DATES
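Putting it together with the question's date range and formula, a hypothetical end-to-end query might look like this (note the 2400.0 literal, which avoids integer division truncating the result to 0):
WITH DATES (DTE, LVL) AS (
    SELECT CAST('2014-01-01' AS DATE), 0
    FROM SYSIBM.SYSDUMMY1
    UNION ALL
    SELECT DTE + 1 MONTHS, LVL + 1
    FROM SYSIBM.SYSDUMMY1, DATES
    WHERE DTE + 1 MONTHS <= '2014-09-30'
)
SELECT DTE,
       DAY(LAST_DAY(DTE)) AS DPM,
       (((DAY(LAST_DAY(DTE)) * 24) / 2400.0) * 123) / DAY(LAST_DAY(DTE)) AS RESULT
FROM DATES;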
I want to iterate over a fairly large collection of records in the console, halve the time between each record's created_at and Time.now, and save it. So, say, records with a created_at of two months ago would now be 1 month old, 1 day becomes 12 hours, etc.
This doesn't work, but just as an example:
Log.all.each{|l| l.created_at = l.created_at - (Time.now - l.created_at * 0.5); l.save}
Try:
Log.all.each{|l| l.created_at = Time.at( l.created_at.to_f + (Time.now.to_f - l.created_at.to_f)/2 ); l.save}
Which should be the same as:
Log.all.each{|l| l.created_at = Time.at( (Time.now.to_f + l.created_at.to_f)/2 ); l.save}