Last six months data including blank rows to represent on x-axis graph (ReportBuilder) - join

Hi there,
I am trying to fetch the last six months of data in my query and need to show a 'Month-Year' label on the x-axis. The query works fine when there is data for a month, but if the join finds no data for that month, no row is returned and there is no label, so I am unable to draw it on the chart (Report Builder 3.0). E.g.
ApptMonthYearname   NotCompleteAppointments   AppointmentYear   AppointmentMonthInt
January-2012        118                       2012              1
December-2011       88                        2011              12
The query joins three tables and the WHERE clause then checks whether an appointment falls within the selected range of month and year:
declare @SelectedMonth int
declare @SelectedYear int
declare @careprovider varchar(20)
DECLARE @intFlag INT
-- let's say
SET @SelectedMonth = 1
SET @SelectedYear = 2012
declare @selectedDate datetime
declare @previoussixmonthsdate datetime
IF (@SelectedMonth = DATEPART(mm, GETDATE()) AND @SelectedYear = DATEPART(yyyy, GETDATE()))
BEGIN
SET @selectedDate = CONVERT(datetime, CONVERT(varchar(2), DATEPART(dd, GETDATE())) + '/' + CONVERT(varchar(2), @SelectedMonth) + '/' + CONVERT(varchar(4), @SelectedYear), 103)
SET @previoussixmonthsdate = DATEADD(month, -6, @selectedDate)
END
ELSE
BEGIN
SET @selectedDate = CONVERT(datetime, '31' + '/' + CONVERT(varchar(10), @SelectedMonth) + '/' + CONVERT(varchar(10), @SelectedYear), 103)
SET @previoussixmonthsdate = DATEADD(month, -6, @selectedDate)
END
SELECT @selectedDate, @previoussixmonthsdate
SELECT dbo.Filteredals_clinicappointment.als_clinicappointmentid [AppointmentID],
dbo.Filteredals_clinicappointment.als_statusname [AppointmentStatus],
dbo.Filteredals_clinicappointment.als_appointmentdatetime [AppointmentBookingTime],
Datepart(mm,dbo.Filteredals_clinicappointment.als_appointmentdatetime) [AppointmentMonth],
Datepart(yyyy,dbo.Filteredals_clinicappointment.als_appointmentdatetime) [AppointmentYear],
DATENAME(month,dbo.Filteredals_clinicappointment.als_appointmentdatetime) [AppointmentMonthName],
DATENAME (year,dbo.Filteredals_clinicappointment.als_appointmentdatetime) [AppointmentyearName]
FROM dbo.Filteredals_clinicappointment LEFT OUTER JOIN
dbo.Filteredrbs_clinicinstance ON dbo.Filteredals_clinicappointment.als_clinicinstance = dbo.Filteredrbs_clinicinstance.rbs_clinicinstanceid LEFT OUTER JOIN
dbo.Filteredrbs_clinic ON dbo.Filteredrbs_clinicinstance.rbs_clinic = dbo.Filteredrbs_clinic.rbs_clinicid LEFT OUTER JOIN
dbo.Filteredrbs_careproviders ON dbo.Filteredrbs_clinic.rbs_careprovider = dbo.Filteredrbs_careproviders.rbs_careprovidersid
WHERE dbo.Filteredrbs_careproviders.rbs_careprovidersid = @careprovider
AND dbo.Filteredals_clinicappointment.als_appointmentdatetime <= @selectedDate
AND dbo.Filteredals_clinicappointment.als_appointmentdatetime >= @previoussixmonthsdate
AND DATEPART(yyyy, dbo.Filteredals_clinicappointment.als_appointmentdatetime) = @SelectedYear
GROUP BY YEAR(AppointmentList.AppointmentBookingTime), MONTH(AppointmentList.AppointmentBookingTime)) as [DNAAppts]
Any help would be greatly appreciated.

One way to guarantee that you always get data for each month in the report range is to populate a temporary/derived table with the set of dates and then left join that to the data (see the sketch after this list).
Alternatively you can fake the data, e.g. by storing the results of your query (or a sensible halfway stage) in a temporary table, then inspecting that table to ensure there is data for each expected month and adding it if not.
Approach 3: union your query with a statement that returns a value for each expected month where there is no real data to match.
There are probably even more ways to do this, but hopefully that will offer you some inspiration.
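A minimal sketch of the first approach, assuming the @selectedDate/@previoussixmonthsdate variables and the Filteredals_clinicappointment view from the question (the care-provider and status filters are omitted for brevity). A recursive CTE generates one row per month in the range and the appointments are LEFT JOINed onto it, so months with no matches still come back with a count of 0 and a label for the x-axis:
;WITH Months AS
(
    SELECT DATEADD(month, DATEDIFF(month, 0, @previoussixmonthsdate), 0) AS MonthStart
    UNION ALL
    SELECT DATEADD(month, 1, MonthStart)
    FROM Months
    WHERE MonthStart < DATEADD(month, DATEDIFF(month, 0, @selectedDate), 0)
)
SELECT DATENAME(month, m.MonthStart) + '-' + DATENAME(year, m.MonthStart) AS ApptMonthYearName,
       DATEPART(yyyy, m.MonthStart) AS AppointmentYear,
       DATEPART(mm, m.MonthStart) AS AppointmentMonthInt,
       COUNT(ca.als_clinicappointmentid) AS NotCompleteAppointments   -- 0 for months with no rows
FROM Months m
LEFT JOIN dbo.Filteredals_clinicappointment ca
       ON ca.als_appointmentdatetime >= m.MonthStart
      AND ca.als_appointmentdatetime < DATEADD(month, 1, m.MonthStart)
GROUP BY m.MonthStart
ORDER BY m.MonthStart;
Because every month is generated first and the data is joined second, the chart gets a category for each Month-Year even when the join finds nothing.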

Related

Postgres Join on Dynamic Subquery

Background
I have 2 data tables.
For each row in tableA, I want to find the rows in tableB with the closest dates and join those values onto the row from tableA.
Example tables:
tableA:
p_id   category   l_date
1      catA       2005-01-05
1      catB       2005-06-10
2      catC       2000-01-10
tableB:
p_id   e_id   e_date
1      22     2005-01-01
1      23     2005-01-06
1      24     2005-01-06
1      28     2005-01-10
2      29     2010-08-10
desired result:
p_id   category   l_date       e_id   e_date
1      catA       2005-01-05   23     2005-01-06
1      catA       2005-01-05   24     2005-01-06
1      catB       2005-06-10   28     2005-01-10
2      catC       2000-01-10   29     2010-08-10
Tried
This query does not work, but I think this is the direction I should be going.
select a.p_id, a.category, a.l_date, c.e_id, c.e_date from tableA a
left join lateral
(
select top 1 p_id, e_id, e_date from tableB b
where a.p_id = b.p_id
order by abs(datediff(days, a.l_date, b.e_date))
) c on True;
TableA and tableB are massive, 17M and 150M rows respectively.
Does this sound like the correct approach?
Using a Redshift cluster, running Postgres 8.x.
Correlated subquery approaches or a full cross join approach will all perform the task of comparing every row in one table with every row in the other (in one manner or another). Comparing (joining) all these rows gets prohibitive when the tables get large. In these cases different approaches are needed.
Brute forcing won't be fast (if it even completes), so we need to be a bit more efficient in going about this. I tell clients to think about how they would do this query by hand if I gave them stacks of index cards. A person values their time, so they don't go about it by making all possible combinations; they come up with a more efficient way that they can finish quickly and get back to their lives. In cases like the one you are describing you need to find that more efficient approach. I'd be happy to talk with you more about building these types of queries.
Taking your data (and sprucing it up a bit for some more interesting cases), I created an example of how you can do this. (Yes, you could cross join the small tables and do this with simpler SQL, but that won't scale.)
Data setup:
create table tableA (p_id int, category varchar(64), l_date date);
insert into tableA values
(1,'catA','2005-01-05'),
(1,'catB','2005-06-10'),
(2,'catC','2000-01-10');
create table tableB (p_id int, e_id int, e_date date);
insert into tableB values
(1,22,'2005-01-01'),
(1,23,'2005-01-06'),
(1,24,'2005-06-01'),
(1,28,'2005-06-15'),
(2,29,'2010-08-10');
The query looks like:
with combined as
(
select
*,
coalesce(max(l_date) OVER (partition by p_id order by
dt rows between unbounded preceding and 1 preceding), '1970-01-01'::date) cb,
coalesce(min(l_date) OVER (partition by p_id order by
dt desc rows between unbounded preceding and 1 preceding), '2100-01-01'::date) ca
from
(
select
p_id,
category,
l_date,
NULL as e_id,
NULL as e_date,
l_date dt
from
tableA
union all
select
p_id,
NULL as category,
NULL as l_date,
e_id,
e_date,
e_date dt
from
tableB
) c
)
,
closest as
(
select
p_id,
e_id,
e_date,
cb,
ca,
case
when
coalesce(e_date - cb, 0) > (ca - e_date)
then ca
else cb
end closest
from
combined
where
e_date is not NULL
)
select
c.p_id,
a.category,
a.l_date,
c.e_id,
c.e_date
from
closest c
left join tableA a
on c.closest = a.l_date and c.p_id = a.p_id
order by
c.p_id,
c.e_id ;
While this can look like a lot, it isn't that complex. The first CTE finds the closest l_date earlier than e_date (cb) and the closest l_date later than e_date (ca). It does this on a UNIONed set of the data to allow for windowing. The second CTE just determines which is closer, ca or cb, and produces this as "closest". It also strips out the tableA rows that were added by the UNION (no longer needed). Lastly this "closest" date provides the join-on information needed to build the final result.
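To make the window logic concrete, here is a hand-worked trace of the two CTEs for p_id = 1 against the setup data above (the 1970/2100 dates are the COALESCE defaults); these values assume the frames behave exactly as described:
e_id   e_date       cb           ca           closest      joins to
22     2005-01-01   1970-01-01   2005-01-05   2005-01-05   catA
23     2005-01-06   2005-01-05   2005-06-10   2005-01-05   catA
24     2005-06-01   2005-01-05   2005-06-10   2005-06-10   catB
28     2005-06-15   2005-06-10   2100-01-01   2005-06-10   catB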
Now this query doesn't account for many possible real-world data issues, so take it as a starting point. I'm also making some assumptions about your data based on the test data (for example that no two rows in tableA will have the same l_date and p_id).
And a last word on performance: while window functions are not cheap and will do more work as your data tables increase in size, they are orders of magnitude more performant than cross-joining massive tables. What you are looking to do is complex, so it will take some time, but this is the fastest way I have found to perform these complex operations that would otherwise be a massive looping problem.

JOIN ON second highest value (Impala)

I don't know how, or even if, this is possible... I am trying to JOIN tables on the second highest value. I tried rowNumber, lag, lead & rank but haven't been able to get any of them to do what I need. To summarize, I'm just trying to shift the activitydate table down one row to join on rollUpDate minus 1 (but I can't simply subtract 1 because the dates are not consecutive; there are days missing).
Does anyone know a good way to do this? Any suggestions are appreciated!
Select
ds.activitydate
,sum(ws.weeklyTotals / ds.daysBetween) as newRunRates -- getting an average of daily activity from weekly totals
from
(select
fsc.activitydate
,fsc.weekstart
,max(fsc.activitydate) OVER (partition by fsc.weekstart) as rollUpDate
,datediff(to_date(max(fsc.activitydate) OVER (partition by fsc.weekstart)), to_date(fsc.weekstart)) + 1 as daysBetween
from fiscalcalendar fsc
) ds -- used this to get a week-ending date bc that is what I need to join on. I only have a week start in this table
left join
(select
activitydate_iso
,count(distinct assignedmaincomponentid) as weeklyTotals
from activityTable
group by 1
) ws -- weeklySplits -- this gives me my weekly totals by a week ending date
on ds.rollUpDate = ws.activitydate_iso
-- need this join logic to actually be
-- on ds.rollUpDate = (max(ws.activitydate_iso) where activitydate_iso < rollUpDate)
where activitydate between '2020-05-22' and '2020-06-15'
group by 1,2
order by 1,2
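One possible direction, offered only as an untested sketch against the tables named in the question: add the next activity date to the weekly-splits subquery with LEAD(), then join each rollUpDate to the single row whose activitydate_iso is the greatest date strictly below it. If your Impala version rejects non-equality predicates on an outer join, the same lookup can be done as an inner join or as a separate aggregation step first.
left join
(select
    activitydate_iso
    ,lead(activitydate_iso) over (order by activitydate_iso) as next_activitydate
    ,count(distinct assignedmaincomponentid) as weeklyTotals
 from activityTable
 group by 1
) ws
-- matches the row where activitydate_iso < rollUpDate <= next_activitydate,
-- i.e. the latest activity date before rollUpDate, even when dates are missing
on ws.activitydate_iso < ds.rollUpDate
and (ds.rollUpDate <= ws.next_activitydate or ws.next_activitydate is null)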

complex db2/sql query with time-sampling, group, map, join and csv export

I have data in a table (named TESTING) on dashDB2 on IBM Bluemix (Db2 Warehouse on Cloud) which looks like this:
ID    TIMESTAMP                  NAME    VALUE
abc   2017-12-21 19:55:38.762    test1   123
abc   2017-12-21 19:55:42.762    test2   456
abc   2017-12-21 19:57:38.762    test1   789
abc   2017-12-21 19:58:38.762    test3   345
def   2017-12-21 19:59:38.762    test1   678
I am looking for a query that:
1. samples the data (for each NAME) to a given time format (e.g. to a 1-minute-based timestamp);
2. averages the VALUEs that fall in the same time range (the same minute), leaving empty times as NULL.
For 1. and 2. I have something like this (working for one NAME only):
with dummy(temporaer) as (
select TIMESTAMP('2017-12-01') from SYSIBM.SYSDUMMY1
union all
select temporaer + 1 MINUTES from dummy where temporaer < TIMESTAMP('2018-02-01')
)
select temporaer, avg(VALUE) as test1 from dummy
LEFT OUTER JOIN TESTING ON temporaer=date_trunc('minute', TIMESTAMP) and ID='abc' and NAME='test1'
group by temporaer
ORDER BY temporaer ASC;
3. joins all the different NAMEs column-wise into a matrix, like:
TIMESTAMP test1 test2 test3
2017-12-01 00:00:00 null null null
...
2017-12-21 19:55:00 123 456 null
2017-12-21 19:56:00 null null null
2017-12-21 19:57:00 789 null null
2017-12-21 19:58:00 678 null 345
...
2018-01-31 23:59:00 null null null
4. exports the query result as a CSV, or gives it back as a CSV string.
Does anybody know how this could be done in one query, or in a simple and fast way? Or is it necessary to save the data in another table format - can you give me a hint?
Here is a code snippet that does the job, but takes a very long time:
WITH
-- get all distinct names in table:
header(names) AS (SELECT DISTINCT name
FROM FIELDTEST
WHERE ID='7b9bbe44d45d8f2ac324849a4951da54' AND REGEXP_LIKE(trim(VALUE),'^\d+(\.\d*)?$') AND DATE(TIMESTAMP)>='2017-12-19' AND DATE(TIMESTAMP)<'2017-12-24'),
-- select data (names, values without stringvalues) from table dedicated by timestamp to bigger timeinterval (here minutes):
dummie(time, names, values) AS (SELECT date_trunc('minute', TIMESTAMP), NAME, VALUE
FROM FIELDTEST
WHERE ID='7b9bbe44d45d8f2ac324849a4951da54' AND REGEXP_LIKE(trim(VALUE),'^\d+(\.\d*)?$')),
-- generate a range of times from date to date in defined steps:
dummy(time, rangeEnd) AS (SELECT a, a + 1 MINUTE
FROM (VALUES(TIMESTAMP('2017-12-19'))) D(a)
UNION ALL
SELECT rangeEnd, rangeEnd + 1 MINUTE
FROM dummy
WHERE rangeEnd < TIMESTAMP('2017-12-24')),
-- add each name (from header) to each time/row (in dummy):
dumpy(time, names) AS (SELECT Dummy.time, Header.names
FROM Dummy
LEFT OUTER JOIN Header
ON Dummy.time IS NOT NULL),
-- averages values by name and timeinterval and sorts result to dummy:
dummj(time, names, avgvalues) AS (SELECT Dummy.time, Dummie.names, AVG(Dummie.values)
FROM Dummy
LEFT OUTER JOIN Dummie
ON Dummie.time = Dummy.time
GROUP BY Dummie.names, Dummy.time),
-- joins the averages (by time, name) values to the times and names in dumpy (on empty value use -9999):
testo(time, names, avgvalues) AS (SELECT Dumpy.time, Dumpy.names, COALESCE(Dummj.avgvalues,-9999)
FROM Dumpy
LEFT OUTER JOIN Dummj
ON Dummj.time = Dumpy.time AND Dummj.names = Dumpy.names),
-- converts the high amount of rows to less rows with delimited strings:
test(time, names, avgvalues) AS (SELECT time, LISTAGG(names,';') WITHIN GROUP(ORDER BY names), LISTAGG(avgvalues,';') WITHIN GROUP(ORDER BY names)
FROM Testo
GROUP BY time)
SELECT * FROM test ORDER BY time ASC, names ASC;
The performance problem is in the "testo" subquery. Does anybody have an idea what the problem is here, or know how to improve the query?
Well, one problem I see is that you keep using functions on columns, but that shouldn't be too big a drain if id is reasonably unique. If this query is very common, it may also be worth permanently building and indexing the range table. Hmm, you probably need several indices (starting with FieldTest.id), but you might also try this version:
-- let's name things properly, too, to keep them straight.
WITH
-- generate a range of times from date to date in defined steps:
Range (rangeStart, rangeEnd) AS (SELECT a, a + 1 MINUTE
FROM (VALUES(TIMESTAMP('2017-12-19'))) D(a)
UNION ALL
SELECT rangeEnd, rangeEnd + 1 MINUTE
FROM Range
WHERE rangeEnd < TIMESTAMP('2017-12-24')),
-- get all distinct names in table:
Header(names) AS (SELECT DISTINCT name
FROM FieldTest
WHERE ID = '7b9bbe44d45d8f2ac324849a4951da54'
-- just make the white space check part of the regex
AND REGEXP_LIKE(VALUE, '^\s*\d+(\.\d*)?\s*$')
AND timestamp >= TIMESTAMP('2017-12-19')
AND timestamp < TIMESTAMP('2017-12-24')),
-- I'm assuming the (id, name) tuple is unique, which means we don't need to repeat the regex later
Data (rangeStart, names, averaged) AS (SELECT Range.rangeStart, Header.names, COALESCE(AVG(FieldTest.value), -9999)
FROM Range
CROSS JOIN Header
LEFT JOIN FieldTest
ON FieldTest.id = '7b9bbe44d45d8f2ac324849a4951da54'
AND FieldTest.name = Header.names
AND FieldTest.timestamp >= Range.rangeStart
AND FieldTest.timestamp < Range.rangeEnd
GROUP BY Range.rangeStart, Header.names)
-- I can't recall if DB2 allows using the new column name this way, you may need to wrap this again
SELECT rangeStart,
-- converts the high amount of rows to less rows with delimited strings:
LISTAGG(names,';') WITHIN GROUP(ORDER BY names) AS names,
LISTAGG(averaged,';') WITHIN GROUP(ORDER BY names) AS avgvalues
FROM Data
GROUP BY rangeStart
ORDER BY rangeStart, names
(not tested)
The CROSS JOIN was definitely a nice hint. I was not able to implement the LEFT JOIN exactly as you suggested, but I found a workaround which - I am sure - still leaves room for improvement, though at this moment it is acceptable for me (a time saving of about a factor of 30 compared to my first query). Here is the actual code:
WITH
-- generate a range of times from date to date in defined steps:
TimeRange(rangeStart, rangeEnd) AS (SELECT a, a + 1 MINUTE
FROM (VALUES(TIMESTAMP('2017-12-19'))) D(a)
UNION ALL
SELECT rangeEnd, rangeEnd + 1 MINUTE
FROM TimeRange
WHERE rangeEnd < TIMESTAMP('2017-12-24')),
-- get all distinct names in table:
Header(names) AS (SELECT DISTINCT name
FROM FIELDTEST
WHERE ID = '7b9bbe44d45d8f2ac324849a4951da54'
AND REGEXP_LIKE(VALUE, '^\s*\d+(\.\d*)?\s*$')
AND timestamp >= TIMESTAMP('2017-12-19')
AND timestamp < TIMESTAMP('2017-12-24')),
-- select data (names, values without stringvalues) from table dedicated by timestamp to bigger timeinterval (here minutes):
rawData(time, names, values) AS (SELECT date_trunc('minute', TIMESTAMP), NAME, VALUE
FROM FIELDTEST
WHERE ID = '7b9bbe44d45d8f2ac324849a4951da54'
AND REGEXP_LIKE(VALUE, '^\s*\d+(\.\d*)?\s*$')),
-- I'm assuming the (id, name) tuple is unique, which means we don't need to repeat the regex later
Data(rangeStart, name, averaged) AS (SELECT TimeRange.rangeStart, Header.names, COALESCE(AVG(rawData.values), -9999)
FROM TimeRange
CROSS JOIN Header
LEFT JOIN rawData
ON rawData.names = Header.names
AND rawData.time = TimeRange.rangeStart
GROUP BY TimeRange.rangeStart, Header.names),
test(time, names, avgvalues) AS (SELECT Data.rangeStart,
LISTAGG(Data.name,';') WITHIN GROUP(ORDER BY name),
LISTAGG(Data.averaged,';') WITHIN GROUP(ORDER BY name)
FROM Data
GROUP BY Data.rangeStart)
-- build my own delimited export-string:
SELECT CONCAT(CONCAT(SUBSTR(REPLACE(time,'.',':'),1,19),';'), REPLACE(CAST(avgvalues AS VARCHAR(3980)),'-9999',''))
FROM test
UNION ALL
SELECT CONCAT(CAST('TIME;' AS VARCHAR(5)), CAST(LISTAGG(names,';') WITHIN GROUP(ORDER BY names) AS VARCHAR(3980)))
FROM Header;
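For the remaining CSV requirement (point 4), one option, offered as a hedged sketch since file-system access on Db2 Warehouse on Cloud may be restricted, is to run the final SELECT through the EXPORT utility via SYSPROC.ADMIN_CMD; alternatively most clients (including the dashDB web console) can save any result set as CSV directly. The path and table name below are hypothetical:
-- Hypothetical target path and table; the big WITH query above could first be
-- materialised with INSERT INTO my_result_table SELECT ... and then exported.
CALL SYSPROC.ADMIN_CMD(
  'EXPORT TO /tmp/fieldtest_export.csv OF DEL MODIFIED BY NOCHARDEL
   SELECT * FROM my_result_table ORDER BY 1'
);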

How to show all records from multiple tables regardless of match on join statement

I am having trouble figuring out the proper syntax to structure this query correctly. I am trying to show ALL records from both the SalesHistoryDetail AND the SalesVsBudget tables. I believe my query allows some of the records in SalesVsBudget not to be pulled, whereas I want all of them for that period, regardless of whether there was a corresponding sale. Here is my code:
SELECT MAX(a.DispatchCenterOrderKey) AS DispatchCenter,
a.CustomerKey,
CASE WHEN a.CustomerKey IN
(SELECT AddressKey
FROM FinancialData.dbo.DimAddress
WHERE AddressKey >= 99000 AND AddressKey <= 99599) THEN 1 ELSE 0 END AS InterCompanyFlag,
MAX(a.Customer) AS Customer,
a.SalesmanID,
MAX(a.Salesman) AS Salesman,
a.SubCategoryKey,
MAX(a.SubCategoryDesc) AS Subcategory,
SUM(a.Value) AS SalesAmt,
b.FiscalYear AS Year,
b.FiscalWeekOfYear AS Week,
MAX(c.BudgetLbs) AS BudgetLbs,
MAX(c.BudgetDollars) AS BudgetDollars
FROM dbo.SalesHistoryDetail AS a
LEFT OUTER JOIN dbo.M_DateDim AS b ON a.InvoiceDate = b.Date
FULL OUTER JOIN dbo.SalesVsBudget AS c ON a.SalesmanID = c.SalesRepKey
AND a.CustomerKey = c.CustomerKey
AND a.SubCategoryKey = c.SubCategoryKey
AND b.FiscalYear = c.Year AND b.FiscalWeekOfYear = c.WeekNo
GROUP BY a.SalesmanID, a.CustomerKey, a.SubCategoryKey, b.FiscalYear, b.FiscalWeekOfYear
There are two different data sets that I am pulling from, obviously the SalesHistoryDetail table and the SalesVsBudget table. I'm hoping to get ALL BudgetLbs and BudgetDollars values from the SalesVsBudget table regardless of whether they match in the join. I want all of the matching records from the join too, but I also want EVERY record from SalesVsBudget. Essentially I want to show ALL sales records, and I want to reference the budget values from SalesVsBudget when the salesman, customer, subcategory, year and week match, but I also want to see budget entries that fall in my date range that don't have corresponding sales records in that period. Hopefully that makes sense. I feel I am very close, but my budget numbers don't reflect the whole story and I think that is because some of my records are being excluded! Please help.
I was able to accomplish this by playing with the FULL OUTER JOIN. My problem was that there were more records in SalesVsBudget than in SalesHistory_V. Therefore I had to make SalesVsBudget the initial FROM table and bring in SalesHistory_V with the FULL OUTER JOIN, and all records lined up.
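A minimal sketch of that reordering, reusing the aliases and join keys from the question (select list, week/year matching and date filtering omitted); this is only an outline of the description above, not a tested query. Note that for budget-only rows the a and b columns are NULL, so grouping keys may need COALESCE (e.g. COALESCE(b.FiscalYear, c.Year)) to keep those rows in the right period:
FROM dbo.SalesVsBudget AS c
FULL OUTER JOIN dbo.SalesHistoryDetail AS a
    ON a.SalesmanID = c.SalesRepKey
   AND a.CustomerKey = c.CustomerKey
   AND a.SubCategoryKey = c.SubCategoryKey
LEFT OUTER JOIN dbo.M_DateDim AS b
    ON a.InvoiceDate = b.Date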

Is it possible to convert the complex SQL into rails?

SPEC
I need to pick all rooms that don't have a single day with saleable = FALSE in the requested time period (07-09 ~ 07-19):
I have a table room with 1 row per room.
I have a table room_skus with one row per room and day (complete set for the relevant time range).
The column saleable is boolean NOT NULL and the column date is defined as date NOT NULL.
SELECT id
FROM room r
WHERE NOT EXISTS (
SELECT 1
FROM room_skus
WHERE date BETWEEN '2016-07-09' AND '2016-07-19'
AND room_id = r.id
AND NOT saleable
GROUP BY 1
);
The above SQL query works, but I wonder how I could translate it into the Rails ORM.
Let's say you have an array of room_ids called room_ids:
needed_room_ids = room_ids - RoomSku.where(room_id: room_ids, date: '2016-07-09'..'2016-07-19', saleable: false).pluck(:room_id)
This assumes your model for room_skus is called RoomSku.
Updated version:
room_ids = Room.all.select { |record| record.room_skus.present? }.map(&:id)
And then:
needed_room_ids = room_ids - RoomSku.where(room_id: room_ids, date: '2016-07-09'..'2016-07-19', saleable: false).pluck(:room_id)
It won't be one query, but you avoid plain SQL like this.
I don't have any project here to test something like it, but it should work:
Room.where.not(id: RoomSku.where(date: DateTime.parse('2016-07-09').strftime("%Y-%m-%d")..DateTime.parse('2016-07-19').strftime("%Y-%m-%d"), saleable: false).pluck(:room_id))
I hope it helps!
