If paneltest=a and result=b, then give me results for (paneltest=a & paneltest=b & paneltest=c) - join

OMG please be kind I have struggled with this complex problem for days and I am a complete newbie.
If paneltest(133) is Reactive, then all results for tests paneltest(2154) and paneltest(2157) should be reported, whether they are Reactive or NonReactive. If paneltest(133) is NonReactive, then neither paneltest(2154) nor paneltest(2157) should be reported.
Basically, if this one test has a Reactive result, I need that result plus the results for the two other tests; if it's not Reactive, I don't need any of them.
panelTestKey  test           result       orgChartNumber
133           paneltest133   Reactive     patient1
2154          paneltest2154  Reactive     patient1
2157          paneltest2157  NonReactive  patient1
133           paneltest133   NonReactive  patient2
2157          paneltest2157  NonReactive  patient2
2157          paneltest2157  NonReactive  patient2
SELECT DISTINCT
    pt.panelTestKey,
    p.orgChartNumber,
    t.name 'test',
    r.result
FROM mytables rq  -- joins defining pt, p, t, r omitted from the post
WHERE pt.paneltestKey IN (133)
  AND pt.isnonreportable = 0
  AND ((CONVERT(varchar(50), dbo.EpochToLocal(rq.finalDeliverystamp), 112)) >= dateadd(day, datediff(day, 1, GETDATE()), 0)) -- testing done yesterday; change the day number to pull other dates
  AND ((CONVERT(varchar(50), dbo.EpochToLocal(rq.finalDeliverystamp), 112)) < dateadd(day, datediff(day, 0, GETDATE()), 0))  -- excludes testing done today
  AND r.result = 'Reactive'
  AND valuetoReport = 1
I have tried a CTE but could not understand how it works.
I have tried writing it as a subquery.
I tried a CASE statement.
I guess what I don't really understand is what I need to make this work.
The result I want is for patient1 to show up with results for all three tests; patient2 should not show up because its 133 is NonReactive.

WITH alltests AS (
    SELECT DISTINCT
        rq.requisitionKey, pt.panelTestKey, t.name 'test', r.result
    FROM mytables rq  -- joins defining pt, t, r omitted from the post
    WHERE pt.paneltestKey IN (133, 2154, 2157)
      AND pt.isnonreportable = 0
      AND ((CONVERT(varchar(50), dbo.EpochToLocal(rq.finalDeliverystamp), 112)) >= dateadd(day, datediff(day, 1, GETDATE()), 0)) -- testing done yesterday
      AND ((CONVERT(varchar(50), dbo.EpochToLocal(rq.finalDeliverystamp), 112)) < dateadd(day, datediff(day, 0, GETDATE()), 0))  -- excludes testing done today
      AND valuetoReport = 1
)
SELECT alt.requisitionKey, alt.result, alt.panelTestKey, alt.test
FROM alltests alt
JOIN alltests flag
  ON flag.requisitionKey = alt.requisitionKey
 AND flag.panelTestKey = 133
 AND flag.result = 'Reactive';
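The rule can also be expressed directly with EXISTS: report a 133/2154/2157 row only when the same patient also has a Reactive 133 row. A minimal sketch, assuming a simplified hypothetical single table results(orgChartNumber, panelTestKey, test, result) shaped like the sample data above; a real query would keep its joins and date filters:

-- Hypothetical flattened table; column names taken from the sample data
SELECT r.orgChartNumber, r.panelTestKey, r.test, r.result
FROM results r
WHERE r.panelTestKey IN (133, 2154, 2157)
  AND EXISTS (
        SELECT 1
        FROM results flag
        WHERE flag.orgChartNumber = r.orgChartNumber  -- same patient
          AND flag.panelTestKey = 133
          AND flag.result = 'Reactive'                -- gate on a Reactive 133
      );

Run against the sample data, patient1 gets all three rows and patient2 gets none, matching the expected output.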

Related

InfluxDB: How to create a continuous query to calculate delta values?

I'd like to calculate the delta values for a series of measurements stored in an InfluxDB. The values are readings from an electricity meter taken every 5 minutes, and they increase over time. Here is a subset of the data to give you an idea (the commands shown below are executed in the InfluxDB CLI):
> SELECT "Haushaltstromzaehler - cnt" FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time Haushaltstromzaehler - cnt
---- --------------------------
2018-02-02T10:00:12.610811904Z 11725.638
2018-02-02T10:05:11.242021888Z 11725.673
2018-02-02T10:10:10.689827072Z 11725.707
2018-02-02T10:15:12.143326976Z 11725.736
2018-02-02T10:20:10.753357056Z 11725.768
2018-02-02T10:25:11.18448512Z 11725.803
2018-02-02T10:30:12.922032896Z 11725.837
2018-02-02T10:35:10.618788096Z 11725.867
2018-02-02T10:40:11.820355072Z 11725.9
2018-02-02T10:45:11.634203904Z 11725.928
2018-02-02T10:50:11.10436096Z 11725.95
2018-02-02T10:55:10.753853952Z 11725.973
Calculating the differences in the InfluxDB CLI is pretty straightforward with the difference() function. This gives me the electricity consumed within each 5-minute interval:
> SELECT difference("Haushaltstromzaehler - cnt") FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time difference
---- ----------
2018-02-02T10:05:11.242021888Z 0.03499999999985448
2018-02-02T10:10:10.689827072Z 0.033999999999650754
2018-02-02T10:15:12.143326976Z 0.02900000000045111
2018-02-02T10:20:10.753357056Z 0.0319999999992433
2018-02-02T10:25:11.18448512Z 0.03499999999985448
2018-02-02T10:30:12.922032896Z 0.033999999999650754
2018-02-02T10:35:10.618788096Z 0.030000000000654836
2018-02-02T10:40:11.820355072Z 0.03299999999944703
2018-02-02T10:45:11.634203904Z 0.028000000000247383
2018-02-02T10:50:11.10436096Z 0.02200000000084401
2018-02-02T10:55:10.753853952Z 0.02299999999922875
Where I struggle is getting this to work in a continuous query. Here is the command I used to setup the continuous query:
CREATE CONTINUOUS QUERY cq_Haushaltstromzaehler_cnt ON myhomedb
BEGIN
SELECT difference(sum("Haushaltstromzaehler - cnt")) AS "delta" INTO "Haushaltstromzaehler_delta" FROM "myhome_measurements" GROUP BY time(1h)
END
Looking in the InfluxDB log file, I see that no data is written to the new 'delta' measurement by the continuous query execution:
...finished continuous query cq_Haushaltstromzaehler_cnt, 0 points(s) written...
After much troubleshooting and experimenting, I now understand why no data is generated. Setting up a continuous query requires a GROUP BY time() clause, which in turn requires an aggregate function inside difference(). The problem is that the aggregate function returns only one value per GROUP BY time() period, and difference() cannot calculate a difference from a single value. Essentially, the continuous query executes a command like this:
> SELECT difference(sum("Haushaltstromzaehler - cnt")) FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z' GROUP BY time(1h)
>
I'm now somewhat clueless as to how to make this work and appreciate any advice you might have.
Does it help to use the last() selector function? I have not tested this as a CQ yet.
SELECT difference(last(T1_Consumed)) AS T1_Delta, difference(last(T2_Consumed)) AS T2_Delta
FROM P1Data
WHERE time >= 1551648871000000000 GROUP BY time(1h)
difference() calculates the delta from the "aggregated" value taken from the previous group, not within the current group.
So feel free to use a selector function there; since your counters appear to be cumulative, last() should work well.
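Applied to the original continuous query, that suggestion would look something like the following (untested sketch: sum() is swapped for the last() selector, so each one-hour group contributes its final meter reading and difference() then works across consecutive groups):

CREATE CONTINUOUS QUERY cq_Haushaltstromzaehler_cnt ON myhomedb
BEGIN
  SELECT difference(last("Haushaltstromzaehler - cnt")) AS "delta"
  INTO "Haushaltstromzaehler_delta"
  FROM "myhome_measurements"
  GROUP BY time(1h)
END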

Best way to upload data using py2neo

I have experimented with ways to upload medium-size data sets using py2neo. In my case, there are about 80 K nodes and 400 K edges that need to be loaded every day. I want to share my experience, and ask the community if there is still a better way that I have not come across.
A. py2neo's "native" commands.
Create nodes using graph.merge_one() and set properties using push().
I dismissed this rather quickly, as it was very slow and would not even get past 10 K records in several minutes. Not surprisingly, py2neo's documentation and some posts here recommend Cypher instead.
B. Cypher without partitioning
Use py2neo.cypher.CypherTransaction append() in the loop and commit() at the end.
# query sent to MSSQL; returns ~80K records
result = engine.execute(query)
statement = "MERGE (e:Entity {myid: {ID}}) SET e.p = 1"
# begin a new Cypher transaction
tx = neoGraph.cypher.begin()
for row in result:
    tx.append(statement, {"ID": row.id_field})
tx.commit()
This times out and crashes the Neo4j server.
I understand the problem is that all 80 K Cypher statements are trying to execute in one go.
C. Cypher with partitioning and one commit
I use a counter and the process() command to run 1000 statements at a time.
# query sent to MSSQL; returns ~80K records
result = engine.execute(query)
statement = "MERGE (e:Entity {myid: {ID}}) SET e.p = 1"
counter = 0
tx = neoGraph.cypher.begin()
for row in result:
    counter += 1
    tx.append(statement, {"ID": row.id_field})
    if counter == 1000:
        tx.process()  # send the 1000 queued statements to the server
        counter = 0
tx.commit()
This runs quickly at the beginning but slows down as thousands of statements are processed. Eventually it times out with a stack overflow.
This was surprising, as I expected process() to reset the stack each time.
D. Cypher with partitioning and commits for each partition
This is the only version that worked well: commit() after each partition of 1000 statements, then start a new transaction with begin().
# query sent to MSSQL; returns ~80K records
result = engine.execute(query)
statement = "MERGE (e:Entity {myid: {ID}}) SET e.p = 1"
counter = 0
tx = neoGraph.cypher.begin()
for row in result:
    counter += 1
    tx.append(statement, {"ID": row.id_field})
    if counter == 1000:
        tx.commit()                   # commit the batch of 1000 statements
        tx = neoGraph.cypher.begin()  # reopen a fresh transaction
        counter = 0
tx.commit()
This runs quickly and well.
Any comments?
As you have discovered through trial and error, a single transaction performs best when it has no more than about 10K-50K operations. The method you describe in D works best because you are committing the transaction every 1000 statements. You can probably increase that batch size safely.
Another approach that you might want to try is passing an array of values as a parameter and using Cypher's UNWIND command to iterate over them. For example:
WITH {id_array} AS ids // something like [1,2,3,4,5,6]
UNWIND ids AS ident
MERGE (e:Entity {myid: ident})
SET e.p = 1
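Combining the batched-commit pattern from D with UNWIND, the upload loop might look like this (untested sketch using the same py2neo 2.x transaction API as above; the 1000-id batch size is just a starting point):

statement = """
WITH {id_array} AS ids
UNWIND ids AS ident
MERGE (e:Entity {myid: ident})
SET e.p = 1
"""

batch = []
for row in result:
    batch.append(row.id_field)
    if len(batch) == 1000:
        tx = neoGraph.cypher.begin()
        tx.append(statement, {"id_array": batch})  # one statement covers 1000 ids
        tx.commit()
        batch = []
if batch:  # flush the remainder
    tx = neoGraph.cypher.begin()
    tx.append(statement, {"id_array": batch})
    tx.commit()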

Selecting greatest date range count in a rails array

I have a database with a bunch of deviceapi entries that have a start_date and end_date (datetime in the schema). Typically these entries are no more than 20 seconds long (end_date - start_date). I have the following setup:
data = Deviceapi.all.where("start_date > ?", DateTime.now - 2.weeks)
I need to get the hour within data that had the highest number of Deviceapi entries. To make it a bit clearer, this was my latest try at it (code is approximated, don't mind typos):
runningtotal = 0
(2.weeks / 1.hour).to_i.times do |interval|
  current = data.select { |d| d.start_time > (start_date + (1.hour * (interval - 1))) }
                .select { |d| d.end_time < (start_date + (1.hour * interval)) }
                .count
  if current > runningtotal
    runningtotal = current
  end
end
The problem: this code works just fine, as did about a dozen other incarnations of it using .where, .select, raw SQL queries, etc. But it is too slow. Waaaaay too slow, because it has to loop through every hour within the 2 weeks, and this method might itself need to be called dozens of times.
There has to be a faster way to do this, maybe a sort? I'm stumped, and I've been searching for hours with no luck. Any ideas?
To get adequate performance, you'll want to do everything in a single query, which will mean avoiding ActiveRecord functionality and doing a raw query (e.g. via ActiveRecord::Base.connection.execute).
I have no way to test it, since I have neither your data nor schema, but I think something along these lines will do what you are looking for:
select y.starting_hour, y.num_entries as max_entries
from
(
    select x.starting_hour, count(*) as num_entries
    from
    (
        select date_trunc('hour', start_time) as starting_hour
        from deviceapi as d
    ) as x
    group by x.starting_hour
) as y
where y.num_entries = (
    select max(counts.num_entries)
    from (
        select date_trunc('hour', start_time) as starting_hour, count(*) as num_entries
        from deviceapi as d
        group by date_trunc('hour', start_time)
    ) as counts
);
The logic of this is as follows, from the innermost query out:
1. "Bucket" each starting time to the hour.
2. From the resulting table of buckets, get the total number of entries in each bucket.
3. Get the maximum number of entries from that table, and then use that number to match back to get the starting_hour itself.
If there happens to be more than one bucket with the same number of entries, you could determine a consistent way to pick one -- say, the min(starting_hour) or similar (since that would stay the same even as data gets added, assuming you are not deleting items).
If you wanted to limit the initial time slice -- I see 2 weeks referenced in your post -- you could do that in the innermost query with a where clause bracketing the date range.
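For completeness, a hedged sketch of running this as a raw query from Rails with the two-week bracket applied (the deviceapis table name is guessed from the Deviceapi model, start_date is taken from the schema description, and this variant uses ORDER BY/LIMIT instead of the match-back, with min(starting_hour) as the consistent tie-break suggested above):

sql = <<-SQL
  SELECT date_trunc('hour', start_date) AS starting_hour, count(*) AS num_entries
  FROM deviceapis
  WHERE start_date > NOW() - INTERVAL '2 weeks'
  GROUP BY starting_hour
  ORDER BY num_entries DESC, starting_hour ASC
  LIMIT 1
SQL
busiest = ActiveRecord::Base.connection.execute(sql).first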

specific query with cypher

I need help with a specific query. I am using neo4j. My database consists of companies (nodes) and transactions between them (relationships). Each relationship (PAID) has these properties:
amount - amount of the transaction
year - year of the transaction
month - month of the transaction
What I need is to find all cycles in the graph starting at node A, where each transaction occurred after the previous one.
So a valid example would be: A PAID B in March, B PAID C in April, C PAID A in June.
So is there any way to get all cycles from node A such that the transactions occur in chronological order?
You may want to set up a sample graph at Neo4j Console to share, or at least say more about which version of Neo4j you are using. But if you're on 2.0 and you store year and month as a long or integer, then maybe you could try something like
MATCH a-[ab:PAID]->b-[bc:PAID]->c-[ca:PAID]->a
WHERE (ab.year + ab.month) < (bc.year + bc.month) < (ca.year + ca.month)
RETURN a, b, c
EDIT:
Actually, that was hasty; the additions won't work that way, of course, but the structure should be OK. Maybe
WHERE ((ab.year < bc.year) OR (ab.year = bc.year AND ab.month < bc.month))
AND ((bc.year < ca.year) OR (bc.year = ca.year AND bc.month < ca.month))
or
WHERE (ab.year * 12 + ab.month) < (bc.year * 12 + bc.month) < (ca.year * 12 + ca.month)
If you only use dates for this type of comparison, consider storing them as one property, perhaps as milliseconds since the epoch (1 Jan 1970 GMT). That makes comparisons very easy. But if you need to return and display dates frequently, then keeping them separate might make sense.
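With a single timestamp property, the filter becomes trivial (hedged sketch; ts is a hypothetical epoch-millisecond property, not part of the original data model):

MATCH a-[ab:PAID]->b-[bc:PAID]->c-[ca:PAID]->a
WHERE ab.ts < bc.ts AND bc.ts < ca.ts
RETURN a, b, c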
EDIT2:
I can't think of a way to build your condition of "r1.date < r2.date" into the pattern itself, which means matching all variable-depth cycles and then discarding some (most) of them. That is liable to become expensive in a large graph, and you may be better off building a traversal or server plugin, which can make complex iterative decisions during the traversal. In 2.0, thanks to Wes' elegant collection slicing, you could try something like this:
MATCH path = a-[ab:PAID*..10]->a
WHERE ALL (ix IN range(0, length(ab)-2)
           WHERE ((ab[ix]).year * 12 + (ab[ix]).month) < ((ab[ix+1]).year * 12 + (ab[ix+1]).month))
RETURN path
The same could probably be achieved in 1.9 with HEAD() and TAIL(). Again, share sample data in a console and maybe someone else can pitch in.

Dynamic time based finder for ActiveRecord

I have a slightly complex time arithmetic problem.
I have a reminder system where the user can set a "how many x before the event" duration. For example, if I set '5 minutes', I need to get a reminder 5 minutes before the event's scheduled time.
In my reminder system I have a cron job which runs every minute and sends reminder mails. So far so good. I want to find all calendar events which are eligible for a reminder (calendar entries whose scheduled time is between 5.minutes.from_now and 6.minutes.from_now).
I am trying to write the following where clause:
conds = "'when' >= '#{eval("#{cal.remind_before.to_s}.#{cal.remind_before_what.downcase}.from_now").to_s(:db)}' AND 'when' < '#{eval("#{cal.remind_before.to_s}.#{cal.remind_before_what.downcase}.from_now + 1.minutes").to_s(:db)}'"
#mail_calendar_for_reminder= Calendar.find(:all, :conditions=> conds)
Here cal.remind_before = '5' and cal.remind_before_what.downcase = 'minutes',
so the evals would be evaluating (5.minutes.from_now) and (6.minutes.from_now).
The resulting SQL statement is :
SELECT "calendars".* FROM "calendars" WHERE ('when' >= '2011-01-11 14:44:54' AND 'when' < '2011-01-11 14:45:54')
This SQL is syntactically and logically correct in that it covers the time range between 5.minutes.from_now and 6.minutes.from_now, but it is not selecting the eligible records. I suspect two things:
1. The SQL above is doing string comparisons rather than time comparisons.
2. The database entry for a calendar's scheduled time has the format 2011-01-11 14:45:09.000000; the 0's at the end might be messing up the date comparisons.
I tried almost all sorts of date range arithmetic but could not get the eligible records in this query.
Depending on your server and its load, a one-minute window for cron might be a little optimistic.
What happens if you login to the dbms server and execute that SQL statement? Any rows returned? Any error messages?
You can try an explicit type cast:
'when' >= CAST('2011-01-11 14:44:54' AS DATETIME) ...
Your dbms might require a different syntax for type casting and conversion. Search your docs.
Are your column names case sensitive? Is the column 'when' or 'When'? (Or wHen?)
This query returns your test event. Note the double quotes around the column name.
SELECT "calendars".*
FROM "calendars"
WHERE ("when" >= '2011-01-10 15:56'
AND "when" < '2011-01-10 15:57')
