In Neo4j I am running this query:
MATCH (n)-[t:x{x:"1a"}]->()
WHERE n.a > 1 OR n.b > 1 AND toFloat(n.a) / (n.a+n.b) * 100 < 90
RETURN DISTINCT n, toFloat(n.a) / (n.a + n.b) * 100
ORDER BY toFloat(n.a) / (n.a + n.b) * 100 DESC
LIMIT 10
but I get a "/ by zero" error.
Since I required that at least one of n.a or n.b be greater than 1, a row where both are zero should be skipped, so I shouldn't get this error. This looks like a logic issue in Neo4j. The error goes away when I delete AND toFloat(n.a)/(n.a+n.b)*100 < 90 from the WHERE clause, but I only want results lower than 90. How can I overcome this?
Can either of n.a or n.b be negative? I was able to reproduce this with:
WITH -2 AS na, 2 AS nb
WHERE (na > 1 OR nb > 1) AND toFloat(na)/(na+nb)*100 < 90
RETURN na, nb
And I get: / by zero
Perhaps try changing your WHERE clause to:
WITH -2 AS na, 2 AS nb
WHERE (na + nb > 0) AND toFloat(na)/(na+nb)*100 < 90
RETURN na, nb
And I get: zero rows.
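The difference between the two guards can be sketched in plain Python. This is only an illustration of the arithmetic, not of how Neo4j evaluates the query; the function names are made up for the example.

```python
def passes_original_guard(na, nb):
    # Mirrors (na > 1 OR nb > 1): true for na=-2, nb=2 even though na + nb == 0.
    return na > 1 or nb > 1


def passes_sum_guard(na, nb):
    # Mirrors (na + nb > 0): rejects a zero or negative denominator directly.
    return na + nb > 0


def ratio_pct(na, nb):
    # Same expression as toFloat(na) / (na + nb) * 100.
    return float(na) / (na + nb) * 100


na, nb = -2, 2
print(passes_original_guard(na, nb))  # True, so the division would still run
print(passes_sum_guard(na, nb))       # False, so the row is dropped first
```

With na = -2 and nb = 2 the original guard lets the row through, and calling ratio_pct on it raises ZeroDivisionError.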
It seems the second condition, toFloat(na) / (na + nb) * 100 < 90, is tested before the first. Look at the Filter(1) operator in this execution plan:
+--------------+---------------+------+--------+--------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+--------------+---------------+------+--------+--------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection | 1 | 3 | 0 | anon[111], anon[138], n, toFloat(n.a)/(n.a + n.b)* 100 | anon[111]; anon[138] |
| Top | 1 | 3 | 0 | anon[111], anon[138] | { AUTOINT6}; |
| Distinct | 0 | 3 | 24 | anon[111], anon[138] | anon[111], anon[138] |
| Filter(0) | 0 | 3 | 6 | anon[29], n, t | t.x == { AUTOSTRING0} |
| Expand(All) | 1 | 3 | 6 | anon[29], n, t | ( n#7)-[t:x]->() |
| Filter(1) | 1 | 3 | 34 | n | (Ors(List(n#7.a > { AUTOINT1}, Multiply(Divide(ToFloatFunction( n#7.a),Add( n#7.a, n#7.b)),{ AUTOINT3}) < { AUTOINT4})) AND Ors(List( n#7.a > { AUTOINT1}, n.b > { AUTOINT2}))) |
| AllNodesScan | 4 | 4 | 5 | n | |
+--------------+---------------+------+--------+--------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
You can get around this by forcing the filter to be split into two clauses:
MATCH (n)-[t:x { x:"1a" }]->()
WHERE n.a > 1 OR n.b > 1
WITH n
WHERE toFloat(n.a) / (n.a + n.b) * 100 < 90
RETURN DISTINCT n, toFloat(n.a) / (n.a + n.b) * 100
ORDER BY toFloat(n.a) / (n.a + n.b) * 100 DESC
LIMIT 10
I found this behavior surprising, but on reflection I don't think it is wrong for the execution engine to rearrange the filter this way. It is tempting to assume that evaluation stops as soon as the first declared condition fails, but Cypher is declarative: we express the "what", not the "how", and in terms of the "what", A AND B is equivalent to B AND A.
Here is the query and a sample graph, you can check if it translates to your actual data:
http://console.neo4j.org/r/f6kxi5
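To make the ordering point concrete, here is a minimal Python sketch. Python's `and` is guaranteed to evaluate left to right, so it can only imitate the two orderings a planner might pick; the sample rows and function names are hypothetical.

```python
def guard(a, b):
    # Mirrors n.a > 1 OR n.b > 1.
    return a > 1 or b > 1


def ratio_pct(a, b):
    # Mirrors toFloat(n.a) / (n.a + n.b) * 100.
    return float(a) / (a + b) * 100


rows = [(0, 0), (3, 1)]  # hypothetical (n.a, n.b) pairs


def guard_then_ratio(rows):
    # The order the query author had in mind: (0, 0) is rejected before dividing.
    return [r for r in rows if guard(*r) and ratio_pct(*r) < 90]


def ratio_then_guard(rows):
    # Logically the same conjunction, but the division runs first and fails on (0, 0).
    return [r for r in rows if ratio_pct(*r) < 90 and guard(*r)]


def two_stage(rows):
    # Analogue of splitting the filter with WITH: the second test only sees survivors.
    survivors = [r for r in rows if guard(*r)]
    return [r for r in survivors if ratio_pct(*r) < 90]
```

Both guard_then_ratio and two_stage keep only (3, 1), while ratio_then_guard raises ZeroDivisionError on the same input.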
Related
Let's say I have the following in a table :
  | A  | B | desired_output
----------------------------
1 | 10 | 1 | 0
2 | 20 | 7 | 0
3 | 30 | 3 | 0
4 | 20 | 2 | 0
5 | 30 | 5 | 1
I'd like to find a formula for each cell in the desired_output column that looks at the max of B1:B5, but only among the rows for which A = max(A1:A5).
If that's not clear, I'll try to put it another way:
for all the rows in A1:A5 that are equal to max(A1:A5) // so that's rows 3 and 5
find the one which has the max value on B // so between B3 and B5, that's B5
output 1 for this one, 0 for the other
I'd say there would be a where somewhere if such a function existed, something like = if(B=(max(B1:B5) where A = max(A1:A5)), 1, 0) but I can't find how to do it...
I can do it in two columns with a trick :
  | A  | B | C | D
----------------------------
1 | 10 | 1 |   | 0
2 | 20 | 7 |   | 0
3 | 30 | 3 | 3 | 0
4 | 20 | 2 |   | 0
5 | 30 | 5 | 5 | 1
With Cn = if(An=max(A$1:A$5),Bn,"") and Dn = if(Cn = max(C$1:C$5), 1, 0)
But I still can't find how to do it in one column
For systems without MAXIFS, put this in C1 and fill down.
=--(B1=MAX(INDEX(B$1:B$5-(A$1:A$5<>MAX(A$1:A$5))*1E+99, , )))
=ARRAYFORMULA(IF(LEN(A1:A), IF(IFERROR(VLOOKUP(CONCAT(A1:A&"×", B1:B),
JOIN("×", QUERY(A1:B, "order by A desc, B desc limit 1")), 1, 0), )<>"", 1, 0), ))
or shorter:
=ARRAYFORMULA(IF(A:A<>"",N(A:A&"×"&B:B=JOIN("×",SORTN(A:B,1,,1,0,2,0))),))
=ARRAYFORMULA(IF(A:A<>"",N(A:A&B:B=JOIN(,SORTN(A:B,1,,1,0,2,0))),))
How about the following:
=--AND(A5=MAX($A$1:$A$5),B5=MAXIFS($B$1:$B$5,$A$1:$A$5,MAX($A$1:$A$5)))
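For readers who want to check a formula against the intended logic, the rule can be sketched in Python. This is an illustration of what the formulas compute, not a spreadsheet solution; the function name is made up.

```python
def flag_max_b_among_max_a(a_vals, b_vals):
    # Step 1: the largest value in column A.
    max_a = max(a_vals)
    # Step 2: the largest B among rows whose A equals that maximum.
    max_b = max(b for a, b in zip(a_vals, b_vals) if a == max_a)
    # Step 3: 1 for rows matching both maxima, 0 otherwise.
    return [int(a == max_a and b == max_b) for a, b in zip(a_vals, b_vals)]


a_col = [10, 20, 30, 20, 30]
b_col = [1, 7, 3, 2, 5]
print(flag_max_b_among_max_a(a_col, b_col))  # [0, 0, 0, 0, 1]
```

Note that row 2 has the largest B overall (7) but is not flagged, because its A is not the maximum.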
My cypher query
EXPLAIN MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
produces the following execution plan
--------------------------------------------+
| Operator | Estimated Rows | Identifiers | Other |
+-------------------+----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 12 | count(tx) | |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +EagerAggregation | 12 | count(tx) | |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 136 | anon[16], b, tx | AndedPropertyInequalities(Variable(b),Property(Variable(b),PropertyKeyName(time)),GreaterThanOrEqual(Property(Variable(b),PropertyKeyName(time)),Parameter( AUTOINT2,Integer)), LessThan(Property(Variable(b),PropertyKeyName(time)),Parameter( AUTOINT1,Integer))) |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Expand(All) | 9052 | anon[16], b, tx | (tx)-[anon[16]:INCLUDED_IN]->(b) |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +NodeIndexSeek | 9052 | tx | :Transaction(pstype) |
+-------------------+----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This runs far too slowly, because the initial NodeIndexSeek on :Transaction(pstype) actually returns tens of millions of nodes rather than the estimated 9052. A NodeIndexSeekByRange on b:Block(time) would return only around 600 nodes.
I have tried forcing the execution plan to start from b:Block(time), but the planner still uses NodeIndexSeek on tx:Transaction(pstype):
EXPLAIN MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
USING INDEX b:Block(time)
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
produces
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
| Operator | Estimated Rows | Identifiers | Other |
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
| +ProduceResults | 12 | count(tx) | |
| | +----------------+-----------------+--------------------------------------------------------------+
| +EagerAggregation | 12 | count(tx) | |
| | +----------------+-----------------+--------------------------------------------------------------+
| +NodeHashJoin | 136 | anon[16], b, tx | b |
| |\ +----------------+-----------------+--------------------------------------------------------------+
| | +NodeIndexSeekByRange | 14703 | b | :Block(time) >= { AUTOINT2} AND :Block(time) < { AUTOINT1} |
| | +----------------+-----------------+--------------------------------------------------------------+
| +Expand(All) | 9052 | anon[16], b, tx | (tx)-[anon[16]:INCLUDED_IN]->(b) |
| | +----------------+-----------------+--------------------------------------------------------------+
| +NodeIndexSeek | 9052 | tx | :Transaction(pstype) |
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
The only way I have gotten it to run fast (multiple orders of magnitude faster) is by using the rule planner:
CYPHER planner=rule MATCH (b:Block)
WHERE 1540512000 <= b.time < 1540598400
WITH b
MATCH (b)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
RETURN count(tx);
Is there a way to make it work when using the cost planner?
Both :Block(time) and :Transaction(pstype) are indexed.
You could try using a join hint on tx along with your index hint, which should ensure you only expand from one direction:
EXPLAIN
MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
USING INDEX b:Block(time)
USING JOIN ON tx
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
Alternately you could restructure your query a bit so the tx node isn't initially part of the pattern, but enforced in the WHERE clause. You'll need to split the MATCH in 2, but I don't think you'll need any planner hints:
EXPLAIN
MATCH (tx:Transaction {pstype: 0})
MATCH (b:Block)<-[:INCLUDED_IN]-(x)
WHERE 1540512000 <= b.time < 1540598400
AND x = tx
RETURN count(tx);
EDIT
Okay, let's try another approach then:
EXPLAIN
MATCH (b:Block)<-[:INCLUDED_IN]-(x)
WHERE 1540512000 <= b.time < 1540598400
AND x.pstype = 0 // AND 'Transaction' in labels(x)
RETURN count(x);
If we leave off the label, then the planner can't use an index lookup on pstype. If nodes other than :Transaction nodes also have a pstype property, you could try uncommenting the part of the line that checks for the label in an alternate way (I don't think this will use an index lookup, but I'm not completely sure).
Another alternative (I'm unsure whether this will work) is to use a pattern comprehension to collect a list of results for each b after the initial match, and then sum the sizes of those lists:
EXPLAIN
MATCH (b:Block)
WHERE 1540512000 <= b.time < 1540598400
RETURN sum(size([(b)<-[:INCLUDED_IN]-(x:Transaction) WHERE x.pstype = 0 | x])) as count
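The shape of that last query, a range filter on blocks followed by a per-block list whose sizes are summed, can be sketched in Python. The Block and Tx classes below are hypothetical stand-ins for the graph nodes, and the sketch says nothing about how the Cypher planner will execute the real query.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    pstype: int


@dataclass
class Block:
    time: int
    txs: list = field(default_factory=list)  # transactions INCLUDED_IN this block


def count_matching(blocks, start, end, pstype=0):
    # Filter blocks by time first, then build a per-block list of matching
    # transactions and sum the list sizes, as the pattern comprehension does.
    return sum(
        len([tx for tx in b.txs if tx.pstype == pstype])
        for b in blocks
        if start <= b.time < end
    )


blocks = [
    Block(1540512000, [Tx(0), Tx(1)]),
    Block(1540550000, [Tx(0), Tx(0)]),
    Block(1540598400, [Tx(0)]),  # outside the half-open range
]
print(count_matching(blocks, 1540512000, 1540598400))  # 3
```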
I have 2 tables.
The first table lists all possible mistakes:
mistake|description
m1 | a
m2 | b
m3 | c
The second table is my data:
n | m1 | m2 | m3
1 | 1 | 0 | 1
2 | 0 | 1 | 1
3 | 1 | 1 | 0
where n is the row number, and each m column holds 1 if the row has that mistake and 0 if it does not.
I want to join them to show the row numbers (or other info) for each mistake.
Something like:
mistake | n
m1      | 1
m1      | 3
m2      | 2
m2      | 3
m3      | 1
m3      | 2
It looks to me like you are just asking to transpose the data.
data have;
input n m1 m2 m3 ;
cards;
1 1 0 1
2 0 1 1
3 1 1 0
;
proc transpose data=have out=want ;
by n ;
var m1 m2 m3 ;
run;
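As far as I can tell, the transposed data set still contains one row per (n, mistake) pair, including the pairs whose value is 0, so it needs a filter on the value to match the desired output. The wide-to-long step plus that filter can be sketched in Python; the function name and the dict layout are assumptions made for the example.

```python
rows = [
    {"n": 1, "m1": 1, "m2": 0, "m3": 1},
    {"n": 2, "m1": 0, "m2": 1, "m3": 1},
    {"n": 3, "m1": 1, "m2": 1, "m3": 0},
]


def mistakes_long(rows, mistake_cols=("m1", "m2", "m3")):
    # One (mistake, n) pair per flagged cell; cells holding 0 are skipped.
    pairs = [(m, r["n"]) for r in rows for m in mistake_cols if r[m] == 1]
    # Sort by mistake, then by row number, to match the desired layout.
    return sorted(pairs)


for mistake, n in mistakes_long(rows):
    print(mistake, n)
```

The resulting pairs can then be joined to the mistakes table on the mistake name to pull in the description.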
Is it possible to do a union of two queries (on the same entity) in Core Data? In SQL terms, suppose the entity is called t and contains the following data:
+------+------+------+
| x | y | z |
+------+------+------+
| 1 | 11 | 2 |
| 1 | 12 | 3 |
| 2 | 11 | 1 |
| 3 | 12 | 3 |
+------+------+------+
Then I am trying to run the following query (using Core Data, not SQLite):
select x, y, sum(z)
from t
group by 1, 2
union
select x, 1 as y, sum(z)
from t
group by 1, 2
order by x, y, 1
;
+------+------+--------+
| x | y | sum(z) |
+------+------+--------+
| 1 | 1 | 5 |
| 1 | 11 | 2 |
| 1 | 12 | 3 |
| 2 | 1 | 1 |
| 2 | 11 | 1 |
| 3 | 1 | 3 |
| 3 | 12 | 3 |
+------+------+--------+
7 rows in set (0.00 sec)
Is it possible?
Thanks!
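To pin down what the query is expected to return, here is a small Python sketch that reproduces the result set above from the sample rows. It is plain Python rather than Core Data, and the helper names are made up; it only illustrates the computation being asked for.

```python
from collections import defaultdict

rows = [(1, 11, 2), (1, 12, 3), (2, 11, 1), (3, 12, 3)]  # (x, y, z)


def grouped_sum(rows, key):
    # Sum z per group, where key(x, y) decides the grouping columns.
    totals = defaultdict(int)
    for x, y, z in rows:
        totals[key(x, y)] += z
    return {(kx, ky, total) for (kx, ky), total in totals.items()}


def union_query(rows):
    per_xy = grouped_sum(rows, lambda x, y: (x, y))  # select x, y, sum(z)
    per_x = grouped_sum(rows, lambda x, y: (x, 1))   # select x, 1 as y, sum(z)
    # Set union drops duplicates, like SQL UNION; sorting orders by x, then y.
    return sorted(per_xy | per_x)


for row in union_query(rows):
    print(row)
```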
I need to write a criteria query or HQL that selects, for each interfaceCode, only the InterfaceVersion with the latest version. A version consists of majorVersion.minorVersion.editVersion.
Example:
InterfaceVersion table:
id | interface_code_id | major_version | minor_version | edit_version
---|-------------------|---------------|---------------|-------------
1 | 1 | 1 | 1 | 6
2 | 1 | 1 | 5 | 0
3 | 2 | 1 | 0 | 0
4 | 2 | 0 | 1 | 0
5 | 2 | 2 | 0 | 1
6 | 2 | 1 | 3 | 6
expected result would be:
InterfaceVersions instances with ids [2, 5]
I have these GORM domain classes (simplified):
class InterfaceCode {
    int id
    String code
    static hasMany = [versionList: InterfaceVersion]
}
class InterfaceVersion {
    int id
    InterfaceCode interfaceCode
    int majorVersion
    int minorVersion
    int editVersion
    static belongsTo = [InterfaceCode]
    static constraints = {
        interfaceCode(unique: ['majorVersion', 'minorVersion', 'editVersion'])
    }
}
So far I've been able to come up with this SQL:
SELECT t1.*
FROM Interface_Version t1
Left Outer Join Interface_Version T2
On (T1.Interface_Code_Id = T2.Interface_Code_Id And
(T1.Major_Version < T2.Major_Version Or
(T1.Major_Version = T2.Major_Version And T1.Minor_Version < T2.Minor_Version) Or
(T1.Major_Version = T2.Major_Version And T1.Minor_Version = T2.Minor_Version And T1.Edit_Version < T2.Edit_Version ) )
)
WHERE t2.id IS NULL
Can you please convert this SQL to HQL or criteria, or come up with something nicer?
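The selection rule itself, the greatest (major, minor, edit) triple per interface code, can be sketched in Python to check any HQL or criteria translation against the sample data. This is not GORM code; the function name and the tuple layout are assumptions made for the example.

```python
# (id, interface_code_id, major_version, minor_version, edit_version)
rows = [
    (1, 1, 1, 1, 6),
    (2, 1, 1, 5, 0),
    (3, 2, 1, 0, 0),
    (4, 2, 0, 1, 0),
    (5, 2, 2, 0, 1),
    (6, 2, 1, 3, 6),
]


def latest_per_interface(rows):
    best = {}  # interface_code_id -> (version tuple, row id)
    for id_, code_id, major, minor, edit in rows:
        version = (major, minor, edit)
        # Tuples compare lexicographically: major first, then minor, then edit.
        if code_id not in best or version > best[code_id][0]:
            best[code_id] = (version, id_)
    return sorted(id_ for _, id_ in best.values())


print(latest_per_interface(rows))  # [2, 5]
```

The lexicographic tuple comparison is the same ordering that the three OR branches of the self-join condition spell out by hand.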