How to query Neo4j using the .NET client's WithParam instead of injection? - neo4j

I want to avoid injecting parameters into the query string, so we used the following calls on the Neo4jClient .NET class:
var queryClassRelationshipsNodes = client.Cypher
.Start("a", (NodeReference)sourceReference.Id)
.Match("a-[Rel: {relationshipType} ]->foundClass")
.Where("Rel.RelationStartNode =" + "\'" + relationshipStart + "\'")
.AndWhere("Rel.RelationDomainNode =" + "\'" + relationshipDomain + "\'")
.AndWhere("Rel.RelationClassNode =" + "\'" + relationshipClass + "\'")
.WithParam("relationshipType", relationshipType)
.Return<Node<Dictionary<string, string>>>("foundClass")
.Results;
However, this code doesn't work when the server executes it. For some reason the parameter relationshipType is not bound to the placeholder we put between {}.
Can someone please help us debug the problem with this code? We would prefer to use WithParam rather than injecting variables into the statement.
Thanks a lot!

Can someone please help us debug the problem with this code?
There's a section on https://bitbucket.org/Readify/neo4jclient/wiki/cypher titled "Debugging" which describes how to do this.
As for your core problem, your approach is hitting a Cypher restriction. Parameters are for parts of the query that aren't compiled into the query plan, and the MATCH clause is compiled into it.
From the Neo4j documentation:
Parameters can be used for literals and expressions in the WHERE clause, for the index key and index value in the START clause, index queries, and finally for node/relationship ids. Parameters can not be used as for property names, since property notation is part of query structure that is compiled into a query plan.
You could do something like:
.Match("a-[Rel]->foundClass")
.Where("type(Rel) = {relationshipType}")
.WithParam("relationshipType", relationshipType)
(Disclaimer: I've just typed that here. I haven't tested it at all.)
That will likely be slower, though, because Cypher has to retrieve all the relationships and then test their types, so you should measure it. There's a reason why the match clause is compiled into the query plan.
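If the set of relationship types is known in advance, another common option is to keep the type in the MATCH pattern, but only after checking it against an allowlist, while the remaining values (the ones the question concatenated into the WHERE clauses) go in as real parameters. The Python sketch below only builds the query text and the parameter map; it does not call Neo4jClient or any driver. ALLOWED_REL_TYPES, build_class_query and the type names are hypothetical. The {name} parameter syntax matches the Neo4j version in this question; newer Neo4j releases use $name instead.

```python
from typing import Dict, Tuple

# Hypothetical allowlist: replace with the relationship types your graph uses.
ALLOWED_REL_TYPES = {"SUBCLASS_OF", "INSTANCE_OF"}


def build_class_query(rel_type: str, source_id: int, start: str,
                      domain: str, cls: str) -> Tuple[str, Dict[str, object]]:
    """Return (cypher, params) for the query in the question.

    The relationship type is part of the compiled query structure and cannot
    be a parameter, so it is validated against an allowlist before being
    placed in the MATCH pattern. Every other value is sent as a parameter.
    """
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"unexpected relationship type: {rel_type!r}")
    cypher = (
        "START a=node({sourceId}) "            # plain string: braces stay literal
        f"MATCH a-[Rel:{rel_type}]->foundClass "  # f-string: checked type inserted
        "WHERE Rel.RelationStartNode = {relationshipStart} "
        "AND Rel.RelationDomainNode = {relationshipDomain} "
        "AND Rel.RelationClassNode = {relationshipClass} "
        "RETURN foundClass"
    )
    params = {
        "sourceId": source_id,
        "relationshipStart": start,
        "relationshipDomain": domain,
        "relationshipClass": cls,
    }
    return cypher, params


cypher, params = build_class_query("SUBCLASS_OF", 42, "s1", "d1", "c1")
print(cypher)
print(params)
```

In Neo4jClient terms, the checked type would go into the .Match(...) string and the three property values would go through .WithParam(...) calls, instead of being concatenated as in the question.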

Related

Ecto's fragment allowing SQL injection

When Ecto queries get more complex and require clauses like CASE...WHEN...ELSE...END, we tend to rely on Ecto's fragment to handle them.
e.g. query = from t in <Model>, select: fragment("SUM(CASE WHEN status = ? THEN 1 ELSE 0 END)", 2)
In fact, the most popular Stack Overflow post on this topic suggests creating a macro like this:
defmacro case_when(condition, do: then_expr, else: else_expr) do
quote do
fragment(
"CASE WHEN ? THEN ? ELSE ? END",
unquote(condition),
unquote(then_expr),
unquote(else_expr)
)
end
end
so you can use it this way in your Ecto queries:
query = from t in <Model>,
select: case_when t.status == 2
do 1
else 0
end
At the same time, in another post, I found this:
(Ecto.Query.CompileError) to prevent SQL injection attacks, fragment(...) does not allow strings to be interpolated as the first argument via the `^` operator, got: `"exists (\n SELECT 1\n FROM #{other_table} o\n WHERE o.column_name = ?)"
Well, it seems the Ecto team noticed that people use fragment for complex queries without realizing it can lead to SQL injection, so they disallow string interpolation there to protect developers.
Then comes another guy who says "don't worry, use macros."
I'm not an Elixir expert, but that seems like a workaround that DOES use string interpolation, bypassing the fragment protection.
Is there a way to use fragment and be sure the query was parameterized?
Here, SQL injection would result from interpolating external data into the string. Imagine where: fragment("column = '#{value}'") instead of the correct where: fragment("column = ?", value): if value comes from your params (the usual name of the second argument of a Phoenix action, which holds the parameters extracted from the HTTP request), then yes, this could result in SQL injection.
But the problem with prepared statements is that you can't substitute a parameter (the ? in the fragment/1 string) with some dynamic piece of SQL (for example, something as simple as an operator), so you don't really have a choice. Say you would like to write fragment("column #{operator} ?", value) because the operator is dynamic and depends on conditions: as long as operator doesn't come from the user (it is hardcoded somewhere in your code), it is safe.
I don't know if you are familiar with PHP (PDO in the following examples), but this is exactly the same as $bdd->query("... WHERE column = '{$_POST['value']}'") (injecting a value by string interpolation) as opposed to $stmt = $bdd->prepare('... WHERE column = ?') followed by $stmt->execute([$_POST['value']]); (a correct prepared statement).
Now, coming back to my earlier example of a dynamic operator: as stated above, you can't bind an arbitrary SQL fragment. If you bound > as the operator and 'foo' as the value to "WHERE column ? ?", the DBMS would read it roughly as WHERE column '>' 'foo', which is not syntactically valid. So the easiest way to make the operator dynamic is to write "WHERE column {$operator} ?" (injecting the operator, and only the operator, by string interpolation or concatenation). If $operator is defined by your own code (e.g. $operator = some_condition ? '>' : '=';), that's fine; but if it involves a superglobal that comes from the client, like $_POST or $_GET, it creates a security hole (SQL injection).
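The same rule can be demonstrated outside Ecto and PHP. This is a minimal Python sketch using the standard sqlite3 module; the people table, ALLOWED_OPERATORS and find_by_age are hypothetical. The operator comes from an allowlist owned by the code and is interpolated, while the value is bound as a ? parameter. The last lines show that binding the operator itself as a parameter leaves a statement that does not parse.

```python
import sqlite3

# Hypothetical allowlist of operators the application itself may emit.
ALLOWED_OPERATORS = {"=", "<>", ">", ">=", "<", "<="}


def find_by_age(conn: sqlite3.Connection, operator: str, value: int) -> list:
    # The operator is SQL structure, so it cannot be a placeholder; it is
    # interpolated only after being checked against the allowlist.
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"operator not allowed: {operator!r}")
    sql = f"SELECT name FROM people WHERE age {operator} ? ORDER BY name"
    # The value is data, so it is bound as a real parameter.
    return [row[0] for row in conn.execute(sql, (value,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("ann", 25), ("bob", 35), ("cy", 45)])

print(find_by_age(conn, ">", 30))  # ['bob', 'cy']

# A placeholder can only stand for a value, so this statement fails to parse.
try:
    conn.execute("SELECT name FROM people WHERE age ? ?", (">", 30))
except sqlite3.OperationalError as exc:
    print("rejected:", exc)
```

A value the user controls never reaches the SQL text here; only one of the six hardcoded operator strings can.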
TL;DR
Then comes another guy who says "don't worry, use macros."
The answer by Aleksei Matiushkin in the mentioned post is just a workaround for the string interpolation that fragment/1 disallows, used to dynamically inject a known operator. If you reuse this trick (and you can't really do otherwise), you'll be fine as long as you don't blindly inject arbitrary values coming from the user.
UPDATE:
It seems, after all, that fragment/1 (whose source I didn't inspect) doesn't imply a prepared statement (the ? are not placeholders of a true prepared statement). I tried a simple and deliberately silly query like the following:
from(
Customer,
where: fragment("lastname ? ?", "LIKE", "%")
)
|> Repo.all()
At least with PostgreSQL/postgrex, the generated query in console appears to be in fact:
SELECT ... FROM "customers" AS c0 WHERE (lastname 'LIKE' '%') []
Note the [] (empty list of parameters) at the end and the absence of $1 in the query. So it seems to act like PHP/PDO's emulated prepared statements: Ecto (or postgrex?) escapes the values and injects them directly into the query. Still, as said above, LIKE became a string (see the quotes around it), not an operator, so the query fails with a syntax error. (This may only hold for literal arguments like the ones above: Ecto inlines compile-time literals into the SQL, whereas values pinned with ^ are normally sent as $1, $2, ... parameters.)

Get Path in text format from Graph

In my graph I have data like the following.
Here a,b,c,d are nodes and r1,r2,r3,r4 are relations.
a-r1->b
b-r2->a
b-r2->c
c-r1->b
d-r3->a
a-r1->d, and so on.
I am using the following Cypher to get paths with a maximum depth of 3:
MATCH p=(n)-[r*1..3]-(m) WHERE n.id=1 and m.id=2 RETURN p
Here p is the returned path, and I want to display it in text format.
Example: suppose the path length is 3.
Then I want text like a-r1->b-r2->c.
Is this possible ?
Sort of. I'll give you most of the answer, but I can't complete it myself. Maybe another Cypher wizard will come along and improve on it, but here's what I've got for you.
match p=(n)-[r*1..3]-(m)
WHERE id(n)=1 AND id(m)=2
WITH extract(node in nodes(p) | coalesce(node.label, "")) as nodeLabels,
extract(rel in relationships(p) | type(rel)) as relationshipLabels
WITH reduce(nodePath="", nodeLabel in nodeLabels | nodePath + nodeLabel + "-") as nodePath,
reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel + "-") as relPath
RETURN nodePath, relPath
LIMIT 1;
EDIT: one small note. In your question you specify the WHERE criteria n.id=1 and m.id=2, which is probably not what you want. Node IDs are usually checked with WHERE id(n)=1 AND id(m)=2, because the ID isn't technically a node property, so I changed that.
OK, so we're going to match the path. Then we're going to use the extract function to pull out the label property from nodes, and create a collection called nodeLabels. We'll do the same for the relationship types. What reduce does here is accumulate each of the individual strings in those collections down to a single string. So if your nodes are a, b, and c, you'd get a nodePath string that looks like a-b-c-. Similarly, your relationship string would look like r1-r2-r3-.
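Cypher's reduce behaves like a left fold. As a rough analogy (Python, not Cypher), the two reduce calls above correspond to the following, with hypothetical labels; it also shows where the trailing dash comes from: every element, including the last, is followed by a separator.

```python
from functools import reduce

node_labels = ["a", "b", "c"]   # hypothetical node labels
rel_types = ["r1", "r2"]        # a path over 3 nodes has 2 relationships

# Like Cypher's reduce(acc = "", x IN list | acc + x + "-"):
# start from an empty accumulator and append each item plus a separator.
node_path = reduce(lambda acc, label: acc + label + "-", node_labels, "")
rel_path = reduce(lambda acc, rel: acc + rel + "-", rel_types, "")

print(node_path)  # a-b-c-
print(rel_path)   # r1-r2-
```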
Now, I know you want those interleaved, and you'd prefer output like a-r1-b-r2-c. Here's the problem I see with that...
Normally, I'd approach that by using FOREACH to iterate over the collections by index. Since a path always has one fewer relationship than it has nodes, ideally (in pseudocode) I'd want to do something like this:
buffer = ""
foreach idx in range(0, length(relLabels) - 1) |
buffer = buffer + nodeLabels[idx] + "-" + relLabels[idx] + "->"
buffer = buffer + last(nodeLabels)
This would be one way of reducing to the string that you want. You can't use the reduce function, because it doesn't give you a way of knowing which index you're at in the collection: you can iterate over one of the collections, but not over the other at the same time. This FOREACH pseudocode will not work either, because I believe the body of FOREACH has to be a mutating operation on the graph; you can't just use it to accumulate a string like I did here, or like the extract function does.
So as far as I can tell, you might kinda be stuck here. Hopefully someone will prove me wrong on this - I am not 100% sure.
Finally another way to go after this would be, if there was a path function that extracted node/relationship pairs, rather than just nodes() or relationships() individually as I used them above, then you could use that function to iterate over one collection, rather than shuffling two collections, as my code above attempts and fails to do. Sadly, I don't think there's any such path function, so that's just more reason why I think you might be up a creek.
Now, practically speaking, you could always execute this query from Java or some other language, return the path, and then use the full power of that language to build up this string. But in pure Cypher? I'm doubtful.
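Following that suggestion, once the path is back in client code the interleaving is a simple loop. This is a minimal Python sketch, assuming the path has already been unpacked into a list of node names and a list of relationship types (how you get those depends on your driver; path_to_text is a hypothetical helper). It also assumes every relationship points forward along the path; with an undirected pattern like (n)-[r*1..3]-(m) you would have to check each relationship's start node to choose the arrow direction.

```python
from typing import Sequence


def path_to_text(node_names: Sequence[str], rel_types: Sequence[str]) -> str:
    """Render a path as 'a-r1->b-r2->c'.

    A path with k relationships always has k + 1 nodes, so node i is
    followed by relationship i, and the last node closes the string.
    """
    if len(node_names) != len(rel_types) + 1:
        raise ValueError("a path needs exactly one more node than relationships")
    parts = [f"{name}-{rel}->" for name, rel in zip(node_names, rel_types)]
    parts.append(node_names[-1])
    return "".join(parts)


print(path_to_text(["a", "b", "c"], ["r1", "r2"]))  # a-r1->b-r2->c
```

Pairing the lists with zip sidesteps the index problem described above, because both collections are walked together.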
Here's what I ended up doing. I hope somebody else finds it useful in the future.
MATCH p=(n)-[r*1..3]->(m)
WHERE n.id=1 AND m.id=4
WITH extract(rel in relationships(p) | STARTNODE(rel).name + '->' + type(rel)) as relationshipLabels, m.name as endnodename
WITH reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel + '->') as relPath, endnodename
RETURN distinct relPath + endnodename

I can't work out a legacy query's purpose (Neo4j, Cypher query)

I'm new to programming with Neo4j, so I don't yet know enough of its Cypher language to fix, without help, an annoying bug in legacy and undocumented code.
My main problem is that I can't work out the purpose of the following query... :s
This is the problematic query:
START
n=node({self})
MATCH
n-[:RECOMMENDATION]->(m)
WHERE
m.concept_type='unifying_theme' AND
not( ()-[:REQUIRED]->m ) RETURN m
The original query is written on a single line; I've formatted it to make it more readable. The error message is the following (reformatted to be easier to read):
PatternException: Some identifiers are used as both relationships and nodes: UNNAMED1
Query:
START n=node({self})
MATCH n-[:RECOMMENDATION]->(m)
WHERE m.concept_type='unifying_theme' AND not( ()-[:REQUIRED]->m )
RETURN m
Params: {'self': 423}
Trace:
org.neo4j.cypher.internal.pipes.matching.PatternGraph.validatePattern(PatternGraph.scala:98)
org.neo4j.cypher.internal.pipes.matching.PatternGraph.<init>(PatternGraph.scala:36)
org.neo4j.cypher.internal.executionplan.builders.PatternGraphBuilder$cla...
The query is embedded in an instance of the neomodel Python library's StructuredNode class. I guess {self} refers to the node represented by the StructuredNode instance, and that the error in this query is related to the m variable...
I suppose I should name more of the identifiers to avoid conflicts, but I'm suspicious: I think there are more errors in this query, because I've seen plenty of other ugly and buggy code from this disastrous programmer.
I don't know what they were trying to do with the not( ()-[:REQUIRED]->m ) block. Is that a legal Cypher sub-expression if m represents a node?
P.S.: I'm using Neo4j 1.9.7.
Thank you in advance.

gremlin request

How can I request, with Gremlin, a limited list of nodes with properties of my choosing?
Something like:
g.V. 10 nodes with nodeType=="User", return only id, name and email.
For speed, do filter{it.getProperty('nodeType').equals('User')}...
Using TinkerPop 3+, that would be:
g.V().hasLabel('user').limit(10).valueMap(true, 'name', 'email')
Calling valueMap(true) returns both the id and the label of the traversed graph element.
For performance, it is now recommended to avoid lambdas and use Gremlin steps.
If you are using TinkerPop 3 and the "type" you are searching on is defined as the node label, you can do something like this:
g.V.hasLabel('User')[0..10].valueMap.select('id','name','e-mail')
Note also that I think you need to specify [0..10] if you want 10 nodes and not [0..9]
However I totally defer to Marko's answer on performance as he understands the internals. I just like the clean feel of hasLabel().
g.V.filter{it.nodeType=='User'}[0..9].transform(){it.id + ' ' + it.name + ' ' + it.email}

DB Closure function for dynamic select queries vs performance

I am working on a reports page for a ZF2 project. I need to generate a dynamic query that depends on filters (which can be '=', '>', '>=', '<', '<=', 'IN'). I am using a DB Select closure to generate the WHERE clause, but I am afraid it could become a bottleneck later on (in performance or in its limitations).
Can anybody tell me whether my approach is OK, or whether I need to generate string WHERE clauses like
->where('A > 12 AND B < 12 AND C IN (1,2,3)')
instead of
->where(function(Where $where){
$where->equalTo('A', 10)->equalTo('B', 12)->IN('C', array(1,2,3));
});
Or is there a better idea?
I found a better solution. Instead of using a closure, I use a Where object directly, something like this:
$where = new \Zend\Db\Sql\Where();
$where->equalTo('A', 10)->equalTo('B', 12)->IN('C', array(1,2,3));
$sql->select()->where($where);
It is more dynamic, as $where can be updated dynamically with other values. Still, if anybody has other ideas, please share.
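The idea behind building the Where object incrementally (structure chosen by code, values bound as parameters) can be sketched independently of ZF2. The following Python uses the standard sqlite3 module and is not the Zend\Db API; build_where, the report table and its columns are hypothetical. Column names and operators are checked against allowlists, IN expands to one placeholder per value, and the filters are joined with AND.

```python
import sqlite3
from typing import Any, Iterable, List, Sequence, Tuple

# Operators the report filters may use (the list from the question).
COMPARISONS = {"=", ">", ">=", "<", "<="}


def build_where(filters: Sequence[Tuple[str, str, Any]],
                allowed_columns: Iterable[str]) -> Tuple[str, List[Any]]:
    """Turn (column, operator, value) filters into a WHERE clause and params.

    Column names and operators are SQL structure, so both are checked
    against allowlists; every value is bound through a ? placeholder.
    """
    allowed = set(allowed_columns)
    clauses: List[str] = []
    params: List[Any] = []
    for column, op, value in filters:
        if column not in allowed:
            raise ValueError(f"unknown column: {column!r}")
        if op in COMPARISONS:
            clauses.append(f"{column} {op} ?")
            params.append(value)
        elif op == "IN":
            values = list(value)
            if not values:
                raise ValueError("IN needs at least one value")
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    # With no filters at all, match every row.
    return (" AND ".join(clauses) or "1 = 1"), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO report VALUES (?, ?, ?)",
                 [(10, 12, 1), (13, 11, 3), (15, 11, 2), (20, 10, 9)])

where, params = build_where(
    [("a", ">", 12), ("b", "<", 12), ("c", "IN", [1, 2, 3])],
    allowed_columns=["a", "b", "c"],
)
print(where)  # a > ? AND b < ? AND c IN (?, ?, ?)
rows = conn.execute(f"SELECT a, b, c FROM report WHERE {where} ORDER BY a",
                    params).fetchall()
print(rows)  # [(13, 11, 3), (15, 11, 2)]
```

As with the Where object, filters can be appended in any order before the query is built, and no user-supplied value ever ends up in the SQL text.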
