How can I request, with Gremlin, a limited list of nodes with properties of my choosing?
Something like:
g.V. 10 nodes with nodeType=="User", return only id, name and email.
For speed, do filter{it.getProperty('nodeType').equals('User')}...
Using TinkerPop 3+, that would be:
g.V().hasLabel('user').limit(10).valueMap(true, 'name', 'email')
Calling valueMap(true) returns both the id and the label of the traversed graph element.
For performance, it is now recommended to avoid lambdas and use Gremlin steps.
If you are using TinkerPop 3 and the "Type" you are searching on is defined as the node label, then you can do something like this:
g.V.hasLabel('User')[0..10].valueMap.select('id','name','e-mail')
Note also that I think you need to specify [0..10] if you want 10 nodes and not [0..9]
However I totally defer to Marko's answer on performance as he understands the internals. I just like the clean feel of hasLabel().
g.V.filter{it.nodeType=='User'}[0..9].transform(){it.id + ' ' + it.name + ' ' + it.email}
I have array of objects with this structure
[{
name:'here is name, which can have punctuation marks ',
value: 'here will be text '
},
{
name:'here is name, which can have punctuation marks ',
value: 'here will be text '
}]
I am trying to find the best way to keep it in a neo4j node. Since I am going to search and filter on this data later, I don't want to keep the whole object as a string. Creating a property named after object.name is not possible because the names contain punctuation marks. The ideal way would be to keep it as a property, because I am going to use this data as a property of the node, but removing punctuation marks from the name is not an option either.
Probably I could keep them in array
['here is the name', ' and the second element of array is the text']
In this case the problem will be to give correct name to the property, which will have this array.
Another option could be to keep all data in list like this
tabs: ['first name - first value', ' second name - second value']
but to search later I would need to use a regex inside the list, which doesn't seem flexible.
So what would be the best way?
Thank you in advance!
You have a few possibilities in terms of storing objects in the Neo4j database.
You can turn your JSON object into a string and save it as a property, and later use APOC procedures for conversion to/from JSON (using these procedures can help you with filtering or sorting on the data).
You can create helper nodes and treat each element of the list as a separate node (I think this is the suggested approach).
It is worth mentioning that you can link the helper nodes to each other to keep the order, or just connect them directly to the parent node if you don't really care about order.
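The helper-node approach described above could look roughly like this. It is a minimal Python sketch, not a definitive implementation: the label Tab, the relationship type HAS_TAB, the position property and the lookup of the parent node by an id property are all hypothetical names that do not appear in the question. It also assumes a Neo4j version that accepts the $param parameter syntax. The function only builds the Cypher text and the parameter map; sending them through a driver is left out.

```python
from typing import Any, Dict, List, Tuple


def build_helper_node_query(
    parent_id: int, items: List[Dict[str, str]]
) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized Cypher statement that stores each list
    element as its own node attached to a parent node.

    Tab, HAS_TAB and position are hypothetical names for this sketch.
    """
    query = (
        "MATCH (p) WHERE p.id = $parent_id\n"
        "UNWIND $items AS item\n"
        "CREATE (p)-[:HAS_TAB {position: item.position}]->"
        "(:Tab {name: item.name, value: item.value})"
    )
    # Keep the original list index so the order can be restored later.
    rows = [
        {"position": index, "name": item["name"], "value": item["value"]}
        for index, item in enumerate(items)
    ]
    return query, {"parent_id": parent_id, "items": rows}


if __name__ == "__main__":
    query, params = build_helper_node_query(
        1,
        [
            {"name": "first name, with punctuation!", "value": "first text"},
            {"name": "second name", "value": "second text"},
        ],
    )
    print(query)
    print(params)
```

Because the names travel as parameter values rather than as property keys, punctuation in them needs no escaping, and each name and value stays individually searchable.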
The simplest solution is to use backticks, as mentioned in the neo4j community forum by Andreas Kollegger:
CREATE ({`here is "name", which has punctuation marks!`:"here will be text"})
For more complex cases, Giuseppe Villani suggested a better solution:
CALL apoc.create.nodes(["MyLabel"], [{
`name.with.dots`:'here is name, which can have punctuation marks .',
value: 'here will be text '
},
{
name:'here is name, which can have punctuation marks ',
value: 'here will be text '
}]) yield node
return node
This creates nodes with a label ("MyLabel" in this case), so that you can keep all the data you need under a specific label (which can then be indexed and/or connected to other entities).
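If the property names are only known at run time, the backtick quoting shown above can be generated by a small helper. This is a sketch under one assumption: that a backtick inside a quoted Cypher identifier is escaped by doubling it. The function names are made up for illustration and are not part of any Neo4j driver.

```python
def quote_cypher_identifier(name: str) -> str:
    """Wrap a property key in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_create_statement(property_key: str) -> str:
    """Return a CREATE statement whose single property uses the quoted key.

    Only the key is placed in the query text; the value is expected to be
    supplied separately as the $value parameter.
    """
    return "CREATE (n {" + quote_cypher_identifier(property_key) + ": $value}) RETURN n"


if __name__ == "__main__":
    print(build_create_statement('here is "name", which has punctuation marks!'))
```

Keeping the value in a parameter means only the key ever needs quoting.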
I have some cypher queries that I execute against my neo4j database. The query is in this form
MATCH p=(j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)
WHERE j.job_id =5000 and r1.origin='iframe' and r1.job_id=5000 AND NOT (t.netloc =~ 'VERY_LONG_LIST')
RETURN count(r1) AS number_iframes;
If it is unclear what I am doing, here is a much simpler query:
MATCH (s:WORD)
WHERE NOT (s.text=~"badword1|badword2|badword3")
RETURN s
I am basically trying to match some words against a specific list.
The problem is that this list is very large. As you can see, this query is for job_id=5000 and I have more than 20000 jobs, so if my whitelist is 1 MB then I will end up with very large queries. I tried 500 jobs and ended up with a 200 MB queries file.
I was trying to execute these queries using transactions from py2neo, but this won't be feasible because my POST request will be very large and it will time out. As a result, I thought of using
neo4j-shell -file <queries_file>
However, as you can see, the file size is very large because of the large whitelist. So my question is: is there any way to store this "whitelist" in a variable in neo4j using Cypher?
I wish there were something similar to this:
SAVE $whitelist="word1,word2,word3,word4,word5...."
MATCH p=(j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)
WHERE j.job_id =5000 and r1.origin='iframe' and r1.job_id=5000 AND NOT (t.netloc =~ $whitelist)
RETURN count(r1) AS number_iframes;
What datatype is your netloc?
If you have an index on netloc you can also use t.netloc IN {list} where {list} is a parameter provided from the outside.
Such large regular expressions will not be fast.
What exactly is your regexp and netloc format like? Perhaps you can change that into a split + index-list lookup?
In general also for regexps you can provide an outside parameter.
You can also use "IN" + index for job_ids.
You can also run a separate job that tags the jobs within your whitelist with a label and use that label for additional filtering e.g. in the match already.
Why do you have to check this twice ? Isn't it enough that the job has id=5000?
j.job_id =5000 and r1.job_id=5000
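The suggestion to pass the whitelist as an outside parameter and use IN instead of a regular expression could be sketched like this in Python. Some assumptions apply: the whitelist entries are literal strings with no regex metacharacters (so the alternation can simply be split on "|"), the server accepts the newer $param syntax (older versions use {param} as written above), and the function names are hypothetical. Only the query text and parameter map are built here; executing them is left to whatever client is in use.

```python
from typing import Any, Dict, List, Tuple

# The query text stays small and constant; the whitelist travels as a parameter.
QUERY = (
    "MATCH (j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)\n"
    "WHERE j.job_id = $job_id AND r1.origin = 'iframe' AND r1.job_id = $job_id\n"
    "  AND NOT t.netloc IN $whitelist\n"
    "RETURN count(r1) AS number_iframes"
)


def whitelist_from_alternation(pattern: str) -> List[str]:
    """Split 'a|b|c' into ['a', 'b', 'c'].

    Only valid when every entry is a plain literal, not a regex.
    """
    return [entry for entry in pattern.split("|") if entry]


def build_request(job_id: int, pattern: str) -> Tuple[str, Dict[str, Any]]:
    """Return the constant query and the per-job parameter map."""
    params = {"job_id": job_id, "whitelist": whitelist_from_alternation(pattern)}
    return QUERY, params


if __name__ == "__main__":
    query, params = build_request(5000, "example.com|example.org")
    print(query)
    print(params)
```

With this shape the same query string can be reused for every job, and only the parameter map changes between requests.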
In my graph I have data structured in the following way.
Here a,b,c,d are nodes and r1,r2,r3,r4 are relations.
a-r1->b
b-r2->a
b-r2->c
c-r1->b
d-r3->a
a-r1->d like this.
I am using the following Cypher to get paths with a max depth of 3.
MATCH p=(n)-[r*1..3]-(m) WHERE n.id=1 and m.id=2 RETURN p
Here the returned p is a path, and I want to display the path in text format.
Example: suppose the path length is 3.
I want a-r1->b-r2->c in text format.
Is this possible?
Sort of. I'll give you most of the answer, but I myself can't complete the answer. Maybe another cypher wizard will come along and improve on the answer, but here's what I've got for you.
match p=(n)-[r*1..3]-(m)
WHERE id(n)=1 AND id(m)=2
WITH extract(node in nodes(p) | coalesce(node.label, "")) as nodeLabels,
extract(rel in relationships(p) | type(rel)) as relationshipLabels
WITH reduce(nodePath="", nodeLabel in nodeLabels | nodePath + nodeLabel + "-") as nodePath,
reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel + "-") as relPath
RETURN nodePath, relPath
LIMIT 1;
EDIT - one small note, in your question you specify the WHERE criteria n.id=1 and m.id=2. Note that this is probably not what you want. Node IDs are usually checked with WHERE id(n)=1 AND id(m)=2. Id isn't technically a node property, so I changed that.
OK, so we're going to match the path. Then we're going to use the extract function to pull out the label property from nodes, and create a collection called nodeLabels. We'll do the same for the relationship types. What reduce does here is accumulate each of the individual strings in those collections down to a single string. So if your nodes are a, b, and c, you'd get a nodePath string that looks like a-b-c-. Similarly, your relationship string would look like r1-r2-r3-.
Now, I know you want those interleaved, and you'd prefer output like a-r1-b-r2-c. Here's the problem I see with that...
Normally, the way I'd approach that is to use FOREACH to iterate over the node label collection. Since you know there is one less relationship than nodes because of what paths are, ideally (in pseudo code) I'd want to do something like this:
buffer = ""
foreach idx in range(0, length(nodeLabels)) |
buffer = buffer + nodeLabels[idx] + "-" + relLabels[idx] + "->"
This would be a way of reducing to the string that you want. You can't use the reduce function, because it doesn't provide you a way of getting which index you're at in the collection. Meaning that you can iterate over one of the collections, but not at the same time over the other. This FOREACH pseudo code will not work, because the second part of FOREACH I believe has to be a mutating operation on the graph, and you can't just use it to accumulate a string like I did here, or like the extract function does.
So as far as I can tell, you might kinda be stuck here. Hopefully someone will prove me wrong on this - I am not 100% sure.
Finally another way to go after this would be, if there was a path function that extracted node/relationship pairs, rather than just nodes() or relationships() individually as I used them above, then you could use that function to iterate over one collection, rather than shuffling two collections, as my code above attempts and fails to do. Sadly, I don't think there's any such path function, so that's just more reason why I think you might be up a creek.
Now, practically speaking, you could always execute this query in java or some other language, return the path, and then use the full power of whatever programming language you want to build up this string. But pure cypher? I'm doubtful.
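For the route of building the string in a host language, a minimal Python sketch might look like this. It assumes the node names and relationship types have already been pulled out of the returned path into two lists, in path order. The function name is made up, and the arrows are always drawn left to right, which ignores the actual direction of each relationship when the path was matched without direction.

```python
from typing import List


def format_path(node_names: List[str], rel_types: List[str]) -> str:
    """Interleave node names and relationship types as a-r1->b-r2->c."""
    # A path always has exactly one more node than it has relationships.
    if len(node_names) != len(rel_types) + 1:
        raise ValueError("expected one more node than relationships")
    parts = [node_names[0]]
    for rel_type, node_name in zip(rel_types, node_names[1:]):
        parts.append("-" + rel_type + "->" + node_name)
    return "".join(parts)


if __name__ == "__main__":
    print(format_path(["a", "b", "c"], ["r1", "r2"]))
```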
Here is what I ended up doing. I hope somebody else finds it useful in the future.
MATCH p=(n)-[r*1..3]->(m)
WHERE n.id=1 AND m.id=4
WITH extract(rel in relationships(p) | STARTNODE(rel).name + '->' + type(rel)) as relationshipLabels, m.name as endnodename
WITH reduce(relPath="", relLabel in relationshipLabels | relPath + relLabel + '->') as relPath, endnodename
RETURN distinct relPath + endnodename
I want to avoid injecting parameter values directly into the query statement. Therefore we used the following instructions from the Neo4j .NET client class:
var queryClassRelationshipsNodes = client.Cypher
.Start("a", (NodeReference)sourceReference.Id)
.Match("a-[Rel: {relationshipType} ]->foundClass")
.Where("Rel.RelationStartNode =" + "\'" + relationshipStart + "\'")
.AndWhere("Rel.RelationDomainNode =" + "\'" + relationshipDomain + "\'")
.AndWhere("Rel.RelationClassNode =" + "\'" + relationshipClass + "\'")
.WithParam("relationshipType", relationshipType)
.Return<Node<Dictionary<string, string>>>("foundClass")
.Results;
However, this code does not work once executed by the server. For some reason the parameter relationshipType is not connected with the placeholder which we put between {}.
Can someone please help us debug the problem with this code? We would prefer to use WithParam rather than injecting variables inside the statement.
Thanks a lot!
Can someone please help us debug the problem with this code?
There's a section on https://bitbucket.org/Readify/neo4jclient/wiki/cypher titled "Debugging" which describes how to do this.
As for your core problem though, your approach is hitting a Cypher restriction. Parameters are for parts of the query that aren't compiled into the query plan. The match clause is however.
From the Neo4j documentation:
Parameters can be used for literals and expressions in the WHERE clause, for the index key and index value in the START clause, index queries, and finally for node/relationship ids. Parameters can not be used as for property names, since property notation is part of query structure that is compiled into a query plan.
You could do something like:
.Match("a-[Rel]->foundClass")
.Where("type(Rel) = {relationshipType}")
.WithParam("relationshipType", relationshipType)
(Disclaimer: I've just typed that here. I haven't tested it at all.)
That will likely be slower though, because you need to retrieve all relationships, then test their types. You should test this. There's a reason why the match clause is compiled into the query plan.
I'm just starting to learn the Cypher query language and GraphDb in general. I've created some indexes using the class name of my nodes like:
"com.acme.node.SomeNodeType"
I can't for the life of me figure out how to reference this index in Cypher. I found this thread but using ` didn't work for me.
So I guess I have 2 questions:
Is it possible to use an index with dots in the name?
If so, how do I specify the name in the query?
Can you try quoting the index name with backticks, like this?
start n = node:`my.index`('name:test') return n