I have a program with a tree. I place the nodes and their edges in a NodeList and an EdgeList, put the lists into a DirectedGraph, and then run it through DirectedGraphLayout.visit().
I then place the nodes and edges on the screen. The diagram is almost perfect, but there are some inconsistencies with nodes being out of order.
The first one is shown in the following picture. As you can see, it places a few children of parent1, then places some nodes from parent2. Then, for some reason I can't figure out, it places the last child of parent1 to the right of parent2's children.
Picture
The next problem is that the children of a parent are out of order. When the children should be placed in the order 1 2 3 4, they are instead placed in other orders, such as 2 3 1 4.
I have tried adding the nodes to the NodeList in different orders and tried both post-order and pre-order recursion to traverse my tree, but I end up with exactly the same positioning.
I'm at a loss as to how to fix this. Any tips about how DirectedGraph works and why it's placing nodes like this would be appreciated.
From the DirectedGraphLayout documentation,
[the layout algorithm will] assign x coordinates such that the graph is easily readable. The exact behavior is undefined
And further down
This class is not guaranteed to produce the same results for each invocation.
So I guess there is no way to "fix" this as the behavior is undefined.
I am completely new to Neo4j and am using it for the first time for my master's program. I've read the documentation and watched tutorials online, but I can't figure out how to represent my nodes the way I want.
I have a dataframe with 3 columns: the first and second each hold a page name, and the third holds a similarity score between those two pages. How can I create a graph in Neo4j where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)? I want to show the similarity score as the text of the relationship.
Furthermore, I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
I’ve added a screenshot of the header of my DF for clarity: https://imgur.com/a/pg0knh6. I hope someone can help me, thanks in advance!
Edit: What I have tried
LOAD CSV WITH HEADERS FROM 'file:///wiki-small.csv' AS line
MERGE (p:Page {name: line.First})
MERGE (p2:Page {name: line.Second})
MERGE (p)-[r:SIMILAR]->(p2)
ON CREATE SET r.similarity = toFloat(line.Sim)
Next, a block to remove the SIMILAR relationships whose similarity is 0:
MATCH ()-[r:SIMILAR]->() WHERE r.similarity = 0
DELETE r
This works partially: it gives me the correct structure of the nodes but doesn't show the similarity scores as relationship labels. I also still need to figure out how to find the node with the most connections.
For the first question:
How can I create a graph in Neo4j where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)?
I think a better approach is to remove the rows with similarity = 0.0 before ingesting them into Neo4j. Is that feasible? If your dataset is not too big, this is very fast to do in Python. Otherwise, your solution of deleting the relationships after inserting the data is an option.
For a big dataset, it may be better to load the data using apoc.periodic.iterate or USING PERIODIC COMMIT.
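A minimal sketch of that pre-filtering step, using only Python's standard csv module. It assumes the exported file has the columns First, Second and Sim, as in the LOAD CSV query from the question; drop_zero_similarity is a hypothetical helper name.

```python
import csv
import io


def drop_zero_similarity(src, dst):
    """Copy CSV rows from src to dst, skipping rows whose Sim value is 0.

    Assumes the header contains First, Second and Sim columns.
    Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if float(row["Sim"]) != 0.0:
            writer.writerow(row)
            kept += 1
    return kept


# In-memory example; for real files, open both with newline="".
sample = io.StringIO("First,Second,Sim\nA,B,0.42\nA,C,0.0\nB,C,0.7\n")
out = io.StringIO()
print(drop_zero_similarity(sample, out))  # 2
```

With pandas, the equivalent boolean filter would be df[df["Sim"] != 0].to_csv("wiki-small.csv", index=False), again assuming the column is called Sim. The filtered file can then be loaded with the LOAD CSV query from the question, and the delete step is no longer needed.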
For the second question:
I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
This is an easy query. Again, you can do it with plain Cypher or with the APOC library:
// Plain Cypher
MATCH (n:Page)-[r:SIMILAR]->()
RETURN n.name, count(*) AS cnt
ORDER BY cnt DESC
// APOC
MATCH (n:Page)
RETURN n.name, apoc.node.degree(n, "SIMILAR>") AS degree
ORDER BY degree DESC;
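If you want to cross-check the result against the original dataframe, the same count can be sketched in Python over the (First, Second) pairs, assuming each pair appears only once (MERGE deduplicates pairs in Neo4j). most_connected is a hypothetical helper; directed=True mirrors the outgoing-only pattern (n:Page)-[r:SIMILAR]->() used above.

```python
from collections import Counter


def most_connected(edges, directed=True):
    """Return (name, degree) for the node with the most relationships.

    directed=True counts only outgoing edges, like (n)-[:SIMILAR]->();
    directed=False counts both endpoints, like (n)-[:SIMILAR]-().
    Raises ValueError if there are no edges.
    """
    degree = Counter()
    for first, second in edges:
        degree[first] += 1
        if not directed:
            degree[second] += 1
    if not degree:
        raise ValueError("no edges")
    return degree.most_common(1)[0]


pairs = [("A", "B"), ("C", "B"), ("D", "B")]
print(most_connected(pairs, directed=False))  # ('B', 3)
```

Whether incoming relationships should count as "connections" is a modelling choice; the undirected variant treats a similarity pair as a connection for both pages.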
EDIT
To display the similarity scores in Neo4j Desktop or the other web interfaces, click on a SIMILAR arrow; the relationship types are then shown at the top of the running cell. Click on the SIMILAR marker, and at the bottom of the running cell, to the right of Caption, select the property that you want to show (similarity in your case).
All the arrows are then displayed with their similarity score.
To the second question: I think you should keep a clear separation between the way you store data and the way you visualize it. Showing the similarity score (a property of the SIMILAR edge) as a "label" is best handled by an appropriate visualization library or platform. Ours (Graphileon) could be such a platform, although there are others.
We offer the possibility to "style" the edges with so-called selectors like
"label":"(%).properties.simScore" that would use the simScore as a label. On top of that, you could do things like
"width":"evaluate((%).properties.simScore < 0.500 ? 3 : 10)"
or
"fillColor":"evaluate((%).properties.simScore < 0.500 ? grey : red)"
to visually distinguish high simScores.
Full disclosure: I work for Graphileon.
Let's say I have three nodes in my Neo4j graph, with directed relationships like this: (a)<--(b)-->(c). Furthermore, assume that (b) does NOT have the property visit_type_name, whereas both (a) and (c) do. Now what I would like to do is reverse only one of these arrows. For the moment it does not matter which one, although it would be nice to be able to specify property-based conditions on which one to reverse. I tried the following:
MATCH(x)-[r]->(y)
WHERE NOT EXISTS(()-->(x))
AND NOT EXISTS(x.visit_type_name)
DELETE r
MERGE(y)-->(x)
My thought was that after this code reversed, say, the arrow (a)<--(b) to (a)-->(b), then (b) would no longer be parent-less, and the MATCH would not continue on and do the same thing with the (b)-->(c) link. Unfortunately, Cypher does continue and reverse both arrows, which is not what I want. So then I tried this, thinking that I needed to change the granularity of the Cypher match:
MATCH(y)
WITH y
MATCH(x)-[r]->(y)
WHERE NOT EXISTS(()-->(x))
AND NOT EXISTS(x.visit_type_name)
DELETE r
MERGE(y)-->(x)
Unfortunately, this does the same thing as before.
How can I reverse only one arrow in this situation?
Is there a way to finalize the first arrow reversal transaction before moving on?
Many thanks for your time!
I think I have a method of doing this. Each node has a unique numerical id which I can leverage as follows:
MATCH(x)-->(y)
WHERE NOT EXISTS(()-->(x))
AND NOT EXISTS(x.visit_type_name)
WITH MIN(y.id) AS min_y_id, x
MATCH(x)-[r]->(min_y)
WHERE min_y.id = min_y_id
DELETE r
MERGE(min_y)-->(x)
This essentially picks out the minimum id and only reverses the arrow for the corresponding node.
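The selection logic of that query can be simulated on a small in-memory graph to check which arrow gets reversed. This is a sketch only: nodes are keyed by the numeric id property the answer relies on, edges are (source, target) pairs, and relationship types are ignored. Because the WITH MIN(...) aggregation has to see every matching row before it produces any output, all roots are chosen from the graph as it was before any write; the sketch mirrors that by computing has_incoming once, up front. reverse_min_child_edge is a hypothetical name.

```python
def reverse_min_child_edge(nodes, edges):
    """Mirror the min-id query on an in-memory graph.

    nodes: dict of id -> property dict.
    edges: set of (source_id, target_id) pairs.
    Every root x (no incoming edge, no 'visit_type_name' property) gets
    only the edge to its smallest-id child reversed.
    """
    has_incoming = {target for _, target in edges}  # snapshot before writes
    result = set(edges)
    for x, props in nodes.items():
        if x in has_incoming or "visit_type_name" in props:
            continue
        children = [target for source, target in edges if source == x]
        if not children:
            continue
        min_y = min(children)
        result.discard((x, min_y))  # corresponds to DELETE r
        result.add((min_y, x))      # corresponds to MERGE (min_y)-->(x)
    return result


# (a)<--(b)-->(c) with ids a=1, b=2, c=3; only b lacks visit_type_name.
nodes = {1: {"visit_type_name": "A"}, 2: {}, 3: {"visit_type_name": "C"}}
edges = {(2, 1), (2, 3)}
print(sorted(reverse_min_child_edge(nodes, edges)))  # [(1, 2), (2, 3)]
```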
I'm trying to decide whether I should implement categories as nodes or as labels.
In particular, the query to count the nodes belonging to each category is not so easy.
Nodes have to be able to belong to multiple categories!
Categories as labels, variant 1
Keep a list of categories somewhere, then:
MATCH (a:cat1), (b:cat2), (c:cat3), ...
With a lot of categories I will get a lot of columns, so that's not really good. It also requires a lot of preprocessing of the query.
I'm not even sure I could get a count easily from that.
Categories as labels, variant 2
MATCH (n:category) // the category label is used to limit the number of nodes
RETURN DISTINCT labels(n), count(*) as count
Will return something like:
["category","the actual category label"], 2
Looks perfect, but this won't work when a node has multiple categories:
["category","cat1","cat2"], 2 <-- two nodes found with category "cat1" and "cat2"
["category","cat1"], 4 <-- four nodes found with category "cat1"
Now I don't know how to get the count per category...
Maybe something with extract(..labels()..) or filter(..labels()..) could do it, but I don't know how.
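The per-category total the question is after can be derived from the variant 2 rows by crediting each row's count to every label in it except the marker label. A minimal Python sketch, assuming the rows come back as (labels, count) pairs; count_per_category is a hypothetical helper name.

```python
from collections import Counter


def count_per_category(rows, marker="category"):
    """Sum counts per category label.

    rows: iterable of (labels, count) pairs, i.e. the output of
    RETURN DISTINCT labels(n), count(*). The marker label used to
    restrict the MATCH is skipped.
    """
    totals = Counter()
    for labels, count in rows:
        for label in labels:
            if label != marker:
                totals[label] += count
    return dict(totals)


rows = [(["category", "cat1", "cat2"], 2), (["category", "cat1"], 4)]
print(count_per_category(rows))  # {'cat1': 6, 'cat2': 2}
```

Inside Cypher, on versions that support UNWIND (Neo4j 2.1 and later), unwinding labels(n) before counting should give the same per-label totals without the client-side step.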
Categories as nodes
Yes, this works, and it is pretty straightforward. But aren't labels supposed to be THE thing for categorizing nodes? Plus, there are all the extra relationships I would be creating.
Maybe I should implement it as both labels and nodes?
Then with labels I could quickly get every node in a category, and with category nodes I could get the category count.
I'm still looking for a good perspective on this problem, so I can't give a concrete implementation question yet.
My two cents.
For your kind of categories, I would go with a node per category and create a BELONGS_TO relationship from nodes belonging to that category. There are a number of reasons for this preference of mine.
One of the reasons labels were added is that many people were putting a "type" property on nodes. Another way to talk about labels is that they add a little bit of a "schema" to your graph - in the sense that you can categorise nodes.
With the introduction of labels, there's always the risk that they will be abused. It is just an extra tool in a database that is primarily designed for storing graphs. In an extreme case, you could use labels for almost everything, ending up with a store of "tagged" nodes.
Finally, traversing relationships is the fastest thing Neo4j does. We're talking units of microseconds. Don't be afraid of adding thousands of relationships to a node. I'd leave labels for developer-defined "schema-like" information.
So in your case of user-added categories, I'd definitely create category nodes and BELONGS_TO relationships rather than using labels.
One last thing, with the disclaimer that this is a bit of self-marketing: if you get to a point where you have tens of thousands or millions of relationships per node, and all you're after is counting the relationships, it might be a good idea to cache those counts on the nodes as properties. I've developed a module called "Relationship Count Module" for the GraphAware Framework, which does exactly that. In my MSc thesis, which will be public in a couple of weeks, I've demonstrated that the module speeds up count queries for high-degree vertices by several orders of magnitude, at the cost of as little as a 10-25% write-throughput penalty. Let me know if you need more detail about that.
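A toy Python sketch of the count-caching idea: the cached count is updated on every write, so reading it needs no traversal. It only illustrates the read/write trade-off and is not the GraphAware module's API; CategoryIndex and its methods are hypothetical names.

```python
from collections import Counter


class CategoryIndex:
    """Stores BELONGS_TO pairs and caches how many items each category has."""

    def __init__(self):
        self.belongs_to = set()  # (item, category) relationships
        self.counts = Counter()  # cached relationship count per category

    def add(self, item, category):
        if (item, category) not in self.belongs_to:
            self.belongs_to.add((item, category))
            self.counts[category] += 1  # extra work paid at write time

    def remove(self, item, category):
        if (item, category) in self.belongs_to:
            self.belongs_to.remove((item, category))
            self.counts[category] -= 1

    def size(self, category):
        # Constant-time read of the cached value instead of scanning relationships.
        return self.counts[category]


index = CategoryIndex()
index.add("x", "cat1")
index.add("y", "cat1")
index.add("x", "cat2")
index.remove("y", "cat1")
print(index.size("cat1"), index.size("cat2"))  # 1 1
```

Every add and remove pays a small extra cost to keep counts in sync, which is the write-throughput penalty mentioned above.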
My graph is composed of multiple subgraphs that are disconnected from one another. Each subgraph consists of nodes connected by a given relationship type.
I would like to get (for example) the list of subgraphs that contain at least one node whose "name" property equals "John".
It's equivalent to finding one node per subgraph having this property.
One solution would be to find all the nodes having this property and loop through that list, picking only the ones that are not connected to the previously picked ones. But that would be ugly and quite heavy. Is there an elegant way to do that with Cypher?
I'm trying something along these lines but have had no success so far:
START source=node:user('name:"John"')
MATCH source-[r?:KNOWS*]-target
WHERE r is null
RETURN source
Try this one; it may help:
START source=node:user('name:"John"')
MATCH source-[r:KNOWS]-()-[r2:KNOWS]-target
WHERE NOT(source-[r:KNOWS]-target)
RETURN target
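The idea from the question (pick a matching node, then skip everything connected to it) can be done with one traversal per subgraph instead of repeated pairwise connectivity checks. A minimal Python sketch on an in-memory graph, assuming nodes map ids to property dicts and edges are KNOWS pairs treated as undirected; subgraphs_with is a hypothetical helper name.

```python
from collections import defaultdict


def subgraphs_with(nodes, edges, key, value):
    """Return one representative node per connected subgraph that
    contains at least one node whose property `key` equals `value`.

    nodes: dict of id -> properties.
    edges: iterable of (a, b) pairs, treated as undirected.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    reps = []
    for node_id, props in nodes.items():
        if props.get(key) != value or node_id in seen:
            continue
        reps.append(node_id)
        # Mark the whole subgraph as seen so it yields only one representative.
        stack = [node_id]
        seen.add(node_id)
        while stack:
            current = stack.pop()
            for nxt in adj[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return reps


nodes = {1: {"name": "John"}, 2: {"name": "Ann"}, 3: {"name": "John"},
         4: {"name": "John"}, 5: {"name": "Bob"}, 6: {"name": "Eve"}}
edges = [(1, 2), (2, 3), (4, 5)]
print(subgraphs_with(nodes, edges, "name", "John"))  # [1, 4]
```

Each node is visited at most once, so the cost is linear in the size of the graph rather than quadratic in the number of matching nodes.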
This is quite a general question but to make it more understandable I'll give it a bit of context.
In Neo4j I have a series of words (nodes) that are associated with one another. I want to specify a list of nodes and have the Cypher query return a list of any relationships between those nodes.
The nodes specified in the list are all guaranteed to have at least one relationship to another node specified in the list.
I created a query to do this and in certain circumstances it works fine - http://console.neo4j.org/?id=s30cbm
Unfortunately, when I add the words 'bark' and 'dog' to the list I get an 'unexpected traversal state encountered' error message. I presume this is because the database cursor has got to the fruit node and then there's no relationship between that and bark, even though there is a relationship from tree to bark. http://console.neo4j.org/?id=258d6g
I'm obviously doing the query slightly wrong and any advice would be appreciated on how I can rectify this.
By the way, this works in the latest console (your second link), so it looks like they fixed it. It looks like it should work in 1.9-M04+.
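The underlying requirement (return only the relationships whose two endpoints are both in the given list) amounts to a list-membership filter on both ends of each relationship. A minimal Python sketch, assuming relationships are (start, type, end) triples; the ASSOCIATED type and the sample words are made up for illustration, and relationships_among is a hypothetical helper name.

```python
def relationships_among(words, relationships):
    """Return relationships whose start and end are both in `words`."""
    wanted = set(words)
    return [
        (start, rel_type, end)
        for start, rel_type, end in relationships
        if start in wanted and end in wanted
    ]


rels = [
    ("tree", "ASSOCIATED", "bark"),
    ("tree", "ASSOCIATED", "fruit"),
    ("dog", "ASSOCIATED", "bark"),
    ("fruit", "ASSOCIATED", "apple"),
]
print(relationships_among(["tree", "bark", "dog"], rels))
# [('tree', 'ASSOCIATED', 'bark'), ('dog', 'ASSOCIATED', 'bark')]
```

Because the check is applied to each relationship independently, it does not matter whether some listed word (here, fruit being absent) has no relationship to the others.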