I am building a graph in Gephi and want to create some subgraphs using filters.
After filtering the nodes by intra edges (based on group), I am left with some isolated nodes that I want to remove using Library -> Topology -> Degree Range.
So theoretically, all those unused nodes should be removed.
But because degree range still stays at 1-32, and no 0 is found, I can't remove the nodes.
I tried different combinations but it just doesn't work somehow. Here is one of the nodes I want to remove because it has 0 in and out degree
Tough one, because you need to recompute the degrees for the filtered graph. I would recommend the following approach:
1. Copy your current degree columns to something like [in|out|degree]_global so that they don't get overwritten by step 3
2. Copy your filtered graph to a new workspace (or save the visible graph only)
3. Recompute the degrees on the filtered graph
4. Filter by range as you do now
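To illustrate why the recompute step matters, here is a minimal Python sketch of the same idea outside Gephi. The function name, the `group_of` mapping and the sample data are made up for illustration and are not part of Gephi's API: degrees are counted on the filtered edge list only, so nodes that lost all their edges end up with degree 0 and can be dropped.

```python
def filter_and_prune(edges, group_of):
    """Keep intra-group edges, recompute degrees, drop isolated nodes."""
    # Keep only edges whose two endpoints belong to the same group.
    kept = [(u, v) for u, v in edges if group_of[u] == group_of[v]]
    # Recompute degrees on the filtered edge list, not on the full graph.
    degree = {}
    for u, v in kept:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Nodes that never appear in a kept edge have degree 0 and are removed.
    nodes = sorted(n for n in group_of if degree.get(n, 0) > 0)
    return nodes, kept

# Hypothetical sample data: node "e" only has edges to another group.
group_of = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "c")]
nodes, kept = filter_and_prune(edges, group_of)
print(nodes)  # ['a', 'b', 'c', 'd']
```

With the original (global) degrees, "e" would still count as degree 1, which is the situation described in the question.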
I am completely new to Neo4j and am using it for the first time for my master's program. I've read the documentation and watched tutorials online but can't seem to figure out how to represent my nodes the way I want.
I have a dataframe with 3 columns, the first represents a page name, the second also represents a page name, and the third represents a similarity score between those two pages. How can I create a graph in NEO4J where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)? I want to show the similarity score as the text of the relationship.
Furthermore, I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
I’ve added a screenshot of the header of my DF for clarity https://imgur.com/a/pg0knh6. I hope anyone can help me, thanks in advance!
Edit: What I have tried
LOAD CSV WITH HEADERS FROM 'file:///wiki-small.csv' AS line
MERGE (p:Page {name: line.First})
MERGE (p2:Page {name: line.Second})
MERGE (p)-[r:SIMILAR]->(p2)
ON CREATE SET r.similarity = toFloat(line.Sim)
Next block, to remove the SIMILAR relationships whose similarity is 0:
MATCH ()-[r:SIMILAR]->() WHERE r.similarity = 0
DELETE r
This works partially. As in it gives me the correct structure of the nodes but doesn't give me the similarity scores as relationship labels. I also still need to figure out how I can find the node with the most connections.
For the first question:
How can I create a graph in NEO4J where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)?
I think a better approach is to remove the rows with similarity = 0.0 before ingesting them into Neo4j. Would that be feasible? If your dataset is not too big, this is very fast to do in Python. Otherwise, the solution you describe, deleting the relationships after inserting the data, is also an option.
In case of a big dataset, maybe it's better if you load the data using apoc.periodic.iterate or USING PERIODIC COMMIT.
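As a rough sketch of the pre-filtering step, assuming the CSV uses the column names First, Second and Sim from the LOAD CSV statement in the question (the sample rows below are invented):

```python
import csv
import io

def drop_zero_similarity(rows, column="Sim"):
    """Return only the rows whose similarity score is not zero."""
    return [row for row in rows if float(row[column]) != 0.0]

# Hypothetical stand-in for wiki-small.csv; replace with open("wiki-small.csv").
sample = io.StringIO("First,Second,Sim\nA,B,0.8\nA,C,0.0\nB,C,0.3\n")
rows = list(csv.DictReader(sample))
kept = drop_zero_similarity(rows)
print([(r["First"], r["Second"], r["Sim"]) for r in kept])
# [('A', 'B', '0.8'), ('B', 'C', '0.3')]
```

The filtered rows can then be written back to a CSV file and loaded as before, so no zero-score relationships are ever created.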
Second question
I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
This is an easy query. Again, you can do it with plain Cypher or with the APOC library:
// Plain Cypher
MATCH (n:Page)-[r:SIMILAR]->()
RETURN n.name, count(*) AS cnt
ORDER BY cnt DESC
// APOC
MATCH (n:Page)
RETURN n.name, apoc.node.degree(n, "SIMILAR>") AS output
ORDER BY output DESC;
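If you want to sanity-check the result outside Neo4j, the same count can be sketched in Python. This is only an illustration with invented pairs; like the queries above, it counts outgoing relationships only.

```python
from collections import Counter

def most_connected(pairs):
    """Return (node, count) for the node with the most outgoing relationships."""
    counts = Counter(src for src, _dst in pairs)
    return counts.most_common(1)[0]

# Hypothetical (First, Second) pairs that have a non-zero similarity.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
top = most_connected(pairs)
print(top)  # ('A', 2)
```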
EDIT
To display the similarity scores in Neo4j Desktop or in the other web interfaces, you can simply: click on a SIMILAR arrow --> at the top of the running cell the labels are shown, click on the SIMILAR label marker --> at the bottom of the running cell, to the right of Caption, select the property that you want to show (similarity in your case)
All the arrows are then displayed with the similarity score.
To the second question: I think you should keep a clear separation between the way you store data and the way you visualize it. Showing the similarity score (a property of the SIMILAR edge) as a "label" is best handled by an adequate viz library or platform. Ours (Graphileon) could be such a platform, although there are others too.
We offer the possibility to "style" the edges with so-called selectors like
"label":"(%).property.simScore" that would use the simScore as a label. On top of that you could do things like
"width":"evaluate((%).properties.simScore < 0.500 ? 3 : 10)"
or
"fillColor":"evaluate((%).properties.simScore < 0.500 ? grey : red)"
to distinguish visually high simScores.
Full disclosure: I work for Graphileon.
I have a graph of Position nodes that are connected using directed :TO edges. Each node has a uuid property and may have many edges to other nodes, and each edge has a probability property. I want to get the subgraph from a particular starting node using only the edges with the top N probabilities from each node. For example, if each node has ten edges, I might want to use the three edges with the highest probability.
On top of this I want to exclude all edges that end in an already visited node and, preferably, be able to parameterize the maximum number of levels (maxLevel in the apoc procedures, I believe).
The apoc path expansion procedures would probably work fine except for the last requirement; there's no apparent way to limit the number of edges, just the number of rows.
I've tried chaining MATCH queries together but can't figure out how to limit the number of edges on a per node basis, just the number of rows.
I think I have a few additional ideas that I'm going to work on, but I feel like this has to be a common enough use case that I'm missing something fundamental.
Try this:
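MATCH (n:label1)-[e1:type]->(m:label2)
WHERE n.name='xxx'
// keep the three relationships with the highest prop value from the start node
WITH n,e1,m
ORDER BY e1.prop DESC
LIMIT 3
// expand one more level from each kept neighbor
MATCH (m)-[e2:type]->(k)
RETURN n,e1,m,e2,k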
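Note that the LIMIT above applies to the rows for the single start node only. To make the full requirement concrete (top N edges per node, skipping already visited nodes, with a maximum level), here is a plain Python sketch of the traversal logic. It is not a Neo4j or APOC call; the graph layout and function name are invented, and it assumes the top N edges are picked first and edges to visited nodes are dropped afterwards.

```python
from collections import deque

def top_n_subgraph(graph, start, n, max_level):
    """graph maps node -> list of (neighbor, probability). Returns kept edges."""
    visited = {start}
    edges = []
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if level >= max_level:
            continue  # do not expand beyond the requested depth
        # Rank this node's outgoing edges by probability, highest first.
        ranked = sorted(graph.get(node, []), key=lambda e: e[1], reverse=True)
        for neighbor, prob in ranked[:n]:
            if neighbor in visited:
                continue  # skip edges that end in an already visited node
            visited.add(neighbor)
            edges.append((node, neighbor, prob))
            queue.append((neighbor, level + 1))
    return edges

# Hypothetical data: uuid-like keys with (target, probability) pairs.
graph = {
    "a": [("b", 0.5), ("c", 0.3), ("d", 0.2)],
    "b": [("a", 0.9), ("c", 0.6), ("e", 0.1)],
    "c": [("e", 0.7)],
}
result = top_n_subgraph(graph, "a", n=2, max_level=2)
print(result)  # [('a', 'b', 0.5), ('a', 'c', 0.3), ('c', 'e', 0.7)]
```

If edges to visited nodes should not use up a slot in the top N, filter the ranked list before slicing instead.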
I am a Neo4J beginner, so, apologies in advance if my question is too trivial.
I am trying to create a Neo4J graph representing a set of consecutive steps in a game, as shown in this diagram.
You will see in the diagram that I start with zero points, and, at certain steps (but not in every step), additional points are accumulated.
I want to assign points to nodes that don't have points yet, according to the following principle: whenever a node does not have points, I want to assign to it a number of points equal to the points possessed by the closest previous node that has points assigned to it. In the sample diagram, step 2 would have 0 points (:Step {id: 2, points_so_far: 0}), and step 4 would have 1 point (:Step {id: 4, points_so_far: 1}). Note that there may be an arbitrary number of scoreless nodes between nodes that do have a score.
Any help in creating a respective Cypher query would be much appreciated!
Many thanks in advance!
Here is a way to do it:
match (s:Step) WHERE not exists(s.points_so_far)
match (prev:Step)<-[:HAS_PREVIOUS_STEP*]-(s) where exists(prev.points_so_far)
with s, head(collect(prev)) as prev
SET s.points_so_far = prev.points_so_far
How does it work?
First, find all nodes that have no points_so_far:
match (s:Step) WHERE not exists(s.points_so_far)
With those nodes, find all previous steps that have points_so_far:
match (prev:Step)<-[:HAS_PREVIOUS_STEP*]-(s) where exists(prev.points_so_far)
Collect the previous nodes with points in a list, and keep only the first one encountered:
with s, head(collect(prev)) as prev
Set the value of the node to the value of that previous node:
SET s.points_so_far = prev.points_so_far
Note:
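This query uses a variable-length path (the * in <-[:HAS_PREVIOUS_STEP*]-), which has some performance cost. It also relies on the closest previous step being collected first, which Cypher does not guarantee without an explicit ORDER BY on the path length.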
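For comparison, the intended result can be sketched in plain Python as a simple carry-forward over the steps in order. The list-of-dicts layout and the sample scores are assumptions made for illustration only.

```python
def fill_points(steps):
    """steps: dicts ordered first to last; some lack 'points_so_far'."""
    last_known = None
    filled = []
    for step in steps:
        if step.get("points_so_far") is not None:
            # This step has its own score: remember it for later steps.
            last_known = step["points_so_far"]
            filled.append(dict(step))
        else:
            # Copy the score of the closest previous step that has one.
            filled.append({**step, "points_so_far": last_known})
    return filled

# Hypothetical chain: steps 1 and 3 have scores, the others do not.
steps = [
    {"id": 1, "points_so_far": 0},
    {"id": 2},
    {"id": 3, "points_so_far": 1},
    {"id": 4},
    {"id": 5},
]
filled = fill_points(steps)
print([(s["id"], s["points_so_far"]) for s in filled])
# [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
```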
Given an undirected, unweighted graph in which some nodes are marked, is there an efficient way to find the unmarked nodes between node A and B which would create a "marked" path from A to B when they are marked? The number of those "bridge" nodes should be minimal.
For example, in the graph below there would be two minimal ways to connect node A to B. One possibility would be to mark the node labelled 1, the other possibility would be to mark node 2.
Convert your graph into a directed, weighted graph such that:
the weight of each edge going into a marked node is set to 0
the weight of each edge going into an unmarked node is 1
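Find all lowest-cost paths from A to B. The cost of a path is then the number of unmarked nodes on it (assuming A and B are themselves marked), so each lowest-cost path gives a minimal set of nodes to mark.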
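Because the weights are only 0 and 1, one way to find such a path is a 0-1 breadth-first search, which is a swapped-in technique here rather than something the answer prescribes. The sketch below returns the unmarked nodes on one lowest-cost path (not all of them); the adjacency dict and node names are invented.

```python
from collections import deque

def min_bridge_nodes(adj, marked, a, b):
    """Return the unmarked nodes on one lowest-cost path from a to b, or None."""
    dist = {a: 0}
    parent = {a: None}
    dq = deque([a])
    while dq:
        u = dq.popleft()
        for v in adj.get(u, ()):
            # Entering a marked node is free, entering an unmarked node costs 1.
            w = 0 if v in marked else 1
            if v not in dist or dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u
                if w == 0:
                    dq.appendleft(v)  # zero-cost edges are explored first
                else:
                    dq.append(v)
    if b not in dist:
        return None  # a and b are not connected
    path = []
    node = b
    while node is not None:
        path.append(node)
        node = parent[node]
    return [n for n in reversed(path) if n not in marked]

# Hypothetical graph: marking "1" or "2" connects A to B; "3"-"4" needs two.
adj = {
    "A": ["1", "2", "3"], "1": ["A", "B"], "2": ["A", "m"],
    "m": ["2", "B"], "3": ["A", "4"], "4": ["3", "B"],
    "B": ["1", "m", "4"],
}
marked = {"A", "B", "m"}
bridge = min_bridge_nodes(adj, marked, "A", "B")
print(bridge)  # a single-node list
```

Which of the equally cheap paths is returned depends on the neighbor order in `adj`.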
Suppose I have 3 subgraphs in Neo4j and I would like to select and delete a whole subgraph if all of its nodes match the filtering criterion, which is that each node's property value is <= 1. However, if at least one node in the subgraph does not match the criterion, the subgraph should not be deleted.
In this case the left subgraph will be deleted but the right subgraph and the middle one will stay. The right one will not be deleted even though it has some nodes with value 1 because there are also nodes with values greater than 1.
userids and values are the node properties.
I would be thankful if anyone could suggest a Cypher query that can do that. Please note that the query will run on the whole graph, that is, on all three subgraphs, or more if there are any more.
Thanks for the clarification, that's a tricky requirement, and it's not immediately clear to me what the best approach is that will scale well with large graphs, as most possibilities seem to be expensive full graph operations. We'll likely need to use a few steps to set up the graph for easier querying later. I'm also assuming you mean "disconnected subgraphs", otherwise this answer won't work.
One start might be to label nodes as :Alive or :Dead based upon the property value. It should help if all nodes are of the same label, and if there's an index on the value property for that label, as our match operations could take advantage of the index instead of having to do a full label scan and property comparison.
MATCH (a:MyNode)
WHERE a.value <= 1
SET a:Dead
And separately
MATCH (a:MyNode)
WHERE a.value > 1
SET a:Alive
Then your query to mark nodes to delete would be:
MATCH (a:Dead)
WHERE NOT (a)-[*]-(:Alive)
SET a:ToDelete
And if all looks good with the nodes you've marked for deletion, you can run your delete operation, using apoc.periodic.commit() from APOC Procedures to batch the operation if necessary.
MATCH (a:ToDelete)
DETACH DELETE a
If operations on disconnected subgraphs are going to be common, I highly encourage using a special node connected to each subgraph you create (such as a single :Cluster node at the head of the subgraph) so you can begin such operations on :Cluster nodes. That would greatly speed up these kinds of queries, since your query operations would be executed per cluster instead of per :Dead node.
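To make the per-subgraph rule concrete, here is a small Python sketch of the same logic on an in-memory copy of the data. It is not a Neo4j API; the `values` dict, the edge list and the function name are assumptions for illustration, and every edge endpoint is assumed to appear in `values`. Each connected component is collected once and is only scheduled for deletion when all of its nodes have a value <= 1.

```python
from collections import defaultdict, deque

def nodes_to_delete(values, edges, threshold=1):
    """Return nodes of every component whose values are all <= threshold."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    doomed = set()
    for start in values:
        if start in seen:
            continue
        # Breadth-first search collects one disconnected subgraph.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        # A single "alive" node (value > threshold) saves the whole subgraph.
        if all(values[n] <= threshold for n in component):
            doomed.update(component)
    return doomed

# Hypothetical userids and values forming three disconnected subgraphs.
values = {"u1": 1, "u2": 0, "u3": 1, "u4": 3, "u5": 2, "u6": 5}
edges = [("u1", "u2"), ("u3", "u4"), ("u5", "u6")]
doomed = nodes_to_delete(values, edges)
print(sorted(doomed))  # ['u1', 'u2']
```

Here "u3" has a value of 1 but survives because it shares a subgraph with "u4", matching the behaviour asked for in the question.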