I have 565 articles in Neo4j and I ran LPA to obtain clusters.
I have the following result: 69 communities.
I would like to display all the communities at the same time in Neo4j.
I tried several Cypher queries with the property key 'community' but it didn't work.
My data looks like this:
How can I do it?
Presumably, you are using the neo4j Browser to visualize your results.
When your Cypher query returns any nodes, relationships, or paths, the browser will automatically show you the Graph view (on the left side of the result panel, you should see icons with captions that may include Graph, Table, Text, etc.). The Graph view only shows nodes and relationships, and not anything else that was returned.
However, if you click on the other icons (say, Table or Text), you should see more results -- like the communities, presented in different formats.
By the way, specifying the node label would make your query more efficient (and adding an index would make it even more efficient if you have a lot of ARTICLE nodes):
MATCH (n:ARTICLE) WHERE n.community IS NOT NULL
RETURN n, n.community
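If you copy the rows out of the Table view (or fetch them with a driver), you can summarize them per community outside the browser. Below is a minimal Python sketch. It assumes each row is a (title, community) pair. The title property and the sample rows are hypothetical, because the question does not show the article properties.

```python
from collections import defaultdict


def group_by_community(rows):
    """Group (title, community) pairs into {community: [titles]}."""
    communities = defaultdict(list)
    for title, community in rows:
        communities[community].append(title)
    return dict(communities)


# Hypothetical rows, e.g. from: MATCH (n:ARTICLE) WHERE n.community IS NOT NULL
#                               RETURN n.title, n.community
rows = [("Article A", 7), ("Article B", 12), ("Article C", 7)]
groups = group_by_community(rows)

# One line per community, largest community first.
for community, titles in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(community, len(titles), titles)
```

Cypher can also produce the per-community sizes directly in the Table view, with an aggregation such as RETURN n.community, count(*) AS size ORDER BY size DESC.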
Related
I am completely new to Neo4j and am using it for the first time for my master's program. I've read the documentation and watched tutorials online, but I can't figure out how to represent my nodes the way I want.
I have a dataframe with 3 columns, the first represents a page name, the second also represents a page name, and the third represents a similarity score between those two pages. How can I create a graph in NEO4J where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)? I want to show the similarity score as the text of the relationship.
Furthermore, I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
I’ve added a screenshot of the header of my DF for clarity: https://imgur.com/a/pg0knh6. I hope someone can help me, thanks in advance!
Edit: What I have tried
LOAD CSV WITH HEADERS FROM 'file:///wiki-small.csv' AS line
MERGE (p:Page {name: line.First})
MERGE (p2:Page {name: line.Second})
MERGE (p)-[r:SIMILAR]->(p2)
ON CREATE SET r.similarity = toFloat(line.Sim)
Next, a block to remove the SIMILAR relationships whose similarity is 0:
MATCH ()-[r:SIMILAR]->() WHERE r.similarity = 0
DELETE r
This works partially: it gives me the correct structure of the nodes, but it doesn't show the similarity scores as relationship labels. I also still need to figure out how to find the node with the most connections.
For the first question:
How can I create a graph in NEO4J where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)?
I think a better approach is to remove the rows with similarity = 0.0 before ingesting them into Neo4j. Is that feasible for you? If your dataset is not too big, this is very fast to do in Python. Otherwise, your solution of deleting the relationships after inserting the data is an option.
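Here is a minimal sketch of that pre-filtering step. It uses only the standard-library csv module and assumes the column names First, Second and Sim from the LOAD CSV statement in the question. The in-memory sample data is made up for illustration.

```python
import csv
import io


def drop_zero_similarity(src, dst, column="Sim"):
    """Copy CSV rows from src to dst, skipping rows whose similarity is 0.

    Returns the number of data rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if float(row[column]) != 0.0:
            writer.writerow(row)
            kept += 1
    return kept


# Small in-memory example; for the real file, open 'wiki-small.csv' for
# reading and a new output file for writing (both with newline="").
src = io.StringIO("First,Second,Sim\nA,B,0.5\nA,C,0\nB,C,0.0\n")
dst = io.StringIO()
kept = drop_zero_similarity(src, dst)
print(kept)
print(dst.getvalue().splitlines())
```

If the data is already in a pandas DataFrame, the equivalent filter is df[df["Sim"] != 0], which you can then write out with df.to_csv(path, index=False).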
For a big dataset, it may be better to load the data using apoc.periodic.iterate or USING PERIODIC COMMIT.
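If you load from Python instead, the same batching idea can be applied on the client side. The sketch below only splits rows into fixed-size batches. It assumes each batch would be passed as a $rows parameter to an UNWIND statement through a driver (not part of the original answer). The statement is a hypothetical adaptation of the LOAD CSV query, and the driver call is omitted.

```python
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical per-batch statement: an UNWIND adaptation of the LOAD CSV
# query, meant to be run once per batch with the batch bound to $rows.
LOAD_BATCH = """
UNWIND $rows AS line
MERGE (p:Page {name: line.First})
MERGE (p2:Page {name: line.Second})
MERGE (p)-[r:SIMILAR]->(p2)
ON CREATE SET r.similarity = toFloat(line.Sim)
"""

rows = [{"First": "A", "Second": "B", "Sim": "0.5"}] * 5
print([len(b) for b in batched(rows, 2)])
```

On Python 3.12 and later, itertools.batched does the same splitting but yields tuples instead of lists.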
Second question
I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
This is an easy query. Again, you can do it with plain Cypher or with the APOC library:
// Plain Cypher
MATCH (n:Page)-[r:SIMILAR]->()
RETURN n.name, count(*) AS cnt
ORDER BY cnt DESC
// APOC
MATCH (n:Page)
RETURN n.name, apoc.node.degree(n, "SIMILAR>") AS degree
ORDER BY degree DESC;
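If you want the answer before loading anything, you can compute the same count on the (First, Second) pairs in Python. This is a sketch with made-up edges. directed=True mirrors the outgoing-only (n:Page)-[:SIMILAR]->() pattern above. directed=False counts both ends, which may be closer to "relationships to other nodes" for symmetric similarity.

```python
from collections import Counter


def most_connected(edges, directed=True):
    """Return (node, degree) for the node with the most relationships.

    With directed=True only outgoing edges count; with directed=False both
    ends of each edge count. Raises IndexError if edges is empty.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        if not directed:
            degree[target] += 1
    return degree.most_common(1)[0]


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
print(most_connected(edges))
print(most_connected(edges, directed=False))
```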
EDIT
To display the similarity scores in Neo4j Desktop or the other web interfaces, you can simply click on a SIMILAR arrow, and the relationship types are shown at the top of the result cell. Then click on the SIMILAR marker, and at the bottom of the result cell, next to Caption, select the property that you want to show (similarity in your case).
Then all the arrows are displayed with the similarity score
To the second question: I think you should keep a clear separation between the way you store data and the way you visualize it. Showing the similarity score (a property of the SIMILAR relationship) as a "label" is best handled by an adequate visualization library or platform. Ours (Graphileon) could be such a platform, although there are also others.
We offer the possibility to "style" the edges with so-called selectors like
"label":"(%).properties.simScore" that would use the simScore as a label. On top of that, you could do things like
"width":"evaluate((%).properties.simScore < 0.500 ? 3 : 10)"
or
"fillColor":"evaluate((%).properties.simScore < 0.500 ? grey : red)"
to make high simScores stand out visually.
Full disclosure : I work for Graphileon.
I'm building a Neo4j system just to do visualization in the Neo4j Browser. I build the various nodes and relationships, and I can visualize the database by running match (n) return n. The problem is that the resulting display shows the relationship names but not their associated properties. Can anyone tell me the Cypher query to show the entire database, including relationship properties? Thanks.
The neo4j browser does not support visualizing all properties (for nodes or relationships) at the same time. Such a capability would generally result in a very congested and unusable visualization, especially since the browser would also have to display the property names.
You can, however, choose to show the value of a single property per node label or relationship type as its caption. You can do that manually, or you can edit the GRASS file to set all of the captions at once. As an example of how to set relationship captions in the GRASS file, the following entry would specify that all BAR relationships should show their foo property:
relationship.BAR {
caption: '{foo}';
}
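When there are many relationship types to caption, you can generate the entries instead of typing them. This minimal Python sketch only builds text in the entry format shown above. The type/property pairs are hypothetical. How you load the result into the browser (for example through the stylesheet view opened by the :style command) may depend on your browser version.

```python
def grass_captions(captions):
    """Build GRASS entries that caption each relationship type with one property.

    `captions` maps relationship type -> property name, e.g. {"BAR": "foo"}.
    """
    entries = []
    for rel_type, prop in captions.items():
        # {{ and }} are literal braces; {{{prop}}} renders as {foo}.
        entries.append(f"relationship.{rel_type} {{\n  caption: '{{{prop}}}';\n}}")
    return "\n".join(entries)


# Hypothetical mapping of relationship types to the property to show.
print(grass_captions({"BAR": "foo", "SIMILAR": "similarity"}))
```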
I have a question related to Neo4j.
I create around 40,000 nodes and around 20,000 relationships, that is, 4K nodes and 6K relationships, which are certainly less than 15MB.
When I run the query
match (n) optional match (n)-[r]-() return n, r
it starts to load, and after a long wait it returns nothing in graphical form. The result does show how many nodes and relationships I have, but no graph. I want to see the complete graph of my data. Is there any way to see what it looks like? It's only for visualization. When I limit the query to 800, it works.
Is there anything I need to change in the settings or in my system memory?
Any suggestions?
The web console isn't well suited to graphs beyond a few hundred nodes. I'd suggest looking at Gephi:
http://gephi.github.io/
Alternatively you could use Linkurious, an online tool:
https://linkurio.us/
If you want to roll your own there are a number of choices out there. I like Sigma.js:
http://sigmajs.org/
Linkurious also has a library based on Sigma:
https://github.com/Linkurious/linkurious.js
EDIT: http://keylines.com/ is another online service like Linkurious.
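To move the data into a desktop tool like Gephi, you can write the query results out as an edge table. This is a minimal sketch. It assumes Gephi's spreadsheet importer, which expects Source and Target columns, with Weight optional. The sample rows are hypothetical stand-ins for query results.

```python
import csv
import io


def write_edge_table(edges, out_file):
    """Write (source, target, weight) triples as a CSV edge table
    with a Source,Target,Weight header row."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight])


# Hypothetical rows, e.g. from: MATCH (a)-[r]->(b) RETURN a.name, b.name
buf = io.StringIO()
write_edge_table([("A", "B", 1), ("A", "C", 2)], buf)
print(buf.getvalue())
```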
I was reading a book recommended on the Neo4j site, http://neo4j.com/books/graph-databases/, about graph database performance, and it said:
"In contrast to relational databases, where join-intensive query performance deteriorates
as the dataset gets bigger, with a graph database performance tends to remain
relatively constant, even as the dataset grows. This is because queries are localized to a
portion of the graph. As a result, the execution time for each query is proportional only
to the size of the part of the graph traversed to satisfy that query, rather than the size of
the overall graph."
So, e.g., if I want to return only nodes with the label "Doctor", that's localized to a portion of the graph. But how does the database itself know where those nodes are? In other words, doesn't it need to traverse all nodes to find out whether or not they satisfy the query, and make a decision based on that?
Neo4j has a special index for node labels, so it can find all nodes with a given label without searching all nodes. Beyond that, you can:
Create your own indexes based on node properties (either schema indexes or legacy indexes) in order to find nodes as starting points
Query by node IDs to find a starting point (though I'd suggest using your own property with an index if you need to identify nodes more permanently)
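A toy model helps show the difference between scanning every node and using a label index. This sketch is only an illustration of the idea. It says nothing about how Neo4j actually stores labels, and the node ids and labels are made up.

```python
from collections import defaultdict


class ToyGraph:
    """Toy node store with a label index (label -> set of node ids)."""

    def __init__(self):
        self.node_labels = {}                 # node id -> set of labels
        self.label_index = defaultdict(set)   # label -> node ids

    def add_node(self, node_id, *labels):
        self.node_labels[node_id] = set(labels)
        for label in labels:
            self.label_index[label].add(node_id)

    def scan(self, label):
        """Without an index: every node is inspected."""
        return {n for n, labels in self.node_labels.items() if label in labels}

    def lookup(self, label):
        """With the index: only the ids filed under the label are touched."""
        return set(self.label_index.get(label, ()))


g = ToyGraph()
g.add_node(1, "Doctor")
g.add_node(2, "Patient")
g.add_node(3, "Doctor", "Person")
print(g.lookup("Doctor"))
```

Both methods return the same set. The difference is how much work they do: scan touches every node, while lookup only reads the ids stored under the requested label.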
In general localized searches mean: you start from a smallish set of starting points which can be people, products, places, orders etc.
A portion of the graph that is annotated with a label often doesn't fall into that category, i.e. the set of all doctors is not a smallish set of starting points.
Your query would probably touch a large portion of the graph if you traverse out from all doctors to their neighborhoods.
A query like this would be a graph local one:
MATCH (:City {name:"SFO"})<-[:RESIDES_IN]-(d:Doctor)-[presc:PRESCRIBES]->(m:Medicine)
RETURN d.name, m.name, sum(presc.amount) as amount
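The sketch below mimics that query on plain dictionaries, to show that the work depends only on the neighbourhood of the starting city. The dictionaries stand in for RESIDES_IN and PRESCRIBES relationships. The city lookup stands in for an indexed lookup of the :City node, which is an assumption, and all data is made up.

```python
def prescriptions_in_city(residents, prescriptions, city):
    """Sum prescription amounts per (doctor, medicine) for doctors in one city.

    residents:     city -> list of doctors (RESIDES_IN, followed in reverse)
    prescriptions: doctor -> list of (medicine, amount) (PRESCRIBES)
    Returns (totals, number of relationships followed).
    """
    totals = {}
    followed = 0
    for doctor in residents.get(city, []):
        followed += 1                      # one RESIDES_IN relationship
        for medicine, amount in prescriptions.get(doctor, []):
            followed += 1                  # one PRESCRIBES relationship
            key = (doctor, medicine)
            totals[key] = totals.get(key, 0) + amount
    return totals, followed


residents = {"SFO": ["Dr. A", "Dr. B"], "NYC": ["Dr. C"]}
prescriptions = {
    "Dr. A": [("Aspirin", 2), ("Aspirin", 3)],
    "Dr. B": [("Ibuprofen", 1)],
    "Dr. C": [("Aspirin", 10)],   # never visited when starting from "SFO"
}
totals, followed = prescriptions_in_city(residents, prescriptions, "SFO")
print(totals, followed)
```

Adding more cities or doctors elsewhere would not change the work done for "SFO". That locality is the point the book quote makes.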
I use the query
"START a=node("+str(node1)+"),
b =node("+str(node2)+")
MATCH p=shortestPath(a-[:cooperate*..200]-b)
RETURN length(p)"
to see the path between a and b. I have many nodes, and when I run the query, sometimes it runs fast and sometimes it runs slowly. I use Neo4j 1.9 Community. Can anyone help?
Query time is proportional to the amount of the graph searched. Your query allows very deep searches, up to depth 200. If a and b are very close, you won't search much of the graph, and the query will return very fast. If a and b are separated by 200 edges, you will search a very large swathe of the graph (perhaps the whole graph), which for a large graph will be much slower.
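To see how the cost depends on the distance between the endpoints, consider this plain breadth-first search that also reports how many nodes it discovered. It is a stand-in for illustration, not Neo4j's shortestPath implementation. The chain graph is made up, and max_depth plays the role of the *..200 bound.

```python
from collections import deque


def shortest_path_length(adj, start, goal, max_depth=200):
    """Unweighted breadth-first search.

    Returns (path length or None, number of nodes discovered); the second
    value is a rough measure of how much of the graph was searched.
    """
    if start == goal:
        return 0, 1
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue                       # depth limit, like *..200
        for nxt in adj.get(node, ()):
            if nxt in seen:
                continue
            if nxt == goal:
                return depth + 1, len(seen) + 1
            seen.add(nxt)
            frontier.append((nxt, depth + 1))
    return None, len(seen)


# A chain 0 - 1 - ... - 99 as an undirected adjacency list.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 100] for i in range(100)}
print(shortest_path_length(adj, 0, 1))
print(shortest_path_length(adj, 0, 99))
print(shortest_path_length(adj, 0, 99, max_depth=10))
```

The neighbouring pair is found after discovering 2 nodes. The far pair requires discovering the whole 100-node chain, which is the fast/slow difference described above.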
Is the graph changing between the two queries? Is it possible that these two nodes end up in different places relative to each other between the queries, for example if you generate random data to populate the graph?