Pre-filter large network in Cytoscape

I have a large network of ~300K nodes that my machine has a hard time plotting with Cytoscape (Desktop version under Windows).
I know that the network has discrete groups that are not interconnected - I also have the id of those groups as a node attribute.
I want to be able to graph each group based on what id I select.
I tried achieving this with the filter (Cytoscape gave me the option to not plot the graph when opening it the first time - "Do you want to create a view for your large network now?") but it still seems to try to plot the entire graph when setting the filter and then clicking on "Create View".
So in short: Is there any way to "pre-filter" the graph, or otherwise cut it up, so that Cytoscape plots only the group I want?
Any thoughts would be appreciated.

You are almost there. Once the filter is set and has selected the nodes you want, create a subnetwork from that selection (see File > New Network). You can then create a view of that subset (or one will be created automatically if the node count is below the threshold).
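If the full network is too heavy to load comfortably at all, another option is to cut it up before it reaches Cytoscape. The following is only a minimal sketch in Python, not a Cytoscape feature: it assumes the edges are available as (source, target) pairs and that the group id of every node is known. The function name and the toy data are hypothetical.

```python
import csv
import io


def filter_edges_by_group(edge_rows, node_groups, group_id):
    """Keep only the edges whose two endpoints both belong to group_id."""
    return [
        (source, target)
        for source, target in edge_rows
        if node_groups.get(source) == group_id
        and node_groups.get(target) == group_id
    ]


# Hypothetical toy data: node -> group id, plus an edge list.
node_groups = {"a": "1", "b": "1", "c": "2", "d": "2"}
edges = [("a", "b"), ("c", "d")]

subset = filter_edges_by_group(edges, node_groups, "2")

# Write the subset as a two-column CSV table (here into a string buffer;
# a real script would write to a file instead).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source", "target"])
writer.writerows(subset)
print(buffer.getvalue())
```

The resulting table contains one group only and can be imported as a network on its own, so the 300K-node graph never has to be rendered.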

Related

NEO4J How to make graph with relationships

I am completely new to NEO4J and am using it for the first time for my master's program. I've read the documentation and watched tutorials online but can't seem to figure out how to represent my nodes in the way I want.
I have a dataframe with 3 columns, the first represents a page name, the second also represents a page name, and the third represents a similarity score between those two pages. How can I create a graph in NEO4J where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)? I want to show the similarity score as the text of the relationship.
Furthermore, I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
I’ve added a screenshot of the header of my DF for clarity https://imgur.com/a/pg0knh6. I hope anyone can help me, thanks in advance!
Edit: What I have tried
LOAD CSV WITH HEADERS FROM 'file:///wiki-small.csv' AS line
MERGE (p:Page {name: line.First})
MERGE (p2:Page {name: line.Second})
MERGE (p)-[r:SIMILAR]->(p2)
ON CREATE SET r.similarity = toFloat(line.Sim)
Next block, to remove the SIMILAR relationships whose similarity is 0:
MATCH ()-[r:SIMILAR]->() WHERE r.similarity = 0
DELETE r
This works partially. As in it gives me the correct structure of the nodes but doesn't give me the similarity scores as relationship labels. I also still need to figure out how I can find the node with the most connections.
For the first question:
How can I create a graph in NEO4J where the nodes are my unique page names and the relationships between nodes are drawn if there is a similarity score between them (so if the sim-score is 0 they don’t draw a relationship)?
I think a better approach is to remove the rows with similarity = 0.0 before ingesting them into Neo4j. Would that be feasible? If your dataset is not too big, this is very fast to do in Python. Otherwise, the solution you describe (deleting the relationships after inserting the data) is an option.
For a big dataset, it may be better to load the data using apoc.periodic.iterate or USING PERIODIC COMMIT.
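Dropping the zero-similarity rows ahead of the import could look roughly like the sketch below. It uses only the standard library and assumes the column names First, Second and Sim from the LOAD CSV statement in the question; the function name and the sample rows are made up for illustration.

```python
import csv
import io


def drop_zero_similarity(rows, score_column="Sim"):
    """Return only the rows whose similarity score is not zero."""
    return [row for row in rows if float(row[score_column]) != 0.0]


# Made-up sample standing in for wiki-small.csv.
sample_csv = "First,Second,Sim\nA,B,0.8\nA,C,0.0\nB,C,0.35\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

kept = drop_zero_similarity(rows)

# Write the cleaned table back out as CSV (here into a string buffer;
# a real script would write a new file for LOAD CSV to read).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["First", "Second", "Sim"])
writer.writeheader()
writer.writerows(kept)
print(out.getvalue())
```

With the rows removed up front, the DELETE step after the import is no longer needed.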
Second question
I want to know if there is an easy way to figure out which node had the most relationships to other nodes?
This is an easy query. Again, you can do it with plain Cypher or using the APOC library:
// Plain Cypher
MATCH (n:Page)-[r:SIMILAR]->()
RETURN n.name, count(*) AS cnt
ORDER BY cnt DESC
// APOC
MATCH (n:Page)
RETURN n.name, apoc.node.degree(n, "SIMILAR>") AS degree
ORDER BY degree DESC;
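If the source table is still at hand, the same count can be cross-checked outside Neo4j. This is just a sketch with hypothetical page pairs; like the queries above, it counts outgoing SIMILAR relationships only, i.e. how often a page appears in the first column.

```python
from collections import Counter


def out_degree(pairs):
    """Count how many outgoing relationships each first-column page has."""
    return Counter(first for first, _second in pairs)


# Hypothetical (First, Second) pairs with a non-zero similarity score.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]

degrees = out_degree(pairs)
print(degrees.most_common(1))  # [('A', 2)]
```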
EDIT
To display the similarity scores in Neo4j Desktop or one of the other web interfaces: click on a SIMILAR arrow, and the labels are shown at the top of the running cell. Click on the SIMILAR label marker; then, at the bottom of the running cell, to the right of Caption, select the property that you want to show (similarity in your case).
All the arrows are then displayed with the similarity score.
To the second question: I think you should keep a clear separation between the way you store data and the way you visualize it. Showing the similarity score (a property of the SIMILAR edge) as a "label" is best handled by an adequate viz library or platform. Ours (Graphileon) could be such a platform, although there are others.
We offer the possibility to "style" the edges with so-called selectors like
"label":"(%).property.simScore" that would use the simScore as a label. On top of that you could do things like
"width":"evaluate((%).properties.simScore < 0.500 ? 3 : 10)"
or
"fillColor":"evaluate((%).properties.simScore < 0.500 ? grey : red)"
to distinguish visually high simScores.
Full disclosure : I work for Graphileon.

Integrate multiple same structure datasets in one database

I have 8 different datasets with the same structure. I am using Neo4j and need to query all of them at different points on the website I am developing. What would be the approaches at storing the datasets in one database?
One idea that comes to my mind is to supply for each node an additional property that would distinguish nodes of one dataset from nodes of the other ones. But that seems too repetitive and wrong for me. The other idea is just to create 8 databases and query them separately but how could I do that? Running each one in its own port seems crazy.
Any suggestions would be greatly appreciated.
If your datasets are in a tree structure, you could add a different root node to each of them that you could use for reference, similar to GraphAware TimeTree. Another option (better than a property, I think) would be to differentiate each dataset by adding a specific label to nodes from that dataset (i.e. all nodes from "dataset A" get a :DataSetA label)
I imagine that the specific structure of your dataset may yield other options. For example, if you always begin traversals of the dataset from a few set locations, you only need to be able to determine which dataset the entry points are a part of, because once entered, all traversals would be made within the same dataset <-- if that makes sense.

Data model of existing data in Neo4J

I have a small dataset loaded into Neo4J consisting of a 6 node labels with about 20 nodes for each label and there are about 10 different relationships. I was wondering if you can automatically create a picture of this data model using the data available in the database.
I would like to create something like this automatically from the data:
taken from http://neo4j.com/docs/stable/cypherdoc-movie-database.html
I know that it would be quite simple doing it manually in this example but it could come in handy looking at more complex data models.
Any suggestions?
Thank you Michael, that helped. There is also functionality in the web tool that ships with Neo4J that can do something similar although less graphically.
You click on the little bubbles in the top left corner of the interface, and there is a predefined query that extracts all labels and relationships from the graph.

Hierarchical labels or dense nodes?

(I am new to Neo4J and very excited about it)
Here is my conceptual question:
Suppose we want to represent life on earth (based on a biological taxonomy hierarchy).
However, suppose that at the leaves of the taxonomy tree we want to identify individual organisms. For example, in the Mammalia branch, under the Homo sapiens sub-branch, we want to identify each and every one of the 7 billion humans, and do the same for some other branches (give an ID to every known living great ape left in the wild, and so on).
Is this type of organization done with dense nodes (in the billions) ? or is it done with extensive use of labels (do labels support nesting)?
From my point of view it's better to use multiple nodes instead of multiple labels.
But it depends on the use case and what you want to do with it.
Neo4j doesn't support nested labels or any kind of label hierarchy.
Here are some resources which could be interesting for you
Graph Databases in Life Sciences: Bringing Biology Back to Its Nature
Open Tree of Life and Neo4j

Sizing nodes according to input weighting not connectivity

I am trying to use Gephi to help graph interview analysis results. The relationship map is only used to describe conventional connections and life cycles. What I would like to do is to size the nodes based on the number of interview responses that talk about the node, not the number of connections it has or the weighting of those connections. Can Gephi do this and if so, how do I do it please?
I have loaded in node weightings and can see this as part of node labels, but haven't been able to find a way of this having an effect on node size.
Many thanks
Data input field - change input format to integer
You can load the graph in GEXF format, declaring a float attribute and adding this attribute to ALL the nodes. It would look something like:
```
...
...
```
Once imported in Gephi, just go to the appearance tab and it will appear as one more attribute in "ranking" drop-down list.
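For illustration, a GEXF file with such a float attribute could be generated with a few lines of Python. This is a minimal sketch under assumptions, not the answerer's original example: the attribute title "responses", the node names and the weights are all hypothetical, and the layout follows the GEXF 1.2 draft schema.

```python
import xml.etree.ElementTree as ET


def build_gexf(node_weights, edges):
    """Build a GEXF 1.2 document with one float attribute on every node."""
    gexf = ET.Element(
        "gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"}
    )
    graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "undirected"})

    # Declare the node attribute once; "responses" is a hypothetical title.
    attributes = ET.SubElement(graph, "attributes", {"class": "node"})
    ET.SubElement(
        attributes, "attribute", {"id": "0", "title": "responses", "type": "float"}
    )

    # Every node carries a value for the declared attribute.
    nodes = ET.SubElement(graph, "nodes")
    for node_id, weight in node_weights.items():
        node = ET.SubElement(nodes, "node", {"id": node_id, "label": node_id})
        attvalues = ET.SubElement(node, "attvalues")
        ET.SubElement(
            attvalues, "attvalue", {"for": "0", "value": str(float(weight))}
        )

    edges_element = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_element,
            "edge",
            {"id": str(index), "source": source, "target": target},
        )

    return ET.tostring(gexf, encoding="unicode")


# Hypothetical interview counts per node.
gexf_text = build_gexf({"design": 12, "build": 5}, [("design", "build")])
print(gexf_text)
```

Saving the printed text to a .gexf file and opening it in Gephi should make "responses" available as a ranking attribute for node size.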
If you run into any problems with the GEXF format, let me know and I'll share a whole example (just trying to keep this short :-)
Regards
