Frequency of connections between nodes represented by the edge thickness - neo4j

I have a directed graph in which some pairs of nodes are connected by multiple edges.
However, I would like each such pair to be visualised with only one edge, with a property specifying the number of edges between them and, if possible, a relative edge thickness.
What query do I have to use to achieve this?

If you are referring to the neo4j browser web UI, there is no command to customize the visualization in that way.
In particular, the Cypher query language only performs DB operations and has no way to directly affect visualizations.
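For the data half of the question, you can at least compute the edge count per node pair with an aggregating query; a minimal sketch, assuming an illustrative relationship type :CONNECTED:

```
// Collapse parallel relationships into one row per ordered node pair,
// with a weight equal to the number of relationships between the pair
MATCH (a)-[r:CONNECTED]->(b)
RETURN a, b, count(r) AS weight
ORDER BY weight DESC
```

The browser will still draw each relationship separately; mapping the returned weight to edge thickness would have to happen in an external visualization tool.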

Related

How to store graphs in a graph database?

I'm currently working on a project where we want to store sets of graphs in a graph database. For this purpose I'm using Neo4j, and I can imagine two different solutions.
1. Put the graphs' nodes into the database as nodes and the edges as edges. Every edge and every node has a property graph_id, which indicates which graph the node or edge belongs to (trivial).
2. Create three different node labels: Node, Edge and Graph. Every node of a graph is stored as a Node, every edge as an Edge node, and every edge and node belonging to a graph is connected to a Graph node. (The Neo4j documentation suggests creating labels for the different kinds of elements in the database.)
Which solution would you prefer?
I think you are misreading the docs. They mean that different kinds of nodes should have different labels and different kinds of relationships should have different types. If you turn everything, including edges, into "nodes", then you will not be able to apply the library's implementations of graph theory algorithms.
Option 1 is the way to go.
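A minimal sketch of option 1 in Cypher, assuming an illustrative :Node label, :EDGE relationship type and graph_id property:

```
// Store two nodes and one edge belonging to graph 1
CREATE (a:Node {graph_id: 1, name: 'a'}),
       (b:Node {graph_id: 1, name: 'b'}),
       (a)-[:EDGE {graph_id: 1}]->(b);

// Later, retrieve everything belonging to graph 1
MATCH (n:Node {graph_id: 1})-[e:EDGE {graph_id: 1}]->(m:Node {graph_id: 1})
RETURN n, e, m;
```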

How to realize a multi-client capability in Neo4j?

Initial situation
I have several independent and disconnected graphs, each of which has a hierarchy-like structure with a local root element. Each of these graphs consists of approximately 8 million nodes and 40 million relationships. I have successfully created a three-digit number of Cypher queries, which should now be applied to a single graph only and not to the entirety of all graphs. The graph a query has to apply to is specified by its root node.
Challenge to be solved
How can I realize a kind of pseudo multi-client capability for a graph, if all graphs have to remain in a common Neo4j database for reasons of reporting and pattern matching?
Approach to the problem / preliminary result
Implement a shortest-path check to the given root element for selection purposes at the beginning of every single query? Cons:
huge performance losses expected
high development costs
Expand each graph with a separate, additional label? Cons:
complex queries, high development effort
For these cases, adding a specific label per tenant/client to all nodes in the subgraph tends to be the approach taken. It requires you to ensure that, when you match the relevant nodes in a query, you also check that the nodes you're working with carry the client's label.
As a note for the future, native multi-tenancy support is one of the key features we're working on for the next year.
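A rough sketch of the tenant-label approach in Cypher, assuming each client's subgraph hangs off its own root node (the :Root and :ClientA names are illustrative; on graphs of this size the labelling pass would likely need to run in batches):

```
// One-off: tag every node reachable from the client's root with a tenant label
MATCH (root:Root {id: 'clientA'})-[*0..]->(n)
SET n:ClientA;

// From then on, every query restricts its matches to that tenant's label
MATCH (p:ClientA:Person)-[:KNOWS]->(q:ClientA:Person)
RETURN p, q;
```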

Combine nodes by categorical attribute in Gephi

I'm working on a visualization of organizational structure in Gephi. I have a graph of individuals, connected by whether or not they have worked together in the past. Graphing individuals looks good, but I would like to combine nodes (individuals) based on a categorical attribute (department; string). The new graph -- or at least a visualization -- would have a node for every department, preferably with a numerical weight proportional to how many individuals comprise it.
I could do this in the scripts that generate the graph files before importing. But I did exactly this about a year ago entirely in Gephi. Either the functionality was removed (like the pie charts!) or I've just forgotten (more likely).
I'm using Gephi 0.9.1. Any help much appreciated.

Is Neo4j Suitable for Large Scale-free Network?

I know Neo4j works well on large graphs, under the assumption that node degrees are fairly evenly distributed. However, in most cases graphs in the real world follow a scale-free degree distribution. My question is: if all relationships have the same type and all nodes the same label, are there any ways to remain fast when querying through nodes with a really high degree, say 100k neighbouring nodes?

Neo4J Performance Benchmarking

I have created a basic implementation of a high-level client over Neo4j (https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-neo4j) and want to compare its performance with the native Neo4j driver (and maybe Spring Data too). This way I would be able to determine the overhead my library adds over the native driver.
I plan to create an extension of YCSB for Neo4j.
My question is: what should be considered the basic unit of object to be written into Neo4j (should it be a single node, or a couple of nodes joined by an edge)?
What's the current practice in the Neo4j world? How do people who benchmark Neo4j performance do it?
There's already been some work for benchmarking Neo4J with Gatling: http://maxdemarzi.com/2013/02/14/neo4j-and-gatling-sitting-in-a-tree-performance-t-e-s-t-ing/
You could maybe adapt it.
See graphdb-benchmarks
The graphdb-benchmarks project is a benchmark of popular graph databases. Currently the framework supports Titan, OrientDB, Neo4j and Sparksee. The purpose of this benchmark is to examine the performance of each graph database in terms of execution time. The benchmark is composed of four workloads: Clustering, Massive Insertion, Single Insertion and Query Workload. Every workload has been designed to simulate common operations in graph database systems.
Clustering Workload (CW): CW consists of a well-known community detection algorithm for modularity optimization, the Louvain Method. We adapt the algorithm on top of the benchmarked graph databases and employ cache techniques to take advantage of both graph database capabilities and in-memory execution speed. We measure the time the algorithm needs to converge.
Massive Insertion Workload (MIW): We create the graph database, configure it for massive loading, and then populate it with a particular dataset. We measure the time for the creation of the whole graph.
Single Insertion Workload (SIW): We create the graph database and load it with a particular dataset. Every object insertion (node or edge) is committed directly and the graph is constructed incrementally. We measure the insertion time per block, which consists of one thousand edges and the nodes that appear during the insertion of these edges.
Query Workload (QW): Execute three common queries (sketched in Cypher after this list):
FindNeighbours (FN): finds the neighbours of all nodes.
FindAdjacentNodes (FA): finds the adjacent nodes of all edges.
FindShortestPath (FS): finds the shortest path between the first node and 100 randomly picked nodes.
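For orientation only, the three queries correspond roughly to Cypher along these lines (property names are illustrative; the benchmark itself drives each database through its native API):

```
// FindNeighbours (FN): neighbours of every node
MATCH (n)--(m)
RETURN n, collect(m) AS neighbours;

// FindAdjacentNodes (FA): the endpoint nodes of every relationship
MATCH (a)-[r]->(b)
RETURN r, a, b;

// FindShortestPath (FS): shortest path between a fixed start node and another node
MATCH (start {nodeId: 0}), (target {nodeId: 42})
MATCH p = shortestPath((start)-[*]-(target))
RETURN p;
```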
One way to performance-test is to use e.g. http://gatling-tool.org/. There is work underway to create benchmark frameworks at http://ldbc.eu. Otherwise, benchmarking is highly dependent on your domain, dataset, and the queries you are trying to run. Maybe you could start at https://github.com/neo4j/performance-benchmark and improve on it?
