Calling Gephi from Ruby on Rails - ruby-on-rails

I'm very interested in building a data visualisation component and can
see how it could be done but would prefer not to reinvent something
which already exists. If this truly is a 'first' then I'm prepared to put my initial code
on Github for others to share [and hopefully improve !!]
Essentially I'd like to be able to do the following:
1) Access a table or tables within a database and create nodes based
on entries within them. Add nodes on create, remove them on delete.
2) Use the foreign keys and/or join tables [for many-many links] to
create edges. Add edge(s) when node created, remove edges when node
deleted, check and add/remove edges when node updated.
3) Pass the nodes and edges to Gephi for display
I can see how to do steps 1 and 2 quickly and easily -- what I haven't
been able to find (after much searching) is how to do step 3.
Has anyone had any success in doing this? -- any example code that they're willing to share ?
Thanks

We tried something similar once, but it may not help you that much. We wrote a Rake task that got data out our DB, which we then fed into Gephi manually. That wasn't really satisfactory and in the end I went with Rake task -> CSV -> R script for visualization (basically connections of users on a world map). If you are not dead set on using Gephi I could show you some of the R code :-)

Related

How should I represent tasks and sub-tasks in neo4j?

I'm trying to decide on a good data model for representing tasks and sub-tasks. It's a two part problem:
First, I want to be able to get a string of tasks (task1)-[:NEXT]->(task2)-[:NEXT]->(task3) etc. And I want to be able to gather them starting with the first one and display them in order. The cypher is simple enough ... something like
p = match(first:Task)-[:NEXT*]->(others:Task)
return o.name, o.instructions
order by length(p) // or something like this, probably with a union to get both the first task and other tasks in the same output
However, I'd also like to let a sub-task have children. For instance, I might have a set of tasks that constitute "How to make coffee", but then when I'm creating a set of tasks that constitute "How to make breakfast", I'd like to point to the "How to make coffee" set of tasks and re-use them.
It would be nice to get cypher to return a staggered list (e.g. 1, 1.1, 1.1.1, 2, etc.), but I'd actually be equally happy with just 1, 2, 3 ... n.
I've been looking and haven't seen a clear solution anywhere. Here's a picture of what I'm imagining. Any directions, thoughts, or references much appreciated.
I'm going to reuse my answer to another question, but it really is the best solution to this problem. The problem you have is that your scheme starts to fall apart once you have multiple distinct but intersecting paths. So you end up trying to corrupt your data to try and resolve the conflicts generated.
1) For each chain, create a node to represent that chain.
2) Create a relation from that node to each node in the chain, and add an index property on the relationship.
3) Run Cyphers on your "chain" nodes instead.
To expand on the above for your case...
In this instance, the chain node represents a list of tasks that have to be done in order; And can itself also be a task. Separating the list from the task is more work, but it would allow you to define multiple ways to complete a certain task. For example, I can make coffee by starting my coffee machine, asking Google to make it (which will start the machine), or go to Starbucks. This should be flexible enough to support anything you need to represent, without instances trampling on each-other. Part of the key here is relationships can have properties too! Don't be afraid to use that to make them distinct. (You could just add a 'group-id' to each task chain to make them distinct, but that will have scaling issues)

Neo4j - Creating nodes when an intermediate one does not exist

I'm using a LOAD CSV for some tests, and i figured out one problem. How can I create an intermediate node if there isn't one already, using a CREATE?
For example:
I have (p:Person)-[:HAS_BANCOMAT]->(bm:Bancomat)-[:FROM]->(b:Bank). In my model, one Person can only have one bancomat, so, if in my CSV file I'm going to find some people who actually have more then one Bancomat, I will just persist the first occurrence.
I've ended up with this script:
LOAD CSV FROM 'file:///myfile.csv' as row
WITH row.name as name, row.bank as bank, row.id as bancomat_id
//OMITTING CREATING PERSONS p AND BANKS b PART
//JUST GOING ON WHAT IS NOT WORKING
WHERE size((p)-[:HAS_BANCOMAT]->(:Bancomat)-[:FROM]->(b)) = 0
CREATE (bm:Bancomat {bancomatId: row.bancomat_id})
CREATE (p)-[:HAS_BANCOMAT]->(bm)
CREATE (bm)-[:FROM]->(b)
The size part it's not working. I've also tried using NOT over the (p)-[:HAS_BANCOMAT]->(:Bancomat)-[:FROM]->(b) path, bot it doesn't work either.
MERGE is not the solution I'm looking for.
Any suggestions about what I'm doing wrong?
EDIT1: it's not working because this script will create anyway more then one linked Bancomat if some persons got more then one.

Neo4j data modeling for branching/merging graphs

We are working on a system where users can define their own nodes and connections, and can query them with arbitrary queries. A user can create a "branch" much like in SCM systems and later can merge back changes into the main graph.
Is it possible to create an efficient data model for that in Neo4j? What would be the best approach? Of course we don't want to duplicate all the graph data for every branch as we have several million nodes in the DB.
I have read Ian Robinson's excellent article on Time-Based Versioned Graphs and Tom Zeppenfeldt's alternative approach with Network versioning using relationnodes but unfortunately they are solving a different problem.
I Would love to know what you guys think, any thoughts appreciated.
I'm not sure what your experience level is. Any insight into that would be helpful.
It would be my guess that this system would rely heavily on tags on the nodes. maybe come up with 5-20 node types that are very broad, including the names and a few key properties. Then you could allow the users to select from those base categories and create their own spin-offs by adding tags.
Say you had your basic categories of (:Thing{Name:"",Place:""}) and (:Object{Category:"",Count:4})
Your users would have a drop-down or something with "Thing" and "Object". They'd select "Thing" for instance, and type a new label (Say "Cool"), values for "Name" and "Place", and add any custom properties (IsAwesome:True).
So now you've got a new node (:Thing:Cool{Name:"Rock",Place:"Here",IsAwesome:True}) Which allows you to query by broad categories or a users created categories. Hopefully this would keep each broad category to a proportional fraction of your overall node count.
Not sure if this is exactly what you're asking for. Good luck!
Hmm. While this isn't insane, think about the type of system you're replacing first. SQL. In SQL databases you wouldn't use branches because it's data storage. If you're trying to get data from multiple sources into one DB, I'd suggest exporting them all to CSV files and using a MERGE statement in cypher to bring them all into your DB at once.
This could manifest similar to branching by having each person run a script on their own copy of the DB when you merge that takes all the nodes and edges in their copy and puts them all into a CSV. IE
MATCH (n)-[:e]-(n2)
RETURN n,e,n2
Then comparing these CSV's as you pull them into your final DB to see what's already there from the other copies.
IMPORT CSV WITH HEADERS FROM "file:\\YourFile.CSV" AS file
MERGE (N:Node{Property1:file.Property1, Property2:file.Property2})
MERGE (N2:Node{Property1:file.Property1, Property2:file.Property2})
MERGE (N)-[E:Edge]-(N2)
This will work, as long as you're using node types that you already know about and each person isn't creating new data structures that you don't know about until the merge.

Creating nodes and relationships at the same time in neo4j

I am trying to build an database in Neo4j with a structure that contains seven different types of nodes, in total around 4-5000 nodes and between them around 40000 relationships. The cypher code i am currently using is that i first create the nodes with the code:
Create (node1:type {name:'example1', type:'example2'})
Around 4000 of that example with unique nodes.
Then I've got relationships stated as such:
Create
(node1)-[:r]-(node51),
(node2)-[:r]-(node5),
(node3)-[:r]-(node2);
Around 40000 of such unique relationships.
With smaller scale graphs this has not been any problem at all. But with this one, the Executing query never stops loading.
Any suggestions on how I can make this type of query work? Or what i should do instead?
edit. What I'm trying to build is a big graph over a product, with it's releases, release versions, features etc. in the same way as the Movie graph example is built.
The product has about 6 releases in total, each release has around 20 releaseversion. In total there is 371 features and of there 371 features there is also 438 featureversions. ever releaseversion (120 in total) then has around 2-300 featureversions each. These Featureversions are mapped to its Feature whom has dependencies towards a little bit of everything in the db. I have also involed HW dependencies such as the possible hw to run these Features on, releases on etc. so basicaly im using cypher code such as:
Create (Product1:Product {name:'ABC', type:'Product'})
Create (Release1:Release {name:'12A', type:'Release'})
Create (Release2:Release {name:'13A, type:'release'})
Create (ReleaseVersion1:ReleaseVersion {name:'12.0.1, type:'ReleaseVersion'})
Create (ReleaseVersion2:ReleaseVersion {name:'12.0.2, type:'ReleaseVersion'})
and below those i've structured them up using
Create (Product1)<-[:Is_Version_Of]-(Release1),
(Product1)<-[:Is_Version_Of]-(Release2),
(Release2)<-[:Is_Version_Of]-(ReleaseVersion21),
All the way down to features, and then I've also added dependencies between them such as:
(Feature1)-[:Requires]->(Feature239),
(Feature239)-[:Requires]->(Feature51);
Since i had to find all this information from many different excel-sheets etc, i made the code this way thinking i could just put it together in one mass cypher query and run it on the /browser on the localhost. it worked really good as long as i did not use more than 4-5000 queries at a time. Then it created the entire database in about 5-10 seconds at maximum, but now when I'm trying to run around 45000 queries at the same time it has been running for almost 24 hours, and are still loading and saying "executing query...". I wonder if there is anyway i can improve the time it takes, will the database eventually be created? or can i do some smarter indexes or other things to improve the performance? because by the way my cypher is written now i cannot divide it into pieces since everything in the database has some sort of connection to the product. Do i need to rewrite the code or is there any smooth way around?
You can create multiple nodes and relationships interlinked with a single create statement, like this:
create (a { name: "foo" })-[:HELLO]->(b {name : "bar"}),
(c {name: "Baz"})-[:GOODBYE]->(d {name:"Quux"});
So that's one approach, rather than creating each node individually with a single statement, then each relationship with a single statement.
You can also create multiple relationships from objects by matching first, then creating:
match (a {name: "foo"}), (d {name:"Quux"}) create (a)-[:BLAH]->(d);
Of course you could have multiple match clauses, and multiple create clauses there.
You might try to match a given type of node, and then create all necessary relationships from that type of node. You have enough relationships that this is going to take many queries. Make sure you've indexed the property you're using to match the nodes. As your DB gets big, that's going to be important to permit fast lookup of things you're trying to create new relationships off of.
You haven't specified which query you're running that isn't "stopping loading". Update your question with specifics, and let us know what you've tried, and maybe it's possible to help.
If you have one of the nodes already created then a simple approach would be:
MATCH (n: user {uid: "1"}) CREATE (n) -[r: posted]-> (p: post {pid: "42", title: "Good Night", msg: "Have a nice and peaceful sleep.", author: n.uid});
Here the user node already exists and you have created a new relation and a new post node.
Another interesting approach might be to generate your statements directly in Excel, see http://blog.bruggen.com/2013/05/reloading-my-beergraph-using-in-graph.html?view=sidebar for an example. You can run a lot of CREATE statements in one transaction, so this should not be overly complicated.
If you're able to use the Neo4j 2.1 prerelease milestones, then you should try using the new LOAD CSV and PERIODIC COMMIT features. They are designed for just this kind of use case.
LOAD CSV allows you to describe the structure of your data with one or more Cypher patterns, while providing the values in CSV to avoid duplication.
PERIODIC COMMIT can help make large imports more reliable and also improve performance by reducing the amount of memory that is needed.
It is possible to use a single cypher query to create a new node as well as relate it to an existing now.
As an example, assume you're starting with:
an existing "One" node which has an "id" property "1"
And your goal is to:
create a second node, let's call that "Two", and it should have a property id:"2"
relate the two nodes together
You could achieve that goal using a single Cypher query like this:
MATCH (one:One {id:'1'})
CREATE (one) -[:RELATED_TO]-> (two:Two {id:'2'})

Neo4j linked list - multiple nodes

I'm working on understanding how to use linked lists to improve performance and create activity feeds on Neo4j.. Still working on learning Cypher, so I have a question.. I've found some examples of linked lists, but I need lists with bigger examples to finally put all the pieces together in my head..
I've used this code from grepcode and have found it to be more helpful than the example in the Neo4j manual. Yet I'm still a bit confused.. Can someone modify it to have say seven nodes with seven items in the linked list, and then insert a node on the front of it?
Yea, I'm trying to put the latest status update on the top of the linked list. This example doesn't really do that, but it's close.. so looking for some mods.. No, I'm not really coding yet, still trying to master Cypher first - will continue to study it for the next two weeks... Have the Ruby on Rails side working .. just need to understand linked lists used with Cypher/Neo a bit better.
CREATE zero={name:0,value:0}, two={value:2,name:2}, zero-[:LINK]->two-[:LINK]->zero
==== zero ====
MATCH zero-[:LINK*0..]->before,
after-[:LINK*0..]->zero,
before-[old:LINK]->after
WHERE before.value? <= 1 AND
1 <= after.value?
CREATE newValue={name:1,value : 1},
before-[:LINK]->newValue,
newValue-[:LINK]->after
DELETE old
==== zero ====
MATCH p = zero-[:LINK*1..]->zero
RETURN length(p) as list_length
What I'm trying to do in my mind is understand the before after and zero data sets - I almost have it, but want to see how it's done on a set with more than two starting nodes so as to clear up any confusion
Thank you!
The node in front is special as it doesn't have a incoming link relationship. Usually you also keep the connection to the head node somewhere, so this is about replacing this link to the head node and moving the head node one step further away. Something like this:
start user=node:node_auto_index(user="me")
match user-[old:MESSAGES]->head
delete old
create new_heads = { title: "Title", date : 2348972389, text: "Text" },
user-[:MESSAGES]->new_head-[:LINK]->head

Resources