Neo4j.rb: autoincrement relationship attribute? - ruby-on-rails

Is it possible to autoincrement an ActiveRel attribute? In contrast to ActiveRecord, it doesn't seem that ActiveNode/ActiveRel support autoincrement attributes out-of-the-box.
I considered using before_save to manually generate an id. However, it appears that it's not possible to order relationships (to find the previous highest id).
How does one implement autoincrementing ids? (I know Neo4j.rb generates UUIDs but this application requires a separate incremental serial number)

Neo4j has its own auto-incrementing internal ID, which starts at 0 independently for nodes and relationships. It can (I think) be depended on for referring to nodes in the short term (i.e. seconds), but not in the long term, as internal IDs may get cleaned up and moved around by Neo4j for performance.
If you're thinking about putting IDs on relationships what you're doing may not be the right modeling approach for Neo4j (though I couldn't say for sure without details). Relationships themselves can't be queried directly, but rather can only be accessed via first finding nodes. I think it would make sense to have an incrementing ID which is unique for all relationships relative to a node, but not globally. This is also why Neo4j.rb doesn't generate UUIDs for relationships. You may want to consider representing the relationships as intermediate nodes.
If you want to implement an incrementing ID on an ActiveNode model, before_save should be a fine way to do it.

Related

Neo4j: which is faster, indexed node property or relationship property?

What is faster/better way to model, searching for a node with an indexed property, or having a single ROOT node with lots of ChildOf relationships, each with a relationship property equal to the index property and starting the search from ROOT and traversing the relationships that have the correct relationship property? Assume the key being sought is unique.
My understanding is that the current version of Neo4j (2.2.3) uses the schema indexes introduced in 2.x when you declare an index on the label.property combination you wish to use in a predicate. Relationship properties are not covered by this newer indexing scheme. You can only use the old legacy indexing for relationship properties, which is not as fast.
See the note on this page.
I think this is the wrong way to think about this question; you should model the data in the way that's more natural for the domain.
It's hard to answer which will be faster because you haven't specified things like how many valid values the index would have in it, the total number of nodes, and so on. In any case, if you're trying to express some kind of semantic relationship like ChildOf you're almost certainly better off with the node and relationships. You should consider storing the ID of one node as a property value of another node to be a major anti-pattern to be avoided.
If, on the other hand, the property is, say, the gender of a person (M/F) and you have 1,000,000 people, then you end up with two "index nodes", each with roughly 500,000 relationships. That's not going to be a good idea.
In general, Neo4j is set up to traverse relationships fast, so you'll usually be better off exploiting relationships. But there are a lot of exceptions to that, which depend on your domain's semantics and the cardinality of your attribute values, so YMMV.

How to determine the Max property on a Relationship in Neo4j 2.2.3

How do you quickly get the maximum (or minimum) value for a property of all instances of a relationship? You can assume the machine I'm running this on is well within the recommended specs for CPU and memory for the size of the graph, and the heap size is set accordingly.
Facts:
Using Neo4j v2.2.3
I can only modify the graph via the Cypher query language, which I'm hitting via PHP or the web interface; I would love to avoid any solution that requires Java coding.
I've got a relationship, call it likes that has a single property id that is an integer.
There are about 100 million of these relationships, and growing
Every day I grab new likes from a MySQL table to add to the graph in Neo4j
The relationship property id is actually the primary key (auto incrementing integer) from the raw MySQL table.
I only want to add new likes so before querying MySQL for the new entries I want to get the max id from the likes, so I can use it in my SQL query as SELECT * FROM likes_table WHERE id > max_neo4j_like_property_id
How can I get the max id property from Neo4j in an optimal way? Please indicate the create statement needed for any index as well as the query you'd use to get the final result.
I've tried creating an index as follows:
CREATE INDEX ON :likes(id);
After the index is online I've tried:
MATCH ()-[r:likes]-() RETURN r.id ORDER BY r.id DESC LIMIT 1
as well as:
MATCH ()-[r:likes]->() RETURN MAX(r.id)
They work but take freaking forever, as the explain plans for both indicate no indexes are being used.
UPDATE: Holy $?##$?!!!! It looks like the new schema indexes aren't functional for relationships even though you can create them and show them with :schema. It also looks as if there's no way to create legacy indexes directly with Cypher, which look like they might solve this issue.
If you need to query relationship properties, it is generally a sign of a model issue.
The need for this query suggests you would be better off extracting these properties into a node, which you'll then be able to query faster.
I'm not saying it is the case 100% of the time, but certainly 99% of the people I've seen so far with the same problem turned out to have this modeling concern.
What is your model right now?
Also, you don't use labels at all in your query; likes have a context bound to the nodes.

Neo4j unique IDs by tree with root node counter?

Is using a tree with a counter on the root node, to be referenced and incremented when creating new nodes, a viable way of managing unique IDs in Neo4j? In a previous question on performance on this forum (Neo4j merge performance VS create/set), the approach was described, and it occurred to me it may suggest a methodology for unique ID management without having to extend the Neo4j database (and support that extension). However, I noticed this approach has not been mentioned in other discussions on best practice for unique ID management (Best practice for unique IDs in Neo4J and other databases?).
Can anyone help validate or reject this approach?
Thanks!
You can just create a singleton node (I'll give it the label IdCounter in my example) to hold the "next-valid ID counter" value. There is no need for it to be part of any "tree" or for it to have any relationships at all.
When you create the singleton, initialize it with the first id value that you want to use. For example:
CREATE (:IdCounter {nextId: 1});
Here is a simple example of how to use it when creating a new node.
MATCH (c:IdCounter)
CREATE (x {id: c.nextId})
SET c.nextId = c.nextId + 1
RETURN x;
Since all Cypher queries are transactional, if the node creation did not happen for any reason, then the nextId increment would also not be done, so you should never end up with any gaps in assigned id numbers.
However, to avoid re-using the same id number, you would have to write your queries carefully to ensure that the increment always happens whenever you create a new node (using CREATE, CREATE UNIQUE, or MERGE).
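The commit-or-roll-back behaviour described above can be illustrated outside the database. The following is a minimal sketch in plain Ruby, not Neo4j code: the IdCounter class, its assign and peek methods, and the sample hashes are all hypothetical stand-ins for the singleton node and the nodes being created. It only shows that the counter advances when creation succeeds and stays put when creation fails.

```ruby
# Hypothetical in-memory stand-in for the IdCounter singleton node.
class IdCounter
  def initialize(first_id = 1)
    @next_id = first_id
    @lock = Mutex.new
  end

  # Yields the next id. The counter advances only if the block returns
  # without raising, mirroring a transaction that commits or rolls back.
  def assign
    @lock.synchronize do
      id = @next_id
      result = yield id
      @next_id = id + 1
      result
    end
  end

  # Returns the id that the next successful creation would receive.
  def peek
    @lock.synchronize { @next_id }
  end
end

counter = IdCounter.new(1)
nodes = []

counter.assign { |id| nodes << { id: id, name: "first" } }

begin
  counter.assign { |_id| raise "creation failed" }
rescue RuntimeError
  # The failed attempt did not consume an id.
end

counter.assign { |id| nodes << { id: id, name: "second" } }

ids = nodes.map { |n| n[:id] }
```

The mutex plays the role of the write lock a database would take on the counter, so two creations never observe the same value.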

neo4j - Relationship between three nodes

I'm totally new to Neo4j and I'm testing it in these days.
One issue I have with it is how to correctly implement a relationship which involves 3 different nodes using Spring Data. Suppose, for example, that I have 3 @NodeEntity classes: User, Tag and TaggableObject.
As you can guess, a User can add a Tag to a TaggableObject; I model this operation with a @RelationshipEntity TaggingOperation.
However, I can't find a simple way to glue the 3 entities together inside the relationship. I mean, the obvious choice is to set @StartNode User tagger and @EndNode TaggableObject taggedObject; but how can I also add the Tag to the relationship?
This is called a "hyperedge", I believe, and it's not something that Neo4j supports directly. You can create an additional node to support it, though. So you could have a TagEvent node with a schema like so:
(:User)-[:PERFORMED]->(:TagEvent)
(:Tag)<-[:USED]-(:TagEvent)
(:TagObject)<-[:TAGGED]-(:TagEvent)
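To make the shape of that schema concrete, here is a rough sketch in plain Ruby rather than Spring Data or Cypher. The Struct definitions and sample values are hypothetical and exist only for illustration. The point is that the TagEvent record carries references to all three participants, which a single two-ended relationship cannot do.

```ruby
# Hypothetical plain-Ruby model; the Struct names and data are illustrative only.
User = Struct.new(:name)
Tag = Struct.new(:label)
TagObject = Struct.new(:title)

# The intermediate node: one TagEvent links all three participants.
TagEvent = Struct.new(:user, :tag, :object)

alice = User.new("alice")
ruby_tag = Tag.new("ruby")
graph_tag = Tag.new("graphs")
post = TagObject.new("Modeling hyperedges")

events = [
  TagEvent.new(alice, ruby_tag, post),
  TagEvent.new(alice, graph_tag, post)
]

# "Which tags did this user put on this object?" becomes a walk
# through the event nodes that connect the two.
matching = events.select { |e| e.user == alice && e.object == post }
tags_by_alice = matching.map { |e| e.tag.label }
```

In the graph, the same question would be answered by starting at the User, following PERFORMED to each TagEvent, and then following USED and TAGGED outward.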
Another alternative is to store a foreign key as a property on a relationship or a node. Obviously that's not very graphy, but if you just need it for reference that might not be a bad solution. Just remember to not use the internal Neo4j ID as in future versions that may not be dependable. You should create your own ID for this purpose.

Working with cyclical graphs in RoR

I haven't attempted to work with graphs in Rails before, and am curious as to the best approach. Some background:
I am making a Rails 3 site and thought it would be interesting to store certain objects and their relationships as a graph, where each object is a node and some are connected to show that the two objects are related. The graph does contain cycles, and there wouldn't be more than 100-150 nodes in the graph (probably only closer to 50). One node probably wouldn't have more than five edges, with an average of three to four edges per node.
I figured a simple join table with two columns (each the ID of the object) might be the easiest way to do it, but I doubt it's the best way. Another thought was to use a plugin such as acts_as_tree (which doesn't appear to be updated for Rails 3...) or acts_as_tree_with_dotted_ids, but I am unsure of their ability to work with cycles rather than hierarchical trees.
The most I would currently like is to easily traverse from one node to its siblings. I really can't think of a reason I would want to traverse to a node's sibling's sibling, which is why I was considering just making an SQL join table. I only want to have a section on the site to display objects related to a specified object, and this graph is one of the ways I am specifying relationships.
Advice? Things I should check out? Thanks!
I would use two SQL tables, node and link where a link is simply two foreign keys, source and target. This way you can get the set of inbound or outbound links to a node by performing an SQL select query by constraining the source or target node id. You could take it a step further by adding a "graph_id" column to both tables so you can retrieve all the data for a graph in two queries and build it as a post-processing step.
This strategy should be at least as easy as finding, installing, learning to use, and implementing a plugin to do the same, IMHO.
Depending on whether your concern is primarily about operations on graphs, or on storage of graphs, what you need is potentially quite different. If you want convenient operations on graphs, investigate the gem "rgl" (ruby graph library). It has implementations of most of the basic classic traversal and search algorithms.
If you're dealing with something on the order of 150 nodes, you can probably get away with a minimalist adjacency list representation in the database itself, or incidence list. Then you can feed that into RGL for traversal and search operations.
If I remember correctly, RGL has enough abstraction that you may be able to work with an existing class structure and you simply provide methods to get adjacent nodes.
Assuming that it is a directed graph, use a mapping table such as
id | src | dest
where src and dest are FKs to your object table.
If your objects are not all of the same type, either have them all inherit a ruby class or have another table:
id | type | type_id
Where type is the type of object it is and type_id is its id in another table.
By doing this, you should be able to get an array of the objects that each object points to using:
select dest
from maptable
where src = self.id
If you need to know its inbound edges, you can perform the same type of query with src and dest swapped.
From there, you should be able to easily write any graph algorithms that you want. If you need weights, you can modify the mapping table like so:
id | src | dest | weight
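As a small sketch of what "writing graph algorithms on top of the mapping table" could look like, the Ruby below builds outbound and inbound adjacency lists from rows shaped like the table above and walks the graph breadth-first. The row values and the reachable_from helper are hypothetical; in a Rails app the rows would come from the mapping table rather than a literal array.

```ruby
# Hypothetical rows as they might come back from the mapping table
# (id | src | dest | weight); the values are made up for illustration.
rows = [
  { id: 1, src: 1, dest: 2, weight: 5 },
  { id: 2, src: 1, dest: 3, weight: 1 },
  { id: 3, src: 3, dest: 1, weight: 2 } # cycle back to node 1
]

# Adjacency lists keyed by node id.
outbound = Hash.new { |h, k| h[k] = [] }
inbound = Hash.new { |h, k| h[k] = [] }
rows.each do |row|
  outbound[row[:src]] << row[:dest]
  inbound[row[:dest]] << row[:src]
end

# Breadth-first traversal that tolerates cycles by tracking visited nodes.
def reachable_from(start, outbound)
  visited = [start]
  queue = [start]
  until queue.empty?
    node = queue.shift
    outbound.fetch(node, []).each do |neighbor|
      next if visited.include?(neighbor)
      visited << neighbor
      queue << neighbor
    end
  end
  visited
end

order = reachable_from(1, outbound)
```

With only 50 to 150 nodes, loading every row into memory like this is cheap, and the visited list is what keeps the cycle from looping forever.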
