I'm trying to create a model which allows users to navigate between positions (nodes) using various techniques (edges). Basically to traverse the positions graph using their own specific edges, which are unique and available just for them.
I want every user to be able to create their own edges (techniques) between nodes (positions). I've considered giving all technique edges the same name/type, something like "LEADS_TO", but with different properties (name, description and, most importantly, a reference to the user who is allowed to use the edge, basically the creator of that edge).
This means that during graph traversal, I'll have to filter for only the edges whose createdBy property matches the userId.
Also, this model implies that if 1000 users use the app, there will likely be 1000 unique edges (techniques) between 2 nodes (positions).
Would this be the correct approach, or is my graph thinking/understanding conceptually wrong? Thanks!
There are 3 ways to do what you want:
1. An edge with a property user_id that is a string. As you said, you will have multiple edges between your nodes pos1 & pos2 (one for each user).
2. An edge with a property user_id that is an array of strings. You will have one edge between your nodes pos1 & pos2, but the size of the array will match the number of users.
3. Prefix each edge's type with the user_id: USER_2_LEADS_TO.
The choice depends on the type of your queries and also on the data volume, i.e. the average number of relationships you will have between your nodes pos1 & pos2.
As a first approach, your choice is good.
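To make the trade-off more concrete, here is a minimal in-memory sketch in Python (not Neo4j code). It assumes a hypothetical TechniqueGraph class that keys outgoing edges by (user_id, position), which is roughly the idea behind option 3: the lookup for one user never has to scan the parallel edges that belong to other users. All names and sample data are made up for illustration.

```python
from collections import defaultdict


class TechniqueGraph:
    """Hypothetical stand-in for per-user LEADS_TO edges between positions."""

    def __init__(self):
        # (user_id, from_position) -> list of (technique_name, to_position)
        self._out = defaultdict(list)

    def add_technique(self, user_id, src, dst, name):
        """Store one technique edge that only user_id may traverse."""
        self._out[(user_id, src)].append((name, dst))

    def techniques_from(self, user_id, position):
        """Return the edges user_id can take from position, in insertion order."""
        return list(self._out.get((user_id, position), []))


graph = TechniqueGraph()
graph.add_technique("alice", "guard", "mount", "sweep")
graph.add_technique("bob", "guard", "back", "arm drag")
graph.add_technique("alice", "guard", "back", "berimbolo")
```

With this sample data, graph.techniques_from("alice", "guard") only ever sees Alice's two edges, however many other users have created edges between the same positions.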
Cheers
I have a lot of triplets with different attributes for each user node (it can contain user_id, location, role for some nodes and others can be without location but with their marital status, use_car etc.) and data node.
Data node can contain location, size, origin and sometimes it will contain only the location.
I have a relation between these nodes - that has some attributes like folder_name, approved/rejected etc.
Given a new triplet, how can I find the most similar triplets by their attributes (user node + relation + data node)
Is there any functionality to do this? I will be happy to get a direction to check or minimal example.
Maybe you could use Node Similarity algorithm that uses the Jaccard similarity under the hood. It is available in the Neo4j Graph Data Science library. This algorithm will allow you to compare node similarity based on their attribute nodes as you call them. If you also want to compare node properties, you would have to create your own comparing sets for each node and then use the basic implementation of Jaccard similarity score: https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/jaccard/
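Something along these lines should get you started:
MATCH (u:User)-->(something)<--(other:User)
WHERE u <> other
// keep both users in scope so each pair gets its own count
WITH u, other, count(something) AS sharedSomethings
RETURN u, other, sharedSomethings
ORDER BY sharedSomethings DESC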
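If you also need to compare properties, the "own comparing sets" idea could look roughly like the Python sketch below. It is only an illustration under assumptions: each triplet is given as three plain dictionaries (user properties, relation properties, data properties), property values are hashable, and the helper names triplet_features and most_similar are made up here and are not part of any Neo4j library.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def triplet_features(user, relation, data):
    """Flatten a triplet into a set of (part, key, value) features."""
    features = set()
    for part, props in (("user", user), ("rel", relation), ("data", data)):
        for key, value in props.items():
            features.add((part, key, value))
    return features


def most_similar(new_triplet, known_triplets, top_n=3):
    """Return up to top_n (score, index) pairs, best score first."""
    target = triplet_features(*new_triplet)
    scored = [
        (jaccard(target, triplet_features(*triplet)), index)
        for index, triplet in enumerate(known_triplets)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


known = [
    ({"role": "admin", "location": "NY"}, {"folder_name": "a"}, {"location": "NY"}),
    ({"role": "guest"}, {"folder_name": "b"}, {"size": 3}),
]
new = ({"role": "admin", "location": "NY"}, {"folder_name": "a"}, {"location": "LA"})
ranking = most_similar(new, known)
```

Missing properties simply do not contribute a feature, so triplets with different attribute sets can still be compared. Exact-match features treat numeric values as categories; ranges or weights would need a different measure.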
I'm working with Users assigned to a Grid location:
(User)-[:PICK_UP]->(Grid)
With the query
MATCH (u:User)-[:PICK_UP]->(g:Grid)-[:TO]-(g2:Grid)<-[:PICK_UP]-(u2:User)
RETURN g,g2,u,u2
I have the result
In the image I have two groups of nodes that represent the grids and their neighbors, with users shown as red nodes. I would like to group the nearby users by creating relations from them to a Spot node.
E.g. the first group has grids 34, 40, 41 with the users 1, 4, 5, 9. I would like to group the users in my query so I get the result [user1, u4, u5, u9], and then assign those users to a Spot.
Any suggestions?
Thank you !!
The thing to keep in mind is that your (u:User)-[:PICK_UP]->(g:Grid)-[:TO]-(g2:Grid)<-[:PICK_UP]-(u2:User) is matching a specific path, and while you see two groups in the graphical display, there are actually overlapping paths there. Viewing your result in table mode might be helpful.
So onto answering your question! Firstly, this was a tricky one, but a really cool one. I think I've got a good solution:
MATCH path=(grid:Grid)-[:TO]-(other_grid:Grid)
WITH CASE WHEN ID(grid) < ID(other_grid) THEN ID(other_grid) ELSE ID(grid) END AS id_to_reject
WITH collect(DISTINCT id_to_reject) AS ids_to_reject
MATCH (grid:Grid)
WHERE NOT(ID(grid) IN ids_to_reject)
CREATE (spot:Spot)
WITH grid, spot
MATCH (grid)-[:TO|PICK_UP*1..6]-(user:User)
MERGE (user)-[:AT_SPOT]->(spot)
The first thing that the query does is to compare all Grid nodes which are related to each other. For each of these pairs it passes on the ID() of the Grid node which is greater. The IDs which aren't in the list are therefore the smallest in the group and can act as a representative of the group. For each one of these representative Grid nodes we create a Spot node.
Using that node, it finds all User nodes within six hops via both TO and PICK_UP relationships. That should give all users in the group (both the users of our representative grid as well as the users of the other grids).
Then it's a simple matter to MERGE a relationship from each user to the Spot.
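For comparison, the same grouping can be sketched outside the database as a connected-components problem. The Python below is an illustration that swaps in a different technique (union-find) rather than translating the Cypher above: it takes the TO links and PICK_UP assignments as plain lists and picks the smallest grid ID in each connected group as the representative. One point in its favour, as far as I can tell: the pairwise ID comparison can leave more than one representative when a group contains grids that are not directly linked to the smallest one (for example a chain 34-40-41 where the middle grid had the largest ID), whereas components do not depend on how the IDs are laid out. The function name and input shapes are assumptions.

```python
def group_users_by_spot(grid_links, pickups):
    """Group users by the connected component of their grid.

    grid_links: iterable of (grid_a, grid_b) pairs, treated as undirected TO links.
    pickups: iterable of (user, grid) PICK_UP pairs.
    Returns {representative_grid: sorted list of users}.
    """
    parent = {}

    def find(grid):
        # Follow parent pointers to the root, shortening the path as we go.
        parent.setdefault(grid, grid)
        while parent[grid] != grid:
            parent[grid] = parent[parent[grid]]
            grid = parent[grid]
        return grid

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Always keep the smaller ID as root so it represents the group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for a, b in grid_links:
        union(a, b)

    groups = {}
    for user, grid in pickups:
        groups.setdefault(find(grid), set()).add(user)
    return {rep: sorted(users) for rep, users in groups.items()}


spots = group_users_by_spot(
    grid_links=[(34, 40), (40, 41), (50, 51)],
    pickups=[(1, 34), (4, 40), (5, 41), (9, 41), (2, 50), (3, 51)],
)
```

Each key of spots would correspond to one Spot node, and each user in the list to one AT_SPOT relationship.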
The answer to this question shows how to get a list of all nodes connected to a particular node via a path of known relationship types.
As a follow up to that question, I'm trying to determine if traversing the graph like this is the most efficient way to get all nodes connected to a particular node via any path.
My scenario: I have a tree of groups (group can have any number of children). This I model with IS_PARENT_OF relationships. Groups can also relate to any other groups via a special relationship called role playing. This I model with PLAYS_ROLE_IN relationships.
The most common question I want to ask is MATCH (n {name: "xxx"})-[*]->(o) RETURN o.name, but this seems to be extremely slow even on a small number of nodes (4000 nodes, and it takes 5s to return an answer). Note that the graph may contain cycles (n-IS_PARENT_OF->o, n<-PLAYS_ROLE_IN-o).
Is connectedness via any path not something that can be indexed?
As a first point, because you are not using a label and an indexed property for your starting node, the query first has to visit ALL the nodes in the graph and open each node's PropertyContainer to see whether it has a property name with the value "xxx".
Secondly, if you know an approximate maximum depth of parentship, you may want to limit the depth of the search.
I would suggest you add a label of your choice to your nodes and index the name property.
Use label, e.g. :Group for your starting point and an index for :Group(name)
Then Neo4j can quickly find your starting point without scanning the whole graph.
You can easily see where the time is spent by prefixing your query with PROFILE.
Do you really want all arbitrarily long paths from the starting point? Or just all pairs of connected nodes?
If the latter then this query would be more efficient.
MATCH (n:Group)-[:IS_PARENT_OF|PLAYS_ROLE_IN]->(m:Group)
RETURN n,m
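To illustrate the difference between enumerating every path and just asking which nodes are connected, here is a small Python sketch of a reachability search. It is not how Neo4j evaluates the pattern; it only shows that with a visited set each node is expanded once, so cycles cost nothing extra, and that an optional depth limit caps the work. The graph dictionary and the function name are made up for the example.

```python
from collections import deque


def reachable(graph, start, max_depth=None):
    """Return all nodes reachable from start by following outgoing edges.

    graph: dict mapping a node to an iterable of its outgoing neighbours.
    max_depth: optional limit on the number of hops from start.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:  # each node is expanded at most once
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


# "c" points back to "root", forming a cycle.
groups = {"root": ["a", "b"], "a": ["c"], "b": [], "c": ["root"]}
everything = reachable(groups, "root")
one_hop = reachable(groups, "root", max_depth=1)
```

A variable-length pattern such as [*] asks for paths rather than nodes, and the number of distinct paths can grow much faster than the number of nodes, which is one plausible reason the original query feels slow.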
I am trying to realize a datamodel in Neo4j. The model has points of interest in a city and streets. The streets connect the points.
Initially I thought that points and streets should both be represented in the graph database as nodes.
Between these two different types of nodes there is a relationship ("point is connected with").
Now I am considering the possibility that, instead of representing the street as a node, it is perhaps more correct to represent the street as a relationship ("connects two points").
And this is actually my question: what is the more correct way to represent the network (line part) in a model, with nodes or with relationships?
The only major difference between relationships and nodes is that relationships must exist between two nodes. This means that you wouldn't be able to store a specific street if you didn't store two points of interest that it connects. So, if you see this being an issue, you may want to store streets as nodes. If you are certain that you will only want to store streets if there are points of interest in your database that exist on the street, then it'd make more sense to represent the streets as relationships.
In general, you should try to avoid storing properties in nodes that you only intend to use to find relationships between them. In this case, you mention possibly storing a "point is connected with" property in each point of interest node. This would work, but is essentially just saying that a relationship exists between two points without actually using a relationship. Again, in the case where you want to be able to store streets that don't have points of interest on them, this may be necessary, and you could store such streets by leaving the "point is connected with" property as NULL, but I would advise against this.
Another thing to think about is what you would store in the relationship. If you go with the model where streets are nodes, it becomes very difficult to represent quantities like distances between points of interest without adding relationships into your graph specifically for those properties, which may as well be properties of a street relationship.
UPDATE: Thought I'd add an example query to show how making the streets relationships can simplify your logic and make using your database much simpler and more intuitive.
Imagine you wish to find the path with the fewest points of interest between points A and B.
This is what the query would look like with the relationships model:
MATCH (a:Point {name: "foo"}), (b:Point {name: "bar"}),
p = shortestPath((a)-[:Street*]-(b))
RETURN p
By using relationships where appropriate, you enable the capabilities of Neo4j, allowing you to get a lot of work done with relatively simple queries. It's hard to think of a way to write this query in the model where you represent streets as nodes, but it would in all likelihood be much more complex and less efficient.
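For readers who want to see what "fewest points of interest between A and B" means when streets are plain edges, here is a rough Python sketch using breadth-first search over a list of (point, point) pairs. It only illustrates the idea and makes no claim about how Neo4j implements shortestPath; the function name and sample streets are invented.

```python
from collections import deque


def fewest_points_path(streets, start, goal):
    """Return a path with the fewest hops from start to goal, or None.

    streets: iterable of (point_a, point_b) pairs, treated as undirected.
    """
    neighbours = {}
    for a, b in streets:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        point = queue.popleft()
        if point == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while point is not None:
                path.append(point)
                point = previous[point]
            return path[::-1]
        for neighbour in neighbours.get(point, []):
            if neighbour not in previous:
                previous[neighbour] = point
                queue.append(neighbour)
    return None


streets = [("foo", "a"), ("a", "b"), ("b", "bar"), ("foo", "c"), ("c", "bar")]
path = fewest_points_path(streets, "foo", "bar")
```

If streets were nodes instead, every hop between two points would pass through an extra street node, so the same search would need to skip or discount those intermediate nodes.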
I am new to Neo4j and I need some advice from the more experienced Neo4j developers.
In which situation does it make sense for an inventory system to represent individual items as a path through their properties instead of a node with the same properties?
In order to make myself clear:
Let's say we have an eyeglass lens. This item has properties like its SPHERE power, its CYLINDER power and an AXIS, among others.
There is a finite set of SPHERE powers but also of CYLINDER power and AXIS. The combination of those makes an item (lens).
Does it make sense to represent a lens like this:
MATCH (lens:Lens)-[:`-2.00`]-(sph:Sphere {power:'-2.00'})-[:`-0.50`]-(cyl:Cylinder {power:'-0.50'})-[:`90`]-(ax:Axis {degree:'90'})
RETURN lens.brand_name, lens.price
Please note that the above item(lens) can be available from different manufacturers and with different brand names and list prices so "lens" will represent all individual brands that can match with the above query and will have as properties the brand name and price, at least.
Let's say you have a piece of data ("SPHERE"). When should it be a property of the lens node, and when should it be its own node, via relation?
Do you need to relate multiple lenses to the same sphere? This argues it should be its own node, so that multiple lenses can link to the same sphere.
Do you need to assert extra properties about the sphere value? (Like who measured it, or when?) This argues you should make it a separate node.
Do you need to store properties about the relationship? If the relationship is any more complicated than simple "HAS A" you might want a relationship between two nodes, so you can store properties on the relationship.
Any of those cases would argue you should store that piece of data as a separate node, and then relate it by relationship.
ON THE OTHER HAND, if it's a simple primitive data type (float), with a simple "HAS-A" relationship to the parent (i.e. a lens HAS-A sphere measurement) and you have no need for extra metadata, then it should be a node property.
I'm not an optometrist, but I think this latter situation is your case; I'm just trying to give you a more general answer. "Sphere" should probably be a node property, but the cases above are how to think about the issue more generally for future data items.
In your special domain, with finite ranges and discrete values for each of the parameters, it absolutely makes sense to model the properties of a lens as value nodes. The resulting index graph seems not to be too large, and quite balanced (no supernodes).
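As a loose illustration of the value-node idea, the Python sketch below keeps one set of lens IDs per (parameter, value) pair, much like a value node with relationships to every lens that has that value, and answers a lookup by intersecting three sets. It is a simplified stand-in, not Neo4j code, and it links every value directly to the lens instead of chaining sphere, cylinder and axis in a path as in the question. The LensIndex class, its methods and the sample lenses are assumptions.

```python
from collections import defaultdict


class LensIndex:
    """Hypothetical value index: one 'value node' per (parameter, value)."""

    def __init__(self):
        self._by_value = defaultdict(set)  # (parameter, value) -> lens ids
        self._lenses = {}                  # lens id -> (brand_name, price)

    def add(self, lens_id, brand_name, price, sphere, cylinder, axis):
        """Register a lens and connect it to its three value entries."""
        self._lenses[lens_id] = (brand_name, price)
        for key in (("sphere", sphere), ("cylinder", cylinder), ("axis", axis)):
            self._by_value[key].add(lens_id)

    def find(self, sphere, cylinder, axis):
        """Return (brand_name, price) for every lens matching all three values."""
        ids = (
            self._by_value.get(("sphere", sphere), set())
            & self._by_value.get(("cylinder", cylinder), set())
            & self._by_value.get(("axis", axis), set())
        )
        return sorted(self._lenses[lens_id] for lens_id in ids)


index = LensIndex()
index.add(1, "BrandA", 40.0, "-2.00", "-0.50", "90")
index.add(2, "BrandB", 55.0, "-2.00", "-0.50", "90")
index.add(3, "BrandA", 42.0, "-2.00", "-0.75", "90")
matches = index.find("-2.00", "-0.50", "90")
```

Because the sets of possible values are finite, the number of value entries stays bounded no matter how many lenses are added, which is the property the answer above relies on.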