Composite index usage - neo4j

I have this Cypher query and I can't figure out how to take advantage of a composite index when running it. I tried an index on type, for, buying and renting properties on both nodes, but the index doesn't show up on query's PROFILE.
MATCH (prof:Profile)
MATCH (prop:Property)
WHERE prof.type = prop.type
and prof.for = prop.for
and prof.buying = prop.buying
and prof.renting = prop.renting
Is it even possible to achieve this?

This will do the trick (for the second node type) :
PROFILE MATCH (pp:Property)
WITH pp, pp.type as type, pp.for as for, pp.bying as buying, pp.renting as renting
MATCH (pf:Profile)
WHERE pf.type = type
AND pf.for = for
AND pf.buying = buying
AND pf.renting = renting
RETURN pp, pf;
The reason the index is not used is that it has no actual values to use it with, those are only available after the nodes have been scanned.
Hope this helps.
Regards,
Tom

Related

Neo4J Match & Set Multiple Relationships/Nodes in One Query

I'm currently running the following query to update the properties on two nodes and relationships.
I'd like to be able to update 1,000 nodes and the corresponding relationships in one query.
MATCH (p1:Person)-[r1:OWNS_CAR]->(c1:Car) WHERE id(r1) = 3018
MATCH (p2:Person)-[r2:OWNS_CAR]->(c2:Car) WHERE id(r2) = 3019
SET c1.serial_number = 'SERIAL027436', c1.signature = 'SIGNATURE728934',
r1.serial_number = 'SERIAL78765', r1.signature = 'SIGNATURE749532',
c2.serial_number = 'SERIAL027436', c2.signature = 'SIGNATURE728934',
r2.serial_number = 'SERIAL78765', r2.signature = 'SIGNATURE749532'
This query has issues when you run it in larger quantities. Is there a better way?
Thank you.
You could work with a LOAD CSV. Your input would contain the keys (not the ids, using the ids is not recommended) for Person and Car and whatever properties you need to set. For example
personId, carId, serial_number, signature
00001, 00045, SERIAL78765, SIGNATURE728934
00002, 00046, SERIAL78665, SIGNATURE724934
Your query would then be something like :
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row
MATCH (p:Person {personId: row.PersonId})-[r:OWNS_CAR]->(c:Car {carId: row.carId})
SET r.serial_number = row.serialnumber, c.signature = row.signature
Note that you should have unique constraints on Person and Car to make that work. You can do thousands (even millions) like that very quickly ...
Hope this helps,
Tom

Neo4j - Total relationships from Node

First let me point out that I am new to Neo4j, if there is a better way to do this please let me know.
Secondly, how can I also find out:
// QUERY
if(!empty($_SESSION['uid'])) {
$uid = $_SESSION['uid'];
$query = 'MATCH (cu:User)
WHERE cu.id = "'.$uid.'"
WITH cu
MATCH (p:Painting)<-[:PAINTED]-(u:User)-[:LIVES_IN]->(l:Location)
WHERE (p.slug) = "'.$slug.'"
RETURN p,u,l,
EXISTS((cu)-[:LIKES]->(p)) as liked,
EXISTS((cu)-[:FOLLOWS]->(u)) as followed,
SIZE((:User)-[:LIKES]->(p)) as total_likes,
SIZE((:User)-[:FOLLOWS]->(u)) as total_follows
LIMIT 1';
} else {
$query = 'MATCH (p:Painting)<-[:PAINTED]-(u:User)-[:LIVES_IN]->(l:Location) WHERE (p.slug) = "'.$slug.'"
RETURN p,u,l,
SIZE((:User)-[:LIKES]->(p)) as total_likes,
SIZE((:User)-[:FOLLOWS]->(u)) as total_follows
LIMIT 1';
}
So now I'm running two queries depending on the availability of a user. I have to imagine there must be a better / more efficient way to do this but at least it works for the moment.
It sounds like there's a concept of a current user, that you'll probably get by ID or name or similar, and you want to see if they like the painting and follow the user for this particular painting defined by its "slug" property (is that a unique value, or does this match to multiple paintings?)
I'll make a stab at this one, and offer some improvements on how to better get the number of likes and follows .
MATCH (currentUser:User)
WHERE currentUser.ID = 123
WITH currentUser
MATCH (p:Painting)<-[:PAINTED]-(painter:User)-[:LIVES_IN]->(l:Location)
WHERE (p.slug) = "blah-blah-blah"
RETURN p,painter,l,
SIZE( (:User)-[:LIKES]->(p) ) as painting_likes,
SIZE( (:User)-[:FOLLOWS]->(painter) ) as painter_follows,
EXISTS( (currentUser)-[:LIKES]->(p) ) as current_user_liked,
EXISTS( (currentUser)-[:FOLLOWS]->(painter) ) as current_user_followed
LIMIT 1
MATCH (p:Painting)<-[:PAINTED]-(u:User)-[:LIVES_IN]->(l:Location) WHERE (p.slug) = "blah-blah-blah"
WITH p, u, l
OPTIONAL MATCH (ul:User)-[likes:LIKES]->(p)
OPTIONAL MATCH (uf:User)-[follows:FOLLOWS]->(u)
RETURN p,u,l,ul,uf LIMIT 1
This returns the correct graph, I can't figure out why I'm having such a hard time verifying the like / follow connection.
Ugh, I've tried to many things today, I do not know what is up and down anymore. I think the OPTIONAL MATCHES i have now actually handle total values. I guess what I'm having trouble with is verifying if the existing user has liked or followed.

connecting the nodes with relationships in py2neo

I have the following python code to make a graph in neo4j. I am using py2neo version 2.0.3.
import json
from py2neo import neo4j, Node, Relationship, Graph
graph = neo4j.Graph("http://localhost:7474/db/data/")
with open("example.json") as f:
for line in f:
while True:
try:
file = json.loads(line)
break
except ValueError:
# Not yet a complete JSON value
line += next(f)
# Now creating the node and relationships
news, = graph.create(Node("Mainstream_News", id=unicode(file["_id"]), entry_url=unicode(file["entry_url"]),
title=unicode(file["title"]))) # Comma unpacks length-1 tuple.
authors, = graph.create(
Node("Authors", auth_name=unicode(file["auth_name"]), auth_url=unicode(file["auth_url"]),
auth_eml=unicode(file["auth_eml"])))
graph.create(Relationship(news, "hasAuthor", authors ))
I can create a graph with nodes Mainstream_News and Authors with a relation 'hasAuthor'. My problem is when I am doing this I am having one Mainstream_News node with one Authors but in reality one author nodes has more than one Mainstream_News. I would like to make auth_name property of a Author nodes as a index to connect with the Mainstream_news nodes. Any suggestions will be great.
You are creating a new Authors node each time through your loop, even if an Author node (with the same properties) already exists.
First, I think you should create uniqueness constraints on Authors(auth_name) and Mainstream_News(id), to enforce what seem to be your requirements. This only needs to be done once. A uniqueness constraint also creates an index for you automatically, which is a bonus.
graph.schema.create_uniqueness_constraint("Authors", "auth_name")
graph.schema.create_uniqueness_constraint("Mainstream_News", "id")
But you will probably have to empty out your DB first (at least of all Authors and Mainstream_News nodes and their relationships), since I presume it currently has a lot of duplicate nodes.
Then, you can use the merge_one and create_unique APIs to prevent duplicate nodes and relationships:
news = graph.merge_one("Mainstream_News", "id", unicode(file["_id"]))
news.properties["entry_url"] = unicode(file["entry_url"])
news.properties["title"] = unicode(file["title"])
authors = graph.merge_one("Authors", "auth_name", unicode(file["auth_name"]))
news.properties["auth_url"] = unicode(file["auth_url"])
news.properties["auth_eml"] = unicode(file["auth_eml"])
graph.create_unique(Relationship(news, "hasAuthor", authors))
This is what I normally do, as I find it easier to know what's happening. As far as I know there are a but when you create_unique with only a Node, and there are no need to create the nodes, when you also have to create an edge.
I don't have the database on this computer, so please bear with me, if there are some typo'es, I'll correct it in the morning, but I guess you'll rather have a fast answer.. :-)
news = graph.cypher.execute_one('MATCH (m:Mainstream_News) '
'WHERE m.id = {id} '
'RETURN p'.format(id=unicode(file["_id"])))
if not news:
news = Node("Mainstream_News")
news.properties['id] = unicode(file["_id"])
news.properties['entry_url'] = unicode(file["entry_url"])
news.properties['title'] = unicode(file["title"])
# You can make a for-loop here
authors = Node("Authors")
authors.properties['auth_name'] = unicode(file["auth_name"])
authors.properties['auth_url'] = unicode(file["auth_url"])
authors.properties['auth_eml'] = unicode(file["auth_eml"])
rel = Relationship(new, "hasAuthor", authors)
graph.create_unique(rel)
# For-loop should end here
I've included the tree first lines, to make it more generic. It returns a node-object or None.
EDIT:
#cybersam use of schema is cool, implement that to, I'll try to use it myselfe also.. :-)
You can read more about it here:
http://neo4j.com/docs/stable/query-constraints.html
http://py2neo.org/2.0/schema.html

How use Cypher with Merge to create a unique sub graph path

in Neo4j 2.0 M06 I understand that CREATE UNIQUE is depreciated and replaced with MERGE and MATCH instead, but I am finding it hard to see how this can be used to create a unique path.
as an example, I want to create a
MERGE root-[:HAS_CALENDER]->(cal:Calender{name:'Booking'})-[:HAS_YEAR]->(year:Year{value:2013})-[:HAS_MONTH]-(month:Month{value:'January'})-[:HAS_DAY]->(day:Day{value:1})
ON CREATE cal
SET cal.created = timestamp()
ON CREATE year
SET year.created = timestamp()
ON CREATE month
SET month.created = timestamp()
ON CREATE day
SET day.created = timestamp()
intention is that when I try to add a new days to my calender, it should only create the year, and month when it does not exist else just add to the existing path. Now when i run the query, i get an STATEMENT_EXECUTION_ERROR
MERGE only supports single node patterns
should I be executing multiple statements here to achieve this.
So the question is what's the best way in Neo4j to handle cases like this?
Edit
I did change my approach a bit and now even after making multiple calls, I think my merge is happening at a label level and not trying to restrict to the start node I provide as a result I am ending up with nodes that are shared across years and month which is not what I was expecting
I would really appreciate if some one can suggest me how to get a proper graph like below
my c# code is somewhat like this:
var qry = GraphClient.Cypher
.Merge("(cal:CalendarType{ Name: {calName}})")
.OnCreate("cal").Set("cal = {newCal}")
.With("cal")
.Start(new { root = GraphClient.RootNode})
.CreateUnique("(root)-[:HAS_CALENDAR]->(cal)")
.WithParams(new { calName = newCalender.Name, newCal = newCalender })
.Return(cal => cal.Node<CalenderType>());
var calNode = qry.Results.Single();
var newYear = new Year { Name = date.Year.ToString(), Value = date.Year }.RunEntityHousekeeping();
var qryYr = GraphClient.Cypher
.Merge("(year:Year{ Value: {yr}})")
.OnCreate("year").Set("year = {newYear}")
.With("year")
.Start(new { calNode })
.CreateUnique("(calNode)-[:HAS_YEAR]->(year)")
.WithParams(new { yr = newYear.Value, newYear = newYear })
.Return(year => year.Node<Year>());
var yearNode = qryYr.Results.Single();
var newMonth = new Month { Name = date.Month.ToString(), Value = date.Month }.RunEntityHousekeeping();
var qryMonth = GraphClient.Cypher
.Merge("(mon:Month{ Value: {mnVal}})")
.OnCreate("mon").Set("mon = {newMonth}")
.With("mon")
.Start(new { yearNode })
.CreateUnique("(yearNode)-[:HAS_MONTH]->(mon)")
.WithParams(new { mnVal = newMonth.Value, newMonth = newMonth })
.Return(mon => mon.Node<Month>());
var monthNode = qryMonth.Results.Single();
var newDay = new Day { Name = date.Day.ToString(), Value = date.Day, Date = date.Date }.RunEntityHousekeeping();
var qryDay = GraphClient.Cypher
.Merge("(day:Day{ Value: {mnVal}})")
.OnCreate("day").Set("day = {newDay}")
.With("day")
.Start(new { monthNode })
.CreateUnique("(monthNode)-[:HAS_DAY]->(day)")
.WithParams(new { mnVal = newDay.Value, newDay = newDay })
.Return(day => day.Node<Day>());
var dayNode = qryDay.Results.Single();
Regards
Kiran
Nowhere on the documentation page does it say that CREATE UNIQUE has been deprecated.
MERGE is just a new approach that's available to you. It enables some new scenarios (matching based on labels, and ON CREATE and ON MATCH triggers) but also does not cover more complex scenarios (more than a single node).
It sounds like you're already familiar with CREATE UNIQUE. For now, I think you should still be using that.
It seems to me the picture of what you want your graph to look like has the order imposed by relationships, but your code models the order with nodes. If you want that graph, you will need to use relationship types like [2010], [2011] instead of a pattern like [HAS_YEAR]->({value:2010}).
Another way to say the same thing: you are trying to constitute uniqueness for a node intrinsically, by a combination of label and property, e.g. (unique:Day {value:4}). Assuming you have the relevant constraints, this would be database wide uniqueness, so only one fourth-day-of-the-month for all the months to share. What you want is extrinsic local uniqueness, uniqueness established and extended transitively by a hierarchy of relationships. Uniqueness for a node is then not in its internal properties but in its external 'position' or 'order' in relation to its parent. The locally unique pattern (month)-[:locally_unique_rel]->(day) is made unique for a wider scope when the month is made unique, and the month is made unique, not by property and label, but extrinsically by its 'order' or 'position' under its year. Hence the transitivity. I think this is a strength of modeling with graphs, among other things it allows you to continue to partition your structure. If for instance you want to split some of your days into AM and PM or into hours, you can easily do so.
So, in your graph, [HAS_DAY] makes all days equally related to their month, and cannot therefore be used to differentiate between them. You have solved this locally under a month, since the property value differentiates, but since the fourth-day-of-the-month in
(november)-[:HAS_DAY]->(4th)` and `(december)-[:HAS_DAY]->(4th)
are not distinct by property value or label, they are the same node in your graph. Locally, under a month say, unique nodes can be achieved equally with
[11]->()-[4]->(unique1), [11]->()-[5]->(unique2)
and
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique1 {value:4}),
[HAS_MONTH]->({value:11})-[HAS_DAY]->(unique2 {value:5})
The difference is that with the former extrinsic local uniqueness, you have the benefit of transitivity. Since the months are unique in a year, as (november) in [11]->(november) is locally unique, therefore the days of November are also unique in that year - the (fourth) node is distinct between
[11]->(november)-[4]->(fourth)
and
[12]-(december)->[4]->(fourth)
What this amounts to is transferring more of your semantic model to your relationships, leaving the nodes for storing data. The node identifiers in the picture you posted are only pedagogical, replacing them with x,y,z or empty parentheses would perhaps better reveal the structure or scaffolding of the graph.
If you want to keep the relationship types intact, adding an ordering property to each relationship to create a pattern like (november)-[:HAS_DAY {order:4}]->(4th) will also work. This may be less performant for querying, but you may have other concerns that make it worth it.
This code allows you to create calendar graphs on demand upon creation of an event for a specific day. You'll want to modify it to allow events on multiple days, but it seems more like your issue is creating unique paths, right? And you'd probably want to modify this to use parameters in your language of choice.
First I create the root:
CREATE (r:Root {id:'root'})
Then use this reusable MERGE query to successively match or create subgraphs for the calendar. I pass along the root so I can display the graph at the end:
MATCH (r:Root)
MERGE r-[:HAS_CAL]->(cal:Calendar {id:'General'})
WITH r,cal MERGE (cal)-[:HAS_YEAR]->(y:Year {id:2011})
WITH r,y MERGE (y)-[:HAS_MONTH]->(m:Month {id:'Jan'})
WITH r,m MERGE (m)-[:HAS_DAY]->(d:Day {id:1})
CREATE d-[:SCHEDULED_EVENT]->(e:Event {id:'ev3', t:timestamp()})
RETURN r-[*1..5]-()
Creates a graph like this when called multiple times:
Does this help?

Can Neo4j be effectively used to show a collection of nodes in a sortable and filterable table?

I realise this may not be ideal usage, but apart from all the graphy goodness of Neo4j, I'd like to show a collection of nodes, say, People, in a tabular format that has indexed properties for sorting and filtering
I'm guessing the Type of a node can be stored as a Link, say Bob -> type -> Person, which would allow us to retrieve all People
Are the following possible to do efficiently (indexed?) and in a scalable manner?
Retrieve all People nodes and display all of their names, ages, cities of birth, etc (NOTE: some of this data will be properties, some Links to other nodes (which could be denormalised as properties for table display's and simplicity's sake)
Show me all People sorted by Age
Show me all People with Age < 30
Also a quick how to do the above (or a link to some place in the docs describing how) would be lovely
Thanks very much!
Oh and if the above isn't a good idea, please suggest a storage solution which allows both graph-like retrieval and relational-like retrieval
if you want to operate on these person nodes, you can put them into an index (default is Lucene) and then retrieve and sort the nodes using Lucene (see for instance How do I sort Lucene results by field value using a HitCollector? on how to do a custom sort in java). This will get you for instance People sorted by Age etc. The code in Neo4j could look like
Transaction tx = neo4j.beginTx();
idxManager = neo4j.index()
personIndex = idxManager.forNodes('persons')
personIndex.add(meNode,'name',meNode.getProperty('name'))
personIndex.add(youNode,'name',youNode.getProperty('name'))
tx.success()
tx.finish()
'*** Prepare a custom Lucene query context with Neo4j API ***'
query = new QueryContext( 'name:*' ).sort( new Sort(new SortField( 'name',SortField.STRING, true ) ) )
results = personIndex.query( query )
For combining index lookups and graph traversals, Cypher is a good choice, e.g.
START people = node:people_index(name="E*") MATCH people-[r]->() return people.name, r.age order by r.age asc
in order to return data on both the node and the relationships.
Sure, that's easily possible with the Neo4j query language Cypher.
For example:
start cat=node:Types(name='Person')
match cat<-[:IS_A]-person-[born:BORN]->city
where person.age > 30
return person.name, person.age, born.date, city.name
order by person.age asc
limit 10
You can experiment with it in our cypher console.

Resources