Approaches to tagging in Neo4j - neo4j

I'm pretty new to Neo4j; I've only gotten as far as writing a hello world. Before I proceed, I want to make sure I have the right idea about how Neo4j works and what it can do for me.
As an example, say you wanted to write a Neo4j back end for a site like this. Questions would be nodes. Naïvely, tags would be represented by an array property on the question node. If you wanted to find questions with a certain tag, you'd have to scan every question in the database.
I think a better approach would be to represent tags as nodes. If you wanted to find all questions with a certain tag, you'd start at the tag node and follow the relationships to the questions. If you wanted to find questions with all of a set of tags, you'd start at one of the tag nodes (preferably the least common/most specific one, if you know which one that is), follow its relationships to questions, and then select the questions with relationships to the other tags. I don't know how to express that in Cypher yet, but is that the right idea?
In my real application, I'm going to have entities with a potentially long list of tags, and I'm going to want to find entities that have all of the requested tags. Is this something where Neo4j would have significant advantages over SQL?

Kevin, correct.
You'd do it like that.
I even created a model some time ago for stackoverflow that does this.
In Cypher, you can imagine queries like these:
Find the most active users
MATCH (u:User)
OPTIONAL MATCH (u)-[:AUTHORED|ASKED|COMMENTED]->(item)
RETURN u, count(item) AS activity
ORDER BY activity DESC
LIMIT 5
Find co-used Tags
MATCH (t:Tag)
OPTIONAL MATCH (t)<-[:TAGGED]-(question)-[:TAGGED]->(t2)
RETURN t.name,t2.name,count(distinct question) as questions
ORDER BY questions DESC
Show tags together with the questions they are attached to
MATCH (t:Tag)<-[r:TAGGED]-(question)
RETURN t,r,question
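The "all of a set of tags" lookup from the question can be sketched outside the database to show the idea. This is only an illustration in Python, not how Neo4j executes it: the `TAGGED` dictionary is a hypothetical stand-in for the tag nodes and their TAGGED relationships, and the Cypher in the trailing comment is an untested sketch whose label, relationship type, and `$tags` parameter are assumptions.

```python
from typing import Dict, Iterable, Set

# Hypothetical in-memory stand-in for the graph: each tag "node" maps to the
# ids of the questions it is connected to by a TAGGED relationship.
TAGGED: Dict[str, Set[int]] = {
    "neo4j": {1, 2, 3, 4},
    "cypher": {2, 3, 5},
    "tagging": {3},
}


def questions_with_all_tags(tags: Iterable[str]) -> Set[int]:
    """Return ids of questions connected to every tag in `tags`."""
    wanted = list(tags)
    if not wanted:
        return set()
    # Start from the rarest tag so the candidate set is as small as possible.
    ordered = sorted(wanted, key=lambda t: len(TAGGED.get(t, set())))
    candidates = set(TAGGED.get(ordered[0], set()))
    # Keep only candidates that are also connected to each remaining tag.
    for tag in ordered[1:]:
        candidates &= TAGGED.get(tag, set())
        if not candidates:
            break
    return candidates


# A Cypher query in the same spirit (untested here; names are assumptions,
# and $tags is assumed to contain no duplicates) could look like:
#   MATCH (q:Question)-[:TAGGED]->(t:Tag)
#   WHERE t.name IN $tags
#   WITH q, count(DISTINCT t) AS matched
#   WHERE matched = size($tags)
#   RETURN q
```

Starting from the least common tag keeps the first candidate set small, which is the same reasoning the question gives for picking the most specific tag node as the entry point.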

Related

Can graph database query "nodes that a given node has no relationship with"?

I am working on a dating app where users can "like" or "dislike" other users and get matched.
As you can imagine the most important query of the app would be:
Give me a stack of nearby user profiles that I have NOT liked/disliked before.
I tried to work on this with a document database (Firestore) and figured it's simply not suitable for this kind of application, and hence landed in the graph database world, which is new and fascinating to me.
I understand that a graph database, by nature, retrieves data by tracing through relationships and treats relationships as first-class citizens. My question is: what if the nodes I am trying to get are those with no relationship from the given node? What would the query look like? Can anyone provide an example query?
Edit:
- added nearby criteria to the query statement
This is definitely possible. Here is an example query:
MATCH (me:Profile {name: "Chris"})
MATCH (other:Profile) WHERE other <> me AND NOT (me)-[:LIKES]->(other)
RETURN other
As stated in the comments on your original question, this might not scale well on a large dataset. That said, it is uncommon to match on only one criterion. For example, the list of candidate profiles can be narrowed by:
geolocation
profiles at depth 2 (who likes me, who else do those people like, and do those people like me?)
shared interests
age group
skin color
...
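The "not yet liked or disliked" lookup combined with one narrowing criterion can be sketched as a set difference. This is a minimal Python illustration, not a Neo4j implementation: the `LIKES`, `DISLIKES`, and `NEARBY` dictionaries and the profile names are hypothetical, and `NEARBY` stands in for whatever geolocation filter the application would really use.

```python
from typing import Dict, Set

# Hypothetical data: outgoing LIKES / DISLIKES relationships per profile,
# and a pre-filtered set of nearby profiles (e.g. from a geo lookup).
LIKES: Dict[str, Set[str]] = {"Chris": {"Ana"}, "Ana": {"Chris"}}
DISLIKES: Dict[str, Set[str]] = {"Chris": {"Bo"}}
NEARBY: Dict[str, Set[str]] = {"Chris": {"Ana", "Bo", "Dee", "Eli"}}


def unseen_nearby(me: str) -> Set[str]:
    """Nearby profiles that `me` has neither liked nor disliked."""
    already_rated = LIKES.get(me, set()) | DISLIKES.get(me, set())
    # Narrow to nearby candidates first, then subtract the rated ones.
    return NEARBY.get(me, set()) - already_rated - {me}
```

The point of the ordering is that the expensive "no relationship" check only runs against the small nearby candidate set rather than against every profile in the database.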

Fuzzy neo4j relationships

I want to do something in neo4j that I hope will work ok: I want to make "fuzzy" path matches; the links will sometimes count as a relationship, and sometimes not, depending on the query.
Here's an example: let's say I have a (p:Person)-[:HAS]->(n:Name). A search has found a Person (say, by phone number). I want to go from this Person to other Persons with similar names, to get their phone numbers. Also, I want the similarity to be adjustable, so the user might ask to match very similar names, or not very similar names.
I could get the first person's name, and then do a search against other names with some lucene patterns - this is easy enough, but it means doing a full lucene search on the Name values, which in my use case is not ideal as I think it might be a bit slow (there are very many names - let's say a billion, remembering this is just an example). I hope there is a better way.
One approach I can imagine is having a "similarity" relationship between Names. Whenever a new Name node is added, we check for similar names and link them (creating these relationships would be slow, but we could push it onto a batch process, and it's ok if it takes some minutes). We would only link names that were fairly similar (so the number of links would hopefully not get too large). I suppose we could then craft a query on this, matching similarities greater than my threshold. Something like this:
MATCH (p1:Person {phone:"555-234234"})-->(n1:Name)-[s:SIMILAR]->(n2:Name)-->(p2:Person)
WHERE s.matchLevel >=2
RETURN p2.phone;
Is this approach better or worse than just doing the lucene search? Has anyone else wanted to do something like this?
Also, based on the suggestion at http://graphaware.com/neo4j/2013/10/24/neo4j-qualifying-relationships.html, I believe I'll be better off having many relationships (SIMILAR_1, SIMILAR_2 ..) instead of using a "match level" attribute on my relationship.
BTW, I know there are many similar questions to this (eg. Neo4j 2 Cypher fuzzy search), but afaik this exact question isn't on stackoverflow (and I have looked).
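The precomputed-similarity idea described above can be sketched as follows. This is a rough Python illustration under stated assumptions, not a Neo4j implementation: the `similar` dictionary stands in for SIMILAR relationships, the score comes from Python's standard `difflib.SequenceMatcher` (the question does not say which similarity measure would be used), and the `MIN_LINK_SCORE` cut-off is an arbitrary example value.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

MIN_LINK_SCORE = 0.6  # assumed cut-off below which no SIMILAR link is stored

# name -> list of (other name, score); stands in for SIMILAR relationships
similar: Dict[str, List[Tuple[str, float]]] = {}


def add_name(name: str) -> None:
    """Batch step: link a new name to existing names that are similar enough."""
    if name in similar:
        return
    links = []
    for other in similar:
        score = SequenceMatcher(None, name.lower(), other.lower()).ratio()
        if score >= MIN_LINK_SCORE:
            links.append((other, score))
    similar[name] = links
    # Store the link in both directions so either name can start the lookup.
    for other, score in links:
        similar[other].append((name, score))


def similar_names(name: str, threshold: float) -> List[str]:
    """Query step: follow stored links whose score meets the caller's threshold."""
    return sorted(o for o, s in similar.get(name, []) if s >= threshold)


for n in ["Jon Smith", "John Smith", "Jane Doe"]:
    add_name(n)
```

The expensive comparison happens once, when a name is added; a query only follows the links already stored and filters them by the requested threshold, which mirrors the `s.matchLevel >= 2` condition in the Cypher above.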

How to create multiple nodes and relationships in Neo4J with one Cypher / REST query?

I want to create multiple nodes (if they do not exist yet) and relationships between them (parallel ones, if relationships already exist) with one query.
What would be the best way to do that in Neo4J 2.0?
I tried different ways, but what I've found so far is either to add them pair by pair, as described here, to merge on multiple relationships (but that also seems to work only pair by pair), or to use transactions (as described here). The combination of the 2nd and the 3rd option would work fine, but I would just like to limit it to two queries:
1) Create all the nodes (if they don't exist yet) and get their IDs.
2) Create relationships between them (using IDs obtained in 1).
3) Submit the two queries as statements into transaction, commit.
The only thing is that I'm new to Cypher and don't know how to make a query like that.
Can anybody help, please?
Thank you!
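One possible shape for this is to send every statement in a single transactional request. The Python sketch below only builds the JSON body and does not contact a server. It makes several assumptions that are not in the question: nodes carry a hypothetical `Item` label and are identified by a `name` property instead of by internal IDs, the relationship type `LINKS_TO` is made up, and the body is meant for the Neo4j 2.x transactional HTTP endpoint (`/db/data/transaction/commit`), which uses `{param}` placeholders (newer versions use `$param`).

```python
import json
from typing import Dict, Iterable, List, Tuple


def build_statements(names: Iterable[str],
                     links: Iterable[Tuple[str, str]]) -> Dict[str, List[dict]]:
    """Build one request body: MERGE the nodes, then CREATE the relationships."""
    statements = []
    for name in names:
        # MERGE creates the node only if no :Item node with this name exists yet.
        statements.append({
            "statement": "MERGE (n:Item {name: {name}})",
            "parameters": {"name": name},
        })
    for source, target in links:
        # CREATE always adds a new relationship, so repeated pairs become parallel ones.
        statements.append({
            "statement": ("MATCH (a:Item {name: {source}}), (b:Item {name: {target}}) "
                          "CREATE (a)-[:LINKS_TO]->(b)"),
            "parameters": {"source": source, "target": target},
        })
    return {"statements": statements}


# Example body for two nodes and two parallel relationships between them.
payload = json.dumps(build_statements(["a", "b"], [("a", "b"), ("a", "b")]))
```

Because all statements travel in one request, they commit or fail together, which covers step 3 without a separate round trip per pair.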

Neo4j Cypher Query Builder

I have been looking for a query builder for Neo4j's query language Cypher, ideally one with a fluent API. I did not find much, and decided to invest some time in building one myself.
The result so far is a fluent API query builder for the Cypher 1.9 spec.
I wanted to use StackOverflow to kick off a discussion and see what the thoughts are, before I release the code.
Here is a demo query that you would want to send off to Neo4j using Cypher.
Show me all people who John knows who know software engineers at Google (Google company code assumed to be 12345).
The relationship strength between John and the people who connect him to Google employees should be at least 3 (assuming a range from 1-5).
Return all of John's connections and the people they know at Google, including the relationships between those people.
Sort the results by name of John's connections in ascending order and then by relationship strength in descending order.
Using Fluent-Cypher:
Cypher
.on(Node.named("john").with(Index.named("PERSON_NAMES").match(Key.named("name").is("John"))))
.on(Node.named("google").with(Id.is(12345)))
.match(Connection.named("rel1").andType("KNOWS").between("john").and("middle"))
.match(Connection.named("rel2").andType("KNOWS").between("middle").and("googleEmployee"))
.match(Connection.withType("WORKS_AT").from("googleEmployee").to("google"))
.where(Are.allOfTheseTrue(Column.named("rel1.STRENGTH").isGreaterThanOrEqualTo(3)
.and(Column.named("googleEmployee.TITLE").isEqualTo("Software Engineer"))))
.returns(Columns.named("rel1", "middle", "rel2", "googleEmployee"))
.orderBy(Asc.column("middle.NAME"), Desc.column("rel1.STRENGTH"))
which yields the following query:
START john=node:PERSON_NAMES(name='John'),google=node(12345) MATCH john-[rel1:KNOWS]-middle,middle-[rel2:KNOWS]-googleEmployee,googleEmployee-[:WORKS_AT]->google WHERE ((rel1.STRENGTH >= '3' AND googleEmployee.TITLE = 'Software Engineer')) RETURN rel1,middle,rel2,googleEmployee ORDER BY middle.NAME ASC,rel1.STRENGTH DESC
I agree that you should build this with an eye towards Cypher 2.0. As of 2.0, it's very important that WHERE clauses are matched up with the correct START, (OPTIONAL) MATCH, and WITH clauses, which makes the design of a fluent API a bit more challenging.
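To illustrate the point about WHERE placement, here is a minimal sketch of a builder in which each `where()` call attaches to the clause directly before it. It is written in Python purely for illustration and is not the Fluent-Cypher API from the question; the class name, method names, and the node and property names in the usage example are all hypothetical, and START is left out for brevity.

```python
from typing import List


class CypherBuilder:
    """Tiny fluent builder; each where() attaches to the clause right before it."""

    def __init__(self) -> None:
        self._clauses: List[str] = []
        self._can_filter = False  # True only right after MATCH / OPTIONAL MATCH / WITH

    def _add(self, clause: str, filterable: bool) -> "CypherBuilder":
        self._clauses.append(clause)
        self._can_filter = filterable
        return self

    def match(self, pattern: str) -> "CypherBuilder":
        return self._add(f"MATCH {pattern}", True)

    def optional_match(self, pattern: str) -> "CypherBuilder":
        return self._add(f"OPTIONAL MATCH {pattern}", True)

    def with_(self, *items: str) -> "CypherBuilder":
        return self._add("WITH " + ", ".join(items), True)

    def where(self, condition: str) -> "CypherBuilder":
        if not self._can_filter:
            raise ValueError("where() must directly follow match(), "
                             "optional_match() or with_()")
        return self._add(f"WHERE {condition}", False)

    def returns(self, *items: str) -> "CypherBuilder":
        return self._add("RETURN " + ", ".join(items), False)

    def build(self) -> str:
        return "\n".join(self._clauses)


query = (CypherBuilder()
         .match("(john:Person {name: 'John'})-[rel1:KNOWS]-(middle)")
         .where("rel1.STRENGTH >= 3")
         .optional_match("(middle)-[rel2:KNOWS]-(employee)")
         .where("employee.TITLE = 'Software Engineer'")
         .returns("middle", "employee")
         .build())
```

Keeping the clauses in call order, rather than collecting all conditions into one global WHERE, is what lets the second condition filter only the optional part of the pattern.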
I like your first example where you just use the text to describe the query. The second option, to tell you the truth, doesn't look so much easier to me than constructing the Cypher query itself. The language is quite easy to use and is well documented. Adding another layer of abstraction would only increase complexity. However, if you find a way of translating this natural language request into a Cypher request, that'd be cool :)
Also, why not start working directly with Cypher 2.0?
Finally, check out http://github.com/noduslabs/infranodus (I'm working on a similar problem, but for adding nodes to the database, not querying them). I chose to use #hashtags to make it easier for people to understand how their queries should be structured (as we already use them). So in your case it could become something like
#show-all #people who #John :knows who :know #software-engineers :at #Google.
#relationship-strength between #John and the #people who are #linked to #Google #software-engineers should be at least #3
#return #all of #John's #connections and the #people they :know at #Google, including the #relationships-between those #people.
#sort the #results #by-name of #John's #connections in #ascending order and then by #relationship-strength in #descending order.
(let's say the #hashtags refer to nodes, and the :words such as :at refer to actions on them)
If you could pull something like this off, I think that'd be a much better and more useful simplification of the already easy-to-use Cypher.

Create Unique Relationship is taking much amount of time

START names = node(*),
target=node:node_auto_index(target_name="TARGET_1")
MATCH names
WHERE NOT names-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have nearly 180,000 name nodes. I ran the query above more than 100 times, changing the target each time, to create the unique relationships, and it takes far too long. How can I resolve this?
I build the query in Java and run it in a loop. I am using Neo4j 2.0.0.5 and Java 1.7.
I edited your cypher query because I think I understand it, but I can barely read the rest of your question. If you edit it with white spaces and punctuation it might be easier to understand what you are trying to do. Until then, here are some thoughts about your query being slow.
You bind all the nodes in the graph, that's typically pretty slow.
You bind all the nodes in the graph twice. First you bind universally in your start clause: names=node(*), and then you bind universally in your match clause: MATCH names, and only then you limit your pattern. I don't quite know what the Cypher engine makes of this (possibly it gets a migraine and goes off to make a pot of coffee). It's unnecessary, you can at least drop the names=node(*) from your start clause. Or drop the match clause, I suppose that could work too, since you don't really do anything there, and you will still need a start clause for as long as you use legacy indexing.
You are using Neo4j 2.x, but you use legacy indexing instead of labels, at least in this query. Without knowing your data and model it's hard to know what the difference would be for performance, but it would certainly make it much easier to write (and read) your queries. So, that's a different kind of slow. It's likely that if you had labels and label indices, the query performance would improve.
So, first try removing one of the universal bindings of nodes, then use the 2.x schema tools to structure your data. You should be able to write queries like
MATCH (target:Target)
WHERE target.target_name="TARGET_1"
WITH target
MATCH (names:Name)
WHERE NOT (names)-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have no idea if such a query would be fast on your data, however. If you put the "Name" label on all your nodes, then MATCH names:Name will still bind all nodes in the database, so it'll probably still be slow.
P.S. The relationships you create have a TYPE called contains, and you give them a property called type with value declared. Maybe you have a good reason, but that's potentially very confusing.
Edit:
Reading through your question and my answer again I no longer think that I understand even your cypher query. (Why are you returning both the bound nodes and properties of those nodes?) Please consider posting sample data on console.neo4j.org and explain in more detail what your model looks like and what you are trying to do. Let me know if my answer meets your question at all or I'll consider removing it.
