neo4j - why and how create and use indexes? - neo4j

I'm learning Neo4J and it is working well without create any index.
I can create and read nodes fine.
So, why/when should I create indexes? Maybe for performance? Is it a must?

You should create a lookup index when you are going to find nodes as starting points by this properties, e.g. a :Person(userId) or :Book(isbn) or :City(zip) or :Product(productNo).
Usually the stuff where you have a business (unique) identifier to find nodes.
In general for indexes there is some confusion because there are also legacy indexes (which are still used for fulltext and spatial) vs. the new exact schema indexes, see this post for more detail:
http://nigelsmall.com/neo4j/index-confusion

Related

NEO4J - Best practices to store 40 millions of text nodes

I've been using Neo4j for some weeks and I think it's awesome.
I'm building an NLP application, and basically, I'm using Neo4j for storing the dependency graph generated by a semantic parser, something like this:
https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
In the nodes, I store the single words contained in the sentences, and I connect them through relations with a number of different types.
For my application, I have the requirement to find all the nodes that contain a given word, so basically I have to search through all the nodes, finding those that contain the input word. Of course, I've already created an index on the word text field.
I'm working on a very big dataset:
On my laptop, the following query takes about 20 ms:
MATCH (t:token) WHERE t.text="avoid" RETURN t.text
Here are the details of the graph.db:
47.108.544 nodes
45.442.034 relationships
13.39 GiB db size
Index created on token.text field
PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
------------------------
NodeIndexSeek
251,679 db hits
---------------
Projection
251,678 db hits
--------------
ProduceResults
251,678 db hits
---------------
I wonder if I'm doing something wrong in indexing such amount of nodes. At the moment, I create a new node for each word I encounter in the text, even if the text is the same of other nodes.
Should I create a new node only when a new word is encountered, managing the sentence structures through relationships?
Could you please help me with a suggestion or best practice to adopt for this specific case?
Thank you very much
For this use case, each of your :Token nodes should be unique. When you create these you should be using MERGE instead of CREATE for the node itself, so if the node already exists it will use the existing one rather than creating a new one.
It may help to also add a unique constraint for this after you've cleaned up your data.

How do a general search across string properties in my nodes?

Working with Neo4j in a Rails app.
I have nodes with several string properties containing long strings of user generated content. For example in my nodes of type: "Book", I might have properties, "review", and "summary", which would contain long-form string values.
I was trying to design queries that returned nodes which match those properties to general language search terms provided by a user in a search box. As my query got increasingly complicated, it occurred to me that I was trying to resolve natural language search.
I looked into some of the popular search gems in Rails, but they all seem to depend on ActiveRecord. What search solutions exist for Neo4j.rb?
There are a few ways that you could go about this!
As FrobberOfBits said, Neo4j has what are called "legacy indexes" which use Lucene it the background to provide indexing of generic things. It does support the new schema indexes. Unfortunately those are based on exact matches (though I'm pretty sure that will change in Neo4j 2.3.x somewhat).
Neo4j does support pattern matching on strings via the =~ operator, but those queries aren't indexed. So the performance depends on the size of your database.
We often recommend a gem called searchkick which lets you define indexes for Elasticsearch in your models. Then you can just call a Model.search method to do your searches and it will first query elasticsearch to get the node IDs and then load those nodes via Neo4j.rb. You can use that via the neo4j-searchkick gem: https://github.com/neo4jrb/neo4j-searchkick
Lastly, if you're doing NLP and are trying to extract important words from your text, you could create a Tag/Word label and create relationships from your nodes to these NLP extracted nodes so that you can search based on those nodes in the future. You could even build recommendations from one text node to another based on the number/type of common tag nodes.
I don't know if anything specific exists for neo4j.rb and activerecord. What I can say is that generally this stuff is handled through the use of legacy indexes that are implemented by Lucene.
The premise is that you create a lucene-managed index on certain properties, and that then gives you access to use the Lucene query language via cypher to get data from those indices. Relative to neo4j.rb, it doesn't look any different than running cypher queries, like this:
START item=node:node_auto_index("(title:'foo bar' AND body:baz*) OR title:'bat'")
RETURN item
Note that lucene indexes and that query language can only be used in a START block, not a MATCH block. Refer to the Lucene Query Syntax to discover more about what you can do with that query syntax (fuzzy matching, wildcards, etc -- quite a bit more extensive than what regex would give you).

Are there Label properties - Is there a Neo4j schema for Labels?

How to ensure that all nodes of a label have some common properties ?
For example, I want to create a property "name" for all nodes of a label "Person", but I can make a mistake in writing of property name (namee ! for example)
There is no such mechanism built in Neo4j today (the current version of Neo4j at the time of writing is 2.1.6). What you are describing is some sort of schema (if you compare e.g. DDL for a RDBMS) and Neo4j is basically schema free. This type of structural integrity is quite often handled in the application layer for NoSQL databases.
The only schema operations that are available today for Neo4j are described here.
Currently they include:
Unique - e.g. CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE
Indexes - create an index on a label e.g. CREATE INDEX ON :Person(name)
A comment on this answer from Michael Hunger who is part of team behind Neo4j indicates that more constraints will be available for Neo4j in the future. Furthermore, Michael points to the following alternatives:
Take a look at Structr, a layer above Neo4j that among other things enforces a stricter schema (check the schema docs here)
SylvaDB, an easy-to-use layer above Neo4j that also has schema support. Seems very
In addition to this, FrobberOfBits pointed to the tool NeoProfiler that contains a number of profilers, most of which run very simple Cypher queries against your database and provide summary statistics. Some profilers will actually discover data in your graph and then spawn other profilers which will run later. For example, if a label called "Person" is discovered in the data, a label profiler will be added to the run queue to inspect the population of nodes with that label.

Understanding Neo4j, creating unique nodes

I'm trying to wrap my head around how Neo4j works and how I can apply it to my problem. I thought it should be really easy and a matter of minutes, but I'm stuck.
I have data in MongoDB, say User and Item. What I want is connecting User and Item in a graph with a LIKE relationship (maybe with a score). Later I want to do things like recommending items based on connections, basic stuff.
But how do I get the data into Neo4j? Every document in MongoDB has an unique _id, so I though I could just throw both _ids into Neo4j and have them connected. What I found so far is that it's not even possible to have unique nodes based on the _id field (Neo4j has numeric incremented ids), which is only possible with some "hack" (https://github.com/jexp/app-net-graph/blob/master/lib/appnet.rb#L11) or using MERGE (I'm stuck on < 2.0). Even their examples on the website add the same node again if executed multiple times. I think I have a fundamental misunderstanding of how to use Neo4j. Maybe I'm too spoiled by redis, where I can put strings in and and it just works. Redis' sets aren't feasible though for complex graphs, only for simple connections.
Maybe someone can help me with a simple cypher example of how to add two nodes foo and bar and have them connected with a LIKE connection. And the operation should be idempotent, no matter if none or all of the nodes/relationships already existed before execution.
I'm accessing Neo4j via REST, in particular using this node module https://github.com/thingdom/node-neo4j
You could define your external ID as extra property on your nodes. Then depending on if your are using SpringData or not, you can insert the data.
If you are using SpringData, you can configure your external ID as unique index and then normally save you nodes(consider though, that inserting a duplicated ID will overwrite the existing one).
If you are using the plain java API, you can create unique nodes as described here:
http://docs.neo4j.org/chunked/stable/tutorials-java-embedded-unique-nodes.html#tutorials-java-embedded-unique-get-or-create
EDIT:
As for a sample query, does this help you?
http://console.neo4j.org/?id=b0z486
With the java api you would do it like this
firstNode = graphDb.createNode();
firstNode.setProperty( "externalID", "1" );
firstNode.setProperty( "name", "foo" );
secondNode = graphDb.createNode();
secondNode.setProperty( "externalID", "2" );
secondNode.setProperty( "name", "bar" );
relationship = firstNode.createRelationshipTo( secondNode, RelTypes.Likes );
I suggest you read some tutorials here: http://docs.neo4j.org/chunked/stable/tutorials-java-embedded-hello-world.html
Given you are using Neo4J1.9, have you tried creating a unique index on your _ID column?
Try this article from the docs
If you were using Neo4j2, then this article is helpful

How can I port a relational database to Neo4j?

I am playing around with Neo4j but trying to get my head around the graph concepts. As a learning process I want to port a small Postgres relational database schema to Neo4j. Is there any way I can port it and issues "equivalent" relational queries to Neo4j?
Yes, you can port your existing schema to a graph database. Keep in mind that this is not necessarily the best model for your data, but it is a starting point.
How easy it is depends a lot on the quality of your existing schema.
The tables corresponding to entities in an entity-relationship-diagram define your types of nodes. In the upcoming neo4j 2.0, you can labels them with the name of the entity to make a lookup easier. In older versions you can use an index or a manual label property.
Assuming a best case, where all your relationships between data is modelled using foreign keys, any 1:1 relationship between nodes can be identified and ported next.
For tables modelling n:m relationships, identify the corresponding nodes and add a direct relationship between them.
So as an example assume tables Author[id, name, publisher foreign key], Publisher[id, name] and Book[id, title] and written_by[author foreign key, book foreign key].
Every row in Author, Publisher and Book becomes a node.
Every Author node gets a relationship to the publisher identified by the foreign key relationship.
For every row in written_by you add a relationship between the Author node and Book node referenced
For queries in neo4j I recommend cypher due to its expressiveness. A (2.0) query looking for books by some author would look like:
MATCH (author:Author)-[:written_by]-(book:Book)
WHERE author.name='Hugh Laurie'
RETURN book.title
You actually have several options at hand:
use the Talend connector for Neo4J
export your schema+data in CSV files consumable by the batch importer
or you can do it programmatically
I'm afraid not. The relational data model and the graph data model are two different ways of modelling a real-world domain. It requires a human brain (at least as of 2013) to understand the domain in order to model it.
I suggest that you take a piece of paper and capture, using circles and arrows, what your entities are (nodes) and how they relate to each other (relationships). Then, have a look at that piece of paper. Voila, your new Neo4j data model.
Then, take a query that you want to be answered and try to figure out how you would do that without a computer, just by tracing your nodes and relationships with a finger on that piece of paper. Once you've figured that out, translate what you've done to a Cypher query.
Have a look at neo4j.org, there are plenty of examples.
Check this out:
The musicbrainz -> neo4j
https://github.com/redapple/sql2graph/tree/master/examples/musicbrainz
Neo4j Sql-importer
https://github.com/peterneubauer/sql-import
Good Luck!
This tool does exactly that.
Import any relational db into neo4j
https://github.com/jexp/neo4j-rdbms-import

Resources