It is recommended not to use Neo4j's internal id because it may change, but rather to create our own identifier. So, to identify my users, I plan to create a user_id property on the nodes labelled User and put an index on it. However, I cannot figure out a way to make it auto-increment.
After some searching, I noticed there are two kinds of indexes in Neo4j, the schema index and the legacy index. Could anyone explain the difference between them? And is there a way to make my user_id property auto-increment?
Schema indices are built around labels, e.g. :User. A label by itself already gives quick access to its nodes, and you can also create indices on properties of nodes with that label if you wish. There's no need to specify which index you're using in a query, as this is done automatically.
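As a minimal sketch of the above, assuming Neo4j 2.x or 3.x syntax (later versions changed the CREATE INDEX form) and an illustrative user_id value of 42:

```cypher
// Create a schema index on the user_id property of :User nodes.
CREATE INDEX ON :User(user_id);

// No index is named here; the planner picks it up from the label and property.
MATCH (u:User {user_id: 42})
RETURN u;
```

The two statements are meant to be run separately, since the index has to come online before lookups can use it.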
Legacy indices are the node indices that were around prior to Neo4j 2.0. They're a traditional index where you specify what you're indexing and which properties it applies to, but they're only used in START clauses, which are optional (and on their way to deprecation).
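For contrast, a legacy index lookup has to name the index explicitly in a START clause. In this sketch the index name users and the value are hypothetical, and the nodes are assumed to have been added to that index beforehand:

```cypher
// Look up a node through a legacy index called 'users' (hypothetical name).
// Legacy index lookups take the key and a quoted value.
START u = node:users(user_id = "42")
RETURN u;
```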
For more detail, have a look here (http://docs.neo4j.org/chunked/stable/graphdb-neo4j-schema.html) and here (http://docs.neo4j.org/chunked/stable/indexing.html).
As for auto-incrementing, I'm unaware of any such functionality for user-defined index keys.
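One common workaround, sketched here under assumptions rather than as a built-in feature, is to keep a counter on a dedicated node and bump it whenever a user is created. The Counter label, its name and value properties, and the user's name property are all hypothetical; the syntax assumes Neo4j 2.x or 3.x.

```cypher
// Hypothetical counter node; the uniqueness constraint keeps MERGE
// from creating more than one counter with the same name.
CREATE CONSTRAINT ON (c:Counter) ASSERT c.name IS UNIQUE;

// Run separately: increment the counter and use the new value as the id.
MERGE (c:Counter {name: 'user_id'})
ON CREATE SET c.value = 1
ON MATCH SET c.value = c.value + 1
WITH c.value AS uid
CREATE (u:User {user_id: uid, name: 'Alice'})
RETURN u.user_id;
```

Because every user creation updates the same counter node, concurrent creations should end up serialized on that node, which may become a bottleneck under heavy write load.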
HTH
The only indices that I know about are indices on properties (these indices are created on particular labels (node types)). I have some doubts, however.
Are there indices on edges/relationships?
I often read that Neo4j leveraged Lucene indexes. Are they still used? What is their purpose?
Are there any indices other than indices on properties?
Thanks in advance,
Neo4j has two indexing systems.
The more modern one is referred to as "schema indexes", and these are the ones that are automatic and apply to properties of a given label for quick lookup by those properties when the given properties and label are provided within a query. This does not currently support indexing of relationship properties. These started out based on Lucene, but we've gradually replaced the implementation with our own native indexing solution. Discussion of these, as well as any noteworthy information and limitations, can be found in our index configuration documentation.
The other indexing system is an older manual system that is called "explicit indexes", though this has previously been called "manual indexes". This is also based on Lucene, but these are not automatic -- it is up to the user to manually add or remove entries to the index and keep them up to date when data in the database changes. This makes usage and maintenance cumbersome, and we recommend avoiding this system if possible.
Built-in procedures are the means to create and lookup using explicit indexes, as these are never used automatically under the hood (as opposed to schema indexes). APOC Procedures also offers various means of interfacing with explicit indexes.
The main reason to use explicit indexes is that they let you index relationship properties and get fast lookup when querying the index. They also allow full-text lookup across multiple labels and properties, provided the index has been configured that way.
Separate from all of these, it should be noted that usage of labels is itself a kind of index, as it provides quick access to all nodes with the given label.
call apoc.index.nodes('Product', 'name:iPhone*') yield node return node
In my graph I have 'iPhone X' and 'iPhone Plus', but this query doesn't return anything. I also have an index on 'name' property of Product.
Indexes
ON :Product(name) ONLINE
apoc.index.nodes is one of the APOC procedures for "manual indexes", which are also confusingly referred to in various docs as "legacy indexes" and "explicit indexes". Such indexes use the Apache Lucene library and are NOT the same as the standard neo4j indexes that most people use, and the way you create/update/use such indexes is also not standard.
For example, you cannot create a "manual index" via a Cypher CREATE INDEX clause. And neo4j Browser's :schema command will not show any manual indexes.
If you will only be searching :Product(name) via manual indexes, then you should drop your standard index for :Product(name), since it will not be needed but will add overhead (time and space) to your DB.
One way to create/update/use manual indexes is through the special APOC procedures. The APOC documentation for manual indexes (linked above) provides a good amount of information about how to add nodes and relationships to such indexes, and how to search using them.
As an example, before you can use the query in your question, you first have to add all the :Product(name) values to the Product manual index. If you want to add them all at once, you can use the following query (and since it has to return something, it just returns a count of the number of Products):
MATCH (p:Product)
CALL apoc.index.addNode(p, ['name'])
RETURN count(*)
[UPDATED]
Manual indexing is typically only used for partial and fuzzy text search use cases. When you just need exact value matching, standard indexes are recommended, especially since they require much less effort on your part. The reason manual indexes are called "manual" is because the responsibility for maintaining them falls entirely on your shoulders. That is, your node/relationship/property addition/removal/update queries would normally have to add/remove/update any relevant manual index entries as well. Note that when you update a property that is manually indexed, you have to remove the old index entry and then add the new entry.
Suppose I have 2 types of nodes :Server and :Client.
(Client)-[:CONNECTED_TO]->(Server)
I want to find the Female clients connected to some Server ordered by age.
I did this
Match (s:Server{id:"S1"})<-[:CONNECTED_TO]-(c{gender:"F"}) return c order by c.age DESC
Doing this, all the Client nodes linked to my Server node are traversed to find the highest age.
Is there a way to index the Client nodes on gender and age properties to avoid the full scan?
You can create an index on :Client(gender), as follows:
CREATE INDEX ON :Client(gender);
However, your particular query will probably benefit more from creating an index on :Server(id), since there are probably a lot of female clients but probably only a single Server with that id. So, you probably want to do this instead:
CREATE INDEX ON :Server(id);
But, even better, if every Server has a unique id property, you should create a uniqueness constraint (which also automatically creates an index for you):
CREATE CONSTRAINT ON (s:Server) ASSERT s.id IS UNIQUE;
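To check which index, if any, the planner actually uses, you can prefix the query with PROFILE. This is a sketch that assumes Neo4j 2.2 or later and adds the :Client label to the pattern so that an index on :Client(gender) can be considered at all:

```cypher
// Show the executed plan together with the results.
PROFILE
MATCH (s:Server {id: "S1"})<-[:CONNECTED_TO]-(c:Client {gender: "F"})
RETURN c
ORDER BY c.age DESC;
```

In the resulting plan, an operator such as NodeIndexSeek or NodeUniqueIndexSeek on :Server(id) would indicate that the lookup starts from the single Server node rather than from a scan.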
Currently, neo4j does not use indexes to perform ordering, but there are some APOC procedures that do support that. However, the procedures do not support returning results in descending order, which is what you want. So, if you really need to use indexing for this purpose, a workaround would be to add an extra minusAge property to your Client nodes that contains the negative value of the age property. If you do this, then first create an index:
CREATE INDEX ON :Client(minusAge);
and then use this query:
MATCH (s:Server{id:"S1"})<-[:CONNECTED_TO]-(cl:Client {gender:"F"})
CALL apoc.index.orderedRange('Client', 'minusAge', -200, 0, false, -1) YIELD node AS c
RETURN c;
The 3rd and 4th parameters of that procedure are for the minimum and maximum values you want to use for matching (against minusAge). The 5th parameter should be false for your purposes (and is actually currently ignored by the implementation). The last parameter is for the LIMIT value, and -1 means you do not want a limit.
If that is a query you run quite frequently, then you might want to store that information explicitly in the graph. As you're experiencing, the query can be quite expensive, and it only gets worse as the number of clients grows, because all of the connected nodes have to be retrieved and have a property check/comparison run on them.
As a workaround, you can add another connection to your clients when you modify their age data.
As a suggestion, you can create an Age node and create a MATURE relationship to your oldest clients.
(:Server)<-[:CONNECTED_TO]-(:Client)-[:MATURE]->(:Age)
You can do this for all the ages, and run queries off the Age nodes (with an indexed/unique age property on them) as needed. If you have 100,000 clients but are only interested in the top 100 ordered by age, there's no need to get all the clients and order them... It really depends on your use case and client apps.
While this is certainly not a nice pattern, I've seen it work in production, and it is the only workaround I've seen do well across different production environments.
If this answer didn't solve your problem (I'd rather use an age property, too), I hope it gave you at least an idea on what to do next.
I am a beginner in graph databases and neo4j. I am trying to understand how to migrate from neo4j 1.9 to neo4j 2.1.6, and what that migration would involve.
I read here the procedure I have to follow (http://neo4j.com/docs/stable/deployment-upgrading.html#explicit-upgrade).
I understand that after the upgrade I will have all the nodes and relationships I had previously, together with the functionality of neo4j 2.1.6. Is that correct?
What I want to know is whether there is a way to automatically declare labels, unique constraints and the new indexing functionality during the migration.
Or is this something that I will have to do "manually" afterwards?
Thank you in advance.
Dimitris
After the upgrade, you'll have the features of neo4j 2.1.* in the sense that you can use them, but it's not done automatically for you.
Labels, unique constraints, and certain types of indexes are the really useful new stuff that you'll see. Labels are a way of categorizing types of nodes. Say you have Person nodes and Job nodes, well you might want to apply those labels. But no database is smart enough by itself to automatically figure that out. Instead, what you might do is go through your data and apply the label.
After migration for example, you could do this:
MATCH (n)
WHERE has(n.first_name)
SET n:Person
RETURN n;
This will apply the "Person" label to any node with a first_name attribute.
Everything else (indexes, unique constraints) again has to be done manually by you. Consider it a portion of your graph structure design. Neo4j will let you implement any kind of graph you like, but it won't do it for you. :)
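As a rough sketch of that manual step, once the Person label has been applied you could declare an index and a constraint like this. The email property is hypothetical and only there for illustration; first_name comes from the example above.

```cypher
// Speed up lookups of Person nodes by first_name.
CREATE INDEX ON :Person(first_name);

// Enforce uniqueness on a hypothetical email property;
// this also creates an index on :Person(email).
CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE;
```

Creating the constraint will fail if existing Person nodes already hold duplicate values for that property, so the data may need cleaning first.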
I have nodes with multiple "sourceIds" in one array-valued property called "sourceIds", just because there could be multiple resources a node could be derived from (I'm assembling multiple databases into one Neo4j model).
I want to be able to look up nodes by any of their source IDs. With legacy indexing this was no problem, I would just add a node to the index associated with each element of the sourceIds property array.
Now I wanted to switch to indexing with labels and I'm wondering how that kind of index works here. I can do
CREATE INDEX ON :<label>(sourceIds)
but what does that actually do? I hoped it would just create index entries for each array element, but that doesn't seem to be the case. With
MATCH (n:<label>) WHERE "testid" in n.sourceIds RETURN n
the query takes between 300ms and 500ms which is too long for an index lookup (other schema indexes work three to five times faster). With
MATCH (n:<label>) WHERE n.sourceIds="testid" RETURN n
I don't get a result. That's clear because it's an array property but I just gave it a try since it would make sense if array properties would be broken down to their elements for indexing purposes.
So, is there a way to handle array properties with schema indexing or are there plans or will I just have to stick to legacy indexing here? My problem with the legacy Lucene index was that I hit the max number of boolean clauses (1024). Another question thus would be: Can I raise this number? Lucene allows that, but can I do this with the Lucene index used by Neo4j?
Thanks and best regards!
Edit: A bit more elaboration on why I hit the boolean clauses max limit: I need to export specific parts of the database into custom file formats for text processing pipelines. These pipelines use components I cannot (be it for the sake of accessibility or time) change to query Neo4j directly, so I'd rather stay with the defined required file format(s). I do the export via the pattern "give me all IDs in the DB; now, for batches of IDs, query the desired information (e.g. specific paths) from Neo4j and store the results to file". Why do I use batches at all? Well, if I don't, things are slowed down significantly by the connection overhead. Thus, large batches are a kind of optimization here.
Schema indexes can only do exact matches right now. Your "testid" in n.sourceIds does not use the index (as shown by your query times). I think there are plans to make this behave better, but I'm waiting for them just as eagerly as you are.
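Until then, one possible workaround (a sketch, not something schema indexes do for you) is to turn each array element into its own node so that every source ID becomes an exact-match lookup. The Item label stands in for your own label, and the SourceId label, its value property and the HAS_SOURCE_ID relationship type are all hypothetical names:

```cypher
// Index the per-ID nodes on their value.
CREATE INDEX ON :SourceId(value);

// One-off conversion: link every node to one :SourceId node per array element.
MATCH (n:Item)
WHERE n.sourceIds IS NOT NULL
FOREACH (sid IN n.sourceIds |
  MERGE (s:SourceId {value: sid})
  MERGE (n)-[:HAS_SOURCE_ID]->(s));

// Exact-match lookup that can use the schema index.
MATCH (s:SourceId {value: "testid"})<-[:HAS_SOURCE_ID]-(n)
RETURN n;
```

The trade-off is extra nodes and relationships that have to be kept in sync with the sourceIds array whenever it changes.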
I've actually hit a lower max in the Lucene query: 512. If there is a way to increase it, I'd love to hear about it. The way I got around it is to run more than one query in the rare cases that actually go over 512 ids. What query are you doing where you need more?