I have just started working with py2neo and neo4j.
I am confused about how to go about using indices in my database.
I have created a create_user function:
from py2neo import neo4j, node

g = neo4j.GraphDatabaseService()
users_index = g.get_or_create_index(neo4j.Node, "Users")

def create_user(name, username, **kwargs):
    batch = neo4j.WriteBatch(g)
    user = batch.create(node({"name": name, "username": username}))
    for key, value in kwargs.iteritems():
        batch.set_property(user, key, value)
    batch.add_labels(user, "User")
    batch.get_or_add_to_index(neo4j.Node, users_index, "username", username, user)
    results = batch.submit()
    print "Created: " + username
Now to obtain users by their username:
def lookup_user(username):
    print node(users_index.get("username", username)[0])
I saw the Schema class and noticed that I can create an index on the "User" label, but I couldn't figure out how to obtain the index and add entities to it.
I want this to be as efficient as possible, so would adding an index on the "User" label help performance, in case I add more nodes with different labels later on? Or is it already as efficient as it can be?
Also, if I want usernames to be unique per user, how would I do that? And how do I know whether batch.get_or_add_to_index is getting or adding the entity?
Your confusion is understandable. There are actually two types of indexes in Neo4j: the legacy indexes (which you access with the get_or_create_index method) and the new schema indexes (which index nodes based on labels).
The new schema indexes do not need to be kept up to date manually: they stay in sync as you make changes to the graph, and they are used automatically when you issue Cypher queries against that label/property pair.
The reason the legacy indexes are kept around is that they support some complex functionality that is not yet available for the new indexes - such as geospatial indexing, full text indexing and composite indexing.
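In your case that means you can drop the legacy "Users" index entirely, put a uniqueness constraint on :User(username) (which also backs that property with a schema index), and let MERGE decide whether to get or create the user. A rough sketch, assuming a py2neo 1.6-style CypherQuery API; double-check the exact calls against your py2neo version:

from py2neo import neo4j

g = neo4j.GraphDatabaseService()

# A uniqueness constraint also gives you a schema index on :User(username)
neo4j.CypherQuery(
    g, "CREATE CONSTRAINT ON (u:User) ASSERT u.username IS UNIQUE"
).execute()

def create_user(name, username):
    # MERGE returns the existing user if the username is taken,
    # otherwise it creates the node and sets the name
    query = neo4j.CypherQuery(
        g, "MERGE (u:User {username: {username}}) "
           "ON CREATE SET u.name = {name} "
           "RETURN u")
    return query.execute_one(username=username, name=name)

def lookup_user(username):
    query = neo4j.CypherQuery(
        g, "MATCH (u:User) WHERE u.username = {username} RETURN u")
    return query.execute_one(username=username)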
In a rails 4 app, in one model I have a column containing multiple ids as a string with comma separated values.
"123,4568,12"
I have a "search" engine that I use to retrieve the records with one or many values using the full text search of postgresql I can do something like this which is very useful:
records = MyModel.where("my_models.col_name ## ?", ["12","234"])
This return all the records that have both 12 and 234 in the targeted column. The array comes from a form with a multiple select.
Now I'm trying to make a query that will find all the records that have either 12 or 234 in there string.
I was hopping to be able to do something like:
records = MyModel.where("my_models.col_name IN (?)", ["12","234"])
But it's not working.
Should I iterate through all the values in the array to build a query with multiple OR ? Is there something more appropriate to do this?
EDIT / TL;DR
@BoraMa's answer is a good way to achieve this.
To find all the records containing one or more of the ids referenced in the request, use:
records = MyModel.where("my_models.col_name @@ to_tsquery(?)", ["12","234"].join('|'))
You need the to_tsquery(?) and the join with a single pipe | to do an OR-like query.
To find all the records containing exactly all the ids in the query, use:
records = MyModel.where("my_models.col_name @@ ?", ["12","234"])
And of course replace ["12","234"] with something like params[:params_from_my_form]
Postgres documentation for full text search
If you already started using the full-text search in Postgres in the first place, I'd try to leverage it again. I think you can use a full-text OR query, which can be constructed like this:
records = MyModel.where("my_models.col_name @@ to_tsquery(?)", ["12","234"].join(" | "))
This uses the | operator for ORing fulltext queries in Postgres. I have not tested this and maybe you'll need to do to_tsvector('my_models.col_name') for this to work.
See the documentation for more info.
Suppose your ids are:
a = "1,2,3,4"
You can simply use:
ModelName.find(a)
This will give you all the records of that model whose ids are present in a.
I just thought of a super simple solution: sort the ids in a save callback on MyModel, and then the query becomes much easier:
class MyModel < ActiveRecord::Base
  before_save :sort_ids_in_col_name, if: :col_name_changed?

  private

  def sort_ids_in_col_name
    self.col_name = self.col_name.to_s.split(',').sort.join(',')
  end
end
Then the query will be easy:
ids = ["12","234"]
records = MyModel.where(col_name: ids.sort.join(','))
I have framed a query to submit to Solr, in the following format:
id:95154 OR id:68209 OR id:89482 OR id:94233 OR id:112481 OR id:93843
I want to get the records back in the order given in the query: the document with id 95154 first, then id 68209, and so on. But that's not happening right now; it returns the document with the last id, 93843, first, and sometimes the order is random. I am using Solr in Grails 2.1 and my Solr version is 1.4.0. Here is a sample of how I am getting documents from Solr:
def server = solrService.getServer('provider')
SolrQuery sponsorSolrQuery = new SolrQuery(solarQuery)
def queryResponse = server.query(sponsorSolrQuery);
documentsList = queryResponse.getResults()
As @injecteer mentions, there is nothing built into Lucene to consider the sequence of clauses in a boolean query, but:
You are able to apply boosts to each term, and as long as the field is a basic field (meaning, not a TextField), the boosts will apply cleanly to give you a decent sort by score.
id:95154^6 OR id:68209^5 OR id:89482^4 OR id:94233^3 OR id:112481^2 OR id:93843
There's no such thing in Lucene (and I strongly assume the same holds for Solr). In Lucene you can sort the results based on the contents of documents' fields, but not on the order of clauses in a query.
That means you have to sort the results yourself:
documentsList = queryResponse.getResults()
def sortedByIdOrder = solarQueryAsList.collect { id -> documentsList.find { it.id == id } }
Can I create an index with multiple properties in cypher?
I mean something like
CREATE INDEX ON :Person(first_name, last_name)
If I understand correctly this is not possible, but if I want to write queries like:
MATCH (n:Person)
WHERE n.first_name = 'Andres' AND n.last_name = 'Doe'
RETURN n
Do these indexes make sense?
CREATE INDEX ON :Person(first_name)
CREATE INDEX ON :Person(last_name)
Or should I try to merge "first_name" and "last_name" in one property?
Thanks!
Indexes are good for defining some key that maps to some value or set of values. The key is always a single dimension.
Consider your example:
CREATE INDEX ON :Person(first_name)
CREATE INDEX ON :Person(last_name)
These two indexes map together the people with the same first name, and, separately, the people with the same last name. So for each person in your database, two index entries are created: one for the first name and one for the last name.
Statistically, this example stinks. Why? Because the distribution of names is heavily skewed: you'll be creating a lot of index entries that map to small clusters/groups of people in your database. You'll have a lot of nodes indexed under JOHN for the first name, and likewise a lot of nodes indexed under SMITH for the last name.
If you instead want to index the user's full name, concatenate the two, forming JOHN SMITH, and store it as a person.full_name property. While it is redundant, it allows you to do the following:
Create
CREATE INDEX ON :Person(full_name)
Match
MATCH (n:Person)
USING INDEX n:Person(full_name)
WHERE n.full_name = 'JOHN SMITH'
You can always refer to http://docs.neo4j.org/refcard/2.0/ for more tips and guidelines.
Cheers,
Kenny
As of 3.2, Neo4j supports composite indexes. For your example:
CREATE INDEX ON :Person(first_name, last_name)
You can read more on composite indexes here.
I'm using Neo4j 2.0.0-M06. Just learning Cypher and reading the docs. In my mind this query would work, but I should be so lucky...
I'm importing tweets into a MySQL database, and from there importing them into Neo4j. If a tweet already exists in the Neo4j database, it should be updated.
My query:
MATCH (y:Tweet:Socialmedia) WHERE
HAS (y.tweet_id) AND y.tweet_id = '123'
CREATE UNIQUE (n:Tweet:Socialmedia {
body : 'This is a tweet', tweet_id : '123', tweet_userid : '321', tweet_username : 'example'
} )
Neo4j says: This pattern is not supported for CREATE UNIQUE
The database is currently empty on nodes with the matching labels, so there are no tweets what so ever in the Neo4j database.
What is the correct query?
You want to use MERGE for this query, along with a unique constraint.
CREATE CONSTRAINT on (t:Tweet) ASSERT t.tweet_id IS UNIQUE;
MERGE (t:Tweet {tweet_id:'123'})
ON CREATE
SET t:SocialMedia,
t.body = 'This is a tweet',
t.tweet_userid = '321',
t.tweet_username = 'example';
This will use an index to lookup the tweet by id, and do nothing if the tweet exists, otherwise it will set those properties.
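If you are driving this from application code, for example the MySQL import described in the question, you can parameterize the MERGE and reuse the same statement for every tweet. A rough sketch using py2neo's CypherQuery (as in the first question above); rows_from_mysql is just a stand-in for whatever your MySQL query returns:

from py2neo import neo4j

g = neo4j.GraphDatabaseService()

merge_tweet = neo4j.CypherQuery(g,
    "MERGE (t:Tweet {tweet_id: {tweet_id}}) "
    "ON CREATE SET t:Socialmedia, "
    "    t.body = {body}, "
    "    t.tweet_userid = {userid}, "
    "    t.tweet_username = {username} "
    "RETURN t")

# Stand-in for rows fetched from your MySQL database
rows_from_mysql = [
    {"tweet_id": "123", "body": "This is a tweet",
     "tweet_userid": "321", "tweet_username": "example"},
]

# One call per row; tweets that already exist are left untouched
for row in rows_from_mysql:
    merge_tweet.execute_one(tweet_id=row["tweet_id"], body=row["body"],
                            userid=row["tweet_userid"], username=row["tweet_username"])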
I would like to point out that one can use a combination of CREATE CONSTRAINT and then a normal CREATE (without UNIQUE).
This is for cases where one expects a unique node and wants to throw an exception if the node unexpectedly exists. (Far cheaper than looking for the node before creating it).
Also note that MERGE seems to take more CPU cycles than a CREATE. (It also takes more CPU cycles even if an exception is thrown)
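To make the constraint-plus-plain-CREATE approach concrete, here is a rough sketch in the py2neo style used earlier in this document; the exact exception class raised on a constraint violation depends on your driver and version, so a broad except is used here purely for illustration:

from py2neo import neo4j

g = neo4j.GraphDatabaseService()

# The constraint makes a second CREATE with the same tweet_id fail outright
neo4j.CypherQuery(
    g, "CREATE CONSTRAINT ON (t:Tweet) ASSERT t.tweet_id IS UNIQUE"
).execute()

create_tweet = neo4j.CypherQuery(
    g, "CREATE (t:Tweet {tweet_id: {tweet_id}, body: {body}})")

try:
    create_tweet.execute(tweet_id='123', body='This is a tweet')
except Exception as err:  # constraint violation: the tweet unexpectedly already exists
    print "Tweet 123 already exists: " + str(err)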
An alternative scenario covering CREATE CONSTRAINT, CREATE and MERGE (though admittedly not the primary purpose of this post).
I realise this may not be ideal usage, but apart from all the graphy goodness of Neo4j, I'd like to show a collection of nodes, say, People, in a tabular format that has indexed properties for sorting and filtering
I'm guessing the Type of a node can be stored as a Link, say Bob -> type -> Person, which would allow us to retrieve all People
Are the following possible to do efficiently (indexed?) and in a scalable manner?
Retrieve all People nodes and display all of their names, ages, cities of birth, etc. (NOTE: some of this data will be properties, some links to other nodes, which could be denormalised as properties for the table display's and simplicity's sake)
Show me all People sorted by Age
Show me all People with Age < 30
Also, a quick explanation of how to do the above (or a link to somewhere in the docs describing how) would be lovely.
Thanks very much!
Oh and if the above isn't a good idea, please suggest a storage solution which allows both graph-like retrieval and relational-like retrieval
If you want to operate on these person nodes, you can put them into an index (the default is Lucene) and then retrieve and sort the nodes using Lucene (see for instance "How do I sort Lucene results by field value using a HitCollector?" for how to do a custom sort in Java). This will get you, for instance, People sorted by Age. In Neo4j the code could look like:
Transaction tx = neo4j.beginTx();
idxManager = neo4j.index()
personIndex = idxManager.forNodes('persons')
personIndex.add(meNode,'name',meNode.getProperty('name'))
personIndex.add(youNode,'name',youNode.getProperty('name'))
tx.success()
tx.finish()
// Prepare a custom Lucene query context with the Neo4j API
query = new QueryContext( 'name:*' ).sort( new Sort(new SortField( 'name',SortField.STRING, true ) ) )
results = personIndex.query( query )
For combining index lookups and graph traversals, Cypher is a good choice, e.g.
START people = node:people_index(name="E*") MATCH people-[r]->() return people.name, r.age order by r.age asc
in order to return data on both the node and the relationships.
Sure, that's easily possible with the Neo4j query language Cypher.
For example:
start cat=node:Types(name='Person')
match cat<-[:IS_A]-person-[born:BORN]->city
where person.age > 30
return person.name, person.age, born.date, city.name
order by person.age asc
limit 10
You can experiment with it in our cypher console.