I've seen some simple examples of text searching with STARTS WITH on a name, such as:
http://www.jexp.de/blog/html/full-text-and-spatial-search-in-neo4j-3.html
https://blog.knoldus.com/2016/12/11/neo4j-with-scala-neo4j-vs-elasticsearch/
But I'm looking for something more along the lines of full-text search across multiple fields (title, content), as in:
https://www.digitalocean.com/community/tutorials/how-to-use-full-text-search-in-postgresql-on-ubuntu-16-04
Can I see an example of how this should be done with Neo4j?
You can do this using the APOC Neo4j procedure library. Let's say you have node labels Book and Author and you want to make a full-text query across :Book(title), :Book(content), :Author(name), and :Author(address). First, use apoc.index.addAllNodes to create an index called bookIndex and specify the labels and properties to include in the index:
CALL apoc.index.addAllNodes('bookIndex',{
Book: ["title","content"],
Author: ["name","address"]
})
Then, to search the index:
CALL apoc.index.search('bookIndex', 'River Runs Through It')
You can use this with more complex graph queries as well:
CALL apoc.index.search('bookIndex', 'River Runs Through It')
YIELD node AS book
MATCH (book)-[:IN_GENRE]->(g:Genre)
RETURN g
Lucene query syntax is used, so you can do fuzzy searches, mark terms as required, etc.: 'Norman Maclean~' or 'Norman~ +Maclean'
See the APOC docs for more info.
I am loading simple CSV data into Neo4j. The data looks like this:
uniqueId     compound     value  category
ACT12_M_609  mesulfen     21     carbon
ACT12_M_609  MNAF         23     carbon
ACT12_M_609  nifluridide  20     suphate
ACT12_M_609  sulfur       23     carbon
I am loading the data from the URL using the following query -
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE( t: Transaction { transactionId: row.uniqueId })
MERGE(c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category= row.category
ON CREATE SET r.price =row.value
Next I do the aggregation to count the total orders for a compound and store it as a property on the node, in the following way:
MATCH (c:Compound)<-[:CONTAINS]-(t:Transaction)
WITH c, count(DISTINCT t.transactionId) AS ord
SET c.orders = ord
So far so good. I can accomplish what I want, but I have the following 2 questions:
How can I create the orders property for the compound node in the first step itself? I.e., when I am loading the data, I would like to perform the aggregation straight away.
For a compound node I am also setting the property for category. Theoretically, it can also be modelled as category -contains-> compound by creating a Category node. But what advantage will I have if I do that? I can already execute the queries and get the expected output without creating this additional node.
Thank you for your answer.
I don't think that's possible, LOAD CSV goes over one row at a time, so at row 1, it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships (see "Virtual Nodes/Rels" in the APOC docs), aggregate those and then use them to create the real nodes, but that would be way more complicated.
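To make the row-at-a-time point concrete, here is a minimal Python sketch (plain Python, not Neo4j code) of why the count needs a full pass over the data: the number of distinct transactions per compound is only known once every row has been seen. The `rows` list, the extra transaction id `ACT12_M_610`, and the function name `count_orders` are illustrative assumptions, not part of the original data or of any Neo4j API.

```python
from collections import defaultdict

# Sample rows shaped like the CSV in the question.
# ACT12_M_610 is a hypothetical second transaction added for illustration.
rows = [
    {"uniqueId": "ACT12_M_609", "compound": "mesulfen", "value": "21", "category": "carbon"},
    {"uniqueId": "ACT12_M_609", "compound": "MNAF", "value": "23", "category": "carbon"},
    {"uniqueId": "ACT12_M_610", "compound": "mesulfen", "value": "20", "category": "carbon"},
]


def count_orders(rows):
    """Return {compound: number of distinct transactions that contain it}."""
    transactions_by_compound = defaultdict(set)
    for row in rows:
        # While processing a single row, the final count is still unknown:
        # later rows may add more transactions for the same compound.
        transactions_by_compound[row["compound"]].add(row["uniqueId"])
    # Only after the full pass can the totals be read off.
    return {name: len(tx_ids) for name, tx_ids in transactions_by_compound.items()}


orders = count_orders(rows)
print(orders)
```

This mirrors the two-step approach in the question: load first, then aggregate once all rows are in.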
That depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run a query where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound) ), it might be more performant to create a separate Category node for it.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
Here is my simplified graph schema,
package:
property:
- name: str (indexed)
- version: str (indexed)
I want to query the version using multiple sets of property criteria within a single query. I can use within for a list of values of a single property, but how do I do it for multiple properties?
Consider I have 10 package nodes, (p1,v1, p2,v2, p3,v3,.. p10,v10)
I want to select only the nodes which have (p1 with v1, p8 with v8, p10 with v10)
Is there a way to do this with a single Gremlin query?
Something equivalent to SELECT * from package WHERE (name, version) in ((p1,v1),(p8,v8),(p10,v10)).
It's always best to provide some sample data when asking questions about Gremlin. I assume that this is an approximation of what your model is:
g.addV('package').property('name','gremlin').property('version', '1.0').
addV('package').property('name','gremlin').property('version', '2.0').
addV('package').property('name','gremlin').property('version', '3.0').
addV('package').property('name','blueprints').property('version', '1.0').
addV('package').property('name','blueprints').property('version', '2.0').
addV('package').property('name','rexster').property('version', '1.0').
addV('package').property('name','rexster').property('version', '2.0').iterate()
I don't think that there is a way that you can compare pairs of inputs and expect an index hit. You therefore have to do what you normally do in graphs and choose the index to best narrow your results before you filter in memory. I would assume that in your case this would be the "name" property, therefore grab those first then filter the pairs:
gremlin> g.V().has('package','name', within('gremlin','blueprints')).
......1> elementMap().
......2> where(select('name','version').is(within([name:'gremlin',version:'2.0'], [name:'blueprints',version:'2.0'])))
==>[id:3,label:package,name:gremlin,version:2.0]
==>[id:12,label:package,name:blueprints,version:2.0]
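The "narrow by one property, then filter the pairs in memory" idea can be sketched in plain Python as follows. This is only an illustration of the logic, not Gremlin or TinkerPop code; the `packages` list stands in for the sample vertices above, and `find_packages` is a hypothetical helper name.

```python
# In-memory stand-in for the 'package' vertices created in the sample data.
packages = [
    {"name": "gremlin", "version": "1.0"},
    {"name": "gremlin", "version": "2.0"},
    {"name": "gremlin", "version": "3.0"},
    {"name": "blueprints", "version": "1.0"},
    {"name": "blueprints", "version": "2.0"},
    {"name": "rexster", "version": "1.0"},
    {"name": "rexster", "version": "2.0"},
]


def find_packages(packages, wanted_pairs):
    """Return the packages whose (name, version) is one of wanted_pairs."""
    wanted = set(wanted_pairs)
    names = {name for name, _version in wanted}
    # Step 1: narrow by the single property an index could serve ("name").
    candidates = [p for p in packages if p["name"] in names]
    # Step 2: filter the exact (name, version) pairs in memory.
    return [p for p in candidates if (p["name"], p["version"]) in wanted]


matches = find_packages(packages, [("gremlin", "2.0"), ("blueprints", "2.0")])
print(matches)
```

Step 1 corresponds to the has(..., within(...)) lookup and step 2 to the where(...) filter in the traversal above.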
This might not be the most "creative" way of doing it, but I think the easiest way would be to use or:
g.V().or(
hasLabel('v1').has('prop', 'p1'),
hasLabel('v8').has('prop', 'p8'),
hasLabel('v10').has('prop', 'p10')
)
example: https://gremlify.com/6s
I am creating a single database to store nodes with content in several languages.
I have a model like:
(Book {title, summary, text})<-[WRITES {date}]-(Author {name, lang})
I would like to be able to perform full-text search on Book titles and text in several languages.
I have tried to simply create several indexes with different analyzers in a naive way:
CALL db.index.fulltext.createNodeIndex("searchEN",["Book"],["title", "summary", "text"], {analyzer: "english"})
CALL db.index.fulltext.createNodeIndex("searchFR",["Book"],["title", "summary", "text"], {analyzer: "french"})
But when I try to create the index for French I get this error:
neobolt.exceptions.ClientError: There already exists an index NODE:label[0](property[1], property[9], property[11]).
A solution I would like is to limit a search in English to books written by an English-speaking author, without having to create a new node type like EnglishBook. I want to avoid that because other node types in the schema can share connections with books in different languages.
For instance I still want to be able to do:
MATCH (p: Publisher)-[r: PUBLISHES]->(b: Book)
RETURN p, r, b
I'd like to write a Cypher query that generates a specific JSON output. Part of this output includes an object with a dynamic number of keys, one per child of a parent node:
{
...
"parent_keystring" : {
child_node_one.name : child_node_one.foo
child_node_two.name : child_node_two.foo
child_node_three.name : child_node_three.foo
child_node_four.name : child_node_four.foo
child_node_five.name : child_node_five.foo
}
}
I've tried to create a cypher query but I do not believe I am close to achieving the desired output mentioned above:
MATCH (n)-[relone:SPECIFIC_RELATIONSHIP]->(child_node)
WHERE n.id='839930493049039430'
RETURN n.id AS id,
n.name AS name,
labels(n)[0] AS type,
{
COLLECT({
child.name : children.foo
}) AS rel_two_representation
} AS parent_keystring
I had planned for children.foo to be a count of how many times each particular relationship/child occurs for the parent. Is there a way to make use of the reduce function, where a report would be generated by analyzing the array proposed below? That is, the report would be a JSON object in which each key is a distinct RELATIONSHIP and the value is the number of times that relationship stems from the parent node.
Thank you greatly in advance for guidance you can offer.
I'm not sure that Cypher will let you use a variable to determine an object's key. Would using an Array work for you?
COLLECT([child.name, children.foo]) AS rel_two_representation
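If the array form is acceptable, the dynamic-key object can be rebuilt on the client side. Below is a minimal Python sketch of that step; the `pairs` data is made up for illustration (it only assumes the driver hands back a list of [name, value] pairs), and `pairs_to_object` is a hypothetical helper, not part of any Neo4j driver.

```python
import json

# Hypothetical result of COLLECT([child.name, children.foo]) as received by a client.
pairs = [["child_one", 3], ["child_two", 1], ["child_three", 5]]


def pairs_to_object(pairs):
    """Turn [[key, value], ...] into {key: value, ...}; later duplicates win."""
    return {key: value for key, value in pairs}


result = {"parent_keystring": pairs_to_object(pairs)}
print(json.dumps(result, indent=2))
```

This keeps the query simple and moves the key construction to where the JSON is actually assembled.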
I think the Neo4j Server API output should be treated like any other database output (MySQL, for example). Even if the desired output can be achieved with the default functionality, it is not a natural thing to ask of a database.
You should probably look into creating your own server plugin. This allows you to implement any custom logic and produce the desired output.
Can I create an index with multiple properties in cypher?
I mean something like
CREATE INDEX ON :Person(first_name, last_name)
If I understand correctly this is not possible, but if I want to write queries like:
MATCH (n:Person)
WHERE n.first_name = 'Andres' AND n.last_name = 'Doe'
RETURN n
Do these indexes make sense?
CREATE INDEX ON :Person(first_name)
CREATE INDEX ON :Person(last_name)
Or should I try to merge "first_name" and "last_name" in one property?
Thanks!
Indexes are good for defining some key that maps to some value or set of values. The key is always a single dimension.
Consider your example:
CREATE INDEX ON :Person(first_name)
CREATE INDEX ON :Person(last_name)
These two indexes map each first name to the people who share it and, separately, each last name to the people who share it. So for each person in your database, two index entries are created, one on the first name and one on the last name.
Statistically, this example stinks. Why? Because the distribution is stochastic. You'll be creating a lot of index entries that map to small clusters/groups of people in your database. You'll have a lot of nodes indexed on JOHN for the first name. Likewise, you'll have a lot of nodes indexed on SMITH for the last name.
Now if you want to index the user's full name, then concatenate, forming JOHN SMITH. You can then set a property of person as person.full_name. While it is redundant, it allows you to do the following:
Create
CREATE INDEX ON :Person(full_name)
Match
MATCH (n:Person)
USING INDEX n:Person(full_name)
WHERE n.full_name = 'JOHN SMITH'
RETURN n
You can always refer to http://docs.neo4j.org/refcard/2.0/ for more tips and guidelines.
Cheers,
Kenny
As of 3.2, Neo4j supports composite indexes. For your example:
CREATE INDEX ON :Person(first_name, last_name)
You can read more on composite indexes in the Neo4j documentation.
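As a rough mental model only (this is not how Neo4j implements its indexes), a composite index behaves like a lookup keyed on the combination of both properties rather than on each one separately. A small Python sketch, with made-up `people` data and a hypothetical `build_composite_index` helper:

```python
from collections import defaultdict

# Made-up sample data for illustration.
people = [
    {"id": 1, "first_name": "Andres", "last_name": "Doe"},
    {"id": 2, "first_name": "Andres", "last_name": "Smith"},
    {"id": 3, "first_name": "John", "last_name": "Doe"},
]


def build_composite_index(people):
    """Map (first_name, last_name) to the ids of the matching people."""
    index = defaultdict(list)
    for person in people:
        index[(person["first_name"], person["last_name"])].append(person["id"])
    return index


index = build_composite_index(people)
# One lookup on the pair, instead of intersecting two single-property lookups.
hits = index.get(("Andres", "Doe"), [])
print(hits)
```

The concatenated full_name property from the earlier answer achieves the same effect by hand, at the cost of storing a redundant property.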