How to make this query more efficient in neo4j? - neo4j

headers = {'Accept': 'application/json;charset=UTF-8','Content-Type':'application/json'}
data = {
"statements" : [
{
"statement" : "MATCH (n:Product) RETURN n.name, name.id",
# "parameters" : { "nproduct" : 5 }
} ]
}
r = requests.post(URL, headers = headers,json=data)
data = r.json()['results'][0]['data']
I want to extract nodes from neo4j database. If I have a large amount of Product nodes in the database, how can this query be possible to extract all nodes? The current query has to load all product nodes into memory.

Depends on how large is large. The best way to limit this is to use LIMIT and return sets of Products at a time to your application

Related

Neo4j cypher return Path with a specific format

I have a cypher query to return the shortest Path between two nodes.
I am using Java JdbcTemplate.
MATCH (nodeA),(nodeB), p = shortestPath((nodeA)-[*..15]-(nodeB)) " //
+ "WHERE ID(nodeA) = {1} and ID(nodeB) = {2} RETURN nodes(p) as nodes " //
+ ",relationships(p) as edges
My problem is that I have a Util method that do the mapping of the jdbctempate result. It only works when my cypher returns nodes suchs as :
RETURN { id : id(node), labels : labels(node), data: node } as node
Is there a way to make my initial cypher return the result in that format?
NODES returns a list of nodes, so you can use UNWIND to unpack the list into a set of rows. So in your example...
MATCH (nodeA),(nodeB), p = shortestPath((nodeA)-[*..15]-(nodeB))
WHERE ID(nodeA) = {1} AND ID(nodeB) = {2}
UNWIND nodes(p) as n
RETURN { id : id(n), labels : labels(n), data: n} as node, relationships(p) as edges
You can use WITH + COLLECT to fold the nodes back into a list, and then perform the same operation on relationships if you need.

How to select distinct graph nodes by property

I have a database including PlayStation games and it contains games from all regions and platforms. Some of the games from different regions have the same title and platform so I would like to filter out "duplicates". At this time, I don't have region information on each game so the best I can do is filter out by game name and platform.
Is it possible to select distinct nodes by property? I seem to remember that you can return distinct rows based on a column in SQL, but it seems that Cypher applies distinct to the entire row and not just a specific column.
I would like to achieve something like the following:
MATCH (game:PSNGame) RETURN game WHERE distinct game.TitleName, distinct game.Platforms
The above query if it were valid would return all PSNGame nodes with a distinct TitleName and Platforms combination. Since the above query is not valid Cypher, I have tried returning a list of distinct TitleName/Platforms where distinct is applied to both columns.
The query I have for returning the distinct TitleName/Platforms list looks like this:
MATCH (game:PSNGame) RETURN distinct game.TitleName, game.Platforms
The JSON response from Neo4j is similar to this:
[["God of War", ["PS3", "PSVITA"]], ["God of War II", ["PS3", "PSVITA"]]]
The problem I'm facing is that the JSON response is not really an object with properties. It's more of an array of arrays. If I could get the response to be more like an object, I could deserialize without issues. I tried to deserialize as an IList<PsnGame>, but haven't had much luck.
Here's my POCO for the IList<PsnGame> implementation:
public class PsnGame
{
public string TitleName { get; set; }
public string[] Platforms { get; set; }
}
EDIT:
Here is the simplest example of my Neo4jClient query:
// helper function for handling searching by name and platform
private ICypherFluentQuery BuildPSNGamesQuery(string gameName, string platform)
{
var query = client.Cypher
.Match("(g:PSNGame)");
if (!string.IsNullOrWhiteSpace(gameName))
{
query = query.Where($"g.TitleName =~ \"(?i).*{gameName}.*\"");
if (!string.IsNullOrWhiteSpace(platform) && platform.ToLower() != "all")
{
query = query.AndWhere($"\"{platform}\" in g.Platforms");
}
}
else
{
if (!string.IsNullOrWhiteSpace(platform) && platform.ToLower() != "all")
{
query = query.Where($"\"{platform}\" in g.Platforms");
}
}
return query;
}
Distinct games:
var distinctGames = await BuildPSNGamesQuery(gameName, platform)
.With("DISTINCT g.TitleName AS TitleName, g.Platforms AS Platforms")
.With("{ TitleName: TitleName, Platforms: Platforms } as Games")
.OrderBy("TitleName")
.Return<PsnGame>("Games")
.Skip((pageNumber - 1) * pageSize)
.Limit(pageSize)
.ResultsAsync;
All games (somehow need to filter based on previous query):
var results = await BuildPSNGamesQuery(gameName, platform)
.Return(g => new Models.PSN.Composite.PsnGame
{
Game = g.As<PsnGame>()
})
.OrderBy("g.TitleName")
.Skip((pageNumber - 1) * pageSize)
.Limit(pageSize)
.ResultsAsync;
By using a map, I'm able to return the TitleName/Platforms pairing that I want, but I suspect I'll need to do a collect on the Platforms to get all platforms for a particular game title. Then I can filter the entire games list by the distinctGames that I return. However, I would prefer to perform a request and merge the queries to reduce HTTP traffic.
An example of duplicates can be seen on my website here:
https://www.gamerfootprint.com/#/games/ps
Also, the data for duplicates looks something like this:
MATCH (n:PSNGame)
WHERE n.TitleName = '1001 Spikes'
RETURN n.TitleName, n.Platforms LIMIT 25
JSON:
{
"columns":[
"n.TitleName",
"n.Platforms"
],
"data":[
{
"row":[
"1001 Spikes",
[
"PSVITA"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
},
{
"row":[
"1001 Spikes",
[
"PS4"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
}
],
"stats":{
"contains_updates":false,
"nodes_created":0,
"nodes_deleted":0,
"properties_set":0,
"relationships_created":0,
"relationship_deleted":0,
"labels_added":0,
"labels_removed":0,
"indexes_added":0,
"indexes_removed":0,
"constraints_added":0,
"constraints_removed":0
}
}
EDIT: 10-31-15
I was able to get distinct game title and platforms returning with the platforms for each game rolled up into a single collection. My new query is the following:
MATCH (game:PSNGame)
WITH DISTINCT game.TitleName as TitleName,
game.Platforms as coll UNWIND coll as Platforms
WITH TitleName as TitleName, COLLECT(DISTINCT Platforms) as Platforms
RETURN TitleName, Platforms
ORDER BY TitleName
Here is a small subset of the results:
{
"columns":[
"TitleName",
"Platforms"
],
"data":[
{
"row":[
"1001 Spikes",
[
"PSVITA",
"PS4"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
}
],
"stats":{
"contains_updates":false,
"nodes_created":0,
"nodes_deleted":0,
"properties_set":0,
"relationships_created":0,
"relationship_deleted":0,
"labels_added":0,
"labels_removed":0,
"indexes_added":0,
"indexes_removed":0,
"constraints_added":0,
"constraints_removed":0
}
}
Finally, 1001 Spikes is in the list once and has both PS VITA and PS4 listed as platforms. Now, I need to figure out how to grab the full game nodes and filter against the above query.
try this one:
MATCH (game:PSNGame)
with game, collect([game.TitleName, game.Platforms]) as wow
return distinct(wow)
If I understand you correctly, you want to select different nodes by property and remove duplicates? If so, it would be something like this:
MATCH (game:PSNGame {property:'value'}) RETURN DISTINCT game.property
That should remove duplicates and return your node by property.

neo4j rest api cypher appears to return cartesian product of nodes

I've created a simple graph using the following data:
CREATE (england:Country {Name:"England",Id:1})
CREATE CONSTRAINT ON (c:Country) ASSERT c.ID IS UNIQUE
CREATE (john:King {Name:"John",Id:1})
CREATE (henry3:King {Name:"Henry III",Id:2})
CREATE (edward1:King {Name:"Edward I",Id:3})
CREATE CONSTRAINT ON (k:King) ASSERT k.ID IS UNIQUE
MATCH (country:Country),(king:King) WHERE country.Id = 1 AND king.Id = 1 CREATE (country)<-[r:KING_OF]-(king)
MATCH (country:Country),(king:King) WHERE country.Id = 1 AND king.Id = 2 CREATE (country)<-[r:KING_OF]-(king)
MATCH (country:Country),(king:King) WHERE country.Id = 1 AND king.Id = 3 CREATE (country)<-[r:KING_OF]-(king)
CREATE (cornwall:County {Name:"Cornwall",Id:1})
CREATE (devon:County {Name:"Devon",Id:2})
CREATE (somerset:County {Name:"Somerset",Id:3})
CREATE CONSTRAINT ON (c:County) ASSERT c.ID IS UNIQUE
MATCH (country:Country),(county:County) WHERE country.Id = 1 AND county.Id = 1 CREATE (country)<-[r:COUNTY_IN]-(county)
MATCH (country:Country),(county:County) WHERE country.Id = 1 AND county.Id = 2 CREATE (country)<-[r:COUNTY_IN]-(county)
MATCH (country:Country),(county:County) WHERE country.Id = 1 AND county.Id = 3 CREATE (country)<-[r:COUNTY_IN]-(county)
CREATE (bristol:City {Name:"Bristol",Id:1})
CREATE (london:City {Name:"London",Id:2})
CREATE (york:City {Name:"York",Id:3})
CREATE CONSTRAINT ON (c:City) ASSERT c.ID IS UNIQUE
MATCH (country:Country),(city:City) WHERE country.Id = 1 AND city.Id = 1 CREATE (country)<-[r:CITY_IN]-(city)
MATCH (country:Country),(city:City) WHERE country.Id = 1 AND city.Id = 2 CREATE (country)<-[r:CITY_IN]-(city)
MATCH (country:Country),(city:City) WHERE country.Id = 1 AND city.Id = 3 CREATE (country)<-[r:CITY_IN]-(city)
(Note that I use 'thiscomputer.mydomain.com' as an alias for 'localhost', which SO ridiculously bars me from using)...
If I go to http://thiscomputer.mydomain.com:7474/browser/ and execute the cypher
MATCH cities = (city:City)-[]->(c:Country {Id:1}), counties = (county:County)-[]->(c:Country {Id:1}), kings = (king:King)-[]->(c:Country {Id:1}) RETURN cities, counties, kings
...I get a nice graph centred around the country of England. No nodes are repeated.
If I fire up a REST client and execute the following:
URL: http://thiscomputer.mydomain.com:7474/db/data/transaction/commit
Verb: POST
Body:
{
"statements" : [{
"statement": "MATCH cities = (city:City)-[]->(c:Country {Id:1}), counties = (county:County)-[]->(c:Country {Id:1}), kings = (king:King)-[]->(c:Country {Id:1}) RETURN cities, counties, kings",
"resultDataContents" : [ "graph" ]
}]
}
(Note that the cypher query is identical)
I end up with a set of results that are too large to paste in here.
Basically, I appear to get the cartesian product of the results. Since there are 3 kings, 3 counties and 3 cities, there are 27 results.
This is a fairly simple graph. In the domain in which I work, the graph I want would contain an aggregate root (in this example, England), and many associated nodes of different types. If the REST API returned the cartesian product, I could end up with many thousands of results.
So my question is this: why does the REST API return the cartesian product of all the returned nodes? Is my cypher incorrect, or am I using the REST API incorrectly?
It would be much simpler if I could get results like this (pseudo-JSON):
{
Country:{
Name: "England",
Id: 1,
Cities:[
...etc
],
Counties:[
...etc
],
Kings:[
...etc
]
}
}
Or possibly even this:
{
data:{
nodes:[
{
Name:"England",
Id: 1,
uniqueIdInResults:1
},
...etc
],
relationships:[
uniqueIdInResults1: 1,
uniqueIdInResults2: 2,
type: "KING_OF"
]
}
}
In short, I think denormalising the data in the results like this can quickly result in a response which is too large. Is there any way to structure the cypher or the call to the REST API to give results with less repetition?
The query you are running in the browser is absolutely returning a Cartesian product as well - it just doesn't appear that way because the visual result will only show the nodes once.
Run the following in the browser. You'll see pretty quickly all the overlap you're getting.
MATCH cities = (city:City)-[]->(c:Country {Id:1}),
counties = (county:County)-[]->(c:Country {Id:1}),
kings = (king:King)-[]->(c:Country {Id:1})
RETURN EXTRACT(x IN NODES(cities) | x.Name),
EXTRACT(x IN NODES(counties) |x.Name),
EXTRACT(x IN NODES(kings) | x.Name)
As expected, I get a Cartesian product of 27 rows.

neo4j cypher - how to find all nodes that have a relationship to list of nodes

I have nodes- named "options". "Users" choose these options. I need a chpher query that works like this:
retrieve users who had chosen all the options those are given as a list.
MATCH (option:Option)<-[:CHOSE]-(user:User) WHERE option.Key IN ['1','2','2'] Return user
This query gives me users who chose option(1), option(2) and option(3) and also gives me the user who only chose option(2).
What I need is only the users who chose all of them -option(1), option(2) and option(3).
For an all cypher solution (don't know if it's better than Chris' answer, you'll have to test and compare) you can collect the option.Key for each user and filter out those who don't have a option.Key for each value in your list
MATCH (u:User)-[:CHOSE]->(opt:Option)
WITH u, collect(opt.Key) as optKeys
WHERE ALL (v IN {values} WHERE v IN optKeys)
RETURN u
or match all the options whose keys are in your list and the users that chose them, collect those options per user and compare the size of the option collection to the size of your list (if you don't give duplicates in your list the user with an option collection of equal size has chosen all the options)
MATCH (u:User)-[:CHOSE]->(opt:Option)
WHERE opt.Key IN {values}
WITH u, collect(opt) as opts
WHERE length(opts) = length({values}) // assuming {values} don't have duplicates
RETURN u
Either should limit results to users connected with all the options whose key values are specified in {values} and you can vary the length of the collection parameter without changing the query.
If the number of options is limited, you could do:
MATCH
(user:User)-[:Chose]->(option1:Option),
(user)-[:Chose]->(option2:Option),
(user)-[:Chose]->(option3:Option)
WHERE
option1.Key = '1'
AND option2.Key = '2'
AND option3.Key = '3'
RETURN
user.Id
Which will only return the user with all 3 options.
It's a bit rubbishy as obviously you end up with 3 lines where you have 1, but I don't know how to do what you want using the IN keyword.
If you're coding against it, it's pretty simple to generate the WHERE and MATCH clause, but still - not ideal. :(
EDIT - Example
Turns out there is some string manipulation going on here (!), but you can always cache bits. Importantly - it's using Params which would allow neo4j to cache the queries and supply faster responses with each call.
public static IEnumerable<User> GetUser(IGraphClient gc)
{
var query = GenerateCypher(gc, new[] {"1", "2", "3"});
return query.Return(user => user.As<User>()).Results;
}
public static ICypherFluentQuery GenerateCypher(IGraphClient gc, string[] options)
{
ICypherFluentQuery query = new CypherFluentQuery(gc);
for(int i = 0; i < options.Length; i++)
query = query.Match(string.Format("(user:User)-[:CHOSE]->(option{0}:Option)", i));
for (int i = 0; i < options.Length; i++)
{
string paramName = string.Format("option{0}param", i);
string whereString = string.Format("option{0}.Key = {{{1}}}", i, paramName);
query = i == 0 ? query.Where(whereString) : query.AndWhere(whereString);
query = query.WithParam(paramName, options[i]);
}
return query;
}
MATCH (user:User)-[:CHOSE]->(option:Option)
WHERE option.key IN ['1', '2', '3']
WITH user, COUNT(*) AS num_options_chosen
WHERE num_options_chosen = LENGTH(['1', '2', '3'])
RETURN user.name
This will only return users that have relationships with all the Options with the given keys in the array. This assumes there are not multiple [:CHOSE] relationships between users and options. If it is possible for a user to have multiple [:CHOSE] relationships with a single option, you'll have to add some conditionals as necessary.
I tested the above query with the below dataset:
CREATE (User1:User {name:'User 1'}),
(User2:User {name:'User 2'}),
(User3:User {name:'User 3'}),
(Option1:Option {key:'1'}),
(Option2:Option {key:'2'}),
(Option3:Option {key:'3'}),
(Option4:Option {key:'4'}),
(User1)-[:CHOSE]->(Option1),
(User1)-[:CHOSE]->(Option4),
(User2)-[:CHOSE]->(Option2),
(User2)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option1),
(User3)-[:CHOSE]->(Option2),
(User3)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option4)
And I get only 'User 3' as the output.
For shorter lists, you can use path predicates in your WHERE clause:
MATCH (user:User)
WHERE (user)-[:CHOSE]->(:Option { Key: '1' })
AND (user)-[:CHOSE]->(:Option { Key: '2' })
AND (user)-[:CHOSE]->(:Option { Key: '3' })
RETURN user
Advantages:
Clear to read
Easy to generate for dynamic length lists
Disadvantages:
For each different length, you will have a different query that has to be parsed and cached by Cypher. Too many dynamic queries will watch your cache hit rate go through the floor, query compilation work go up, and query performance go down.

Cypher to Neo4JClient Distinct Collect

I am trying to get a path from a base node to its root node as 1 row. The Cypher query looks like this:
start n = node:node_auto_index(Name = "user1") match path = (n-[r:IS_MEMBER_OF_GROUP*]->b) return last(collect(distinct path));
But when changing this over to the Neo4JClient syntax:
var k = clientConnection.Cypher
.Start(new { n = "node:node_auto_index(Name = 'user1')" })
.Match("path = (n-[r:IS_MEMBER_OF_GROUP*]->b)")
.ReturnDistinct<Node<Principles>>("last(collect(path))").Results;
It gets an error:
{"Value cannot be null.\r\nParameter name: uriString"}
When continuing on from there:
Neo4jClient encountered an exception while deserializing the response from the server. This is likely a bug in Neo4jClient.
Please open an issue at https://bitbucket.org/Readify/neo4jclient/issues/new
To get a reply, and track your issue, ensure you are logged in on BitBucket before submitting.
Include the full text of this exception, including this message, the stack trace, and all of the inner exception details.
Include the full type definition of Neo4jClient.Node`1[[IQS_Neo4j_TestGraph.Nodes.Principles, IQS Neo4j TestGraph, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]].
Include this raw JSON, with any sensitive values replaced with non-sensitive equivalents:
{
"columns" : [ "last(collect(path))" ],
"data" : [ [ {
"start" : "http://localhost:7474/db/data/node/3907",
"nodes" : [ "http://localhost:7474/db/data/node/3907", "http://localhost:7474/db/data/node/3906", "http://localhost:7474/db/data/node/3905", "http://localhost:7474/db/data/node/3904" ],
"length" : 3,
"relationships" : [ "http://localhost:7474/db/data/relationship/4761", "http://localhost:7474/db/data/relationship/4762", "http://localhost:7474/db/data/relationship/4763" ],
"end" : "http://localhost:7474/db/data/node/3904"
} ] ]
}
How would one convert the cypher query to Neo4JClient query?
Well, the thing you are returning is a PathsResult not a Node<>, so if you change your query to be:
var k = clientConnection.Cypher
.Start(new { n = "node:node_auto_index(Name = 'user1')" })
.Match("path = (n-[r:IS_MEMBER_OF_GROUP*]->b)")
.ReturnDistinct<PathsResult>("last(collect(path))").Results; //<-- Change here
you will get results, this returns what I get from running your query against my db, if you specifically want the nodes this post: Getting PathsResults covers converting to actual nodes and relationships.
One other thing, (and this will help the query perform better as neo4j can cache the execution plans easier), is that you can change your start to make it use parameters by doing:
.Start(new { n = Node.ByIndexLookup("node_auto_index", "Name", "user1")})

Resources