How to select distinct graph nodes by property - neo4j

I have a database including PlayStation games and it contains games from all regions and platforms. Some of the games from different regions have the same title and platform so I would like to filter out "duplicates". At this time, I don't have region information on each game so the best I can do is filter out by game name and platform.
Is it possible to select distinct nodes by property? I seem to remember that you can return distinct rows based on a column in SQL, but it seems that Cypher applies distinct to the entire row and not just a specific column.
I would like to achieve something like the following:
MATCH (game:PSNGame) RETURN game WHERE distinct game.TitleName, distinct game.Platforms
The above query, if it were valid, would return all PSNGame nodes with a distinct TitleName and Platforms combination. Since it is not valid Cypher, I have tried returning a list of distinct TitleName/Platforms pairs, where distinct is applied to both columns.
The query I have for returning the distinct TitleName/Platforms list looks like this:
MATCH (game:PSNGame) RETURN distinct game.TitleName, game.Platforms
The JSON response from Neo4j is similar to this:
[["God of War", ["PS3", "PSVITA"]], ["God of War II", ["PS3", "PSVITA"]]]
The problem I'm facing is that the JSON response is not really an object with properties. It's more of an array of arrays. If I could get the response to be more like an object, I could deserialize without issues. I tried to deserialize as an IList<PsnGame>, but haven't had much luck.
Here's my POCO for the IList<PsnGame> implementation:
public class PsnGame
{
public string TitleName { get; set; }
public string[] Platforms { get; set; }
}
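For illustration, here is a minimal Python sketch of how an array-of-arrays response can be turned into objects by pairing each row with the column names. The column aliases are assumed to match the property names of the class above, and `rows_to_objects` is a hypothetical helper, not part of any Neo4j client.

```python
from typing import Any


def rows_to_objects(columns: list[str], rows: list[list[Any]]) -> list[dict[str, Any]]:
    """Pair each positional row with the column names to get name/value objects."""
    return [dict(zip(columns, row)) for row in rows]


# Assumed column aliases, chosen to match the PsnGame property names.
columns = ["TitleName", "Platforms"]
rows = [
    ["God of War", ["PS3", "PSVITA"]],
    ["God of War II", ["PS3", "PSVITA"]],
]

games = rows_to_objects(columns, rows)
print(games[0]["TitleName"])  # God of War
```

Returning a map from the query itself (as in the edit below) achieves the same shape on the server side, which avoids this extra step.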
EDIT:
Here is the simplest example of my Neo4jClient query:
// helper function for handling searching by name and platform
private ICypherFluentQuery BuildPSNGamesQuery(string gameName, string platform)
{
var query = client.Cypher
.Match("(g:PSNGame)");
if (!string.IsNullOrWhiteSpace(gameName))
{
query = query.Where($"g.TitleName =~ \"(?i).*{gameName}.*\"");
if (!string.IsNullOrWhiteSpace(platform) && platform.ToLower() != "all")
{
query = query.AndWhere($"\"{platform}\" in g.Platforms");
}
}
else
{
if (!string.IsNullOrWhiteSpace(platform) && platform.ToLower() != "all")
{
query = query.Where($"\"{platform}\" in g.Platforms");
}
}
return query;
}
Distinct games:
var distinctGames = await BuildPSNGamesQuery(gameName, platform)
.With("DISTINCT g.TitleName AS TitleName, g.Platforms AS Platforms")
.With("{ TitleName: TitleName, Platforms: Platforms } as Games")
.OrderBy("TitleName")
.Return<PsnGame>("Games")
.Skip((pageNumber - 1) * pageSize)
.Limit(pageSize)
.ResultsAsync;
All games (somehow need to filter based on previous query):
var results = await BuildPSNGamesQuery(gameName, platform)
.Return(g => new Models.PSN.Composite.PsnGame
{
Game = g.As<PsnGame>()
})
.OrderBy("g.TitleName")
.Skip((pageNumber - 1) * pageSize)
.Limit(pageSize)
.ResultsAsync;
By using a map, I'm able to return the TitleName/Platforms pairing that I want, but I suspect I'll need to do a collect on the Platforms to get all platforms for a particular game title. Then I can filter the entire games list by the distinctGames that I return. However, I would prefer to perform a request and merge the queries to reduce HTTP traffic.
An example of duplicates can be seen on my website here:
https://www.gamerfootprint.com/#/games/ps
Also, the data for duplicates looks something like this:
MATCH (n:PSNGame)
WHERE n.TitleName = '1001 Spikes'
RETURN n.TitleName, n.Platforms LIMIT 25
JSON:
{
"columns":[
"n.TitleName",
"n.Platforms"
],
"data":[
{
"row":[
"1001 Spikes",
[
"PSVITA"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
},
{
"row":[
"1001 Spikes",
[
"PS4"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
}
],
"stats":{
"contains_updates":false,
"nodes_created":0,
"nodes_deleted":0,
"properties_set":0,
"relationships_created":0,
"relationship_deleted":0,
"labels_added":0,
"labels_removed":0,
"indexes_added":0,
"indexes_removed":0,
"constraints_added":0,
"constraints_removed":0
}
}
EDIT: 10-31-15
I was able to get distinct game title and platforms returning with the platforms for each game rolled up into a single collection. My new query is the following:
MATCH (game:PSNGame)
WITH DISTINCT game.TitleName as TitleName,
game.Platforms as coll UNWIND coll as Platforms
WITH TitleName as TitleName, COLLECT(DISTINCT Platforms) as Platforms
RETURN TitleName, Platforms
ORDER BY TitleName
Here is a small subset of the results:
{
"columns":[
"TitleName",
"Platforms"
],
"data":[
{
"row":[
"1001 Spikes",
[
"PSVITA",
"PS4"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
}
],
"stats":{
"contains_updates":false,
"nodes_created":0,
"nodes_deleted":0,
"properties_set":0,
"relationships_created":0,
"relationship_deleted":0,
"labels_added":0,
"labels_removed":0,
"indexes_added":0,
"indexes_removed":0,
"constraints_added":0,
"constraints_removed":0
}
}
Finally, 1001 Spikes is in the list once and has both PS VITA and PS4 listed as platforms. Now, I need to figure out how to grab the full game nodes and filter against the above query.
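To make the roll-up explicit, the following Python sketch performs the same grouping on the client side: one entry per title, with the platforms of all duplicates merged. It only mirrors what the Cypher query above does; `roll_up_platforms` and the sample data are illustrative assumptions.

```python
def roll_up_platforms(games):
    """Group games by title and merge their platform lists, keeping first-seen order."""
    merged = {}
    for game in games:
        platforms = merged.setdefault(game["TitleName"], [])
        for platform in game["Platforms"]:
            if platform not in platforms:
                platforms.append(platform)
    # Sort by title, like ORDER BY TitleName in the query.
    return [
        {"TitleName": title, "Platforms": platforms}
        for title, platforms in sorted(merged.items())
    ]


sample = [
    {"TitleName": "1001 Spikes", "Platforms": ["PSVITA"]},
    {"TitleName": "1001 Spikes", "Platforms": ["PS4"]},
    {"TitleName": "God of War", "Platforms": ["PS3", "PSVITA"]},
]

rolled = roll_up_platforms(sample)
print(rolled[0])  # {'TitleName': '1001 Spikes', 'Platforms': ['PSVITA', 'PS4']}
```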

Try this one:
MATCH (game:PSNGame)
with game, collect([game.TitleName, game.Platforms]) as wow
return distinct(wow)

If I understand you correctly, you want to select different nodes by property and remove duplicates? If so, it would be something like this:
MATCH (game:PSNGame {property:'value'}) RETURN DISTINCT game.property
That should remove duplicates and return your node by property.

Related

Processing temporary properties returned by map projections

Consider the following database schema:
type Actor {
actorId: ID!
name: String
movies: [Movie!]! @relationship(type: "ACTED_IN", direction: OUT)
}
type Movie {
movieId: ID!
title: String
description: String
year: Int
actors(limit: Int = 10): [Actor!]! @relationship(type: "ACTED_IN", direction: IN)
}
Now, I want to know which movies have the most actors, along with the counts. The following Cypher works perfectly in Neo4j:
type Query {
getMoviesWithMostActors(limit: Int = 5): [Movie]
@cypher(
statement: """
MATCH (movie:Movie)
MATCH (movie) <-[act:ACTED_IN]- (:Actor)
WITH movie, count(act) AS actorCount
ORDER BY actorCount DESCENDING
LIMIT $limit
RETURN movie {.*, numActors: actorCount}
"""
)
}
However, it fails in GraphQL playground. I tried the following:
query {
this_works: getMoviesWithMostActorsbase(limit: 2) {
movieId
}
this_does_not_work: getMoviesWithMostActorsbase(limit: 2) {
movieId
numActors
}
}
This returned: GRAPHQL_VALIDATION_FAILED.
"GraphQLError: Cannot query field \"numActors\" on type \"Movie\"."
My question is: how do I return temporary properties without modifying the type definitions themselves? This is a dummy example; in practice I need to do this for multiple node types and with different kinds of scores (int/float/array), so I want to avoid editing the schema every time I add a new query.
Known Workarounds
Extend schema with multiple nullable properties.
Schema needs to be changed with every new query
Return a map and simulate a node object, as shown below.
Need to add 2 more types for every single type of nodes
Actual object types are lost
type MovieData {
identity: Int!
labels: [String]!
properties: Movie!
}
type MovieWithScore {
movieData: MovieData!
movieScore: String!
}
type Query {
getMoviesWithMostActors(limit: Int = 5): [MovieWithScore]
@cypher(
statement: """
MATCH (movie:Movie)
MATCH (movie) <-[act:ACTED_IN]- (:Actor)
WITH movie, count(act) AS actorCount
ORDER BY actorCount DESCENDING
LIMIT $limit
RETURN {
movieData: movie,
movieScore: 'number of actors: ' + toString(actorCount)
}
"""
)
}

Returning Neo4j map projection with WebFlux

I have the nodes user and game with some relationships between them.
My REST API should return all relationships between the games and 1 user.
The Cypher query I use is:
MATCH (u:User {id: '1234'} ) -[rel]- (game:Game) return game{.*, relationships: collect(DISTINCT rel)}
In the Neo4j Browser, everything works as expected and I see all the properties I need.
But the GetMapping returns everything except the relationship properties.
Neo4j Browser
{
"relationships": [
{
"identity": 54,
"start": 9,
"end": 8,
"type": "OWNED",
"properties": {
"ownedDate": "2021-07-03"
}
},
{
"identity": 45,
"start": 9,
"end": 8,
"type": "PLAYED",
"properties": {
"times": 5
}
}
],
"name": "Blood Rage",
"state": "ACTIVE",
"id": "1c152c91-4044-41f0-9208-0c436d6f6480",
"gameUrl": "https://asmodee.de/blood-rage"
}
GetMapping result (as you can see, the relationship objects are empty; the more relationships there are, the more empty JSON objects I get):
{
"game": {
"relationships": [
{},
{}
],
"name": "Blood Rage",
"gameUrl": "https://asmodee.de/blood-rage",
"state": "ACTIVE",
"id": "1c152c91-4044-41f0-9208-0c436d6f6480"
}
}
The GetMapping is:
...
final ReactiveNeo4jClient client;
...
...
...
@GetMapping(value = { "/{id}/games"})
@RolesAllowed({"user", "admin"})
Flux<Map<String, Object>> findGamesByUser(@PathVariable String id){
String query = "MATCH (uuser:User {id: '" + id + "'} ) -[rel]- (game:Game) return game{.*, relationships: collect(DISTINCT rel)}";
return client.query(query).fetch().all();
}
A RelationshipProperty-Example
@RelationshipProperties
@Data
@Builder
public class PlayedGame {
@Id
@GeneratedValue
private Long relationshipId;
@Property
int times = 0;
@TargetNode private GameEntity game;
public int addPlay(){
this.times = this.times + 1;
return this.times;
}
}
What do I have to change in my GetMapping to show the relationship properties?
Thank you,
Kevin
You need to return the actual nodes and relationships, otherwise you're missing the id-mapping.
There should be examples in the SDN docs.
Best if you have a small reproducible example (e.g. with the default movies graph).
Not sure if there is something off in your SDN setup, in general for such simple queries you should be able to just use a repository and not need to write cypher queries by hand.
The general information given by Michael is correct, but there is more in your question:
First of all, the meta domain model is completely ignored if you are using the Neo4jClient. It does not automatically map anything back but uses the driver's types.
As a result, you will end up with an InternalRelationship (as of the current state of this answer), which does not have any getter methods.
I assume that you are serializing the result in the application with Jackson. This is the reason why you see objects that represent the relationships but without any content within.
If you want to get things mapped for you, also create the domain objects properly and use (at least) the Neo4jTemplate with your query.
If you model User, Game, and the relationship properties like PlayedGame correctly, a
neo4jTemplate.findAll("MATCH (u:User)<-[rel]-(g:Game) return u, collect(rel), collect(g)", User.class)
will map the results properly. Also if this is all you have, you could also skip the custom query at all and use
neo4jTemplate.findAll(User.class)
or
neo4jTemplate.findById(userId, User.class)

How to make this query more efficient in neo4j?

headers = {'Accept': 'application/json;charset=UTF-8','Content-Type':'application/json'}
data = {
"statements" : [
{
"statement" : "MATCH (n:Product) RETURN n.name, n.id",
# "parameters" : { "nproduct" : 5 }
} ]
}
r = requests.post(URL, headers = headers,json=data)
data = r.json()['results'][0]['data']
I want to extract nodes from a Neo4j database. If there is a large number of Product nodes in the database, how can this query extract all of them? The current query has to load all Product nodes into memory.
It depends on how large "large" is. The best way to limit this is to use LIMIT (together with SKIP) and return one set of Products at a time to your application.
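As a rough sketch of that batching idea, the code below pages through the results with SKIP and LIMIT passed as query parameters. The `send` callable is a hypothetical stand-in for the HTTP call in the question (it is assumed to post the payload and return the `data` list of the first result); `build_page_payload`, `iter_products`, and the choice of ordering by `n.id` are assumptions made for illustration.

```python
from typing import Any, Callable, Iterator

Payload = dict[str, Any]


def build_page_payload(skip: int, limit: int) -> Payload:
    """Build one request body that fetches a single page of products."""
    return {
        "statements": [
            {
                # ORDER BY keeps page boundaries stable between requests.
                "statement": (
                    "MATCH (n:Product) RETURN n.name, n.id "
                    "ORDER BY n.id SKIP $skip LIMIT $limit"
                ),
                "parameters": {"skip": skip, "limit": limit},
            }
        ]
    }


def iter_products(
    send: Callable[[Payload], list[Any]], page_size: int = 1000
) -> Iterator[Any]:
    """Yield rows one page at a time so only one page is held in memory."""
    skip = 0
    while True:
        page = send(build_page_payload(skip, page_size))
        yield from page
        if len(page) < page_size:  # a short page means there is nothing left
            break
        skip += page_size
```

With the code from the question, `send` could be something like `lambda p: requests.post(URL, headers=headers, json=p).json()['results'][0]['data']`. Note that large SKIP values still make the server walk past the skipped rows, so this bounds client memory rather than total query work.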

Rails PSQL query JSON for nested array and objects

So I have a json (in text field) and I'm using postgresql and I need to query the field but it's nested a bit deep. Here's the format:
[
{
"name":"First Things",
"items":[
{
"name":"Foo Bar Item 1",
"price":"10.00"
},
{
"name":"Foo Item 2",
"price":"20.00"
}
]
},
{
"name":"Second Things",
"items": [
{
"name":"Bar Item 3",
"price":"15.00"
}
]
}
]
And I need to query the name INSIDE the items node. I have tried some queries but to no avail, like:
.where('this_json::JSON #> [{"items": [{"name": ?}]}]', "%#{name}%"). How should I go about here?
I can query normal JSON format like this_json::JSON -> 'key' = ? but need help with this bit.
Here you need to use json_array_elements() twice: your top-level document is an array of JSON objects, and the items key of each object holds another array of sub-documents. A sample psql query might look like this:
SELECT
item->>'name' AS item_name,
item->>'price' AS item_price
FROM t,
json_array_elements(t.v) js_val,
json_array_elements(js_val->'items') item;
where t - is the name of your table, v - name of your JSON column.
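To show what the two json_array_elements() calls unpack, here is a small Python sketch that flattens the same document shape and keeps the items whose name contains a search string. It is only an illustration of the two-level expansion, not a replacement for the SQL; `find_items` is a hypothetical helper, and the case-insensitive substring match is an assumption about the intended search.

```python
import json


def find_items(document: str, name_part: str):
    """Flatten groups -> items and keep the items whose name contains name_part."""
    needle = name_part.lower()
    return [
        (item["name"], item["price"])
        for group in json.loads(document)    # first expansion: top-level array
        for item in group.get("items", [])   # second expansion: each group's items
        if needle in item["name"].lower()
    ]


doc = """
[
  {"name": "First Things",
   "items": [{"name": "Foo Bar Item 1", "price": "10.00"},
             {"name": "Foo Item 2", "price": "20.00"}]},
  {"name": "Second Things",
   "items": [{"name": "Bar Item 3", "price": "15.00"}]}
]
"""

print(find_items(doc, "bar"))
# [('Foo Bar Item 1', '10.00'), ('Bar Item 3', '15.00')]
```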

Correct pagination withCriteria while looking at many-to-one table (Grails)

There is a paginated search for Product table. The search looks at Product objects and also at ProductParam objects.
The problem is there are duplicate Products in search result: having 25 items per page, two thirds are duplicated Products.
If I apply the resultTransformer org.hibernate.Criteria.DISTINCT_ROOT_ENTITY in the criteria builder, each page contains fewer than 25 items (about 6).
In both cases, the search is broken. Could that issue be solved? (Without rewriting the code completely.)
class Product {
String name
static hasMany = [ params: ProductParam ]
}
class ProductParam {
String key
String value
static belongsTo = [ product: Product ]
}
HibernateCriteriaBuilder criteriaBuilder = Product.createCriteria()
PagedResultList results = criteriaBuilder.list(max: 25, offset: offset) {
or {
// searching in Product
ilike 'name', "%${query}%"
// searching in ProductParam
createAlias('params', 'pp')
ilike 'pp.value', "%${query}%"
}
//resultTransformer org.hibernate.Criteria.DISTINCT_ROOT_ENTITY
}
Grails 2.2.0, Postgres
Your code looks a bit overcomplicated :)
I'd put it like that:
def results = Product.withCriteria{
projections{ distinct 'id' }
or{
eq 'name', "%${query}%" // do you really mean *eq* here, not *ilike*?
params{
ilike 'value', "%${query}%"
}
}
maxResults 25
firstResult offset
}
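The reasoning behind selecting distinct ids can be sketched in Python: if paging is applied to the joined rows and duplicates are removed afterwards, pages come back short, whereas removing duplicates before paging fills each page. The list of joined ids below is made-up sample data, and both functions are hypothetical illustrations rather than anything Grails or Hibernate provides.

```python
def page_after_join(joined_ids, offset, limit):
    """Page the joined rows first, then drop duplicates: pages come back short."""
    page = joined_ids[offset:offset + limit]
    return list(dict.fromkeys(page))


def page_distinct_ids(joined_ids, offset, limit):
    """Drop duplicates first, then page: each page holds up to `limit` products."""
    distinct = list(dict.fromkeys(joined_ids))
    return distinct[offset:offset + limit]


# Hypothetical join output: a product id repeats once per matching ProductParam.
joined = [1, 1, 1, 2, 2, 3, 4, 4, 5, 6]

print(page_after_join(joined, 0, 4))    # [1, 2]
print(page_distinct_ids(joined, 0, 4))  # [1, 2, 3, 4]
```

The ids returned for a page can then be used to load the full Product objects in a second step.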
