I am still rather new to Neo4j, Cypher and programming in general.
Is there a way to access the output posted below, i.e. the "count" value for every "item" (which has to be the pair), as well as the "item" values themselves? I need how often each pair of neighboring nodes occurs not just as information, but as values I can work with further in order to adjust my graph.
My last lines of code (in the preceding lines I just ordered the nodes sequentially):
...
WITH apoc.coll.pairs(a) as pairsOfa
WITH apoc.coll.frequencies(pairsOfa) AS giveBackFrequencyOfPairsOfa
UNWIND giveBackFrequencyOfPairsOfa AS x
WITH DISTINCT x
RETURN x
Output from the Neo4j Browser that I need to work with:
"x"
│{"count":1,"item":[{"aName":"Rob","time":1},{"aName":"Edwin","time":2}]},{"count":4,"item":[{"aName":"Edwin","time":2},{"aName":"Celesta","time":3}]}
...
Based on your code, your result should contain multiple x records (not a single record, as implied by the "output" provided in your question). Here is an example of what I would expect:
╒══════════════════════════════════════════════════════════════════════╕
│"x" │
╞══════════════════════════════════════════════════════════════════════╡
│{"count":1,"item":[{"aName":"Rob","time":1},{"aName":"Edwin","time":2}│
│]} │
├──────────────────────────────────────────────────────────────────────┤
│{"count":1,"item":[{"aName":"Edwin","time":2},{"aName":"Celesta","time│
│":3}]} │
└──────────────────────────────────────────────────────────────────────┘
If that is true, then you can just access the count and item properties of each x directly via x.count and x.item. To get each value within an item, you could use x.item[0] and x.item[1].
Asides: you probably want to use apoc.coll.pairsMin instead of apoc.coll.pairs, to avoid the spurious final "pair" (whose second element is null) that apoc.coll.pairs appends for the last value in the list. Also, you probably do not need the DISTINCT step.
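To make the shape of the data concrete, here is a minimal pure-Python sketch of the list handling described above. It is not Cypher and not the APOC implementation; the helper names pairs, pairs_min and frequencies are hypothetical stand-ins that only mimic the documented results of apoc.coll.pairs, apoc.coll.pairsMin and apoc.coll.frequencies, and plain names are used instead of the node maps from the question.

```python
from collections import Counter


def pairs(values):
    """Mimic apoc.coll.pairs: adjacent pairs plus a final [last, None]."""
    return [[a, b] for a, b in zip(values, values[1:] + [None])]


def pairs_min(values):
    """Mimic apoc.coll.pairsMin: adjacent pairs only, no trailing None pair."""
    return [[a, b] for a, b in zip(values, values[1:])]


def frequencies(items):
    """Mimic apoc.coll.frequencies: one {"count", "item"} map per distinct item."""
    counts = Counter(tuple(item) for item in items)  # lists are unhashable
    return [{"count": n, "item": list(item)} for item, n in counts.items()]


# Hypothetical ordered sequence of names (stand-in for the ordered nodes).
names = ["Rob", "Edwin", "Celesta", "Edwin", "Celesta"]

for x in frequencies(pairs_min(names)):
    # Same access pattern as x.count, x.item[0] and x.item[1] in Cypher.
    print(x["count"], x["item"][0], x["item"][1])
```

Each x here plays the role of one record after the UNWIND, so its count and both halves of its pair are ordinary values that can be passed on to later steps.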
Related
Recently I have been experimenting with Neo4j. I like the idea, but I am facing a problem that I never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
Then, to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I note that these elements are not returned in insertion order (as ordered by the id function).
In fact, the ids of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto-incrementing id, but its mechanism seems a little strange. How do I return items in insertion order? I thought about creating a timestamp attribute for each node, but I don't think that is the best choice.
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequence numbers as you need just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
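As a rough illustration of the bookkeeping only, the following pure-Python sketch keeps one counter per scope and then sorts by the assigned number. It is an in-memory model under stated assumptions, not the AutoInc code and not a substitute for the Cypher above; SequenceStore and next_value are hypothetical names, and the "Person" scope is borrowed from the question.

```python
class SequenceStore:
    """In-memory stand-in for the Sequence nodes, keyed by scope name."""

    def __init__(self):
        self._last = {}  # scope name -> last sequence number handed out

    def next_value(self, scope):
        # Same arithmetic as COALESCE(n.Value, 0) + 1 in the Cypher above.
        value = self._last.get(scope, 0) + 1
        self._last[scope] = value
        return value


store = SequenceStore()

# Assign a sequence number at creation time.
people = [
    {"name": name, "UniqueId": store.next_value("Person")}
    for name in ["Marc", "John", "Paul", "Steve"]
]

# Ordering by the application-managed number reproduces insertion order.
in_insertion_order = [p["name"] for p in sorted(people, key=lambda p: p["UniqueId"])]
print(in_insertion_order)
```

The point of the design is that ordering relies on a property the application controls rather than on the internal id.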
The Neo4j id is maintained internally, and your business code should not depend on it.
It is generally auto-incremented, but after a delete operation Neo4j may reuse the deleted id, according to the reuse policy of the Neo4j server.
I have many relationship types in the database. How do I count relationships by each type without using apoc?
Solution
MATCH ()-[relationship]->()
RETURN TYPE(relationship) AS type, COUNT(relationship) AS amount
ORDER BY amount DESC;
The first line specifies the pattern to define the relationship variable, which is used to determine type and amount in line two.
Example result
╒══════════════╤════════╕
│"type" │"amount"│
╞══════════════╪════════╡
│"BELONGS_TO" │1234567 │
├──────────────┼────────┤
│"IS_PART_OF" │947227 │
├──────────────┼────────┤
│"CONTAINS" │432552 │
├──────────────┼────────┤
│"HOLDS" │4 │
└──────────────┴────────┘
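For readers who want to see the grouping spelled out, here is a small pure-Python sketch of what the query computes: one count per relationship type, sorted by descending count. The relationship triples and the function name count_by_type are made up for illustration and do not come from any real database.

```python
from collections import Counter


def count_by_type(relationships):
    """Count (start, type, end) triples per type, largest count first."""
    counts = Counter(rel_type for _start, rel_type, _end in relationships)
    return counts.most_common()


# Hypothetical relationships as (start node, type, end node) triples.
relationships = [
    ("a", "BELONGS_TO", "b"),
    ("c", "BELONGS_TO", "b"),
    ("d", "BELONGS_TO", "b"),
    ("b", "CONTAINS", "a"),
    ("b", "CONTAINS", "c"),
    ("e", "HOLDS", "a"),
]

for rel_type, amount in count_by_type(relationships):
    print(rel_type, amount)
```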
There's also a built-in procedure in 3.5.x that you can use to retrieve counts, but it takes a bit of filtering to get down to the ones you are interested in:
CALL db.stats.retrieve('GRAPH COUNTS') YIELD data
UNWIND [data IN data.relationships WHERE NOT exists(data.startLabel) AND NOT exists(data.endLabel)] as relCount
RETURN coalesce(relCount.relationshipType, 'all') as relationshipType, relCount.count as count
Using Neo4j.
I would like to add an integer to the values already stored in a property of several relationships, which I match this way:
MATCH x=(()-[y]->(s:SOL{PRB:"Taking time"})) SET y.points=+2
But it doesn't add anything; it just replaces the value I want to increment with 2.
To achieve this use
SET y.points = y.points + 2
From your original question it looks like you were trying to use the addition assignment operator, which exists in lots of languages (e.g. Python, TypeScript/JavaScript, C#, etc.). However, in Cypher += is a little different: it is designed to add or update properties on entire nodes or relationships based on a map.
If you had a parameter like the below (copy this into the neo4j browser to create a param).
:param someMapping: {a:1, b:2}
The query below would create a property b on the node with value 2, and set the value of property a on that node to 1.
MATCH (n:SomeLabel) WHERE n.a = 0
SET n += $someMapping
RETURN n
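The difference between the two operations can be sketched in plain Python, treating a node's properties as a dictionary. This is only a model of the documented behaviour of SET n += map (existing keys overwritten, new keys added, other keys left alone, and a null value removing the property); the function name set_plus_equals and the sample property values are hypothetical.

```python
def set_plus_equals(props, mapping):
    """Mimic SET n += mapping on a dict of properties; returns a new dict."""
    result = dict(props)
    for key, value in mapping.items():
        if value is None:
            result.pop(key, None)  # a null value in the map removes the property
        else:
            result[key] = value  # add a new property or overwrite an existing one
    return result


# Map-based update, as in SET n += $someMapping with {a: 1, b: 2}.
node = {"a": 0, "name": "example"}
updated = set_plus_equals(node, {"a": 1, "b": 2})
print(updated)  # {'a': 1, 'name': 'example', 'b': 2}

# Numeric increment, as in SET y.points = y.points + 2.
relationship = {"points": 5}
relationship["points"] = relationship["points"] + 2
print(relationship["points"])  # 7
```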
I can query, using Cypher in Neo4j, the Panama database for the countries of three types of identity holders (my own term), namely Entities (companies), Officers (shareholders) and Intermediaries (middle companies), as three attributes/columns. Each column has one or two entries separated by a semicolon (e.g. British Virgin Islands;Russia). We want to merge the countries in these columns into a unique set of countries and then obtain the number of countries as a new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant parts are the SET statement in the 7th line and the DISTINCT count in the last line. The code produces an error that makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it probably means we should use COLLECT, but we tried that as well and it shows the same error with 'u' and 'n' swapped. Please help us obtain the output we want; it would make our job a whole lot easier. Thanks in advance!
EDIT: Since I had not defined the variable, as pointed out by @Cybersam, I tried the CREATE command as follows, but it shows the error "Invalid input 'R':" for the RETURN command. This is unfathomable to me. Help is really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are new and are the ones in question.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)- [:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.
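Since the stated goal was a unique set of countries across the three columns, note that splitting the concatenated string keeps duplicates. The following pure-Python sketch shows the intended split-and-deduplicate logic on made-up values; unique_countries is a hypothetical helper, and skipping missing columns is an assumption of this sketch rather than something the query above does.

```python
def unique_countries(*columns):
    """Split semicolon-separated country strings and keep each country once."""
    seen = []
    for column in columns:
        if not column:
            continue  # assumption: a missing or empty column contributes nothing
        for country in column.split(";"):
            country = country.strip()
            if country and country not in seen:
                seen.append(country)  # preserve first-seen order
    return seen


# Hypothetical values for the Entity, Officer and Intermediary columns.
countries = unique_countries("Belize", "British Virgin Islands;Russia", "Russia;Belize")
print(countries, len(countries))
```

With these inputs the plain split would yield five entries, while the deduplicated list has three, which is the number the question appears to be after.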
I have nodes within the graph with a property pathway storing an array of values ranging from
path:ko00030
path:ko00010
.
.
path:koXXXXX
As an example (I'm going to post it in the batch import format: https://github.com/jexp/batch-import/tree/20)
ko:string:koid name definition l:label pathway:string_array pathway.name:string_array
ko:K00001 E1.1.1.1, adh alcohol dehydrogenase [EC:1.1.1.1] ko path:ko00010|path:ko00071|path:ko00350|path:ko00625|path:ko00626|path:ko00641|path:ko00830
Subsequent nodes might have a different combination of pathway values.
How do I query using Cypher to retrieve all nodes with path:ko00010 in pathway?
The closest I've gotten is using the solution provided for a different problem:
How to check array property in neo4j?
match (n:ko)--cpd
Where has(n.pathway) and all ( m in n.pathway where m in ["path:ko00010"])
return n,cpd;
But here only nodes whose pathways match the provided list exactly are returned.
I.e. if I query for path:ko00010 as in the example above, I can only retrieve nodes holding path:ko00010 as the only element in the pathway property, and not nodes containing path:ko00010 together with other path:koXXXXX values.
In your query, the ALL predicate ranges over all the values in the property array, meaning that your query only returns those cases where every value in the pathway property array is found in the literal array ["path:ko00010"]. If I understand you correctly, you want the opposite: to test that all values in the literal array ["path:ko00010"] are found in the property array pathway. If that is indeed what you want, you can just switch their places; your WHERE clause will then be
WHERE HAS(n.pathway) AND ALL (m IN ["path:ko00010"] WHERE m IN n.pathway)
It is not strictly correct to say that your query only matches cases where the array you ask for and the property array are exactly the same. You could have had more than one value in the literal array, something like ["path:ko00010","path:ko00020"], and nodes with only one of those values in their pathway array would also have matched, as long as all values in the property array could be found in the literal array. Conversely, with the altered WHERE filter that I've suggested, the query will match any node that has all of the values of the literal array in its pathway property.
If you want to filter the matched patterns with an array of values where all of them have to be present, this is good. In your example you only use one value, however, and for those queries there is no reason to use an array and the ALL predicate. You can simply do
WHERE HAS(n.pathway) and "path:ko00010" IN n.pathway
If in some context you want to include results where any of a set of values are found in the pathway property array you can just switch from ALL to ANY
WHERE HAS(n.pathway) AND ANY (m IN ["path:ko00010","path:ko00020"] WHERE m IN n.pathway)
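The four filters discussed above differ only in which list is iterated and which is searched. A pure-Python sketch with Python's all, any and in makes the contrast easy to check by hand; it models the predicate logic only, the function names are hypothetical, and the sample pathway values are taken from the question's example node.

```python
def original_filter(pathway, wanted):
    """ALL (m IN n.pathway WHERE m IN wanted): every stored value must be wanted."""
    return all(m in wanted for m in pathway)


def switched_filter(pathway, wanted):
    """ALL (m IN wanted WHERE m IN n.pathway): every wanted value must be stored."""
    return all(m in pathway for m in wanted)


def any_filter(pathway, wanted):
    """ANY (m IN wanted WHERE m IN n.pathway): at least one wanted value is stored."""
    return any(m in pathway for m in wanted)


pathway = ["path:ko00010", "path:ko00071", "path:ko00350"]

print(original_filter(pathway, ["path:ko00010"]))  # False: node has extra values
print(switched_filter(pathway, ["path:ko00010"]))  # True
print("path:ko00010" in pathway)                   # True: the single-value form
print(any_filter(pathway, ["path:ko00010", "path:ko00020"]))       # True
print(switched_filter(pathway, ["path:ko00010", "path:ko00020"]))  # False
```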
Also, you probably don't need to check for the presence of the pathway property. Unless you have some special use for it, you should be fine without the HAS(n.pathway).
And once you've got the queries working right, try to switch out literal strings and arrays for parameters!
WHERE {value} IN n.pathway
// or
WHERE ALL (m IN {value_array} WHERE m IN n.pathway)
// or
WHERE ANY (m IN {value_array} WHERE m IN n.pathway)