How to concatenate three columns into one and obtain count of unique entries among them using Cypher neo4j? - neo4j

I can query using Cypher in Neo4j from the Panama database the countries of three types of identity holders (I define that term) namely Entities (companies), officers (shareholders) and Intermediaries (middle companies) as three attributes/columns. Each column has single or double entries separated by colon (eg: British Virgin Islands;Russia). We want to concatenate the countries in these columns into a unique set of countries and hence obtain the count of the number of countries as new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant part is the SET statement in the 7th line and the DISTINCT count in the last line. The code shows error which makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means to use COLLECT probably but we tried that as well and it shows the error vice-versa'd between 'u' and 'n'. Please help us obtain the output that we want, it makes our job hell lot easy. Thanks in advance!
EDIT: Considering I didn't define variable as suggested by #Cybersam, I tried the command CREATE as following but it shows the error "Invalid input 'R':" for the command RETURN. This is unfathomable for me. Help really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the ones new and to be in examination.

First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)- [:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.

Related

(Neo4j 3.5) Executing query with two parts and the second one consists on a group of matches where at least one needs to be fulfilled

we are trying to run a cypher query but we are not able to get the results we want.
It is important to note that we cannot make this work with subqueries because we are using Neo4j 3.5 and in this version, they are still not available.
The problem is that we have two parts for a query, the first one will fix the variables for the second part, and the second part consists of multiple matches, and it has to get the results of the previous query and if at least one of the matches return a result for a row, this row is not filtered out, else if none return result it is discarded.
More specifically, the query we are trying to run is similar to the following one:
//First part of the query where we want to fix variables with the match and where
MATCH (u:User)-[:ASSIGNED_TO]->(t:Task)-[:PENDING]->(ob:Object)<-[:HAS_OPEN_OBJECT]-(do:DataObject)<-[:ASOCIATED]-(:Module)-[:CAN_LIST]->(view:WidgetObject)
WHERE u.uid = 'user_uid'
AND view.uid = 'view_uid'
AND view.object_name = do.object_type
with do, t, ob
// In this second part of the query we want to maintain the variables of the previous part and if at least one matches the value should be returned
// we have tried with UNION but we will need pagination, but even with union it's not working
MATCH (ac:Action)<-[:ASOCIATED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE ac.name CONTAINS 'body'
WITH COLLECT({data_object_uid: do.uid}) as act_filter
MATCH (c:Comment)<-[:COMMENTED]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE c.body CONTAINS 'body'
WITH act_filter + COLLECT({data_object_uid: do.uid}) as comment_filter
MATCH (at:Attachment)<-[:HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE at.name CONTAINS 'body'
WITH comment_filter + COLLECT({data_object_uid: do.uid}) as attachment_filter
UNWIND attachment_filter as row
return row.data_object_uid
We are not sure if in the second part, the second and third matches are maintaining the same subset of results coming from the first part of the query.
A funny behavior we have found is that if we remove the last match we are getting results but if we add it, we are not getting any results. We do not understand this behavior because if the second match is returning results and they are stored in a variable after a collect, appending this to the next collected results should return something.
For example, if the second match returns as comment_filter [{data_object_uid: "343dienmd3-dasd"}] and the third match is not returning anything, after the concatenation in the WITH clause it should return the same thing, but the result is empty.
We need some light here, we don't know if we are close and we are making a stupid mistake or we are getting all wrong and we need to change the approach completely.
Since you do not know which of the three matches in the second part will yield results, I would try something along the lines below:
NOTE I used ASSOCIATED instead of ASOCIATED
MATCH (n)<-[:ASSOCIATED|COMMENTED|HAS_ATTACHMENT]-(t)-[rel:COMPLETED|PENDING]->(ob)<-[:HAS_OPEN_OBJECT|HAS_CLOSED_OBJECT]-(do)
WHERE
(n:Action AND n.name CONTAINS 'body')
OR
(n:Comment AND n.body CONTAINS 'body')
OR
(n:Attachment AND n.name CONTAINS 'body')
RETURN COLLECT(DISTINCT {data_object_uid: do.uid})

Resolve DGET function "More than one match found" error

I am trying to match a list of range with certain criteria in a google spreadsheet. I am using DGET function for the same. Everything is working fine but the problem comes when there are many entries that contain the whole string and I receive "More than one match found in DGET evaluation.".
For the better understanding look below:
Sheet "Form Responses 1":
B
-------
Ronald
Ronaldo
Ronaldinho
Rebarto
Matching sheet entries:
A
------
Ronald
Rebarto
Juhino
My Formula is:
=DGET('Form Responses 1'!B:H,"Date",{"Email Address","Logging In or Logging out ?","Date";A2,$B$1,$H$1})
Now the problem is Ronald is matching with "Ronald","Ronaldo" and "Ronaldinho" and I am receiving the error which says "multiple entries found".
How do we solve this?
I solved the problem by Concatenating a constant variable before and after the name. For example Ronaldo becomes mRonamdom and Ronald becomes mRonaldm. This makes the Names Unique and solves the problem.
If you don't want to modify the data but to fix the formula so it doesn't get confused with similar entries in your database parameter you can add a character to the criteria field of the dget function as shown below (I'm using an '=' sign concatenated to the value I want to match with in the database parameter)
=dget(database!$A$1:$B$11,$M$1,{"columnName";"="&F2})
where
A1:B11 is my database
M1 is the matching column name
and "="&F2 is the field with the caracter I chose that I want to match with to retrieve values from the matching database column, now even if the there are more than one matches found (becuase matching substrings"), the addition of the caracter contatenated with the matching value, should take care of the in-accurate error.

Auto increment id Neo4j to retrieve elements in insert order

Recently, I am experimenting Neo4j. I like the idea but I am facing a problem that I have never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
While to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I note that these elements are not returned respecting the query insertion order (through the id function).
In fact the id of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto incremental id but it seems a little strange his mechanism. How do I return items respecting the query insertion order? I thought about creating a timestamp attribute for each node but I don't think it's the best choice
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequence numbers you need just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
The id of Neo4j is maintained internally, which your business code should not depend on.
Generally it's auto incrementally, but if there is delete operation, you may reuse the deleted id according to the Reuse Policy of Neo4j Server.

GSheets - How to query a partial string

I am currently using this formula to get all the data from everyone whose first name is "Peter", but my problem is that if someone is called "Simon Peter" this data is gonna show up on the formula output.
=QUERY('Data'!1:1000,"select * where B contains 'Peter'")
I know that for the other formulas if I add an * to the String this issue is resolved. But in this situation for the QUERY formula the same logic do not applies.
Do someone knows the correct syntax or a workaround?
How about classic SQL syntax
=QUERY('Data'!1:1000,"select * where B like 'Peter %'")
The LIKE keyword allows use of wildcard % to represent characters relative to the known parts of the searched string.
See the query reference: developers.google.com/chart/interactive/docs/querylanguage You could split firstname and lastname into separate columns, then only search for firstnames exactly equal to 'Peter'. Though you may want to also check if lowercase/uppercase where lower(B) contains 'peter' or whitespaces are present in unexpected places (e.g., trim()). You could also search only for values that start with Peter by using starts with instead of contains, or a regular expression using matches. – Brian D
It seems that for my case using 'starts with' is a perfect fit. Thank you!

Neo4j variable-length pattern matching tunning

Query:
PROFILE
MATCH(node:Symptom) WHERE node.symptom =~ '.*adult male.*|.*151.*'
WITH node
MATCH (node)-[*1..2]-(result:Disease)
RETURN result
Profile:
enter image description here
Problems:
There are over 40 thousand "Symptom" nodes in the database, and the query is very slow because of the part - "[*1..2]".
It only took 4 seconds when length is 1, i.e "[*1]", but it will take about 30 seconds when length is 2, i.e "[*1..2]".
Is there any way to tune this query???
Firstly your query is using the regex operator, and it can't use indexes. You should use the CONTAINS operator instead :
MATCH (node:Symptom)
WHERE node.symptom CONTAINS 'adult male' OR node.symptom CONTAINS '151'
RETURN node
And you can create an index :CREATE INDEX ON :Symptom(symptom)
For the second part of your query, as it, there is nothing to do ... it's due to the complexity you are asking to do.
So to have better performances, you should think to :
put the relationship type on the pattern to reduce the number returned path : (node)-[*1..2:MY_REL_TYPE]-(result:Disease)
put the direction on the relationship on the pattern to reduce the number returned path : (node)-[*1..2:MY_REL_TYPE]->(result:Disease)
find an other way to reduce this complexity (filter on a property of the relationship , review your model, etc)
For your information, you can directly write your query in one step (ie. without the WITH, but in your case performances should be the same) :
MATCH (node:Symptom)-[*1..2]-(result:Disease)
WHERE node.symptom CONTAINS 'adult male' OR node.symptom CONTAINS '151'
RETURN result

Resources