Neo4j Cypher NULL values and descending sorting - neo4j

I have auditing fields on all of my entities:
createDate
updateDate
I always initialize createDate during entity creation, but updateDate can contain NULL until the first update.
I have to implement a sorting feature over these fields.
With createDate everything works fine, but with updateDate I have issues.
With a mixed set of NULLs and dates in updateDate, a descending sort puts the NULLs first, and that is not what I'm expecting here.
I understand that, according to the Neo4j documentation, this is the expected behavior: "When sorting the result set, null will always come at the end of the result set for ascending sorting, and first when doing descending sort." But I don't know how to implement proper sorting from the user's perspective, where the user sees the latest updated documents at the top of the list. Some time ago I even created a GitHub issue for this feature: https://github.com/opencypher/openCypher/issues/238
One workaround I can see here is to also populate updateDate together with createDate during entity creation, but I really dislike that solution.
Are there any other solutions to implement this properly?

You can try using the coalesce() function. It will return the first non-null value in the list of expressions passed to it.
MATCH (n:Node)
RETURN n
ORDER BY coalesce(n.updateDate, 0) DESC
EDIT:
From the comments: on the database level it is something like "updateDate": "2017-09-07T22:27:11.012Z", and on the SDN4 level it is a Java java.util.Date.
In this case you can replace the 0 with a date representing a start-of-time constant (such as "1970-01-01T00:00:00.000Z").
MATCH (n:Node)
RETURN n
ORDER BY coalesce(n.updateDate, "1970-01-01T00:00:00.000Z") DESC

I'd just use the createDate as the updateDate when updateDate IS NULL:
MATCH (n:Node)
RETURN n
ORDER BY coalesce(n.updateDate, n.createDate) DESC

You may want to consider storing your ISO 8601 timestamp strings as (millisecond) integers instead. That could make most queries that involve datetime manipulations more efficient (or even possible), and would also use up less DB space compared to the equivalent string.
One way to do that conversion is to use the APOC function apoc.date.parse. For example, this converts 2017-09-07T22:27:11.012Z to an integer (in millisecond units):
apoc.date.parse('2017-09-07T22:27:11.012Z', 'ms', "yyyy-MM-dd'T'HH:mm:ss.SSSX")
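A one-off migration along these lines could rewrite the existing string properties in place (a minimal sketch, assuming APOC is installed and that the label and property names are Node and updateDate; createDate can be handled the same way):
MATCH (n:Node)
WHERE n.updateDate IS NOT NULL
// Overwrite the ISO 8601 string with its epoch-millisecond integer equivalent
SET n.updateDate = apoc.date.parse(n.updateDate, 'ms', "yyyy-MM-dd'T'HH:mm:ss.SSSX")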
With this change to your data model, you could also initialize updateDate to 0 at node creation time. This would allow you to avoid having to use COALESCE(n.updateDate, 0) for sorting purposes (as suggested by @Bruno Peres), and the 0 value would serve as an indication that the node was never updated.
(But the drawback would be that all nodes would have an updateDate property, even the ones that were never updated.)
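For illustration, creation and sorting under that model could look like this (a minimal sketch; the Node label and the property names are assumptions):
// At creation time, store both audit fields as millisecond integers
CREATE (n:Node {createDate: timestamp(), updateDate: 0})
// Sorting then needs no coalesce(); never-updated nodes (updateDate = 0) sort last
MATCH (n:Node)
RETURN n
ORDER BY n.updateDate DESC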

Related

Datetime comparison query doesn't return any results

I'm trying to get a simple date-time comparison to work, but the query doesn't return any results.
The query is
MATCH (n:Event) WHERE n.start_datetime > datetime("2019-06-01T18:40:32.142+0100") RETURN n.start_datetime
According to this documentation page, this type of comparison should work. I've also tried creating the datetime object explicitly, for instance with datetime({year: 2019, month: 7}).
I've checked that start_datetime is in fact well formatted, for example by checking that start_datetime.year returns the correct value, and couldn't find any error.
Given that all the records in the database are from 2021, the above query should return every event, yet it returns nothing.
Doing the query with only the year comparison, instead of the full datetime comparison, works:
MATCH (n:Event) WHERE n.start_datetime.year > datetime("2019-06-01T18:40:32.142+0100").year RETURN n.start_datetime
Double-check the data type of start_datetime. It can be in either epoch seconds or epoch milliseconds. You need to convert the epoch value to a datetime, so that both sides of the comparison have the same data type. The reason your 2nd query works (.year) is that .year returns an integer value.
Run below to get samples:
MATCH (n:Event)
RETURN distinct n.start_datetime LIMIT 5
If you see that it is 10 digits, it is in epoch seconds. If so, run the query below:
MATCH (n:Event)
WHERE n.start_datetime is not null
AND datetime({epochSeconds: n.start_datetime}) > datetime("2019-06-01T18:40:32.142+0100")
RETURN n.start_datetime
LIMIT 25
It turns out the error was due to the timezone. Neo4j had saved the properties as LocalDateTime, which apparently can't be compared to ZonedDateTime.
I used py2neo for most of the nodes management, and the solution was to give a specific timezone to the python property. This was done (in my case) using:
datetime.datetime.fromtimestamp(kwargs["end"], pytz.UTC)
After that, I was able to do the comparisons.
Hope this saves a couple of hours for future developers.
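If rewriting the stored values is not an option, a pure-Cypher workaround is to attach a time zone to the stored value at query time (a sketch, assuming the LocalDateTime values actually represent UTC):
MATCH (n:Event)
WHERE n.start_datetime IS NOT NULL
  // Convert the stored LocalDateTime to a zoned datetime before comparing
  AND datetime({datetime: n.start_datetime, timezone: 'UTC'}) > datetime("2019-06-01T18:40:32.142+0100")
RETURN n.start_datetime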

How to filter out non-null path between nodes in Neo4J/Cypher

My current graph monitors board members at a company through time.
However, I'm only interested in currently employed directors. Director nodes connect to company nodes through an employment relationship that includes an end date (r.to) when the director is no longer employed at the firm. If the director is currently employed, there is no end date (shown as null in the picture below). Therefore, I would like to filter for paths that do not contain an end date. I am not sure whether the value is an empty string, a null value, or some other type, so I've been trying different approaches without much success. Thanks for any tips!
Current query:
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to IS NULL
RETURN c,d,c2
Unless the response from the Neo4j Browser was edited, it looks like the value of r.to is not null or empty, but the string "None".
This query will help verify if this is the case:
MATCH (d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
RETURN DISTINCT r.to ORDER BY r.to DESC
Absence of the property will show a null in the tabular response. Any other value is a real value of that property. If None shows up, then your query would be
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to = "None"
RETURN c,d,c2
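If the literal string "None" does show up, another option is a one-off cleanup so that the more natural IS NULL check works afterwards (a sketch; the pattern mirrors the query above):
// Remove the placeholder string so the property is truly absent for current directors
MATCH (:Company)-[r:MANAGED]-(:Director)
WHERE r.to = "None"
REMOVE r.to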

Auto increment id Neo4j to retrieve elements in insert order

Recently, I have been experimenting with Neo4j. I like the idea, but I am facing a problem that I have never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
And to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I notice that these elements are not returned in insertion order (when ordering by the id function).
In fact, the ids of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto-incremented id, but its mechanism seems a little strange. How do I return items in insertion order? I thought about adding a timestamp attribute to each node, but I don't think that's the best choice.
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours, we maintain sequence numbers in key/value pair nodes, where Scope is the application name given to the sequence number range and Value is the last sequence number used. When we generate a node of a given type, such as Product, we increment the sequence number and assign it to the new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequences as you need, just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
Neo4j's id is maintained internally, and your business code should not depend on it.
Generally it is auto-incremented, but if there are delete operations, deleted ids may be reused according to the id reuse policy of the Neo4j server.
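If strict gap-free sequencing is not required, the timestamp idea mentioned in the question also works for ordering by insertion time (a minimal sketch; note that nodes created within the same millisecond can tie):
// Store the creation time (epoch milliseconds) on each node
CREATE (p:Person {name: "Marc", created: timestamp()})
// Read back in insertion order
MATCH (p:Person)
RETURN p.name
ORDER BY p.created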

Neo4j 3.0: retrieve data for a certain period

I'm using Neo4j 3.0 and it seems that the HAS function was removed.
The type of my relationship is a date, and I'm looking to retrieve all relationships between two dates such that my property "a" is greater than a certain value.
How can I test this using Neo4j?
Thank you.
[EDITED to add info from comments]
I have tried this:
MATCH p=(n:origin)-[r]->()
WHERE r>'2015-01'
RETURN AVG(r.amount) as totalamout;
I created a relationship per date, and each one has an amount property. I am looking to compute the average amount for a certain period, for example the average amount since 2015-04.
To answer the issue raised by your first sentence: in Neo4j 3.x, the HAS() function was replaced by EXISTS().
[UPDATE 1]
This version of your query should work:
MATCH p=(n:origin)-[r]->()
WHERE TYPE(r) > '2015-01'
RETURN AVG(r.amount) as totalamout;
However, it is a bad idea to give your relationships different types based on a date. It is better to just use a date property.
[UPDATE 2]
If you changed your data model to add a date property to your relationships (to which I will give the type FOO), then the following query will find the average amount, per p, of all the relationships whose date is after 2015-01 (assuming that all your dates follow the same strict YYYY-MM pattern):
MATCH p=(n:origin)-[r:FOO]->()
WHERE r.date > '2015-01'
RETURN p, AVG(r.amount) as avg_amout;
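If you adopt that model, a one-off migration along these lines could copy the date out of the relationship type into a property (a pure-Cypher sketch; the FOO type and the :origin start label are assumptions carried over from the answer above, and it assumes every outgoing relationship from :origin is one of the date-typed ones):
// Relationship types cannot be renamed in place, so create a replacement and delete the original
MATCH (n:origin)-[r]->(m)
CREATE (n)-[:FOO {date: TYPE(r), amount: r.amount}]->(m)
DELETE r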

Ascending sort order Index versus descending sort order index when performing OrderBy

I am working on an ASP.NET MVC web application, and I am using SQL Server 2008 R2 + Entity Framework.
On the SQL Server side I have added a unique index on every column that might be ordered by. For example, I have created a unique index on the Tag column and defined the index sort order as ascending. Now some queries inside my application order Tag ascending while other queries order Tag descending, as follows:
LatestTechnology = tms.Technologies.Where(a => !a.IsDeleted && a.IsCompleted).OrderByDescending(a => a.Tag).Take(pagesize).ToList();
TechnologyList = tms.Technologies.Where(a => !a.IsDeleted && a.IsCompleted).OrderBy(a => a.Tag).Take(pagesize).ToList();
So my question is whether both OrderByDescending(a => a.Tag) and OrderBy(a => a.Tag) can benefit from the ascending unique index on the Tag column, or whether I should define two unique indexes, one with ascending sort order and the other with descending sort order?
Thanks
EDIT
The following query:
LatestTechnology = tms.Technologies.Where(a => !a.IsDeleted && a.IsCompleted).OrderByDescending(a => a.Tag).Take(pagesize).ToList();
will generate the following SQL statement, as shown by SQL Server Profiler:
SELECT TOP (15)
[Extent1].[TechnologyID] AS [TechnologyID],
[Extent1].[Tag] AS [Tag],
[Extent1].[IsDeleted] AS [IsDeleted],
[Extent1].[timestamp] AS [timestamp],
[Extent1].[TypeID] AS [TypeID],
[Extent1].[StartDate] AS [StartDate],
[Extent1].[IT360ID] AS [IT360ID],
[Extent1].[IsCompleted] AS [IsCompleted]
FROM [dbo].[Technology] AS [Extent1]
WHERE ([Extent1].[IsDeleted] <> cast(1 as bit)) AND ([Extent1].[IsCompleted] = 1)
ORDER BY [Extent1].[Tag] DESC
To answer your question:
So my question is whether both OrderByDescending(a => a.Tag) and OrderBy(a => a.Tag) can benefit from the ascending unique index on the Tag column?
Yes, SQL Server can read an index in both directions: in index-definition order or in the exact opposite direction.
However, from your intro I suspect that you still have a wrong impression of how indexing works for ORDER BY. If you have both a WHERE clause and an ORDER BY clause, you must make sure to have a single index that covers both clauses! It does not help to have one index for the WHERE clause (like on IsDeleted and IsCompleted, whatever those are in your example) and another index on Tag. You need a single index that first has the columns of the WHERE clause followed by the columns of the ORDER BY clause (a multi-column index).
It can be tricky to make it work correctly, but it's worth the effort, especially if you are only fetching the first few rows (like in your example).
If it doesn't work out right away, please have a look at this:
http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by
It is generally best to show the actual SQL query, not the .NET source code, when asking for performance advice. Then I could tell you exactly which index to create. At the moment I'm unsure about IsDeleted and IsCompleted: are these table columns, or expressions computed from other columns?
EDIT (after you added the SQL query)
There are two ways to make your query work as an indexed top-N query:
http://sqlfiddle.com/#!6/260fb/4
The first option is a regular index on the columns from the WHERE clause followed by those from the ORDER BY clause. However, because your query uses the filter IsDeleted <> cast(1 as bit), it cannot use the index in an order-preserving way. If, however, you re-phrase the query so that it reads IsDeleted = cast(0 as bit), then it works. Please look at the fiddle; I've prepared everything there. Yes, SQL Server could be smart enough to know that, but it seems it isn't.
I don't know how to tweak EF to produce the query in the way described above, sorry.
However, there is a second option using a so-called filtered index, that is, an index that only contains a subset of the table rows. It's also in the SQL Fiddle. Here it is important that you add the WHERE clause to the index definition in the very same way as it appears in your query.
Both approaches still work if you change DESC to ASC.
The important part is that the execution plan doesn't show a sort operation. You can also verify this in SQL Fiddle (click on 'View execution plan').
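For reference, the two index options sketched in the fiddle look roughly like this (the index names and exact column list are assumptions based on the query above):
-- Option 1: regular multi-column index, WHERE columns first, ORDER BY column last.
-- Order-preserving only if the query filters with equalities such as
-- IsDeleted = cast(0 as bit) AND IsCompleted = 1.
CREATE UNIQUE INDEX IX_Technology_Deleted_Completed_Tag
    ON dbo.Technology (IsDeleted, IsCompleted, Tag);
-- Option 2: filtered index whose WHERE clause mirrors the query's filter.
CREATE UNIQUE INDEX IX_Technology_Active_Tag
    ON dbo.Technology (Tag)
    WHERE IsDeleted = 0 AND IsCompleted = 1;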
