I am attempting to query the DB for the 10 most recently created nodes. I have attempted this:
MATCH (a:Post) RETURN a ORDER BY TIMESTAMP() LIMIT 10
I have also tried this
MATCH (a:Post) RETURN a ORDER BY TIMESTAMP() DESC LIMIT 10
If I create nodes with contents {one, two, three} in that order, both queries return the nodes in the order one, two, three. Any thoughts or ideas as to why this happens?
TIMESTAMP() is a scalar function that returns the current time, in milliseconds since the Unix epoch, at query execution. It has nothing to do with the time the nodes or relationships were created.
That is why you get the exact same results for both queries. You're simply ordering by the current time, which doesn't make a lot of sense because the current time is exactly the same for all records.
Neo4j does not store any creation timestamps by default. You need to store them as an additional property, if it's important to you. This is where you should use the scalar function.
CREATE (:Post {created_at: TIMESTAMP()})
Once that is done, match and order like this.
MATCH (a:Post) RETURN a ORDER BY a.created_at DESC LIMIT 10
Note that you're ordering by the created_at property, and not the TIMESTAMP() scalar function.
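To illustrate the difference outside the database, here is a minimal pure-Python sketch. The posts list, QUERY_TIME and the helper names are hypothetical stand-ins, not anything Neo4j provides. Python's sort is stable, which is why the constant-key case reproduces the insertion order seen in the question; Neo4j makes no such ordering promise, so treat this only as a model of why a constant sort key cannot rank rows.

```python
from operator import itemgetter

# Hypothetical in-memory stand-ins for :Post nodes, listed in creation order.
posts = [
    {"content": "one", "created_at": 1000},
    {"content": "two", "created_at": 2000},
    {"content": "three", "created_at": 3000},
]

# Stand-in for TIMESTAMP(): a single value shared by every row of the query.
QUERY_TIME = 9999


def order_by_query_time(rows, descending=False):
    # Every row gets the same sort key, so there is nothing to rank by.
    return sorted(rows, key=lambda row: QUERY_TIME, reverse=descending)


def most_recent(rows, limit):
    # Sort on the stored per-row property, newest first, then truncate.
    return sorted(rows, key=itemgetter("created_at"), reverse=True)[:limit]


def contents(rows):
    return [row["content"] for row in rows]


if __name__ == "__main__":
    print(contents(order_by_query_time(posts)))
    print(contents(order_by_query_time(posts, descending=True)))
    print(contents(most_recent(posts, 2)))
```

Ascending and descending order by the constant give the same sequence, while ordering by the stored created_at value actually puts the newest posts first.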
Related
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
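The two counts are easy to confuse, so here is a small pure-Python sketch of the difference. The rows list is made-up sample data standing in for the (m.name, n.name) values produced by the MATCH, one tuple per matched path; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (m.name, n.name) rows, one per matched path.
rows = [
    ("app1", "unitA"),
    ("app1", "unitA"),
    ("app1", "schemaB"),
    ("app2", "unitA"),
]


def count_per_pair(pairs):
    # Models: RETURN m.name, n.name, COUNT(*) AS cnt
    # The pair is the grouping key; the value is how often it occurs.
    return Counter(pairs)


def count_distinct_pairs(pairs):
    # Models: WITH DISTINCT m.name AS n1, n.name AS n2 RETURN COUNT(*) AS cnt
    # Duplicates are dropped first, then the remaining rows are counted.
    return len(set(pairs))


if __name__ == "__main__":
    for pair, cnt in count_per_pair(rows).items():
        print(pair, cnt)
    print(count_distinct_pairs(rows))
```

The first function yields one count per pair; the second yields a single number for the whole result.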
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case, the first example there, count(*), which is used to return a count of each returned item, should work.
I have a query that I am trying to execute. The query works, but there isn't an option to see this data in graph format. Instead the data is returned in table/text format.
When I simplify the query, the output is displayed in graph format, and I have no idea why.
This is the query that is giving me the issue:
MATCH (p:Person)-[hi:hasIdentity]->(i:Identity)
MATCH (j:Person)-[hi2:hasIdentity]->(i2:Identity)
MATCH (i)-[bl:Linked]->(i2)
WHERE NOT p=j
return DISTINCT(p.id), COUNT(DISTINCT(j))
LIMIT 5
Does anyone have any idea why that might be the case?
You'll need to return variables associated with nodes and/or relationships for it to display as a graph. As it is now you're returning properties of nodes (p.id), probably integers or strings. Try this return instead:
...
RETURN p, COUNT(DISTINCT j)
LIMIT 5
By the way, DISTINCT isn't a function, so there is no need for parentheses. Also, when a RETURN or WITH contains an aggregation, you don't need DISTINCT on that line: the non-aggregated variables act as the grouping key for the aggregation, so they are already distinct.
I have some sample tweets stored in Neo4j. The query below finds the top hashtags from a specific country. It is taking a lot of time because the time filter for status nodes is in the WHERE clause and is slowing the response. Is it possible to move this filter to the MATCH clause so that status nodes are filtered before relationships are found?
match (c:country{countryCode:"PK"})-[*0..4]->(s:status)-[*0..1]->(h:hashtag) where (s.createdAt >= datetime('2017-06-01T00:00:00') AND s.createdAt <= datetime('2017-06-01T23:59:59')) return h.name,count(h.name) as hCount order by hCount desc limit 100
thanks
As mentioned in my comment, whether a predicate for a property is in the MATCH clause or the WHERE clause shouldn't matter, as this is just syntactical sugar and is interpreted the same way by the query planner.
You can use PROFILE or EXPLAIN to see the query plan to see what it's doing. PROFILE will give you more information but will have to actually execute the query. You can attempt to use planner hints to force the planner to plan the match differently which may yield a better approach.
You will want to ensure you have an index on :status(createdAt).
You can also try altering your match a little, and moving the portion connecting to the country in question into your WHERE clause instead. Also it's a good idea to get the count based upon the hashtag node itself (assuming there's only one :hashtag node for a given name) so you can order and limit before you do property access:
MATCH (s:status)-[*0..1]->(h:hashtag)
WHERE (s.createdAt >= datetime('2017-06-01T00:00:00') AND s.createdAt <= datetime('2017-06-01T23:59:59'))
AND (:country{countryCode:"PK"})-[*0..4]->(s)
WITH h, count(h) as hCount
ORDER BY hCount DESC
LIMIT 100
RETURN h.name, hCount
Is there a way on Neo4j to fetch a list of all the new nodes created after a certain time? like a built in change-feed?
I know this could be done by traversing the entire graph and checking whether each node's date is greater than a threshold set beforehand.
However, this is not optimal at the very least and would not perform well on a 10 million node graph.
Is there a way to know if new nodes were added? (or relationships) some sort of change feed like a built in bloom filter?
If not, any ideas on getting a change feed every x minutes?
Have you tried an INDEX? With an index, your query performance will improve. Try creating an index on the property that holds the creation time of the nodes.
CREATE INDEX ON :Person(created_at)
Afterwards, when creating a node, you can use the timestamp() function and save the current timestamp in the created_at property of :Person nodes.
CREATE (:Person {name:'Jon', created_at: timestamp()})
CREATE (:Person {name:'Doe', created_at: timestamp()})
Then you can query normally by the created_at property of :Person nodes and the index will be used.
MATCH (p:Person)
WHERE p.created_at > 1502882338889 // given a timestamp...
RETURN p
Also, if you don't need all the nodes created after a given timestamp at the same time, you can paginate the query with SKIP and LIMIT and work through the data in pieces.
MATCH (p:Person)
WHERE p.created_at > 1502882338889 // given a timestamp...
RETURN p
ORDER BY p.created_at
SKIP 1
LIMIT 2
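For the "change feed every x minutes" part of the question, one possible approach is to remember the largest created_at value already seen and ask only for newer nodes on each poll. The sketch below models that idea in pure Python, with a sorted list standing in for the index on created_at. The function name poll_changes and the sample data are hypothetical, and this is not a built-in Neo4j feature.

```python
from bisect import bisect_right


def poll_changes(timestamps, names, last_seen):
    """Return names created strictly after last_seen, plus the new watermark.

    timestamps must be sorted ascending and aligned with names, which is
    roughly what an index on created_at provides to the database.
    """
    # Binary search instead of scanning every entry.
    start = bisect_right(timestamps, last_seen)
    fresh = names[start:]
    # Advance the watermark only if something new was found.
    watermark = timestamps[-1] if start < len(timestamps) else last_seen
    return fresh, watermark


if __name__ == "__main__":
    timestamps = [100, 200, 300]
    names = ["Jon", "Doe", "Ann"]
    fresh, watermark = poll_changes(timestamps, names, 150)
    print(fresh, watermark)
    # A second poll with the returned watermark finds nothing new.
    print(poll_changes(timestamps, names, watermark))
```

One caveat of this design: a node created in the same millisecond as the watermark, but after the poll ran, would be missed by the strict greater-than comparison.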
I have over 130M nodes of one type and 500K nodes of another type, and I am trying to create relationships between them as follows:
MATCH (p:person)
MATCH (f:food) WHERE f.name=p.likes
CREATE (p)-[l:likes]->(f)
The problem is that this creates 130M relationships, and I would like to do it in batches, similar to PERIODIC COMMIT when using LOAD CSV.
Is there such a functionality for my type of query?
Yes, there is. You'll need the APOC Procedures library installed (download here). You'll be using the apoc.periodic.commit() procedure from the Job Management section. From the documentation:
CALL apoc.periodic.commit(statement, params) - repeats a batch update
statement until it returns 0, this procedure is blocking
You'll be using this in combination with the LIMIT clause, passing the limit value as the params.
However, for best results, you'll want to make sure your join data (f.name, I think) has an index or a unique constraint to massively cut down on the time.
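To make the "repeat until it returns 0" behaviour concrete, here is a small pure-Python model of it. It is a sketch of the semantics described in the documentation excerpt, not APOC's implementation; link_batch, periodic_commit and the sample data are hypothetical. The foods_by_name dict plays the role of the index on the food name, turning each lookup into a direct hit instead of a scan.

```python
def link_batch(people, foods_by_name, likes, limit):
    """Link up to `limit` not-yet-linked people; return how many links were made."""
    pending = [
        p for p in people
        if p["likes"] is not None and p["name"] not in likes
    ][:limit]
    created = 0
    for person in pending:
        # Dict lookup stands in for an indexed match on the food name.
        if person["likes"] in foods_by_name:
            likes[person["name"]] = person["likes"]
            created += 1
    return created


def periodic_commit(people, foods_by_name, limit):
    """Re-run the batch until it reports 0 rows, counting non-empty batches."""
    likes = {}
    batches = 0
    while link_batch(people, foods_by_name, likes, limit) > 0:
        batches += 1  # each iteration models one committed transaction
    return likes, batches


if __name__ == "__main__":
    foods = {"pizza": {}, "sushi": {}}
    people = [
        {"name": "p1", "likes": "pizza"},
        {"name": "p2", "likes": "sushi"},
        {"name": "p3", "likes": "pizza"},
        {"name": "p4", "likes": None},
        {"name": "p5", "likes": "sushi"},
    ]
    print(periodic_commit(people, foods, limit=2))
```

In this model, a person whose food has no match stays pending forever, so a batch made up only of such people returns 0 and ends the loop early. Whether that matters for real data is worth checking before relying on this pattern.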
Here's how you might use it (assuming from your example that a person only likes one food, and that we should only apply this to :persons that don't already have the relationship set):
CALL apoc.periodic.commit("
MATCH (p:person)
WHERE p.likes IS NOT NULL
AND NOT (p)-[:likes]->(:food)
WITH p LIMIT {limit}
MATCH (f:food) WHERE p.likes = f.name
CREATE (p)-[:likes]->(f)
RETURN count(*)
", {limit: 10000})