Neo4j generate heatmap data from X and Y coordinates - neo4j

I need to generate heatmap data in CSV format, something like this:
X,Y,OCCURRENCES
269,697,41
199,493,8
125,318,2
205,526,24
261,572,2
My neo4j database has an entity called "Point" that contains a date, an X and a Y coordinate and it looks like this:
Point: {
"at": "2018-06-26T06:54:42.671141000+12:00",
"locationPlanX": 367,
"locationPlanY": 716
}
I have a query that gives the desired output. It works well with a few thousand points, but it starts to struggle with millions.
Query:
MATCH (point:Point)
WHERE datetime("2018-06-22T15:00:00.000000+12:00") <= point.at < datetime("2018-06-23T16:00:00.000000+12:00")
AND point.locationPlanX >= 0
AND point.locationPlanY >= 0
WITH point.locationPlanX as x, point.locationPlanY as y, COUNT(point) AS occurrences
RETURN x, y, occurrences
As I said before, the query works well for an hour of data, but it starts to struggle with days or weeks of data.
Is there anything else I can do to improve my query, or is there another way to do this?
UPDATE: The 3 properties in the node are indexed.

You should create an index on :Point(at):
CREATE INDEX ON :Point(at);
That would allow your query to avoid scanning through every Point node to find the ones with acceptable at values. This should greatly speed up your query.
Also, if it is not necessary to test locationPlanX and locationPlanY for non-negativity, eliminate those tests.
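To illustrate why an index on the timestamp helps, here is a minimal Python sketch by analogy. It is not how Neo4j implements indexes: a sorted list of timestamps stands in for the index, and a binary search locates the time window without touching points outside it. The function name `heatmap_counts`, the `at_index` list and the sample data are all hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from datetime import datetime, timedelta, timezone

def heatmap_counts(points, at_index, start, end):
    """Count (x, y) occurrences for points with start <= at < end.

    `points` must be sorted by timestamp, and `at_index` must hold the same
    timestamps in the same order; it plays the role of the index.
    """
    lo = bisect_left(at_index, start)  # first point with at >= start
    hi = bisect_left(at_index, end)    # first point with at >= end (exclusive)
    # Only the slice inside the window is visited, not the whole list.
    return Counter((x, y) for _, x, y in points[lo:hi] if x >= 0 and y >= 0)

# Hypothetical sample data using the +12:00 offset from the question.
tz = timezone(timedelta(hours=12))
points = sorted([
    (datetime(2018, 6, 22, 14, 0, tzinfo=tz), 125, 318),   # before the window
    (datetime(2018, 6, 22, 15, 30, tzinfo=tz), 269, 697),
    (datetime(2018, 6, 22, 16, 0, tzinfo=tz), 269, 697),
    (datetime(2018, 6, 22, 17, 0, tzinfo=tz), 199, 493),
    (datetime(2018, 6, 22, 18, 0, tzinfo=tz), -1, 40),     # negative X, skipped
    (datetime(2018, 6, 24, 9, 0, tzinfo=tz), 205, 526),    # after the window
])
at_index = [at for at, _, _ in points]  # built once, like an index

counts = heatmap_counts(
    points,
    at_index,
    start=datetime(2018, 6, 22, 15, 0, tzinfo=tz),
    end=datetime(2018, 6, 23, 16, 0, tzinfo=tz),
)
print("X,Y,OCCURRENCES")
for (x, y), n in sorted(counts.items()):
    print(f"{x},{y},{n}")
```

The cost of the lookup depends on the number of points inside the window rather than on the total number of points, which is the effect the index is meant to have on the MATCH.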

Related

how to select SpatRaster layers from their names?

I've got a SpatRaster of (150 x 150 x 1377) that shows the temporal evolution of precipitation. Each layer is a given hour in a 2-month interval, but some hours are missing, so the dataset isn't continuous. The layer names are strings of the form "YYYYMMDDhhmm".
I need to find the mean value every three hours, both on complete intervals and on intervals with missing data. On complete intervals I want to average the three layers; on intervals with missing data I would like to average the two available layers or, if two are missing, use the single remaining layer as the average.
How can I use the layer names to decide how to act?
I've already tried the code below, but it averages three consecutive layers by index rather than by hour. How can I convert the names to date-time values (for example with the tidyverse) so that I can use rollapply() to check whether the date-time I expect is found two steps back? Is there any other way to check this?
HSAF=rast(c((paste0(resfolder, "HSAF_final1_5.tif")),(paste0(resfolder, "HSAF_final6_10.tif")),(paste0(resfolder, "HSAF_final11_15.tif")),
(paste0(resfolder, "HSAF_final16_20.tif")),(paste0(resfolder, "HSAF_final21_25.tif")),(paste0(resfolder, "HSAF_final26_30.tif")),
(paste0(resfolder, "HSAF_final31_N04.tif")),(paste0(resfolder, "HSAF_finalN05_N08.tif")),(paste0(resfolder, "HSAF_finalN09_N13.tif")),
(paste0(resfolder, "HSAF_finalN14_N18.tif")),(paste0(resfolder, "HSAF_finalN19_N23.tif")),(paste0(resfolder, "HSAF_finalN24_N28.tif")),
(paste0(resfolder, "HSAF_finalN29_N30.tif"))))
index=names(HSAF)
j=2
for (i in seq(1,3, by=3))
{third_el<- HSAF[index[i+j]]
second_el <- HSAF[index[i+j-1]]
first_el<- HSAF[index[i+j-2]]
newraster<- c(first_el, second_el, third_el)
newraster<- mean(newraster, filename=paste0(tempfile(), ".tif"))
names(newraster)<- paste0(index[i+j-2],index[i+j-1],index[i+j])
}
for (i in seq(4,1374 , by=3))
{ third_el<- HSAF[index[i+j]]
second_el <- HSAF[index[i+j-1]]
first_el<- HSAF[index[i+j-2]]
subraster<- c(first_el, second_el, third_el)
subraster<- mean(subraster, filename=paste0(tempfile(), ".tif"))
names(subraster)<- paste0(index[i+j-2],index[i+j-1],index[i+j])
add(newraster)<- subraster
}

Find nodes with 3+ occurrences in a 10 minute period

I have a list of nodes with a startTime property. I need to determine if the list contains a clump of 3 or more nodes with a startTime within 10 minutes of each other. I don't need to get the nodes that are in the clump, I just need a boolean indicating the existence of such a clump.
I am at a loss; everything I have tried fails so badly that it is not worth posting.
I feel that I am missing something easy.
This should be doable.
First you'll need to match the nodes, order them by startTime, and collect the start times into a list.
From there, you'll need to get the relevant pairings (each entry, and the entry 2 indices ahead for the end of the duration) that will comprise a group of 3, then see if the start times of that pair occur within 10 minutes of each other.
Assuming for the sake of example :Event nodes with a startTime property, you might use this query to get the results you want:
MATCH (e:Event)
WITH e
ORDER BY e.startTime ASC
WITH collect(e.startTime) as times
WITH times, range(0, size(times) - 3) as indices
RETURN any(index in indices WHERE times[index + 2] <= times[index] + duration({minutes:10}))
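The same pairing idea can be sketched in Python: sort the start times, then compare each entry with the one two positions ahead. This is a minimal sketch under the assumption that the start times are already available as `datetime` values; the function name `has_clump` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

def has_clump(start_times, size=3, window=timedelta(minutes=10)):
    """Return True if any `size` start times lie within `window` of each other."""
    times = sorted(start_times)
    # In sorted order, a group of `size` entries fits in the window exactly
    # when its first and last entries are at most `window` apart.
    return any(
        times[i + size - 1] - times[i] <= window
        for i in range(len(times) - size + 1)
    )

base = datetime(2018, 6, 22, 9, 0)
clumped = [base, base + timedelta(minutes=4), base + timedelta(minutes=9)]
spread = [base, base + timedelta(minutes=4), base + timedelta(minutes=20)]

print(has_clump(clumped))
print(has_clump(spread))
```

With fewer than three start times the range is empty, so the function returns False, which matches the behaviour of any() over an empty list in the Cypher query.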

How to find nodes being contained in a node's properties interval?

I'm currently developing a kind of configurator using neo4j as a backend. I've now run into a problem that I don't know how best to solve.
I've got nodes created like this:
(A:Product {name:'ProductA', minWidth:20, maxWidth:200, minHeight:10, maxHeight:400})
(B:Product {name:'ProductB', minWidth:40, maxWidth:100, minHeight:20, maxHeight:300})
...
There is an interface where the user can input a desired width and height, e.g. Width=30, Height=250. Now I'd like to check which products match the input criteria. As the input might be any long value, the approach used in http://neo4j.com/blog/modeling-a-multilevel-index-in-neoj4/ with dates doesn't seem suitable for me. How can I run a Cypher query that gives me all the nodes matching the input criteria?
I'm not sure I understand exactly what you are asking for, but if I do, here is a simple query that does it:
Assuming the user wants width = 30 and height = 50
Match (p:Product)
WHERE
p.minWidth < 30 AND p.maxWidth > 30 AND
p.minHeight < 50 AND p.maxHeight > 50
RETURN
p
If this is not what you are looking for, feel free to say so in a comment.
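As an illustration of the same containment test outside the database, here is a minimal Python sketch. The `Product` class and the `matching_products` function are hypothetical names; the two products are the ones from the question, checked against the question's example input (Width=30, Height=250).

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    min_width: int
    max_width: int
    min_height: int
    max_height: int

def matching_products(products, width, height):
    """Return products whose width and height intervals contain the input.

    Strict comparisons mirror the query above; switch to <= if the
    minimum and maximum values themselves should also match.
    """
    return [
        p for p in products
        if p.min_width < width < p.max_width
        and p.min_height < height < p.max_height
    ]

products = [
    Product("ProductA", 20, 200, 10, 400),
    Product("ProductB", 40, 100, 20, 300),
]

matches = matching_products(products, width=30, height=250)
print([p.name for p in matches])
```

ProductB is excluded because its minimum width of 40 is above the requested width of 30.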

how to get a random set of records from an index with cypher query

what's the syntax to get random records from a specific node_auto_index using cypher?
I suppose there is this example
START x=node:node_auto_index("uname:*") RETURN x SKIP somerandomNumber LIMIT 10;
Is there a better way that won't return a contiguous set?
There is no feature similar to SQL's Random() in neo4j.
You must either choose the random number for SKIP before you run the Cypher query (if you are not querying directly from the console but from a host language on top of neo4j)
- this will give a random but contiguous run of nodes
or you must retrieve all the nodes and then pick randomly among them in your host language - this will give you a random set of nodes.
Or, to make a pseudorandom function in Cypher, we can try something like this:
START x=node:node_auto_index("uname:*")
WITH x, length(x.uname) as len
WHERE (Id(x) + len) % 3 = 0
RETURN x LIMIT 10
Or make a more sophisticated WHERE clause in this query, based on the total number of uname nodes or on the ASCII values of the uname property, for example.
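The option of retrieving all the nodes and sampling in the host language could look like the following minimal Python sketch. The list `all_nodes` is a hypothetical stand-in for whatever the driver returns from the index query, and `random_subset` is an illustrative helper, not part of any neo4j API.

```python
import random

def random_subset(nodes, k=10, seed=None):
    """Return up to `k` distinct items chosen at random from `nodes`."""
    rng = random.Random(seed)  # pass a seed only for reproducible runs
    nodes = list(nodes)
    # random.sample picks without replacement, so the result is not a
    # contiguous slice of the input.
    return rng.sample(nodes, min(k, len(nodes)))

# Hypothetical stand-in for nodes fetched from node_auto_index.
all_nodes = [f"user{i}" for i in range(1000)]

picked = random_subset(all_nodes, k=10)
print(picked)
```

This trades memory and transfer time for a non-contiguous sample, so it is only practical when the full result set is small enough to fetch.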

searching for closest number in a rails app

Given two parameters that correspond to two attributes on an object, how can one find the 20 records in a database that are closest to those two numbers?
The parameters you have are x and y, and the object has those attributes too. For example, x = 1 and y = 9999. You need to find the records that are closest to x and y.
That depends on how you define the distance between two points. If you are using a two-dimensional cartesian coordinate system, this SQL statement will work:
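SELECT id, x, y FROM points ORDER BY SQRT(POWER((x - :input_x),2)+POWER((y - :input_y),2)) ASC LIMIT 20;
Here :input_x and :input_y are placeholders for the input values; they need names distinct from the x and y columns because SQL identifiers are not case-sensitive.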
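For comparison, the same Euclidean ordering can be sketched in Python for records that are already in memory. This is a minimal sketch, assuming each record is an (id, x, y) tuple; the function name `nearest` and the sample records are hypothetical.

```python
import heapq
import math

def nearest(records, qx, qy, k=20):
    """Return the `k` records closest to (qx, qy) by Euclidean distance.

    Each record is an (id, x, y) tuple; results are ordered nearest first.
    """
    return heapq.nsmallest(
        k, records, key=lambda r: math.hypot(r[1] - qx, r[2] - qy)
    )

# Hypothetical sample records.
records = [(1, 0, 0), (2, 3, 4), (3, 1, 1), (4, 10, 10)]

closest = nearest(records, qx=0, qy=0, k=2)
print(closest)
```

heapq.nsmallest avoids sorting the whole list when k is small relative to the number of records.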
It sounds like you're using geolocated data. If your database backend is Postgres, check to see if you have or can install the PostGIS extensions. This gives you very fast tools which give you searches like 'search for the nearest thing to this point', 'search for everything within this circle', 'search for everything within this square', and so on.
http://postgis.refractions.net/
You would do something like this:
CREATE INDEX [indexname] ON [tablename] USING GIST ( [geometrycolumn] gist_geometry_ops);
Then you can do something like this - find everything within 100 metres of a point:
SELECT * FROM GEOTABLE WHERE
GEOM && GeometryFromText('BOX3D(900 900,1100 1100)',-1) AND
Distance(GeometryFromText('POINT(1000 1000)',-1),GEOM) < 100;
Examples from the manual.
