I have two CSV files: one contains Places of Interest and the other contains a company's office locations. Both tables include latitude and longitude.
Structure of the Places of Interest CSV file:
POI_Name Longitude Latitude City
POI_1 77.573957 12.970125 Bangalore
POI_2 77.579886 13.009582 Bangalore
POI_3 77.546688 13.023931 Chennai
Similarly, here is the CSV file with the company's office locations:
Office Longitude Latitude City
Office_1 78.324445 12.970125 Bangalore
Office_2 77.254555 13.234444 Chennai
Office_3 76.098438 14.135567 Bangalore
Both tables contain thousands of records. I want to write a Neo4j query that, at run time, returns the 5 places of interest nearest to a given office location (passed as a query parameter), ordered by distance.
Since the distance between two nodes never changes, I recommend processing all places against all offices once and creating a DISTANCE relationship between each pair (place_i, office_j) that stores the distance in km (or whatever unit you prefer). Then you only need to query the n nearest nodes to a given node, for example: MATCH (p:Place)-[d:DISTANCE]->(o:Office {id: $myoffice}) RETURN p, o, d ORDER BY d.km ASC LIMIT 5. This saves time and computational cost at query time.
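Alternatively, you can compute the distance at query time with something like the following query:
MATCH (p:Place), (o:Office {id: $myoffice})
RETURN distance(point({latitude: p.latitude, longitude: p.longitude}), point({latitude: o.latitude, longitude: o.longitude})) / 1000.0 AS km, p, o
ORDER BY km ASC LIMIT 5
In newer Neo4j versions, use point.distance() instead of distance(), which was deprecated and later removed.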
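To sanity-check what the query above returns, here is a minimal Python sketch of the same idea outside the database: compute a great-circle distance from the office to every place and keep the five smallest. It assumes the haversine formula with a mean Earth radius of 6,371 km (Neo4j's own distance function may use a slightly different model), and the helper names haversine_km and top_n_nearest are hypothetical. The sample rows are the ones from the question.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not Neo4j's internal constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def top_n_nearest(office, places, n=5):
    """Return up to n (name, km) pairs for the places nearest to office, nearest first.

    office and places are (name, longitude, latitude) tuples, in the CSV column order.
    """
    _, o_lon, o_lat = office
    scored = ((name, haversine_km(o_lat, o_lon, lat, lon)) for name, lon, lat in places)
    # heapq.nsmallest plays the role of ORDER BY km ASC LIMIT n.
    return heapq.nsmallest(n, scored, key=lambda pair: pair[1])


places = [
    ("POI_1", 77.573957, 12.970125),
    ("POI_2", 77.579886, 13.009582),
    ("POI_3", 77.546688, 13.023931),
]
office_1 = ("Office_1", 78.324445, 12.970125)

for name, km in top_n_nearest(office_1, places):
    print(f"{name}: {km:.1f} km")
```

Using a heap keeps only n candidates in memory instead of sorting every place, which matters once the places table grows into the millions.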
I have two tables:
1) Places data: 2.4 million records
2) Office data: 40 thousand records
I have a Neo4j query that takes 3 inputs from users through a UI and returns results after calculating the distance between places and the office from their latitude/longitude at run time. I want to keep calculating the distance at run time only.
Below is the query:
MATCH (c:places), (c2:office)
WHERE c2.office_id = $office
AND c2.city = $city
AND c.category = $category
RETURN c.places_id as place_name, c.category as Category,
c.sub_category as Sub_Category, distance(c.location, c2.location)
as Distance_in_meters order by Distance_in_meters LIMIT 50
The above query takes 10-15 seconds to return results to the UI, which is frustrating. How can I optimize its performance?
You can try the following query:
MATCH (c:places {category: $category}), (c2:office {office_id: $office, city: $city})
RETURN c.places_id AS place_name, c.category AS Category,
c.sub_category AS Sub_Category, distance(c.location, c2.location)
AS Distance_in_meters ORDER BY Distance_in_meters ASC LIMIT 50
Use ASC to list the nearest places first, or DESC to list the farthest first. Note that category is a property of places, not of office, so it belongs on c.
I have a Neo4j query that searches across multiple entities, and I pass the parameters in batch as a nodes list. However, the query executes slowly. How can I optimize it and improve its performance?
WITH $nodes as nodes
UNWIND nodes AS node
with node.id AS id, node.lon AS lon, node.lat AS lat
MATCH
(m:Member)-[mtg_r:MT_TO_MEMBER]->(mt:MemberTopics)-[mtt_r:MT_TO_TOPIC]->(t:Topic),
(t1:Topic)-[tt_r:GT_TO_TOPIC]->(gt:GroupTopics)-[tg_r:GT_TO_GROUP]->(g:Group)-[h_r:HAS]->
(e:Event)-[a_r:AT]->(v:Venue)
WHERE mt.topic_id = gt.topic_id AND
distance(point({ longitude: lon, latitude: lat}),point({ longitude: v.lon, latitude: v.lat })) < 4000 AND
mt.member_id = id
RETURN
distinct id as member_id,
lat as member_lat,
lon as member_lon,
g.group_name as group_name,
e.event_name as event_name,
v.venue_name as venue_name,
v.lat as venue_lat,
v.lon as venue_lon,
distance(point({ longitude: lon,
latitude: lat}),point({ longitude: v.lon, latitude: v.lat })) as distance
Query profiling looks like this:
Your current plan has three parallel branches. We can ignore one of them for now because it has 0 DB hits.
The biggest hit you are taking is the match for (mt:MemberTopics) ... WHERE mt.member_id = id. I'm guessing member_id is a unique id, so you will want to create an index on it: CREATE INDEX ON :MemberTopics(member_id) (in Neo4j 4.x and later the syntax is CREATE INDEX FOR (mt:MemberTopics) ON (mt.member_id)). That will allow Cypher to do an index lookup instead of a node scan, which will reduce the DB hits from ~30 million to ~1. Also, for more complex queries, inlining property matches is sometimes faster, so (mt:MemberTopics {member_id: id}) is better: it makes explicit that this condition must always hold while matching, and it reinforces the use of the index lookup.
The second biggest hit is the point-distance check. Right now it is being done independently, because the node scan takes so long. Once you make the changes for MemberTopics, the planner should switch to finding all connected Venues first and then doing the distance check only on those, so that should become cheaper as well.
Also, it looks like mt and gt are linked by a topic, and you are using a topic id to align them. If t and t1 are supposed to be the same Topic node, you could just use t for both to enforce that, and then you don't need the id check to link mt and gt. If t and t1 are not the same node, then the use of a foreign key in your node's properties is a sign that you should have a relationship between the two nodes and simply traverse that edge (relationships can have properties too). From the context, though, it looks a lot like t and t1 are supposed to be the same node. You could also enforce this with WHERE t = t1, but at that point you should just use t for both.
Lastly, depending on the number of rows your query returns, you may want to use SKIP and LIMIT to page your results. This looks like information going to a user, and I doubt they need the full dump, so only return the top results and only process the rest if the user wants to see more (useful as the number of results grows large). Since you only have 21 results so far, this won't be an issue right now, but keep it in mind as you scale to 100,000+ results.
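A minimal sketch of the paging idea, assuming the query ends with ORDER BY ... SKIP $skip LIMIT $limit (Cypher accepts parameters in SKIP and LIMIT) and that the UI sends a 1-based page number. The helper name page_params and the query tail are hypothetical; the helper only builds the parameter values you would pass alongside $nodes. An ORDER BY is needed for the pages to be stable between requests.

```python
def page_params(page: int, page_size: int = 20) -> dict:
    """Map a 1-based page number to values for SKIP $skip LIMIT $limit."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    return {"skip": (page - 1) * page_size, "limit": page_size}


# Hypothetical tail for the query above; ordering makes pages repeatable.
QUERY_TAIL = "ORDER BY distance ASC SKIP $skip LIMIT $limit"

print(page_params(1))                # prints {'skip': 0, 'limit': 20}
print(page_params(3, page_size=50))  # prints {'skip': 100, 'limit': 50}
```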
I have about 200,000 rows of 24-hour data as follows:
I can create a Room node with time, room temperature, and set temperature as properties, and I can also define the relationship between each room and its corresponding temperatures.
Now I need to find:
all rows that show a change (increase or decrease) from the initial temperature until the set temperature is reached, for all rooms. For example, based on the above data, I need:
Here I have discarded the 5th row because 16 was repeated and showed no change (increase or decrease) in temperature. The temperature values continue until they reach the set temperature of 18.
I can create the temperature states manually by entering their values one by one, but I am unsure how to MERGE the above requirement into the graph using Cypher.
Can I use another programming language in conjunction with Neo4j to obtain the same results?
Do I have to use an in-graph time tree for this scenario, or can I retrieve my results without creating one?
You don't need a time tree for this. You can:
Filter temperatures by room and date (the date can also be a date node)
Sort by time
Collect into a list
Filter by differences between two consecutive temperatures
Turn the list back into rows
Here is a query that does this:
MATCH (r:Room)<-[:TEMP]-(t:Temperature)
WHERE t.time STARTS WITH "2016-01-01"
AND t.temp <= r.temp AND t.temp >= $initial
WITH r, t ORDER BY t.time ASC
WITH r, collect(t) AS temps
WITH r, [idx IN range(0, size(temps) - 1) WHERE idx = 0 OR temps[idx].temp <> temps[idx - 1].temp | temps[idx]] AS filtered
UNWIND filtered AS t
RETURN r, t;
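The same steps (filter, sort by time, drop consecutive repeats) can be checked outside the database with a small Python sketch for one room and one day. The Reading type, the helper changed_readings, and the readings themselves are hypothetical, since the question's data table isn't reproduced here; the values are chosen so that the 5th reading repeats 16 and the series ends at a set temperature of 18. This version keeps the first reading and drops any reading whose temperature equals the one before it, which removes the repeated 16.

```python
from typing import List, NamedTuple


class Reading(NamedTuple):
    time: str  # ISO-8601 string, so sorting the strings sorts by time
    temp: int


def changed_readings(readings: List[Reading], initial: int, set_temp: int) -> List[Reading]:
    """Filter to [initial, set_temp], sort by time, and drop consecutive repeated temps."""
    in_range = sorted(
        (r for r in readings if initial <= r.temp <= set_temp),
        key=lambda r: r.time,
    )
    # Keep the first reading, then only readings that differ from the previous one.
    return [r for i, r in enumerate(in_range) if i == 0 or r.temp != in_range[i - 1].temp]


# Hypothetical readings for one room on one day (not the question's real data).
readings = [
    Reading("2016-01-01T08:00", 13),
    Reading("2016-01-01T08:05", 14),
    Reading("2016-01-01T08:10", 15),
    Reading("2016-01-01T08:15", 16),
    Reading("2016-01-01T08:20", 16),  # 5th reading: repeated, so it is dropped
    Reading("2016-01-01T08:25", 17),
    Reading("2016-01-01T08:30", 18),
]

print([r.temp for r in changed_readings(readings, initial=13, set_temp=18)])
# prints [13, 14, 15, 16, 17, 18]
```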
Consider a scenario where a user wants deals sorted by distance from their current location, nearest first.
I have defined two collections, locations and deals. A deal can have many locations; locations have a 2dsphere index, and each deal holds an array of location ids.
I can already get locations ordered from nearest to farthest from a given point:
db.locations.aggregate([ { "$geoNear": { near: { coordinates: [73, 18]}, maxDistance: 1, distanceField : "distance", spherical: true } } ])
The real problem is getting the deals in the same order as the locations I've retrieved.
Is there a way to get deals ordered according to a given list of location ids?
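One application-side approach (an assumption, not something MongoDB does for you in this form) is to fetch the deals that reference any of the returned location ids, for example with an $in query on the deals' location-id array, and then sort them in code by the position of their nearest location in the $geoNear result. The Python sketch below shows only that sorting step; the field name location_ids, the helper name, and the sample ids are hypothetical.

```python
def order_deals_by_location(location_ids, deals):
    """Sort deals by the rank of their nearest location in location_ids.

    location_ids: ids in nearest-to-farthest order, e.g. the _id values returned by $geoNear.
    deals: dicts with a "location_ids" list (hypothetical field name).
    Deals with none of their locations in location_ids are dropped.
    """
    rank = {}
    for position, loc_id in enumerate(location_ids):
        rank.setdefault(loc_id, position)  # keep the first (nearest) position

    ranked = []
    for deal in deals:
        positions = [rank[loc] for loc in deal["location_ids"] if loc in rank]
        if positions:
            ranked.append((min(positions), deal))

    ranked.sort(key=lambda pair: pair[0])  # stable: ties keep their input order
    return [deal for _, deal in ranked]


nearest_location_ids = ["L3", "L1", "L2"]  # hypothetical $geoNear result order
deals = [
    {"name": "A", "location_ids": ["L2"]},
    {"name": "B", "location_ids": ["L1", "L9"]},
    {"name": "C", "location_ids": ["L3", "L2"]},
    {"name": "D", "location_ids": ["L9"]},
]
print([d["name"] for d in order_deals_by_location(nearest_location_ids, deals)])
# prints ['C', 'B', 'A']
```

A deal with several locations is ranked by its nearest one, which matches the goal of showing the closest deals first.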
I am using the HTML5 Geolocation API to get my position as latitude and longitude. I want to store positions in a locations table and retrieve the locations within a given distance.
My current latitude, longitude, and search distance are stored in the variables "latval", "longval", and "distance".
My table is "location".
Its columns are "location", "lat", and "long".
I am using DB2 Express-C, and the latitude and longitude columns are currently of type DOUBLE. What type should I use to store these values, and what query would return the location names within a given distance?
Thank you.
It looks like there's an extension for DB2 Express-C that includes spatial processing. I've never used it (and can't seem to get access at the moment), so I can't speak to it, but I'd assume you'd want to use it (finding all locations within a radius is a pretty standard query).
If for some reason you can't use the extension, here's what I would do:
Keep your table as-is, or maybe use a float data type, but please use full column names (there's no reason to truncate them). For simple needs, the location's name can be stored in the table, but you may want to give it a numeric id if more than one thing is at the same location (so the actual points are stored only once).
You're also going to want indexes covering latitude and longitude (probably a composite index in each column order, or one index per column).
Then, given a starting position and distance, use this query:
SELECT name, latitude, longitude
FROM location
WHERE (latitude >= :currentLatitude - :distance
AND latitude <= :currentLatitude + :distance)
AND (longitude >= :currentLongitude - :distance
AND longitude <= :currentLongitude + :distance)
-- The previous four lines reduce the points selected to a box.
-- This is, of course, not completely correct, but it should allow
-- the query to use the indexes to greatly reduce the initial
-- set of points evaluated.
-- You may wish to flip the condition and use ABS(), but
-- I don't think that would use the index...
AND POWER(latitude - :currentLatitude, 2) + POWER(longitude - :currentLongitude, 2)
<= POWER(:distance, 2)
-- This uses the Pythagorean theorem to find all points within the specified
-- distance. This works best if the points have been pre-limited in some
-- way, because an index would almost certainly not be used otherwise.
-- Note that, on a spherical surface, this isn't completely accurate
-- (the distance between lines of longitude shrinks the farther
-- you are from the equator), but for your purposes it is likely to be fine.
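To see the two stages separately, here is a minimal Python sketch of the same logic (a hypothetical helper, not DB2 code): a cheap box filter followed by the Pythagorean check. As in the SQL above, the distance is expressed in degrees and the sphere's curvature is ignored; the sample points are made up.

```python
def within_distance_flat(points, cur_lat, cur_lon, distance_deg):
    """Two-stage filter from the SQL above; distance_deg is in degrees, as in that query.

    points: (name, latitude, longitude) tuples.
    """
    # Stage 1: the bounding box (the part an index on latitude/longitude can serve).
    box = [
        p for p in points
        if cur_lat - distance_deg <= p[1] <= cur_lat + distance_deg
        and cur_lon - distance_deg <= p[2] <= cur_lon + distance_deg
    ]
    # Stage 2: Pythagorean check on the survivors (flat-plane approximation).
    return [
        p for p in box
        if (p[1] - cur_lat) ** 2 + (p[2] - cur_lon) ** 2 <= distance_deg ** 2
    ]


points = [
    ("centre", 10.0, 10.0),
    ("edge", 10.5, 10.5),    # inside the circle
    ("corner", 10.9, 10.9),  # inside the box, outside the circle
    ("far", 12.0, 10.0),     # outside the box
]
print([p[0] for p in within_distance_flat(points, 10.0, 10.0, 1.0)])
# prints ['centre', 'edge']
```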
EDIT:
Found this after searching for 2 seconds on Google, which also reminded me that :distance would be in the wrong units (degrees rather than an actual distance). The revised query is:
WITH Nearby (name, latitude, longitude, dist) as (
SELECT name, latitude, longitude,
ACOS(SIN(RADIANS(latitude)) * SIN(RADIANS(:currentLatitude)) +
COS(RADIANS(latitude)) * COS(RADIANS(:currentLatitude)) *
COS(RADIANS(:currentLongitude - longitude))) * :RADIUS_OF_EARTH as dist
FROM location
WHERE (latitude >= :currentLatitude - DEGREES(:distance / :RADIUS_OF_EARTH)
AND latitude <= :currentLatitude + DEGREES(:distance / :RADIUS_OF_EARTH))
AND (longitude >= :currentLongitude -
DEGREES(:distance / :RADIUS_OF_EARTH / COS(RADIANS(:currentLatitude)))
AND longitude <= :currentLongitude +
DEGREES(:distance / :RADIUS_OF_EARTH / COS(RADIANS(:currentLatitude))))
)
SELECT *
FROM Nearby
WHERE dist <= :distance
Please note that wrapping the distance calculation in a UDF marked DETERMINISTIC would let you reference it in both the SELECT list and the WHERE clause while allowing DB2 to evaluate it only once per row, eliminating the need for the CTE.
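For reference, here is a minimal Python sketch of what the revised query computes, useful for checking results outside the database: the degree-based bounding box from the CTE's WHERE clause, the spherical law of cosines from its SELECT list, and the final dist <= :distance filter. It assumes kilometres, with an Earth radius of 6,371 km standing in for :RADIUS_OF_EARTH; the helper name and sample points are hypothetical. It also clamps the ACOS argument to [-1, 1], since rounding can push it slightly above 1 for a point at (or very near) the search origin; the SQL version may need the same guard.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed value for :RADIUS_OF_EARTH (so distances are in km)


def nearby(points, cur_lat, cur_lon, distance_km):
    """Return (name, latitude, longitude, dist_km) for points within distance_km.

    points: (name, latitude, longitude) tuples in degrees.
    """
    dlat = math.degrees(distance_km / EARTH_RADIUS_KM)
    dlon = math.degrees(distance_km / EARTH_RADIUS_KM / math.cos(math.radians(cur_lat)))
    results = []
    for name, lat, lon in points:
        # Bounding-box prefilter (the WHERE clause inside the CTE).
        if not (cur_lat - dlat <= lat <= cur_lat + dlat
                and cur_lon - dlon <= lon <= cur_lon + dlon):
            continue
        # Spherical law of cosines, as in the CTE's SELECT list.
        cos_angle = (math.sin(math.radians(lat)) * math.sin(math.radians(cur_lat))
                     + math.cos(math.radians(lat)) * math.cos(math.radians(cur_lat))
                     * math.cos(math.radians(cur_lon - lon)))
        dist = math.acos(max(-1.0, min(1.0, cos_angle))) * EARTH_RADIUS_KM
        if dist <= distance_km:  # the outer WHERE dist <= :distance
            results.append((name, lat, lon, dist))
    return results


pts = [
    ("here", 12.97, 77.59),
    ("north_5km", 13.02, 77.59),
    ("north_22km", 13.17, 77.59),
    ("corner", 13.05, 77.672),  # inside the box, about 12.6 km away
]
for name, lat, lon, dist in nearby(pts, 12.97, 77.59, 10.0):
    print(f"{name}: {dist:.2f} km")
```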