How to optimize the Neo4j query below

I have two tables, i.e.
1) Places data - 2.4 million records
2) Office data - 40 thousand records
I have a Neo4j query that takes 3 inputs from the user through a UI and outputs results after calculating the distance between places and the selected office from their latitude/longitude at run time. I want the distance calculated at run time only.
Below is the query:
MATCH (c:places), (c2:office)
WHERE c2.office_id = {office}
AND c2.city = {city}
AND c.category = {category}
RETURN c.places_id as place_name, c.category as Category,
c.sub_category as Sub_Category, distance(c.location, c2.location)
as Distance_in_meters order by distance(c.location, c2.location) LIMIT 50
The above query takes some 10-15 seconds to output the results on the UI, which is a bit annoying. Can you please help optimize the performance?

You can try the following query (note that category is a property of the places nodes, so that filter belongs there rather than on the office node):
MATCH (c:places {category: YourCategory}), (c2:office {office_id: YourOffice, city: YourCity})
RETURN c.places_id AS place_name, c.category AS Category,
c.sub_category AS Sub_Category, distance(c.location, c2.location) AS Distance_in_meters
ORDER BY Distance_in_meters ASC LIMIT 50
And decide how to order the results: ASC or DESC.
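Beyond rewriting the query itself, much of the 10-15 seconds is likely spent scanning the 2.4 million places nodes, so make sure the equality filters are backed by schema indexes. A minimal sketch, assuming Neo4j 3.x-style index syntax (adjust to your version and exact labels/properties):
CREATE INDEX ON :places(category);
CREATE INDEX ON :office(office_id);
CREATE INDEX ON :office(city);
You can verify they are picked up by running the query with PROFILE and checking that the plan shows NodeIndexSeek rather than NodeByLabelScan.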

Related

Speed up Active Record group by count query

How can I speed up the following query? I'm looking to find records with 6 or fewer unique values of fb_id. The select doesn't seem to be adding much in terms of time; it's the group and count that are slow. Is there an alternative way to write this query? I added an index on fb_id, but it only sped up the query by 50%.
FbGroupApplication.group(:fb_id).where.not(
  fb_id: _get_exclude_fb_group_ids
).group(
  "count_fb_id desc"
).count(
  "fb_id"
).select { |k, v| v <= 6 }
The query is looking for FbGroupApplications that have 6 or fewer applications to the same fb_id.
Passing a block to the select method makes Rails trigger the SQL, convert the found rows into ActiveRecord objects, and then run the select over that Ruby array using the block you gave. This whole process is costly (Ruby is not good at this).
You can "delegate" the responsibility of comparing the count against 6 to the database with a having clause:
FbGroupApplication
  .group(:fb_id)
  .where.not(fb_id: _get_exclude_fb_group_ids)
  .having('count(fb_id) <= 6')
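If you also want the per-fb_id counts back (like the hash the original .count("fb_id") call produced), you can keep that in the database as well. A small sketch, continuing the same relation as above:
FbGroupApplication
  .group(:fb_id)
  .where.not(fb_id: _get_exclude_fb_group_ids)
  .having('count(fb_id) <= 6')
  .count(:fb_id)
# => a hash of { fb_id => count } containing only the groups with 6 or fewer rows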

Neo4j List Consecutive

I am currently working with a football match data set and trying to get Cypher to return the teams with the most consecutive wins.
At the moment I have a collect statement which creates a list, e.g. [0,1,1,0,1,1,1], where '0' represents a loss and '1' represents a win. I am trying to return the team with the most consecutive wins.
Here is what my code looks like at the moment:
MATCH (t1:TEAM)-[p:PLAYS]->(t2:TEAM)
WITH [t1, t2] AS teams, p AS matches
ORDER BY matches.time ASC
UNWIND teams AS team
WITH team.name AS teamName,
     collect(CASE WHEN (team = startnode(matches) AND matches.score1 > matches.score2)
                    OR (team = endnode(matches) AND matches.score2 > matches.score1)
                  THEN 1 ELSE 0 END) AS consecutive_wins
RETURN teamName, consecutive_wins
This returns a list for each team showing their win/loss record in the form explained above (i.e. [0,1,0,1,1,0]).
Any guidance or help with calculating consecutive wins would be much appreciated.
Thanks
I answered a similar question here.
The key is using apoc.coll.split() from APOC Procedures, splitting on 0, which will yield a row per winning streak (list of consecutive 1's) as value. The size of each of the lists is the number of consecutive wins for that streak, so just get the max size:
// your query above
CALL apoc.coll.split(consecutive_wins, 0) YIELD value
WITH teamName, max(size(value)) as consecutiveWins
ORDER BY consecutiveWins DESC
LIMIT 1
RETURN teamName, consecutiveWins
Your use case does not actually require building the intermediate list of 0s and 1s first (and it also does not need UNWIND).
The following query uses REDUCE to directly calculate the maximum number of consecutive wins for each team (consW tracks the current winning streak, resetting to 0 on a loss, and maxW is the maximum number of consecutive wins found so far):
MATCH (team:TEAM)-[p:PLAYS]-(:TEAM)
WITH team, p
ORDER BY p.time ASC
WITH team,
     REDUCE(s = {consW: 0, maxW: 0}, m IN COLLECT(p) |
       CASE WHEN (team = startnode(m) AND m.score1 > m.score2)
              OR (team = endnode(m) AND m.score2 > m.score1)
            THEN {consW: s.consW + 1,
                  maxW: CASE WHEN s.consW + 1 > s.maxW THEN s.consW + 1 ELSE s.maxW END}
            ELSE {consW: 0, maxW: s.maxW}
       END
     ).maxW AS most_consecutive_wins
RETURN team.name AS teamName, most_consecutive_wins;

Rails: how to calculate the average of a small set of elements

I've looked at resources on how to find the average with the Rails built-in ActiveRecord::Calculations average. I've also looked online for ideas on how to calculate averages: Rails calculate and display average.
But I cannot find any reference on how to calculate the average of a subset of elements from a database column.
In the controller:
@jobpostings = Jobposting.all
@medical = @jobpostings.where("title like ? OR title like ?", "%MEDICAL SPECIALIST%", "%MEDICAL EXAMINER%").limit(4).order('max_salary DESC')
@medical_salary = "%.2f" % @medical.average(:max_salary).truncate(2)
@medical returns:
CITY MEDICAL SPECIALIST
220480.0
CITY MEDICAL EXAMINER (OCME)
180000.0
CITY MEDICAL SPECIALIST
158080.0
CITY MEDICAL SPECIALIST
130000.0
I want to find the average of :max_salary (each one is listed correctly under its job title). I use @medical_salary = "%.2f" % @medical.average(:max_salary).truncate(2) to convert the BigDecimal and find the average of :max_salary from @medical, which I thought would be limited to the top 4 displayed above.
But the result returned is 72322.33, which is the average of the entire column (I checked), instead of the top 4.
Do I need to add another condition? Why does the average of @medical return the average of the entire column?
Any insight would help. Thanks.
@medical.average(:max_salary) is expanded into a query that applies AVG over the whole filtered set, with the limit(4) simply appended at the end, even though limit(4) appears earlier in the method chain.
You can confirm this by checking the query that is run, which will be something like:
SELECT AVG(`jobpostings`.`max_salary`) FROM `jobpostings` ... LIMIT 4
Effectively, LIMIT 4 doesn't do anything, because there is only one average number in the result of the above query.
One way to accomplish what you are trying to do would be:
$ @top_salaries = @jobpostings.where("title like ? OR title like ?", "%MEDICAL SPECIALIST%", "%MEDICAL EXAMINER%").order('max_salary DESC').limit(4).map(&:max_salary)
=> [220480.0, 180000.0, 158080.0, 130000.0]
$ average = @top_salaries.reduce(:+) / @top_salaries.size
=> 172140.0
@medical.average(:max_salary)
This line corresponds to the following MySQL query:
SELECT AVG(`jobpostings`.`max_salary`) AS avg_max_salary FROM `jobpostings` WHERE (`jobpostings`.`title` like "%MEDICAL SPECIALIST%" OR title like "%MEDICAL EXAMINER%") LIMIT 4
Since MySQL has already calculated AVG over the column max_salary, it returns only 1 row, and the LIMIT 4 clause doesn't actually come into play.
You can try the following:
@limit = 4
@medical = @jobpostings.where("title like ? OR title like ?", "%MEDICAL SPECIALIST%", "%MEDICAL EXAMINER%").limit(@limit).order('max_salary DESC')
@medical_salary = "%.2f" % (@medical.pluck(:max_salary).sum / @limit.to_f).truncate(2)
It will even work when the number of results is 0.

Selecting the greatest date range count in a Rails array

I have a database with a bunch of deviceapi entries that have a start_date and end_date (datetime in the schema). Typically these entries are no more than 20 seconds long (end_date - start_date). I have the following setup:
data = Deviceapi.all.where("start_date > ?", DateTime.now - 2.weeks)
I need to get the hour within data that had the highest number of Deviceapi entries. To make it a bit clearer, this was my latest try at it (code is approximated, don't mind typos):
runningtotal = 0
window_start = DateTime.now - 2.weeks
(2.weeks / 1.hour).to_i.times do |interval|
  current = data.select { |d| d.start_date > (window_start + (1.hour * (interval - 1))) }
                .select { |d| d.end_date < (window_start + (1.hour * interval)) }
                .count
  runningtotal = current if current > runningtotal
end
The problem: this code works just fine. So did about a dozen other incarnations of it, using .where, .select, SQL queries, etc. But it is too slow. Waaaaay too slow. Because it has to loop through every hour within 2 weeks. Then this method might need to be called itself dozens of times.
There has to be a faster way to do this, maybe a sort? I'm stumped, and I've been searching for hours with no luck. Any ideas?
To get adequate performance, you'll want to do everything in a single query, which will mean avoiding ActiveRecord functionality and doing a raw query (e.g. via ActiveRecord::Base.connection.execute).
I have no way to test it, since I have neither your data nor schema, but I think something along these lines will do what you are looking for:
select y.starting_hour, y.num_entries as max_entries
from
(
  select x.starting_hour,
         count(*) as num_entries,
         max(count(*)) over () as overall_max
  from
  (
    select date_trunc('hour', start_date) as starting_hour
    from deviceapis
  ) as x
  group by x.starting_hour
) as y
where y.num_entries = y.overall_max;
The logic of this is as follows, from the inner-most query out:
"Bucket" each starting time to the hour
From the resulting table of buckets, get the total number of entries in each bucket
Get the maximum number of entries from that table, and then use that number to match back to get the starting_hour itself.
If there happen to be more than one bucket with the same number of entries, you could determine a consistent way to pick one -- say the min(starting_hour) or similar (since that would stay the same even as data gets added, assuming you are not deleting items).
If you wanted to limit the initial time slice -- I see 2 weeks referenced in your post -- you could do that in the inner-most query with a where clause bracketing the date range.
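For example, a sketch of that inner-most query with the bracketing where clause, assuming Postgres and the start_date column from the question (the table name deviceapis is a guess based on Rails naming conventions):
select date_trunc('hour', start_date) as starting_hour
from deviceapis
where start_date >= now() - interval '2 weeks'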

ActiveRecord: Alternative to find_in_batches?

I have a query that loads thousands of objects and I want to tame it by using find_in_batches:
Car.includes(:member).where(:engine => "123").find_in_batches(batch_size: 500) ...
According to the docs, I can't have a custom sorting order: http://www.rubydoc.info/docs/rails/4.0.0/ActiveRecord/Batches:find_in_batches
However, I need a custom sort order of created_at DESC. Is there another method to run this query in chunks, like find_in_batches does, so that not so many objects live on the heap at once?
Hm, I've been thinking about a solution for this (I'm the person who asked the question). It makes sense that find_in_batches doesn't allow you to have a custom order, because let's say you sort by created_at DESC and specify a batch_size of 500. The first loop goes from 1-500, the second loop goes from 501-1000, etc. What if, before the 2nd loop occurs, someone inserts a new record into the table? That would be put at the top of the query results, your results would be shifted by one, and your 2nd loop would process a repeat.
You could argue though that created_at ASC would be safe then, but it's not guaranteed if your app specifies a created_at value.
UPDATE:
I wrote a gem for this problem: https://github.com/EdmundMai/batched_query
Since using it, the average memory of my application has HALVED. I highly suggest anyone having similar issues to check it out! And contribute if you want!
The slower manual way to do this is something like the following:
total = Car.includes(:member).where(engine: "123").count
count = total / 500
count += 1 if total % 500 > 0
last_id = 0
while count > 0
  ids = Car.includes(:member)
           .where("engine = ? AND id > ?", "123", last_id)
           .order(created_at: :desc)
           .limit(500)
           .ids # which plucks just the ids
  cars = Car.find(ids)
  # cars.each or cars.update_all
  # do your updating
  last_id = ids.last
  count -= 1
end
Can you imagine how find_in_batches with sorting would work on 1M rows or more? It would sort all rows for every batch.
So I think it is better to decrease the number of sort operations. For example, with a batch size of 500 you can load just the IDs (with sorting) for N * 500 rows, and afterwards load each batch of objects by those IDs. That way the number of sorted queries sent to the DB drops by a factor of N.
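A rough sketch of that idea, assuming the Car/engine query from the question (the chunk size N = 10 and the processing step are illustrative only):
chunk = 10 * 500  # N * batch_size: one sorted query covers 10 batches
ids = Car.where(engine: "123").order(created_at: :desc).limit(chunk).ids

ids.each_slice(500) do |batch_ids|
  # fetch the actual objects for this batch without re-sorting the whole table
  cars = Car.includes(:member).where(id: batch_ids)
  # process cars here; restore the in-batch order from batch_ids if it matters
end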
