Optimizing ActiveRecord Point-in-Polygon Search

The following point-in-polygon (PiP) search was built for a project that lets users find their NYC governmental districts by address or lat/lng (http://staging.placeanddisplaced.org). It works, but it's kind of slow, especially when searching through districts that have complex polygons. Can anyone give me some pointers on optimizing this code?
One thought I had was to run the point_in_polygon? method on a simplified version of each polygon, i.e. one with fewer coordinates. This would mean less processing time, but also decreased accuracy. Thoughts?
class DistrictPolygonsController < ApplicationController
def index
...
if coordinates?
@district_polygons = DistrictPolygon.
coordinates_within_bounding_box(params[:lat], params[:lng]).
find(:all, :include => :district, :select => select).
select { |dp| dp.contains_coordinates?(params[:lat], params[:lng]) }
else
@district_polygons = DistrictPolygon.find(:all, :include => :district, :select => select)
end
...
end
end
class DistrictPolygon < ActiveRecord::Base
named_scope :coordinates_within_bounding_box, lambda { |lat,lng| { :conditions => ['min_lat<? AND max_lat>? AND min_lng<? AND max_lng>?', lat.to_f, lat.to_f, lng.to_f, lng.to_f] } }
named_scope :with_district_type, lambda { |t| { :conditions => ['district_type=?', t] } }
before_save :get_bounding_box_from_geometry
def get_bounding_box_from_geometry
# return true unless self.new_record? || self.geometry_changed? || self.bounds_unknown?
self.min_lat = all_lats.min
self.max_lat = all_lats.max
self.min_lng = all_lngs.min
self.max_lng = all_lngs.max
true # object won't save without this return
end
def bounds_unknown?
%w(min_lat max_lat min_lng max_lng).any? {|b| self[b.to_sym].blank? }
end
def bounds_known?; !bounds_unknown?; end
# Returns an array of XML objects containing each polygons coordinates
def polygons
Nokogiri::XML(self.geometry).search("Polygon/outerBoundaryIs/LinearRing/coordinates")
end
def multi_geometric?
Nokogiri::XML(self.geometry).search("MultiGeometry").size > 0
end
# Returns an array of [lng,lat] arrays
def all_coordinates
pairs = []
polygons.map do |polygon|
polygon.content.split("\n").map do |coord|
# Get rid of third 'altitude' param from coordinate..
pair = coord.strip.split(",")[0..1].map(&:to_f)
# Don't let any nils, blanks, or zeros through..
pairs << pair unless pair.any? {|point| point.blank? || point.zero? }
end
end
pairs
end
# All latitudes, regardless of MultiPolygonal geometry
def all_lats
all_coordinates.map(&:last).reject{|lat| lat.blank? || lat.zero?}
end
# All longitudes, regardless of MultiPolygonal geometry
def all_lngs
all_coordinates.map(&:first).reject{|lng| lng.blank? || lng.zero?}
end
# Check to see if coordinates are in the rectangular bounds of this district
def contains_coordinates?(lat, lng)
return false unless coordinates_within_bounding_box?(lat.to_f, lng.to_f)
polygons.any? { |polygon| DistrictPolygon.point_in_polygon?(all_lats, all_lngs, lat.to_f, lng.to_f) }
end
def coordinates_within_bounding_box?(lat, lng)
return false if (max_lat > lat.to_f == min_lat > lat.to_f) # Not between lats
return false if (max_lng > lng.to_f == min_lng > lng.to_f) # Not between lngs
true
end
# This algorithm came from http://www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/pnpoly.html
def self.point_in_polygon?(x_points, y_points, x_target, y_target)
num_points = x_points.size
j = num_points-1
c = false
for i in 0...num_points do
c = !c if ( ((y_points[i]>y_target) != (y_points[j]>y_target)) && (x_target < (x_points[j]-x_points[i]) * (y_target-y_points[i]) / (y_points[j]-y_points[i]) + x_points[i]) )
j = i
end
return c
end
end

If the runtime grows with the complexity of the shapes, that suggests the bottleneck is the O(n) loop in point_in_polygon?.
Does profiling back that assumption up?
If performance is critical, consider implementing the exact same algorithm as native code.
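Before going native, it may be worth checking how much time goes into the loop itself versus preparing its input. In the question's contains_coordinates?, all_lats and all_lngs are rebuilt (re-parsing the geometry XML) inside the any? block, and the same combined coordinate list is passed for every polygon. The following is only a sketch of the alternative: each ring is parsed once into [x, y] float pairs and tested on its own. The names point_in_ring? and contains_point?, and the sample rings, are made up for illustration.

```ruby
# Ray-casting (pnpoly) test against one ring of [x, y] float pairs.
def point_in_ring?(ring, x, y)
  inside = false
  j = ring.length - 1
  ring.each_with_index do |(xi, yi), i|
    xj, yj = ring[j]
    crosses = (yi > y) != (yj > y) &&
              x < (xj - xi) * (y - yi) / (yj - yi) + xi
    inside = !inside if crosses
    j = i
  end
  inside
end

# True if the point falls inside any of the pre-parsed rings.
def contains_point?(rings, x, y)
  rings.any? { |ring| point_in_ring?(ring, x, y) }
end

# Hypothetical stand-in for one district: rings parsed once and kept around.
rings = [
  [[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0]],
  [[10.0, 10.0], [12.0, 10.0], [11.0, 12.0]]
]

# Rough wall-clock timing of many lookups, as a first profiling step.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
10_000.times { contains_point?(rings, 2.0, 2.0) }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format('10,000 lookups took %.4f s', elapsed)
```

If the timing of the bare loop turns out to be small compared with the full request, the XML parsing is the more likely place to optimize.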

I suspect you may be able to push the majority of the work into the DB. PostgreSQL has the PostGIS extension, which enables spatially aware queries to be performed.
PostGIS: http://postgis.refractions.net/
Docs: http://postgis.refractions.net/documentation/manual-1.4/
This breaks the database portability concept, but might be worth it if performance is critical.

Algorithm aside, keeping the polygon data in local memory and recoding this in a statically typed compiled language will likely lead to 100x-1000x increase in speed.

Related

Clean up messy code that queries based on multiple options

I'm using Rails, but the underlying question here applies more broadly. I have a report page on my web app that allows the user to specify what they're filtering on, and query the database based on those filters (MongoDB).
The data is based around hotels. The user must first select the regions of the hotels (state_one, state_two, state_three), then the statuses of the hotels (planning, under_construction, operational), then an optional criterion, price range (200, 300, 400). Users can select multiple of each of these options.
My way of doing this currently is to create an empty array, iterate through each region, and push the region into the array if the user selected that region. Then, I'm iterating through THAT array, and assessing the status of the hotels in those regions, if any hotel has the status the user has selected, then I'm adding that hotel to a new empty array. Then I do the same thing for price range.
This works, but the code is offensively messy. Here's an example of the code:
def find_hotel
hotels = find_all_hotels
first_array = []
hotels.each do |hotel|
if params[:options][:region].include? 'state_one' and hotel.state == :one
first_array.push(hotel)
elsif params[:options][:region].include? 'state_two' and hotel.state == :two
first_array.push(hotel)
elsif params[:options][:region].include? 'state_three' and hotel.state == :three
first_array.push(hotel)
end
end
second_array = []
first_array.each do |hotel|
if params[:options][:region].include? 'planning' and hotel.status == :planning
first_array.push(hotel)
elsif params[:options][:region].include? 'under_construction' and hotel.status == :under_construction
first_array.push(hotel)
elsif params[:options][:region].include? 'operational' and hotel.status == :operational
first_array.push(hotel)
end
end
third_array = []
second_array.each do |hotel|
# More of the same here, this could go on forever
end
end
What are some better ways of achieving this?

How about this:
STATES = [:one, :two, :three]
STATUSES = [:planning, :under_construction, :operational]
PRICES = [200, 300, 400]
def find_hotel
region = params[:options][:region]
first_array = set_array(region, find_all_hotels, STATES, :state)
second_array = set_array(region, first_array, STATUSES, :status)
third_array = set_array(region, second_array, PRICES, :price_range)
end
def set_array(region, array, options, attribute)
array.each_with_object([]) do |element, result|
options.each do |option|
result << element if region.include?(option) && element[attribute] == option
end
end
end
UPDATE
Added attribute parameter to set_array in order to make the code work with your updated example.
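For comparison, here is a minimal plain-Ruby sketch of the same chained filtering using select. It makes several assumptions that are not in the question: hotels are simple structs, the selected statuses and prices arrive under their own keys (:status and :price_range, both hypothetical), and region values such as 'state_one' map to states by stripping the 'state_' prefix. An empty selection is treated as "no filter", which matches the optional price range.

```ruby
Hotel = Struct.new(:name, :state, :status, :price_range)

# Keep records whose attribute (compared as a string) is among the selected
# values. A nil or empty selection leaves the list untouched.
def filter_by(records, attribute, selected)
  wanted = Array(selected).map(&:to_s)
  return records if wanted.empty?
  records.select { |record| wanted.include?(record.public_send(attribute).to_s) }
end

# Apply the three filters in sequence; each step narrows the previous result.
def find_hotels(hotels, options)
  states = Array(options[:region]).map { |region| region.to_s.sub(/\Astate_/, '') }
  result = filter_by(hotels, :state, states)
  result = filter_by(result, :status, options[:status])
  filter_by(result, :price_range, options[:price_range])
end

hotels = [
  Hotel.new('A', :one, :planning,    200),
  Hotel.new('B', :two, :operational, 300),
  Hotel.new('C', :one, :operational, 400)
]

matches = find_hotels(hotels, { region: ['state_one'], status: ['operational'] })
puts matches.map(&:name).inspect
```

With a real database behind it, the same conditions would usually be better expressed in the query itself rather than by filtering loaded records.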

Since second_array is empty, whatever you get by iterating over it (perhaps third_array) would also be empty.
def find_hotel
hotels = find_all_hotels
first_array = hotels
.select{|hotel| params[:options][:region].include?("state_#{hotel.state}")}
first_array += first_array
.select{|hotel| params[:options][:region].include?(hotel.status.to_s)}
second_array = third_array = []
...
end

Too many checks for empty params. How to optimize queries to ActiveRecord in Rails5?

I'm checking for empty parameters before doing the query.
There is only one check so far, for params[:car_model_id]. I can imagine that if I add more checks for other params, there will be a mess of if-else statements. It doesn't look nice and I think it can be optimized. But how? Here is the controller code:
class CarsController < ApplicationController
def search
if params[:car_model_id].empty?
@cars = Car.where(
used: ActiveRecord::Type::Boolean.new.cast(params[:used]),
year: params[:year_from]..params[:year_to],
price: params[:price_from]..params[:price_to],
condition: params[:condition]
)
else
@cars = Car.where(
used: ActiveRecord::Type::Boolean.new.cast(params[:used]),
car_model_id: params[:car_model_id],
year: params[:year_from]..params[:year_to],
price: params[:price_from]..params[:price_to],
condition: params[:condition]
)
end
if @cars
render json: @cars
else
render json: @cars.errors, status: :unprocessable_entity
end
end
end

The trick would be to remove the blank values, do a little bit of pre-processing (and possibly validation) of the data, and then pass the params to the where clause.
To help with the processing of the date ranges, you can create a method that checks both dates are provided and are converted to a range:
def convert_to_range(start_date, end_date)
  if start_date && end_date
    start_date = Date.parse(start_date)
    end_date = Date.parse(end_date)
    start_date..end_date
  end
rescue ArgumentError => e
  # If your code reaches here then the user has supplied an invalid date and you
  # need to work out how to handle this.
end
Then your controller action could look something like this:
# select only the params that are needed
car_params = params.slice(:car_model_id, :used, :year_from, :year_to, :price_from, :price_to, :condition)
# do some processing of the data
year_from = car_params.delete(:year_from).presence
year_to = car_params.delete(:year_to).presence
car_params[:year] = convert_to_range(year_from, year_to)
price_from = car_params.delete(:price_from).presence
price_to = car_params.delete(:price_to).presence
car_params[:price] = convert_to_range(price_from, price_to)
# select only params that are present
car_params = car_params.select {|k, v| v.present? }
# search for the cars
@cars = Car.where(car_params)
Also, I'm pretty sure that the used value will automatically get cast to boolean for you when it's provided to the where.
Also, @cars is an ActiveRecord::Relation, which does not have an errors method. Perhaps you mean to give different results based on whether there are any cars returned?
E.g: @cars.any? (or @cars.load.any? if you don't want to execute two queries to fetch the cars and check if cars exist)
Edit:
As mentioned by mu is too short you can also clean up your code by chaining where conditions and scopes. Scopes help to move functionality out of the controller and into the model which increases re-usability of functionality.
E.g.
class Car < ActiveRecord::Base
  scope :year_between, ->(from, to) { where(year: from..to) }
  scope :price_between, ->(from, to) { where(price: from..to) }
  scope :used, ->(value = true) { where(used: value) }
end
Then in your controller:
# initial condition is all cars
cars = Car.all
# refine results with params provided by user
cars = cars.where(car_model_id: params[:car_model_id]) if params[:car_model_id].present?
cars = cars.year_between(params[:year_from], params[:year_to])
cars = cars.price_between(params[:price_from], params[:price_to])
cars = cars.used(params[:used])
cars = cars.where(condition: params[:condition]) if params[:condition].present?
@cars = cars
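The "remove the blank values, then pass what is left to where" idea from the first half of this answer can be tried out without Rails. The sketch below is only an illustration: build_conditions, range_or_nil and missing? are hypothetical helper names, the params are a plain Hash with symbol keys, and year and price are assumed to be integers rather than dates.

```ruby
# True for nil, empty, or whitespace-only values.
def missing?(value)
  value.nil? || value.to_s.strip.empty?
end

# Turn a pair of optional bounds into an integer Range, or nil if either
# bound is missing or not numeric.
def range_or_nil(from, to)
  return nil if missing?(from) || missing?(to)
  Integer(from)..Integer(to)
rescue ArgumentError
  nil # non-numeric input; a real app would probably report this to the user
end

# Build a conditions hash containing only the filters the user supplied.
def build_conditions(params)
  conditions = {
    car_model_id: params[:car_model_id],
    condition:    params[:condition],
    year:         range_or_nil(params[:year_from], params[:year_to]),
    price:        range_or_nil(params[:price_from], params[:price_to])
  }
  conditions.reject { |_key, value| missing?(value) }
end

conditions = build_conditions({
  car_model_id: '', condition: 'good',
  year_from: '2010', year_to: '2015',
  price_from: '', price_to: '9000'
})
puts conditions.inspect
```

The resulting hash has the shape that a where call accepts, so adding another optional filter means adding one line to the hash rather than another if/else branch.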

Instance Variables in a Rails Model

I have this variable opinions I want to store as an instance variable in my model... am I right in assuming I will need to add a column for it or else be re-calculating it constantly?
My other question is what is the syntax to store into a column variable instead of just a local one?
Thanks for the help, code below:
# == Schema Information
#
# Table name: simulations
#
# id :integer not null, primary key
# x_size :integer
# y_size :integer
# verdict :string
# arrangement :string
# user_id :integer
#
class Simulation < ActiveRecord::Base
belongs_to :user
serialize :arrangement, Array
validates :user_id, presence: true
validates :x_size, :y_size, presence: true, :numericality => {:only_integer => true}
validates_numericality_of :x_size, :y_size, :greater_than => 0
def self.keys
[:soft, :hard, :none]
end
def generate_arrangement
@opinions = Hash[ Simulation.keys.map { |key| [key, 0] } ]
@arrangement = Array.new(y_size) { Array.new(x_size) }
@arrangement.each_with_index do |row, y_index|
row.each_with_index do |current, x_index|
rand_opinion = Simulation.keys[rand(0..2)]
@arrangement[y_index][x_index] = rand_opinion
@opinions[rand_opinion] += 1
end
end
end
def verdict
if @opinions[:hard] > @opinions[:soft]
:hard
elsif @opinions[:soft] > @opinions[:hard]
:soft
else
:push
end
end
def state
@arrangement
end
def next
new_arrangement = Array.new(@arrangement.size) { |array| array = Array.new(@arrangement.first.size) }
@opinions = Hash[ Simulation.keys.map { |key| [key, 0] } ]
@seating_arrangement.each_with_index do |array, y_index|
array.each_with_index do |opinion, x_index|
new_arrangement[y_index][x_index] = update_opinion_for x_index, y_index
@opinions[new_arrangement[y_index][x_index]] += 1
end
end
@arrangement = new_arrangement
end
private
def in_array_range?(x, y)
((x >= 0) and (y >= 0) and (x < @arrangement[0].size) and (y < @arrangement.size))
end
def update_opinion_for(x, y)
local_opinions = Hash[ Simulation.keys.map { |key| [key, 0] } ]
for y_pos in (y-1)..(y+1)
for x_pos in (x-1)..(x+1)
if in_array_range? x_pos, y_pos and not(x == x_pos and y == y_pos)
local_opinions[@arrangement[y_pos][x_pos]] += 1
end
end
end
opinion = @arrangement[y][x]
opinionated_neighbours_count = local_opinions[:hard] + local_opinions[:soft]
if (opinion != :none) and (opinionated_neighbours_count < 2 or opinionated_neighbours_count > 3)
opinion = :none
elsif opinion == :none and opinionated_neighbours_count == 3
if local_opinions[:hard] > local_opinions[:soft]
opinion = :hard
elsif local_opinions[:soft] > local_opinions[:hard]
opinion = :soft
end
end
opinion
end
end

ActiveRecord analyzes the database tables and creates setter and getter methods with metaprogramming.
So you would create a database column with a migration:
rails g migration AddOpinionsToSimulations opinions:jsonb
Note that not all databases support storing a hash or a similar key/value data type in a column. Postgres does (for example via its json, jsonb and hstore column types). If you need to use another database such as MySQL you should consider using a relation instead (storing the data in another table).
Then when you access simulation.opinions it will automatically get the database column value (if the record is persisted).
Since ActiveRecord creates a setter and getter you can access your property from within the Model as:
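class Simulation < ActiveRecord::Base
  # ...
  def an_example_method
    self.opinions # getter method
    # since self is the implied receiver you can simply do
    opinions
    # the setter needs an explicit receiver; a bare `opinions = ...`
    # would only create a local variable
    self.opinions = {foo: "bar"} # setter method.
  end
end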
The same applies when using the plain ruby attr_accessor, attr_reader and attr_writer macros.
When you assign to an attribute backed by a database column ActiveRecord marks the attribute as dirty and will include it when you save the record.
ActiveRecord has a few methods to directly update attributes: update, update_attributes and update_attribute. There are differences in the call signature and how they handle callbacks.
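The difference between calling a setter and assigning a local variable is easy to trip over, and it behaves the same way with plain attr_accessor as with column-backed attributes. A small sketch in plain Ruby (the class and method names are invented for the demonstration):

```ruby
class SetterDemo
  attr_accessor :opinions

  def assign_local
    opinions = { soft: 1 } # creates a local variable; the setter is NOT called
    opinions
  end

  def assign_attribute
    self.opinions = { soft: 1 } # explicit receiver, so opinions= is called
  end
end

demo = SetterDemo.new
demo.assign_local
puts demo.opinions.inspect # still nil at this point
demo.assign_attribute
puts demo.opinions.inspect
```

Reading works without self because Ruby falls back to a method call when no local variable of that name exists; writing never does.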

You can add a method like
def opinions
  @opinions ||= Hash[ Simulation.keys.map { |key| [key, 0] } ]
end
This will cache the result in the instance variable @opinions.
I would also add methods like
def arrangement
  @arrangement ||= Array.new(y_size) { Array.new(x_size) }
end
def rand_opinion
  Simulation.keys[rand(0..2)]
end
and then replace the variables with your methods
def generate_arrangement
  arrangement.each_with_index do |row, y_index|
    row.each_with_index do |current, x_index|
      opinion = rand_opinion # draw once so the cell and the tally agree
      arrangement[y_index][x_index] = opinion
      opinions[opinion] += 1
    end
  end
end
Now your opinions and your arrangement will be cached and the code looks better. You didn't have to add a new column to your table.
You now have to replace the @opinions variable with your opinions method.
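To see the memoization in isolation, here is a sketch of the same pattern as a plain Ruby class with no ActiveRecord involved. ArrangementSketch and its KEYS constant are stand-ins invented for this example, not part of the question's model.

```ruby
class ArrangementSketch
  KEYS = [:soft, :hard, :none].freeze

  attr_reader :x_size, :y_size

  def initialize(x_size, y_size)
    @x_size = x_size
    @y_size = y_size
  end

  # Built on first access, then reused for the lifetime of this object.
  def opinions
    @opinions ||= Hash[KEYS.map { |key| [key, 0] }]
  end

  def arrangement
    @arrangement ||= Array.new(y_size) { Array.new(x_size) }
  end

  # Fill every cell with a random opinion and keep a running tally.
  def generate_arrangement
    arrangement.each do |row|
      row.each_index do |x_index|
        opinion = KEYS.sample
        row[x_index] = opinion
        opinions[opinion] += 1
      end
    end
    self
  end
end

sketch = ArrangementSketch.new(3, 2).generate_arrangement
puts sketch.opinions.inspect
```

Keep in mind that a memoized instance variable lives only as long as the Ruby object: it is not saved with the record, so after loading the record again the tally starts from scratch unless it is recomputed or stored in a column.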

Wrapping 'next' and 'previous' functions

In my Rails 4 app, I defined functions in my model that get the (nth) next or previous row in the database, wrapping around the entire database, so that Item.last.next will refer to Item.first:
def next(n=0)
following = Item.where("id > ?", self.id).order("id asc") + Item.where("id < ?", self.id).order("id asc")
following[n % following.length]
end
def prev(n=0)
n = n % Item.count-1
previous = Item.where("id < ?", self.id).order("id desc") + Item.where("id > ?", self.id).order("id desc")
previous[n % previous.length]
end
This results in three database queries per method call, and I've learned to keep database queries to a minimum, so I wonder if there is a way to get this result with only one query.

What you are looking for seems a bit high level. So let's prepare the basic API at first.
def next(n=1)
  self.class.where('id > ?', id).limit(n).order('id ASC')
end
def previous(n=1)
  self.class.where('id < ?', id).limit(n).order('id DESC')
end
Then higher level methods
def next_recycle(n=1)
  klass = self.class
  return klass.first if (n == 1 && self == klass.last)
  self.next(n) # explicit receiver, since a bare `next` is a Ruby keyword
end
def previous_recycle(n=1)
  klass = self.class
  return klass.last if (n == 1 && self == klass.first)
  previous(n)
end
You can pick methods according to needs.
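The wrap-around itself is just modular arithmetic over the ordered ids, which can be sketched without a database. This is a different approach from the methods above and rests on an assumption: that loading all ids in a single query (for example with something like Item.order(:id).pluck(:id)) is acceptable for the table size. The function name wrapped_neighbor_id is hypothetical.

```ruby
# Given all ids, return the id `offset` steps after current_id (negative
# offsets step backwards), wrapping around at both ends.
def wrapped_neighbor_id(ids, current_id, offset)
  sorted = ids.sort
  index = sorted.index(current_id)
  return nil if index.nil? # current_id is not in the list
  sorted[(index + offset) % sorted.length]
end

ids = [5, 1, 9, 3]
puts wrapped_neighbor_id(ids, 9, 1)  # one step past the last id wraps to the first
puts wrapped_neighbor_id(ids, 1, -1) # one step before the first id wraps to the last
```

Ruby's % returns a non-negative result for a positive divisor, so negative offsets wrap correctly without special cases. Fetching the record for the returned id would still cost a second query.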

How to refactor complicated logic in create_unique method?

I would like to simplify this complicated logic for creating unique Track object.
def self.create_unique(p)
f = Track.find :first, :conditions => ['user_id = ? AND target_id = ? AND target_type = ?', p[:user_id], p[:target_id], p[:target_type]]
x = ((p[:target_type] == 'User') and (p[:user_id] == p[:target_id]))
Track.create(p) if (!f and !x)
end

Here's a rewrite with a few simple extract methods:
def self.create_unique(attributes)
return if exists_for_user_and_target?(attributes)
return if user_is_target?(attributes)
create(attributes)
end
def self.exists_for_user_and_target?(attributes)
exists?(attributes.slice(:user_id, :target_id, :target_type))
end
def self.user_is_target?(attributes)
attributes[:target_type] == 'User' && attributes[:user_id] == attributes[:target_id]
end
This rewrite shows my preference for small, descriptive methods to help explain intent. I also like using guard clauses in cases like create_unique; the happy path is revealed in the last line (create(attributes)), but the guards clearly describe exceptional cases. I believe my use of exists? in exists_for_user_and_target? could be a good replacement for find :first, though it assumes Rails 3.
You could also consider using a uniqueness validation (for example validates_uniqueness_of with a :scope) instead.

@@keys = [:user_id, :target_id, :target_type]
def self.create_unique(p)
  return if Track.find :first, :conditions => [
    @@keys.map{|k| "#{k} = ?"}.join(" and "),
    *@@keys.map{|k| p[k]}
  ]
  # skip only when a user would be tracking themselves
  return if p[@@keys[2]] == "User" && p[@@keys[0]] == p[@@keys[1]]
  Track.create(p)
end
