Duplicated business logic in Ruby and SQL - ruby-on-rails

I have a PORO (Plain Old Ruby Object) that handles some business logic. It receives an ActiveRecord object and classifies it. For the sake of simplicity, take the following as an example:
class Classificator
  STATES = {
    1 => "Positive",
    2 => "Neutral",
    3 => "Negative"
  }

  def initialize(item)
    @item = item
  end

  def name
    STATES.fetch(state_id)
  end

  private

  def state_id
    return 1 if @item.value > 0
    return 2 if @item.value == 0
    return 3 if @item.value < 0
  end
end
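For example (an illustrative call, assuming items expose a numeric value):

Classificator.new(Item.new(value: 42)).name # => "Positive"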
However, I also want to run queries that group objects based on this state_id "virtual attribute". I currently deal with that by creating the attribute in the SQL queries and using it in GROUP BY statements. See the example:
class Classificator::Query
  SQL_CONDITIONS = {
    1 => "items.value > 0",
    2 => "items.value = 0",
    3 => "items.value < 0"
  }

  def initialize(relation = Item.all)
    @relation = relation
  end

  def count
    @relation.select(group_conditions).group('state_id').count
  end

  private

  def group_conditions
    'CASE ' + SQL_CONDITIONS.map do |k, v|
      'WHEN ' + v.to_s + " THEN " + k.to_s
    end.join(' ') + " END AS state_id"
  end
end
This way, I can get this business logic into SQL and make this kind of query in a very efficient way.
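For reference, group_conditions builds the string CASE WHEN items.value > 0 THEN 1 WHEN items.value = 0 THEN 2 WHEN items.value < 0 THEN 3 END AS state_id, so Classificator::Query.new.count returns a hash keyed by state_id in a single query, something like { 1 => 10, 2 => 3, 3 => 7 }.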
The problem is: I now have duplicated business logic. It exists in Ruby code, to classify a single object, and also in SQL, to classify a collection of objects at the database level.
Is this a bad practice? Is there a way to avoid it? I actually managed to avoid it by doing the following:
item = Item.find(4)
Item.select(group_conditions).where(id: item.id).select('state_id')
But by doing this, I lose the ability to classify objects that are not persisted in the database. The other way out would be classifying each object in Ruby, iterating over the collection, but then I would lose the database-level performance.
It seems unavoidable to keep duplicated business logic if I want the best of both cases. But I just want to be sure about this. :)
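For illustration, one way to at least keep the two forms of each rule side by side is to derive both the Ruby predicate and the SQL fragment from a single table of rules. A rough sketch (the names here are illustrative):

class Classificator
  # Each state is defined once, with a Ruby form and a SQL form of the same rule.
  RULES = {
    1 => { name: "Positive", ruby: ->(v) { v > 0 },  sql: "items.value > 0" },
    2 => { name: "Neutral",  ruby: ->(v) { v == 0 }, sql: "items.value = 0" },
    3 => { name: "Negative", ruby: ->(v) { v < 0 },  sql: "items.value < 0" }
  }

  # Classify an in-memory (possibly unpersisted) object.
  def self.state_id_for(value)
    RULES.find { |_id, rule| rule[:ruby].call(value) }.first
  end

  # Build the CASE expression for database-level grouping.
  def self.sql_case
    'CASE ' + RULES.map { |id, rule| "WHEN #{rule[:sql]} THEN #{id}" }.join(' ') + ' END AS state_id'
  end
end

This doesn't remove the duplication (each rule still exists in a Ruby and a SQL form), but the two forms live next to each other, which makes it harder to change one and forget the other.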
Thanks!

I'd rather keep the database simple and put logic in Ruby code as much as possible. Since the classification is not stored in the database, I wouldn't expect the queries to return it.
My solution is to define a concern which will be included into ActiveRecord model classes.
module Classified
  extend ActiveSupport::Concern

  STATES = {
    1 => "Positive",
    2 => "Neutral",
    3 => "Negative"
  }

  included do
    def state_name
      STATES.fetch(state_id)
    end

    private

    def state_id
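      # 0 <=> value is -1, 0, or 1, so value > 0 maps to 1 (Positive),
      # value == 0 to 2 (Neutral), and value < 0 to 3 (Negative)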
      (0 <=> value.to_i) + 2
    end
  end
end

class Item < ActiveRecord::Base
  include Classified
end
And I fetch items from the database just as usual.
items = Item.where(...)
Since each item knows its own classification value, I don't have to ask the database for it.
items.each do |item|
  puts item.state_name
end

ActiveRecord itself implies a degree of coupling between your persistence and business logic. However, as much as the pattern allows, and if you don't have real performance constraints, the first option should be to keep your persistence code as dumb as possible, and move this "classification" (which is clearly a business rule) away from the database as much as possible.
The rationale is that database-related code is more expensive to change (especially as your system is already in production) and generally more difficult and slower to test than pure business logic.

Is there any chance to introduce a trigger in the database? If so, I would go with a "calculated" field state_id in the database that changes its value on both INSERT and UPDATE (this will bring even more performance benefit), and this code in Ruby:
def state_id
  return @item.state_id if @item.state_id # persisted object
  case @item.value
  when 0 then 2
  when -Float::INFINITY...0 then 3
  else 1
  end
end
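For concreteness, such a trigger could be created from a migration. A sketch, assuming PostgreSQL and an items table with a numeric value column (all names illustrative):

class AddStateIdTriggerToItems < ActiveRecord::Migration
  def up
    add_column :items, :state_id, :integer

    execute <<-SQL
      CREATE OR REPLACE FUNCTION set_item_state_id() RETURNS trigger AS $$
      BEGIN
        -- Same classification rule as the Ruby code above
        NEW.state_id := CASE
          WHEN NEW.value > 0 THEN 1
          WHEN NEW.value = 0 THEN 2
          ELSE 3
        END;
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER items_state_id
        BEFORE INSERT OR UPDATE ON items
        FOR EACH ROW EXECUTE PROCEDURE set_item_state_id();
    SQL
  end

  def down
    execute "DROP TRIGGER items_state_id ON items"
    execute "DROP FUNCTION set_item_state_id()"
    remove_column :items, :state_id
  end
end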

Related

Rails: Code optimization/restructuring requested

I have the following code snippet that works perfectly and as intended:
# Prepares the object design categories and connects them via bit mapping with the objects.design_category_flag
def prepare_bit_flag_positions
  # Updates the bit_flag_position and the corresponding data in the object table in one transaction
  ActiveRecord::Base.transaction do
    # Sets the bit flag for object design category
    ObjectDesignCategory.where('0 = (@rownum := 0)').update_all('bit_flag_position = 1 << (@rownum := 1 + @rownum)')
    # Resets the object design category flag
    Object.update_all(design_category_flag: 0)
    # Sets the new object design category bit flag
    object_group_relation = Object.joins(:object_design_categories).select('BIT_OR(bit_flag_position) AS flag, objects.id AS object_id').group(:id)
    join_str = "JOIN (#{object_group_relation.to_sql}) sub ON sub.object_id = objects.id"
    Object.joins(join_str).update_all('design_category_flag = sub.flag')
  end
end
But in my opinion it is quite difficult to read. So I tried to rewrite this code without raw SQL. What I created was this:
def prepare_bit_flag_positions
  # Updates the bit_flag_position and the corresponding data in the object table in one transaction
  ActiveRecord::Base.transaction do
    # Sets the bit flag for the object color group
    ObjectColorGroup.find_each.with_index do |group, index|
      group.update(bit_flag_position: 1 << index)
    end
    # Resets the object color group flag
    Object.update_all(color_group_flag: 0)
    # Sets the new object color group bit flag
    Object.find_each do |object|
      object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
    end
  end
end
This also works fine, but when I run a benchmark on about 2000+ records, the second option is about 65 times slower than the first. So my question is:
Does anyone have an idea how to redesign this code so that it doesn't require raw SQL and is still fast?
I can see three sources of slowness:
1. The N+1 query problem
2. Instantiating many objects
3. The number of calls to the database
This code has the N+1 problem, and I think it may be the major cause of the slowdown:
Object.find_each do |object|
  object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
end
Change to
Object.includes(:object_color_groups).find_each do |object|
  ...
end
You can also use the Object.update class method on this code (see below).
I don't think you can get around #2 without using raw SQL. But you will need many objects (10K or 100K or more) to see a big difference.
To limit the calls to the database, you can use the Object.update class method to update many records at once. Change
ObjectColorGroup.find_each.with_index do |group, index|
  group.update(bit_flag_position: 1 << index)
end
to
color_groups = ObjectColorGroup.find_each.with_index.map do |group, index|
  [group.id, { bit_flag_position: 1 << index }]
end.to_h
ObjectColorGroup.update(color_groups.keys, color_groups.values)
The following is already a single query, so there is no need to change it:
Object.update_all(color_group_flag: 0)
Reference:
ActiveRecord#update class method API
ActiveRecord#update class method blog post
Rails Eager Loading

Is there any Rails/Ruby function that I can use to reset a hash lookup table every 24 hours or at a specific date?

OK, so I have an app that allows users to pull App Store data, specifically top free, top paid, etc. The available attributes are quite limited, but users can filter by category and country. Obviously this leads to a lot of repeated queries. Normally this wouldn't be a problem, but I also use this data with a Google API that has a credits system, so I want to save the results in my database when they are unique. I have this all set up and working; my only hang-up is how to determine whether a query has been made before. My solution is a hash table that stores all queries made so far: if the lookup comes back empty (nil), I call the API to fetch the data and then create a new record.
The issue is that the App Store refreshes every day or so (I'm not exactly sure of the schedule; I will look it up later). I would like this hash table to refresh or reset itself to all nil at that interval.
What would be the most efficient or simplest way to trigger this refresh? Additionally, I am kind of new to Rails, so where should I place this function? In a helper module? A controller?
Thanks!
Edit:
OK, so here is my hash table helper module:
module MapsHelper
  # 31 buckets, each holding the query strings seen so far
  QUERY_HISTORY = {}
  31.times { |i| QUERY_HISTORY[i] = [] }

  # Returns true if this query was made before; otherwise records it and returns false
  def queryTableLookup(asciiNum, queryString)
    bucket = QUERY_HISTORY[asciiNum % 31]
    if bucket.include?(queryString)
      true
    else
      bucket.push(queryString)
      false
    end
  end

  def queryHash(query)
    asciiSum = 0
    query.each_char { |c| asciiSum += c.ord }
    queryTableLookup(asciiSum, query)
  end
end
Additionally, I am kind of new to Rails. Can I interact with these functions using JavaScript, since I build the query string on the client side?
In my opinion, your best bet would be to use the Rails cache system. It provides a method of caching data, with an optional expires_in time.
From the docs:
http://guides.rubyonrails.org/caching_with_rails.html#low-level-caching
class MyModel < ActiveRecord::Base
  def self.get_api_data(key)
    Rails.cache.fetch("my_model/api_data:#{key}", expires_in: 12.hours) do
      SomeService::API.get_data(key)
    end
  end
end
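Since the App Store data refreshes on a roughly daily schedule, one variation (a sketch, assuming the refresh happens around midnight) is to compute expires_in relative to that deadline instead of a rolling window, so all cached entries reset together:

class MyModel < ActiveRecord::Base
  def self.get_api_data(key)
    # Expire at the end of the day so every entry resets at the same time
    Rails.cache.fetch("my_model/api_data:#{key}", expires_in: Time.current.end_of_day - Time.current) do
      SomeService::API.get_data(key)
    end
  end
end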
In your hash (which I think could live in a class variable) you can store both the query results and the datetime of the last fetch.
Suppose the Foo class has a class-level hash named cache, and query is the current query you want to check:

if Foo.cache[query].nil? || (DateTime.now - Foo.cache[query][:last_fetch]).to_i > 0
  results = your_method_to_fetch_data_for(query)
  Foo.cache[query] = { :results => results, :last_fetch => DateTime.now }
else
  results = Foo.cache[query][:results]
end

Execute method on mongoid scope chain

I need to take some random documents using Rails and Mongoid. Since I plan to have very large collections, I decided to put a random field in each document and to select documents using that field. I wrote the following method in the model:
def random(qty)
  if count <= qty
    all
  else
    collection = []
    while collection.size < qty
      collection << where(:random_field.gt => rand).first
    end
    collection
  end
end
This function actually works, and the collection is filled with qty random elements. But when I try to use it like a scope, like this:
User.students.random(5)
I get:
undefined method `random' for #<Array:0x0000000bf78748>
If instead I try to define the method as a lambda scope, I get:
undefined method `to_criteria' for #<Array:0x0000000df824f8>
Given that I'm not interested in applying any other scopes after the random one, how can I use my method in a chain?
Thanks in advance.
I ended up extending the Mongoid::Criteria class with the following. I don't know if it's the best option; actually, I believe it's quite slow, since it executes at least qty queries.
I don't know if not_in is available for normal ActiveRecord models. However, you can remove the not_in part if needed; it's just an optimization to reduce the number of queries.
On collections that have at least double the number of documents as qty, you should see exactly qty queries.
module Mongoid
  class Criteria
    def random(qty)
      if count <= qty
        all
      else
        res = []
        ids = []
        while res.size < qty
          el = where(:random_field.gt => rand).not_in(id: ids).first
          unless el.nil?
            res << el
            ids << el._id
          end
        end
        res
      end
    end
  end
end
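With this in place, the original call, User.students.random(5), works: the scope chain returns a Mongoid::Criteria, and random is now defined on it.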
Hope you find this useful :)

Rails3 - Multiple queries or single query in controller action

Hopefully a simple question about Rails best practice.
Let's keep this super simple; say I have a task model that has an ID, description and status.
In my controller I have an index action to return all tasks
def index
  @tasks = Task.all
end
My question is, in my view, suppose I want to display the tasks in separate HTML tables according to their status.
What is the best practice?
a) Query the database multiple times in the index action, ie
def index
  @draft_tasks = Task.where(status: "Draft")
  @approved_tasks = Task.where(status: "Approved")
  @closed_tasks = Task.where(status: "Closed")
end
b) Query the database once, and filter in the controller action
def index
  tasks = Task.all
  @draft_tasks = tasks.#somethinghere
  @approved_tasks = tasks.#somethinghere
  @closed_tasks = tasks.#somethinghere
end
c) Filter in the view
<% @tasks.each do |k, v| %>
  <% some if statement searching for the status I want %>
    # Some code to output the table
  <% end %>
<% end %>
or
d) Something else?
The generally accepted best practices here are to keep controller methods thin and to keep logic out of the view. So with that in mind, one possible way to do this would be:
# model
class Task
  scope :drafts, where(:status => "Draft")
  scope :approved, where(:status => "Approved")
  scope :closed, where(:status => "Closed")
end

# controller
def index
  @draft_tasks = Task.drafts
  @approved_tasks = Task.approved
  @closed_tasks = Task.closed
end
This will make three queries to the database, which could become a performance concern down the road. If that happens, you can optimize at the model level (e.g. by defining class methods drafts, approved, and closed where the first one called prefetches everything). That's less elegant though, so don't optimize prematurely.
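For illustration, that model-level optimization might look something like the sketch below (names are illustrative; the results are memoized for the life of the process, so a real version would need cache invalidation):

class Task
  # Sketch: the first call loads every task once and partitions them in memory.
  def self.tasks_by_status
    @tasks_by_status ||= all.group_by(&:status)
  end

  def self.drafts
    tasks_by_status["Draft"] || []
  end

  def self.approved
    tasks_by_status["Approved"] || []
  end

  def self.closed
    tasks_by_status["Closed"] || []
  end
end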
This is a loaded question with no single best practice, in my opinion. Given the case you have stated (display a table for each status), my thought process would be:
I would generally avoid case A when you're just dealing with one model type; I try to limit the number of database queries when possible.
Case B is what I would probably use if the view needs to display different markup depending on the status of a task.
I would usually tend towards case C if the markup is the same for each status. You can use the group_by function for this:
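For example, a sketch of that approach (assuming the status strings used above):

# controller
def index
  @tasks = Task.all
end

# view
<% @tasks.group_by(&:status).each do |status, tasks| %>
  <%= status %>
  <%# ... the same table markup, rendered once per group ... %>
<% end %>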
When the amount of information on your page starts to get larger and more complicated, you can start looking at extracting some logic out of the controller and into another object (common terms for this object would be a presenter or decorator). This can make testing some of your presentation logic easier by separating it from the controller and keeping your controllers 'thin'. But for the case you've given, I'd stick with option b or c.
In the simple case where the number of tasks is limited, I would do only a single query to retrieve them, and then separate them as follows:
tasks = Task.all
@draft_tasks = tasks.select { |x| x.status == 'Draft' }
@approved_tasks = tasks.select { |x| x.status == 'Approved' }
@closed_tasks = tasks.select { |x| x.status == 'Closed' }
Furthermore, depending on the flexibility of your requirements, I would even render them in a single table with a clear visual marker of the state (e.g. background colour or icons). Then there would not even be a reason to separate the tasks beforehand (but I can imagine this would break your UI completely).
None of the above holds once the number of tasks becomes larger, you need to apply pagination, and you need to display three different tables (one for each state).
In that case you will need the three separate queries, as answered by @Ben.
Now, UI-wise, I am not sure how you can paginate over three different sets of data at once. So instead I would use a single table showing all the states, and offer the option to filter on the status. In that case at least it is clear to the user what pagination means.
Just my two cents, hope this helps.
Option a) seems better, simply because the database can cache the query for you, so it should be faster.

How can I speed up this Rails code?

It's a vague question, I know... but the performance of this block of code is horrible. It takes about 15 seconds from the original POST to the action until the page renders...
The purpose of this action is to retrieve all occupations from a CV, plus all the skills from that CV and those occupations. They need to be organized in two arrays:
the first array contains all the occupations (no duplicates), ordered according to their score. For each duplicate entry found, the score is increased by 1
the second array contains ALL the skills from both the occupation array and the CV. Again, no duplicates are allowed, but for every duplicate encountered the score of the existing entry is increased by one.
Below is the code block that performs this operation. It's relatively big compared to my other code snippets, but I hope it's understandable. I know working with the arrays like I do is confusing, so here is what each array position means:
position 0: the actual skill/occupation object
position 1: the score of the entry
position 2: the location found in the db
position 3: the location found in the cv
def categorize
  @cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills])
  @menu = :second
  @language = Language.resolve(:code => :en, :name => :en)
  @occupation_hashes = []
  @skill_hashes = []
  (@cv.desired_occupations + @cv.past_occupations).each do |occupation|
    section = []
    section << 'Desired occupation' if @cv.desired_occupations.include? occupation
    section << 'Work experience' if @cv.past_occupations.include? occupation
    unless (array = @occupation_hashes.assoc(occupation)).blank?
      array[1] += 1
      array[2] = (array[2] & section).uniq
    else
      @occupation_hashes << [occupation, 1, section]
    end
    occupation.skills.each do |skill|
      unless (array = @skill_hashes.assoc skill).blank?
        label = occupation.concept.label(@language).value
        array[1] += 1
        array[3] << label unless array[3].include? label
      else
        @skill_hashes << [skill, 1, [], [occupation.concept.label(@language).value]]
      end
    end
  end
  @cv.educational_skills.each do |skill|
    unless (array = @skill_hashes.assoc skill).blank?
      array[1] += 1
      array[3] << 'Education skills' unless array[3].include? 'Education skills'
    else
      @skill_hashes << [skill, 1, ['Education skills'], []]
    end
  end
  # Sort the hashes
  @occupation_hashes.sort! { |x, y| y[1] <=> x[1] }
  @skill_hashes.sort! { |x, y| y[1] <=> x[1] }
  @max = @skill_hashes.first[1]
  @min = @skill_hashes.last[1]
end
I can post the additional models and migrations to make clear what each class does, but I think the first few lines of the script above make the associations clear. I'm looking for a way to optimize the each loops...
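For illustration, one optimization that might help inside those loops (a sketch, not from the original post): Array#assoc does a linear scan on every lookup, so the loops are effectively quadratic, while a Hash keyed by the record gives constant-time lookups. Sketched here for the occupation part only:

# Sketch: a Hash keyed by the occupation replaces @occupation_hashes.assoc(...),
# so each lookup is O(1) instead of a linear scan.
occupation_entries = {}

(@cv.desired_occupations + @cv.past_occupations).each do |occupation|
  section = []
  section << 'Desired occupation' if @cv.desired_occupations.include?(occupation)
  section << 'Work experience' if @cv.past_occupations.include?(occupation)

  if (entry = occupation_entries[occupation])
    entry[:score] += 1
    entry[:sections] &= section
  else
    occupation_entries[occupation] = { score: 1, sections: section }
  end
end

# Rebuild the [object, score, sections] arrays, sorted by score descending, as before
@occupation_hashes = occupation_entries.map { |occ, e| [occ, e[:score], e[:sections]] }
                                       .sort_by { |_occ, score, _sections| -score }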
That's quite the block of code there. Generally, if you're writing methods that long, you're going to have trouble maintaining them in the future. A technique that would help is breaking up that monolithic chunk of code and turning it into a helper class that does the processing in more logical stages, making it easier to fine-tune aspects of it.
For instance, an interface might be:
@categorizer = CvCategorizer.new(params[:cv_id])
This would encapsulate all of the above and save it into instance variables made accessible by being declared with attr_reader.
Using a utility class means you can break up the initialization into clearer steps:
def initialize(cv_id)
  # Call a wrapper method that loads the CV
  @cv = self.load_cv(cv_id)

  # Perform discrete steps to re-order the imported data
  self.organize_occupations
  self.organize_skills
end
It's really hard to say why this is slow just by looking at it, though I would pay very close attention to log/development.log to see what's going on in there. It could be that the initial load is painfully slow but the rest of the method is fine.
You should do a bit of profiling in your code to see what is taking the largest chunk of time. You can figure out how to work one of the profilers, or just sprinkle some simple puts or logger.info statements throughout your code with a timestamp. It's probably easiest to do this using Benchmark. Note: you may need to require 'benchmark'; I'm not sure if it is auto-required in Rails or not.
For a single line, you can do something like this:
logger.info Benchmark.measure { @cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills]) }
And for timing larger blocks of code:
result = Benchmark.measure do
  (@cv.desired_occupations + @cv.past_occupations).each do |occupation|
    section = []
    section << 'Desired occupation' if @cv.desired_occupations.include? occupation
    section << 'Work experience' if @cv.past_occupations.include? occupation
    unless (array = @occupation_hashes.assoc(occupation)).blank?
      array[1] += 1
      array[2] = (array[2] & section).uniq
    else
      @occupation_hashes << [occupation, 1, section]
    end
  end
end
logger.info result
I'd just start with large blocks and then narrow it down. Not knowing how large a dataset you are dealing with, it's hard to say where the problem zone is.
I'll also concur with others that you'll be way better off breaking this thing into smaller methods. That will also make it easier to test for performance, as you can do things like:
Benchmark.measure { 10000.times { foo.do_that_thing_that_might_be_slow }}
