I am developing a Rails app and one of my actions compares two of the same kind of objects and returns a decimal value between 0 and 1. There are roughly 800 objects that need to be compared, thus there are roughly 800*800 possible decimal values that can be returned. Each action call requires about 300 or so comparisons, which are made via an API.
Because of the number of API calls that are needed, I have decided that the best approach is to make a lookup table with all 800*800 API comparison values stored locally, to avoid having to rely on the API, which has call limits and a significant overhead per call.
Basically, I have decided that a lookup table best suits this task (although I am open to suggestions on this too).
My question is this: what is the best way to implement a two-dimensional lookup table with ~800 "rows" and ~800 "columns" in Rails? For example, if I wanted to compare objects 754 and 348, would it be best to create models for the rows and columns and access the decimal comparison like:
object1.754.object2.348 # => 0.8738
Or should I store all of the values in a CSV or something like that? If that is the better approach, how should I even go about setting it up? I am relatively new to the Rails world, so apologies if an obvious answer is dangling in front of me!
Bear in mind that the entire point of this approach was to avoid the overhead of API calls and thus avoid long waits for the end user, so I am looking for the most time-efficient way to approach this task!
I would consider a hash of hashes, so you retrieve the values with:
my_hash[754][348]
=> 0.8738
If the value for a particular combination might not have been loaded yet, you'd want to be careful to use:
my_hash[754].try(:[], 348)
There could be some subtleties in the implementation to do with loading the hash that make it beneficial to use the hashie gem.
https://rubygems.org/gems/hashie/versions/3.4.2
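As an illustrative sketch (the default-proc idiom here is my assumption about how you might load the table, not part of the original suggestion), a plain hash of hashes can be built so that missing outer keys are handled automatically:

# Any missing outer key autovivifies an empty inner hash.
lookup = Hash.new { |hash, key| hash[key] = {} }

# Populate from wherever the precomputed comparisons live (API dump, CSV, etc.).
lookup[754][348] = 0.8738

lookup[754][348]  # => 0.8738
lookup[999][1]    # => nil, no NoMethodError on the outer lookup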
If you wanted to persist the values, they can be written into the database using serialize, and you could also extend the approach to put expiry dates on the values if you wished.
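As a rough sketch of the serialize idea (the model and column names here are hypothetical, not from the question):

class ComparisonTable < ActiveRecord::Base
  # Assumes a text column named `data`; serialize round-trips the hash as YAML.
  serialize :data, Hash
end

table = ComparisonTable.first_or_create(data: {})
table.data[754] ||= {}
table.data[754][348] = 0.8738
table.save!

Loading the row back gives you the hash of hashes directly via table.data.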
I had a similar problem comparing the contents of a large collection of ebooks. I stored the already-compared results in a matrix that I serialise with Marshal; the lookup key is a two-element array of the MD5 hashes of the file paths.
Here is the Matrix class I created for this task.
require 'digest/md5'

class Matrix
  attr_accessor :path, :store

  def initialize(path)
    @path = path
    # Load the persisted store, or fall back to an empty hash if the
    # file is missing or unreadable.
    @store = File.open(@path, 'rb') { |f| Marshal.load(f.read) } rescue Hash.new(nil)
  end

  def save
    File.open(@path, 'wb') { |f| f.write(Marshal.dump(@store)) }
    self
  end

  def add(file1, file2, value)
    @store[[Digest::MD5.hexdigest(file1), Digest::MD5.hexdigest(file2)]] = value
  end

  def has?(file1, file2)
    !@store[[Digest::MD5.hexdigest(file1), Digest::MD5.hexdigest(file2)]].nil?
  end

  def value(file1, file2)
    @store[[Digest::MD5.hexdigest(file1), Digest::MD5.hexdigest(file2)]]
  end

  def each(&blk)
    @store.each(&blk)
  end
end
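A quick usage sketch (the file names are made up for illustration):

matrix = Matrix.new('comparisons.dat')
matrix.add('books/a.epub', 'books/b.epub', 0.42) unless matrix.has?('books/a.epub', 'books/b.epub')
matrix.value('books/a.epub', 'books/b.epub')  # => 0.42
matrix.save

Note that naming the class Matrix shadows Ruby's standard-library Matrix, so pick another name if you ever need that class.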
I have been searching everywhere but I can't seem to find this problem anywhere. In Rails 5.0.0.beta3 I need to sort my @record = user.records collection by an association and its records.
The sort goes something like this.
@record = @record.sort_by { |rec|
  If user.fav_record.find(rec.id)
    User.fav_record(rec.id).created_at
  Else
    rec.created_at
  End
}
This is just an example of what I do. But everything sorts fine.
The problem:
This returns an array and not an Active Record Class.
I've tried everything to get this to return an ActiveRecord class. I've pushed the sorted elements into an ID array and tried to extract them in that order, and I've tried mapping. Every result that I get turns my previous ActiveRecord object into an array or hash. Now I need it to go back into an ActiveRecord object. Does anyone know how to convert an array or hash of that record back into an ActiveRecord class?
There isn't an easy way to convert an array back into an ActiveRecord relation; going the other way (relation to array) is the trivial direction.
If you want to optimize the performance of your app, you should try to avoid converting ActiveRecord queries to arrays. Try to keep the object as a query for as long as possible.
That being said, working with arrays is generally easier than queries, and it can feel like a hassle to convert a lot of array operations to ActiveRecord query (SQL) code.
It'd be better to write the sort code using ActiveRecord query methods, or even to write it in plain SQL using find_by_sql.
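If you really do need a relation back after sorting in Ruby, one common workaround (a sketch only, since a plain where will not preserve your custom order without an explicit ORDER BY) is to collect the sorted ids and re-query:

sorted_ids = @records.map(&:id)
# Returns a relation again, but the database will NOT preserve the
# array's order unless you add an ORDER BY (e.g. FIELD() on MySQL).
@records = user.records.where(id: sorted_ids)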
I don't know what code you should specifically use here, but I do see that your code could be refactored to be clearer. First of all, If and Else should not be capitalized, but I'm assuming that this is just pseudocode and you already realize this. Second, your variable names should be pluralized if they are queries or arrays (i.e. @record.sort_by should be @records.sort_by instead).
It's worth mentioning that ActiveRecord queries are difficult to master and a lot of people just use array operations instead since they're easier to write. If "premature optimization is the root of all evil", it's really not the end of the world if you sacrifice a bit of performance and just keep your array implementation if you're just trying to make an initial prototype. Just make sure that you're not making "n+1" SQL calls, i.e. do not make a database call every iteration of your loop.
Here's an example of an array implementation which avoids the N+1 SQL issue:
# first load all the user's favorites into memory
user_fav_records = user.fav_records.select(:id, :created_at)

@records = @records.sort_by do |record|
  # Array#find, not the ActiveRecord method: this scans the in-memory
  # list instead of issuing a query on every iteration
  matching_rec = user_fav_records.find { |fav| fav.id.eql?(record.id) }
  if matching_rec
    matching_rec.created_at
  else
    record.created_at
  end
end
The main difference between this implementation and the code in your question is that I'm avoiding calling ActiveRecord's find on each iteration of the loop. SQL reads/writes are computationally expensive, and you want your code to make as few of them as possible.
I have a Rails app with PostgreSQL.
I'm trying to implement a method to suggest alternative names for a certain resource, if the user input has been already chosen.
My reference is Slack, which suggests alternative names when the one you pick is already taken.
Is there any solution that could do this efficiently?
By efficiently I mean using only one query, or a small set of queries. A pure SQL solution would be great, too.
My initial implementation looked like this:
def generate_alternative_names(model, column_name, count)
  words = model[column_name].split(/[,\s\-_]+/).reject(&:blank?)
  candidates = 100.times.map { |i| generate_candidates_using_a_certain_strategy(i, words) }
  already_used = model.class.where(column_name => candidates).pluck(column_name)
  (candidates - already_used).first(count)
end
# Usage example:
model = Domain.new
model.name = 'hello-world'
generate_alternative_names(model, :name, 5)
# => ["hello_world", "hello-world2", "world_hello", ...]
It generates 100 candidates, then checks the database for matches and removes them from the candidates list. Finally it returns the first count values extracted.
This method is a best-effort implementation: it works for small sets of suggestions that have few conflicts (in my case, up to 100).
Even if I increase this magic number (100), it does not scale indefinitely.
Do you know a way to improve this so it can scale to a large number of conflicts, and without using magic numbers?
I would go with the reversed approach: query the database for existing records using LIKE, and then generate suggestions, skipping those already taken:
def alternatives(model, column, word, count)
  # Use a bind parameter for the value to avoid SQL injection via `word`
  # (the column name is assumed to be developer-controlled).
  taken = model.class.where("#{column} LIKE ?", "%#{word}%").pluck(column)
  count.times.map do |i|
    generate_candidates_using_a_certain_strategy(i, taken)
  end
end
Make generate_candidates_using_a_certain_strategy receive an array of already-taken words to be skipped. There could be one possible glitch with a race condition if two requests take the same name, but I don't think it should cause any problems, since you are always free to apologize when the actual creation fails.
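For illustration only (this body is a made-up example, not the actual generator from the question), a strategy that skips taken names could look like:

# Hypothetical strategy: append increasing numeric suffixes to a base word,
# returning the i-th candidate that is not already taken.
def generate_candidates_using_a_certain_strategy(i, taken)
  base = 'hello-world' # in practice, derived from the user's input
  candidates = (2..Float::INFINITY).lazy
                 .map { |n| "#{base}#{n}" }
                 .reject { |c| taken.include?(c) }
  candidates.first(i + 1).last
end

Because each i yields a distinct untaken suffix, the count.times.map in the answer then returns count unique, available names.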
JRuby, Rails 3
I have a piece of code that queries a number of tables, related through association, returning a combined result set as an ActiveRecord::Relation. My problem is that when this function retrieves a very large result set and tries to do something with it (in my case, create a .xls file), the JVM errors, reporting a GC Memory Heap problem.
The problem is partly down to all these records being held in memory while processing the .xls export, as well as JRuby's questionable garbage collector. But all these records should not be processed at once anyway! So my solution is to break these records into smaller chunks, write them to the file, and repeat.
However, amongst all my other constraints, the next part of code that I need to use requires a relation object passed to it. Previously, this was the entire result set, but at this point I've broken it down into smaller bits (for argument's sake, let's say 100 records).
At this point you're probably thinking: yeah, what's the problem? Well, see my example code below:
# result_set = relation object
result_set.scoped.each_slice(100) do |chunk|
  generic_filter = App::Filter.new(chunk, [:EXCEL_EXPORT_COLUMNS]) # <-- errors here
  # do some stuff
  generic_filter.relation.each_with_index do |work_type, index|
    xls_doc.row(index + 1).concat(generic_filter.values_for_row(work_type))
    DATE_COLUMN_INDEX.each do |column_index|
      xls_doc.row(index + 1).set_format column_index,
        ::Spreadsheet::Format.new(number_format: 'DD-MM-YYYY')
    end
  end
  [...] # some other stuff
end
As you can see, I am splitting the result_set into smaller chunks of 100 records and passing each to the App::Filter class, which expects a relation object. However, splitting result_set into smaller chunks using each_slice or in_groups causes an error within the block, because these two methods return an array of results, not a relation.
I'm fairly new to Ruby on Rails, so my questions are:
Is a relation in fact an object/collection, or something like a pre-defined query, much like a prepared statement?
Is it possible to return smaller relation objects using methods similar to each_slice or in_groups and process them as intended?
Any pointers/ suggestions will be well received- thanks!
A relation is a kind of helper to build SQL queries (INSERT, SELECT, DELETE, etc.). In your example, you trigger SELECT queries with each_slice and you get arrays of results.
I haven't checked, but I'm not sure each_slice is doing what you want... You should look at find_each instead.
You should probably do something like this:
# do what you need with the relation but do NOT trigger the query
generic_filter = App::Filter.new(result_set.scoped, [:EXCEL_EXPORT_COLUMNS])

# trigger the query in batches: find_each loads records in batches
# (1000 by default) and yields them one at a time, so the whole result
# set is never held in memory at once
index = 0
generic_filter.relation.find_each do |work_type|
  xls_doc.row(index + 1).concat(generic_filter.values_for_row(work_type))
  DATE_COLUMN_INDEX.each do |column_index|
    xls_doc.row(index + 1).set_format column_index,
      ::Spreadsheet::Format.new(number_format: 'DD-MM-YYYY')
  end
  index += 1
end
Intro
I'm doing a system where I have a very simple layout only consisting of transactions (with basic CRUD). Each transaction has a date, a type, a debit amount (minus) and a credit amount (plus). Think of an online banking statement and that's pretty much it.
The issue I'm having is keeping my controller skinny and worrying about possibly over-querying the database.
A Simple Report Example
The total debit over the chosen period e.g. SUM(debit) as total_debit
The total credit over the chosen period e.g. SUM(credit) as total_credit
The overall total e.g. total_credit - total_debit
The report must allow a dynamic date range e.g. where(date BETWEEN 'x' and 'y')
The date range would never be more than a year and will only be a max of say 1000 transactions/rows at a time
So in the controller I create:
def report
  @d = Transaction.select("SUM(debit) AS total_debit").where("date BETWEEN 'x' AND 'y'").first
  @c = Transaction.select("SUM(credit) AS total_credit").where("date BETWEEN 'x' AND 'y'").first
  @t = @c.total_credit - @d.total_debit
end
Additional Question Info
My actual report has closer to 6 or 7 database queries (e.g. pulling out the total credit/debit per type == 1 or type == 2, etc.) and has many more calculations, e.g. totalling up certain credit/debit types and then adding and subtracting these totals against other totals.
I'm trying my best to adhere to 'skinny controller, fat model', but am having issues with the number of variables my controller needs to pass to the view. Rails has seemed very straightforward up until the point where you create variables to pass to the view. I don't see how else you do it apart from putting the variable-creating line into the controller and making it 'skinnier' by moving query bits and pieces into the model.
Is there something I'm missing where you create variables in the model and then have the controller pass those to the view?
A more idiomatic way of writing your query in ActiveRecord would probably be something like:
class Transaction < ActiveRecord::Base
  def self.within(start_date, end_date)
    where(:date => start_date..end_date)
  end

  def self.total_credit
    sum(:credit)
  end

  def self.total_debit
    sum(:debit)
  end
end
This would mean issuing 3 queries in your controller, which should not be a big deal if you create database indices, and limit the number of transactions as well as the time range to a sensible amount:
@transactions = Transaction.within(start_date, end_date)
@total = @transactions.total_credit - @transactions.total_debit
Finally, you could also use Ruby's Enumerable#reduce method to compute your total by directly traversing the list of transactions retrieved from the database.
@total = @transactions.reduce(0) { |memo, t| memo + (t.credit - t.debit) }
For very small datasets this might result in faster performance, as you would hit the database only once. However, I reckon the first approach is preferable, and it will certainly deliver better performance when the number of records in your db starts to increase.
I'm putting in params[:year_start]/params[:year_end] for x and y, is that safe to do?
You should never embed params[:anything] directly in a query string. Instead use this form:
where("date BETWEEN ? AND ?", params[:year_start], params[:year_end])
My actual report probably has closer to 5 database calls and then 6 or 7 calculations on those variables, should I just be querying the date range once and then doing all the work on the array/hash etc?
This is a little subjective but I'll give you my opinion. Typically it's easier to scale the application layer than the database layer. Are you currently having performance issues with the database? If so, consider moving the logic to Ruby and adding more resources to your application server. If not, maybe it's too soon to worry about this.
I'm really not seeing how I would get the majority of the work/calculations into the model, I understand scopes but how would you put the date range into a scope and still utilise GET params?
Have you seen has_scope? This is a great gem that lets you define scopes in your models and have them automatically get applied to controller actions. I generally use this for filtering/searching, but it seems like you might have a good use case for it.
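As a sketch of how that could look (the scope and parameter names here are assumptions based on your report, so double-check them against the has_scope README; note that the values arrive as a nested hash param, e.g. between_dates[year_start]):

# in the model
scope :between_dates, ->(start_date, end_date) { where(:date => start_date..end_date) }

# in the controller
has_scope :between_dates, using: [:year_start, :year_end], type: :hash

def report
  @transactions = apply_scopes(Transaction).all
end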
If you could give an example on creating an array via a broad database call and then doing various calculations on that array and then passing those variables to the template that would be awesome.
This is not a great fit for Stack Overflow and it's really not far from what you would be doing in a standard Rails application. I would read the Rails guide and a Ruby book and it won't be too hard to figure out.
I've got a comment tree nested in the document, using mongoid embeds_many_recursively like this:
Document: {
  ...
  comments: [{
    ...
    updated_at,
    child_comments: [{
      ...
      updated_at,
      child_comments: [{...}, {...}],
      ...
    }, {...}],
    ...
  }, {...}],
  ...
}
What's the most effective way of passing it to a view in a way that is ordered by first level 'comment updated_at' attribute?
At the moment I came up with this inside the main document model:
def flatten_comments
  @flat_comments = []
  self.comments.order_by([[:updated_at, :desc]]).each do |comment|
    flatten_comments_iterator(comment)
  end
  return @flat_comments
end

def flatten_comments_iterator(comment)
  @flat_comments << comment
  comment.child_comments.each { |reply| flatten_comments_iterator(reply) }
end
and then just iterating in the view over the array.
The problems are:
1) in the recursive flattening, the order is lost somewhere, and I can't figure out where; step-by-step on paper it seems to be adding items in the needed order, so it's probably something to do with variable scope and access.
2) I'm not sure it's the most efficient way to do a simple retrieval.
Would be thankful for advice, and for experience with how to handle these kinds of tasks efficiently.
There are basically two design approaches (one of them being the same as yours), which are documented in the Ruby driver's modeling examples. There is also a similar question on SO about it.
About the other issue: there's nothing bad about recursion in general, if the comments don't have a huge depth of nesting. However, your implementation is not thread-safe, as it uses instance variables rather than local variables. To deal with it, you should turn @flat_comments into a local variable and pass it as a param to the flatten_comments_iterator method.
Tip: since any recursive method can be transformed into an iteration, what you may want to implement is an iterative preorder traversal of the comment tree.
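A minimal sketch of that idea, using a local accumulator and an explicit worklist instead of recursion and instance variables (the query mirrors the one in the question):

def flatten_comments
  flat_comments = []
  # Seed the worklist with the top-level comments, newest first.
  worklist = comments.order_by([[:updated_at, :desc]]).to_a
  until worklist.empty?
    comment = worklist.shift
    flat_comments << comment
    # Prepend children so they are visited before later siblings (preorder).
    worklist = comment.child_comments.to_a + worklist
  end
  flat_comments
end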