I have a big Microsoft SQL Server database on a remote host, to which I connect with ActiveRecord 5 using tiny_tds and activerecord-sqlserver-adapter. I need to make multiple queries to one table to find entries belonging to an object. The problem is that there are thousands of queries, which take a really long time to run against a remote database.
Is it possible to cache the whole table so that the queries run against a local cached copy and finish faster?
Edit: The purpose of this operation is to synchronize data from a legacy system to a newer one. The following loop is used for importing:
MsSqlDbEntity.where(deleted: nil).where.not(verified: nil).each do |entity|
entity_import(entity)
end
These are the methods used:
def entity_import(ms_sql_db_entity)
new_db_entity = NewDbEntity.new(
# Some params from ms_sql_db_entity
)
sub_entity_import(ms_sql_db_entity, new_db_entity) if new_db_entity.save
end
def sub_entity_import(ms_sql_db_entity, new_db_entity)
MsSqlDbSubEntity.where(ms_sql_db_entity_id: ms_sql_db_entity.id).each do |sub_entity|
new_db_sub_entity = NewDbSubEntity.create(
new_db_entity_id: new_db_entity.id,
# Some other params
)
end
end
The entity and sub_entity have a one-to-many relation.
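One way to approximate the local cache you're describing is to pull the whole sub-entity table in a single remote query and group it in memory, so the thousands of per-entity queries disappear. A minimal sketch of that idea; the grouped hash and the extra argument to entity_import are assumptions, not part of the original code:

# Fetch all sub-entities in one remote query and index them by
# parent id, so each import reads from memory instead of issuing
# its own query against the remote server.
sub_entities_by_parent = MsSqlDbSubEntity.all.to_a.group_by(&:ms_sql_db_entity_id)

MsSqlDbEntity.where(deleted: nil).where.not(verified: nil).find_each do |entity|
  entity_import(entity, sub_entities_by_parent[entity.id] || [])
end

sub_entity_import would then iterate over the passed-in array instead of calling MsSqlDbSubEntity.where for every entity.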
For demo purposes, suppose that I have a class called DemoThing with a method called do_something.
Is there a way that (in code) I can check the number of times do_something hits the database? Is there a way to "spy" on ActiveRecord and count the number of times the database was called?
For instance:
class DemoThing
def do_something
retVal = []
5.times do |i|
retVal << MyActiveRecordModel.where(:id => i)
end
retVal
end
end
dt = DemoThing.new
stuff = dt.do_something # want to assert that do_something hit the database 5 times
ActiveRecord logs each query to STDOUT.
But for the above code, it's pretty obvious that you're making one call per iteration of i, five in total.
Queries can be made more efficient by not mixing Ruby logic with querying.
In this example, getting the ids before querying means the query isn't issued once per Ruby loop iteration.
ids = 5.times.to_a
retVal = MyActiveRecordModel.where(id: ids) # .to_a if retVal needs to be an Array
Sure is. But first you must understand Rails' query cache and logger. By default, Rails turns on a simple query cache for the duration of each request. It is a hash stored on the current thread, one for every active database connection (most Rails processes will have just one). Whenever a select statement is made (via find, where, etc.), the corresponding result set is stored in the hash, with the SQL that was used to query it as the key.

If the same SQL is issued again within the request, you'll see a Model Load statement in your log for the first run and a CACHE statement for each repeat: the database was only queried one time, with the rest loaded via the cache. Watch your server logs as you run that query.
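If you want an actual count rather than eyeballing the log, you can subscribe to ActiveRecord's sql.active_record notifications for the duration of a block. A small sketch; count_queries is a hypothetical helper, not a Rails API:

# Counts real database hits inside the block, ignoring query-cache
# hits ("CACHE") and schema lookups ("SCHEMA").
def count_queries(&block)
  count = 0
  counter = lambda do |_name, _start, _finish, _id, payload|
    count += 1 unless ["CACHE", "SCHEMA"].include?(payload[:name])
  end
  ActiveSupport::Notifications.subscribed(counter, "sql.active_record", &block)
  count
end

puts count_queries { DemoThing.new.do_something }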
I found a gem for counting queries: https://github.com/comboy/sql_queries_count
I have a small to-do list in a .json file that I'm reading, parsing, and saving to a Rails app with Sidekiq. Every time I refresh the browser, the worker runs and duplicates the entries in the database. How do I keep the database synchronized with the .json file, avoiding duplicate entries in the database and in the browser?
Here's the worker:
class TodoWorker
include Sidekiq::Worker
def perform
json_text = File.read('todo_json.json')
json = JSON.parse(json_text)
json.each do |todo|
TodoList.create(name: todo["name"], done: todo["done"])
end
end
end
And the controller:
class TodoListsController < ApplicationController
def index
@todo_lists = TodoList.all
TodoWorker.perform_async
end
end
Thanks
This is a terrible solution btw: you have a huge race condition in your read/store code, and you're not going to be able to use a large part of what Rails is good at. If you want a simple DB, why not just use SQLite?
That being said, you need some way of recognizing duplicates, in most DBs this is done with a primary key that is sent to the browser along with the rest of the data, and then back again with any changes. That primary key is used to ensure that existing data is updated, rather than duplicated.
You will need the same thing in your JSON file, and then you can change your create call to something more like ActiveRecord's find_or_create_by, as sketched below.
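For example, if name is what uniquely identifies a to-do item in the file, the worker's loop could be rewritten roughly like this (a sketch under that uniqueness assumption):

json.each do |todo|
  # Reuses the existing row for this name, or creates one, then
  # brings the done flag in sync with the file.
  item = TodoList.find_or_create_by(name: todo["name"])
  item.update(done: todo["done"])
end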
I'm currently working on saving a user's social media posts in my app. The basic idea is to check whether a post exists: if it does, update its data; if not, create a new row. Right now I'm looping through all of the posts I receive from the social platform, so I'm potentially looping through 3,000 of them and adding each to the database.
Is there a way I could rewrite this to save all the items at once, which would hopefully speed up the save?
post_data.each do |post_data_details|
post_instance = Post::Tumblr.
where(platform_id: platform_id).
where("data ->> 'id' = ?", post_data_details["id"].to_s).
first_or_initialize
existing_data = post_instance.data
new_data = existing_data.merge(post_data_details.to_hash)
post_instance.data = new_data
post_instance.refreshed_at = date
post_instance.save!
end
It is good practice to run such long-running jobs via Sidekiq or another background-job solution.
You can also wrap the whole loop in a single ActiveRecord transaction:
http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html
But keep in mind that if one of the records is invalid, the whole transaction will be rolled back.
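Applied to the loop above, that would look roughly like this (a sketch; the per-post body stays exactly as in the question):

# One commit at the end instead of one per row, which usually
# makes bulk saves noticeably faster. save! raises on an invalid
# record, which is what triggers the rollback.
ActiveRecord::Base.transaction do
  post_data.each do |post_data_details|
    # ... same per-post logic as above ...
  end
end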
Currently my application has some stats needs, so I set up a background job using rufus-scheduler that runs at 3:00 and batch-processes records into a CacheStat table, much like any application's weekly/monthly stats needs.
I found out that find_each (say, User.find_each to iterate over all users) invokes find_in_batches, so I checked the Rails source code:
while records.any?
records_size = records.size
primary_key_offset = records.last.id
yield records
break if records_size < batch_size
if primary_key_offset
records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
else
raise "Primary key not included in the custom select clause"
end
end
The implementation works by comparing the primary key. My concern is concurrency: while I'm processing a batch, what if some records get inserted in between? Has anybody run into this kind of problem?
On reflection, this implementation may not actually be problematic, because new records will always have a larger primary key and will be picked up at the end.
So is this how such a need should be implemented? If I want to implement batch stat processing myself (without Rails), do I need to make sure there is an integer primary key and compare on that field (rather than on other kinds of fields)? A minimal version of such a loop is sketched below.
(I'm asking because I'm in the middle of switching from MySQL to Mongo, so I may need to implement this kind of functionality myself.)
If I understand correctly, you can ensure correctness here by enforcing transactional isolation, e.g.:
User.transaction do
  User.find_each do |user|
    # process user
  end
end
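Depending on the database, you can also request a specific isolation level explicitly (ActiveRecord has supported this since Rails 4; whether :repeatable_read actually freezes the snapshot is up to the underlying database):

User.transaction(isolation: :repeatable_read) do
  User.find_each do |user|
    # process user
  end
end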
For example, suppose there is this code in Rails 3.2.3:
def test_action
a = User.find_by_id(params[:user_id])
# some calculations.....
b = Reporst.find_by_name(params[:report_name])
# some calculations.....
c = Places.find_by_name(params[:place_name])
end
This code does 3 requests to the database and opens 3 different connections. Most likely it's going to be quite a long action.
Is there any way to open only one connection and do all 3 requests within it? Or can I control which connection to use myself?
You would want to bracket the calls with a transaction:
Transactions are protective blocks where SQL statements are only permanent if they can all succeed as one atomic action. The classic example is a transfer between two accounts where you can only have a deposit if the withdrawal succeeded and vice versa. Transactions enforce the integrity of the database and guard the data against program errors or database break-downs. So basically you should use transaction blocks whenever you have a number of statements that must be executed together or not at all.
def test_action
User.transaction do
a = User.find_by_id(params[:user_id])
# some calculations.....
b = Reporst.find_by_name(params[:report_name])
# some calculations.....
c = Places.find_by_name(params[:place_name])
end
end
Even though they invoke different models, the three queries are executed over the same database connection within a single transaction. It is all or nothing, though: if one statement fails in the middle, the entire transaction is rolled back.
Though the transaction class method is called on some Active Record class, the objects within the transaction block need not all be instances of that class. This is because transactions are per-database connection, not per-model.
You can take a look at the ActiveRecord::ConnectionAdapters::ConnectionPool documentation.
Also, AR doesn't open a connection for each model or query; it reuses the existing connection:
[7] pry(main)> [Advertiser.connection,Agent.connection,ActiveRecord::Base.connection].map(&:object_id)
=> [70224441876100, 70224441876100, 70224441876100]
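And if you ever do need explicit control, the pool can hand you a connection scoped to a block; a minimal sketch:

# Checks a connection out of the pool for the duration of the block
# and returns it to the pool afterwards.
ActiveRecord::Base.connection_pool.with_connection do |conn|
  conn.select_value("SELECT 1")
end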