I have a model that is associated with another model through two different foreign key relationships. While importing a lot of data I do many lookups on those tables (which are 3-4K rows), and I'm trying to eliminate spurious repeated database lookups.
(Ideally, my database would be doing async writes/INSERTs only)
I have played with doing my own caching by ID, and recently switched to using Rails.cache (with MemoryStore for now; I have no need to sync against other instances, and I'm RAM-rich on the import machine). However, I find that I am getting multiple copies of the same associated records, and I'd like to get rid of this.
For instance:
irb> p = Phone.includes([:site => :client, :btn => :client]).first
irb> p.site.client.object_id => 67190640
irb> p.btn.client.object_id => 67170780
Ideally, I'd like these to point to the same object in memory.
Rails.cache serializes things in/out, which really just makes this worse (though I was surprised by that). Could I override find_by_id() or some such in a way that the association proxies would make use of my cache?
Maybe there is another caching module that I'm missing?
(please note that there is no web front end involved in this process. It's all models and ORM)
Try using IdentityCache (see https://github.com/Shopify/identity_cache). We currently have a similar issue: we're using JRuby because it's fast, but mallocs are expensive in our target environment... making caching these record instances all the more necessary.
There used to be an IdentityMap within ActiveRecord, but it was removed due to issues with unexpected behaviour around associations.
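For reference, a rough sketch of how IdentityCache might be wired up for the models in the question (class names are taken from the question; IdentityCache needs a cache backend such as memcached configured). Note that it eliminates the repeated database reads, but fetched records are deserialized copies, so it still won't make the two associations share one Ruby object:

class Client < ActiveRecord::Base
  include IdentityCache
end

class Site < ActiveRecord::Base
  include IdentityCache
  belongs_to :client
  cache_belongs_to :client   # defines Site#fetch_client
end

class Btn < ActiveRecord::Base
  include IdentityCache
  belongs_to :client
  cache_belongs_to :client   # defines Btn#fetch_client
end

# During the import, use the fetch_* readers instead of the plain associations:
p = Phone.first
p.site.fetch_client   # hits the DB once, then is served from the cache
p.btn.fetch_client    # cache hit for the same client id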
Just noticed you asked this in August, did you find a good solution?
Related
I have recently started consulting and helping with the development of a Rails application that was using MongoDB (with Mongoid as its DB client) to store all its model instances.
This was fine while the application was in an early startup stage, but as the application got more and more clients and while starting to need more and more complicated queries to show proper statistics and other information in the interface, we decided that the only viable solution forward was to normalize the data, and move to a structured database instead.
So, we are now in the process of migrating both the tables and the data from MongoDB (with Mongoid as object-mapper) to Postgres (with ActiveRecord as object-mapper). Because we have to make sure that there is no improper non-normalized data in the Mongo database, we have to run these data migrations inside Rails-land, to make sure that validations, callbacks and sanity checks are being run.
All went 'fine' on development, but now we are running the migration on a staging server, with the real production database. It turns out that for some migrations, the memory usage of the server increases linearly with the number of model instances, causing the migration to be killed once we've filled 16 GB of RAM (and another 16GB of swap...).
Since we migrate the model instances one by one, we hope to be able to find a way to make sure that the memory usage can remain (near) constant.
The things that currently come to mind that might cause this are (a) ActiveRecord or Mongoid keeping references to object instances we have already imported, and (b) the migration being run in a single DB transaction, so Postgres might be holding on to more and more memory until it completes?
So my questions:
What is the probable cause of this linear memory usage?
How can we reduce it?
Are there ways to make Mongoid and/or ActiveRecord relinquish old references?
Should we attempt to call the Ruby GC manually?
Are there ways to split a data migration into multiple DB transactions, and would that help?
These data migrations have about the following format:
class MigrateSomeThing < ActiveRecord::Migration[5.2]
  def up
    # Mongoid's .all.each works in batches, see
    # https://stackoverflow.com/questions/7041224/finding-mongodb-records-in-batches-using-mongoid-ruby-adapter
    Mongodb::ModelName.all.each do |old_thing|
      create_thing(old_thing, Postgres::ModelName.new)
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  def down
    Postgres::ModelName.delete_all
  end

  def create_thing(old_thing, new_thing)
    attrs = old_thing.attributes
    # ... maybe alter the attributes slightly to fit Postgres, depending on the thing.
    new_thing.attributes = attrs
    new_thing.save!
  end
end
I suggest narrowing the memory consumption down to either the reading or the writing side (put differently: Mongoid vs. AR) by performing all of the reads but none of the model creation/writes, and checking whether memory usage still grows.
Mongoid performs finds in batches by default, unlike AR, where this has to be requested explicitly through find_in_batches.
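A minimal sketch of such a read-only pass, reusing the class names from the migration above (it exercises the Mongoid side, plus any attribute munging, without instantiating or saving any ActiveRecord models):

Mongodb::ModelName.all.each do |old_thing|
  old_thing.attributes   # force the document to be fully materialized
end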
Since ActiveRecord migrations are wrapped in transactions by default, and AR performs attribute value tracking to restore model instances' attributes to their previous values if transaction commit fails, it is likely that all of the AR models being created are remaining in memory and cannot be garbage collected until the migration finishes. Possible solutions to this are:
Disable implicit transaction for the migration in question (https://apidock.com/rails/ActiveRecord/Migration):
disable_ddl_transaction!
Create data via direct inserts, bypassing model instantiation entirely (this will also speed up the process). The most basic way is via raw SQL (see "Rails ActiveRecord: Getting the id of a raw insert"); there are also libraries for this (see "Bulk Insert records into Active Record table"). A sketch of both options follows below.
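To make both options concrete: the transaction macro goes at the top of the migration class, and the direct-insert approach could look roughly like the sketch below. It reuses the class names from the migration above; insert_all requires Rails 6+ (on 5.2 you would fall back to raw SQL or a gem such as activerecord-import), it skips validations and callbacks, and the column names are purely illustrative:

class MigrateSomeThing < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!   # run this migration outside the implicit transaction

  def up
    Mongodb::ModelName.all.each_slice(500) do |batch|
      rows = batch.map do |old_thing|
        # Keep only the columns that exist in Postgres ("name" and "prefix"
        # are hypothetical placeholders for your real columns).
        old_thing.attributes.slice("name", "prefix")
      end
      Postgres::ModelName.insert_all(rows)   # one multi-row INSERT per batch
    end
  end
end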
If we have a small table which contains relatively static data, is it possible to have Active Record load this in on startup of the app and never have to hit the database for this data?
Note that, ideally, I would like this data to be join-able from other models which have relationships to it.
An example might be a list of countries with their telephone number prefixes - this list is unlikely to change, and if it did it would be changed by an admin. Other tables might have relationships with this (e.g. given a User who has a reference to the country, we might want to look up the country's telephone prefix).
I saw a similar question here, but it's 6 years old and refers to Rails 2, while I am using Rails 5 and maybe something has been introduced since then.
Preferred solutions would be:
Built-in Rails / ActiveRecord functionality to load a table once on startup and if other records are subsequently loaded in which have relationships with the cached table, then link to the cached objects automatically (ie. manually caching MyModel.all somewhere is not sufficient, as relationships would still be loaded by querying the database).
Maintained library which does the above.
If neither is available, I suppose an alternative method would be to define the static dataset as an in-memory enum/hash or similar, persist the hash key on records which have a relationship to this data, and define methods on those models to look up the object in the hash using the key persisted in the database. This seems quite manual though...
[EDIT]
One other thing to consider with potential solutions: the manual solution (3) would also require custom controllers and routes for such data to be accessible over an API. Ideally it would be nice to have a solution where such data could be offered up via a RESTful API (read only - just GET) if desired, using standard Rails mechanisms like scaffolding, without too much manual intervention.
I think you may be discounting the "easy" / "manual" approach too quickly.
Writing the data to a ruby hash / array isn't that bad an idea.
And if you want to use a CRUD scaffold, why not just use the standard Rails model / controller generator? Is it really so bad to store some static data in the database?
A third option would be to store your data in a file in some serialized format, and then, when your app loads, read it and construct ActiveRecord objects. Here's an example:
data.yml
---
- a: "1"
  b: "1"
- a: "2"
  b: "2"
This is a YAML file containing an array of hashes; you can construct such a file with:
require 'yaml'
File.open("path.yml", "w") do |f|
data = [
{ "a" => "1", "b" => 1 },
{ "a" => "2", "b" => 2 }
]
f.write(YAML.dump(data))
end
Then to load the data, you might create a file in config/initializers/ (files in that directory are loaded automatically when Rails boots):
config/initializers/static_data.rb
require 'yaml'
# define a constant that can be used by the rest of the app
StaticData = YAML.load(File.read("data.yml")).map do |object|
  MyObjectClass.new(object)
end
To avoid having to write database migrations for MyObjectClass (when it's not actually being stored in the db) you can use attr_accessor definitions for your attributes:
class MyObjectClass < ActiveRecord::Base
  # say these are your two columns
  attr_accessor :a, :b
end
Just make sure not to run things like save, delete, or update on this model (unless you monkeypatch those methods).
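If you want a harder guard than convention, one option is to mark the instances read-only so any write raises (a sketch; also note that instantiating a subclass of ActiveRecord::Base still requires its table to exist, so if you truly skip the table you would probably reach for ActiveModel::Model instead):

class MyObjectClass < ActiveRecord::Base
  attr_accessor :a, :b

  # Any attempt to save, update or destroy now raises
  # ActiveRecord::ReadOnlyRecord instead of silently writing to the database.
  def readonly?
    true
  end
end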
If you want to have REST / CRUD endpoints, you'd need to write them from scratch because the way to change data is different now.
You'd basically need to do any update in a 3 step process:
load the data from YAML into a Ruby object list
change the Ruby object list
serialize everything to YAML and save it.
So you can see you're not really doing incremental updates here. You could use JSON instead of YAML and you'd have the same problem. With Ruby's built-in storage system PStore you would be able to update objects on an individual basis, but using SQL for a production web app is a much better idea and will honestly make things simpler.
Moving beyond these "serialized data" options, there are key-value storage servers that keep data in memory, such as Memcached and Redis.
But to go back to my earlier point, unless you have a good reason not to use SQL you're only making things more difficult.
It sounds like FrozenRecord would be a good match for what you are looking for.
Active Record-like interface for read only access to static data files of reasonable size.
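A rough sketch of what that could look like for the country/prefix example (the file locations, attribute names and values here are assumptions; check the FrozenRecord README for the exact conventions):

# config/initializers/frozen_record.rb
FrozenRecord::Base.base_path = Rails.root.join("config", "static_data")

# app/models/country.rb
class Country < FrozenRecord::Base
end

# config/static_data/countries.yml would then hold the records, e.g.:
# - id: 1
#   name: "United Kingdom"
#   phone_prefix: "+44"
# - id: 2
#   name: "Netherlands"
#   phone_prefix: "+31"

# Usage, with an Active Record-like query interface:
Country.find(1).phone_prefix          # => "+44"
Country.find_by(name: "Netherlands")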
For some models, the data is small but frequently used, and it also changes several times a week.
Currently, I'm using the cache to speed up the responses (by putting the results into memory).
However, this approach is a bit buggy: it breaks whenever I forget to change the cache key, e.g. "model_#{self.to_s}_#{__callee__}_#{city}-2016-01-28".
How could I run model operations in memory (e.g. by putting the data into Redis or some other memory-based DB), but only for some special models?
I'm using MongoDB currently.
Thanks
class AA
  include Mongoid::Document

  def self.get_country(city_or_airport_name)
    any_of({ airport: /.*#{city_or_airport_name}.*/i },
           { city: /.*#{city_or_airport_name}.*/i }).to_a.first["country"]
  end

  def self.get_airports(city)
    Rails.cache.fetch("model_#{self.to_s}_#{__callee__}_#{city}-2016-01-28") do
      where(city: city).to_a.collect { |i| i.airport }
    end
  end
end
I use memcached in front of some SQL tables that don't change too often but get hit a few times per second; Redis will do too. I also have tables that change rarely (once a week or so): when there is under 100-500k of data, I dump it to a JSON file via crontab (or whenever it changes) and preload it at application boot, but this slows down booting and may cause trouble later if the table/set grows.
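A rough sketch of that dump-and-preload idea (the Airport model, file path and keys are assumptions for illustration, not something from the answer above):

# lib/tasks/static_data.rake -- run from cron, or after an admin edits the table
namespace :static_data do
  task dump_airports: :environment do
    File.write(
      Rails.root.join("db", "airports.json"),
      Airport.all.to_a.map(&:attributes).to_json
    )
  end
end

# config/initializers/preload_airports.rb -- read once at boot
AIRPORTS = JSON.parse(File.read(Rails.root.join("db", "airports.json")))
AIRPORTS_BY_CITY = AIRPORTS.group_by { |a| a["city"] }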
I'm designing a Ruby on Rails app for a pharmacy, and one of the features is that there are stores that have pharmacists who work there. In addition, a pharmacist can work at many stores. This sounds like a job for HABTM, right? Well, being the novice I am, I manually designed a workaround (because I had never heard of HABTM - I basically taught myself Rails and never got to some of the more advanced relationships). Right now, when a pharmacist is saved, a couple of lines in the create and update actions of the pharmacists controller turn the stores they work at into a string, with each store_id separated by a comma. Then, when a store is displayed, it runs a MySQL query via
@pharmacists = Pharmacist.find :all, :conditions => "stores REGEXP '#{@store.id}'"
Would moving this system over to a rails based HABTM system be more efficient? Of course it would require less code in the end, but would it be worth it? In other words, what benefits, other than less code, would I get from moving this association to be managed by rails?
The benefit is that you will be using the right tool for the job! The whole point of using a framework such as Rails is that it helps you solve common problems without having to re-invent the wheel, which is what you've done here. By using associations you'll also be using a relational database properly and can take advantage of benefits like foreign key indexing, which will be faster than string manipulation.
You should use a has_and_belongs_to_many relationship unless you need to store extra attributes on the join model (for example the date a pharmacist started working at a store) in which case use has_many :through.
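For example, the HABTM setup might look roughly like this (table and column names follow Rails conventions; treat it as a sketch to adapt to your Rails version rather than a drop-in migration):

class Pharmacist < ActiveRecord::Base
  has_and_belongs_to_many :stores
end

class Store < ActiveRecord::Base
  has_and_belongs_to_many :pharmacists
end

# HABTM needs a join table but no join model:
class CreatePharmacistsStores < ActiveRecord::Migration
  def change
    create_table :pharmacists_stores, id: false do |t|
      t.integer :pharmacist_id
      t.integer :store_id
    end
    add_index :pharmacists_stores, :pharmacist_id
    add_index :pharmacists_stores, :store_id
  end
end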
Using Rails associations will give you all the convenient methods that Rails provides, such as these:
# Find the stores the first pharmacist works at
@stores = Pharmacist.first.stores

# Find the pharmacists who work at a store
@pharmacists = Store.find_by_name('A Store').pharmacists
A Guide to ActiveRecord Associations