I am currently developing a membership website that includes a view counter. Past experience tells me that keeping view counters in SQL is costly. In fact I have avoided view counters before, but this time it's not an option.
The project uses
Rails 4.0.2
Redis Objects gem
For demonstration I am hoping to use Heroku with the Redis To Go add-on
Currently the counter is based on PG (ActiveRecord)
Redis Objects is used with AR to count (but how do I save back to the AR profiles table?)
What I am trying to achieve
Count in Redis and periodically store to the PG table, possibly using a Rake task on a schedule
Efficiency
Problem
I can't figure out how to query the Redis DB to find all Profile objects in it.
If I could get this list of objects, I could write a rake task to iterate through each item and save/update the model value to the database.
After some searching, KEYS profile:* seems to be the only way to get all the profile values to save to the database. From that list I would then have to manually fetch the objects and update the values.
Question
Is using KEYS to find the keys and then the objects sensible (in terms of efficiency) in a scheduled task that runs, say, once a day?
Is there a way to fetch all Profile objects directly from the Redis DB, like Profile.all in ActiveRecord?
Any suggestions for implementing a counter are highly appreciated (even if not Redis based).
profile.rb
class Profile < ActiveRecord::Base
  # To handle the profile counter
  include Redis::Objects
  counter :tviews

  # Associations
  belongs_to :user

  # Other attributes
end
profiles_controller.rb
class ProfilesController < ApplicationController
  def public
    @profile.tviews.increment
    # @profile.save # save for AR-based counting
  end
end
In the SLIM
p.text-center#profile-counter
  | Views -
  = @profile.tviews.value + @profile.views
1) No, it's not. The Redis docs for KEYS say: "Time complexity: O(N) with N being the number of keys in the database, under the assumption that the key names in the database and the given pattern have limited length."
Not only would Redis have to iterate over all the keys, it would also have to store them in memory. And because Redis is single-threaded, it would be unresponsive for the whole time the KEYS command is executing.
2) I'm not a Ruby user, but I would assume not.
3) You have several options:
Store the ids for your keys in a Redis set and clean it up in your periodic task (with Redis transactions if needed). This way you'll always know exactly which keys are in Redis.
Use the SCAN command from Redis 2.8. It returns only a limited number of keys per call and iterates over the keyspace using an internal cursor. A sketch of a rake task using SCAN follows below.
(redis.io seems to be down right now, so links might not be working)
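As a rough illustration of the SCAN route (an untested sketch: it assumes redis-objects' default key pattern profile:<id>:tviews and the existing views column on profiles, so verify both against your setup):
# lib/tasks/counters.rake
namespace :counters do
  desc "Flush Redis view counters into the profiles table"
  task flush: :environment do
    redis  = Redis.current # or however your app obtains its Redis connection
    cursor = "0"
    loop do
      # SCAN returns a new cursor plus a limited batch of matching keys
      cursor, keys = redis.scan(cursor, match: "profile:*:tviews", count: 1000)
      keys.each do |key|
        id      = key.split(":")[1]
        profile = Profile.find_by(id: id)
        next unless profile
        profile.update_column(:views, profile.views + profile.tviews.value)
        profile.tviews.reset # clear the Redis counter once it has been persisted
      end
      break if cursor == "0"
    end
  end
end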
Related
I have recently started consulting and helping with the development of a Rails application that was using MongoDB (with Mongoid as its DB client) to store all its model instances.
This was fine while the application was in an early startup stage, but as the application got more and more clients and while starting to need more and more complicated queries to show proper statistics and other information in the interface, we decided that the only viable solution forward was to normalize the data, and move to a structured database instead.
So, we are now in the process of migrating both the tables and the data from MongoDB (with Mongoid as object-mapper) to Postgres (with ActiveRecord as object-mapper). Because we have to make sure that there is no improper non-normalized data in the Mongo database, we have to run these data migrations inside Rails-land, to make sure that validations, callbacks and sanity checks are being run.
All went 'fine' on development, but now we are running the migration on a staging server, with the real production database. It turns out that for some migrations, the memory usage of the server increases linearly with the number of model instances, causing the migration to be killed once we've filled 16 GB of RAM (and another 16GB of swap...).
Since we migrate the model instances one by one, we hope to be able to find a way to make sure that the memory usage can remain (near) constant.
The possible causes that currently come to mind are (a) ActiveRecord or Mongoid keeping references to object instances we have already imported, and (b) the migration running in a single DB transaction, so that Postgres keeps using more and more memory until it completes?
So my question:
What is the probable cause of this linear memory usage?
How can we reduce it?
Are there ways to make Mongoid and/or ActiveRecord relinquish old references?
Should we attempt to call the Ruby GC manually?
Are there ways to split a data migration into multiple DB transactions, and would that help?
These data migrations have about the following format:
class MigrateSomeThing < ActiveRecord::Migration[5.2]
  def up
    Mongodb::ModelName.all.each do |old_thing| # Mongoid's .all.each works with batches, see https://stackoverflow.com/questions/7041224/finding-mongodb-records-in-batches-using-mongoid-ruby-adapter
      create_thing(old_thing, Postgres::ModelName.new)
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  def down
    Postgres::ModelName.delete_all
  end

  def create_thing(old_thing, new_thing)
    attrs = old_thing.attributes
    # ... maybe alter the attributes slightly to fit Postgres depending on the thing.
    new_thing.attributes = attrs
    new_thing.save!
  end
end
I suggest narrowing down the memory consumption to the reading or the writing side (or, put differently, Mongoid vs AR) by performing all of the reads but none of the model creation/writes and seeing if memory usage is still growing.
Mongoid performs finds in batches by default unlike AR where this has to be requested through find_in_batches.
Since ActiveRecord migrations are wrapped in transactions by default, and AR performs attribute value tracking to restore model instances' attributes to their previous values if transaction commit fails, it is likely that all of the AR models being created are remaining in memory and cannot be garbage collected until the migration finishes. Possible solutions to this are:
Disable implicit transaction for the migration in question (https://apidock.com/rails/ActiveRecord/Migration):
disable_ddl_transaction!
Create data via direct inserts, bypassing model instantiation entirely (this will also speed up the process). The most basic way is via SQL (Rails ActiveRecord: Getting the id of a raw insert); there are also libraries for this (Bulk Insert records into Active Record table). A rough sketch combining these ideas follows below.
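An untested sketch of what that could look like. It assumes the Mongo attributes map onto the Postgres columns after the usual massaging, and it uses plain SQL, so validations and callbacks are skipped, which is only safe once the data is known to be clean:
class MigrateSomeThing < ActiveRecord::Migration[5.2]
  disable_ddl_transaction! # don't hold every row in one giant implicit transaction

  def up
    Mongodb::ModelName.all.each_slice(1_000) do |batch|
      # keep only attributes that actually exist as Postgres columns (drops _id etc.)
      rows = batch.map { |old_thing| old_thing.attributes.slice(*Postgres::ModelName.column_names) }
      # one explicit transaction per batch keeps memory and lock time bounded
      Postgres::ModelName.transaction do
        rows.each do |attrs|
          columns = attrs.keys.map { |k| connection.quote_column_name(k) }.join(", ")
          values  = attrs.values.map { |v| connection.quote(v) }.join(", ")
          execute "INSERT INTO #{Postgres::ModelName.table_name} (#{columns}) VALUES (#{values})"
        end
      end
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  def down
    Postgres::ModelName.delete_all
  end
end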
If we have a small table which contains relatively static data, is it possible to have Active Record load this in on startup of the app and never have to hit the database for this data?
Note, that ideally I would like this data to be join-able from other Models which have relationships to it.
An example might be a list of countries with their telephone number prefix - this list is unlikely to change, and if it did it would be changed by an admin. Other tables might have relationships with this (eg. given a User who has a reference to the country, we might want to lookup the country telephone prefix).
I saw a similar question here, but it's 6 years old and refers to Rails 2, while I am using Rails 5 and maybe something has been introduced since then.
Preferred solutions would be:
Built-in Rails / ActiveRecord functionality to load a table once on startup and if other records are subsequently loaded in which have relationships with the cached table, then link to the cached objects automatically (ie. manually caching MyModel.all somewhere is not sufficient, as relationships would still be loaded by querying the database).
Maintained library which does the above.
If neither are available, I suppose an alternative method would be to define the static dataset as an in-memory enum/hash or similar, and persist the hash key on records which have a relationship to this data, and define methods on those Models to lookup using the object in the hash using the key persisted in the database. This seems quite manual though...
[EDIT]
One other thing to consider with potential solutions - the manual solution (3) would also require custom controllers and routes for such data to be accessible over an API. Ideally it would be nice to have a solution where such data could be offered up via a RESTful API (read only - just GET) if desired using standard rails mechanisms like Scaffolding without too much manual intervention.
I think you may be discounting the "easy" / "manual" approach too quickly.
Writing the data to a ruby hash / array isn't that bad an idea.
And if you want to use a CRUD scaffold, why not just use the standard Rails model / controller generator? Is it really so bad to store some static data in the database?
A third option would be to store your data in a file in some serialized format and then, when your app loads, read it and construct ActiveRecord objects. Let me show an example:
data.yml
---
- a: "1"
  b: "1"
- a: "2"
  b: "2"
This is a YAML file containing an array of hashes; you can construct such a file with:
require 'yaml'

File.open("path.yml", "w") do |f|
  data = [
    { "a" => "1", "b" => "1" },
    { "a" => "2", "b" => "2" }
  ]
  f.write(YAML.dump(data))
end
Then to load the data, you might create a file in config/initializers/ (everything here will be autoloaded by rails):
config/initializers/static_data.rb
require 'yaml'

# define a constant that can be used by the rest of the app
StaticData = YAML.load(File.read("data.yml")).map do |object|
  MyObjectClass.new(object)
end
To avoid having to write database migrations for MyObjectClass (when it's not actually being stored in the db) you can use attr_accessor definitions for your attributes:
class MyObjectClass < ActiveRecord::Base
  # say these are your two columns
  attr_accessor :a, :b
end
Just make sure not to run things like save, delete, or update on this model (unless you monkeypatch these methods).
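One hedged way to enforce that, if the class really does inherit from ActiveRecord::Base as above, is to mark every instance as read-only so ActiveRecord raises rather than silently persisting:
class MyObjectClass < ActiveRecord::Base
  attr_accessor :a, :b

  # save/destroy will raise ActiveRecord::ReadOnlyRecord instead of hitting the database
  def readonly?
    true
  end
end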
If you want to have REST / CRUD endpoints, you'd need to write them from scratch because the way to change data is different now.
You'd basically need to do any update in a 3 step process:
load the data from YAML into a Ruby object list
change the Ruby object list
serialize everything to YAML and save it.
So you can see you're not really doing incremental updates here. You could use JSON instead of YAML and you'd have the same problem. With Ruby's built-in storage system PStore you would be able to update objects on an individual basis (a small example follows below), but using SQL for a production web app is a much better idea and will honestly make things simpler.
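For completeness, a minimal PStore sketch (PStore ships with Ruby's standard library; the file name and keys here are made up). It shows the individual-update point, though I still wouldn't reach for it over SQL:
require 'pstore'

store = PStore.new("static_data.pstore")

# writes happen inside a transaction and are flushed atomically to the file
store.transaction do
  store[:countries] ||= {}
  store[:countries]["GB"] = { name: "United Kingdom", phone_prefix: "+44" }
end

# a read-only transaction for lookups
store.transaction(true) do
  puts store[:countries]["GB"][:phone_prefix]
end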
Moving beyond these "serialized data" options, there are key-value storage servers that keep data in memory, such as Memcached and Redis.
But to go back to my earlier point, unless you have a good reason not to use SQL you're only making things more difficult.
It sounds like FrozenRecord would be a good match for what you are looking for.
Active Record-like interface for read only access to static data files of reasonable size.
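Roughly, the setup looks like this (file names and columns here are made up; check the gem's README for the exact options). The records live in a YAML file rather than a table, and the model gets an ActiveRecord-flavoured read-only API:
# config/data/countries.yml
# - id: 1
#   name: "United Kingdom"
#   phone_prefix: "+44"

class Country < FrozenRecord::Base
  self.base_path = Rails.root.join("config/data")
end

# Country.find(1) and Country.where(name: "United Kingdom") read from the YAML file.
# On a model with a country_id column, a plain method can stand in for the association:
class User < ApplicationRecord
  def country
    Country.find(country_id)
  end
end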
Why does ActiveRecord query the database to learn the database schema? Can't it simply read db/schema.rb?
I have some Sidekiq workers that, for performance reasons, cannot connect to the database. The job itself doesn't use the database at all (or at least I expected so):
n = Notification.new
n.body = cache["body"] # cache is from Redis
...
But the first line actually calls the database!
Is there anything I can do to make Rails read the schema.rb, or, in any case, instantiate a Notification without the database?
I don't want to create a separate model that doesn't inherit from ActiveRecord. I need the same model: sometimes it's loaded from the database and sometimes from Redis.
You can create a separate object which represents the Redis version of the object and use an included module to share methods between the AR and Redis versions. ActiveRecord::Base instances are not designed to be used without the database.
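A rough sketch of that pattern (the module and method names are made up): shared behaviour lives in a module, the ActiveRecord class keeps the database, and a plain Ruby object wraps the Redis hash.
module NotificationBehaviour
  def summary
    body.to_s.truncate(50)
  end
end

class Notification < ActiveRecord::Base
  include NotificationBehaviour
end

# Built from the Redis cache, never touches the database
class CachedNotification
  include NotificationBehaviour
  attr_accessor :body

  def initialize(cache)
    @body = cache["body"]
  end
end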
I have written a Ruby on Rails application. In it I am using a model like this:
class Teacher < ActiveRecord::Base
  self.table_name = "teachers"
end
The teachers table gets populated by another, independent application. In my application I am using Redis to store some specific data about teachers. Initially I populated Redis with data from the teachers table with the help of a rake task.
But whenever an insert, update or delete happens to the teachers table, the Redis data becomes inconsistent. So I want the data to be stored in Redis whenever a save or update occurs in the teachers table, some kind of event- or interrupt-based mechanism. Can I do that in my application without changing the other application I mentioned, where the teachers data is actually written? If possible, please give me some idea.
If you weren't using Redis, I'd recommend an after_save callback (it triggers each time the model is saved, whether created or updated):
#app/models/teacher.rb
class Teacher < ActiveRecord::Base
  after_save :sync_changes

  # named sync_changes so it doesn't clobber ActiveRecord's own #update
  def sync_changes
    Model.create({attr: self.value})
  end
end
However, since you're using Redis, you need to synchronise the data cross-channel. I'd probably still use after_save, but with logic to determine which Redis values to update; a rough sketch is below.
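Something along these lines (untested; the $redis connection and the attribute names are placeholders, and note it only fires for saves made through this Rails app, not for writes done directly by the other application):
class Teacher < ActiveRecord::Base
  self.table_name = "teachers"

  after_save    :sync_to_redis
  after_destroy :remove_from_redis

  private

  def sync_to_redis
    # store whichever teacher attributes you actually need in Redis
    $redis.hmset("teacher:#{id}", "name", name, "subject", subject)
  end

  def remove_from_redis
    $redis.del("teacher:#{id}")
  end
end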
My SQL DB contains the tables "jobs" and "job_categories".
"job_categories" associates job category strings (e.g. "Software Development") with an integer number (e.g. 7).
I need these associations saved into variables in my jobs controller for various query functions. How can I use Rails to dynamically link changes to the job_categories table to variables in my jobs controller? I've worked with RoR for a few weeks now but am still a little fuzzy on how everything interacts. Thank you!
There's one big gotcha with what you're trying to do, but first I'll answer your question as asked.
Create class-level accessors in your JobsController, then write an Observer on the JobCategory class that makes the appropriate changes to the JobsController after save and destroy events.
class JobsController < ActionController::Base
  # build a name => id hash once at load time
  @@categories = JobCategory.find(:all).inject({}) { |h, c| h[c.name] = c.id; h }
  cattr_accessor :categories

  # ...
end

class JobCategoryObserver < ActiveRecord::Observer
  def after_save(category)
    JobsController.categories[category.name] = category.id
  end

  def after_destroy(category)
    JobsController.categories.delete(category.name)
  end
end
You'll need additional logic that removes the old name if you allow for name changes. The methods in ActiveRecord::Dirty will help with that.
So, the gotcha. The problem with an approach like this is that typically you have more than one process serving requests. You can make a change to the job_categories table, but that change is only picked up in one process. The others are now stale.
Your job_categories table is likely to be small. If it's accessed with any frequency, it'll be cached in memory, either by the OS or the database server. If you query it enough, the results of that query may even be cached by the database. If you aren't querying it very often, then you shouldn't be bothering with trying to cache inside JobsController anyway.
If you absolutely must cache in memory, you're better off going with memcached. Then you get a single cache that all your Rails processes work against and no stale data.
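A rough sketch of that with Rails.cache backed by memcached (the cache key, the expiry, and the name_to_id helper are arbitrary choices here, and multi-column pluck needs Rails 4+):
class JobCategory < ActiveRecord::Base
  # expire the shared cache whenever a category is created, updated or destroyed
  after_commit { Rails.cache.delete("job_categories/name_to_id") }

  def self.name_to_id
    Rails.cache.fetch("job_categories/name_to_id", expires_in: 12.hours) do
      pluck(:name, :id).to_h
    end
  end
end

# In the controller: JobCategory.name_to_id["Software Development"] # => 7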