In a Rails project with delayed_job, I am running into something strange.
I have an Article model; an instance can be quite large, with many paragraphs of text across several fields.
If I do:
an_article.delay.do_something
the Delayed::Job that's created never makes it onto the queue, it's never marked as failed or successful, and my logs don't acknowledge its existence. However, if I do
def self.proxy_method(article_id)
  an_article = Article.find(article_id)
  an_article.do_something
end
Article.proxy_method(an_article.id)
it works as intended.
Is there some unwritten rule about the size of job objects? Why would A not work, but B work?
One theory I have is that because I'm sort of close to my data cap on mongolab (430/496 MB), the job never makes it to the DB, but I have no log or error to really prove this.
NOTE: delayed_job with Mongoid on Heroku, Rails 3.1
From what I have experienced, never push whole objects to the queue; use the ids of objects instead. Serializing full objects into jobs is problematic and very hard to debug.
I have written a detailed post on this topic at:
https://stackoverflow.com/a/15672001/226255
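Applied to the question, the safe pattern is to enqueue a class-level call that carries only the id, a sketch reusing the question's own proxy_method:

class Article
  include Mongoid::Document

  def self.proxy_method(article_id)
    find(article_id).do_something
  end
end

# the serialized job payload now contains only the id, not the whole document
Article.delay.proxy_method(an_article.id)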
I have recently started consulting and helping with the development of a Rails application that was using MongoDB (with Mongoid as its DB client) to store all its model instances.
This was fine while the application was in its early startup stage, but as it gained more and more clients and started to need more and more complicated queries to show proper statistics and other information in the interface, we decided that the only viable way forward was to normalize the data and move to a structured database instead.
So, we are now in the process of migrating both the tables and the data from MongoDB (with Mongoid as object-mapper) to Postgres (with ActiveRecord as object-mapper). Because we have to make sure that there is no improper non-normalized data in the Mongo database, we have to run these data migrations inside Rails-land, to make sure that validations, callbacks and sanity checks are being run.
All went 'fine' in development, but now we are running the migration on a staging server against the real production database. It turns out that for some migrations, the memory usage of the server increases linearly with the number of model instances, causing the migration to be killed once we've filled 16 GB of RAM (and another 16 GB of swap...).
Since we migrate the model instances one by one, we hope to be able to find a way to make sure that the memory usage can remain (near) constant.
The things that currently come to mind as possible causes are (a) ActiveRecord or Mongoid keeping references to object instances we have already imported, and (b) the migration running in a single DB transaction, so that Postgres perhaps holds on to more and more memory until the transaction completes.
So my question:
What is the probable cause of this linear memory usage?
How can we reduce it?
Are there ways to make Mongoid and/or ActiveRecord relinquish old references?
Should we attempt to call the Ruby GC manually?
Are there ways to split a data migration into multiple DB transactions, and would that help?
These data migrations have about the following format:
class MigrateSomeThing < ActiveRecord::Migration[5.2]
  def up
    Mongodb::ModelName.all.each do |old_thing| # Mongoid's .all.each works in batches, see https://stackoverflow.com/questions/7041224/finding-mongodb-records-in-batches-using-mongoid-ruby-adapter
      create_thing(old_thing, Postgres::ModelName.new)
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  def down
    Postgres::ModelName.delete_all
  end

  def create_thing(old_thing, new_thing)
    attrs = old_thing.attributes
    # ... maybe alter the attributes slightly to fit Postgres, depending on the thing.
    new_thing.attributes = attrs
    new_thing.save!
  end
end
I suggest narrowing down the memory consumption to the reading or the writing side (or, put differently, Mongoid vs AR) by performing all of the reads but none of the model creation/writes and seeing if memory usage is still growing.
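For example, a minimal isolation run might iterate the Mongoid documents without creating any ActiveRecord models (reusing the names from the migration above) while you watch the process's memory:

Mongodb::ModelName.all.each do |old_thing|
  old_thing.attributes # read the document, but skip create_thing entirely
end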
Mongoid performs finds in batches by default, unlike AR, where batching has to be requested through find_in_batches.
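On the ActiveRecord side, batching has to be asked for explicitly, for example:

Postgres::ModelName.find_each(batch_size: 1000) do |record|
  # only ~1000 records are instantiated and held in memory at a time
end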
Since ActiveRecord migrations are wrapped in transactions by default, and AR performs attribute value tracking to restore model instances' attributes to their previous values if transaction commit fails, it is likely that all of the AR models being created are remaining in memory and cannot be garbage collected until the migration finishes. Possible solutions to this are:
Disable the implicit transaction for the migration in question (https://apidock.com/rails/ActiveRecord/Migration); a placement sketch follows after these options:
disable_ddl_transaction!
Create data via direct inserts, bypassing model instantiation entirely (this will also speed up the process). The most basic way is via SQL (Rails ActiveRecord: Getting the id of a raw insert), there are also libraries for this (Bulk Insert records into Active Record table).
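In context, disable_ddl_transaction! is a class-level call in the migration itself; a sketch based on the migration above:

class MigrateSomeThing < ActiveRecord::Migration[5.2]
  disable_ddl_transaction! # run this migration outside the implicit wrapping transaction

  def up
    Mongodb::ModelName.all.each do |old_thing|
      create_thing(old_thing, Postgres::ModelName.new)
    end
  end

  # down and create_thing as before
end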
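For the direct-insert route, a raw-SQL sketch that bypasses model instantiation entirely; it assumes the target table is model_names and that attrs has already been massaged to fit the Postgres schema (both names are assumptions, not from the original code):

conn = ActiveRecord::Base.connection
columns = attrs.keys.map { |k| conn.quote_column_name(k) }.join(", ")
values  = attrs.values.map { |v| conn.quote(v) }.join(", ")
conn.execute("INSERT INTO model_names (#{columns}) VALUES (#{values})")

Note that this skips validations and callbacks, which is exactly the trade-off being made for speed and memory.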
I have a concept for a Rails app that I want to make. Users can create records of a model that has a boolean attribute. After 30 days (a month), unless the record's boolean attribute is true, the record should automatically delete itself.
In Rails 5 you have access to Active Job (http://guides.rubyonrails.org/active_job_basics.html).
There are two simple ways.
After creating the record, you could schedule a job to be executed 30 days later. The job checks whether the record matches the specifications and deletes it if it does not.
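A sketch of that first approach with Active Job (CleanupRecordJob is a hypothetical job class; MyModel and destroyable follow the naming used in the next paragraph and stand in for your model and boolean flag):

class CleanupRecordJob < ApplicationJob
  queue_as :default

  def perform(record_id)
    record = MyModel.find_by(id: record_id)
    record.destroy if record && record.destroyable?
  end
end

# right after creating the record:
CleanupRecordJob.set(wait: 30.days).perform_later(record.id)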
The other alternative is to create a job that runs every day and queries the database for every record (of this specific model) that was created 30 days ago, destroying those that do not match the specifications. (If that information is in the database, it should be as easy as: MyModel.where("created_at <= ?", 30.days.ago).where(destroyable: true).destroy_all.)
There are a couple of options for achieving this:
Whenever and run a script or rake task
Clockwork and run a script or rake task
Background jobs, which in my opinion is the "Rails way".
For options 1 and 2 you need to check every day whether a record is 30 days old and then delete it if its boolean isn't true (which means checking all the records, or optimizing the query to look only at the 30-day-old records, etc.). For the third option, you can schedule a job on record creation to run after 30 days and do the check for each record independently. It depends on how you are processing jobs; for example, if you use Sidekiq you can use scheduled jobs, or if you use Resque, check out resque-scheduler.
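For example, with Sidekiq's scheduled jobs the check can be attached to each record at creation time (PruneRecordWorker, MyModel and destroyable are hypothetical names):

class PruneRecordWorker
  include Sidekiq::Worker

  def perform(record_id)
    record = MyModel.find_by(id: record_id)
    record.destroy if record && record.destroyable?
  end
end

# wherever the record is created:
PruneRecordWorker.perform_in(30.days, record.id)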
Performing the deletion is straightforward: create a class method (e.g. Record.prune) on the record class in question that performs the deletion based on a query, e.g. Record.where(retain: false).destroy_all, where retain is the boolean attribute you mention. I'd then recommend defining a rake task in lib/tasks that invokes this method, e.g.:
namespace :record do
  task :prune => :environment do
    Record.prune
  end
end
The scheduling is more difficult; a crontab entry is sufficient to provide the correct timing, but ensuring that the task runs in an appropriate environment (e.g. one that has loaded rbenv/rvm and any required environment variables) is harder. Ensuring that your deployment process produces binstubs is probably helpful here; from there, bin/rake record:prune ought to be enough. It's hard to provide a more in-depth answer without knowing more about the environments in which you hope to run this task.
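For instance, a crontab entry along these lines would do (the application path is hypothetical, and it assumes binstubs were generated during deployment):

0 3 * * * cd /var/www/my_app/current && RAILS_ENV=production bin/rake record:prune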
I want to mention a non-Rails approach. It depends on your database. If you use MongoDB, you can utilize its "Expire Data from Collections" (TTL) feature. If you use MySQL, you can utilize the MySQL event scheduler. You can find a good example here: What is the best way to delete old rows from MySQL on a rolling basis?
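As a sketch of the MongoDB route with Mongoid (MyDocument is a hypothetical model), a TTL index makes the server itself remove documents roughly 30 days after created_at; note that a TTL index expires every matching document, so the boolean exception from the question would need separate handling:

class MyDocument
  include Mongoid::Document
  include Mongoid::Timestamps # provides created_at

  # MongoDB's "Expire Data from Collections" feature via a TTL index
  index({ created_at: 1 }, { expire_after_seconds: 30.days.to_i })
end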
I have a production Rails app running on Heroku, and some API endpoints are taking a long period of time to resolve (~1-2s).
It's a normal Rails RESTful GET action cases#index. The method looks like so:
@cases = cases_query

meta = {
  total: @cases.total_count,
  count: params[:count],
  page: params[:page],
  sort: params[:order],
  filter: pagination[:filter],
  params: params
}

render json: @cases,
       root: 'cases',
       each_serializer: CaseSerializer,
       meta: meta
The method runs an ActiveRecord query to select data, serializes each record and renders JSON. Skylight, a Rails profiler/monitoring/performance tool, is telling me that this endpoint amongst others is spending 70% in the controller method (in Ruby), and 30% in the database.
What in this code or in my app's setup is causing this to spend so much time in the app code? Could it be a gem?
Picture of Skylight analytics on this endpoint (you can see the bulk of the time is spent in Ruby in the controller action).
ActiveRecord can generate a ton of Ruby objects from queries. So you track the time it takes for the database to return results, and that may be ~20% of your request, but the rest could still be in ActiveRecord converting those results into Ruby objects.
Does your query for this request return a lot of rows? Are those rows very wide (like when you do a join of table1.*, table2.*, table3.*)?
I've had some experience in the past with serializers really really crushing performance. That usually ends up being a bit of a line by line hunt for what's responsible.
To troubleshoot this issue I recommend finding a way to get realtime or near realtime feedback on your performance. The newrelic_rpm gem has a /newrelic page you can view in development mode, which should provide feedback similar to Skylight. Skylight may have a similar development mode page you should look into.
There's also a gem called Peek that adds a little performance meter to each page view, that you can add gems to in order to show specific slices of the request, like DB, views, and even Garbage collection. https://github.com/peek/peek Check it out, especially the GC plugin.
Once you have that realtime feedback set up, and you can see something that maps roughly to your Skylight output, you can start using a binary search in your code to isolate the performance problem.
In your controller, eliminate the rendering step with something like:
render json: {}
and look at the results of that request. If the performance improves dramatically then your issue is probably in the serialization phase.
If not, then maybe ActiveRecord is blowing up the Ruby object space. Do a Google search for Ruby ObjectSpace profiling and you'll find ways to troubleshoot this.
If that's your problem, then try to narrow down the results returned by your query. Select only the columns you need to render in this response, and try to eliminate joins if possible (by returning a foreign key instead of an object, if that is possible).
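A sketch of a trimmed query, assuming Kaminari-style pagination (the total_count call in the question suggests it) and hypothetical column names; the point is to fetch only what CaseSerializer actually renders:

@cases = Case.select(:id, :status, :client_id, :created_at)
             .page(params[:page])
             .per(params[:count])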
If serialization is your problem... Good luck. This one is particularly hard to troubleshoot in my experience. You may try using a more efficient JSON gem like Oj, or hand-writing your serializers rather than using ActiveModel::Serializer (last resort!).
Good luck!
Database queries can often cause this kind of issue. Revisit your database queries and try to optimize them; apply joins where you can.
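For example, eager-load the associations the serializer touches so each case does not trigger its own queries (the association names here are hypothetical):

@cases = Case.includes(:client, :assigned_user).where(status: 'open')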
Also, try using the Puma web server with Heroku to improve your server's performance.
Is it possible to force ActiveRecord to push/flush a transaction (or just a save/create)?
I have a clock worker that creates tasks in the background for several task workers. The problem is, the clock worker will sometimes create a task and push it to a task worker before the clock worker's information has been fully flushed to the DB, which causes an ugly race condition.
Using after_commit isn't really viable due to the architecture of the product and how the tasks are generated.
So in short, I need to be able to have one worker create a task and flush that task to the db.
ActiveRecord uses #transaction to create a block that begins and either rolls back or commits a transaction. I believe that would help your issue. Essentially (presuming Task is an ActiveRecord class):
new_task = nil # declared outside the block so it stays in scope after the transaction

Task.transaction do
  new_task = Task.create(...)
end

BackgroundQueue.enqueue(new_task)
You could also go directly to the #connection underneath with:
Task.connection.commit_db_transaction
That's a bit low-level, though, and you have to be pretty confident about the way the code is being used. #after_commit is the best answer, even if it takes a little rejiggering of the code to make it work. If it definitely won't work for you, then these two approaches should help.
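For reference, the after_commit shape looks roughly like this (BackgroundQueue is the asker's own queueing interface, used as-is):

class Task < ActiveRecord::Base
  after_commit :push_to_worker, on: :create

  private

  def push_to_worker
    BackgroundQueue.enqueue(self)
  end
end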
execute uses async_exec under the hood, which may or may not be what you want. You could try using the lower-level methods execute_and_clear (or even exec_no_cache) instead.
I have a controller that generates HTML, XML, and CSV reports. The queries used for these reports take over a minute to return their result.
What is the best approach to run these tasks in the background and then return the result to the user? I have looked into Backgroundrb. Is there anything more basic for my needs?
You could look at using DelayedJob to perform those queries for you, along with an additional table called NotificationQueue. When a job finishes, store its result set and the user ID of the person who made the query in the NotificationQueue table. Then, on every page load (and, if you like, every 15-20 seconds), poll that table to see whether there are any completed queries.
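A rough sketch of that pattern (NotificationQueue and its columns are the hypothetical table described above, and modern ActiveRecord query syntax is used for brevity):

class ReportJob < Struct.new(:user_id, :report_params)
  def perform
    result = Query.do_something(report_params)
    NotificationQueue.create!(user_id: user_id, result: result.to_json)
  end
end

Delayed::Job.enqueue(ReportJob.new(current_user.id, params))

# on page load, or from a periodic poll:
NotificationQueue.where(user_id: current_user.id)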
DelayedJob is really great because you write your code as if it wasn't going to be a delayed job, and just change the code to do the following:
# Your method
Query.do_something(params)

# Change to
Query.send_later(:do_something, params)
We use it all the time at work, and it works great.