In my Rails app, the user can upload a file. When the file is uploaded I want to start a rake task, which parses the file and feeds all the tables in the database; depending on how large the file is, this can take some time. I assume something like system "rake task_name" in my controller would do the work.
However, is that best practice? Is it safe? That way, any user would be starting a rake process. In this Railscast they recommend running rake in the background. What would be an alternative, or what is the common practice?
I am using Rails 3.2, so I couldn't use Active Job. I used Sidekiq and it worked fine.
Check a very helpful tutorial here
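For illustration, here is a minimal sketch of that Sidekiq approach. The worker name, the Upload model and the FileParser class are hypothetical; only include Sidekiq::Worker and perform_async are the gem's actual API.

# app/workers/file_import_worker.rb -- hypothetical worker name
class FileImportWorker
  include Sidekiq::Worker
  sidekiq_options :retry => 3

  def perform(upload_id)
    upload = Upload.find(upload_id)            # hypothetical model
    # The long-running parsing happens here, outside the request cycle
    FileParser.new(upload.file.path).import!   # hypothetical parser
  end
end

The controller then enqueues the job instead of shelling out to rake, e.g. FileImportWorker.perform_async(@upload.id), which returns immediately no matter how large the file is.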
I have built a Rails app for storing events. I want to generate a report of the number of events that happened in a given day, and I want to do it asynchronously. I am new to Sidekiq and Redis. Can anyone suggest a good resource to study?
My suggestion would be to do this in a rake task that is run on the server once a day.
You can find good resources online on how to create rake tasks, and then use this simple gem to make sure the task runs once a day on the server:
https://github.com/javan/whenever
I am assuming you have a Profile model. You could use the created_at timestamp in this model to get all the profiles created on a given day. You could then create a CSV or whatever you like with that data and email it to whoever needs the report (how you handle the data is up to you).
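As a rough sketch, assuming an Event model as in the question (the same pattern works for the Profile example above); the task name and mailer are hypothetical, while the every/rake syntax is whenever's own DSL:

# config/schedule.rb (whenever)
every 1.day, :at => '4:30 am' do
  rake "reports:daily_events"
end

# lib/tasks/reports.rake -- hypothetical task name
namespace :reports do
  desc "Count yesterday's events and mail the report"
  task :daily_events => :environment do
    range = Date.yesterday.beginning_of_day..Date.yesterday.end_of_day
    count = Event.where(:created_at => range).count
    ReportMailer.daily_events(count).deliver   # hypothetical mailer; writing a CSV works the same way
  end
end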
You can do all of the above in Sidekiq if you wish. I would recommend reading through the gem docs and this getting-started guide from the official wiki: https://github.com/mperham/sidekiq/wiki/Getting-Started
It's fairly straightforward, and once you get your first process working it will start to make more sense.
I would also highly recommend this video before you start working with Sidekiq and Redis; it gives an overall background of how Sidekiq works and the use cases in which it may be helpful to you.
https://www.youtube.com/watch?v=GBEDvF1_8B8
I have a Rails application that pulls information from various internal APIs and stores it in a local SQLite DB. My Rails application is essentially a glorified UI on top of this data pulled from multiple APIs.
For reasons outside the scope of this question, it is not straightforward to simply update the information in the DB on a periodic basis by re-querying the API. I'm basically forced to recreate the DB from scratch.
In essence, I want to do something like this every X hours:
Automatically shut down the rails application
Put up a maintenance page ("Sorry, we'll be back in a few mins")
Drop the db, recreate it, and re-migrate it (rake db:drop; rake db:create; rake db:migrate)
Run my custom rake task that populates the tables again (rake myApp:update)
Re-start the application
This brings up a few questions:
How do I have the app restart automatically every X hours? Should I schedule this externally using a cron job? Is there a Rails way I can accomplish this?
How do I display a maintenance page if the app is down? Again, is this also an external redirect I need to manage?
Most importantly, is there a good way to drop the tables and recreate them, or should I be using rake tasks? Is there a way to call rake tasks at startup? I guess I could create a .rb file under config/initializers that would run at startup (but only when Rails.env == 'production')?
Thanks for the help!
Just create a cron task that runs periodically. That cron task starts a shell script that does all the steps you would run manually.
There is a gem (https://github.com/biola/turnout) that helps with the maintenance page. It provides rake tasks like rake maintenance:start and rake maintenance:end.
I think it is not necessary to drop the tables. Usually it should be enough to just delete all records and then create new records. If you really have to drop the database, it might be faster to restore the database schema from a structure dump.
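One way to wire those steps together is a single wrapper task that the cron job calls; a sketch, assuming turnout is installed and reusing the myApp:update task from the question:

# lib/tasks/refresh.rake -- hypothetical wrapper task for the cron job
namespace :myApp do
  desc "Take the site down, rebuild the data, bring it back up"
  task :refresh do
    %w[
      maintenance:start
      db:drop db:create db:migrate
      myApp:update
      maintenance:end
    ].each { |name| sh "rake #{name}" }   # sh aborts on failure, leaving the maintenance page up
  end
end

The crontab entry then becomes a single line, something like: cd /path/to/app && RAILS_ENV=production bundle exec rake myApp:refresh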
There is not a 'Rails way' to reload a Rails app every so often, since starting and stopping a Rails app is outside the context of the app itself. A cron job is a fine way to go.
Typically a web server is running "in front" of the Rails app; Apache and nginx are common. The maintenance page would need to be implemented at that level (since the Rails app is down for a moment, remember); something like a temporary change to the config and a reload should suffice. Then, when bringing the app back online, restore the config to point to the Rails app and reload again.
Using the rake tasks you have is fine; set the environment variable RAILS_ENV=production so they hit the right SQLite file. Don't bother attempting to call the rake tasks at Rails startup; call them from the script called by your cron job, and then start the app after that.
I use whenever to call rake tasks throughout the day, but each task launches a new Rails environment. How can I run tasks throughout the day without relaunching Rails for each job?
Here's what I came up with; I would love to get some feedback on it.
Refactor each rake task to instead be a method within the appropriate Model.
Use the delayed_job gem to assign low priority and ensure these methods run asynchronously.
Instruct whenever to call each Model.method instead of calling the rake task
Does this solution make sense? Will it help avoid launching a new Rails environment for each job? Or is there a better way to do this?
--
Running Rails 3
You could certainly look into enqueuing delayed_jobs via cron, then having one long-running delayed_job worker.
Then you could use whenever to help you create the delayed_job enqueuing methods. It's probably easiest to have whenever's cron output call a small wrapper script which loads active_record and delayed_job directly rather than your whole Rails stack (a sketch of such a script follows below). http://snippets.aktagon.com/snippets/257-How-to-use-ActiveRecord-without-Rails
You might also like to look into clockwork.rb, which is a long-running process that would do the same thing you're using cron for (enqueuing delayed_jobs): http://rubydoc.info/gems/clockwork/0.2.3/frames
You could also just try using a requeuing strategy in your delayed_jobs: https://gist.github.com/704047
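A rough sketch of that wrapper-script idea, following the linked snippet; the paths, the Report model and its generate! method are assumptions, and newer delayed_job versions also need the delayed_job_active_record backend gem:

# script/enqueue_report.rb -- hypothetical wrapper called from whenever's cron output
require 'rubygems'
require 'active_record'
require 'delayed_job'
require 'yaml'

# Connect ActiveRecord without booting the full Rails stack
db_config = YAML.load_file(File.expand_path('../../config/database.yml', __FILE__))
ActiveRecord::Base.establish_connection(db_config['production'])

# Load only the model the job needs
require File.expand_path('../../app/models/report', __FILE__)

# Enqueue at low priority; the one long-lived worker picks it up
Report.delay(:priority => 10).generate!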
Lots of good solutions to this problem; the one I eventually ended up integrating is as follows:
Moved my rake code to the appropriate models
Added controller/routing code for calling models methods from the browser
Configured cronjobs using the whenever gem to run command 'curl mywebsite.com/model#method'
I tried giving delayed_job a go but didn't like the idea of running another Rails instance. My methods are not too server-intensive, and the above solution allows me to utilize the already-running Rails environment.
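A sketch of that wiring, with hypothetical names throughout; note that such an endpoint is publicly reachable, so in practice it should be protected with a token or an IP restriction:

# config/routes.rb
get 'reports/generate' => 'reports#generate'

# app/controllers/reports_controller.rb
class ReportsController < ApplicationController
  def generate
    Report.generate!          # the logic that used to live in the rake task
    render :nothing => true   # cron only needs the 200, not a page
  end
end

# config/schedule.rb (whenever)
every 1.hour do
  command "curl --silent http://mywebsite.com/reports/generate"
end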
Comment out this line in schedule.rb:
require File.expand_path(File.dirname(__FILE__) + "/environment")
Instead, load only the Ruby files that are required, such as your models in this case.
I'm new to Ruby on Rails and want to create a crawler that scrapes data and inserts it into the database. I'm currently using Heroku, so I can't access the database directly, and I was wondering what the best way to integrate a crawler script into the RoR framework would be. I would use an hourly or daily cron to run the script.
If you are using Rails on Heroku you can just use an ORM adapter like DataMapper or ActiveRecord. This gives you access to your database, but through a layer, basically. If you need to send raw SQL to the database you can, but it's usually not recommended, since the ORMs provide pretty much everything you need.
You would basically just create models within your Rails application like normal, with the associated fields in a table:
rails g model page meta_title:string page_title:string
rake db:migrate # run this on Heroku too ("heroku rake db:migrate") after you have pushed your code up
Then in your crawler script you can create records just by using your model:
Page.create(:page_title => crawler[:title], :meta_title => crawler[:meta_title])
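Putting it together, a minimal sketch of the crawler itself; the URL list and the regex "parsing" are placeholders (a real crawler would use an HTML parser), and the :environment dependency loads Rails so the Page model is available:

# lib/tasks/crawler.rake -- hypothetical task for the hourly/daily cron to invoke
namespace :crawler do
  desc "Fetch each page and store it through the Page model"
  task :run => :environment do
    require 'net/http'

    %w[http://example.com/].each do |url|
      html = Net::HTTP.get(URI.parse(url))
      # Naive title extraction, for illustration only
      title = html[%r{<title>(.*?)</title>}im, 1]
      Page.create(:page_title => title, :meta_title => title)
    end
  end
end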
Normally you can use whenever (https://github.com/javan/whenever) to manage your cron jobs, but I'm not sure how it works on Heroku since I haven't set any up there before.
I'd suggest one of two options:
Use a Ruby script that requires rubygems along with whatever other helper libraries (like Rails, ActiveRecord, and so on) you need to accomplish the task, and then cron that script.
If you're using Rails to also serve web apps, use the machine's hosts file so that a wget (or similar) on that machine will properly map requests to that instance of Rails; from there, just set the crawler up as a web endpoint and use the wget command in your cron job. Not terribly efficient, but if you're just looking for something quick and dirty based on an existing setup, it works nicely. Just make sure to send STDOUT and STDERR to /dev/null so you don't end up amassing cron output.
I am using the sitemap_generator gem to build an XML sitemap. From the README:
... run rake sitemap:refresh as needed to create/rebuild your Sitemap files
I'd prefer to do this any time the create action is run in my Content controller. Is there a best practice for doing something like this?
It is possible, yes, but I would not recommend it. Rake tasks tend to take at least a few seconds to run, which will occupy a server request and prolong the response to the client.
If you want to refresh the sitemap after every create, then I would recommend one of two solutions. Either analyse what the rake task sitemap:refresh does and use that code directly from your controller; I would only do that as long as it doesn't take too much time to run, and since I do not know much about sitemap_generator, I can't tell.
The other option would be to run the rake task from a delayed_job, which I have found to be the preferable alternative. That way, you can trigger the job from your create action but you don't have to wait for it to finish.
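For example, a minimal sketch of that approach, with a hypothetical job class and controller; the worker process pays the cost of booting rake, not the web request:

# app/jobs/sitemap_refresh_job.rb -- hypothetical delayed_job payload
class SitemapRefreshJob
  def perform
    system("cd #{Rails.root} && RAILS_ENV=#{Rails.env} bundle exec rake sitemap:refresh")
  end
end

# app/controllers/contents_controller.rb (hypothetical)
def create
  @content = Content.new(params[:content])
  if @content.save
    Delayed::Job.enqueue SitemapRefreshJob.new   # returns immediately
    redirect_to @content
  else
    render :new
  end
end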