Running filewatcher as separate process - ruby-on-rails

I am still very new to Ruby so I hope you can help. I have a Ruby on Rails app that needs to watch a specific directory "Dir A" to which I keep adding txt files. Once a new file appears it needs to be processed into a CSV file, which then appears in a tmp directory before being attached to a user, and disappears from tmp after the file goes into ActiveStorage, while keeping the original txt file in "Dir A" for a limited amount of time.
Now, I am using the filewatcher gem to watch "Dir A"; however, I need it to run on server startup and continue to run in the background. I understand I need to daemonize the process, but how can I do it from *.rb files rather than the terminal?
Atm I am using Threads, but I am not sure if that's the best solution...
I also have the following issues:
- how to process files which have already appeared in the folder before the server start up?
- filewatcher does not seem to pick up another new file while it's still processing the previous one, and threads don't seem to help with that
- what would you recommend as the best way to keep track of processed files: a database, renaming/copying files into a different folder, some global variables, or maybe something else? I have to know which files are processed so I don't repeat the process, especially in cases when I need to schedule filewatcher restarts due to its declining performance (the filewatcher documentation states it is best to restart the process if it has been running for a long time)
I'm sorry to bombard with questions, but I need some guidance, maybe there's a better gem I've missed, I looked at guard gem but I am not entirely sure how it works and filewatcher seemed simpler.

This question should probably be split into two, one about running filewatcher as a background process, and another about managing processed files, but as far as filewatcher goes, a simple solution would be to use the foreman gem with a Procfile.
You can start your Rails app in one process and filewatcher in another with a Procfile in the root of your app, like this:
# Procfile
web: bundle exec puma -t 5:5 -p ${PORT:-3000} -e ${RACK_ENV:-development}
filewatcher: filewatcher "**/*.txt" "bundle exec rake process_txt_files"
and move whatever processing needs to be done into a rake task. With this you could just run foreman start locally to start both processes, and if your production server supports Procfiles (like Heroku, for example), this makes it easy to do the same in production.
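As a rough illustration of what that rake task might delegate to, here is a minimal sketch. The module name TxtProcessor, the manifest file, the directory names and the whitespace-separated input format are all assumptions made for the example, not part of filewatcher or of the original app. It converts each txt file that is not yet listed in a manifest, which also covers files that arrived before the watcher started.

```ruby
require "csv"
require "fileutils"
require "set"

# Hypothetical helper; the names and the manifest approach are assumptions.
module TxtProcessor
  module_function

  # Converts every *.txt in source_dir that is not yet listed in the manifest
  # and returns the paths of the CSV files written during this run.
  def process_pending(source_dir, output_dir, manifest_path)
    FileUtils.mkdir_p(output_dir)
    done = File.exist?(manifest_path) ? File.readlines(manifest_path, chomp: true).to_set : Set.new

    Dir.glob(File.join(source_dir, "*.txt")).sort.filter_map do |txt_path|
      name = File.basename(txt_path)
      next if done.include?(name) # already handled in an earlier run

      csv_path = File.join(output_dir, File.basename(name, ".txt") + ".csv")
      CSV.open(csv_path, "w") do |csv|
        # Assumes whitespace-separated columns in the txt file.
        File.foreach(txt_path, chomp: true) { |line| csv << line.split }
      end
      File.open(manifest_path, "a") { |f| f.puts(name) } # remember it
      csv_path
    end
  end
end

# Hypothetical wiring in lib/tasks/process_txt_files.rake:
#   task process_txt_files: :environment do
#     TxtProcessor.process_pending("dir_a", "tmp/csv", "tmp/processed.log")
#   end
```

Because the manifest is only appended to, the original txt files can stay in place and a restart of the watcher does not reprocess them. A database table would serve the same purpose if several processes need to share that state.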

Related

Rails with Heroku: How to download a CSV created by a rake task

I deployed "Harrys Prelauncher" on Heroku and try to do the teardown (currently just testing). See here: https://github.com/harrystech/prelaunchr#teardown
After running the rake task ...
heroku run rake prelaunchr:create_winner_csvs
... a csv file is created in "/lib/assets", but I don't know how to access the file (it works locally in development).
How can I download or access the file?
Heroku uses "ephemeral" filesystem that is not guaranteed to preserve changes made at runtime. Simply put, if it's not pushed to git (I assume you're using git with heroku), it's not guaranteed to exist in all the instances of your app. It may exist in one of them, but you may have no simple way of accessing that specific filesystem. And you shouldn't, really.
It's done like that so that multiple instances of the same app can be fired up seamlessly. Of course, that requires some discipline: storage of any meaningful state outside: in the database, on external disk, anywhere. The benefit of this is horizontal scalability: should you be short on resources, you can fire up another web dyno that would (normally) behave exactly the same way. New dynos are started from bundles that are packed on git push and thus do not contain any changes you may have made in another instance.
A workaround may be running heroku run bash, so that you end up in an interactive shell linked to another instance of your bundle.
Then you can make that file (by running your rake task) and access its contents in any way you deem reasonable. Text files can be echoed into the console with cat and copy-pasted anywhere else. That's a dirty way.
A much cleaner way would be rigging the app to send the file in question via email, and it's one of the few reasonable ways if that rake task is invoked by the Rails app itself.
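A variation on the console approach is to have the task build the CSV in memory and print it, so nothing depends on the dyno's filesystem. The sketch below is only an illustration: the method name winners_csv and the row shape (email, referral count) are assumptions and are not taken from Prelaunchr.

```ruby
require "csv"

# Builds the CSV as a string instead of writing it under lib/assets.
# `rows` is an array of [email, referral_count] pairs (hypothetical shape).
def winners_csv(rows)
  CSV.generate do |csv|
    csv << %w[email referrals] # header row
    rows.each { |row| csv << row }
  end
end

# When run directly, print a sample so the output can be captured.
puts winners_csv([["a@example.com", 5]]) if $PROGRAM_NAME == __FILE__
```

The same string could be used as the attachment body if the email route is chosen.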
I ran into this problem recently while developing the Prelaunchr campaign for a client. Assuming you have a local version of your app, you can "pull" your Heroku database down to your local machine, set that as your development database in database.yml, and run the rake task from your local app, which should now have the same database as your heroku version. Here is the command to pull the db (subbing out name_for_database & heroku_app_name with your own):
heroku pg:pull HEROKU_POSTGRESQL_COPPER_URL name_for_database --app heroku_app_name
Make sure to restart your local server to see the new database info populated.

Running Rails Task From Cron

I have a Rails runner task that I want to run from cron, but of course cron runs as root and so the environment is set up improperly to get RVM to work properly. I've tried a number of things and none have worked thus far. The crontab entry is:
* 0 * * * root cd /home/deploy/rails_apps/supercharger/current/ && /usr/local/rvm/wrappers/ruby-1.9.3-p484/ruby bundle exec rails runner -e production "Charger.start"
Apologies for the super long command line. Anyhow, the error I'm getting from this is:
ruby: No such file or directory -- bundle (LoadError)
So ruby is being found in the RVM directory, but again, the environment is wrong.
I tried rvm alias delete [alias_name] and it seemed to do something, but darn if I know where the wrapper it generated went. I looked in /usr/local/rvm/wrappers and didn't see one with the name I had specified.
This seems like a common problem -- common enough that the whenever gem exists. The runner command I'm using is so simple, it seemed like a slam dunk to just put this entry in the crontab and go, but not so much...
Any help with this is appreciated.
It sounds like you could use a third-party tool to tether your Rails app to cron: Whenever. You already know about it, but it seems you never tried it. This gem includes a simple DSL that could be applied in your case like:
every :day do # Or specify another period, or something else, see README
  runner "Charger.start"
end
Once you've defined your schedule, you'll need to write it into crontab with whenever command line utility. See README file and whenever --help for details.
It should not have any performance impact at runtime, since all it does is convert the schedule into crontab format on deployment or when you run the command explicitly. The gem is not needed once the server is running; everything after that is done by cron.
If you don't want an extra gem anyway, you might as well check what command it issues for executing your task. Still, an automated way of adding a cron task is easier to maintain and to deploy. Sure, just tossing a line into the crontab is easier, but only for you and only this once. Then it starts to get repetitive and tiring, not to mention the confusion for other potential developers who will have to set up something similar on their own machines.
You can run cron as a different user than root. Even in your example the task begins with
* 0 * * * root cd
root is the user that runs the command. You can edit it with crontab -e -u username.
If you insist on running the cron task as root, or running it as another user does not work for some reason, you can switch user with su. For example:
su - username -c 'cd /home/deploy/rails_apps/supercharger/current && bundle exec rails runner -e production "Charger.start"'

Rails Active Job usage , or running watcher thread automatically with Rails

It's nice to see Rails 4.2 come with Active Job as a common interface for background jobs. But I can't find how to start a worker in the document. It seems that the document is still immature (e.g. the right version of Sneakers is only referred to in Rails' Gemfile), so I'm not sure if the "running workers" part is not in Active Job or just not mentioned in docs.
So with Active Job, do I still need to manually start the job watcher threads like sidekiq or in my case, rake sneakers:run? If so, where should I put these commands to let rails server run these parallel tasks automatically in a develop environment?
ActiveJob is just a common interface. You still need the backend gem, and you still need to launch it separately from your server (it is a separate process, which is the whole point).
Sample using resque:
In the Gemfile:
gem 'resque'
In the terminal, launching a worker:
bin/resque work
The case is similar when using Sidekiq, Delayed Job or something else.
If you want to launch the server & worker in a single command, you can create a short bash script for it, but I would advise against doing so: having two separate consoles helps you watch what is happening on each side (web app & worker).
A better solution would be to use the foreman gem to manage starting & stopping your process.
You can create a simple Procfile with the processes to start:
web: bundle exec rails s
job: bundle exec resque work
And then just start both using foreman:
foreman start
By default, foreman will interleave the logs of the processes in the console, but this can be configured.
You still have to run the job thread watcher.

Rails.root points to the wrong directory in production during a Resque job

I have two jobs that are queued simultaneously and one worker runs them in succession. Both jobs copy some files from the builds/ directory in the root of my Rails project and place them into a temporary folder.
The first job always succeeds, never have a problem - it doesn't matter which job runs first either. The first one will work.
The second one receives this error when trying to copy the files:
No such file or directory - /Users/apps/Sites/my-site/releases/20130829065128/builds/foo
That releases folder is two weeks old and should not still be on the server. It is empty, housing only a public/uploads directory and nothing else. I have killed all of my workers and restarted them multiple times, and have redeployed the Rails app multiple times. When I delete that releases directory, it makes it again.
I don't know what to do at this point. Why would this worker always create/look in this old releases directory? Why would only the second worker do this? I am getting the path by using:
Rails.root.join('builds') - Rails.root is apparently a 2 week old capistrano release? I should also mention this only happens in the production environment. What can I do?
Resque is not being restarted (stopped and started) on deployments, which is causing old versions of the code to be run. Each old worker continues to service the queue, resulting in strange errors or behaviors.
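To see why a worker that was never restarted can keep pointing at an old release, here is a small sketch that mimics a Capistrano-style layout with a "current" symlink. It assumes the application root is resolved to a real path once at boot and then kept for the life of the process; the method name stale_root_demo and the directory names are made up for the demonstration.

```ruby
require "tmpdir"
require "fileutils"

# Simulates two releases and a "current" symlink that is re-pointed on deploy.
# Returns the release a long-running process remembers and the release a
# freshly started process would see.
def stale_root_demo
  Dir.mktmpdir do |app|
    old_release = File.join(app, "releases", "1")
    new_release = File.join(app, "releases", "2")
    FileUtils.mkdir_p([old_release, new_release])
    current = File.join(app, "current")

    File.symlink(old_release, current)
    root_at_boot = File.realpath(current) # resolved once, when the worker starts

    # "Deploy": point the symlink at the new release.
    File.delete(current)
    File.symlink(new_release, current)
    root_after_deploy = File.realpath(current) # what a new process resolves

    [File.basename(root_at_boot), File.basename(root_after_deploy)]
  end
end
```

The first element stays at the old release even though the symlink has moved, which matches the symptom of a job looking for builds/ inside a release directory that should no longer be in use.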
Based on the path name it looks like you are using Capistrano for deploying.
Are you using the capistrano-resque gem? If not, you should give that a look.
I had exactly the same problem and here is how I solved it:
In my case the problem was how capistrano is handling the PID-files, which specify which workers currently exist. These files are normally stored in tmp/pids/. You need to tell capistrano NOT to store them in each release folder, but in shared/tmp/pids/. Otherwise resque does not know which workers are currently running, after you make a new deployment. It looks into the new release's pids-folder and finds no file. Therefore it assumes that no workers exist, which need to be shut down. Resque just creates new workers. And all the other workers still exist, but you cannot see them in the Resque-Dashboard. You can only see them, if you check the processes on the server.
Here is what you need to do:
Add the following lines in your deploy.rb (btw, I am using Capistrano 3.5)
append :linked_dirs, ".bundle", "tmp/pids"
set :resque_pid_path, -> { File.join(shared_path, 'tmp', 'pids') }
On the server, run htop in the terminal to start htop and then press T, to see all the processes which are currently running. It is easy to spot all those resque-worker-processes. You can also see the release-folder's name attached to them.
You need to kill all worker-processes by hand. Get out of htop and type the following command to kill all resque-processes (I like to have it completely clean):
sudo kill -9 `ps aux | grep [r]esque | grep -v grep | cut -c 10-16`
Now you can make a new deploy. You also need to start the resque-scheduler again.
I hope that helps.
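As a complement to checking htop by hand, a short script can report which PID files in the shared directory still belong to running processes. This is only a sketch: the method name live_workers is hypothetical, and it assumes each *.pid file contains a single process ID.

```ruby
# Returns { pid_file_name => pid } for every *.pid file whose process is alive.
def live_workers(pid_dir)
  Dir.glob(File.join(pid_dir, "*.pid")).each_with_object({}) do |file, alive|
    pid = File.read(file).to_i
    next if pid <= 0 # empty or malformed PID file

    begin
      Process.kill(0, pid) # signal 0 sends nothing, it only checks existence
      alive[File.basename(file)] = pid
    rescue Errno::ESRCH
      # Stale PID file: no such process.
    rescue Errno::EPERM
      alive[File.basename(file)] = pid # exists, but owned by another user
    end
  end
end
```

For example, live_workers("shared/tmp/pids") run from the deploy directory would list the workers the PID files know about. Workers started from older releases that wrote their PID files elsewhere would not appear, which is why the manual cleanup above is still needed once.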

Daemon Start at the application bootup

I have a daemon that should run behind my Rails app doing db modifications. I implemented that daemon using the Ruby daemons gem. I want to start that daemon at the start of my app. Whenever my app starts, I need to start that daemon.
How can I do this?
If you must start it during Rails initialization:
Create a ruby file that will start the daemon. Say invoke_daemon.rb
Put this file in config/initializers/invoke_daemon.rb
However if it isn't mandatory, I would suggest creating a binary executable or a rake task and manually starting it through command line. This way it runs as a separate process. You can simply add it to your deployment scripts for production boxes and on development box run it manually. A few examples would be searchd, the search daemon for sphinx and thinking_sphinx:delayed_delta rake task from thinking_sphinx.
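If the initializer route is taken anyway, the file could do little more than spawn the daemon script as its own process. The following is a minimal sketch using only Ruby's standard Process API; the method name start_daemon and the script path are hypothetical, and it does not guard against the daemon being started more than once (for example by the console, rake tasks, or several app server workers).

```ruby
require "rbconfig"

# Starts `script` with the current Ruby interpreter in a separate process
# and returns the child's PID. Output goes to `log` (discarded by default).
def start_daemon(script, log: File::NULL)
  pid = Process.spawn(RbConfig.ruby, script, [:out, :err] => log)
  Process.detach(pid) # reap the child in the background; do not block boot
  pid
end

# Hypothetical use inside config/initializers/invoke_daemon.rb:
#   start_daemon(Rails.root.join("lib", "my_daemon.rb").to_s, log: "log/daemon.log")
```

Since every process that loads the Rails environment runs initializers, a PID file check or the separate rake task suggested above is usually the safer choice.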
For background, take a look at the Rails life cycle.
I have just implemented this on Windows 7.
I created a batch file, let's say my_batch.bat, which contains the ruby command, i.e. ruby my_daemon.rb.
In addition, to execute this file when my app starts, I added one statement to the environment.rb file which executes that batch file, i.e. system("my_batch.bat").
But I am not sure that this is a good way to implement this task.
