Due to a limitation of a 3rd party library, I need to use a file with a static name. What happens in Rails if multiple users are trying to write to that file at the same time? EACCESS error?
Is there a way I could circumvent this?
At the Ruby level, what will happen if multiple processes try to write to the file depends on how the library uses the file: whether and how it locks the file before opening it and what mode it opens the file in. It might just work, it might raise an error, or (most likely, if the library does nothing to handle this situation) multiple writers might silently interleave writes with one another in a way that could corrupt the file, or the last writer might win.
At the Rails level, it depends on how you run Rails. If you run a single, normally configured Rails instance on a given server, you won't have any problems, since Rails itself is single-threaded by default. If you run multiple Rails instances (presumably controlled by an application server like Passenger or unicorn) you might have problems.
Assuming the library doesn't handle multiple writers for you, you can work around it in a couple of ways:
Run only one instance of your Rails app on each server (or docker container or chrooted environment).
Fork the library and change it to include the process ID in the file name. That's what I'd do.
Using Rails 3.2, Thinking Sphinx 3.0.6, Sphinx 2.2.4-id64-beta.
I am setting up a dedicated database server for my app. May I know the following:
Where should I install Sphinx? App server or database server?
Is the following /path/to/app/server/current/config/thinking_sphinx.yml correct:
address: database_server_ip_address
pid_file: /path/to/app/server/shared/pids/searchd.pid
indices_location: /path/to/app/server/shared/db/sphinx
configuration_file: /path/to/app/server/shared/config/production.sphinx.conf
It probably depends on where you want to avoid latency. If it's on the database server, then indexing will be fast, but search queries may be a bit slow (granted, they're just facing the same hit that database queries are, so I'd hope that was negligible either way). If indexing's not too hard and you want that extra touch of speed, then having Sphinx on your web server might be better.
That said, you're more likely to scale up the number of web servers, so having a single Sphinx daemon on the database server is probably the better way to go.
As for the options - they look correct (the paths are outside of the current directory or any specific release). You'll just want to make sure the actual directories exist.
We need to modify (such as add/remove table, add/remove/change column in current table) the mysql production database with the progress of development of the rails 3.1 app. What's the best way to do that? There is valuable data in production database and they should be retained after the modification.
Thanks.
Not sure what you have tried but Rails Database Migrations is what you are looking for.
The only way I believe if i understand your question is - to keep your data and schema copy handy and maintain them using some standard naming convention like dd-mm-yy-timestamp. this will allow you to take hot backup/cold backup. while in rails you can play with migrations to and from. but not keep data as well. if the data is a seed data then it can be returned and reversed.
The solution is other hand is more handy.you can write cron jobs and call shell scripts to make sure your operation is not manual and keeps the backup copy at safe location.
I am writing a Rails app that "scrapes/navigates" some other websites and webservices for content. I am using Mechanize and Savon to do the heavylifting.
But given the dynamic nature of the web, I'd like to make my calls to these editable by the admin users of the site - rather than requiring me to release a new version of the site.
The actual scraping thread happens async to the website, using the daemons gem.
My requirements are:
Thinking that the scraping/webservice calling code is quite simple, the easiest route is to make the whole class editable by the admins.
Keep a history of the scraping code - so that we can fairly easily revert if we introduce a problem.
Initially use the code from the file system, but as soon as thats been edited and stored somewhere, to use that code instead.
I am thinking my options are:
Store the code in the db (with a history table for the old versions)
Store the code in a private git repo somewhere and access that for the history/latest versions.
I am thinking the git route might be easiest, given its raison d'etre is to track file history...
But perhaps there is a gem/plugin that does all this for me, out of the box?
Thanks in advance for any tips/advice.
~chris
I really hope you aren't doing something like what's talked about here...
Assuming you are doing a proper mixin, there used to be a gem called "acts_as_versioned" which would do something like you want. It's been a while so I don't know if it's been turned into a plugin or if it's been abandoned. Essentially the process it uses was to provide a combination key for your versioned table.
Your database would have a structure like this:
Key column (id for the record)
Version column (id for the record's version)
All the record attributes
Let's say you had a table for your scripts, and the script you wanted has three versions. Your table would have the following records:
123, 3, '#Be good now'
123, 2, 'puts "Hi"'
123, 1, '#Do not be bad'
Getting the most recent version would be as simple as
Scripts.find :first, :conditions=>{:id=>123}, :order=>"version desc"
Rolling back would be as simple as removing the most recent version, or having another table with a pointer to the active version. It's up to you.
You are correct in that git, subversion, mercurial and company are going to be much better at this. To provide support, you just follow these steps:
Check out the script on the server (using a tag so you can manage what goes there at any time)
Set up a cron job to check out the new script periodically (like every six hours or whatever you feel comfortable with)
The daemon you have for running the script should run the new version automatically.
IF your site is already under source control, and IF you're running under mod_rails/passenger, you could follow this procedure:
edit scraping code
commit change locally
touch yourapp/tmp/restart.txt
that should give you history of the change and you shouldn't have to re-deploy.
A bit safer, but not sure if it's possible for you is on a test/developement server: make change, commit locally, test it, then on production server, git pull then touch tmp/restart.txt
I've written some big spiders and page analyzers in the past, and one of the things to keep in mind is what code is providing what service to the entire application.
Rails is providing the presentation of the data being gathered by your spidering engine. The presentation is one side of the coin, and spidering is the other, and they should be two separate code bases, tied together by some data-sharing mechanism, which, in your case, is the database. The database gives you some huge advantages as does having Rails available, when your spidering code is separate. It sounds like you have some separation already, but I'd recommend creating a wider gap. With that in mind, here's how I've done it before, and what I'd do now.
Previously, I had a separate app for my spidering that was spawning multiple spider tasks. Each task would look at a bunch of different URLs, throw their results in the database, then quit. Each time one quit the main app would spawn another spider to process more URLs. Each loop, the main app checked a YAML configuration file for run-time parameters, like how many sub-tasks it should have running, how many URLs they'd get, how long they'd wait for connections, etc. It stored the last modification date of the config file each time it loaded it so, if I made a change to the file, the app would sense it in a reasonably short time, reread the file, and adjust its behavior.
All state information about the URLs/pages/sites being scraped/spidered, was kept in the database so I could check on its progress. I could see how many had been processed or remained in the queue, the various result codes, and the content being returned. If I didn't like something I could even tweak the filters to skip junk pages, knowing the spidering tasks would be updated in a few minutes.
That system worked extremely well, spidered a major customer's series of websites without a glitch, running for several weeks as I added new sites to the list. (We were helping one of the Fortune 50 companies improve their sites, and every site had been designed and implemented by a different team, making every site completely different. My code had to be flexible and robust; I was really happy with how it worked out.)
To change it, these days I'd use a database table to hold all the configuration info. That way I could easily build an admin form, and let someone else inherit the task of adjusting the app's runtime configuration. The spider tasks would also be written so they'd pull their configuration from the database, rather than inherit it from the main app. I originally had the main app do all the administration and pass the config info to the spidering apps because I wanted to keep the number of connections to the database as low as possible. I was using Postgres and now know it could have easily handled the load, so by letting the individual tasks handle their configuration I could have made it more responsive.
By making the spidering engine separate from the presentation engine it was possible to temporarily stop one or the other without affecting the progress of the spidering job. Once I had the auto-reload of the prefs in place I don't think I had to stop the spidering engine, I just adjusted its prefs. It literally ran for weeks without stopping and we eventually pulled the plug because we had enough data for our needs.
So, I'd recommend tweaking your code so your spidering engine doesn't rely on Rails, instead it will be fired off by cron or a separate scheduling app. If you have to temporarily stop Rails your engine will run anyway. If you have to temporarily stop the engine then Rails can continue serving pages. The database sits between the two acting as the glue.
Of course, if the database goes down you're hosed all the way around, but what else is new? :-)
EDIT: Chris said:
"I see your point about the splitting the code out, though my Ruby-fu is low - not sure how far I can separate things without having to have copies of the ActiveModel/migrations stuff, plus some shared model classes."
If we look at your application as spider engine <--> | <-- database --> | <--> Rails/MVC/presentation, where the engine and Rails separately read and write to the database, and look at what each does well, that helps figure out how to break them into separate code bases.
Rails is designed to handle migrations, so let it. There's no reason to reinvent that wheel. But, how often do you do migrations, and what is effected when you do? You do them seldom once the application is stable, and, at that point you'd do them in a maintenance cycle to tweak the database. You can shut down the spidering engine and the web interface for a few minutes, migrate the database, then bring things up and you're off and running. Migrations are a necessary evil, but are hardly show-stoppers once in production. Most enterprises have "Software Sunday", or some pre-announced window of maintenance, so do the same.
ActiveRecord, modeling and associations are pretty easy to deal with too. The models are in a file that is required internally by Rails already, so the spidering engine can inherit the database know-how that way too; Multiple apps/scripts can use the same model file. You don't see the Rails books talk about it much, but ActiveRecord is actually pretty easy to use outside of Rails. Search the googles for activerecord without rails for more info.
You can pull in ActiveSupport also if you want some of its extensions to classes by doing a regular require, but the Rails "view" and "controller" logic, which normally applies to presenting the web interface, shouldn't be needed at all in the engine.
Business logic, which goes in the controllers in Rails could even be refactored into separate methods that get required by the Rails side of things and by the spidering engine. It's a different way of looking at Rails but falls in line with the "DRY" mantra - don't repeat yourself, so make things modular and require (or require_relative) bits and pieces that are the building blocks of the entire system.
If you don't want a totally separate codebase, you can take advantage of Rail's script runner, which gives a script access to the ActiveRecord::Base and ActiveRecord::Associations and ActiveSupport. Do a rails runner -h from your app's main directory, or search for "rails runner" for more info. runner is not good for a job that starts and runs many times an hour, because Rail's startup cost is high. But, if you have a long-running task, say one that runs in parallel with your rails app, then it's a great choice. I'd give it serious consideration for the spidering side of your application. Eventually you might want to break the spidering-engine out to a separate host so the presentation side has a dedicated host, so runner will help you buy time and do it in small steps.
I'm fairly new to both Rails and Heroku but I'm seriously thinking of using it as a platform to deploy my Ruby/Rails applications.
I want to use all the power of Heroku, so I prefer the "embedded" PostgreSQL managed by Heroku instead of the addon for Amazon RDS for MySQL, but I'm not so confident without the possibility to access my data in a SQL client...
I know that in a well made app you have no need to access DB, but there are some situations (add rows to a config table, see data not mapped in a view, update some columns for debugging issues, performance monitoring, running queries for reporting, etc.) when this can be good...
How do you solve this problem? What's you experience in a real life app powered by Heroku?
Thanks!
I have been using it for a about a year. I love the workflow that it provides but I find not having access to the data is a real bother. Your options for working with database are:
Taps: In theory you create your database however you want locally and use taps to copy both schema and data to Heroku. In practice, most of the time its amazingly great. However I am currently dealing with the cleanup after taps translated some of my columns poorly and corrupted my data.
Heroku console: Totally fine for all the usual ActiveRecord stuff, but closest you can get to the database is ActiveRecord::Base.connection.execute "some sql". When you find yourself wondering about doing alter table commands like that you will know you're in trouble.
They also provide a "bundle" as a method for backing up your app. This lets you download all your code plus a sql dump of the database. The difficulty is that since there is no direct database access there is no way of loading that same sql dump back into the database so you can recover from dataloss, which, to me, is the point of having those dump files to begin with. All you can use the bundle for is to create a new application (heroku bundles:animate), not restore a current app.
I would love to be wrong about any/all of these. This seems like a curious rough spot in the best thought out service that I know of. Database access normally doesn't feel like much to give up when most of what you do is made so easy.
To me database access is like a fire extinguisher. Usually not a big deal, but when it matters, it matters a lot.
I use admin_data to give me some insight as to what is going on. I've successfully used it on Heroku, as well as other hosting providers.
Firstly let me start off by saying that heroku is awesome. I've had a great experience deploying my application and integrating with their other services such as websolr.
With that said, your questions:
Getting at your data
If you want to be able to get to your data you can use taps to pull your remote database down locally. This can be useful for debugging.
Performance monitoring
Use new relic RPM. This comes as part of heroku, you can enable it from the add-ons menu.
Add-hoc database queries
You could write a controller which allows you to execute arbitrary sql and view the results, but this isn't something I'd recommend. As suggest admin_data is a good solution for managing your data, but if you want to do anything more complicated you'll have to resort to writing the code yourself.