Copying a massive database for local Ruby on Rails development?

There is a massive database (gigabytes in size) that I am working with now, and all of the previous development has been done on a Slicehost slice. I am trying to get ready for more developers to come in and work, so each person needs to be able to set up his own machine for development, which means potentially copying this database. Selecting only the first X rows in each table to cut size could be problematic for data consistency. Is there any way around this, or is a one-hour download for each developer going to be necessary? And beyond that, what if I need to copy the production DB down for dev purposes in the future?

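Databases required for development and testing rarely need to be full size; it is often easier to work on a small copy. A database subsetting tool like Jailer might help you here, since it extracts a subset of rows while following foreign-key relationships so the extract stays consistent.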
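The consistency problem with "first X rows of every table" is that child rows can end up pointing at parents that were not copied. A minimal sketch of the idea behind subsetting, assuming three hypothetical in-memory tables (users, orders, line_items) rather than a real database: pick the parent rows first, then keep only the children that reference them.

```ruby
require "set"

# Sketch of referentially consistent subsetting over in-memory rows.
# The users/orders/line_items tables and their columns are hypothetical.
def consistent_subset(users, orders, line_items, limit)
  kept_users  = users.first(limit)                                   # parents to keep
  user_ids    = kept_users.map { |u| u[:id] }.to_set
  kept_orders = orders.select { |o| user_ids.include?(o[:user_id]) } # only children of kept users
  order_ids   = kept_orders.map { |o| o[:id] }.to_set
  kept_items  = line_items.select { |li| order_ids.include?(li[:order_id]) }
  { users: kept_users, orders: kept_orders, line_items: kept_items }
end
```

With users 1 to 3, an order 11 belonging to user 3 and a limit of 2, a naive per-table cut would keep order 11 even though user 3 is missing; this version drops it along with its line items.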

Why not have a dev server that each dev connects to?
Yes, all devs develop against the same database. No development is ever done except through scripts that are checked into Subversion. If a couple of people making changes run into each other, all the better: they find out as soon as possible that they are doing things which might conflict.
We also periodically load a prod backup to dev and rerun any scripts for things which have not yet been loaded to prod, to keep our data up to date. Developing against the full data set is critical once you have a medium-sized database, because coding techniques which appear fine to a dev on a box by himself with a smaller dataset will often fail miserably against prod-sized data and when there are multiple users.

To make downloading the production database more efficient, be sure you're compressing it as much as possible before transmission, and further, that you're stripping out any records that aren't relevant for development work.
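SQL dumps are highly repetitive text, so they compress well. A minimal sketch using Ruby's standard Zlib, assuming the dump has already been written to a local file (the path is a placeholder); it streams in chunks so a multi-GB dump never has to fit in memory.

```ruby
require "zlib"

# Stream-compress a SQL dump chunk by chunk at maximum compression.
# src_path is a placeholder for whatever file your dump tool produced.
def gzip_file(src_path, dest_path = "#{src_path}.gz", chunk_size = 1 << 20)
  File.open(src_path, "rb") do |src|
    Zlib::GzipWriter.open(dest_path, Zlib::BEST_COMPRESSION) do |gz|
      while (chunk = src.read(chunk_size)) # read returns nil at EOF
        gz.write(chunk)
      end
    end
  end
  dest_path
end
```

In practice piping the dump tool straight into gzip on the server does the same job; the point is simply never to ship the raw dump.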
You can also create a patch against an older version of your database dump to ship over only the differences and not an entirely new copy of it. This works best when each INSERT statement is written one per line, which you may need to enable explicitly in your dump tool. With MySQL this is mysqldump's --skip-extended-insert option.
A better approach is to have a fake data generator that can roll out a suitably robust version of the database for testing and development. This is not too hard to do with tools like Factory Girl, which can automate routine record creation.
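Why one INSERT per line matters: with extended inserts a whole table is one giant line, so changing a single row changes that entire line. A simplified sketch, assuming both dumps are available as strings; it is an order-insensitive line comparison, not a real patch format (diff or rsync would produce that).

```ruby
# Compare two dumps written with one INSERT per line (e.g. mysqldump
# --skip-extended-insert). A changed row shows up as exactly one removed
# line and one added line. Order-insensitive simplification of a real diff.
def dump_delta(old_dump, new_dump)
  old_lines = old_dump.lines(chomp: true)
  new_lines = new_dump.lines(chomp: true)
  { added: new_lines - old_lines, removed: old_lines - new_lines }
end
```

Updating user 2 and adding user 3 yields two added lines and one removed line, regardless of how large the rest of the table is.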
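A minimal sketch of the generator idea in plain Ruby, without Factory Girl (now published as factory_bot), assuming a hypothetical schema of users and orders. Seeding the random number generator means every developer who runs it gets the same data, and every foreign key points at a row that exists.

```ruby
# Hypothetical generator for a users/orders schema. Every order references
# an existing user, and a fixed seed makes the output reproducible.
def generate_dev_data(user_count:, orders_per_user:, seed: 42)
  rng = Random.new(seed)
  users = (1..user_count).map do |id|
    { id: id, email: "user#{id}@example.com" }
  end
  order_id = 0
  orders = users.flat_map do |user|
    Array.new(orders_per_user) do
      order_id += 1
      { id: order_id, user_id: user[:id], total_cents: rng.rand(100..50_000) }
    end
  end
  { users: users, orders: orders }
end
```

The resulting hashes could then be inserted through your models, e.g. from db/seeds.rb.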

In case anyone's interested in an answer to the question of "how do I copy data between databases", I found this:
It answered the question I asked when I found this S.O. question.


Changing existing Core Data Image Store from Transformable to Binary Data/Allows External Storage

I am about to undertake the daunting project of converting my live app (i.e. already on the App Store for a number of years) from Transformable to Binary Data storage for images in Core Data.
I have many users with very large databases that store lots of images. This has really slowed down the Backup/Restore process, and probably caused some other behind-the-scenes issues as well. I didn't know any better when I set it up this way years ago.
How can I carry out this process without losing even one of my customers' images? If it were just me and my own data, I'm sure I could get things working. But I want to be sure to do this properly, step by step, and I knew that this community could be a big help in that area. I really don't know where to start for the existing images.
Basically, I am looking for 1) steps to take, so as not to miss a beat, and 2) general advice, warnings, etc. for this process. I really need a clean migration when this version goes live.
Thanks in advance to anyone who can help.
One piece of advice: don't use "Allows External Storage", especially if you plan to use iCloud syncing with Core Data in the future. Reference:
Instead, you might want to consider moving the images into their own files, and saving the URL to those files inside your database instead. You will have to work out how best to do the migration: lightweight migration is probably not an option if you go down this route.
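The core of moving images into their own files is "write, verify, then switch the reference". The sketch below is in Ruby only because the logic is language-neutral; in the app itself this would be Swift or Objective-C inside a Core Data export/import pass. The content-hash file naming and the `.img` extension are assumptions, not anything Core Data prescribes.

```ruby
require "digest"
require "fileutils"

# Language-neutral sketch: write one image blob to its own file, named by its
# SHA-256 digest, and only return once the bytes on disk match the original.
# The caller would store the returned relative name and then clear the old blob.
def export_blob(blob, images_dir)
  FileUtils.mkdir_p(images_dir)
  name = "#{Digest::SHA256.hexdigest(blob)}.img"          # hypothetical naming scheme
  path = File.join(images_dir, name)
  File.binwrite(path, blob) unless File.exist?(path)       # identical images share a file
  raise "verification failed for #{name}" unless File.binread(path) == blob.b
  name # store a relative name, not an absolute path
end
```

Storing a relative name rather than a full path is deliberate: the absolute location of an app's container is not guaranteed to stay the same, and the old blob should only be cleared after verification succeeds, which is what keeps a half-finished migration from losing images.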
Transformable data type is really just binary under the covers with some additional metadata. Have you tested a simple lightweight migration on an existing store? I suspect the migration would work and would leave the existing data in the store.
If you are looking to get the existing binary data actually moved out of the SQLite file then you are looking at something a bit more involved.
A heavyweight migration will accomplish what you are looking for, but if the stores are large it may take too long and may not provide enough feedback for a good user experience. I personally never use heavyweight migrations on iOS, but it will accomplish your goal.
An export/import will also work. I generally recommend export/import when a lightweight migration won't work. It involves a medium amount of code but in the end you own the code, understand the entire process and can tweak it to your exact needs.

SQLite in development, PostgreSQL in production—why not?

Heroku advises against this because of possible issues. I'm an SQL noob, can you explain the type of issues that could be encountered by using different databases?
I used sqlite3 in development and postgres in production for a while, but recently switched to postgres everywhere.
Things to note if you use both:
There are differences between sqlite3 and postgres that will bite you. A common thing I ran into is that postgres is stricter about types in queries (where :string_column => <integer> will work fine in sqlite and break in postgres). You definitely want a staging area that uses postgres if your dev is sqlite and it matters if your production app goes down because of a sql error.
Sqlite is much easier to set up on your local machine, and it's great being able to just delete/move .sqlite files around in your db/ directory.
taps allows you to mirror your Heroku postgres data into your local sqlite db. It gets much slower as the database gets larger: at a few tens of tables and 100K+ rows it starts to take 20+ minutes to restore.
You won't get postgres features like ILIKE, the new key/value store (hstore), or full-text search.
Because you have to use only widely supported SQL features, it may be easier to migrate your app to MySQL.
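For the type-strictness point (a string column compared with an integer), one defensive habit is to cast condition values to the column's type before building the query. A minimal sketch with a hard-coded, hypothetical column map; in a Rails app it could be derived from `Model.columns_hash`. Depending on your Rails version, ActiveRecord may already cast hash conditions this way, so this mainly guards older versions and hand-built SQL fragments.

```ruby
# Hypothetical column-type map; in Rails you could build it from
# Model.columns_hash instead of hard-coding it.
COLUMN_TYPES = { zip_code: :string, age: :integer }.freeze

# Cast each condition value to its column's type so the SQL compares a
# string column with a string on SQLite and PostgreSQL alike.
# Unknown columns raise KeyError instead of passing through unchecked.
def typed_conditions(conditions)
  conditions.to_h do |column, value|
    cast =
      case COLUMN_TYPES.fetch(column)
      when :string  then value.to_s
      when :integer then Integer(value)
      end
    [column, cast]
  end
end
```

Usage would look like `User.where(typed_conditions(zip_code: 12345))`, which queries with `"12345"` rather than an integer.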
So why did I switch? I wanted some postgres-only features, kept hitting bugs that weren't caught by testing, and needed to be able to mirror my production db faster (pg_restore takes ~1 minute vs 20+ for taps). My advice is to stay with sqlite in dev because of the simplicity, and then switch when/if you need to down the road. Switching from sqlite to postgres for development is as simple as setting up postgres - there's no added complexity from waiting.
Different databases interpret and adhere to the SQL standard differently. If you were to, say, copy and paste some code from SQLite to PostgreSQL, there's a very large chance that it won't immediately work. If it's only basic queries, then maybe, but once you use anything engine-specific there's a very low chance of complete compatibility.
Some databases are also more up to date with the standard. It's a similar battlefield to that of web browsers. If you've ever built websites you'd know compatibility is a pain in the ass, having to get things working in older versions and Internet Explorer. Because some databases are older than others, and some even older than the standards, they had their own way of doing things, which they can't just scrap in favor of the standard without losing support from their existing larger customers (this is especially the case with Oracle). PostgreSQL is sort of like Google Chrome: quite high up there on standards compliance, but still with some of its own little quirks. SQLite is, as the name suggests, a lightweight database system, so you can assume it lacks some of the more advanced functionality from the standards.
The database engines also perform the same actions differently. It is worth getting to know and understand one database and how it works (deeper than just the query level) so you can make the most of that.
I was in a (kind of) similar situation. Generally it is a very bad idea to use different database engines for production and test. There are multiple reasons:
SQL syntax differences, including DML and DDL statements, stored procedures, triggers, etc.
Performance optimizations done on one DB won't be valid on the other
SQLite is an embedded database, PostgreSQL is not
They don't support the same data types
Different syntax/commands to configure/setup db. SQLite uses PRAGMAs
One should stick to one db engine, unless you have a really, really good reason. I can't think of any.
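If you do standardize on one engine, a small check can keep it that way. A minimal sketch, assuming a database.yml without ERB tags (a real one may need `ERB.new(text).result` first); it reports every environment whose adapter differs from production's.

```ruby
require "yaml"

# Hypothetical parity check: list environments in database.yml whose adapter
# differs from production's. ERB tags would need rendering before parsing.
def adapter_mismatches(database_yml)
  config = YAML.safe_load(database_yml, aliases: true)
  prod = config.fetch("production").fetch("adapter")
  config.reject { |_env, settings| settings["adapter"] == prod }.keys
end
```

Run from a test or CI step, an empty result means development and test use the same engine as production.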

Is there a gem to make Git and Rails work together for more robust migrations?

I've heard before that Rails migrations are flawed, but I never really experienced any example of this firsthand until just recently. What I now realize is that if a migration relies on a particular state of the code, then you're in trouble if you try to clone the repo two years later and run all the migrations (as a lot of them will depend on older versions of the code).
I thought this guy had a good idea:
Is there anything like this: a gem (say) to automatically check out the commit where each migration was added, all the way up to HEAD?
Obviously it wouldn't be a fool-proof system, as it does rely on every migration being legitimately possible from a clean slate at the point it was committed to the repo (I can imagine cases where teams have written migrations that only incidentally work based on things they've done with the database completely outside of version control). But it would certainly be better than nothing.
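I don't know of a specific gem for this, but the core of such a tool is small. A hedged sketch: `git log --diff-filter=A` finds the commit that added each migration file, and consecutive migrations from the same commit are grouped into "check out, then migrate up to VERSION" steps. The helper names are hypothetical, and old commits may also need their old gem versions installed.

```ruby
require "open3"

# Ask git which commit added a migration file (--diff-filter=A keeps only
# commits that added the path). git log lists newest first, so take the last.
def commit_that_added(path)
  out, status = Open3.capture2("git", "log", "--diff-filter=A", "--format=%H", "--", path)
  raise "git log failed for #{path}" unless status.success?
  out.lines.last&.strip
end

# Turn [{ version:, commit: }, ...] into the commands a replay script would
# run: check out the commit, then migrate up to its last migration.
def replay_plan(migrations)
  migrations
    .sort_by { |m| m[:version] }
    .chunk_while { |a, b| a[:commit] == b[:commit] }
    .flat_map { |group|
      [
        "git checkout #{group.first[:commit]}",
        "bundle exec rake db:migrate VERSION=#{group.last[:version]}"
      ]
    }
end
```

`replay_plan` only produces commands; a real script would execute them in order and finish with a checkout of your branch head.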
I think there are two cases that could happen, and in each case the solution is different:
You develop the code and deliver it in small release increments. The solution is running somewhere, has data in it, and you have to migrate it every time you deliver a new version.
==> Rails migrations are then a perfect fit for developing and delivering the new releases. I do that all the time (with two applications where I am the only user) and have never had a problem.
You develop the code and deliver a lot of small releases (as in the first case). You then want to instantiate it on a new server, without any data stored there.
==> Then dumping your schema and loading it on the new server (rake db:schema:load does this from db/schema.rb) is the best way to ensure that everything is in place.
I do not know Capistrano; perhaps there are options to do it differently. So in scenario 1, use the Rails migration approach; in scenario 2, use the dump-and-load approach.
This is a great reason to keep your database's schema.rb under version control.
The git changelog will show when the schema.rb changed, and give you the commit hash.
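To turn that history into something a script can use, you can ask git for one tab-separated line per commit that touched the schema and parse it. A minimal sketch; the format string in the comment uses git's `%x09` (tab) escape, and the `SchemaChange` struct is a hypothetical name.

```ruby
# Parse the output of
#   git log --format='%H%x09%ad%x09%s' --date=short -- db/schema.rb
# into records: one per commit that changed the schema, newest first.
SchemaChange = Struct.new(:commit, :date, :subject)

def parse_schema_log(output)
  output.each_line(chomp: true).reject(&:empty?).map do |line|
    commit, date, subject = line.split("\t", 3) # subject may itself contain tabs
    SchemaChange.new(commit, date, subject)
  end
end
```

The first record's commit is the one to check out if you want the schema as it stood at that point.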

Rolling out new version of a Rails app

I wonder how people deal with gradually rolling out features and versions in a production environment. The scenario is one where you have two versions of tested code, one already in production and one to be rolled out. These are the common issues:
different versions of code within same rails app.
different versions of rails app during rollout to users.
different database structures between version
moving data across new databases and servers.
Here are some ideas for the above, for discussion:
if statements on a constant, or version numbers in model, view, and controller names
load balancing to different app servers (how to make it sticky?), RVM
keeping old and new fields in tables temporarily, or migrating records to new tables
no easy way to move data between
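On the "how to make it sticky?" question, one common approach is to derive the decision from a deterministic hash of the user id instead of from session state. A minimal sketch, assuming a hypothetical `feature_enabled?` helper used in the if-statement style from the first idea; CRC32 is used only because it is in Ruby's standard library and stable across processes.

```ruby
require "zlib"

# Hypothetical percentage rollout: hash feature name + user id into a bucket
# 0..99. The same user always gets the same bucket, so the decision is sticky,
# and raising `percent` only ever adds users, never removes them.
def feature_enabled?(feature, user_id, percent)
  bucket = Zlib.crc32("#{feature}:#{user_id}") % 100
  bucket < percent
end
```

A controller would then branch with something like `if feature_enabled?("new_checkout", current_user.id, 10)`, and the same hash could drive routing if the versions live on separate app servers.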
It sounds like you need a good branching and merging strategy. If you're using something like Git or SVN, then anything on master or trunk, respectively, should be production-ready quality. If you're running into situations where the AbcController is good and ready to go, but XyzController is flaky, then the XyzController probably needs more testing and shouldn't be in master yet.
Rails migrations, which define your data structure, should follow the same policy. If you think that you're ready for production, then there shouldn't be significant changes to your database. Maybe you need to add a column or feature, but you should be well past wholesale database refactorings.
Finally, uploading/updating data is a pain in any migration situation. In my experience, it involves writing SQL scripts to perform the moves, or update the database for some new feature. Those SQL scripts should also be under your source control. Rails can make this easier, by writing your migration scripts in the migration file itself. Depending on your exact situation, this can work.

Has anyone tried a multi-domain/multi-database/single-deployment Rails setup?

I'm developing an app (basically an intranet) which has a few small sets of users, each a company using the app internally.
Up to now, each set of users has its own deployment with a separate domain name and database, but all living on the same server. This means that each time I have to push an upgrade I need to deploy once per client. Also, each new client means adding a new deploy target, for which I'm currently using Capistrano's multistage plugin, but it's getting a bit ridiculous.
This is a less than ideal setup, so after some thought I came up with the idea of modifying the app so that it handles multiple domains, each mapped to a different database, but on a single deployment. I created a small proof-of-concept app which basically has a before_filter in ApplicationController acting as a multiplexer for domains/databases, connecting ActiveRecord to each domain's database on each request. This worked really well, but I haven't applied this to the big app yet and I can think of at least one problem down the road: running migrations across all databases. I'm pretty sure I can work around that one though, maybe I'll tweak the rake task a little, but I'm worried that might not be the last of problems with it.
Has anyone ever tried this, or can think of any major reasons why this would be a bad idea? I would like to listen to some opinions.
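The multiplexer described above boils down to a host-to-database lookup plus a way to iterate over every tenant (for migrations). A minimal sketch with a hypothetical hard-coded map; in the real app the lookup would run in the before_filter and feed `ActiveRecord::Base.establish_connection`, which this sketch does not call.

```ruby
# Hypothetical host-to-database map; a real app might load it from config.
# Unknown hosts raise rather than falling back to a shared database, so one
# client can never be served another client's data by accident.
TENANT_DATABASES = {
  "acme.example.com"   => "intranet_acme",
  "globex.example.com" => "intranet_globex"
}.freeze

class UnknownTenant < StandardError; end

# Resolve the database for a request host (Rails' request.host has no port).
def database_for(host)
  TENANT_DATABASES.fetch(host.downcase) { raise UnknownTenant, host }
end

# Yield every tenant database, e.g. to connect and run migrations on each.
def each_tenant_database(&block)
  TENANT_DATABASES.each_value(&block)
end
```

A customized rake task could use `each_tenant_database` to connect to each database in turn and run the migrator, which covers the "migrations across all databases" concern.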
This is usually called multi-tenancy. Here is a presentation or video about doing it in Rails. I couldn't tell if it was any good; it was blocked here at work.
And no, there is nothing wrong with it as an idea. I'm not sure about your particular implementation, but I have worked on multi-tenant apps in the past and can't say we ever had much difficulty, except when troublesome clients wanted to stay on a legacy version of the product while we wanted to move forward.
I have a similar app with the same problem as you, and after many tries I ended up (until a proper solution lands in Rails core) with one environment file per domain and a filter similar to yours.
I've been running it in production for almost a year, and the only problem I found is that Rails expects the main database (even if you don't use it) to be at the same migration level as the others. (This problem arises only under certain conditions.)
If you need further details, just let me know.
I hope this helps.
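For the migration-level problem mentioned in that answer, a quick consistency check helps: compare the applied migration versions across all databases, including the otherwise unused main one. A minimal sketch, assuming you have already read the `version` column of each database's `schema_migrations` table into arrays of strings; the helper name is hypothetical.

```ruby
# Given { "db name" => [applied migration versions] }, report each database
# that is missing a version some other database already has.
def lagging_databases(versions_by_db)
  all_versions = versions_by_db.values.flatten.uniq
  versions_by_db.each_with_object({}) do |(db, versions), lagging|
    missing = all_versions - versions
    lagging[db] = missing.sort unless missing.empty?
  end
end
```

An empty result means every database, main included, is at the same migration level.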
