SQLite in development, PostgreSQL in production—why not? - ruby-on-rails

Heroku advises against this because of possible issues. I'm an SQL noob, can you explain the type of issues that could be encountered by using different databases?

I used sqlite3 in development and postgres in production for a while, but recently switched to postgres everywhere.
Things to note if you use both:
There are differences between sqlite3 and postgres that will bite you. A common thing I ran into is that postgres is stricter about types in queries (where :string_column => <integer> will work fine in sqlite and break in postgres). If your dev database is sqlite and a production outage caused by a SQL error would hurt, you definitely want a staging environment that runs postgres.
Sqlite is much easier to set up on your local machine, and it's great being able to just delete/move .sqlite files around in your db/ directory.
taps allows you to mirror your heroku postgres data into your local sqlite db. It gets much slower as the database gets larger, and at a few dozen tables and 100K+ rows it starts to take 20+ minutes to restore.
You won't get postgres features like ilike, the new key/value stores, or full-text search.
Because you have to use only widely supported SQL features, it may be easier to migrate your app to mysql.
So why did I switch? I wanted some postgres-only features, kept hitting bugs that weren't caught by testing, and needed to be able to mirror my production db faster (pg_restore takes ~1 minute vs 20+ for taps). My advice is to stay with sqlite in dev because of the simplicity, and then switch when/if you need to down the road. Switching from sqlite to postgres for development is as simple as setting up postgres - there's no added complexity from waiting.
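The type-strictness problem mentioned above can be guarded against in application code. What follows is only a minimal sketch: `COLUMN_TYPES` and `cast_conditions` are hypothetical names invented for illustration, not part of Rails, and depending on the Rails version ActiveRecord may already do some of this casting for you. The idea is to coerce each value to its column's declared type before it reaches a `where` call, so sqlite and postgres see the same comparison.

```ruby
# Hypothetical map of column names to their declared types.
COLUMN_TYPES = {
  zip_code: :string,
  age: :integer
}.freeze

# Coerce each condition value to the type its column was declared with,
# so the generated SQL compares like with like on every database.
def cast_conditions(conditions)
  conditions.each_with_object({}) do |(column, value), cast|
    cast[column] =
      case COLUMN_TYPES.fetch(column)
      when :string  then value.to_s
      when :integer then Integer(value)
      end
  end
end

# An integer passed for a string column, and a string for an integer column.
conditions = cast_conditions(zip_code: 90210, age: "42")
```

The resulting hash could then be handed to something like `where(conditions)`. An unknown column raises `KeyError`, which is deliberate here: failing loudly in development is the point.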

Different databases interpret and adhere to the SQL standard differently. If you were to, say, copy paste some code from SQLite to PostgreSQL there's a very large chance that it won't immediately work. If it's only basic queries, then maybe, but when dealing with anything particular there's a very low chance of complete compatibility.
Some databases are also more up to date with the standard. It's a similar battlefield to that of internet browsers. If you've ever made some websites you'd know compatibility is a pain in the ass, having to get it to work for older versions and Internet Explorer. Because some databases are older than others, and some even older than the standards, they would've had their own way of doing things which they can't just scrap and jump to the standard because they would lose support for their existing larger customers (this is especially the case with a database engine called Oracle). PostgreSQL is sort of like Google Chrome, quite high up there on standards compliance but still with some of its own little quirks. SQLite is, as the name suggests, a light-weight database system. You could assume it lacks some of the more advanced functionality from the standards.
The database engines also perform the same actions differently. It is worth getting to know and understand one database and how it works (deeper than just the query level) so you can make the most of that.

I was in a (kind of) similar situation. Generally it is a very bad idea to use different database engines for production and test. There are multiple reasons:
SQL syntax differences, including DML and DDL statements, stored procedures, triggers etc
Performance optimizations done on one DB won't be valid on the other
SQLite is an embedded database, PostgreSQL is not
They don't support the same data types
Different syntax/commands to configure/setup db. SQLite uses PRAGMAs
One should stick to one db engine, unless you have a really, really good reason. I can't think of any.

Related

Possible to have one app on Heroku that dynamically uses different databases?

I have an idea for a multi-tenant app, and I'm trying to decide if I should use one large database or use separate databases for each tenant.
I don't even know if the latter is possible in Rails, or with rails on Heroku.
I also don't know if this is a good idea, or cost prohibitive.
But I guess to start I just want to know if it's possible.
There are many approaches to multi-tenancy, each with its own pros and cons. Postgres has this nice feature called schemas, which means you can have one database but multiple namespaces inside. This can be a convenient solution for Rails, as Rails was designed for connecting with only one database. It is easy to integrate with the apartment gem, which takes care of migrations and tenant switching based on specified rules, usually the subdomain. But this solution has downsides. While Postgres does not limit the number of schemas, migrations will take forever when you have a lot of them, and there are problems with backups. Heroku recommends using fewer than 50 schemas.
If you want to have multiple physical databases then it is a little bit tricky with Rails. There are some gems that allow connecting to multiple databases. Recently I heard about the octoshark gem, but I haven't used it.
In summary, Postgres schemas are nice if you want to have good isolation without too much work. It will also be cost efficient on Heroku, as you will use only one database. But it won't scale for a lot of tenants. Multiple databases provide the best isolation, but I think support for this in Rails is not that great, and it will be costly, as you will need to provision a separate database for each tenant. The last resort is to use one database and scope all your tenant data with tenant_id. With this solution you have to guarantee isolation yourself, which requires additional work, and it is easy to miss some parts of the application.
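To make the tenant_id option more concrete, here is a minimal sketch in plain Ruby. `Project` and `TenantScope` are hypothetical names, and an in-memory array stands in for a table; in a real Rails app the same idea would live in model scopes or associations. The point is that every read passes through one filter, so individual call sites cannot forget it.

```ruby
# Hypothetical in-memory stand-in for a table whose rows carry a tenant_id.
Project = Struct.new(:id, :tenant_id, :name)

class TenantScope
  def initialize(rows, tenant_id)
    @rows = rows
    @tenant_id = tenant_id
  end

  # Every read goes through this filter, so callers cannot skip it.
  def all
    @rows.select { |row| row.tenant_id == @tenant_id }
  end

  # Lookups by id are also restricted to the current tenant's rows.
  def find(id)
    all.find { |row| row.id == id }
  end
end

rows = [
  Project.new(1, 10, "Alpha"),
  Project.new(2, 20, "Beta"),
  Project.new(3, 10, "Gamma")
]

scope = TenantScope.new(rows, 10)
names = scope.all.map(&:name) # only tenant 10's projects
other = scope.find(2)         # row 2 belongs to tenant 20, so nothing is found
```

The weak spot is exactly the one described above: any code path that reaches `rows` directly, without going through the scope, bypasses the isolation.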

Rolling out new version of a Rails app

I wonder how people deal with gradually rolling out features and versions in a production environment. The scenario is that you have two versions of tested code, one already in production and one to be rolled out. These are the common issues:
different versions of code within same rails app.
different versions of rails app during rollout to users.
different database structures between version
moving data across new databases and servers.
here are some ideas for the above for discussion
if statements with constant, version numbers in M,V,C names
load balance to different app servers (how to make sticky?) , RVM
have old and new fields in tables as temporary, or migrate records to new tables or databases.
no easy way to move data between servers.
It sounds like you need a good branching and merging strategy. If you're using something like Git or SVN, then anything on master or trunk, respectively, should be production-ready quality. If you're running into situations where the AbcController is good and ready to go, but XyzController is flaky, then the XyzController probably needs more testing and shouldn't be in master yet.
Migrations in Rails follow the same policy, since they determine your data structure. If you think that you're ready for production, then there shouldn't be significant changes to your database. Maybe you need to add a column or feature, but you should be well past wholesale database refactorings.
Finally, uploading/updating data is a pain in any migration situation. In my experience, it involves writing SQL scripts to perform the moves, or update the database for some new feature. Those SQL scripts should also be under your source control. Rails can make this easier by letting you write the data-moving script in the migration file itself. Depending on your exact situation, this can work.

Using SQLite as production database, bad idea but

We are currently using postgresql for our production database in rails, great database, but I am building the new version of our application around SQLite. Indeed, we don't use advanced functions of postgres like full text search or PL/SQL. Considering SQLite, I love the idea of moving the database around as just one file, its simple integration in a server and in Rails, and the performance seems really good -> Benchmark
Our application's traffic is relatively high, we get something like 1 200 000 views/day. So, we make a lot of reads from the database, but only a few writes.
What do you think of that? Any feedback from anyone using or trying (like us) to use SQLite as a production database?
If you do lots of reads and few writes then combine SQLite with some sort of in-memory cache mechanism (memcache or redis are really good for this). This would help to minimize the number of accesses (reads) to the database. This approach helps in any many-reads-few-writes environment, and in your specific case it helps you avoid hitting SQLite's deficiencies.
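The caching idea can be sketched as a read-through cache. This is an illustration under stated assumptions, not a production setup: a plain Hash stands in for memcache or redis, `ReadThroughCache` is a hypothetical class, and the block passed to it stands in for the real SQLite read.

```ruby
# Read-through cache: look in memory first, fall back to the database on a miss.
class ReadThroughCache
  attr_reader :misses

  def initialize(&loader)
    @loader = loader # block that performs the real database read
    @store = {}
    @misses = 0
  end

  def fetch(key)
    return @store[key] if @store.key?(key)

    @misses += 1
    @store[key] = @loader.call(key)
  end

  # Call after a write so the next read reloads the fresh row.
  def invalidate(key)
    @store.delete(key)
  end
end

# Hypothetical stand-in for a SQLite lookup.
cache = ReadThroughCache.new { |id| "page-#{id}" }
first = cache.fetch(1)
second = cache.fetch(1) # served from memory, the loader is not called again
```

Because the Hash lives inside one process, each Rails process would keep its own copy. An external store such as memcache or redis is what lets all processes share the cached reads.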
SQLite is designed for embedded systems. It will work fine with a single user, but doesn't handle concurrent requests very well. 1.2M views per day probably means you'll get plenty of the latter.
For doing only reads I think in theory it can be faster than an out-of-process database server because you do not have to serialize data to memory or network streams, it's all accessed in-process. In practice it's possible an RDBMS could be faster; for example MySQL has pretty good query caching features and for certain queries that could be an improvement because all your rails processes would use this same cache. With SQLite they would not share a cache.

Copying a massive database for local ruby on rails development?

There is a massive database (GB) that I am working with now and all of the previous development has been done on a slicehost slice. I am trying to get ready for more developers to come in and work so I need each person to be able to setup his own machine for development, which means potentially copying this database. Selecting only the first X rows in each table to cut size could be problematic for data consistency. Is there any way around this, or is a 1 hour download for each developer going to be necessary? And beyond that, what if I need to copy the production DB down for dev purposes in the future?
Sincerely,
Tyler
Databases required for development and testing rarely need to be full size; it is often easier to work on a small copy. A database subsetting tool like Jailer ( http://jailer.sourceforge.net/ ) might help you here.
Why not have a dev server that each dev connects to?
Yes, all devs develop against the same database. No development is ever done except through scripts that are checked into Subversion. If a couple of people making changes run into each other, all the better that they find out as soon as possible that they are doing things which might conflict.
We also periodically load a prod backup to dev and rerun any scripts for things which have not yet been loaded to prod to keep our data up-to-date. Developing against the full data set is critical once you have a medium sized database, because the coding techniques which appear to be fine to a dev on a box by himself with a smaller dataset will often fail miserably against prod sized data and when there are multiple users.
To make downloading the production database more efficient, be sure you're compressing it as much as possible before transmission, and further, that you're stripping out any records that aren't relevant for development work.
You can also create a patch against an older version of your database dump to ship over only the differences and not an entirely new copy of it. This works best when each INSERT statement is recorded one per line, something that may need to be engaged on your tool specifically. With MySQL this is the --skip-extended-insert option.
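To see why one INSERT per line matters, here is a rough sketch of the additions-only half of such a patch. `added_lines` is a hypothetical helper and the dumps are made-up strings. A real patch would come from a proper diff tool and would also cover deleted and changed rows.

```ruby
require "set"

# Lines present in the new dump but not in the old one.
def added_lines(old_dump, new_dump)
  seen = old_dump.each_line.map(&:chomp).to_set
  new_dump.each_line.map(&:chomp).reject { |line| seen.include?(line) }
end

old_dump = <<~SQL
  INSERT INTO users VALUES (1, 'ann');
  INSERT INTO users VALUES (2, 'bob');
SQL

new_dump = <<~SQL
  INSERT INTO users VALUES (1, 'ann');
  INSERT INTO users VALUES (2, 'bob');
  INSERT INTO users VALUES (3, 'cy');
SQL

# Only the row for user 3 needs to be shipped to the developer.
patch = added_lines(old_dump, new_dump)
```

With extended inserts, all three rows would sit on one long line, so a single new row would make the whole line differ and the patch would be as large as the table.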
A better approach is to have a fake data generator that can roll out a suitably robust version of the database for testing and development. This is not too hard to do with things like Factory Girl which can automate routine record creation.
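A fake data generator does not have to be elaborate. The sketch below is plain Ruby with a hypothetical `fake_users` helper and invented field names. It uses a seeded random number generator so that every developer gets the same dataset. With Factory Girl the record creation would go through factories instead of raw hashes.

```ruby
# Hypothetical generator: builds user rows deterministically from a seed.
def fake_users(count, seed: 42)
  rng = Random.new(seed)
  first_names = %w[ann bob cy dee eli]
  (1..count).map do |id|
    name = first_names[rng.rand(first_names.size)]
    {
      id: id,
      name: name,
      email: "#{name}#{id}@example.com", # unique because id is unique
      age: rng.rand(18..80)
    }
  end
end

# The same seed always yields the same rows, so test runs are repeatable.
users = fake_users(1_000)
```

Raising `count` is then a cheap way to approximate production-sized tables without downloading production data.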
In case anyone's interested in an answer to the question of "how do I copy data between databases", I found this:
http://justbarebones.blogspot.com/2007/10/copy-model-data-between-databases.html
It answered the question I asked when I found this S.O. question.

How can I calculate data for a boxplot (quartiles, median) in a Rails app on Heroku? (Heroku uses Postgresql)

I'm trying to calculate the data needed to generate a box plot, which means I need to figure out the 1st and 3rd Quartiles along with the median. I have found some solutions for doing it in Postgresql, however they seem to depend on either PL/Python or PL/R, and it seems Heroku has neither enabled for their postgresql databases. In fact I ran "select lanname from pg_language;" and only got back "internal", "c", and "sql".
I also found some code to do it in pure ruby but that seems somewhat inefficient to me.
I'm rather new to Box Plots, Postgresql, and Ruby on Rails so I'm open to suggestions on how I should handle this. There could be a lot of data, which is why I'm concerned with performance. However, if the solution ends up being too complex I may just do it in ruby and, if my application gets big enough to warrant it, get my own Postgresql that I can host somewhere else.
*note: since I was only able to post one link, cause I'm new, I decided to share a pastie with some relevant information
Heroku does not give you superuser access on the PostgreSQL cluster, which is required to install any additional languages.
If possible, it's best to perform aggregation server side (in the database) for performance reasons. There are median aggregate implementations which don't need additional languages. By looking at PL/Python boxplot implementations, one should be able to write a PL/pgSQL or PL/SQL equivalent.
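For comparison, the pure-Ruby fallback mentioned in the question can be sketched as follows. `percentile` and `boxplot_stats` are hypothetical helper names. Several quartile conventions exist, and this sketch assumes linear interpolation between the two nearest ranks, so other tools may give slightly different quartiles for the same data.

```ruby
# Linear-interpolation percentile over an already sorted array.
# fraction is between 0.0 and 1.0 (0.25 for the first quartile).
def percentile(sorted, fraction)
  raise ArgumentError, "empty data" if sorted.empty?

  rank = fraction * (sorted.size - 1)
  lower = sorted[rank.floor]
  upper = sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor)
end

# The five numbers a box plot needs.
def boxplot_stats(values)
  sorted = values.sort
  {
    min: sorted.first,
    q1: percentile(sorted, 0.25),
    median: percentile(sorted, 0.5),
    q3: percentile(sorted, 0.75),
    max: sorted.last
  }
end

stats = boxplot_stats([7, 1, 3, 9, 5])
```

The sort dominates the cost, and all values have to be loaded into the Ruby process first, which is the inefficiency the question worries about. PostgreSQL's `percentile_cont` ordered-set aggregate (available from version 9.4) uses the same interpolation convention and would be a server-side alternative where the database version provides it.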

Resources