The best solution for migration-ecosystem - ruby-on-rails

We have infrastructure which contains about 30 host for 10 microservices and 4 hosts for databases ( SQL (PostgreSQL) and NoSQL (Cassandra, Riak, Neo4j), replicated and sharded ). Periodically we need to modify DB-structure (add some tables, fields and trigers).
For some time we was using python scripts for making migrations/patches to DBs (and propogate these changes to isolated develop/analytic clusters). Next, we moved to Rails migrations. Rails is the better way (a little higher-level) for our migrations (in opposite to python scripts), but in same cases (DB specific patches, like creating postgres trigers, scharding) it requires sql commands (no-DSL).
Our purpose is to go to high-level approach for migrations (generating), like IDE (PgAdmin, DataGrip or some similar), which could:
generate migrations/patches for DBs (which could be applied to DB cluster).
the generated migrations/patches must be sequenced (like in Rails).
it would be good to have the approach for SQL and NoSQL migrations/patches.
And after generating these migrations/paches we need to apply it (from commad line, CLI) on the cluster side.
There is also very specific actions for making virtual connections between some table in NoSQL and Relational DB.
So, is there any solutions for deploying such migration-ecosystem?
How do you solve such problem in you projects?

Related

Aqueduct & in memory database

just wanted to know, if the Aqueduct ORM supports a simple in memory database, for testing purposes. Looking for something easy and lightweight to write the backend, before actually connecting it to postgres.
I've used similar approach with H2 and Postgres with Java, but it is rather error-prone: While the SQL interface may be similar, you could be using a feature that is available in one, but not the other. Eventually, either your development is blocked, or it is OK, but then the real deployment will be hitting issues.
I've found that starting a Postgresql instance in a docker is much easier than I've first thought, and now I use the same principle for most external dependencies: run them inside docker. If there is an interest, I can open source a Dart package that starts the docker container and waits until a certain string pattern is present on the output (e.g. a report on successful start).
Aqueduct was built to be tested with a locally running instance of PostgreSQL. This avoids the class of errors that occur when using a different database engine in tests vs. deployment. It is a very important feature of Aqueduct.
The tl;dr is that you can use a local instance of PostgreSQL with the same efficiency as an in-memory database and there is documentation on the one-time setup process.
The Details
Aqueduct creates an intermediate representation of your data model at startup by reflecting on your application code. This representation drives database migrations, serialization, runtime reflection, and can even be exported as JSON to create data modeling tools on top of Aqueduct.
At the beginning of each test suite, your test harness uses this representation to generate temporary tables in a local database named dart_test. Temporary tables are destroyed as soon as the database connection is lost; which you can configure to happen between tests, groups of tests, or entire test suites depending on your needs. It turns out that this is very fast - on the order of milliseconds.
CI platforms like TravisCI and Appveyor both support local PostgreSQL processes. See this script and this travis config for an example.

Database Changes Outside Ruby/Rails Migration

we have several technologies accessing the same database. At the moment, Ruby/Rails is used to create migrations when making changes to the database. The question is a simple one:
Is it possible for our DBAs to make changes to the database (not using Ruby migrations) without stepping on the Ruby devs toes and breaking the Ruby web application?
If so, some generic details about how to get started or pointed in the right direction would be great! Thanks.
I can tell you from experience that this is not the best idea, one that you will eventually regret and later, inevitably, reverse. But I know that it does come up. I've had to do them (against my will or in case of extreme emergencies).
Given the option, I'd push back on it if you can in favor of any solution that bring the SQL closer to the repository and further away from a "quick fix" to the database directly. Why?
1) Your local/testing/staging/production databases will diverge, eventually rendering your code untestable in a reliable way
2) You won't be able to regenerate your database from "scratch" to match production
3) If the database is ever clobbered, you won't be able to re-create it in any sensible way.
DBA's generally don't care about these things until something in the code breaks, and they want you to figure it out. But, for obvious reasons, that now becomes quite difficult.
One approach I have taken that seems to make everyone happy is to do the following:
1) Commit to having ALL database changes, big or small, put into a repository with the code. This means that everything that has happened to the database is all together in one place.
2) Each change, or set of changes, should be a migration. A migration can be simply running an SQL file. But, it should be run from within a migration for all the testability benefits.
So, for example, let's say you have a folder structure like:
- database_updates
-- v1
--- change_1.sql
--- change_2.sql
-- v2
--- change_3.sql
--- change_2_fix.sql
Now, let say you want to make a change or set of change via SQL. First, create a new version folder, let's call it "v1". Next, put your SQL scripts in this folder. Finally, create a migration:
def change
# Read all files in v1 folder, and run the SQL
end
(I have code that does this, happy to share the gist if you find yourself using this approach)
Since each migration is transactional, any of the scripts that fail will cause all of them to fail.
Now, let's say you have the next set, v2. Same exact thing. And, we have a history of these "versioned" changes, so we can look at the migration history and see what's been run, etc.
As a power user note, this set up also allows for recourse if things fail; in these cases, we can opt to go back to v1:
def up
# run v2 scripts
end
def down
# run v1 scripts
end
For this to work, v1 and v2 would need to be autonomous -- that is, they can destroy and rebuild entities without any dependencies. If that's not what you want, just stick with the change method.
This would also allow you to test for breaking changes. Let's say it is reported that something doesn't work anymore with v6. You can rollback your database migrations to v5, v4, etc (because you are doing a migration per folder) and test to see when the test broke, and correct it with v7.
Anyway, the end game of it all is that you can safely check out this project from a repository, create your database, run rake db:migrate and know that your database structure resembles exactly what is deployed elsewhere. And, worst case, if your database gets clobbered, you can just run all your scripts from v1 - vN and end up with your database back again.
For DBA's everything remains SQL for them, they can just send you a file or set of files for you to run.
If you want to get fancy, you could even write a migration generator that knows how to handle a line like rails g migration UpdateDBVersion version:v7 to take care of the repetitive boilerplate.
As long as everyone relies on the same updated schema.rb or structure.sql, everyone will share the same database 'version'.
See this SO answer for more insight.
Changes to the database, tables, or indexes should be made using ActiveRecord migrations whenever possible. This specifically ensures that development and test environments remain logically in sync. Remember that developers must be capable of accurate development and testing against the same database structure as occurs in the production environment, and QA teams must be able to adequately test such changes.
However, some database features are not actually supported by ActiveRecord migrations, and may only be applied directly to the database. These features are often database-specific, such as any of the following:
Views
Triggers
Stored procedures
Indexes with function-based columns
Virtual columns
Essentially any database-specific features that don't have an ActiveRecord abstraction will be made directly to the database.
Sometimes, however, other applications require the addition of tables, columns, or indexes in order to operate properly or efficiently. These other applications may simply be used to view/report against the database, or they may be substantial business applications that have their own independent database requirements and separate development teams. Occasionally, a DBA may have to step in and create an index or provide some optimization needed to solve a real-world production performance issue.
There are simply far too many situations for shared database management to give a definitive answer. Depending on the size of the organization and the complexity of the needs for the shared management, there may be many ways to solve the problem of a shared database schema that are specific to the application or organization.
For instance, I have worked on applications that shared a database with as many as 10 other applications, each of which "owned" portions of the schema and shared other portions with the other teams, all mediated through the DBA group. In situations such as this, the organizational structure and change control process may be the only means of solving this problem.
Whichever the situation, some real-world suggestions may help avoid problems and mitigate maintenance woes:
Offer to translate SQL DDL commands into ActiveRecord migrations, where possible, so that DBAs can accomplish their needs, and the application team can still appropriately maintain the schema
Any changes made outside the ActiveRecord migration should be thoroughly tested for impact to the project in a non-production environment by the same QA resources that test the actual Rails application
Encapsulate any external changes in a .sql file and include the file as part of the project in version control
If the development team is using the same database product in development (some cannot, due to licensing or complexity), those changes should be applied to the developer database instances, as well
It's best if you can apply the changes during a migration, even just by calling the relevant CLI tools as a migration step - the exact mechanism will be database-dependent, as well
Try to avoid doing this more than is absolutely necessary, as this can significantly reduce the database independence of the application, even between versions of the same database product (limiting upgrade opportunities)

Migrating from mongodb to postgresql in rails

I'm using MongoDB with mongo_mapper gem in Rails and project is big enough. Is there any way I can migrate data from Mongoid to Postgresql?
You should look into some of the automated migration/replication tools like MoSQL: https://stripe.com/blog/announcing-mosql
One risky strategy for this migration would be to convert your codebase to using postgres and all of your models, put your site into maintenance mode, migrate your databases, deploy the new code, and bring it back up. However, this requires significant downtime and development risk of bugs or data loss.
A safer, but much more involved strategy would be to setup automatic migration of data to a new database to sync up your databases. Then, every action in the application writes to both databases. After each transaction, you verify the data is in sync between both databases and read from Mongo. This allows you to fix errors as you find them and highlight any inconsistencies. Once you are no longer finding discrepancies, you can turn off writing to mongo and retire that database, remove the mongo models/code, and move on.

Making Changes Between Two Databases ASP.NET MVC

I'm working on a project in which we have two versions of an MVC App, the live, and the dev versions, I've made changes to the dev version and added tables and data, etc.
Is there any way to migrate these changes onto the live version without losing all data(i.e. just regenerating the database).
I've already tried just rebuilding the database but we lose all data that was previously stored( as obviously we are essentially deleting the old database and rebuilding it).
tl;dr
How do I migrate my dev version of an mvc app along with any new tables to the live version of an mvc app with missing models and tables.
Yes, it is possible to migrate your changes from your dev instance to your production instance; to do so you must create SQL scripts that update your production database with the changes. This may be accomplished by manually writing the scripts or by using tools to generate the scripts for you. However you go about it, you will need scripts to update your database (well, you could perform manual updates via the tooling of your database, but this is not desirable, as you want the updates to occur in a short time window, and you want this to happen reliably and repeatably).
The easiest way that I know of to do this is to use tools like SQL Compare (for schema updates) or SQL Data Compare (for data updates). These are from Redgate, but they cost a fair bit of money. They are well worth they price, and most companies I've worked with are happy to pay for licenses. However, you may not want to shell out for them personally. To use these tools, you connect them to source and destination databases, and they analyze the differences between the databases (schematically or data) and produce SQL scripts. These scripts may then be run manually or by the tools themselves.
Ideally, when you work on your application, you should be producing these scripts as you go along. That way when it comes time to update your environments, you may simply run the scripts you have. It is worth taking the time to include this in your build process, so database scripts get included in your builds. You should write your scripts so they are idempotent, meaning that you can run them multiple times and the end result will be the same (the database updated to the desired schema and data).
One may of managing this is creating a DBVersions table in your database. This table includes all your script updates. For example you could have a table like the following (this is SQL Server 2008 dialect):
CREATE TABLE [dbo].[DBVersions] (
[CaseID] [int] NOT NULL,
[DateExecutedOn] [datetime] NOT NULL,
CONSTRAINT [PK_DBVersions] PRIMARY KEY CLUSTERED (
[CaseID] ASC
)
) ON [PRIMARY]
CaseID refers to the case (or issue) number of the feature or bug that requires the SQL update. Your build process can check this table to see if the script has already been run. If not, it runs it. This is useful if you cannot write your scripts in a way that allows them to be run more than once. If all your scripts can be run an unbounded number of times, then this table is not strictly necessary, though it could still be useful to reduce the need to run a large number of script every time a deployment is done.
Here are links to the Redgate tools. There may be many other tools out there, but I've had a very good experience with these.
http://www.red-gate.com/products/sql-development/sql-compare/
http://www.red-gate.com/products/sql-development/sql-data-compare/
It depends on your deployment strategy and this is more of a workflow that your team need to embrace. If regenerating the live database from scratch it can take awhile depending how big the database size is. I don't see a need to do this in most scenarios.
You would only need to separate out database schema object and data row scripts. The live database version should have its database schema objects scripted out and stored in a repository. When a developer is working on a new functionality, he/she will need to make those changes against the database scripts in the repository. If there is a need to make changes to the database rows then the developer will also need to check in the data row scripts in the repository. On a daily deployment the live database version can be compared against what is checked in the repository and pushed to make it in sync.
On our side we use tools such as RedGate Schema Compare and Data Compare to do the database migration from the dev version to our intended target version.

ease of scaling mongodb vs mysql

I am creating a Grails application that is the backend for a mobile application. It is currently deployed on Amazon EC2. It persists data to a mysql database. One instance currently pointing to the database. I plan to deploy multiple instances of the app behind a load balancer and eventually have read requests go to slave instances of the db. We plan to release in the coming months and have a beta group of a couple of thousand users. It is more read intensive than write.
We have looked into using mongodb instead of sql and see it as a good solution.
Not having a lot of experience scaling mysql ( or mongodb ) would it be easier to scale mongodb since it has features such as auto sharding. ( Looking for thoughts from people who have done both ) I am of thinking it will be easier to switch to mongodb now rather than be in 'production' and having to migrate.
Thoughts?
MongoDB has two versions of "scaling":
Read scaling via replica sets.
Write scaling via sharding.
They're not silver bullets, but they're both very easy to set up. Replica sets have auto-failover which is practically essential when using EC2 (they have a good history of just randomly failing nodes). When you need write scaling, MongoDB has documented processes for upgrading your replica set to a series of sharded replica sets.
The unfortunate limitation is that (last I checked), things like scalr don't really support automatic scaling. So you'll have to roll your own solution for adding and removing nodes from the set.
Some important considerations:
Disk IO performance is sketchy in the cloud. Good performance is all about the amount of RAM you can throw at the problem.
If you're using replica sets for reads, ensure that your driver / data wrapper is capable of handling the distribution of reads. Just like MySQL it's not currently "free", you'll need to decide "write vs. read".
64-bit machines. MongoDB really wants to operate on 64-bit hardware. This is a cost consdieration as you'll probably have to ramp up with 4GB machines instead of 2GB machines (I don't think this is a big limitation, but I also know what it's like to be a startup).
MongoDB is still new tech. The lists are very active, and people are using it in production for very large data sets. But this is still a new product, you have to be prepared to work from the command-line and parse through docs and ask questions.
would it be easier to scale mongodb
At some level scaling is going to be a "hard" problem. What MongoDB does well is provide a way to really scale out lots of boxes horizontally with replication. In my experience, MySQL really tops out at around two boxes for writes. You can easily configure co-masters, but after that you have to start mucking around with all kinds of partitioning and you basically lose the ability to do joins.
I am of thinking it will be easier to switch to mongodb now rather than be in 'production'
It probably will.
Thoughts?
Start small. Get one piece working and see if you like how it works. If you have access to an EC2 account, then it's easy to spin up a couple of machines and play. MongoDB is not a panacea, but it works really well for a lot of modern web problems. Just measure how badly you need joins :)

Resources