I have trained my DASE model with a huge dataset on my dev machine and it's working fine. Now I don't want to move the data and retrain on the production machine. Since I have the model ready, I just want to copy the generated model over and start the serving layer on the production machine, which is running PIO.
I am not sure whether PIO already has something for this, or how people generally do it. It seems like an obvious case: you always want to train the model on a dev machine and then deploy it back to production.
By default the models are serialized and stored to the Event Server DB. Your Algorithm can override that behavior if you need to. Check out the docs in: https://github.com/actionml/PredictionIO/blob/master/core/src/main/scala/io/prediction/controller/PersistentModel.scala
So you can probably accomplish what you want. However, I think that in general it is an anti-pattern to generate models on dev machines for prod.
With Orchard CMS 1.6, I have four environments set up: Development, Testing, Staging, and Production.
I have made a lot of changes from the Orchard UI Dashboard in Development and I want to migrate those changes into the other environments. A related question shows this can be done manually through the Orchard Dashboard using Import/Export modules, but I'm looking for a data migration solution that I can automate. Really, I'm looking to nail step three of the accepted answer to that related question for a SQL Server 2005/2008 database, which states: "Migrate your db to production environment."
There is not much documentation out there when it comes to setting up and maintaining Orchard CMS in multiple environments outside of Azure, and I have 88 tables in my current Orchard database. Obviously, anything I can do to automate data deployments would be a huge help.
I ran a Schema and Data compare between Development and Testing (which currently reflects Production). After backing up the databases and replicating the schema, I noticed there are a lot of data differences in almost every table. Before migrating data, I want to be sure I've isolated the tables I do not want changed. Environmental variables such as connection strings are going to have to stay unchanged.
What tables should persist in their environments?
Right now I am thinking they are:
Orchard_Users_UserPartRecord - I have users in Production I do not want to have anywhere else.
Environmental Data - I have connection strings I put in a table for a custom module that are different for each environment.
Am I missing anything? What other data should be retained in tables on destination environments?
I really wouldn't recommend a database migration: you are going to have huge problems with ids, relationships, etc. Content items are distributed onto many tables, sometimes in ways that are hard to predict, and there's no guarantee that the same id hasn't been attributed on different environments to different items or records. This is why a higher-level procedure like import/export is the recommended approach.
Automating it is absolutely something we want, and there is a feature branch doing exactly that for a future version of Orchard (feature/deployment IIRC).
In the meantime, I'm pretty sure import and export also exist as commands for the CLI, which should enable you to automate it to a degree.
I am reasonably new to Ruby on Rails, so I am not sure how to implement this. My understanding is that Rails is not designed with multiple databases in mind, although I could use establish_connection etc. to make it work.
My main problem is:
I have a SaaS application that will serve several businesses. Each business will have several database tables such as: users, comments, messages, transfers, navigation history, logs, etc. It seems I have 3 options:
1: Store everybody's data in one database, with every object belonging_to a business or just tagged with something like a businessID/name. Use this tag to fetch the appropriate data and worry about scaling/performance later as my app grows (would I have to worry about this pretty early on?). A rough sketch of this approach follows the list.
2: One database per Business. No need to store associations, and db queries perform consistently throughout the application's life (possibly bad assumption here).
3: Have separate instances of my app each running some number of businesses (not sure this is any good).
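To make option 1 concrete, here is a rough sketch of what I imagine the shared-database approach would look like; the model names and columns here are just placeholders, not real code from my app:

```ruby
# app/models/business.rb -- hypothetical models for illustration
class Business < ActiveRecord::Base
  has_many :users
  has_many :comments
end

# app/models/comment.rb
class Comment < ActiveRecord::Base
  belongs_to :business

  # Every query gets tagged with the tenant via this scope.
  scope :for_business, lambda { |business| where(:business_id => business.id) }
end

# In a controller action, something like:
#   @comments = Comment.for_business(current_business)
```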
What I have seen used in other frameworks/businesses is just (2) multiple dbs.
I am also really interested in what the best practice is in Rails. I know several applications have this same problem, and hearing how it has been solved will help.
Any help is much appreciated. Thank you so much.
Env.
Ruby 1.9.2
Rails 3.1
Production: Heroku or EY (still deciding, currently running on Heroku)
According to this page, you'd need to apply some metaprogramming to work with multiple databases.
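As a rough sketch only (the configuration keys and model names below are assumptions, not drop-in code), the establish_connection approach could look something like this:

```ruby
# config/database.yml would get one entry per business, e.g.
#   acme_production:
#     adapter:  postgresql
#     database: acme_production
#
# An abstract base class that switches connections by business key.
class BusinessRecord < ActiveRecord::Base
  self.abstract_class = true

  def self.connect_to_business(key)
    # Looks up the "<key>_<env>" entry in database.yml and connects to it.
    establish_connection("#{key}_#{Rails.env}".to_sym)
  end
end

class Comment < BusinessRecord
end

# e.g. in a before_filter, based on the request subdomain:
#   BusinessRecord.connect_to_business(request.subdomain)
```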
Why not make your deployment script deploy to different directories with different database settings? One branch per business? It might require some more maintenance, but it allows for per-business code if you need it.
Let's say, for example, I'm managing a Rails application that has static content that's relevant in all of my environments but that I still want to be able to modify if needed. Examples: states, questions for a quiz, wine varietals, etc. There are relations between user content and this static data, and I want to be able to modify it live if need be, so it has to be stored in the database.
I've always managed that with migrations, in order to keep my team and all of my environments in sync.
I've had people tell me dogmatically that migrations should only be for structural changes to the database. I see the point.
My counterargument is that this mostly "static" data is essential for the app to function and if I don't keep it up to date automatically (everyone's already trained to run migrations), someone's going to have failures and search around for what the problem is, before they figure out that a new mandatory field has been added to a table and that they need to import something. So I just do it in the migration. This also makes deployments much simpler and safer.
The way I've concretely been doing it is to keep my test fixture files up to date with the good data (which has the side effect of letting me write more realistic tests) and re-import it whenever necessary. I do it with connection.execute "some SQL" rather than with the models, because I've found that Model.reset_column_information plus a bunch of Model.create calls sometimes worked if everyone updated immediately, but would eventually explode in my face when I pushed to prod, say, a few weeks later, because I'd have newer validations on the model that would conflict with the two-week-old migration. The shape of such a migration is roughly like the sketch below.
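Here the table, column and fixture names are simplified for illustration; the real ones will differ:

```ruby
# db/migrate/xxxx_reimport_quiz_questions.rb -- simplified example
require 'yaml'

class ReimportQuizQuestions < ActiveRecord::Migration
  def up
    # Re-import the static data from the same YAML file used as a test fixture.
    rows = YAML.load_file(Rails.root.join('test', 'fixtures', 'quiz_questions.yml'))

    execute "DELETE FROM quiz_questions"
    rows.each_value do |attrs|
      columns = attrs.keys.map   { |c| connection.quote_column_name(c) }.join(', ')
      values  = attrs.values.map { |v| connection.quote(v) }.join(', ')
      execute "INSERT INTO quiz_questions (#{columns}) VALUES (#{values})"
    end
  end

  def down
    # Data re-import; nothing meaningful to roll back.
  end
end
```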
Anyway, I think this YAML + SQL process explodes a little less, but I also find it pretty kludgey. I was wondering how people manage that kind of data. Are there other tricks available right in Rails? Are there gems to help manage static data?
In an app I work with, we use a concept we call "DictionaryTerms" that works as lookup values. Every term has a category that it belongs to. In our case, they're demographic terms (hence the data in the screenshot), including terms having to do with gender, race, and location (e.g. State), among others.
You can then use the typical CRUD actions to add/remove/edit dictionary terms. If you need to migrate terms between environments, you could write a rake task to export/import the data from one database to another via a CSV file.
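A sketch of such a rake task might look like the following; the DictionaryTerm model and its columns here are assumptions for the sake of the example:

```ruby
# lib/tasks/dictionary_terms.rake -- illustrative sketch only
require 'csv'

namespace :dictionary_terms do
  desc "Export all dictionary terms to CSV"
  task :export => :environment do
    CSV.open("dictionary_terms.csv", "w") do |csv|
      csv << %w[category name]
      DictionaryTerm.all.each { |term| csv << [term.category, term.name] }
    end
  end

  desc "Import dictionary terms from CSV"
  task :import => :environment do
    CSV.foreach("dictionary_terms.csv", :headers => true) do |row|
      DictionaryTerm.find_or_create_by_category_and_name(row["category"], row["name"])
    end
  end
end
```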
If you don't want to have to import/export, then you might want to host that data separate from the app itself, accessible via something like a JSON request, and have your app pull the terms from that request. That seems like a lot of extra work if your case is a simple one.
There is a massive database (GB) that I am working with now, and all of the previous development has been done on a Slicehost slice. I am trying to get ready for more developers to come in and work, so I need each person to be able to set up their own machine for development, which means potentially copying this database. Selecting only the first X rows in each table to cut size could be problematic for data consistency. Is there any way around this, or is a one-hour download for each developer going to be necessary? And beyond that, what if I need to copy the production DB down for dev purposes in the future?
Sincerely,
Tyler
Databases required for development and testing rarely need to be full size; it is often easier to work on a small copy. A database subsetting tool like Jailer (http://jailer.sourceforge.net/) might help you here.
Why not have a dev server that each dev connects to?
Yes, all devs develop against the same database. No development is ever done except through scripts that are checked into Subversion. If a couple of people making changes run into each other, all the better that they find out as soon as possible that they are doing things which might conflict.
We also periodically load a prod backup to dev and rerun any scripts for things which have not yet been loaded to prod, to keep our data up to date. Developing against the full data set is critical once you have a medium-sized database, because coding techniques which appear to be fine to a dev on a box by himself with a smaller dataset will often fail miserably against prod-sized data and when there are multiple users.
To make downloading the production database more efficient, be sure you're compressing it as much as possible before transmission, and further, that you're stripping out any records that aren't relevant for development work.
You can also create a patch against an older version of your database dump to ship over only the differences rather than an entirely new copy. This works best when each INSERT statement is recorded one per line, something you may need to enable in your dump tool specifically. With MySQL this is the --skip-extended-insert option.
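As a rough sketch (the database name and file paths below are placeholders), the dump-and-diff step could be scripted along these lines:

```ruby
#!/usr/bin/env ruby
# Rough sketch: dump with one INSERT per line so the diff stays small.
db       = "myapp_production"    # placeholder database name
old_dump = "dump_previous.sql"   # the dump developers already have
new_dump = "dump_current.sql"

system("mysqldump --skip-extended-insert #{db} > #{new_dump}") or abort("dump failed")

# Ship only the differences, compressed; developers patch their local copy.
system("diff -u #{old_dump} #{new_dump} > dump.patch")
system("gzip -f dump.patch")
```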
A better approach is to have a fake data generator that can roll out a suitably robust version of the database for testing and development. This is not too hard to do with things like Factory Girl which can automate routine record creation.
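For example, a minimal sketch with Factory Girl, assuming hypothetical User and Comment models:

```ruby
# spec/factories.rb -- hypothetical models and attributes for illustration
FactoryGirl.define do
  factory :user do
    sequence(:email) { |n| "dev-user-#{n}@example.com" }
    name "Dev User"
  end

  factory :comment do
    association :user
    body "Placeholder comment body"
  end
end

# Then, from a console or a small seeding script:
#   500.times { FactoryGirl.create(:comment) }
```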
In case anyone's interested in an answer to the question of "how do I copy data between databases", I found this:
http://justbarebones.blogspot.com/2007/10/copy-model-data-between-databases.html
It answered the question I asked when I found this S.O. question.
I have always scripted all DB changes. But I'm wondering if I really need to do that with Grails applications, since GORM will make the changes automatically.
So what's the best practice for managing database changes in a Grails application?
This is the pattern that I've used on several large Grails projects (and some smaller ones):
1. In GORM we trust™ for the first stages of development (pre-production / without data).
2. Just before releasing to the production environment, start using a tool like Autobase, Liquibase, Database Migration Tasks (similar to RoR's rake), or another schema versioning utility.
3. Maintain all database changes through the tool in an automated fashion.
4. Test your migrations by writing tests that exercise corner cases and data integrity, to the level that you are comfortable running them on production data.
I wouldn't use straight GORM in production unless it is a smaller project that can handle a few possible speed bumps and manual interventions.
Once you start managing several environments (local development, QA/UAT, Staging, Production) you'll be glad that you spent the time to manage your DB changes.
Liquibase and Autobase both give you some good tools for writing many of the common refactorings, but you can always drop down into raw SQL if you want/need to.
GORM won't handle all schema changes automatically; for example, if you delete a field from one of your domain objects, GORM won't delete the corresponding database column.
There are a couple of Grails plugins to manage database migration - see Autobase and Liquibase. I'd say that using whichever of these meets your needs more closely is the best practice; here's a good blog post that explains a few gotchas with both plugins.