database/datamart best practice - grails

I have developed a grails application that stores a great deal of information. Currently, when I wish to run analysis on the large sets of data, it can be pretty time consuming. To help speed things up, I have decided to move all the calculated and aggregated data off into a "data mart". This way, a process can be run, maybe by a cron job, to work with all the records, pull out all the requested information, and store the calculated and requested data in separate tables.
My questions are: First, does this seem like the best way to tackle the problem? If so, I'm trying to figure out the best way to manage the new domain classes. Should I keep them in the same domain project folder or is it possible to create a new folder? My domain classes just seem to be getting very cluttered and I would rather a way to separate the relational tables from the data-mart tables. Any suggestions for best structuring would be great.
I am using groovy on grails and MySQL database
thanks
jason

It sounds like what you are doing is a pretty good idea. You will notice that the stack set of apps also have data aggregation processes the run every day (rankings), every few days (badge calculations), etc.
You could create a new package for all the 'datamart' classes required, to keep it separate.
If you don't need anything in your current app, you can even create a new project. Keep in mind that if you need to pull all of the data out of your tables, hibernate might not be the best solution. If possible, leverage your db to do calculations for you.

Related

How to handle database scalability with Ruby on Rails

I am creating a management system and I want to know how "Ruby on Rails" can support me in the mission of ensuring that each customer has their information, records and tables independent from other customers.
Is it better to put everything in a database and put a customer identifier to pull information through this parameter in queries or create a database for each customer automatically?
I admit that the second option attracts me more ... And I know that putting everything in one database will be detrimental to performance, because I assume that customers and their data will increase exponentially!
I want to know which option is more viable in the long run. And if the best option is to create separate databases, how can I do this with Ruby on Rails ??
There are pro and cons for both solutions which really depend on your use case.
Separating each customer in its own database has definitely advantages for scaling, running in different data centres or even onsite. However, this comes with higher complexity. For instance you can't query across customers anymore, you would need to run queries for each customers and aggregate the results. This approach is called multi tenancy (or shardening). There is a good gem called Apartment available (https://github.com/influitive/apartment).
Keeping everything in one database might be simpler to start of as it's less complex but it really depends on your use case.
Edit
Adding some more information based on the questions.
There are several reasons to use a one db per client architecture.
You have clearly separated tenants. In case it might make sense to go with the one db approach.
Scale. Having separated databases for each tenant makes scaling of course easier.
If 2) is the main reason you want to go for a one db per client approach I would strongly advise you against it. You add so much more complexity to your app which you might not need for years to come (if ever).
If scaling is your main concern I recommend reading Designing Data Intensive Applications by Martin Kleppmann. But basically, don't worry about scale for the first few years and focus on your product.

Dynamic database connection in a Rails App

I'm quite new to Rails but in my current assignment I have no other choice but use RoR. My problem is that in my app I would like to create, connect and destroy databases automatically on user demand but as far as I understand it is quite hard to accomplish this with ActiveRecord. It would be nice to hear some advice from more experienced RoR developers on this issue.
The problem in details:
I have a main database (which I access with activerecord). In this database I store a list of my active programs (and some template data for creating new programs). I would like to create a separate database for each of this programs (when a user creates a new program in my app).
In the programs' databases I would like to store the state and basic info of the particular program and a huge amount of program related data (which is used to calculate the state and is necessary to have for audit reasons).
My problem is that for example I want a dashboard listing all the active programs and their state data. So first I have to get the list from my main db and after that I have to connect to all the required program databases and get the state data.
My question is what is the best practice to accomplish this? What should I use (ActiveRecord, a particular gem, etc.)?
Hi, thanks for your answers so far, I would like to add a couple of details to make my problem more clear for you:
First of all, I'm not confusing database and table. In my case there is a tool which is processing log files. Its a legacy tool (written in ruby 1.8.6) and before running it, I have to run an SQL script which creates a database with prefilled- and also with empty tables for this tool. The tool then processes the logs and inserts the calculated data into different tables in this database. The catch is that the new system should support running programs parallel which means I have to create different databases for different programs.(this was not an issue so far while the tool was configured by hand before each run, but now the configuration must be automatic by my tool) There is no way of changing the legacy tool while it would be too complicated in the given time frame, also it's a validated tool. So this is the reason I cannot use different tables for different programs, because my solution should be based on an other tool.
Summing my task up:
I have to crate a complex tool using RoR and Ruby 2.0.0 which:
- creates a specific database for a legacy tool every time a user want to start a new program
- configures this old tool on a daily basis to process the required logs and insert the calculated data into the appropriate database
- access these databases and show dashboards based on their data
The database I'm using is MySQL.
I cannot use other framework, because the future owner of my tool won't be able to manage/change/update it. So I have to go with RoR, which is quite painful for me right now and I really hope some of you guys can give me a little guidance.
Ok, this is certainly outside of the typical use case scenario, BUT it is very doable within Rails and ActiveRecord.
First of all, you're going to want to execute some SQL directly, which is fine, but you'll also have to take extra care if you're using user input to determine the name of the new database for instance, and do your own escaping. (Or use one of ActiveRecord's lower-level escaping methods that we normally don't worry about.) The basic idea though is something like:
create_sql = <<SQL
CREATE TABLE foo ...
SQL
ActiveRecord::Base.connection.execute(create_sql)
Although now that I look at ActiveRecord::ConnectionAdapters::Mysql2Adapter, there's a #create method that might help you.
The next step is actually doing different things in the context of different databases. The key there is ActiveRecord::Base.establish_connection. Using that, and passing in the params for the database you just created, you should be able to do what you need to for that particular db. If the db's weren't being created dynamically, I'd put that line at the top of a standard ActiveRecord model so that that model would always connect to that db instead of the main one. If you want to use the same class, and connect it to different db's (one at a time of course), you would probably remove_connection before calling establish_connection to the next one.
I hope this points you in the right direction. Good luck!

What's the best practice for handling mostly static data I want to use in all of my environments with Rails?

Let's say for example I'm managing a Rails application that has static content that's relevant in all of my environments but I still want to be able to modify if needed. Examples: states, questions for a quiz, wine varietals, etc. There's relations between your user content and these static data and I want to be able to modify it live if need be, so it has to be stored in the database.
I've always managed that with migrations, in order to keep my team and all of my environments in sync.
I've had people tell me dogmatically that migrations should only be for structural changes to the database. I see the point.
My counterargument is that this mostly "static" data is essential for the app to function and if I don't keep it up to date automatically (everyone's already trained to run migrations), someone's going to have failures and search around for what the problem is, before they figure out that a new mandatory field has been added to a table and that they need to import something. So I just do it in the migration. This also makes deployments much simpler and safer.
The way I've concretely been doing it is to keep my test fixture files up to date with the good data (which has the side effect of letting me write more realistic tests) and re-importing it whenever necessary. I do it with connection.execute "some SQL" rather than with the models, because I've found that Model.reset_column_information + a bunch of Model.create sometimes worked if everyone immediately updated, but would eventually explode in my face when I pushed to prod let's say a few weeks later, because I'd have newer validations on the model that would conflict with the 2 week old migration.
Anyway, I think this YAML + SQL process works explodes a little less, but I also find it pretty kludgey. I was wondering how people manage that kind of data. Is there other tricks available right in Rails? Are there gems to help manage static data?
In an app I work with, we use a concept we call "DictionaryTerms" that work as look up values. Every term has a category that it belongs to. In our case, it's demographic terms (hence the data in the screenshot), and include terms having to do with gender, race, and location (e.g. State), among others.
You can then use the typical CRUD actions to add/remove/edit dictionary terms. If you need to migrate terms between environments, you could write a rake task to export/import the data from one database to another via a CSV file.
If you don't want to have to import/export, then you might want to host that data separate from the app itself, accessible via something like a JSON request, and have your app pull the terms from that request. That seems like a lot of extra work if your case is a simple one.

Asp.net mvc small amount data storage

I am writing some learning tests (i.e. what's the answer for...; choose correct options...). Now my question is, how should I store them. SQL db seems quite an overkill, but I really don't know what would be the best choice if I wanted to select random subset of questions etc. Perhaps some simple xml files?
Thanks for advice.
A RDBMS could be a good option for you since it sounds like you're wanting to collect some sort of result set based on the questions your asking. This way you'll be able to tie the questions, answers and users together is some logical way.
You could easily store the questions in an XML file and that would work, however it makes it a little more tricky to tie your overall data together.
One thing you could do is draw or write out a real plan of how you'd like your data to interact with itself and other data. This will probably give you a better idea of what needs to be done and how to go about doing it.
The easiest way is to go for a relation database solution. If you don't need support by the heavier db's out there like ms sql server i should have a look at ms sql express or sql lite. Thees database don't need any database server running to work, they are just file based databased and are easily moved around....

Am I the only one that queries more than one database?

After much reading on ruby on rails and multiple database connections, it seems that I have found something that not that many folks do, at least not with ror. I am used to querying many different databases and schemas and pulling back the information either for a report or for one seamless page. So, a user doesn't have to log on to several different systems. I can create a page that has all the systems on one or two web pages.
Is that not a normal occurrence in the web and database driven design?
EDIT: Is this because most all my original code is in classic asp?
I really honestly think that most ORM designers don't seem to take the thought that users may want to access more than one database into account. This seems to be a pretty common limitation in the ORM universe.
Our client website runs across 3 databases, so I do this to. Actually, I'm condensing everything into views off of one central database which then connects to the others.
I never considered this to be "normal" behavior though. I would guess that most of the time you would be designing for one system and working against that.
EDIT: Just to elaborate, we use Linq to SQL for our data layer and we define the objects against the database views. This way we keep reports and application code working off the same data model. There is some extra work setting up the Linq entities, because you have to manually define primary keys and set up associations... however so far it has definitely proven worthwhile. We tried to do so with Entity Framework, but had a lot of trouble getting the relationships set up appropriately and had to give up. The funny thing is I had thought Entity Framework was supposed to be designed for more advanced scenarios like ours...
It is not uncommon to hit multiple databases during a single part of an application's workflow. However, in every instance that I have done it, this has been performed through several web service calls, which among other things wrap the databases in question.
I have not, to my knowledge, ever had a need to hit multiple databases directly at once and merge results into a single report.
I've seen this kind of architecture in corporate Portals- where lots of data is pulled in via different data sources. The whole point of a portal is to bring silo'd systems together- users might not want to be using lots of systems in isolation (especially if they have to sign into each one). In that sort of scenario it is normal, particularly if it is a large company that has expanded rapidly and has a large number of heterogenous systems.
In your case whether this is the right thing to do depends on why you have these seperate DBs.
With ORM's it may be a little difficult. However, it can be done. Pull the objects as needed from the various databases, then use them as a composite to create a new object that is the actual one that is desired. If you can skip the ORM part of the process, then you can directly query the databases and build your object directly.
Pulling data from two databases and compiling a report is not uncommon, but because cross-database queries cannot be optimized by the query engine of either database, OLTP systems typically use a single database, to keep the application performant.
If you build the system from the ground up, it is not advisable to do it this way. If you are working with a system you didn't design, there is no much choice and it is not uncommon (that is the difference between "organic" and "planned" grow).
Not counting master and various test instances, I hit nine databases on a regular basis. Yes, I inherited it, and yes, "Classic" ASP figures prominently. Of course, all the "brillant" designers of this mess are long gone. We're replacing it with things more sane as quickly as we safely can.
I would think that if you're building a new system, and keep adding databases and get to the point of two or three databases, it's probably time to re-think your design. OTOH, if you're aggregating data from multiple, disparate systems, then, no, it's not that strange. Depending on the timliness you need, and your budget for throwing hardware at the problem, and if your data is mostly static, this would be a good scenario for a "reporting server" that pulls the data down from the Live server periodically.

Resources