Am I the only one that queries more than one database? - ruby-on-rails

After much reading on ruby on rails and multiple database connections, it seems that I have found something that not that many folks do, at least not with ror. I am used to querying many different databases and schemas and pulling back the information either for a report or for one seamless page. So, a user doesn't have to log on to several different systems. I can create a page that has all the systems on one or two web pages.
Is that not a normal occurrence in the web and database driven design?
EDIT: Is this because most all my original code is in classic asp?

I really honestly think that most ORM designers don't seem to take the thought that users may want to access more than one database into account. This seems to be a pretty common limitation in the ORM universe.

Our client website runs across 3 databases, so I do this to. Actually, I'm condensing everything into views off of one central database which then connects to the others.
I never considered this to be "normal" behavior though. I would guess that most of the time you would be designing for one system and working against that.
EDIT: Just to elaborate, we use Linq to SQL for our data layer and we define the objects against the database views. This way we keep reports and application code working off the same data model. There is some extra work setting up the Linq entities, because you have to manually define primary keys and set up associations... however so far it has definitely proven worthwhile. We tried to do so with Entity Framework, but had a lot of trouble getting the relationships set up appropriately and had to give up. The funny thing is I had thought Entity Framework was supposed to be designed for more advanced scenarios like ours...

It is not uncommon to hit multiple databases during a single part of an application's workflow. However, in every instance that I have done it, this has been performed through several web service calls, which among other things wrap the databases in question.
I have not, to my knowledge, ever had a need to hit multiple databases directly at once and merge results into a single report.

I've seen this kind of architecture in corporate Portals- where lots of data is pulled in via different data sources. The whole point of a portal is to bring silo'd systems together- users might not want to be using lots of systems in isolation (especially if they have to sign into each one). In that sort of scenario it is normal, particularly if it is a large company that has expanded rapidly and has a large number of heterogenous systems.
In your case whether this is the right thing to do depends on why you have these seperate DBs.

With ORM's it may be a little difficult. However, it can be done. Pull the objects as needed from the various databases, then use them as a composite to create a new object that is the actual one that is desired. If you can skip the ORM part of the process, then you can directly query the databases and build your object directly.

Pulling data from two databases and compiling a report is not uncommon, but because cross-database queries cannot be optimized by the query engine of either database, OLTP systems typically use a single database, to keep the application performant.

If you build the system from the ground up, it is not advisable to do it this way. If you are working with a system you didn't design, there is no much choice and it is not uncommon (that is the difference between "organic" and "planned" grow).

Not counting master and various test instances, I hit nine databases on a regular basis. Yes, I inherited it, and yes, "Classic" ASP figures prominently. Of course, all the "brillant" designers of this mess are long gone. We're replacing it with things more sane as quickly as we safely can.
I would think that if you're building a new system, and keep adding databases and get to the point of two or three databases, it's probably time to re-think your design. OTOH, if you're aggregating data from multiple, disparate systems, then, no, it's not that strange. Depending on the timliness you need, and your budget for throwing hardware at the problem, and if your data is mostly static, this would be a good scenario for a "reporting server" that pulls the data down from the Live server periodically.

Related

How to handle database scalability with Ruby on Rails

I am creating a management system and I want to know how "Ruby on Rails" can support me in the mission of ensuring that each customer has their information, records and tables independent from other customers.
Is it better to put everything in a database and put a customer identifier to pull information through this parameter in queries or create a database for each customer automatically?
I admit that the second option attracts me more ... And I know that putting everything in one database will be detrimental to performance, because I assume that customers and their data will increase exponentially!
I want to know which option is more viable in the long run. And if the best option is to create separate databases, how can I do this with Ruby on Rails ??
There are pro and cons for both solutions which really depend on your use case.
Separating each customer in its own database has definitely advantages for scaling, running in different data centres or even onsite. However, this comes with higher complexity. For instance you can't query across customers anymore, you would need to run queries for each customers and aggregate the results. This approach is called multi tenancy (or shardening). There is a good gem called Apartment available (https://github.com/influitive/apartment).
Keeping everything in one database might be simpler to start of as it's less complex but it really depends on your use case.
Edit
Adding some more information based on the questions.
There are several reasons to use a one db per client architecture.
You have clearly separated tenants. In case it might make sense to go with the one db approach.
Scale. Having separated databases for each tenant makes scaling of course easier.
If 2) is the main reason you want to go for a one db per client approach I would strongly advise you against it. You add so much more complexity to your app which you might not need for years to come (if ever).
If scaling is your main concern I recommend reading Designing Data Intensive Applications by Martin Kleppmann. But basically, don't worry about scale for the first few years and focus on your product.

Data distribution for a system with SOA

I have a rails application which manages different types of items and users who own them. Items of different types might have different features. There is a number of sinatra services which have to access items (read-only, every service one specific item type).
Is it a good idea to create separate tables / databases for every service and to keep them in sync with the rails DB? In this case the main DB will hold all items. It's postgres, so hstore could be used for different features. On all updates a sync message will be sent using Redis pub/sub or RabbitMQ messaging. Services will subscribe and update service specific tables.
The system should be really reliable, scalable, and prepared for high-load and new not yet known item categories. What do you think? Does it make sense or are there better approaches for these requirements? Thank you in advance, I really appreciate your help!
There is no one-size-fits-all answer here. The answer depends on your requirements and these will decide which of two approaches you might take.
The first approach is conceptually the simplest, which is to have every service hit the same database. The advantage here is that you can scale up relatively easily, the system is simple and flexible, and you can do a lot with the database to keep things working well. The disadvantage is that db downtime will take down all services at once.
The second approach is to keep every service (or group of closely related services) as separate self-contained service, kept in sync with some sort of message passing. This has the advantage of being more robust in terms of delivering basic services, but far less robust in terms of everything staying in sync (because the CAP theorem's consistency requirement is sacrificed for availability, and your data is effectively partitioned).
I don't know which one you will want to use. To the extent possible I would usually choose the single db approach but I am a Postgres guy, not a Rails guy. The second approach also works quite well in some cases but it does have a complexity cost.

Is there some way in Delphi to cache master-detail rows and post both master and detail child rows at the same time

I want to post in memory some child rows, and then conditionally post them, or don't post them to an underlying SQL database, depending on whether or not a parent row is posted, or not posted. I don't need a full ORM, but maybe just this:
User clicks Add doctor. Add doctor dialog box opens.
Before clicking Ok on Add doctor, within the Add doctor dialog, the user adds one or more patients which persist in memory only.
User clicks Ok in Add doctor window. Now all the patients are stored, plus the new doctor.
If user clicked Cancel on the doctor window, all the doctor and patient info is discarded.
Try if you like, mentally, to imagine how you might do the above using delphi data aware controls, and TADOQuery or other ADO objects. If there is a non-ADO-specific way to do this, I'm interested in that too, I'm just throwing ADO out there because I happen to be using MS-SQL Server and ADO in my current applications.
So at a previous employers where I worked for a short time, they had a class called TMasterDetail that was specifically written to add the above to ADO recordsets. It worked sometimes, and other times it failed in some really interesting and difficult to fix ways.
Is there anything built into the VCL, or any third party component that has a robust way of doing this technique? If not, is what I'm talking about above requiring an ORM? I thought ORMs were considered "bad" by lots of people, but the above is a pretty natural UI pattern that might occur in a million applications. If I was using a non-ADO non-Delphi-db-dataset style of working, the above wouldn't be a problem in almost any persistence layer I might write, and yet when databases with primary keys that use identity values to link the master and detail rows get into the picture, things get complicated.
Update: Transactions are hardly ideal in this case. (Commit/Rollback is too coarse a mechanism for my purposes.)
Your asking two separate questions:
How do I cache updates?
How can I commit updates to related tables at the same time.
Cached updates can be accomplished a number of different ways. Which one is best depends on your specific situation:
ADO Batch Updates
Since you've already stated that you're using ADO to access the data this is a reasonable option. You simply need to set the LockType to ltBatchOptimistic and CursorType to either ctKeySet or ctStatic before opening the dataset. Then call TADOCustomDataset.UpdateBatch when you're ready to commit.
Note: The underlying OLEDB provider must support batch updates to take advantage of this. The provider for SQL Server fully supports this.
I know of no other way to enforce the master/detail relationship when persisting the data than to call UpdateBatch sequentially on both datasets.
Parent.UpdateBatch;
Child.UpdateBatch;
Client Datasets
Data caching is one of the primary reasons for TClientDataset's existence and synchronizing a master/detail relationship isn't difficult at all.
To accomplish this you define the master/detail relationship on two dataset components as usual (in your case ADOQuery or ADOTable). Then create a single provider and connect it to the master dataset. Connect a single TClientDataset to the provider and you're done. TClientDatset interprets the detail dataset as a nested dataset field, which can be accessed and bound to data aware controls just like any other dataset.
Once this is in place you simply call TClientDataset.ApplyUpdates and the client dataset will take care of ordering the updates for the master/detail data correctly.
ORMs
There is a lot that can be said about ORMs. Too much to fit into an answer on StackOverflow so I'll try to be brief.
ORMs have gotten a bad rap lately. Some pundits have gone so far as to label them an anti-pattern. Personally I think this is a bit unfair. Object-relational mapping is an incredibly difficult problem to solve correctly. ORMs attempt to help by abstracting away a lot of the complexity involved in transferring data between a relational table and an instance of an object. But like with everything else in software development there are no silver bullets and ORMs are no exception.
For a simple data entry application without a lot of business rules an ORM is probably overkill. But as an application becomes more and more complex an ORM starts to look more appealing.
In most cases you'll want to use a third party ORM rather than rolling your own. Writing a custom ORM that perfectly fits your requirements sounds like a good idea and its easy to get started with simple mappings but you'll soon start running into issues like parent/child relationships, inheritance, caching and cache invalidation (trust me I know this from experience). Third party ORMs have already encountered these issues and spent an enormous amount of resources to solve them.
With many ORMs you trade code complexity for configuration complexity. Most of them are actively working to reduce the boilerplate configuration by turning to conventions and policies. If you name all your primary keys Id rather than having to map each table's Id column to a corresponding Id property for each class you simply tell the ORM about this convention and it assumes all tables and classes its aware of follow the convention. You only have to override the convention for specific cases where it doesn't apply. I'm not familiar with all of the ORMs for Delphi so I can't say which support this and which don't.
In any case you'll want to design your application architecture so you can push off the decision of which ORM framework (or for that matter any framework) to use as long as possible.

The Ruby community values simplicity...what's your argument for simplifying a db schema in a new project?

I'm working on a project with developers who have not worked with Ruby OR Rails before.
They have created a schema that is too complicated, in my opinion. The schema has 117 tables, and obtaining the simplest piece of information would require traversing/joining 7 tabels...and of course, there's no "main" table that serves as a sort of key between them. The schema renders many of the rails tools like 'find' method, and many of the has_many/belongs to relationships almost useless. And coding for all of these relationships will likely be more time-consuming than we have the money to code for.
THE QUESTION:
Assuming you are VERY convinced (IMHO...hehe) that the schema is not ideal, and there are multiple ways to represent the domain, how would you argue FOR simplifying the schema (aside from what I've already said)?
I'll stand up in 2 roles here
DBA: Database admin/designer.
Dev: Application developer.
I assume the DBA is a person who really know all the Database tricks. Reaallyy Knows.
DBA:
Database is the key of the application and should have predefined structure in order to serve its purpose well and with best performance.
If you cannot use random schema (which is reasonably normalised and good) then the tools are wrong.
Dev:
The database is just a data store, so we need to keep it simple and concentrate on the application.
DBA:
Database is not a store it is the core of the application. There is no application without database.
Dev:
No. The application is the core. There is no application without the front-end and the business logic applied to it.
And the war begins...
Both points are valid and it is always trade off.
If the database will ONLY be used by RoR, then you can use it more like a simple store.
If the DB can be used by other application OR it will be used with large amount of data and high traffic it must enforce some best practices.
Generally there is no way you can disagree with DBA.
But they can understand your situation and might allow you to loose the standards a bit so you could be more productive.
So you need to work closely, together.
And you need to talk to each other to explain and prove the point why database should be like this or that.
Otherwise, the team is broken and project can be failure with hight probability.
ActiveRecord is a very handy tool. But it cannot do everything for you. It does not provide Database structure by default that you expect exactly. So it should be tuned.
On the other side. If DBA can accept that all PKs are Auto incremented integers that would make Developer's life easier (ActiveRecord does it by default).
On the other side, if developers would accept some of DBA constraints it would make DBA's life easier.
Now to answer your question:
how would you argue FOR simplifying the schema
Do not argue. Meet the team and deliver the message and point on WHY it should be done.
Maybe it really shouldn't and you don't know all the things, maybe they are not aware of something.
You could agree on the general structure of the database AND try to describe it using RoR migrations as a meta language.
This way they would see the general picture, and you would use your great ActiveRecords.
And also everybody would be on the same page.
Your DB schema should reflect the domain and its relationships.
De-normalisation should only be done when you have measured that there is a performance problem.
7 joins is not excessive or bad, provided you have good indexes in place.
The general way to make this argument up the chain is based on cost. If you do things simply, there will be less code and fewer bugs. The system will be able to be built more quickly, or with more features, and thus will create more ROI. If you can get the money manager on board with that approach, he or she may let you dictate terms to the team. There is the counterargument that extreme over-normalization prevents bad data, but I have found that this is not the case, as the complexity it engenders tends to lead to more errors and more database code in general.
The architectural and technical argument here is simple. You have decided to use Ruby on Rails. Therefore you have decided to use the ActiveRecord pattern. The ActiveRecord pattern is driven by having the database tables match the object model. That's the pattern in use here, and in many other places, so the best practices they are trying to apply for extreme data normalization simply do not apply. Buy a copy of Patterns of Enterprise Application Architecture and put the little red bookmark at page 160 so they can understand how the pattern works from the architecture perspective.
What the DBA types tend to be unaware of is how much work ActiveRecord does for you, from query generation, cascading deletes, optimistic locking, auto populated columns, versioning (with acts_as_versioned), soft deletes (with acts_as_paranoid), etc. There is a strong argument to use well tested, community supported library functions to perform these operations versus custom code that must be maintained by a DBA.
The real issue with DBAs is then that they need some work to do. Let them focus on monitoring performance, finding slow queries in the code, creating indexes and doing backups.
If you end up losing the political battle for a sane schema, you may want to consider switching to DataMapper. It's the next pattern in PoEAA. The other thing you may be able to get them to do is to create views in the database that correspond to the object model. This way, you could use many of the finding capabilities in the ActiveRecord model based on the views, but have custom insert, update, and delete methods.

server side db programming: why?

Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!).
Server-side logic is often much faster, even with procedural approach.
You can fine-tune your grant options and hide the data you don't want to show
All queries in one places are more convenient than if they were scattered all around the code.
And here's a (very subjective) article in my blog on the reason I prefer stored procedures:
Schema Junk
BTW, triggers (as opposed to functions / stored procedures / packages) I generally dislike.
They are completely other story.
You're keeping the processing in the database, along with the data.
If you process on the server side, then you have to transfer the data out to a server process across the network, process it, and (optionally) send it back. You have the network bandwidth/latency issues, plus memory overheads.
To clarify - if I have 10m rows of data, my two extreme scenarios are to a) pull those 10m rows across the network and process on the server side, or b) process in place in the database using the server and language (SQL) optimised for this purpose. Note that this is a generalisation and not a hard-and-fast rule, but it's the one I follow for most scenarios.
When many heterogeneous applications and various other systems need to access your single database and be sure through their operations data stays consistent without integrity conflicts. So you put your logic into triggers and stored procedures that will offer an interface to external clients.
Maybe not for most web-based systems, but certainly for enterprise databases. Stored procedures and the like allow you much greater control over security and performance, as well as offering a bit of encapsulation for the database itself. You can change the schema all you want as long as the stored procedure interface remains the same.
In (almost) every situation you would keep the processing that is part of the database in the database. Application code cannot substitute for triggers, you won't get very far before you have updated the database and failed to fire the application's equivalent of the triggers (the first time you use the DBMS's management console, for instance).
Let the database do the database work and let the application to the application's work. If you have a specific performance problem with the database, and that performance problem can be addressed by moving processing from the database, in that case you might want to consider doing so.
But worrying about database performance without a database performance problem existing (which is what you seem to be doing here) is both silly and, sadly, apparently a pre-occupation of many Stackoverlow posters.
Least scalable? SQL???
Look up, "federating."
If the database is shared, having logic in the database is better in order to control everything that happens. If it's not it might just make the system overly complicated.
If you have multiple applications that talk to your database, stored procedures and triggers can enforce correctness more pervasively. Accordingly, if correctness is more important than convenience, putting logic in the database is sensible.
Scalability may be a red herring, though. Sometimes it's easier to express the behavior you want in the domain layer of an OO language, but it can be actually more expensive than doing the idiomatic SQL way.
The security mechanism at a previous company was first built in the service layer, then pushed to the db side. The motivation was actually due to some limitations in a data access framework we were using. The solution turned out to be a bit buggy because our security model was complicated, but the upside was that bugs only had to be fixed in the database; we didn't have to worry about different clients following different rules.
Triggers mean 3rd-party apps can modify the database without creating logical inconsistencies.
If you do that, you are tying your business logic to your model. If you code all your business logic in T-SQL, you aren't going to have a lot of fun if later you need to use Oracle or what have you as your database server. Actually, I'm not sure I understand this question exactly. How do you think this would improve scalability? It really shouldn't.
Personally, I'm really not a fan of triggers, particularly in a database dedicated to a single application. I hate trying to track down why some data is inconsistent, to find it's down to a poorly written trigger (and they can be tricky to get exactly correct).
Security is another advantage of using stored procs. You do not have to set the security at the table level if you don't use dynamic code (Including ithe stored proc). This means your users cannot do anything unless they have a proc to to it. This is one way of reducing the possibility of fraud.
Further procs are easier to performance tune than most application code and even better, when one needs to change, that is all you have to put on production, not recomplie the whole application.
Data integrity must be maintained at the database level. That means constraints, defaults values, foreign keys, possibly triggers (if you have very complex rules or ones involving multiple tables). If you do not do this at the database level, you will eventually have integrity issues. Peolpe will write a quick fix for a problem and run the code in the query window and the required rules are missed creating a larger problem. A millino new records will have to be imported through an ETL program that doesn't access the application because going through the application code would take too long running one record at a time.
If you think you are building an application where scalibility will be an issue, you need to hire a database professional and follow his or her suggestions for design based on performance. Databases can scale to terrabytes of data but only if they are originally designed by someone is a specialist in this kind of thing. When you wait until the while application is runnning slower than dirt and you havea new large client coming on board, it is too late. Database design must consider performance from the beginning as it is very hard to redesign when you already have millions of records.
A good way to reduce scalability of your data tier is to interact with it on a procedural basis. (Fetch row..process... update a row, repeat)
This can be done within a stored procedure by use of cursors or within an application (fetch a row, process, update a row) .. The result (poor performance) is the same.
When people say they want to do processing in their application it sometimes implies a procedural interaction.
Sometimes its necessary to treat data procedurally however from my experience developers with limited database experience will tend to design systems in a way that do not leverage the strenght of the platform because they are not comfortable thinking in terms of set based solutions. This can lead to severe performance issues.
For example to add 1 to a count field of all rows in a table the following is all thats needed:
UPDATE table SET cnt = cnt + 1
The procedural treatment of the same is likely to be orders of magnitude slower in execution and developers can easily overlook concurrency issues that make their process inconsistant. For example this kind of code is inconsistant given the avaliable read isolation levels of many RDMBS platforms.
SELECT id,cnt FROM table
...
foreach row
...
UPDATE table SET cnt = row.cnt+1 WHERE id=row.id
...
I think just in terms of abstraction and ease of servicing a running environment utilizing stored procedures can be a useful tool.
Procedure plan cache and reduced number of network round trips in high latency environments can also have significant performance advantages.
It is also true that trying to be too clever or work very complex problems in the RDBMS's half-baked procedural language can easily become a recipe for disaster.
"Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!)."
What makes you think that "scalability" is the only relevant concern in a system design ? I agree with rexem where he commented that it is very obvious that you are "not" biased ...
Databases are sets of assertions of fact. Those sets become more valuable if they can also be guaranteed to conform to certain integrity rules. Those guarantees are not worth a dime if it is the applications that are expected to enforce such integrity. Triggers and sprocs are the only way SQL systems have to allow such guarantees to be offered by the DBMS itself.
That aspect outweighs "scalability" anytime, anywhere, anyhow.

Resources