What is the optimum usage of PostgreSQL indexes with Rails? - ruby-on-rails

I am new to database optimisation. I've been creating Rails apps for some time now using scaffolding tools, generators and web tutorials. So far pretty much everything works fine: no big loads of users/traffic/data, and the UX is flawless.
I have just started to create a huge application that will hold customer support tickets and updates, so I am starting to worry about performance in the near future. I only remember indexes from my university education as database references for fast search.
What is unclear to me is when I should use them and how. I've seen the scaffolders add an add_index for the id and the created_at datetime of a new model, but I don't know if those are the right ones to use, or what I should add myself to keep object retrieval fast and to learn common practices when creating models.
I've also seen indexes including 2 or more fields. Why?
My models are User, Customer, SupportTicket, so any recommendations will be appreciated. I am looking into possible examples in Rails.

Indices will make your searches fast. You need to analyse your models and see which parameters you are going to search by the most.
Example
User.where(email: "test@example.com")
In general, we always search for users by email address, so it makes sense to put an index on it. As a rule of thumb, check your fields and see which ones are used most often in .find or .where conditions.
However, an index needs its own storage, which can add up to a significant amount of space, so you need to be careful about that.
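As a concrete illustration, a migration along these lines would add such indexes (just a sketch; the table and column names are guesses based on the models you mention):

class AddLookupIndexes < ActiveRecord::Migration
  def change
    # Single-column index: speeds up User.where(email: ...) and enforces uniqueness.
    add_index :users, :email, unique: true

    # Foreign-key index: speeds up customer.support_tickets and joins on customer_id.
    add_index :support_tickets, :customer_id

    # Multi-column index: useful when you regularly filter on two columns
    # together, e.g. open tickets assigned to one user.
    add_index :support_tickets, [:assigned_user_id, :status]
  end
end

A multi-column index like the last one is sorted by its first column and then the second, so it serves queries that filter on both columns, or on the first column alone, but not efficiently on the second column by itself.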

Related

Is ~44 columns too much for a model? Does it make sense to break a one-to-one relation?

I am interested in what the best practice is for a model that has a lot of data attached to it. Most of my app revolves around one model (SKU), and it seems to have more and more things associated with it.
For example, my SKU model has multiple prices, dimensions, weight, recommended prices for multiple price levels, title, description, shelf life, etc. Would it make sense to break all the pricing info out into another table? Or break the SKU up into different uses of the SKU and associate them? For example, WebSKU, StockSKU, etc.
As mentioned in the answer linked by Tom, if all your attributes really belong to that model there is no reason to break it up. However, if you have columns like price1, price2, price3 or dimension_x_1, dimension_y_1, dimension_x_2, dimension_y_2, etc, then it usually means you should be creating another table to contain those.
For example, you could set it up so that you have the following models:
class Sku < ActiveRecord::Base
  has_many :prices
  has_many :dimensions
end

class Price < ActiveRecord::Base
  belongs_to :sku
end

class Dimension < ActiveRecord::Base
  belongs_to :sku
end
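On the database side, each extracted table carries the sku foreign key, and it is usually worth indexing it. A sketch of the prices migration (the columns other than sku_id are made-up examples):

class CreatePrices < ActiveRecord::Migration
  def change
    create_table :prices do |t|
      t.integer :sku_id
      t.string  :price_level                      # e.g. "web", "stock" (assumed)
      t.decimal :amount, precision: 10, scale: 2
      t.timestamps
    end
    # Index the foreign key so sku.prices and joins stay fast.
    add_index :prices, :sku_id
  end
end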
As everyone else said, the design of a database should reflect the logic behind it. Why? Mainly because it will be easier to maintain and understand.
I was also going to draw attention to normalization rules, as @sawa did.
Generally, it is a good approach to normalize your database, as it provides several advantages. You should read the Wikipedia article on database normalization (at least as a starting point).
Following the normal forms will help you design your database taking the logic behind your data into account.
But denormalization also has its advantages, the first (and most commonly cited) being better read performance. It basically means keeping data in one table that you would have put in different tables under the normal forms, and it generally makes sense when that data has some logical relation.
You have to aim to achieve a balance depending on the problem you are facing.
On the other hand, from the tags on your post I can see you are using Ruby on Rails, which uses the Active Record pattern. One consequence of the database model you are presenting is that you will probably end up with a domain model that is just as complex, i.e. very large. I don't know every detail of your project, but I would guess it will quickly grow into a god object, making your code hard to maintain, extend and understand.
A database should be designed not according to how many columns it has, but according to the logic behind it, particularly following Codd's normal forms. If there is systematic redundancy in your database, that is a sign it should be split into multiple tables. If not, keep it as is.
I think it is good to design the data model with an eye on how the DB engine works with files and memory. The first bottleneck of PostgreSQL is file I/O; memory consumption is also important. When PostgreSQL reads table data (FYI: table data is not read during index-only scans) it reads 8 KB pages (a compile-time parameter). The more tuples fit in such a page, the less file I/O and memory consumption, the better the cache usage (more frequent hits, faster prewarming, etc.) and the better the performance.
So, if you have a really high-load project, it can be useful to think about separating frequently used data into isolated tables (and, as a next step, placing those tables in a separate tablespace on an SSD or a powerful RAID array).
I.e. there should be some balance between logical simplicity and performance tweaks.
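If you do reach that point, moving a hot table onto faster storage can be wrapped in a migration. A rough sketch (the table and tablespace names are invented, and the tablespace itself must be created by a superuser first):

class MoveTicketUpdatesToSsd < ActiveRecord::Migration
  def up
    # Assumes a DBA has already run:
    #   CREATE TABLESPACE ssd_space LOCATION '/mnt/ssd/pg_data';
    execute "ALTER TABLE ticket_updates SET TABLESPACE ssd_space"
  end

  def down
    execute "ALTER TABLE ticket_updates SET TABLESPACE pg_default"
  end
end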

How to organize an out-of-control table?

Hello and good morning.
I am working on a side project where I am adding an analytics board to an existing app. The problem is that the users table now has over 400 columns. My question is: what's a better way of organizing this table, such as splintering it off into separate tables? How do you do that, and how do you link the new tables together?
Another concern: if I separate the table, will I still be able to save to it through the user model? I have code right now that says:
user.wallet += 100
user.save
If I separate wallet from user and link the two tables, will I have to change this code? I'm asking because there is a ton of code like this in the app.
Thank you so much if you can help me understand how to organize a database. As a bonus, if there is a book that talks about database organization, can you recommend it (preferably one geared towards Rails)?
Edit: Is there also a way to do all of this without losing any data? For example, transferring the data to a new column on the new table and then destroying the old column.
Please read about:
Database Normalization
You'll get loads of hits when searching for that string and there are many books about database design covering that subject.
It is most likely that this table of yours lacks normalization, but you have to see for yourself!
Just to give some orientation: I would get a little anxious when dealing with a tenth of that number of columns. That said, I clearly have to stress that there may be well-normalized tables with 400 columns, as well as sloppily created examples with just 10 columns.
Generally speaking, the probability of dealing with badly designed tables, and hence facing trouble, simply rises with the number of columns.
So take your time, and if you find out that the users table needs normalization, the next step would indeed be to spread the data over several tables. Because that clearly (and most likely heavily) affects the coding of your application, this is where you thoroughly have to balance pros and cons; it is simply impossible to judge that from far away.
Say you have substantial problems (e.g. serious performance problems, otherwise you wouldn't have posted) that could be eased by normalization; there are different approaches to splitting the data. Here, please read about:
Cardinalities
Usually the new tables are linked by
Foreign Keys
, that is, identical data (like a user id) that appears in multiple tables and is used to join them.
And finally, yes, you can do that without losing data as the overall amount of information never changes when normalizing.
In case your last question was meant technically: there is no problem with reading data from one column and inserting it into a new one (in a new table). That has to happen in a certain order, as foreign keys have to be filled before you can use them. See
Referential Integrity
However, quite obviously, deleting data and dropping columns interferes with the operability of your application. Good planning is essential.
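To make that technical part concrete, a data-moving migration for something like the wallet column could look roughly like this (a sketch only; the wallets table and its column types are assumptions):

class ExtractWallets < ActiveRecord::Migration
  def up
    create_table :wallets do |t|
      t.integer :user_id
      t.integer :balance            # whatever type users.wallet currently has
    end
    add_index :wallets, :user_id

    # Copy the existing data across before dropping anything.
    execute "INSERT INTO wallets (user_id, balance) SELECT id, wallet FROM users"

    remove_column :users, :wallet
  end
end

Application code such as user.wallet += 100 would then have to change to something along the lines of user.wallet.balance += 100 followed by user.wallet.save (with has_one :wallet on User), which is exactly the kind of coding impact mentioned above.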

Does a serialized hash column make more sense than an associated model/table for flexibility?

I have been researching quite a bit and the general consensus is to avoid serialized hashes in a DB whenever possible, however the design I have lends itself to this structure, so I'm hoping to get some opinions and/or advice. Here is the scenario:
I have a model/table :products which houses financial products. Each product has_many investment strategies, which I had originally stored in a separate :strategies model/table. Since each product has completely different strategies, and each strategy has different attributes, it's become extremely difficult (and hacky) to manipulate each strategy's attributes into normalized, consistent columns (to the point where I have products that I simply cannot add to the application). Additionally, a strategy's attributes can sometimes change based on the amount of money allocated to that strategy.
In order to solve this issue, I am looking into removing the :strategies model/table altogether and simply adding a strategies column to my :products model/table. The new column would house a multi-dimensional hash of each product's strategies. This option allows for tremendous flexibility from a data storage perspective.
My primary question is: do I lose any functionality by restructuring my database this way? There will be times when I need to search for a product by its strategies' attributes, and I have read that searching within a multi-dimensional hash is difficult at best. Is this considered bad practice? Is there a third solution that I haven't considered?
The advantage of rolling with multiple tables for this design is that you can leverage the database to protect your data with constraints, functions and triggers. The database is the only place you can protect your customers' data with 100% confidence. These tried-and-true techniques have lost their luster in recent years and are viewed as cumbersome and/or unnecessary by those who do not understand them.
Hash-based stores within relational databases are currently changing quickly due to the popularity of NoSQL databases; however, traditionally it has been difficult to fully protect your customers' data at the database level with this approach. Therefore, the application layer is where much of this protection lives. That said, this area is being innovated on, and maybe someday they will solve it.
The big advantage of using a hash as a column in a table is that you can get up and running more quickly while you're figuring out your problem. In addition, you can pivot more easily because most modifications are made in the application layer on the fly.
Full-text searching and complex queries can also be a bit more difficult if you're using a hash-based store within a relational database.
The general rule of thumb is: if you need the data to be safe and/or have complex reporting to do, go relational. Think a big financial-services-type app ;) Otherwise, if you're building a more social, data-display-style app, or just mocking things up, there is nothing wrong with a serialized hash column. Most importantly, remember to write tests so you can refactor more confidently if you choose wrong!
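For reference, the serialized-column route in Rails looks roughly like this (a sketch; it assumes a text column named strategies on products):

class Product < ActiveRecord::Base
  # Stored as YAML in a text column; flexible, but not queryable with plain SQL.
  serialize :strategies, Hash
end

product = Product.new
product.strategies = { "growth" => { "allocation" => 10_000, "fee" => 0.01 } }
product.save

Finding "every product whose strategy X has attribute Y" then has to happen in Ruby after loading the rows, which is the searching limitation you mention.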
My $0.02
I would be curious to know which decision you choose and how it has worked out.

What database should I use in an app where my models don't represent different ideas, but instead different types with overlapping fields?

I'm building an application where I will be gathering statistics from a game. Essentially, I will be parsing logs where each line is a game event. There are around 50 different kinds of events, but a lot of them are related. Each event has a specific set of values associated with it, and related events share a lot of these attributes. Overall there are around 50 attributes, but any given event only has around 5-10 attributes.
I would like to use Rails for the backend. Most of the queries will be event type related, meaning that I don't especially care about how two event types relate with each other in any given round, as much as I care about data from a single event type across many rounds. What kind of schema should I be building and what kind of database should I be using?
Given a relational database, I have thought of the following:
1. Have a flat structure, where there are only a couple of tables, but the events table has as many columns as there are event attributes overall. This would result in a lot of nulls in every row, but it would let me easily access what I need.
2. Have a table for each event type, among other things. This would let me save space and improve performance, but it seems excessive to have that many tables given that events aren't really separate 'ideas'.
3. Group related events together, minimizing both the number of tables and the number of attributes per table. The problem then becomes the grouping. It is far from clear-cut, and it could take a long time to properly establish event supertypes. Also, it doesn't completely solve the problem of there being a fair amount of nils.
It was also suggested that I look into using a NoSQL database, such as MongoDB. It seems very applicable in this case, but I've never used a non-relational database before. It seems like I would still need a lot of different models, even though I wouldn't have tables for each one.
Any ideas?
This feels like a great use case for MongoDB and a very awkward fit for a relational database.
The types of queries you will be making against this data are key to the best schema design, but imagine that your documents (in a single collection, similar to option 1 above) look something like this:
{ "round" : 1,
"eventType": "et1",
"attributeName": "attributeValue",
...
}
You can easily query by round, by eventType, getting back all attributes or just a specified subset, etc.
You don't have to know up front how many attributes you might have, which ones belong with which event types, or even how many event types you have. As you build your prototype/application you will be able to evolve your model as needed.
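With an object mapper such as Mongoid, a sketch of that might look like this (the model and field names are assumptions):

class GameEvent
  include Mongoid::Document
  field :round,      type: Integer
  field :event_type, type: String
  # With dynamic fields enabled (Mongoid::Attributes::Dynamic in Mongoid 4+),
  # each document can carry whatever extra attributes its event type needs.
end

# All events of one type across many rounds:
GameEvent.where(event_type: "et1")

# Just a subset of fields for one round:
GameEvent.where(round: 1).only(:event_type, :round)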
There is a very large and active community of Rails/MongoDB folks, and there's a good chance you can find a lot of developers to ask questions of and a lot of code to look at as examples.
I would encourage you to try it out, and see if it feels like a good fit. I was going to add some links to help you get started but there are too many of them to choose from!
Since you might have a question about whether or not to use an object mapper, here's a good answer to that.
A good write-up of dealing with dynamic attributes with Ruby and MongoDB is here.

Rails 3: when is it too late to switch to NoSQL? (case: creating a followers system)

I have a table for users and another table for followers. The followers table is a list of user_ids and follower_ids. Seems pretty straightforward.
I've been planning on using MySQL for production, and I feel like down the road this is really going to bite me in the a$$.
Should I switch to MongoDB? Is it too late?
I've never dealt with anything NoSQL, and I'm wondering how to get around the issue of joins. I wouldn't mind putting in a little effort to fix this problem, except that I separated my users from their profiles. I am under the assumption that ActiveRecord uses joins in a statement such as @name = User.profile.full_name
What you need to consider is conceptually separating the data storage technology from your data structure. MySQL can be scaled, scaled, and scaled some more if you know how to do it, as it's an old and proven platform. While it probably doesn't have the same shiny new appeal as something like NoSQL, it does have a very good track record, and that's often what counts.
There are a number of ways to tune MySQL to perform more quickly. The built-in clustering and replication features mean you can often scale out to multiple instances quite easily, and using a simpler, faster storage engine like MyISAM can give you order-of-magnitude performance gains in some circumstances.
MongoDB is a very interesting experiment, but so far it hasn't really earned its stripes. If it's anything like other notable NoSQL projects such as Cassandra, it will still need years of work to be truly "web scale".
In your particular case, let's say you want to find a list of a user's followers. You're probably doing something like this:
SELECT followers.id, followers.name FROM user_followers
LEFT JOIN users AS followers ON followers.id=user_followers.follower_id
WHERE user_followers.user_id=?
You're right in presuming that the JOIN here will cause trouble down the road. What you're overlooking is that you can easily remove the join using the same humble trick that is essential to making your application scale: denormalizing important information.
What if you copied the follower's name into the user_followers table each time you add an entry to it:
SELECT follower_id, follower_name FROM user_followers
WHERE user_id=?
Now there are no joins. The only catch is that when you start to denormalize things, you should implement a way to bring the copies back into sync with the master should something get messed up, and you must be careful to ensure that a change to the master value propagates to the copies as promptly as required.
A simple mass update could be as easy as:
UPDATE users,user_followers
SET user_followers.follower_name=users.name
WHERE user_followers.follower_id=users.id
If the users are unable to change their names, though, you wouldn't even need to worry. Sometimes constraints can help you in this regard.
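If they can change their names, one Rails-side way to propagate the change to the copies is a callback on the master record, along these lines (a sketch, assuming a UserFollower model over the user_followers table):

class User < ActiveRecord::Base
  after_update :sync_follower_name, if: :name_changed?
  # On newer Rails the check would be saved_change_to_name? instead.

  private

  def sync_follower_name
    # Push the new name into every denormalized copy of it.
    UserFollower.where(follower_id: id).update_all(follower_name: name)
  end
end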
