Key/Value Postgres SQL Table Performance - ruby-on-rails

I am currently building a Rails app where there is a "documents" data table that stores references to pdfs living on an S3 server. These documents could have 100 different types. Each type can have up to 20 attributes or meta info.
My dilemma is: do I make 100 relational tables, one for every doc type, or just create one key/value data table with a reference to the doc_id?
My gut tells me to go key/value for flexibility for searching and supporting more and more document types over time without having to create new migrations. However, I know there are pitfalls with this technique. My first concern of course is the size of the table. The key/value table could end up with millions of rows.
On the other hand, having 100 attribute tables would be a nightmare to query against in a full-text search situation.
So, bottom line: by going with key/value, is performance on a three-column Postgres table with potentially millions of rows a scaling problem? And what about joins on the value field?
This data would almost never change, by the way, so it would be 90% reads.

Consider a single table with an hstore column; hstore is a PostgreSQL data type designed for storing key/value pairs.
http://www.postgresql.org/docs/9.1/static/hstore.html
There are also multiple Ruby gems that add hstore support to ActiveRecord. Here is one that I wrote: https://github.com/JackC/surus You can search RubyGems for about a dozen more alternatives as well.
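For example, a rough sketch of what this can look like (assuming Rails 4+, where hstore is supported natively; on earlier versions a gem like surus fills the gap, and the table and column names here are invented for illustration):

# Migration sketch: enable the hstore extension and add a metadata column.
class AddMetadataToDocuments < ActiveRecord::Migration
  def change
    enable_extension "hstore"
    add_column :documents, :metadata, :hstore
    add_index :documents, :metadata, using: :gin  # GIN index speeds up key/value lookups
  end
end

# Writing and querying arbitrary key/value pairs:
Document.create!(s3_key: "reports/q1.pdf", metadata: { "author" => "Smith", "pages" => "42" })
Document.where("metadata -> 'author' = ?", "Smith")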

Related

How to deal with a nested hash with dynamic keys (PostgreSQL JSONB) with the help of Cube.js?

I am quite a newbie to Cube.js. I have been trying to integrate Cube.js analytics functionality with my Ruby on Rails app. The database is PostgreSQL. In the database, there is a column called answers_json with the jsonb data type, which contains a nested hash. An example of that column's data is:
**answers_json:**
"question_weights_calc"=>
{"314"=>{"329"=>1.5, "331"=>4.5, "332"=>1.5, "333"=>3.0},
"315"=>{"334"=>1.5, "335"=>4.5, "336"=>1.5, "337"=>3.0},
"316"=>{"338"=>1.5, "339"=>3.0}}
There are many more keys in the same column with the same hash structure as shown above. I posted this specific part because I would be dealing with this part only. I need assistance with accessing the values in the nested hash. In the example above, the keys "314", "315" and "316" are Category IDs. The keys associated with Category ID "314" are "329", "331", "332", and "333", which are Question IDs. Each category will have multiple questions. For different records, the category and question IDs will be dynamic; for example, for another record, the Category ID and the Question IDs associated with it will be different. I need to access the values associated with each question id key. For example, to access the value "1.5" I need to do this in my schema file:
**sql: `(answers_json -> 'question_weights_calc' -> '314' ->> '329')`**
But the issue here is that those ids will be dynamic for different records in the database. Instead of "314" and "329", they can be some other numbers. Adding a different record's JSON here for clarification:
**answers_json:**
"question_weights_calc"=>{"129"=>{"273"=>6.0, "275"=>15.0, "277"=>8.0}, "252"=>{"279"=>3.0, "281"=>8.0, "283"=>3.0}}}
How can I discover and access those dynamic IDs and their values, since I also need to perform mathematical operations on the values? Thanks!
As a general rule, it's difficult to run SQL-based reporting on highly dynamic JSON data. Postgres does have some useful functions for dealing with JSON, and you might be able to use jsonb_each or jsonb_object_keys plus a few joins to get there, but it's quite likely that the performance and maintainability of such a query would be difficult, to say the least 😅 Cube.js ultimately executes SQL queries, so if you do go the above route, the query should be easily transferrable to a Cube.js schema.
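As a hedged sketch of that route (the submissions table name is an assumption; the answers_json column is from the question), two lateral jsonb_each calls can unnest both levels of the hash:

# Sketch only: unnest category and question levels into one row per weight.
rows = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT s.id,
         cat.key          AS category_id,
         q.key            AS question_id,
         q.value::numeric AS weight
  FROM submissions s,
       jsonb_each(s.answers_json -> 'question_weights_calc') AS cat,
       jsonb_each_text(cat.value) AS q
SQL

The same SELECT could then be pasted into a Cube.js schema's sql property.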
Another approach would be to create a separate data processing pipeline that collects all the JSON data and flattens it into a single table. The pipeline would then store this data back in your database of choice, from where you could use Cube.js to query it.
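A minimal Ruby sketch of such a pipeline (Submission and QuestionWeight are assumed model names):

# Flatten answers_json into a plain reporting table that Cube.js can query directly.
Submission.find_each do |submission|
  weights = submission.answers_json["question_weights_calc"] || {}
  weights.each do |category_id, questions|
    questions.each do |question_id, weight|
      QuestionWeight.create!(
        submission_id: submission.id,
        category_id:   category_id,
        question_id:   question_id,
        weight:        weight
      )
    end
  end
end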

Rails database setup Polymorphism

We have to create a request system which will have roughly 10 different types of requests. All of these requests will belong to the 'accounting' aspect of our application. Therefore we've called them "Accounting requests".
All requests share maybe only a few columns, and each type has up to 20 columns of its own.
We started to wonder if having separate tables for each request type would be practical in terms of speed when we start to have to do very complicated joins or queries, for example, fetching ALL request types into a single list and then sorting it.
Maybe it would be easier to just use Single Table Inheritance since it will have a type column and we'd be using one table to store all 10 accounting request types.
What do you think regarding using STI for this many polymorphic associations and requirements?
Essentially, it would have models like so:
AccountingRequest
BillingRequest < AccountingRequest
CheckRequest < AccountingRequest
CancellationRequest < AccountingRequest
Each subclass has roughly 10+ fields.
Currently reading about Multiple Table Inheritance here. This seems like the solution that fits my requirements in this case. Not sure yet though.
STI is a good fit if your models all share the same attributes.
However, if your subclasses start having attributes specific to them and not applicable to the others, then STI can result in a lot of null columns. In that case, I usually prefer to go with a polymorphic association.
This RailsCasts episode is a great example of the difference between the two.
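A rough sketch of the two shapes side by side (the class names come from the question; the detail model is invented for illustration):

# STI: one wide table; a "type" column picks the subclass.
class AccountingRequest < ActiveRecord::Base; end
class BillingRequest < AccountingRequest; end
class CheckRequest < AccountingRequest; end

# Polymorphic association: shared columns live on Request,
# each type keeps its specific columns in its own table.
class Request < ActiveRecord::Base
  belongs_to :requestable, polymorphic: true
end
class BillingDetail < ActiveRecord::Base
  has_one :request, as: :requestable
end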
You can use STI in that situation. But STI requires all the columns to live in one single table, and that's not a good thing: the table will grow very wide in the number of fields.
I think you should divide it into two tables, as below...
Request: the request table is the polymorphic table, which saves the information about the type of each request.
RequestItem: the request item table saves the up-to-20 field records and has a foreign key to the request table. It has two columns in the database, called key and value.
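A minimal sketch of that two-table layout (column names are assumptions):

class Request < ActiveRecord::Base
  has_many :request_items  # common fields such as the request type live here
end

class RequestItem < ActiveRecord::Base
  belongs_to :request      # columns: request_id, key, value
end

# Reading one attribute back:
item = request.request_items.find_by(key: "amount")
item.value if item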
It sounds do-able.
When I've looked into this, I found that making extensive use of value objects helped to control the non-applicability of some attributes to some of the types.
In my case I had types of products, some of which would not have particular measurements for example. In those cases I used a Null Object to indicate "Not applicable" where appropriate.
Edit: I also found the composed_of syntax very convenient: https://apidock.com/rails/ActiveRecord/Aggregations/ClassMethods/composed_of
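For instance, a hedged sketch of composed_of with a value object and a Null Object standing in for "not applicable" (the Product model and its columns are invented for illustration):

class Product < ActiveRecord::Base
  composed_of :dimensions,
              class_name: "Dimensions",
              mapping: [%w(width width), %w(height height)],
              allow_nil: true
end

# Plain value object built from the mapped columns.
class Dimensions
  attr_reader :width, :height
  def initialize(width, height)
    @width, @height = width, height
  end
end

# Null Object returned when a product has no measurements:
class NoDimensions
  def width;  "N/A"; end
  def height; "N/A"; end
end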
For now I'm using a bit of NoSQL for such cases. PostgreSQL's JSONB type lets you store a multilevel Ruby hash. It also provides rich functionality: DB-level constraints, indexes, and query operators.
So common attributes are stored in the standard way, and child-specific ones in jsonb. Then you can use whatever you need on top of this: STI, the Value Object pattern, serialization, or just create scopes for each child (see the sketch after the list below). I prefer the last one - my models are thin, most constraints are DB-level, and all business logic is in service classes.
Pros:
Avoiding ALTER TABLE on big tables when you need to add one more child type
Keeping my queries efficient
Preventing storing and selecting unnecessary columns
Serialization out of the box for JSON APIs
Cons:
A bit schemaless
Vendor lock-in
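A minimal sketch of that setup (the kind column, the data jsonb column, and the scope names are all assumptions):

class Request < ActiveRecord::Base
  # Common columns stay relational; child-specific attributes live in "data" (jsonb).
  scope :billing,      -> { where(kind: "billing") }
  scope :with_invoice, ->(number) { where("data ->> 'invoice_number' = ?", number) }
end

# In the migration, a GIN index keeps jsonb lookups efficient:
# add_index :requests, :data, using: :gin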

Creating a YAML based list vs. a model in Rails

I have an app that consists mainly of restaurant model instances. One of the essential attributes for these restaurants is labeling the cuisine each falls under. I'm currently at odds with myself with regard to designing this. On one hand I thought of creating a Cuisine model and creating either a HMT or HABTM association between Restaurants and Cuisines.
More recently I came across this post which shows how to create a pre-defined set of attributes. To take the answer one step further I'm assuming (in my case) I'd add a string-based cuisine column to my restaurant model and setup a select box in my restaurant form that would save the selected value.
What I was wondering was: what would be the most efficient way of doing this? The goal is to eventually be able to query restaurants based on what cuisine(s) they fall under. I wasn't sure if a model would be the best choice, since it would only serve, in a sense, as a join table with a name attribute. I wasn't sure if having this extra table for something so minute would be optimal.
On the other hand, I didn't know if using YAML for this would be suitable, since the values are essentially dummy strings with no tangible records on file like I'd have with model instances. Can someone help me sort out this confusion?
There are many benefits of normalizing many-to-many relationships in the db. Here are some:
Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit on a data page.
You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.
Index searching is often faster, since indexes tend to be narrower and shorter.
More tables allow better use of segments to control physical placement of data.
You usually have fewer indexes per table, so data modification commands are faster.
Fewer null values and less redundant data, making your database more compact.
Triggers execute more quickly if you are not maintaining redundant data.
Data modification anomalies are reduced.
Normalization is conceptually cleaner and easier to maintain and change as your needs change.
Also, by normalizing you get the cleaner syntax and other infrastructure benefits from ActiveRecord, e.g.
cuisine.restaurants.where(city: 'Toledo')
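For completeness, a sketch of the HABTM route (model and column names assumed to match the question):

class Restaurant < ActiveRecord::Base
  has_and_belongs_to_many :cuisines  # join table: cuisines_restaurants
end

class Cuisine < ActiveRecord::Base
  has_and_belongs_to_many :restaurants
end

# Querying restaurants by cuisine:
Cuisine.find_by(name: "Thai").restaurants.where(city: "Toledo")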

rails create table in db dynamically

Normally, to create/alter a table in the database, I use migrations (manually run rake db:migrate) and then in my code I use ActiveRecord. This is very cool as I don't have to worry about the representation of the data in the db or about the specific kind of db (SQL Server, PG, or other).
But now a customer wants to be able to create "things" on the fly himself: say he starts selling computers, so he wants an interface where he can dynamically create an object "computer" with properties like "Name, RAM, HD, ...". It seems quite natural to create a separate table in the db with all these fields. But how can I do that in RoR and keep all those nice things about ActiveRecord?
Please suggest.
The usual way is to do exactly the opposite:
Have a table for object types
Have a table for field names for each object type
Have a very big table with all the custom attributes for each object of any type
This is called EAV (Entity-attribute-value model, see http://en.wikipedia.org/wiki/Entity-attribute-value_model). And it scales pretty badly.
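A hedged sketch of those three tables as ActiveRecord models (all names invented for illustration):

class ObjectType < ActiveRecord::Base
  has_many :field_definitions  # the field names for this type
  has_many :entities
end

class FieldDefinition < ActiveRecord::Base
  belongs_to :object_type      # columns: object_type_id, name
end

class Entity < ActiveRecord::Base
  belongs_to :object_type
  has_many :attribute_values   # the very big key/value table
end

class AttributeValue < ActiveRecord::Base
  belongs_to :entity           # columns: entity_id, field_definition_id, value
  belongs_to :field_definition
end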
Alternatively, you can use a store text column instead of the big EAV table (see http://api.rubyonrails.org/classes/ActiveRecord/Store.html) so you don't have to make those difficult attribute retrievals typical of EAV. You still need to store the "object type" definitions somewhere, so that the expected fields etc. are available when building forms and tables.
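A minimal ActiveRecord::Store sketch (the properties text column and the accessor names are assumptions):

class Product < ActiveRecord::Base
  store :properties, accessors: [:ram, :hd]
end

computer = Product.create!(name: "Desktop", ram: "16GB", hd: "1TB")
computer.ram  # => "16GB"  (serialized into the properties column)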
The problem with this approach is that you are not able to query (where/join/select) on those attributes because they are not columns. There are a number of solutions to that:
Don't do filtering on those attributes (meh...)
Have an external search server that's able to do faceted search
(as @Amar correctly says) Use a document database
Use PostgreSQL and use hstore instead of a simple serialized column.
A NoSQL document database (MongoDB, CouchDB) or Redis could be the best fit for this. In my view, you can also use the vertical table concept; try running the Rails 2.x demo application for MySQL.
You can try MongoDB and check whether it meets your needs.

Ruby dynamically tied to table

I've got a huge monster of a database (okay, that's not quite true, but there are over 8 million records in one product table).
This table is fed by 13 suppliers.
Even with the best indexing I could come up with, searching for the top 10,000 records that are ready for supplier 8 is crazy slow.
What I'd like to do is create a product table for each supplier and split the big table into these smaller tables.
Now in c++ or what have you, I'd just switch the table that I'm working with inside the class.
In ruby, it seems I'll have to create a new class for each table, and do a migration.
Also, as I plan to have some in-session tables, I'd be interested in getting Ruby to work with them.
Oh, and it's 8 million now, set to grow to 20 million in the next 6 months.
A question posed was: what's my db engine? Right now it's SQL, but I'm open to pushing my db to another engine if it will mean I can use temp tables and "partitioned" tables.
One additional point on indexing: indexing on fields that change frequently, like price and quantity, isn't practical; I'd have to re-index the changed items each time I made a change.
By Ruby, I am assuming you mean inheriting from the ActiveRecord::Base class in a Ruby on Rails application. By convention, you are correct in that each class is meant to represent a separate table.
You can easily execute arbitrary SQL using the "ActiveRecord::Base.connection.execute" method, passing a string that is your SQL query. This bypasses having to create separate Ruby classes that would represent transient tables. This is not the "Rails approach"; however, it does address your question of allowing switching of the tables inside a class file.
More information on ActiveRecord database statements can be found here: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html
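For example, a hedged sketch of both options (the per-supplier table name is invented for illustration):

# Raw SQL, with no model class at all:
rows = ActiveRecord::Base.connection.execute(
  "SELECT * FROM products_supplier_8 WHERE status = 'ready' LIMIT 10000"
)

# Or point a single anonymous ActiveRecord class at a table chosen at runtime:
def model_for(table)
  Class.new(ActiveRecord::Base) { self.table_name = table }
end
model_for("products_supplier_8").where(status: "ready").limit(10_000)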
However, as other people have pointed out, you should be able to optimize your query such that splitting across multiple tables is not necessary. You may want to analyze your SQL query's execution plan using various tools to optimize the execution. If you are using MySQL, check out its query execution planning functionality: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
By introducing indexes, changing join methods between tables, etc., you should be able to reduce your query execution time.
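As a starting point (column names assumed from the question), you can pull the plan straight from ActiveRecord and back the filter with a composite index:

# Rails 3.2+: print the database's execution plan for a relation.
puts Product.where(supplier_id: 8, status: "ready").limit(10_000).explain

# Migration sketch: a composite index matching the filter columns.
# add_index :products, [:supplier_id, :status]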
