I have a PostgreSQL 11.8 table named posts where I would like to define a column slugs of type JSONB, which would contain arrays of strings such as ["my-first-post", "another-slug-for-my-first-post"].
I can find a post having a specific slug using the ? existence operator: SELECT * FROM posts WHERE slugs ? 'some-slug'.
Each post is expected to have only a handful of slugs, but the number of posts is expected to grow.
Considering the above query where some-slug could be any string:
How can I define an index to have a reasonably performant query (no full table scan)?
How can I ensure the same slug cannot appear multiple times (across and within the different arrays)?
I am primarily looking for a solution for PostgreSQL 11, but I would also be interested to know about solutions in later versions, if any.
The database is used in a Rails 6.0 app, so I am also interested in the Rails migration syntax.
You can support the ? operator with a normal GIN index on the jsonb column.
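For example, a minimal sketch of that index as a Rails 6 migration (the migration class and index names are up to you; the equivalent raw SQL is in the comment):

class AddGinIndexToPostsSlugs < ActiveRecord::Migration[6.0]
  def change
    # The default jsonb_ops GIN operator class supports the ? existence operator.
    add_index :posts, :slugs, using: :gin
    # Equivalent SQL: CREATE INDEX index_posts_on_slugs ON posts USING gin (slugs);
  end
end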
However, you cannot enforce uniqueness of array values.
Particularly if you want to have database constraints, you should not model your data using JSON. Use a regular normalized data model with several tables and foreign keys, then it will be easy to implement such a uniqueness constraint.
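A rough sketch of that normalized approach, assuming a separate post_slugs table (the table, model and association names are illustrative, not from the question):

class CreatePostSlugs < ActiveRecord::Migration[6.0]
  def change
    create_table :post_slugs do |t|
      t.references :post, null: false, foreign_key: true
      t.string :slug, null: false
    end
    # Enforces that a slug can appear only once, across all posts.
    add_index :post_slugs, :slug, unique: true
  end
end

# Assuming Post has_many :slugs, class_name: "PostSlug", the lookup becomes
# an ordinary indexed query:
# Post.joins(:slugs).find_by(post_slugs: { slug: "some-slug" })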
I am quite a newbie to Cube.js. I have been trying to integrate Cube.js analytics functionality with my Ruby on Rails app. The database is PostgreSQL. In the database there is a column called answers_json of type jsonb, which contains a nested hash. An example of the data in that column is:
**answers_json:**
"question_weights_calc"=>
{"314"=>{"329"=>1.5, "331"=>4.5, "332"=>1.5, "333"=>3.0},
"315"=>{"334"=>1.5, "335"=>4.5, "336"=>1.5, "337"=>3.0},
"316"=>{"338"=>1.5, "339"=>3.0}}
There are many more keys in the same column with the same hash structure as shown above. I posted the specific part because I would be dealing with this part only. I need assistance with accessing the values in the hash. The column has a nested hash. In the example above, the keys "314", "315" and "316" are Category IDs. The keys associated with Category ID "314" are "329", "331", "332" and "333", which are Question IDs. Each category will have multiple questions. For different records, the category and question IDs will be dynamic. For example, for another record, the Category ID and the Question IDs associated with that Category ID will be different. I need to access the values associated with the Question ID keys. For example, to access the value "1.5" I need to do this in my schema file:
**sql: `(answers_json -> 'question_weights_calc' -> '314' ->> '329')`**
But the issue here is, those ids will be dynamic for different records in the database. Instead of "314" and "329", they can be some other numbers. Adding different record's json here for clarification:
**answers_json:**
"question_weights_calc"=>{"129"=>{"273"=>6.0, "275"=>15.0, "277"=>8.0}, "252"=>{"279"=>3.0, "281"=>8.0, "283"=>3.0}}}
How can I discover and access those dynamic IDs and their values, given that I also need to perform mathematical operations on the values? Thanks!
As a general rule, it's difficult to run SQL-based reporting on highly dynamic JSON data. Postgres does have some useful functions for dealing with JSON, and you might be able to use json_each or json_object_keys plus a few joins to get there, but it's quite likely that the performance and maintainability of such a query would be problematic, to say the least 😅 Cube.js ultimately executes SQL queries, so if you do go the above route, the query should be easily transferable to a Cube.js schema.
Another approach would be to create a separate data processing pipeline that collects all the JSON data and flattens it into a single table. The pipeline should then store this data back in your database of choice, from where you could then use Cube.js to query it.
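If you go the flattening route, a rough sketch of such a pipeline step in Ruby (the Answer and QuestionWeight model names and the target table are assumptions for illustration, not part of the question):

# Flattens the nested answers_json hash into one row per
# (answer, category, question) so it can be queried as a plain table.
class FlattenQuestionWeights
  def call
    Answer.find_each do |answer|
      weights = answer.answers_json.fetch("question_weights_calc", {})
      weights.each do |category_id, questions|
        questions.each do |question_id, weight|
          QuestionWeight.create!(
            answer_id:   answer.id,
            category_id: category_id.to_i,
            question_id: question_id.to_i,
            weight:      weight.to_f
          )
        end
      end
    end
  end
end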
We have to create a request system which will have roughly 10 different types of requests. All of these requests will belong to the 'accounting' aspect of our application. Therefore we've called them "Accounting requests".
All requests share maybe only a few columns and each has up to 20 columns individually.
We started to wonder if having separate tables for each request type would be practical in terms of speed when we start having to do very complicated joins or queries, for example, fetching ALL request types into a single table and then sorting it.
Maybe it would be easier to just use Single Table Inheritance since it will have a type column and we'd be using one table to store all 10 accounting request types.
What do you think regarding using STI for this many polymorphic associations and requirements?
Essentially, it would have models like so:
AccountingRequest
BillingRequest < AccountingRequest
CheckRequest < AccountingRequest
CancellationRequest < AccountingRequest
Each subclass has roughly 10+ fields.
Currently reading about Multiple Table Inheritance here. This seems like the solution that fits my requirements in this case. Not sure yet though.
STI is a good fit if your models all share the same attributes.
However, if your subclasses start having attributes specific to them and not applicable to others, then STI can result in a lot of null columns. In that case, I usually prefer to go with a polymorphic association.
This RailsCasts episode is a great example of the difference between the two.
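For illustration, a minimal sketch of the polymorphic alternative (class names adapted from the question; the detail_type/detail_id columns are assumed):

# accounting_requests keeps the shared columns plus detail_type/detail_id;
# each concrete request type keeps its own table for its specific fields.
class AccountingRequest < ActiveRecord::Base
  belongs_to :detail, polymorphic: true
end

class BillingRequest < ActiveRecord::Base
  has_one :accounting_request, as: :detail
end

class CheckRequest < ActiveRecord::Base
  has_one :accounting_request, as: :detail
end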
You can use STI in that situation. But STI will require putting all the columns into one single table, and that's not a good thing: the table will end up with a very large number of fields.
I think you should divide it into two tables, as below...
Request: the request table will be the polymorphic table that stores the common information and the type of each request.
RequestItem: the request item table will store each of the up-to-20 per-type fields as a record, with a foreign key to the request table. The request item table will have two columns called key and value.
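A rough sketch of that two-table layout as a migration (table, column and index names are illustrative; the migration superclass version depends on your Rails version):

class CreateRequestsAndRequestItems < ActiveRecord::Migration[5.2]
  def change
    create_table :requests do |t|
      t.string :request_type, null: false    # e.g. "BillingRequest"
      t.timestamps
    end

    create_table :request_items do |t|
      t.references :request, null: false, foreign_key: true
      t.string :key, null: false              # attribute name
      t.string :value                         # attribute value, stored as text
    end

    # One value per attribute name per request.
    add_index :request_items, [:request_id, :key], unique: true
  end
end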
It sounds do-able.
When I've looked into this, I found that making extensive use of value objects helped to control the non-applicability of some attributes to some of the types.
In my case I had types of products, some of which would not have particular measurements for example. In those cases I used a Null Object to indicate "Not applicable" where appropriate.
Edit: I also found the composed_of syntax very convenient: https://apidock.com/rails/ActiveRecord/Aggregations/ClassMethods/composed_of
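A small sketch of the idea, assuming a Product model with a width_mm column (all names here are illustrative, not taken from the original code):

# Value object wrapping a measurement; a null object stands in where the
# measurement doesn't apply to the product type.
class Measurement
  attr_reader :millimetres
  def initialize(millimetres)
    @millimetres = millimetres
  end
end

class NullMeasurement
  def millimetres
    "Not applicable"
  end
end

class Product < ActiveRecord::Base
  composed_of :width,
              class_name: "Measurement",
              mapping: %w(width_mm millimetres),
              allow_nil: true

  def width_or_null
    width || NullMeasurement.new
  end
end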
For now I'm using a bit of NoSQL for such cases. PostgreSQL's JSONB type allows storing a multilevel Ruby hash. It also provides rich functionality: DB-level constraints, indexes and query operators.
So common attributes are stored in the standard way and child-specific ones in jsonb. Then you can use whatever you need on top of this: STI, the Value Object pattern, serialization, or just create scopes for each child. I prefer the last one: my models are thin, most constraints are at the DB level, and all business logic is in service classes (there's a sketch after the pros and cons below).
Pros:
Avoiding ALTER TABLE on big tables when you need to add one more child type
Keeping my queries efficient
Preventing storing and selecting unnecessary columns
Serialization out of the box for JSON APIs
Cons:
A bit of schemaless
Vendor lock
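A minimal sketch of that setup, assuming a requests table with a request_type column and a jsonb details column (all names here are illustrative):

class AddDetailsToRequests < ActiveRecord::Migration[5.2]
  def change
    add_column :requests, :details, :jsonb, null: false, default: {}
    add_index :requests, :details, using: :gin
  end
end

class Request < ActiveRecord::Base
  # One scope per child type; child-specific attributes live in details.
  scope :billing,      -> { where(request_type: "billing") }
  scope :cancellation, -> { where(request_type: "cancellation") }

  # Query inside the jsonb column with the containment operator.
  scope :with_detail, ->(key, value) { where("details @> ?::jsonb", { key => value }.to_json) }
end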
How do I save multiple values in a single cell record in Ruby on Rails applications?
If I have a table named Exp with columns named: Education, Experience, and Skill, what is the best practice if I want users to store multiple values such as: education institutions or skills in a single row?
I'd like users to fill in multiple text fields, but the values should go into the same cell.
For instance, if a user has multiple skills, should those skills be in one cell? Would this be best, or would it be better if I created a new table just for skills?
Please advise,
Thanks
I would not recommend storing multiple values in the same database column. It would make querying very difficult. For example, if you wanted to look for all the users with a particular skill set, the query would be clumsy in terms of both readability and performance.
However, there are still certain cases where it makes sense.
When you want to allow for variable list of data points
You are not going to query the data based on one of the values in the list
ActiveRecord has built-in support for this. You can store Hash or Array in a database column.
Just mark the columns as Text
rails g model Exp experience:text education:text skill:text
Next, serialize the columns in your Model code
class Exp < ActiveRecord::Base
  serialize :experience
  serialize :education
  serialize :skill
  # other model code
end
Now, you can just save the Hash or Array in the database field!
Exp.new(:skill => ['Cooking', 'Singing', 'Dancing'])
You can do it using a serialized list in a single column (comma-separated), but it's a really bad idea; read these answers for the reasoning:
Is storing a delimited list in a database column really that bad?
How to store a list in a column of a database table
I suggest changing your schema to have a one to many relationship between users and skills.
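A minimal sketch of that one-to-many setup (model and column names are illustrative):

class CreateSkills < ActiveRecord::Migration[5.2]
  def change
    create_table :skills do |t|
      t.references :user, null: false, foreign_key: true
      t.string :name, null: false
      t.timestamps
    end
  end
end

class User < ActiveRecord::Base
  has_many :skills
end

class Skill < ActiveRecord::Base
  belongs_to :user
end

# Finding all users with a particular skill then stays a simple, indexable query:
# User.joins(:skills).where(skills: { name: "Cooking" })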
Rails 4 and PostgreSQL come with hstore support out of the box; more info here. In Rails 3 you can use a gem to enable it.
It depends on what kind of functionality you want. If you want to bind the Exp model attributes to a form (for new and update operations) and put some validations on them, it is better to keep them in a separate table. On the other hand, if these are just attributes that you simply need to keep in the database, store them in a single column. There is a way to keep serialized objects such as arrays and hashes in database columns: make the attribute an array/hash as per your need and save it like this.
http://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Serialization/ClassMethods.html#method-i-serialize
Serialized attributes are automatically deserialized when they are pulled out of the table and serialized automatically when saved.
I am currently building a Rails app where there is a "documents" data table that stores references to pdfs living on an S3 server. These documents could have 100 different types. Each type can have up to 20 attributes or meta info.
My dilemma is: do I make 100 relational tables, one for every doc type, or just create one key/value data table with a reference to the doc_id?
My gut tells me to go key/value for flexibility for searching and supporting more and more document types over time without having to create new migrations. However, I know there are pitfalls with this technique. My first concern of course is the size of the table. The key/value table could end up with millions of rows.
On the other hand, having 100 attribute tables would be a nightmare to query against in a full-text search situation.
So bottom line is, by going with key/value, is performance on a 3 column Postgres table with potentially millions of rows a scaling problem? Also what about joins on the value field?
This data would almost never change by the way. So it would be 90% reads.
Consider a single table with an hstore column. It is a PostgreSQL data type designed for storing key/value pairs.
http://www.postgresql.org/docs/9.1/static/hstore.html
There are also multiple Ruby gems that add hstore support to ActiveRecord. Here is one that I wrote: https://github.com/JackC/surus You can search ruby gems for about a dozen more alternatives as well.
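On a reasonably recent Rails this could look roughly like the following (the documents table and the meta column name are assumptions based on the question's description):

class AddMetaToDocuments < ActiveRecord::Migration[5.2]
  def change
    enable_extension "hstore"                      # one-time setup per database
    add_column :documents, :meta, :hstore
    add_index :documents, :meta, using: :gin       # speeds up key/value containment lookups
  end
end

# Querying against the hstore column:
# Document.where("meta @> hstore(?, ?)", "doc_type", "invoice")   # key/value match, can use the GIN index
# Document.where("exist(meta, ?)", "expires_on")                  # documents that define the key at all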
I've discovered a new thing about Postgres: Composite types. I really like this approach and it will be very useful for me.
The problem is that Rails' ActiveRecord doesn't have native support for this.
Have you ever used Postgres' composite types with Rails? Was it a good experience, or do you prefer the common approach of creating new models for this nested data?
http://www.postgresql.org/docs/8.4/static/rowtypes.html
Tks! :-)
This is an interesting feature of PostgreSQL, however I have not had a chance to work with it.
A few things come to mind on the Rails side:
ActiveRecord would have a tough time formatting the SQL necessary to query these objects. You would probably have to write custom SQL to take the special syntax into account.
ActiveRecord will not be able to do implicit casting of the custom types. This could be problematic when trying to access the attributes through ActiveRecord. You can extend the PostgreSQL adapter for ActiveRecord to cast these special data types into custom classes, however this is a non-traditional approach.
A few things come to mind on the database side:
Because of the way these types collapse multiple attributes into a single entity, this approach could be difficult to query. This includes specifying conditions where you need to check an individual attribute for a particular value. Additionally, if any of these composite types contain key references, it could be difficult to perform CASCADE operations.
This schema approach could be difficult to index if performance becomes an issue
This schema approach doesn't seem to be normalized in a way a database should be. In the examples provided, this composite data should exist as a separate table definition, with a foreign key reference in the parent table.
Unless the specific application you have in mind has compelling benefits, I would suggest a more normalized approach. Instead of:
CREATE TYPE inventory_item AS (
    name text,
    supplier_id integer,
    price numeric
);

CREATE TABLE on_hand (
    item inventory_item,
    count integer
);
INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);
You could achieve a similar result by doing the following, while keeping full support for ActiveRecord without having to extend the Postgres adapter, or create custom classes:
CREATE TABLE inventory_item (
    id serial PRIMARY KEY,
    name text,
    supplier_id integer,
    price numeric
);

CREATE TABLE on_hand (
    inventory_item_id integer REFERENCES inventory_item (id),
    count integer
);

INSERT INTO inventory_item (name, supplier_id, price)
    VALUES ('fuzzy dice', 42, 1.99)
    RETURNING id;

INSERT INTO on_hand (inventory_item_id, count) VALUES (<inventory_item_id>, 1000);