Rails, Active Record - number of rows in tables - ruby-on-rails

I got an email from Heroku saying I have too many rows in my Postgres DB.
How can I see how many rows I have in each table (so I can prioritize deletion)?

heroku pg:psql (specify database name here if you have more than one)
Then check out this post to get a row count in Postgres: "How do you find the row count for all your tables in Postgres"
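If you'd rather stay in Ruby, here is a sketch of the same idea from a Rails console attached to the Heroku app (the query against pg_stat_user_tables is standard PostgreSQL; note the counts are fast estimates from the statistics collector, not exact):

```ruby
# Estimated live-row counts per table, read from PostgreSQL's statistics
# collector. Fast (no sequential scans), but approximate.
rows = ActiveRecord::Base.connection.select_rows(<<~SQL)
  SELECT relname, n_live_tup
  FROM pg_stat_user_tables
  ORDER BY n_live_tup DESC
SQL
rows.each { |table, count| puts format("%-30s %s", table, count) }
```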

A better way to look at this might be:
How many records do I have for each of my AR models?
While the numbers may not exactly match what Heroku reports (they may be a bit lower), and the list may include a few extra entries that are not real models, for practical purposes this is the best option, and it avoids the need to write SQL.
ActiveRecord::Base.descendants.map { |d| [d.all.size, d.name] }.sort
Explanation:
ActiveRecord::Base.descendants is a quick way to get all of the AR models. It is used here to build an array of [number of records, model name] pairs, so a plain sort orders the results by record count. This should be enough to quickly determine where all your rows are going.
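One caveat: in development Rails loads model classes lazily, so descendants may be incomplete until the app is eager-loaded, and abstract models will raise on count. A hedged sketch of a slightly more defensive variant (the eager_load! call and the rescue are assumptions, not part of the original answer):

```ruby
# Rails console sketch: make sure every model class is loaded, then count rows.
Rails.application.eager_load!

counts = ActiveRecord::Base.descendants
  .reject(&:abstract_class?)                 # abstract classes have no table
  .map { |m| [m.count, m.name] rescue nil }  # skip models whose table is missing
  .compact
  .sort
  .reverse                                   # biggest tables first

counts.each { |count, name| puts "#{name}: #{count}" }
```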

Related

Indexing Postgresql JSONB arrays for element existence and unicity

I have a Postgresql 11.8 table named posts where I would like to define a column slugs of type JSONB, which would contain arrays of strings such as ["my-first-post", "another-slug-for-my-first-post"].
I can find a post having a specific slug using the ? existence operator: SELECT * FROM posts WHERE slugs ? 'some-slug'.
Each post is expected to only have a handful of slugs but the amount of posts is expected to grow.
Considering the above query where some-slug could be any string:
How can I define an index to have a reasonably performant query (no full table scan)?
How can I ensure the same slug cannot appear multiple times (across and within the different arrays)?
I am primarily looking for a solution for Postgresql 11 but also would be interested to know solutions in future versions, if any.
The database is used in a Rails 6.0 app so I am also interested by the Rails migration syntax.
You can support the ? operator with a normal GIN index on the jsonb column.
However, you cannot enforce uniqueness of array values.
Particularly if you want to have database constraints, you should not model your data using JSON. Use a regular normalized data model with several tables and foreign keys; then it will be easy to implement such a uniqueness constraint.
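As a sketch in Rails migration syntax (the migration class and post_slugs table names are made-up examples): a plain GIN index on the jsonb column supports the ? operator, and a separate normalized table is what actually gets you the uniqueness guarantee:

```ruby
# Supports: SELECT * FROM posts WHERE slugs ? 'some-slug' without a full scan.
class AddGinIndexToPostsSlugs < ActiveRecord::Migration[6.0]
  def change
    add_index :posts, :slugs, using: :gin
  end
end

# If slugs must be globally unique, store them in their own table instead:
class CreatePostSlugs < ActiveRecord::Migration[6.0]
  def change
    create_table :post_slugs do |t|
      t.references :post, null: false, foreign_key: true
      t.string :slug, null: false
    end
    add_index :post_slugs, :slug, unique: true  # one slug, one post, ever
  end
end
```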

Join an ActiveRecord model to a table in another schema with no model

I need to join an ActiveRecord model in my Ruby on Rails app to another table in a different schema that has no model. I've searched for the answer, and found parts of it, but not a whole solution in one place, hence this question.
I have a Vehicle model, with many millions of rows.
I have a table (reload_cars) in another schema (temp_cars) in the same database, with a few million records. This is an ad hoc table, to be used for one ad hoc data update, and will never be used again. There is no model associated with that table.
I initially was lazy and selected all the reload_cars records into an array (reload_vins) in one query, and then in a second query did something like:
`Vehicle.where(vin_status: :invalid).where('vin in (?)', reload_vins)`.
That's simplified a bit from the actual query, but demonstrates the join I need. In other queries, I need full sets of inner and outer joins between these tables. I also need to put various selection criteria on the model table and/or the non-model table in various steps.
That blunt approach worked fine in development, but did not scale up to the production database. I thought it would take a few minutes, which is plenty fast enough for a one-time operation. But, it timed out, particularly when looping through sets of records. Small tweaks did not help.
So, I need to do a legit join.
In retrospect, the answer seems pretty obvious. This query ran pretty much instantly, and gave the exact expected result, with various criteria on each table.
Here is one such query:
Vehicle.where(vin_status: :invalid)
       .joins("JOIN temp_cars.reload_cars tcar ON tcar.vin = vehicles.vin")
       .where("tcar.registration_id IS NOT NULL")

How to use offset and limit for inserting data in database

I am learning Ruby on Rails and SQLite. I saw some interesting code for the seed file while I was looking for ways to create one:
Classroom.all.each_with_index do |classroom, i|
classroom.students << [Student.limit(8).offset(i*2)]
end
I understand that it is inserting students into the classroom, but I don't understand what limit and offset are doing.
I tried to search online for this and found https://apidock.com/rails/ActiveRecord/QueryMethods/offset but it didn't make anything clear to me.
Any suggestion for the resources where I can find more info on this or any example that can help me to understand this?
Sure (see http://guides.rubyonrails.org/active_record_querying.html for details)
Student, through relation methods like these, represents a SQL query that fetches rows from the students table.
What each of these methods like limit and offset is doing is modifying the underlying SQL query that Rails is building.
limit(8) means fetch at most 8 rows.
offset(i*2) means skip the first i*2 rows before taking them.
If it were just an array (and not a database table), it would be like saying
students[i*2..i*2 + 8 - 1]
Note: I wonder if this code has a bug. It would make more sense as offset(i*8); then the code would take successive groups of 8 students and put each group in a different classroom.
As it is, it takes the first 8 students and puts them in the first classroom, then takes students 3-10 and puts them in the next classroom, and so on, so some students will end up in up to 4 different classrooms!
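The overlap is easy to see with plain arrays; this standalone sketch (no database needed) simulates limit(8).offset(...) with array slicing:

```ruby
# 24 students, 3 classrooms, groups of 8.
students = (1..24).to_a

# offset(i*2) steps by 2, so successive 8-student windows overlap:
buggy = (0...3).map { |i| students[i * 2, 8] }
# offset(i*8) steps by 8, giving disjoint groups:
fixed = (0...3).map { |i| students[i * 8, 8] }

overlap = buggy[0] & buggy[1]  # => [3, 4, 5, 6, 7, 8] -- repeated students
puts "students in both of the first two buggy groups: #{overlap.inspect}"
puts "fixed groups: #{fixed.inspect}"
```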

Key/Value Postgres Sql Table Performance

I am currently building a Rails app where there is a "documents" data table that stores references to pdfs living on an S3 server. These documents could have 100 different types. Each type can have up to 20 attributes or meta info.
My dilemma is: do I make 100 relational tables, one for each doc type, or just create one key/value data table with a reference to the doc_id?
My gut tells me to go key/value for flexibility for searching and supporting more and more document types over time without having to create new migrations. However, I know there are pitfalls with this technique. My first concern of course is the size of the table. The key/value table could end up with millions of rows.
On the other hand, having 100 attribute tables would be a nightmare to query against in a full-text-search situation.
So bottom line is, by going with key/value, is performance on a 3 column Postgres table with potentially millions of rows a scaling problem? Also what about joins on the value field?
This data would almost never change by the way. So it would be 90% reads.
Consider a single table with an hstore column. It is a PostgreSQL data type designed for storing key/value pairs.
http://www.postgresql.org/docs/9.1/static/hstore.html
There are also multiple Ruby gems that add hstore support to ActiveRecord. Here is one that I wrote: https://github.com/JackC/surus You can search ruby gems for about a dozen more alternatives as well.
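A hedged sketch of what that can look like (the table, column, and key names are made-up examples; on Rails of that era an hstore gem such as the one above provides the ActiveRecord side):

```ruby
# Migration: one documents table plus an hstore column for per-type meta info.
class AddMetaToDocuments < ActiveRecord::Migration
  def change
    enable_extension "hstore"                 # loads the Postgres hstore module
    add_column :documents, :meta, :hstore
    add_index :documents, :meta, using: :gin  # speeds up key/value lookups
  end
end

# Querying one attribute across all doc types:
Document.where("meta -> 'doc_type' = ?", "invoice")
```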

Ruby dynamically tied to table

I've got a huge monster of a database (Okay that's not quite true, but there are over 8 million records in one product table)..
This table is fed by 13 suppliers.
Even with the best indexing I could come up with, searching for the top 10,000 records that are ready for supplier 8 is crazy slow.
What I'd like to do is create a product table for each supplier and parse the table into smaller tables.
Now in C++ or what have you, I'd just switch the table I'm working with inside the class.
In Ruby, it seems I'll have to create a new class for each table and do a migration.
Also, as I plan to have some in-session tables, I'd be interested in getting Ruby to work with them.
Oh.. 8 million and set to grow to 20 million in the next 6 months.
A question posed was: what's my DB engine? Right now it's SQL, but I'm open to moving my DB to another engine if that means I can use temp tables and partitioned tables.
One additional point on indexing: indexing on fields that change frequently, like price and quantity, isn't practical; I'd have to re-index the changed items each time I made a change.
By Ruby, I am assuming you mean inheriting from the ActiveRecord::Base class in a Ruby on Rails application. By convention, you are correct in that each class is meant to represent a separate table.
You can easily execute arbitrary SQL using the ActiveRecord::Base.connection.execute method, passing a string that contains your SQL query. This bypasses having to create a separate Ruby class for each transient table. It is not the "Rails approach", but it does address your question of switching tables inside a class file.
More information on ActiveRecord database statements can be found here: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html
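For example, a sketch of bypassing the model layer entirely (the table and column names here are made-up):

```ruby
# Raw SQL against a supplier-specific table that has no ActiveRecord model.
rows = ActiveRecord::Base.connection.execute(
  "SELECT * FROM products_supplier_8 WHERE ready = 1 LIMIT 10000"
)
rows.each do |row|
  # row format depends on the database driver (e.g. a hash keyed by column name)
end
```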
However, as other people have pointed out, you should be able to optimize your query so that splitting across multiple tables is not necessary. You may want to analyze your SQL query's execution plan using various tools. If you are using MySQL, check out its query-execution-plan documentation: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
By introducing indexes, changing join methods between tables, and so on, you should be able to reduce your query execution time.
