Optimising a large number of rows in a Rails app database - ruby-on-rails

We have a number of Meters which read a number of Parameters at a given interval and upload the data (in CSV format) to a MySQL database.
I have modelled this in Rails as follows:
Meter
has_many :parameters
Parameter
belongs_to :meter
has_many :readings
Reading
belongs_to :parameter
(I've used normal foreign keys - meter_id and parameter_id - to link the tables)
This is working great with my seed data and I'm using self.readings.pluck(:value).latest in my Parameter model in order to grab the latest value and pass it to the view.
The only problem is that the meters upload the data every 30 seconds. This means that, as there are currently 20 parameters, just over a month's worth of data has left me with over 20,000,000 rows in my Readings table, and the queries to grab the latest value are taking around 500ms each.
I'm after suggestions of ways to optimise this. I've added an index to the parameter_id field but, other than that, I'm not really sure of the best way to proceed...
It may be that I need to rethink the way that my database is structured, but this seemed to make most sense as I want to be able to dynamically add new parameters down the line (hence why I couldn't just make my columns the parameter names) and this seems to be the way that Rails stores data by default.
Thanks in advance.

If you are using Rails 3 and want to keep using a relational database, your best option is table partitioning.
If you use PostgreSQL you can use the partitioned gem; check these slides to get an overview.
If you want to use Rails 4, the partitioned gem is not compatible with ActiveRecord 4, so I would advise you to partition manually. You can use the year as your partition point, for example.
Check this blog post on Sharding and Partitioning and evaluate what should work best.
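As a rough illustration of using the year as the partition point, here is a minimal plain-Ruby sketch that only decides which table a row belongs in. The readings_YYYY naming scheme and the recorded_at key are assumptions made for the example, not something the partitioned gem or Rails prescribes, and creating the tables themselves is left out.

```ruby
# Hypothetical naming scheme: one table per calendar year, e.g. "readings_2014".
def partition_table_for(timestamp, base: "readings")
  "#{base}_#{timestamp.year}"
end

# Group raw rows by the partition they would be written to.
def group_by_partition(rows)
  rows.group_by { |row| partition_table_for(row[:recorded_at]) }
end

rows = [
  { parameter_id: 1, value: 20.5, recorded_at: Time.utc(2013, 12, 31, 23, 59, 30) },
  { parameter_id: 1, value: 20.7, recorded_at: Time.utc(2014, 1, 1, 0, 0, 0) },
  { parameter_id: 2, value: 3.1,  recorded_at: Time.utc(2014, 1, 1, 0, 0, 30) }
]

grouped = group_by_partition(rows)
grouped.each { |table, batch| puts "#{table}: #{batch.size} row(s)" }
```

A query for the latest reading then only has to look at the current year's table instead of the full history.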

Related

Rails and understanding "belong_to"

So I'm pretty new to Rails, and am working on an API that takes POST requests (from a Raspberry Pi) and sets up data in the database.
I have 2 models/schemas:
a "Measurement" model, which simply contains 2 floats (humidity and temp for now)
and a "Unit" model. I'm not 100% sure how I want to do this one, but it will probably just contain an "id" identifying the unit in some way.
Anyway, I want measurements to belong to a unit (so I can reference the units for historical values), i.e. this Raspberry Pi had these temps over the past 5 hours, or whatever.
How would I want to arrange this?
I imagine I'd need at the very least the "Measurement" model to "belong_to" the "Unit" model. Am I forgetting something else, besides the "has_many" for Unit of course? How would I go about creating seed data for this?
I want to eventually be able to have an index page for each "Unit" id that shows the humidity/temps it has sent.
A measurements database record will have a unit_id integer field, matching the id primary key field of the units table.
Rails's ActiveRecord expresses this many-to-one relationship by saying Unit.has_many :measurements, and Measurement.belongs_to :unit.
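Stripped of Rails, the relationship described above is just a foreign-key lookup. A plain-Ruby sketch, where Structs stand in for the two tables (this is not ActiveRecord, and the name field and sample values are made up for the example):

```ruby
# Plain-Ruby stand-ins for the two tables; not ActiveRecord models.
Unit = Struct.new(:id, :name)
Measurement = Struct.new(:id, :unit_id, :humidity, :temp)

units = [Unit.new(1, "pi-kitchen"), Unit.new(2, "pi-garage")]
measurements = [
  Measurement.new(1, 1, 41.0, 21.5),
  Measurement.new(2, 1, 43.5, 21.0),
  Measurement.new(3, 2, 60.2, 12.4)
]

# Roughly what unit.measurements does: select rows whose unit_id matches.
def measurements_for(unit, measurements)
  measurements.select { |m| m.unit_id == unit.id }
end

# Roughly what measurement.unit does: find the row whose id matches unit_id.
def unit_for(measurement, units)
  units.find { |u| u.id == measurement.unit_id }
end

kitchen = units.first
puts measurements_for(kitchen, measurements).map(&:temp).inspect
puts unit_for(measurements.last, units).name
```

In the real app the has_many and belongs_to declarations generate these lookups as SQL queries for you.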
From here, take time to just read your tutorials, to soak in all this before trying to code.

Rails 4 - Ordering by something not stored in the database

I am using Rails 4. I have a Room model with hour_price day_price and week_price attributes.
On the index, users are able to enter different times and dates they would like to stay in a room. Based on these values, I have a helper method that then calculates the total price it would cost them using the price attributes mentioned above.
My question is what is the best way to sort through the rooms and order them least to greatest (in terms of price). Having a hard time figuring out the best way to do this, especially when considering the price value is calculated by a helper and isn't stored in the database.
You could load all of them and do an array sort, as is suggested here and here. That would not scale well, but if you've already filtered down to the rooms that are available it might be sufficient.
You might be able to push it back to the database by building a custom SQL ORDER BY (days is cast to an integer so nothing user-supplied is interpolated raw):
Room.order("(#{days.to_i} * day_price) ASC")
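For the array-sort route, something along these lines would do. This is a plain-Ruby sketch with a Struct standing in for the Room model, and the pricing rule in total_price is an assumption for the example; the helper from the question may well calculate the total differently.

```ruby
# Stand-in for the Room model; only the price attributes from the question.
Room = Struct.new(:name, :hour_price, :day_price, :week_price)

# Hypothetical pricing rule: charge whole weeks, then whole days, then hours.
def total_price(room, hours:)
  weeks, rest = hours.divmod(24 * 7)
  days, hrs = rest.divmod(24)
  weeks * room.week_price + days * room.day_price + hrs * room.hour_price
end

rooms = [
  Room.new("Attic", 10, 80, 400),
  Room.new("Studio", 8, 90, 500),
  Room.new("Loft", 12, 70, 450)
]

# A stay of 2 days and 3 hours, cheapest room first.
sorted = rooms.sort_by { |room| total_price(room, hours: 51) }
sorted.each { |room| puts "#{room.name}: #{total_price(room, hours: 51)}" }
```

sort_by computes the price once per room, which is fine for a filtered list of available rooms but not for a whole table.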

ROR / Database: How to create a custom order list for database records?

I have a database table with some articles (for a website) like so:
Articles:
id title order_id
1 - 1
2 - 4
3 - 3
4 - 2
Now on the webpage I want to use the order_id to order the articles, this works perfectly fine, using ROR active record.
However when I want to update the order_id I would have to update all of the records using this technique, each time a change to the order_id is made. What is a better way of doing this ?
Thanks
You want acts_as_list:
class Article < ActiveRecord::Base
  acts_as_list :column => 'order_id'
end
There's no way around updating lots of records when you perform a reordering, but acts_as_list can do all that for you with methods like Article#move_to_top and Article#move_lower.
There are some gems to solve your problem. The Ruby Toolbox has them in the category Active Record Sortables. At the time of writing (March 2017) the top gems in this list are:
acts_as_list (Website, GitHub)
This is the most popular choice for managing an ordered list in the database and it is still maintained 10 years after its creation. It will do just what you wanted and manage the numbers of the items. The gem will keep your position field numbers from 1 to n in the correct order. This however means that inserting an item in the middle of the list requires increasing the position values of all the list items below it, which can be quite some work for your database.
ranked-model (GitHub)
This gem also manages custom ordered lists for you. However it uses another approach behind the scenes, where your list items get position numbers big and spaced apart across the full range of integer values. This should get you performance benefits if you have large lists and need to reorder the items often. It seems to me that this gem might no longer be maintained though, since the author is now doing Ember.js development, it should work though. Edit: It is still maintained.
sortable (GitHub)
This seems to be the same as acts_as_list but with the ability to put your items into multiple lists. I'm not really sure if this is a valid use case, since you could just create multiple items. It looks like it has not been maintained for a long time and is not used by many.
resort (GitHub)
This gem uses a linked list approach, i.e. every database entry gets a pointer to the next entry. This might be a good idea if you need a lot of inserts in the middle of your lists, but seems like a terrible idea for just getting the list of entries, or if something goes wrong in the database and the chain breaks. It is quite new, so let's see how it develops.
acts_as_restful_list (GitHub)
This gem is "Just like acts_as_list, but restful". It seems to aim for a nicer API. The company behind it no longer exists, so I'd rather use acts_as_list and deal with its API, which is not too bad anyway.
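To make the difference between the dense 1..n numbering and the spaced-apart numbering concrete, here is a small plain-Ruby sketch of the spaced idea. It only illustrates the general technique; it is not ranked-model's actual implementation, and the GAP value, the rank field and the rank_between helper are made up for the example.

```ruby
# Hypothetical spacing between neighbouring rank values.
GAP = 1_000

Article = Struct.new(:id, :rank)

# Rank for a row placed between two neighbours: the integer midpoint.
# Returns nil when no gap is left; a real library would rebalance then.
def rank_between(before_rank, after_rank)
  mid = (before_rank + after_rank) / 2
  (mid > before_rank && mid < after_rank) ? mid : nil
end

articles = [Article.new(1, 1 * GAP), Article.new(4, 2 * GAP), Article.new(3, 3 * GAP)]

# Move article 3 between articles 1 and 4 by updating only its own rank.
moved = articles.find { |a| a.id == 3 }
moved.rank = rank_between(articles[0].rank, articles[1].rank)

ordered = articles.sort_by(&:rank)
puts ordered.map(&:id).inspect
```

Only one row changes here, whereas a dense 1..n scheme would have to renumber every item below the insertion point.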

Ruby dynamically tied to table

I've got a huge monster of a database (Okay that's not quite true, but there are over 8 million records in one product table)..
This table is fed by 13 suppliers.
Even with the best indexing I could come up with, searching for the top 10,000 records that are ready for supplier 8, is crazy slow.
What I'd like to do is create a product table for each supplier and parse the table into smaller tables.
Now in c++ or what have you, I'd just switch the table that I'm working with inside the class.
In ruby, it seems I'll have to create a new class for each table, and do a migration.
Also as I plan to have some in session tables #, I'd be interested in getting ruby to work with them..
Oh.. 8 million and set to grow to 20 million in the next 6 months.
A question posed, was what's my db engine.. Right now it's sql, but I'm open to pushing my db to another engine, if it will mean I can use temp tables, and "partitioned" tables.
One additional point to indexing.. Indexing on fields that change frequently isn't practical. Like price and quantity.. I'd have to re-index the changed items, each time I made a change.
By Ruby, I am assuming you mean that inheriting from the ActiveRecord::Base class in a Ruby on Rails application. By convention, you are correct in that each class is meant to represent a separate table.
You can easily execute arbitrary SQL using the "ActiveRecord::Base.connection.execute" method, and passing a string that is your SQL query. This would bypass having to create separate Ruby classes that would represent transient tables. This is not the "Rails approach", however it does address your question of allowing switching of the tables inside a class file.
More information on ActiveRecord database statements can be found here: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html
However, as other people have pointed out, you should be able to optimize your query such that splitting across multiple tables is not necessary. You may want to analyze your SQL query's execution plan using various tools to optimize the execution. If you are using MySQL, check out its query execution planning functionality: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
By introducing indexes, changing join methods between tables, etc., you should be able to reduce your query execution time.

Rails Caching DB Queries and Best Practices

The DB load on my site is getting really high so it is time for me to cache common queries that are being called 1000s of times an hour where the results are not changing.
So for instance on my city model I do the following:
def self.fetch(id)
  Rails.cache.fetch("city_#{id}") { City.find(id) }
end
def after_save
  Rails.cache.delete("city_#{self.id}")
end
def after_destroy
  Rails.cache.delete("city_#{self.id}")
end
So now when I call City.fetch(1) the first time I hit the DB, but the next 1000 times I get the result from memory. Great. But most of the calls to city are not City.find(1) but @user.city.name, where Rails does not use the fetch but queries the DB again... which makes sense, but is not exactly what I want it to do.
I can do City.fetch(@user.city_id) but that is ugly.
So my question to you guys: what are the smart people doing? What is the right way to do this?
With respect to the caching, a couple of minor points:
It's worth using a slash to separate object type and id, which is the Rails convention. Even better, ActiveRecord models provide the cache_key instance method, which returns a unique identifier built from the table name and id, "cities/13" etc.
One minor correction to your after_save filter. Since you have the data on hand, you might as well write it back to the cache as opposed to delete it. That's saving you a single trip to the database ;)
def after_save
  Rails.cache.write(cache_key, self)
end
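The effect of writing back instead of deleting can be tried outside Rails with a tiny stand-in cache. TinyCache below is a made-up class that only mimics the fetch/write/delete shape of Rails.cache; it is not the Rails implementation, and the city data is invented for the example.

```ruby
# Minimal in-memory stand-in for a cache store. Not the Rails implementation.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value, or run the block, store its result, and return it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  def write(key, value)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end
end

cache = TinyCache.new
db_hits = 0
load_city = lambda { db_hits += 1; { id: 13, name: "Lisbon" } }

cache.fetch("cities/13") { load_city.call }   # miss: hits the "database"
cache.fetch("cities/13") { load_city.call }   # hit: served from memory

# Write-back on save: the fresh record replaces the cached one directly,
# so the next read does not need to reload it.
cache.write("cities/13", { id: 13, name: "Lisboa" })
city = cache.fetch("cities/13") { load_city.call }

puts "#{city[:name]}, database hits: #{db_hits}"
```

With delete in place of write, the last fetch would have gone back to the "database" one more time.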
As to the root of the question, if you're continuously pulling @user.city.name, there are two real choices:
Denormalize the user's city name to the user row: @user.city_name (keep the city_id foreign key). This value should be written at save time.
-or-
Implement your User.fetch method to eager load the city. Only do this if the contents of the city row never change (i.e. name etc.), otherwise you can potentially open up a can of worms with respect to cache invalidation.
Personal opinion:
Implement basic id based fetch methods (or use a plugin) to integrate with memcached, and denormalize the city name to the user's row.
I'm personally not a huge fan of cached model style plugins, I've never seen one that's saved a significant amount of development time that I haven't grown out of in a hurry.
If you're getting way too many database queries it's definitely worth checking out eager loading (through :include) if you haven't already. That should be the first step for reducing the quantity of database queries.
If you need to speed up SQL queries on data that doesn't change much over time, you can use materialized views.
A matview stores the results of a query into a table-like structure of
its own, from which the data can be queried. It is not possible to add
or delete rows, but the rest of the time it behaves just like an
actual table. Queries are faster, and the matview itself can be
indexed.
At the time of this writing, matviews are natively available in Oracle
DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL
doesn’t provide native support for matviews, unfortunately, but there
are open source alternatives to it.
Here are some good articles on how to use matviews in Rails:
sitepoint.com/speed-up-with-materialized-views-on-postgresql-and-rails
hashrocket.com/materialized-view-strategies-using-postgresql
I would go ahead and take a look at Memoization, which is now in Rails 2.2.
"Memoization is a pattern of
initializing a method once and then
stashing its value away for repeat
use."
There was a great Railscast episode on it recently that should get you up and running nicely.
Quick code sample from the Railscast:
class Product < ActiveRecord::Base
  extend ActiveSupport::Memoizable
  belongs_to :category
  def filesize(num = 1)
    # some expensive operation
    sleep 2
    12345789 * num
  end
  memoize :filesize
end
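ActiveSupport::Memoizable was deprecated in Rails 3.2 and is gone from later versions, so on a newer app the same pattern is usually written by hand. A minimal plain-Ruby sketch, assuming a per-argument hash is an acceptable cache (the calls counter is only there to show how often the expensive part runs):

```ruby
class Product
  attr_reader :calls

  def initialize
    @calls = 0
    @filesize = {}
  end

  # Memoize per argument: the expensive body runs once for each distinct num.
  def filesize(num = 1)
    @filesize.fetch(num) do
      @calls += 1            # stands in for the expensive operation
      @filesize[num] = 12345789 * num
    end
  end
end

product = Product.new
product.filesize
product.filesize
product.filesize(2)
puts "#{product.filesize(2)} computed with #{product.calls} expensive call(s)"
```

Note that this only caches for the lifetime of the object, so it helps within a single request but not across requests the way a cache store does.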
More on Memoization
Check out cached_model
