I have developed a Rails application where I use one model, Student. The number of students is fairly large, about 10,000. So when I make a JSON call like this:
students.json?subject_id=4
or run a query like this:
@students = Student.where(:subject_id => 4)
it takes a fair amount of time, from 2 to 4 seconds. So I want to use Redis here to store the students, because I think it will reduce this search time to the order of milliseconds.
Actually, I have never used Redis before. I understand that I would have to rewrite the Student model and controller to use Redis. Being an absolute beginner with Redis, I am asking how I should approach the problem. Also, if my understanding is wrong, please clarify. Thanks in advance.
10,000 records isn't that much; I think you should figure out whether the problem is in your DB design. Look at the DB queries made by Rails in the development log or the console, and use SQL's EXPLAIN to see whether an index is used.
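For example, here is a rough sketch of how you might check and fix this (the migration name is made up; on Rails 3.2+ you can call explain on a relation, otherwise copy the SQL from log/development.log and run EXPLAIN in your database console):

# In the Rails console: inspect the query plan for the slow lookup.
puts Student.where(:subject_id => 4).explain

# If no index is used, add one in a migration:
class AddIndexToStudentsOnSubjectId < ActiveRecord::Migration
  def change
    add_index :students, :subject_id
  end
end

With an index on subject_id, a lookup over 10,000 rows should come back in milliseconds without needing Redis at all.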
I'm building my first Rails app and have it working great with Thinking Sphinx. I understand most of it, but would love it if someone could help me clarify a few conceptual questions:
When displaying search results after a Sphinx query, should I be using the sphinx_attributes that are returned from the sphinx query? Or should my view use normal Rails objects, such as @property.title, @property.amenities.title etc.? If I use normal Rails objects, doesn't that mean it's doing extra queries?
In a forum, I'd like to display 'unread posts'. Obviously this is true/false for each user/topic combination, so I'm thinking I should be caching the 'reader' ids within the topic's sphinx index. This way I can quickly do a query for all unread posts for a given user_id. I've got this working, but then realised it's pointless, as there is a time delay between Sphinx indexes. So if a user clicks on an unread post, it will still appear unread until the Sphinx DB is re-indexed.
I'm still on development so I'm manually indexing/rebuilding, but on production, what is a standard time between re-indexing?
I have a model with several text fields - should I concat these all into one column in the sphinx index for a keyword search? Surely this is quicker than indexing all the separate fields.
Slightly off-topic, but just wondering - when you access nested models, for example @property.agents.name, does this affect performance? Or does Rails automatically fetch all associated entries when a property is pulled from the database?
To answer each of your points:
For both of your examples, sphinx_attributes would not be helpful. Firstly, you've already loaded the property, so the title is available directly without an extra database hit. And for property.amenities.title you're dealing with an array of strings, which Sphinx has no concept of. Generally, I would only use sphinx_attributes for complicated calculated attributes, not standard column references.
Yes, you're right, there will be a delay with this value.
It depends on how often your data changes. I have some apps where I can index every day because changes are so rare, but others where we'll run it every 10 minutes. If the data is particularly volatile, I'll look at using deltas (usually via Sidekiq) to have changes reflected in Sphinx in a few seconds.
I don't think it makes much difference either way - unless you want to search on any of those columns separately? If so, it'll need to be a separate field.
By default, as you use each property's agents, the agents for that property will be loaded from the database (one SQL call per property). You could look at the eager loading docs for how to manage this better when you're dealing with multiple records. Thinking Sphinx has the ability to pass through :include options to the underlying ActiveRecord call.
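As a rough illustration of the difference with plain ActiveRecord (model names taken from the question; the Thinking Sphinx option syntax varies between versions, so treat the last line as a sketch):

# N+1: one query for the properties, then one query per property for its agents.
Property.limit(20).each { |property| property.agents.each { |agent| agent.name } }

# Eager loaded: the agents for all 20 properties are fetched in a single extra query.
Property.includes(:agents).limit(20).each { |property| property.agents.each { |agent| agent.name } }

# Passing the same idea through a Thinking Sphinx search (option name depends on your TS version):
# Property.search 'balcony', :include => :agents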
I have been working in Rails (I mean serious work) for the last 1.5 years now. Coming from a .NET background and database/OLAP development, there are many things I like about Rails, but there are a few things about it that just don't make sense to me. I just need some clarification for one such issue.
I have been working on an educational institute's admission process, which is just a small part of a much bigger application. Now, for the administrator, we needed to display a list of all applied/enrolled students (which may range from 1,000 to 10,000), and also give a way to export them as an Excel file. For now, we are just focusing on exporting in CSV format.
My questions are:
Is Rails meant to display so many records at the same time?
Is will_paginate the only way to paginate records in Rails? From what I understand, it still fetches all the records from the DB and then selectively displays the relevant ones. Back in .NET/PHP/JSP, we used to create a stored procedure that selectively returned only the relevant records. Since using stored procedures is a known issue in Rails, what other options do we have?
Same issue with exporting this data. I benchmarked the process, i.e. receiving the request at the server, executing the query, and returning the response. The ActiveRecord creation was taking a helluva time. Why was that? There were only about 1,000 records, and the page showed a connection timeout to the user. I mean, if the connection times out while working on 1,000 records, then why use Rails at all, or does it mean Rails is not meant for such applications? I have previously worked with TBs of data and never had this issue.
I never understood ORM techniques at the core. Say we have a users table that is associated with multiple other tables, but for displaying records we only need data from the users table and its associated admissions table. Does the ORM actually create objects for all of the associated tables? I know the data will be fetched only if we use the association, but does it create all the objects beforehand?
I hope, these questions are not independent and do qualify as per the guidelines of SF.
Thank you.
EDIT: Any help? I re-checked and benchmarked again: for 1,000 records, where we are joining 4-5 different tables (1,000 users, 2-3 one-to-one associations, and 2-3 one-to-many associations), it creates more than 15,000 objects. This is with eager loading. With lazy loading, it would be the 1,000-user query plus some 20+ queries. What are the other possible options for such problems and applications? I know I am kinda bumping the question to come to the top again!
Rails can handle databases with TBs of data.
Is will_paginate the only way to paginate records in Rails?
There are many other gems like "kaminari".
it fetches all records from the db..
No, it doesn't work that way. For example, take the following query: User.all.page(1).per(10)
User.all won't fire a DB query; it will return a proxy object. You then call page(1) and per(10) on the proxy (an ActiveRecord::Relation). Only when you try to access the data from the proxy object will it execute a DB query. ActiveRecord accumulates all the conditions and parameters you pass and executes a SQL query when required.
Go to the Rails console and type u = User.all; "f" (the second statement, "f", is there to stop the console from displaying the result of the assignment, which would trigger the query).
It won't fire any query. Now try u[0]; that will fire a query.
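A minimal sketch of that lazy behaviour with a pagination gem such as Kaminari (the SQL shown is approximate):

users = User.page(1).per(10)  # builds an ActiveRecord::Relation; no SQL has run yet
users.to_a                    # only now does the query run, roughly:
                              # SELECT "users".* FROM "users" LIMIT 10 OFFSET 0

So pagination gems don't fetch everything and slice it in Ruby; the LIMIT/OFFSET ends up in the SQL itself.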
ActiveRecord creation was taking a helluva time
1000 records shouldn't take much time.
Check the number of SQL queries fired against the DB. Look for signs of the N+1 query problem and fix it with eager loading.
Check the serialization of the records to CSV format for any CPU- or memory-intensive operations.
Use a profiler and track down the function that is consuming the most time.
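As a rough sketch of keeping both the query count and the memory footprint in check while exporting (model and column names are placeholders, not your actual schema):

require 'csv'

csv_data = CSV.generate do |csv|
  csv << %w[id name admission_status]
  # find_each loads users in batches (1000 by default) instead of instantiating all rows at once;
  # includes fetches the admissions in one extra query rather than one per user.
  User.includes(:admission).find_each do |user|
    csv << [user.id, user.name, user.admission.status]
  end
end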
I searched for this and was surprised not to find an answer, so I might be overcomplicating this. But basically I have a couple of RESTful models in my Rails 3 application that I would like to keep track of, in a pretty simple way, just for popularity tracking over time. In my case, I'm only interested in tracking hits on the GET/show method: users log in and view these two resources, and their number of visits goes up with each viewing/page load.
So, I have placed a "visits" column on the Books model:
== AddVisitsToBooks: migrating ===============================================
-- add_column(:books, :visits, :integer)
-> 0.0008s
== AddVisitsToBooks: migrated (0.0009s) ======================================
The column initializes to zero; then, basically, inside the books_controller:
def show
  unless @book.owner == current_user # hypothetically, we won't let an owner "cheat" their way to being popular
    @book.visits = @book.visits + 1
    @book.save
  end
end
And this works fine, except that now every time the show method is called, you've got not only a read of the object's record but a write as well. And perhaps that gets to the heart of my question: is the total overhead required just to write the single integer change a big deal in a small-to-midsize production app? Or is it a small deal, or basically nothing at all?
Is there a much smarter way to do it? Everything else I came up with still involved writing to a record every time the given page is viewed. Would indexing the field help, even if I'm rarely searching by it?
The database is PostgreSQL 9, by the way (running on Heroku).
Thanks!
What you described above has one significant con: once the process updates the database (increases the visit counter), the row is locked, and any other process has to wait. I would suggest using a DB sequence for this reason: http://www.postgresql.org/docs/8.1/static/sql-createsequence.html However, you need to maintain the sequence yourself in your code: Ruby on Rails+PostgreSQL: usage of custom sequences
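For reference, ActiveRecord also has increment_counter, which turns the read-then-write in the question's code into a single UPDATE statement (the row is still locked for the duration of that statement, so the concern above about concurrent writers does not fully go away):

# Roughly: UPDATE books SET visits = visits + 1 WHERE id = <@book.id>
Book.increment_counter(:visits, @book.id) unless @book.owner == current_user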
After some more searching, I decided to take the visits counter off of the models themselves, because as MiGro said, it would be blocking the row every time the page is shown, even if just for a moment. I think the DB sequence approach is probably the fastest, and I am going to research it more, but for the moment it is a bit beyond me, and seems a bit cumbersome to implement in ActiveRecord. Thus,
https://github.com/charlotte-ruby/impressionist
seems like a decent alternative; keeping the view counts in an alternate table and utilizing a gem with a blacklist of over 1200 robots, etc, etc.
Ignoring the inefficiency of the query, is there an existing Rails wrapper for abstracting the differences in rand() across databases?
MySQL:
User.first(order: 'rand()')
Postgres:
User.first(order: 'random()')
I can create my own constant RAND_STR but was wondering if one already existed?
Please note - this is not an efficiency question
I know the query is inefficient I'm just wondering about the abstraction
You could do something like User.offset(rand(User.count)).first, but that would wind up executing two queries. Other than that, Rails doesn't have any built-in way to do this that I know of.
See Rails 3: Get Random Record
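Since there is no built-in helper, a small hypothetical wrapper can hide the difference (the adapter-name matching below is an assumption about your setup, not an existing Rails API):

# Returns the SQL random function understood by the current connection adapter.
def random_order_sql
  ActiveRecord::Base.connection.adapter_name =~ /mysql/i ? 'RAND()' : 'RANDOM()'
end

User.order(random_order_sql).first  # PostgreSQL and SQLite both accept RANDOM()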
I'm sure I used to have this problem, but now it seems to have disappeared. Both SQLite and Postgres are accepting RANDOM(). Not sure whether this is Rails' handiwork, but whatever the reason, it is now fixed.
Is there anything (warnings, advice) that I should know if I want to develop an inventory management system using Ruby on Rails? The biggest problem I can think of is how to do long calculations on the stocks. The other one is how to do caching of stock counts. BTW, I'll be using MySQL as the database. Thanks in advance.
I think there is no reason not to write it with Rails.
For caching stock counts, Rails has a built-in mechanism called counter_cache, which caches the number of associated records in a column.
As for doing big calculations on stocks, I don't see why this should be a problem.
And if it turns out to be heavy work, you can put it into a background worker.
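A rough sketch of what pushing a heavy stock recalculation into a background job could look like (Sidekiq is just one option, and the model and column names here are made up):

class StockRecalculationWorker
  include Sidekiq::Worker

  def perform(product_id)
    product = Product.find(product_id)
    # Recompute the cached count from the individual stock movements.
    product.update_column(:stock_count, product.stock_movements.sum(:quantity))
  end
end

# Enqueue it from wherever stock changes:
StockRecalculationWorker.perform_async(product.id)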
There is no argument that speaks against using Ruby on Rails for that.
If you want to do big calculations at the database level (like SUM), be sure to use BIGINT explicitly in your migrations for that column, as MySQL's signed INTEGER supports a maximum of 2,147,483,647, and the result of your calculation will be computed in the same data type by MySQL.
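For example, a sketch of such a migration (table and column names are illustrative):

class AddQuantityToStockMovements < ActiveRecord::Migration
  def change
    # A plain :integer is a signed 32-bit INT on MySQL; limit: 8 maps to BIGINT,
    # so sums over large stocks are not capped at 2,147,483,647.
    add_column :stock_movements, :quantity, :integer, limit: 8
  end
end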
To keep track of cached stock counts, use counter_cache.
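A minimal counter_cache sketch, assuming a hypothetical Item/StockEntry pair of models:

class Item < ActiveRecord::Base
  has_many :stock_entries
end

class StockEntry < ActiveRecord::Base
  # Keeps items.stock_entries_count up to date automatically;
  # requires an integer column stock_entries_count on the items table.
  belongs_to :item, :counter_cache => true
end

Reading item.stock_entries_count is then a plain column read, with no COUNT(*) query.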