Does adding an attribute increase load time? - ruby-on-rails

I'm trying to understand the factors that slow a website down. Say I call Example.all somewhere on a page. Would adding more attributes to my Example model significantly increase the time needed for the page to load, even if I don't use those attributes on that page, since the server might have to iterate through more columns?

It depends.
Adding more attributes to your Example model will increase load time, but compared with other factors the effect is usually too minor to matter.
A couple of things to speed up your page loading (see the sketch after these tips):
Cache your Example model.
Use pagination, and only load the amount of data needed on a single page.
Only select the columns you need. Example:
Example.select(:name, :date).where(score: 0)
Consider adding profiling tools to better measure what your load time is made of. Check out MiniProfiler and this example of how to use it: http://railscasts.com/episodes/368-miniprofiler
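For instance, pagination and column selection together might look something like this in a controller action (a minimal sketch assuming the kaminari gem and a hypothetical ExamplesController, not code from the question):

```ruby
class ExamplesController < ApplicationController
  def index
    # Load only one page of records, and only the columns the view needs.
    @examples = Example.select(:name, :date)
                       .where(score: 0)
                       .page(params[:page])   # kaminari pagination
                       .per(25)
  end
end
```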

Related

Is it better to do direct table loads in a high performance application?

I'm using PostgreSQL in a Rails 3.2 application that receives updates from a third party all day long. Sometimes this third party will throw over 2,000 requests a minute at my application, each update consisting of a large XML file.
Right now I am storing basic information from each XML file into a table. Then, a background process picks up big chunks of data in that table and copies the data into a table using PostgreSQL's COPY feature.
Am I doing the right thing or the wrong thing here? This table that is the load target is also the major CRUD target of the UI. Does the COPY feature lock the entire table when the load happens, and should I be doing a bunch of inserts instead? I originally thought the inserts would be too expensive, but if the direct load locks the whole table then that's going to be a problem.
COPY is the lowest level way to mass-insert records into PostgreSQL. I like your solution to post-process the records in a background job.
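As a loose illustration of that background-job COPY step, using the pg gem's copy_data helper (the events table, its columns, and staged_rows are made-up names for this sketch):

```ruby
require "csv"

# Grab the underlying PG::Connection from ActiveRecord.
conn = ActiveRecord::Base.connection.raw_connection

# Stream the staged rows into the target table in a single COPY operation.
conn.copy_data("COPY events (external_id, payload) FROM STDIN WITH (FORMAT csv)") do
  staged_rows.each do |row|                    # row assumed to be an array of values
    conn.put_copy_data(CSV.generate_line(row))
  end
end
```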
Alternatively, if you need performance but want to keep some Rails/Ruby functionality, consider the activerecord-import gem. The gem performs mass insertions and allows ActiveRecord callbacks and validations to be used as needed. Even if you only use it for post-processing the bulk-COPYed records, it may give you a significant performance increase.
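For reference, a rough sketch of what an import with the gem might look like (Update and parsed_rows are placeholder names, not from the question):

```ruby
require "activerecord-import"   # usually loaded automatically via Bundler

# Build unsaved ActiveRecord objects from the parsed staging data.
updates = parsed_rows.map { |attrs| Update.new(attrs) }

# Issues one multi-row INSERT instead of thousands of individual INSERTs;
# validations still run for each record by default.
Update.import(updates, validate: true)
```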
Here is a good article for using activerecord-import:
http://ruby-journal.com/how-to-import-millions-records-via-activerecord-within-minutes-not-hours/
This is what the Postgres team recommends for optimal import performance: http://www.postgresql.org/docs/current/interactive/populate.html

Which is better - Output Caching 6 Child Actions or 6 DB queries using RenderPartial?

I am attempting to develop a cms system that will use tags to display similar content. For example, underneath the news section there will be articles, blogs, news and forum questions that share the same tags. There will be no more than 5 items in each list.
I am considering 2 options for displaying this related content and am wondering if developers with more experience than myself would recommend one over the other?
Speed is the primary goal, because we have equally maintainable ways of executing both options.
Option 1 - Output Caching RenderAction Results
For each 'similar content' section, render a child action on the corresponding controller and cache the output. This feels more in the spirit of MVC, and would be light on DB calls. But with 5 'similar content' lists, that would equal 6 full MVC cycles for each page request.
I've read that RenderAction can still be expensive, even though it has improved in the last couple of years.
Option 2 - RenderPartials with DB Queries for Each
Alternatively, for each 'similar content' section, we could query the db and use RenderPartial to display the output. Although this would require a small DB query for each section (5 items or less), I'm wondering how that would compare with the performance saved by NOT calling RenderAction.
I've frequently read how much faster RenderPartial is compared to RenderAction.
Essentially your choice boils down to which is faster: RenderAction with an already cached result or 5 DB queries?
When you look at it that way, you really are talking about one solution with no network latency and one solution with network latency (sending the query to the database and receiving a response). Any solution that removes network latency is de facto faster than an alternative with network latency.
Also, bear in mind that purists like to talk about how one thing or another is "slow" compared to some other way. Yes, child actions will always be slower than partials because child actions go through the whole routing infrastructure and then finally to the Razor template engine whereas partials just go directly to the Razor template engine. But, we're talking about highly optimized, compiled code running in memory. "Slower" is measured in milliseconds or even nanoseconds. Sure, those can add up over time, and if you did something crazy like render 50 child actions in a single view, you might see some noticeable performance loss, but this will typically not be an issue worth worrying about in 99.9999% of cases.
Just design your application in the way that makes the most sense for your application and stop worrying about a millisecond here or there.

Identifying where to put controller versus model logic

I'm trying to figure out the best (or rather the most practical/efficient) way of doing something in my rails application. Basically what I have is an area where a user must enter some information in several form fields which I currently have DB columns for (income for example).
With that information obtained from the user, some calculations need to be performed (say, for example: income and rrsp contribution need to be run through a simple formula to determine the approximate taxable income of the user).
My question is: would it be best practice to perform said calculation in a method at the model level and save that processed information in a DB column of its own, or should I perform the calculations on the raw input data at the controller level, reprocessing each time?
I'm guessing it's probably generally best to store the calculated data in the database so it doesn't need to be processed each time, but I'm basically looking for best practices to follow in this case and in general. It probably also depends a lot on the application's specific requirements, I'd guess.
My preference is to store raw (or lightly sanitized) data only. Then turn the formulae you need into methods on your model, or perhaps library/helper functions, depending on the structure of your project as a whole.
When you start storing processed data, you need to start worrying about the task of syncing when the source data changes. This can be messy and hard.
Since computers are fast at arithmetic, and for a web application relying on a database the arithmetic is not likely to be your performance bottleneck, I wouldn't worry about the performance overhead. If it became a bottleneck, I might start to think about a cache layer.
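A minimal sketch of that approach, with made-up model and column names and a placeholder formula rather than real tax rules:

```ruby
class TaxProfile < ActiveRecord::Base
  # income and rrsp_contribution are the raw, user-entered columns.
  def taxable_income
    income - rrsp_contribution   # placeholder formula, not real tax logic
  end
end
```

Because taxable_income is derived on each call, there is no stored value to keep in sync when the raw columns change.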

Entity Framework, dealing with a large number of records (> 35 million)

We have a rather large set of related tables with over 35 million related records each. I need to create a couple of WCF methods that would query the database with some parameters (data ranges, type codes, etc.) and return related results sets (from 10 to 10,000 records).
The company is standardized on EF 4.0 but is open to 4.X. I might be able to make an argument to migrate to 5.0, but it's less likely.
What's the best approach to deal with such a large number of records using Entity? Should I create a set of stored procs and call them from Entity, or is there something I can do within Entity?
I do not have any control over the databases so I cannot split the tables or create some materialized views or partitioned tables.
Any input/idea/suggestion is greatly appreciated.
At my work I faced a similar situation. We had a database with many tables, most of them containing around 7 to 10 million records each. We used Entity Framework to display the data, but the pages were very slow to load (around 90 to 100 seconds); even sorting the grid took time. I was given the task of seeing whether it could be optimized, and after profiling it (ANTS profiler) I was able to get it under 7 seconds.
So the answer is yes, Entity Framework can handle loads of records (in the millions), but some care must be taken:
Understand that the call to the database is made only when the actual records are required; all the other operations just build up the query (SQL). So try to fetch only the data you need rather than a large number of records, and trim the fetch size as much as possible.
Yes, you not only should but must use stored procedures: import them into your model and create function imports for them. You can also call them directly with ExecuteStoreCommand() and ExecuteStoreQuery<>(). The same goes for functions and views, though EF has a really odd way of calling scalar functions: "SELECT dbo.blah(@id)".
EF is slower when it has to populate an entity with a deep hierarchy, so be extremely careful with entities that have deep hierarchies.
Sometimes, when you are retrieving records you are not going to modify, you should tell EF not to watch for property changes (AutoDetectChanges); record retrieval is much faster that way.
Indexing your database is always good, but with EF it becomes very important. The columns you use for retrieval and sorting should be properly indexed.
When your model is large, the VS2010/VS2012 model designer gets really unwieldy, so break your model into medium-sized models. There is a limitation: entities from different models cannot be shared, even if they point to the same table in the database.
When you have to make changes to the same entity in different places, try to pass the same entity around and send the changes only once, rather than having each place fetch a fresh copy, make changes, and store it (a real performance-gain tip).
When you only need one or two columns, try not to fetch the full entity; you can either execute your SQL directly or define a slimmed-down entity of some kind. You may also want to cache some frequently used data in your application.
Transactions are slow, so be careful with them.
If you keep these things in mind, EF should give you performance close to plain ADO.NET, if not the same.
My experience with EF 4.1, code first: if you only need to read the records (i.e. you won't write them back), you will gain a performance boost by turning off automatic change detection for your context:
yourDbContext.Configuration.AutoDetectChangesEnabled = false;
Do this before loading any entities. If you need to update the loaded records, you can always call
yourDbContext.ChangeTracker.DetectChanges();
before calling SaveChanges().
The moment I hear statements like "the company is standardized on EF4, or EF5, or whatever", cold shivers run down my spine.
It is the equivalent of a car rental company saying "We have standardized on a single car model for our entire fleet."
Or a carpenter saying "I have standardized on chisels as my entire toolkit. I will not have saws, drills etc..."
There is something called the right tool for the right job.
This statement only highlights that the person in charge of making key software architecture decisions has no clue about software architecture.
If you are dealing with over 100K records and the data models are complex (i.e. non-trivial), maybe EF6 is not the best option.
EF6 is based on the concept of dynamic reflection and has design patterns similar to Castle Project's ActiveRecord.
Do you need to load all 100K records into memory and perform operations on them? If yes, ask yourself whether you really need to, and why executing a stored procedure across the 100K records wouldn't achieve the same thing. Do some analysis and see what the actual data usage pattern is. Maybe the user performs a search that returns 100K records but only navigates through the first 200. Take a Google search as an example: hardly anyone goes past page 3 of the millions of results.
If the answer is still yes and you really do need to load all 100K records into memory and operate on them, then maybe you need to consider something else, like a custom-built write-through cache with lightweight objects, perhaps with lazily loaded dynamic object pointers for nested objects, and so on. One case where I use something like this is large product catalogs for e-commerce sites where very large numbers of searches are executed against the catalog, in order to provide custom behaviour such as early-exit search, wildcard search using pre-compiled regexes, or custom hashtable indexes into the product catalog.
There is no one-size-fits-all answer to this question. It all depends on the data usage scenarios and how the application works with the data. Consider gorilla vs. shark: who would win? It all depends on the environment and the context.
Maybe EF6 is perfect for one piece that would benefit from dynamic reflection, while NetTiers is better for another that needs static reflection and an extensible ORM, and low-level ADO is perhaps best for extreme high-performance pieces.

Is calling a partial an expensive operation?

Can I use partials as much as I want or do I have to restrain myself to avoid bringing my views to a crawl under much traffic?
There's an obvious overhead to using partials, but it probably isn't something you should worry about.
Partials are files. When you render an action without partials, your action "costs" 1 file (that's not totally true, but this is to simplify the explanation).
If your action renders 4 partials, you end up with a cost of 5. This means you have 4 additional IO calls, and the real cost of each call depends on your server load, server performance and so on.
But does this cost matter? In my experience, 99% of the time, no. Also noteworthy: the benefits of using partials in terms of code readability and maintainability are usually worth the choice.
If performance really is a key requirement, you should probably look for speed improvements elsewhere.
Remember: Ruby is not a super-fast programming language, and code expressiveness has always taken precedence over raw performance. Rails implicitly agrees with this convention, although the Rails team has always kept an eye on performance (and Rails 3 is a practical demonstration that there's always room for improvement).
That said, you can safely use partials and reduce the application overhead with some clever caching. For example, you can place the collection rendering within a cache block so that a render collection statement is executed only once; your app will then simply load 1 cache fragment instead of 10 un-cached partials.
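For example, the collection caching could look roughly like this in a view (the _article partial, the @articles collection, and the cache key are assumed names for this sketch, not from the question):

```erb
<% cache ["similar-articles", @articles.maximum(:updated_at)] do %>
  <%= render partial: "article", collection: @articles %>
<% end %>
```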
One of the sneakiest errors I made many times at the beginning was worrying about performance in the wrong way, that is, without actually running benchmarks. I remember once trying to drop a single database query in favor of a hard-coded hash because "the query costs", without realizing there was another stupid query loading an entire table collection without the include statement, which resulted in the second query running 3 times slower.
So if you really care about performance, you probably shouldn't avoid partials; instead, make sure you are taking advantage of all the other features Rails provides to scale your application.
