Why System.Web.Cache lost cache once in a while? - asp.net-mvc

I've been using System.Web.Cache for a while for testing purpose. It's quite a nice cache store and speed up my webpage quite a lot. But i don't know there are some case, in which i run for a few more more page, and when I turn back to that page few more time, the page query again ( I checked using a profiler ).
Does System.web.cache cache in Ram or some type of flash memory that make the cache go off once in a while when it's low on memory? or is it my mistake somewhere? Is it good to use System.web.cache for production?
best wishes to you :)

The cache will automatically start removing items when your system runs low on memory, the items it picks can be controlled to some degree by the priority you give them when you insert them into the cache. Use the CacheItemPriority enum to specify the priority in the Cache.Add() method. Yes the cache is fine for production, whether it is good or not for your specific implementation only you can tell.

The other issue to watch for is when the IIS application pool gets recycled.

Yes, ASP.NET cache is perfectly fine for production use (however, consider Velocity if you have a web farm). And yes, it does automatically remove items based on memory, item priority & other "metrics".

Related

Memory constantly increasing in Rails app

I recently launched a new Ruby on Rails application that worked well in development mode. After the launch I have been experiencing the memory being used is constantly increasing:
UPDATED: When this screen dump (the one below) from New Relic was taken. I have scheduled a web dyno restart every hour (one out of two web dynos). Thus, it does not reach the 500Mb-crash level and it actually gets a bit of a sig saw pattern. The problem is not at all resolved by this though, only some of the symptoms. As you can see the morning is not so busy but the afternoon is more busy. I made an upload at 11.30 for a small detail, it could not have affected the problem even though it appears as such in the stats.
It could be noted as well that it is the MIN memory that keeps on increasing even though the graph shows AVG memory. Even when the graph seems to go down temporarily in the graph, the min memory stays the same or increases. The MIN memory never decreases!
The app would (without dyno restarts) increase in memory until it reached the maximum level at Heroku and the app crashes with execution expired-types of errors.
I am not a great programmer but I have made a few apps before without having this type of problem.
Troubleshooting performed
A. I thought the problem would lie within the before_filter in the application_controller (Will variables in application controller cause a memory leak in Rails?) but that wasn't the problem.
B. I installed oink but it does not give any results (at all). It creates an oink.log but does not give any results when I run "heroku run oink -m log/oink.log", no matter what threshold.
C. I tried bleak_house but it was deprecated and could not be installed
D. I have googled and read most articles in the topic but I am none the wiser.
E. I would love to test memprof but I can't install it (I have Ruby 1.9x and don't really know how to downgrade it to 1.8x)
My questions:
Q1. What I really would love to know is the name(s) of the variable(s) that are increasing for each request, or at least which controller is using the most memory.
Q2. Will a controller as the below code increase in memory?
related_feed_categories = []
#gift.tags.each do |tag|
tag.category_connections.each do |cc|
related_feed_categories << cc.category_from_feed
end
end
(sorry, SO won't re-format the code to be easily readable for some reason).
Do I need to "kill off" related_feed_categories with "related_feed_categories = nil" afterwards or does the Garbage Collector handle that?
Q3. What would be my major things to look for? Right now I can't narrow it down AT ALL. I don't know which part of the code to look deeper into, and I don't really know what to look for.
Q4. In case I really cannot solve the problem. Are there any online consulting service where I can send my code and get them to find the problem?
Thanks!
UPDATED. After receiving comments it could have to do with sessions. This is a part of the code that I guess could be bad:
# Create sessions for last generation
friend_data_arr = [#generator.age, #generator.price_low, #generator.price_high]
friend_positive_tags_arr = []
friend_negative_tags_arr = []
friend_positive_tags_arr << #positive_tags
friend_negative_tags_arr << #negative_tags
session["last_generator"] = [friend_data_arr, friend_positive_tags_arr, friend_negative_tags_arr]
# Clean variables
friend_data_arr = nil
friend_positive_tags_arr = nil
friend_negative_tags_arr = nil
it is used in the generator#show controller. When some gifts have been generated through my gift-generating-engine I save the input in a session (in case they want to use that info in a later stage). I never kill or expire these sessions so in case this could cause memory increase.
Updated again: I removed this piece of code but the memory still increases, so I guess this part is not it but similar code might causing the error?
That's unlikely our related_feed_categories provoke this.
Are you using a lot of files ?
How long do you keep sessions datas ? Looks like you have an e-commerce site, are you keeping objects in sessions ?
Basically, i think it is files, or session, or an increase in temporary datas flushed when the server crash(memcache ?).
In the middle of the night, i guess that ou have fewer customer. Can you post the same memory chart, in peak hours ?
It may be related to this problem : Memory grows indefinitely in an empty Rails app
UPDATE :
Rails don't store all the datas on client side. I don't remember the default store, bu unless you choose the cookie::store, rails send only datas like session_id.
They are few guidelines about sessions, the ActiveRecord::SessionStore seem to be the best choice for performance purpose. And you shouldn't keep large objects nor secrets datas in sessions. More on session here : http://guides.rubyonrails.org/security.html#what-are-sessions
In the 2.9 part, you have an explanation to destroy sessions, unused for a certain time.
Instead of storing objects in sessions, i suggest you store the url giving the search results. You may even store it in database, offering the possibility to save few research to your customer, and/or by default load the last used.
But at these stage we are still, not totally sure that sessions are the culprits. In order to be sure, you may try on a test server, to stress test your application, with expiring sessions. So basically, you create a large number of sessions, and maybe 20 min later rails has to suppress them. If you find any difference in memory consumption, it will narrow things.
First case : memory drop significantly when sessions expires, you know that's is session related.
Second case : The memory increase at a faster rate, but don't drop when sessions expires, you know that it is user related, but not session related.
Third case : nothing change(memory increase at usual), so you know it do not depend on the number of user. But i don't know what could cause this.
When i said stress tests, i mean a significant number of sessions, not really a stress test. The number of sessions you need, depends on your average numbers of users. If you had 50 users, before your app crashed, 20 -30 sessions may be sginificant. So if you had them by hand, configure a higher expire time limit. We are just looking for differences in memory comsuption.
Update 2 :
So this is most likely a memory leak. So use object space, it has a count_objects method, which will display all the objets currently used. It should narrow things. Use it when memory have already increased a lot.
Otherwise, you have bleak_house, a gem able to find memory leaks, still ruby tools for memory leaks are not as efficient as java ones, but it's worth a try.
Github : https://github.com/evan/bleak_house
Update 3 :
This may be an explanation, this is not really memory leak, but it grows memory :
http://www.tricksonrails.com/2010/06/avoid-memory-leaks-in-ruby-rails-code-and-protect-against-denial-of-service/
In short, symbols are keep in memory until your restart ruby. So if symbols are created with random name, memory will grow, until your app crash. This don't happen with Strings, the are GCed.
Bit old, but valid for ruby 1.9.x Try this : Symbol.all_symbols.size
Update 4:
So, your symbols are probably the memory leak. Now we still have to find where it occurs. Use Symbol.all_symbols. It gives you the list. I guess you may store this somewhere, and make a diff with the new array, in order to see what was added.
It may be i18n, or it may be something else generating in an implicit way like i18n. But anyway, this is probably generating symbols with random data in the name. And then these symbols are never used again.
Assuming category_from_feed returns a string (or a symbol perhaps), a magnitude of 300MB increase is quite unlikely. You can roughly arrive at this by profiling this:
4_000_000.times {related_feed_categories << "Loooooooooooooong string" }
This snippet would shoot the memory usage up by about 110MB.
I'd look at DB connections or methods that read a file and then don't close it. I can see that it's related to feeds which probably means you might be using XML. That can be a starting point too.
Posting this as answer because this looks bad in comments :/

safe and performing way to save site wide variable

I am building a rails app, have a site wide counter variable (counter) that is used (read and write) by different pages, basically many pages can cause the counter increment, what is a multi-thread-safe way to store this variable so that
1) it is thread-safe, I may have many concurrent user access might read and write this variable
2) high performing, I originally thought about persist this variable in DB, but wondering is there a better way given there can be high volume of request and I don't want to make this DB query the bottleneck of my app...
suggestions?
It depends whether it has to be perfectly accurate. Assuming not, you can store it in memcached and sync to the database occasionally. If memcached crashes, it expires (shouldn't happen if configured properly), you have to shutdown, etc., reload it from the database on startup.
You could also look at membase. I haven't tried it, but to simplify, it's a distributed memcached server that automatically persists to disk.
For better performance and accuracy, you could look at a sharded approach.
Well you need persistence, so you have to store it in the Database/Session/File, AFAIK.

When/what to cache in Rails 3

Caching is something that I kind of ignored for a long time, as projects that I worked on were on local intranets with very little activity. I'm working on a much larger Rails 3 personal project now, and I'm trying to work out what and when I should cache things.
How do people generally determine this?
If I know a site is going to be relatively low-activity, should I just cache every single page?
If I have a page that calls several partials, is it better to do fragment caching in those partials, or page caching on those partials?
The Ruby on Rails guides did a fine job of explaining how caching in Rails 3 works, but I'm having trouble understanding the decision-making process associated with it.
Don't ever cache for the sake of it, cache because there's a need (with the exception of something like the homepage, which you know is going to be super popular.) Launch the site, and either parse your logs or use something like NewRelic to see what's slow. From there, you can work out what's worth caching.
Generally though, if something takes 500ms to complete, you should cache, and if it's over 1 second, you're probably doing too much in the request, and you should farm whatever you're doing to a background process…for example, fetching a Twitter feed, or manipulating images.
EDIT: See apneadiving's answer too, he links to some great screencasts (albeit based on Rails 2, but the theory is the same.)
You'll want to think about caching several kinds of things:
Requests that are hit a lot, and seldom change
Requests that are "expensive" to draw, lots of database calls, etc. Also hopefully these seldom change.
The other side of caching that shouldn't go without mention, is expiration. Its also often the harder part. You have to know when a cache is no longer good, and clear it out so fresh content will be generated. Sweepers, or Observers, depending on how you implement your cache can help you with this. You could also do it just based on a time value, allow caches to have a max-age and clear them after that no matter what.
As for fragment vs full page caching, think of it in terms of how often those parts are updated. If 3 partials of a page are never updated, and one is, maybe you want to cache those 3, and allow that 1 to be fetched live for so you can have up to the second accuracy. Or if the different partials of a page should have different caching rules: maybe a "timeline" section is cached, but has a cache-age of 1 minute. While the "friends" partial is cached for 12 hours.
Hope this helps!
If the site is relatively low activity you shouldn't cache any page. You cache because of performance problems, and performance problems come about because you have too much data to query, too many users, or worse, both of those situations at the same time.
Before you even think about caching, the first thing you do is look through your application for the requests that are taking up the most time. Not the slowest requests, but the requests your application spends the most aggregate time performing. That is if you have a request A that runs 10 times at 1500ms and request B that runs 5000 times at 250ms you work on optimizing B first.
It's actually pretty easy to grep through your production.log and extract rendering times and URLs to combine them into a simple report. You can even do that in real-time if you want.
Once you've identified a problematic request, you go about picking apart what it's doing to service the request. The first thing is to look for any queries that can be combined by using eager loading or by looking ahead a bit more to anticipate what you'll need. The next thing is to ensure you're not loading data that isn't used.
So many times you'll see code to list users and it's loading 50KB per person of biographical data, their Facebook and Twitter handles, literally everything about them, and all you use is their name.
Fetch as little as you need, and fetch it in the most efficient way you can. Use connection.select_rows when you don't need models.
The next step is to look at what kind of queries you're running, and how they're under-performing. Ensure your indexes are all set properly and are being used. Check that you're not doing complicated JOIN operations that could be resolved by a bit of tactical de-normalization.
Have a look at what data you are storing in your application, and try and find things that can be removed from your production database and warehoused somewhere else. Cycle your data out regularly when it's no longer relevant, preserve it in a separate database if you need to.
Then go over and have a look at how your database server is tuned. Does it have sufficiently large buffers? Is it on hardware that could be upgraded with more memory at a nominal cost? Too many people are running a completely un-tuned database server and with a few simple settings they can get ten-fold performance increases.
If, and only if, you still have a performance problem at this point then you might want to consider caching.
You know why you don't cache first? It's because once you cache something, that cached data is immediately stale. If parts of your application use this data under the assumption it's always up to date, you will have problems. If you don't expire this cache when the data does change, you will have problems. If you cache the data and never use it again, you're just clogging up your cache and you will have problems. Basically you'll have lots of problems when you use caching, so it's often a last resort.

ASP.NET MVC 3: what and when to cache and how to decide?

I have been neglecting learning about caching for quite some time now, and although I've used caching here and there in the past it's not something I'm familiar with.
I found a great tutorial about what caching is and what kinds of cache there are (I know already what caching is), but...
How does one decide what and when to cache? Are there things that should always be cached? On what situations should you never use caching?
First rule is: Don't cache until you need it, that would be premature optimization (first link I found, google for more info)
The biggest problem with caching is invalidation of cache. What happens when the data you have cached is being updated. You need to make sure your cache is updated as well and if not done correctly often becomes a mess.
I would:
Build the application without
caching and make sure the
functionality works as intended
Do some performance testing, and
apply caching when needed
After applying caching do
performance testing again to check
that you are getting the expected speed increase
I think the easiest way is to ask yourself a bunch of questions,
Is this result ever going to change?
No? then cache it permanently
Yes, When is it going to change? When a user updates something.
Is it going to impact only the particular user who changed the value or all of the users. This should give you an indication of when to clear the particular cache.
You can keep on going, but after awhile you will end up with different profiles
UserCache, GlobalCache just being 2 examples.
These profiles should be able to tell you what to cache and have a certain update criteria (When to refresh the cache)

possible to have two working sets? 1) data 2) code

In regards to Operating System concepts... Can a process to have two working sets, one that represents data and another that represents code?
A "Working Set" is a term associated with Virtual Memory Manangement in Operating systems, however it is an abstract idea.
A working set is just the concept that there is a set of virtual memory pages that the application is currently working with and that there are other pages it isn't working with. Any page that is being currently used by the application is by definition part of the 'Working Set', so its impossible to have two.
Operating systems often do distinguish between code and data in a process using various page permissions and memory protection but this is a different concept than a "Working set".
This depends on the OS.
But on common OSes like Windows there is no real difference between data and code so no, it can't split up it's working set in data and code.
As you know, the working set is the set of pages that a process needs to have in primary store to avoid thrashing. If some of these are code, and others data, it doesn't matter - the point is that the process needs regular access to these pages.
If you want to subdivide the working set into code and data and possibly other categorizations, to try to model what pages make up the working set, that's fine, but the working set as a whole is still all the pages needed, regardless of how these pages are classified.
EDIT: Blocking on I/O - does tis affect the working set?
Remember that the working set is a model of the pages used over a given time period. When the length of time the process is blocked is short compared to the time period being modelled, then it changes little - the wait is insignificant and the working set over the time period being considered is unaffected.
But when the I/O wait is long compared to the modelled preriod, then it changes much. During the period the process is blocked, it's working set is emmpty. An OS could theoretically swap out all the processes' pages on the basis of this.
The working set model attempts to predict what pages the process will need based on it's past behaviour. In this case, if the process is still blocked at time t+1, then the model of an empty working set is correct, but as soon as the process is unblocked, it's working set will be non-empty - the prediction by the model still says no pages needed, so the predictive power of the model breaks down. But this is to be expected - you can't really predict the future. Normally. And the working set is expected to change over time.
This question is from the book "operating system concepts". The answer they are looking for (found elsewhere on the web) is:
Yes, in fact many processors provide two TLBs for this very reason. As
an example, the code being accessed by a process may retain the same
working set for a long period of time. However, the data the code
accesses may change, thus reflecting a change in the working set for
data accesses.
Which seems reasonable but is completely at odds with some of the other answers...

Resources