In my current project (Rails 2.3) we have a collection of 1.2 million keywords, and each of them is associated with a landing page, which is effectively a search results page for a given keyword. Each of those pages is pretty complicated, so it can take a long time to generate (up to 2 seconds with a moderate load, even longer during traffic spikes, with current hardware). The problem is that 99.9% of visits to those pages are new visits (via search engines), so it doesn't help much to cache a page on the first visit: it will still be slow for that visit, and the next visit could be several weeks away.
I'd really like to make those pages faster, but I don't have too many ideas on how to do it. A couple of things that come to mind:
build a cache for all keywords beforehand (with a very long TTL, a month or so). However, building and maintaining this cache can be a real pain, and the search results on the page might be outdated, or even no longer accessible;
given the volatile nature of this data, don't try to cache anything at all, and just try to scale out to keep up with traffic.
I'd really appreciate any feedback on this problem.

Something isn't quite adding up from your description. When you say 99.9% being new visits, that is actually pretty unimportant. When you cache a page you're not just caching it for one visitor. But perhaps you're saying that for 99.9% of those pages, there is only 1 hit every few weeks. Or maybe you mean that 99.9% of visits are to a page that only gets hit rarely?
In any case, the first thing I would be interested in knowing is whether there is a sizable percentage of pages that could benefit from full page caching? What defines a page as benefitting from caching? Well, the ratio of hits to updates is the most important metric there. For instance, even a page that only gets hit once a day could benefit significantly from caching if it only needs to be updated once a year.
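To make the hit-to-update ratio concrete, here is a rough back-of-the-envelope sketch. The helper name `expected_hit_rate` is made up for illustration, and it assumes a cached page is never evicted between updates, so that only the first request after each update is a miss.

```ruby
# Hypothetical helper: fraction of requests served from cache when a page
# receives `hits_per_update` requests between two consecutive updates.
# Assumes the entry is never evicted, so only the first request misses.
def expected_hit_rate(hits_per_update)
  return 0.0 if hits_per_update <= 1

  (hits_per_update - 1).fdiv(hits_per_update)
end

# One hit a day, one update a year: almost every request is a cache hit.
daily_page = expected_hit_rate(365)

# One hit per update period (e.g. one visit per TTL): caching never helps.
rare_page = expected_hit_rate(1)
```

Under these assumptions the once-a-day page is served from cache more than 99% of the time, while a page that is visited only once per update period gets no benefit at all.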
In many cases page caching can't do much, so then you need to dig into more specifics. First, profile the pages: what are the slowest parts to generate? What parts have the most frequent updates? Are there any parts that depend on the logged-in state of the user (although it doesn't sound like you have users)?
The lowest-hanging fruit (and what will propagate throughout the system) is good old-fashioned optimization. Why does it take 2 seconds to generate a page? Optimize the hell out of your code and data store, but don't go doing things willy-nilly like removing all Rails helpers. Always profile first. (NewRelic Silver and Gold are tremendously useful for getting traces from the actual production environment, and definitely worth the cost.) Then optimize your data store, whether through denormalization or, in extreme cases, by switching to a different DB technology.
Once you've exhausted the reasonable direct optimization strategies, look at fragment caching. Can the most expensive part of the most commonly accessed pages be cached with a good hit-to-update ratio? Be wary of solutions that are complicated or require expensive maintenance.
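As a rough illustration of the fragment-caching idea, here is a minimal in-memory sketch in plain Ruby. It is not the Rails caching API: the `FragmentCache` class, its `fetch` method, and the injectable `clock` are all invented for this example, and a real application would use the framework's cache store instead.

```ruby
# Minimal in-memory fragment cache (illustrative only; not the Rails API).
class FragmentCache
  Entry = Struct.new(:value, :expires_at)

  # `clock` is injectable so expiry can be exercised without sleeping.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = {}
    @clock = clock
  end

  # Returns the cached fragment for `key` if it has not expired; otherwise
  # runs the block to build it and keeps the result for `ttl` seconds.
  def fetch(key, ttl:)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store[key] = Entry.new(value, now + ttl)
    value
  end
end

# Usage: only the expensive part of the page goes through the cache.
cache = FragmentCache.new
html = cache.fetch("results:ruby", ttl: 3600) { "<ul><li>expensive part</li></ul>" }
```

The block runs only on a miss, so the expensive fragment is rebuilt at most once per TTL while the rest of the page is still rendered on every request.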
If there is any cardinal rule to optimizing scalability cost, it is that you want enough RAM to fit everything you need to serve on a regular basis, because this will always get you more throughput than disk access no matter how clever you try to be about it. How much needs to be in RAM? Well, I don't have a lot of experience at extreme scales, but if you have any disk IO contention then you definitely need more RAM. The last thing you want is IO contention for something that should be fast (e.g. logging) because you are waiting for a bunch of stuff that could be in RAM (page data).
One final note. All scalability is really about caching (CPU registers > L1 cache > L2 cache > RAM > SSDs > disk drives > network storage). It's just a question of grain. Page caching is extremely coarse-grained, dead simple, and trivially scalable if you can do it. However, for huge data sets (Google) or highly personalized content (Facebook), caching must happen at a much finer-grained level. In Facebook's case, they have to optimize down to the individual asset. In essence they need to make it so that any piece of data can be accessed in just a few milliseconds from anywhere in their data center. Every page is constructed individually for a single user with a customized list of assets. This all has to be put together in under 500 ms.


How do I figure out why a large chunk of memory is not being garbage collected in Rails?

I'm pretty new to Ruby, Rails, and everything else in that ecosystem. I've joined a team that has a Ruby 3.1.2 / Rails 6.1.7 app backed by a Postgres database.
We have a scenario where, sometimes, memory usage on one of our running instances jumps up significantly and is never relinquished (we've waited days). Until today, we didn't know what was causing it or how to reproduce it.
It turns out that it's caused by an internal tool which was running an unbounded ActiveRecord query -- no limit and no paging. When pointing this tool at a more active customer, it takes many seconds, returns thousands of records, and memory usage increases by tens of MB. No amount of waiting will lead to the memory usage going back down again.
Having discovered that, we recently added paging to that particular tool, and in the ~week since, we have not seen usage increasing in giant chunks anymore. However, there are other scenarios which have similar behavior but with smaller payloads; these cause memory usage to increase gradually over time. We deploy this application often enough that it hasn't been a big deal, but I am looking to gain a better understanding of what's happening and to determine if there's a problem here, because that's not what we should see from a stable application that's free of memory leaks.
My first suspicion was a memoized instance variable on the controller, but a quick check indicates that Rails controllers are discarded as soon as the request finishes processing, so I don't think that's it.
My next suspicion was that ActiveRecord was caching my resultset, but I've done a bunch of research on how this works and my understanding is that any cached queries/relations should be released when the request completes. Even if I have that wrong, a subsequent identical request takes just as long and causes another jump in memory usage, so either that's not it, or caching is broken on our system.
My Google searches turn up lots of results about various caching capabilities in Rails 2, 3, and 5 -- but not much about 6.x, so maybe something significant has changed and I just haven't found it.
I did find ruby-prof, memory_profiler, and get_process_mem, but these all seem suitable only for high-level analysis and wouldn't help me here.
Can I explore the contents of the object graph currently in memory on an existing, live instance of my app? I'm imagining that this would happen in the Rails console, but that's not a constraint on the question. If not, is there some other way that I could find out what is currently in memory, and whether it's just a bunch of fragmented pages or if there's actually something that isn't getting garbage collected?
@engineersmnky pointed out in the comments that maybe everything is fine and that perhaps Ruby is just still holding on to the OS page due to some other still-valid object therein. However, if this is the case, it strikes me as unlikely that memory usage would not go back down to the previous baseline after several days of production usage.
Loading tens of MB worth of resultset into memory should result in the allocation of more than 1,000 16 KB memory pages in just a handful of seconds. It seems reasonable to assume that the vast majority of those would contain exclusively this resultset, and could therefore be released as soon as the resultset is garbage collected.
Furthermore, I can reproduce the increased memory usage by running the same unbounded ActiveRecord query in the Rails console, and when I close that console, the memory goes down almost immediately -- exactly what I was expecting to see when the web request completes. I don't fully understand how the Rails console works when connecting to a running application, though, so this may not be relevant.
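One way to start exploring what is live in a running process, sketched with nothing but Ruby's built-in `ObjectSpace` and `GC` modules: count live objects per class before and after the suspect action, then compare. The helper names `live_counts_by_class`, `growth`, and `heap_summary` are made up for this example. Note the limits: this only sees Ruby objects in that one process, not memory allocated by native code, and walking the whole heap is slow, so treat it as a diagnostic sketch rather than something to leave running.

```ruby
# Count live objects per class, after forcing a full GC so that only
# objects that are still reachable get counted.
def live_counts_by_class
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Classes whose live-object count grew between two snapshots,
# largest growth first.
def growth(before, after, top: 5)
  after.map { |klass, n| [klass, n - before[klass]] }
       .select { |_, delta| delta.positive? }
       .sort_by { |_, delta| -delta }
       .first(top)
end

# Slot-level view of the Ruby heap. Many free slots next to a high process
# RSS hints at fragmentation rather than objects that are still retained.
def heap_summary
  GC.stat.slice(:heap_live_slots, :heap_free_slots)
end

# Usage (e.g. in a console attached to the process under suspicion):
#   before = live_counts_by_class
#   ... run the suspect query or request ...
#   after = live_counts_by_class
#   growth(before, after).each { |klass, delta| puts "#{klass}: +#{delta}" }
```

If model classes keep showing up in the growth list after the request has finished and a GC has run, something is still referencing them. If nothing grows but the process stays large, the memory is more likely held by the allocator than by live Ruby objects.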

Can two processes share same page, in case that the other page isn't "full"?

If I have a process which needs 6 KB of RAM and the page size is 4 KB, I need to allocate two pages. Can another process use the remaining 2 KB for itself, so that the two processes share the same page?
In theory it is possible, but there are significant issues:
As #Peter says, allowing both processes to share the same page would bypass traditional process memory protections, as for most processors access protection granularity would be no smaller than a whole page.
The two processes would have to coordinate in some way about who gets what part of that shared page. This could range from
simple coordination that says process 1 gets the first half and process 2 gets the second half (but this becomes silly when either process needs more memory, since at that point they would probably have been better off simply having their own pages), to
the processes communicating with each other to formalize a split of that page. Such communication would typically require some kind of synchronization, which is a bottleneck in many situations.
Consider multiple threads in the same process — some modern runtime systems, e.g. for Java and C#, provide a separate per-thread heap so that simple memory allocations do not require synchronization with other threads.
Having one page that is not full represents less than a page of waste per process, which is not very high overhead, so not really a problem that needs solving, given the issues of security and coordination.
Effectively, operating systems already share the whole of physical memory between processes, albeit at page granularity, so there is sharing (just not intra-page sharing) and the amount of waste is bounded.
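The arithmetic behind "less than a page of waste per process" can be sketched as follows. The helper names `pages_needed` and `internal_waste` are made up for illustration, and the 4 KB default page size is taken from the question.

```ruby
# Number of whole pages needed to hold `bytes` (rounds up).
def pages_needed(bytes, page_size = 4096)
  (bytes + page_size - 1) / page_size
end

# Bytes left unused in the last page; always smaller than one page.
def internal_waste(bytes, page_size = 4096)
  pages_needed(bytes, page_size) * page_size - bytes
end

# The example from the question: a 6 KB process with 4 KB pages.
pages = pages_needed(6 * 1024)    # 2 pages
waste = internal_waste(6 * 1024)  # 2048 bytes unused in the second page
```

Whatever the allocation size, the unused tail is strictly less than one page, which is why the waste per process stays bounded.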

Is it safe to leave MiniProfiler's ProfiledDbConnection in production code?

Given that MiniProfiler isn't actually running for non-local requests due to the following:
protected void Application_BeginRequest()
{
    if (Request.IsLocal) { MiniProfiler.Start(); }
}
Is it still okay then (performance-wise) to leave the use of ProfiledDbConnection in production code?
var db = new MyDataContext(new StackExchange.Profiling.Data.ProfiledDbConnection(new SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["MyConnectionString"].ConnectionString), MiniProfiler.Current));
In short: yes.
Every page on Stack Overflow is running it, so that connection sees something like a hundred billion uses a month just here (and we're at ~2-4% CPU load at peak traffic). It's meant to be extremely low overhead and safe for production code. Compared to the trip off-box you're about to take to query SQL, Redis, etc. - the profiling bits are very minimal.
I strive to keep it small because our primary use case is Stack Overflow scale and load. The only likely way it becomes problematic is if you store profiles for a large number of requests without any bound. They have to eat storage somewhere, and that might be an issue at scale.
For Stack Overflow: we're profiling all requests and storing the slow ones, for further analysis and to keep an eye on problematic routes. So not only is profiled db connection present, it's profiling, on all requests here. I'm not trying to brag on numbers, only provide a hopefully compelling use case for: it's fine, we're slamming it hard and have a vested interest in keeping it low-overhead.

Generate thumbnail images at run-time when requested, or pre-generate thumbnails on the hard disk?

I was wondering which way of managing thumbnail images has less impact on web server performance.
This is the scenario:
1) each order can have a maximum of 10 images.
2) images do not need to be stored after the order has completed (the maximum period is 2 weeks).
3) there may be a few thousand active orders at any time.
4) orders with images will be visited frequently by customers.
IMO, pre-generating thumbnails on the hard disk is the better solution, as hard disks are cheap, even with RAID.
But what about disk I/O speed and the resources needed to load the images? Will it take more resources than generating thumbnails in real time?
I would appreciate it if you could share your opinion.
I suggest a combination of both - dynamic generation with disk caching. This prevents wasted space from unused images, yet adds absolutely no overhead for repeatedly requested images. SQL and mem caching are not good choices, both require too much RAM. IIS can serve large images from disk while only using 100k of RAM.
While creating http://imageresizing.net, I discovered 29 image resizing pitfalls, and few of them are obvious. I strongly suggest reading the list, even if it's a bit boring. You'll need an HttpModule to be able to pass cached requests off to IIS.
Although - why re-invent the wheel? The ImageResizer library is widely used and well tested.
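The "dynamic generation with disk caching" approach can be sketched in a few lines. This is only an outline in Ruby and not how ImageResizer is implemented: `cached_thumbnail` is a made-up helper, the cache key scheme is an assumption, and the actual resizing is left to a block supplied by the caller.

```ruby
require "digest/sha2"
require "fileutils"

# Returns the path of the cached thumbnail for (source_path, width).
# The block does the real resizing and must return the image bytes;
# it only runs when no cached file exists yet.
def cached_thumbnail(cache_dir, source_path, width)
  key = Digest::SHA256.hexdigest("#{source_path}:#{width}")
  cached = File.join(cache_dir, "#{key}.thumb")

  unless File.exist?(cached)
    FileUtils.mkdir_p(cache_dir)
    # Write to a temporary name first, then rename, so a concurrent
    # request never sees a half-written file.
    tmp = "#{cached}.#{Process.pid}.tmp"
    File.binwrite(tmp, yield(source_path, width))
    File.rename(tmp, cached)
  end

  cached
end

# Usage (resize_image is a placeholder for whatever resizer you use):
#   path = cached_thumbnail("cache/thumbs", "orders/42/photo.jpg", 120) do |src, w|
#     resize_image(src, w)
#   end
```

The first request for a given image and size pays the resizing cost; every later request is just a file the web server can hand out. Since images only need to live for about two weeks, the cache directory can be pruned by file age.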
If the orders are visited frequently by customers, it is better to create the thumbnails once and store them on disk. This way the web server doesn't need to spend as long processing the page, which will speed up the loading time of your web pages.
It depends on your load. If the resource is being requested multiple times then it makes sense to cache it.
Will there always have to be an image? If not, you can create it on the first request and then cache it either in memory or, more likely, in a database for subsequent requests.
However, if you always need the n images to exist per order, and/or you have multiple orders being created regularly, you will be better off passing the thumbnail creation off to a worker thread or some kind of asynchronous page. That way, multiple requests can be queued up, reducing load on the server.

What will happen if an application is too large to be loaded into the available RAM?

Sometimes a heavyweight application needs to be launched on a low-spec system, especially one with very little memory.
Also, what happens when we already have a lot of applications open and keep opening new ones?
So far I have only seen applications become slow or hang for a while when I use them on a low-spec system with little memory and an old processor.
How is the system able to accommodate many applications when memory is low (say 128 MB or less)?
Does it involve paging or something else?
Can someone please explain the theory behind this?
"Heavyweight" is a very vague term. When the OS loads your program, the EXE is mapped in your address space, but only the code pages that run (or data pages that are referenced) are paged in as necessary.
You will likely get horrible performance if pages need to constantly be swapped as the program runs (aka many hard page faults), but it should work.
Since your commit charge is near the commit limit, and the commit limit will likely have no room to grow, you will also likely receive many malloc()/VirtualAlloc(..., MEM_COMMIT)/HeapAlloc()/{Local|Global}Alloc() failures, so you need to watch the return codes in your program.
Some keywords for search engines are: paging, swapping, virtual memory.
Wikipedia has an article called Paging.
Virtual memory is what makes this work. Virtual memory pages are mapped to physical memory only when they are used. If a physical page is needed and none is free, another page is written to disk. This is called swapping, and it explains why crowded systems get slow and why memory upgrades improve performance.
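To see why too little physical memory hurts so much, here is a toy simulation that counts page faults for a sequence of page references. It assumes least-recently-used (LRU) eviction, which is a simplification chosen for this example and not a claim about what any particular operating system does. `count_page_faults` is a made-up name.

```ruby
# Counts page faults for a sequence of page numbers, given a fixed number
# of physical frames and LRU eviction (a simplifying assumption).
def count_page_faults(references, frames)
  resident = [] # least recently used page first
  faults = 0

  references.each do |page|
    if resident.delete(page)
      # Hit: the page is already in memory; it is re-added below as most recent.
    else
      faults += 1
      resident.shift if resident.size >= frames # evict the LRU page
    end
    resident.push(page)
  end

  faults
end

# A program that cycles through three pages:
working_set = [1, 2, 3, 1, 2, 3]
enough_ram  = count_page_faults(working_set, 3) # only the initial loads fault
too_little  = count_page_faults(working_set, 2) # every single access faults
```

With three frames the pages fault in once and then stay resident. With two frames every access evicts a page that is needed again shortly afterwards, which is the constant swapping that makes an overloaded system crawl.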
