Web application Load Tests: What metrics to look at? - load-testing

During a stress/load test of an ASP.NET app hosted in IIS, what should I be monitoring on the app server?
For example, the built-in Performance Monitor utility in Windows has a huge list of counters that I can monitor, but I don't even know what half of these counters actually mean. I know I want to look at things like memory, processor, and network, but that is pretty general.
How can I successfully find a problem area?
What counters have some of you used in the past?

We watch these metrics to determine whether requests are being serviced promptly and whether the volume scales linearly with the applied load:
Queued Requests
Current Requests
Requests Executing
Requests Succeeded
Requests/sec
We also watch these to look for application problems:
Errors/sec
Unhandled Execution Errors/sec
To monitor the VM memory, we look at:
CLR Heap Size
CLR Generation 0, 1 & 2 Garbage collections
CLR Percent Time in GC
For locking conditions, we watch:
CLR Lock Contentions
CLR Lock Contention/sec
CLR Lock Contention Queue Length
Depending on the application, we might look at others, such as thread counts, but the above are the ones we look at most frequently.
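If you want to capture these outside the Performance Monitor UI during a test run, a small script can poll the same counters and log them. Below is a rough Python sketch that assumes the pywin32 package and the English counter paths shown; the exact category names (e.g. ASP.NET vs. a version-specific category) differ between .NET versions, so verify each path in perfmon first:

    # Sketch: poll a few ASP.NET / CLR counters once per second during a load test.
    # Requires the pywin32 package; the counter paths are assumptions - verify them in perfmon.
    import time
    import win32pdh

    COUNTERS = [
        r"\ASP.NET\Requests Queued",
        r"\ASP.NET Applications(__Total__)\Requests/Sec",
        r"\ASP.NET Applications(__Total__)\Errors Total/Sec",
        r"\.NET CLR Memory(_Global_)\% Time in GC",
        r"\.NET CLR LocksAndThreads(_Global_)\Contention Rate / sec",
    ]

    query = win32pdh.OpenQuery()
    handles = {path: win32pdh.AddCounter(query, path) for path in COUNTERS}
    win32pdh.CollectQueryData(query)  # first sample primes the rate-based counters

    while True:
        time.sleep(1)
        win32pdh.CollectQueryData(query)
        for path, handle in handles.items():
            _, value = win32pdh.GetFormattedCounterValue(handle, win32pdh.PDH_FMT_DOUBLE)
            print(f"{path}: {value:.2f}")

Dumping the values to a CSV file instead of printing them makes it easy to line the counters up against the load generator's own timeline afterwards.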

Related

Perfino System Requirements

We're planning to evaluate and potentially purchase perfino. I went quickly through the docs and cannot find the system requirements for the installation. I also cannot find its compatibility with JBoss 7.1. Can you provide details, please?
There are no hard system requirements for disk space; it depends on the number of business transactions that you're recording. All data is consolidated, so the database reaches a maximum size after a while, but it's not possible to say what that size will be. Consolidation times can be configured in the general settings.
There are also no hard system requirements for CPU and physical memory. A low-end machine will have no problem monitoring 100 JVMs, but the exact details again depend on the number of monitored business transactions.
JBoss 7.1 is supported. "Supported" means that web service and EJB calls can be tracked between JVMs; otherwise, all application servers work with perfino.
I haven't found any official system requirements, but this is what we figured out experimentally.
We collect about 10,000 transactions a minute from 8 JVMs, with a lot of distinct and long SQL queries. We use an AWS machine with 2 vCPUs and 8 GB of RAM.
When the Perfino GUI is not being used, the CPU load is low. However, for the GUI to work properly, we had to raise the heap limit in perfino_service.vmoptions to -Xmx6000m. Before that, we experienced multiple OutOfMemoryErrors in Perfino when filtering in the transactions view. After changing the memory setting, the GUI runs fine.
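For reference, a .vmoptions file takes one JVM option per line, so the change amounts to adding or raising this line in perfino_service.vmoptions (pick a value that suits your own transaction volume and available RAM):

    -Xmx6000m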
This means you need a machine with about 8 GB of RAM. I guess this depends on the number of distinct transactions you collect; our limit is high, at 30,000.
After 6 weeks of usage, there are 7 GB of files in the perfino directory. Perfino can clear old recordings after a configurable time.

Best setting for HTTPJVMMaxHeapSize in Domino 8.5.3 64 Bit

I am trying to find a definitive answer as to what the best setting is for the JVM Heap Size in Domino 8.5.3 FP4 - 64 Bit for Windows.
I know that by default it's set to 1024M. Some websites have suggested 1G / 1024M as the recommended value - but that's the default setting, so is that as good as it gets?
Other sites have said 25% of available RAM.
My Domino server has 12GB of RAM available. It currently has HTTPJVMMaxHeapSize = 1024M, and Task Manager tells me that around 7GB of RAM is in use and nhttp.exe is using around 1.1GB. I want to increase this heap to 2GB or 3GB if possible - will there be any issues doing so?
I'm running Windows Server 2008 R2 Standard Edition.
Speaking from an XPages perspective:
1 - Understand the dynamics of the working set and hardware
That is, understand what the application code, server runtime, and hardware profile are doing when processing a given working set (i.e., the XPages application(s) within the server). Is the application coded in a non-optimal manner in terms of lifecycle execution and memory usage? Is the application making use of memory or disk persistence for component tree serialization? Is the server assigned an adequate amount of JVM memory? Is the hardware providing enough CPU and memory?
2 - Profile and monitor the working set with upper limit loads
To fully answer some of the questions in #1, detailed performance and scalability profiling must be carried out using tools like the XPages Toolbox and Eclipse Memory Analyzer. Furthermore, test the working set using Rational Performance Tester (or some other performance testing tool) to mimic real-life concurrent workloads in a test environment. This allows you to set up a test environment where you can hit your application with (n) concurrent users using automation and collect all that invaluable data on health, etc.
3 - Analyse the profile information to identify bottlenecks within the working set
Remember your working set can be one or more applications. Each doing something different, and having different load requirements. Be specific about the task at hand - do you want to tune the system more generally for all applications (for an average scale) or do you want the server to be fully optimized for a specific application (for a targeted scale)?
4 - Optimize the working set where applicable
Get in and make changes to the XSP / Java / Server-Side JavaScript code where applicable - use your knowledge of the XPages Request Processing Lifecycle and also look out for those hungry JVM memory consumers under high-end load scenarios. Always favor disk persistence (disk storage is cheaper than RAM!) and code your custom Java objects and Managed Beans accordingly to cope with serialization and restoration. And make the trade-offs between function and speed in those scenarios where high scalability ends up burning CPU - a smarter UX with targeted functions / actions can reduce that cost.
5 - Scale the hardware where applicable
Be prepared to increase cores, clock speeds, RAM, disk storage based on the needs of the working set - a cyclic approach to profiling, monitoring, and optimizing the working set will shed more and more light on the suitability of the hardware as this process evolves.
6 - Repeat from #2 until the working set and hardware perform and scale to the expected requirements / load expectancy of the system
I would recommend setting the server's notes.ini parameter JavaVerboseGC=1, which writes garbage collection output to console.log in the IBM_TECHNICAL_SUPPORT directory. After some time, take that log file and open it in IBM Support Assistant with the IBM Monitoring and Diagnostic Tools for Java - Garbage Collection and Memory Visualizer (GCMV) tool.
This could help you interpret the collected data: http://www-01.ibm.com/support/docview.wss?uid=swg27013824&aid=1.
You definitely should do that if you get an OutOfMemoryException.
A heap size set too high can lead to fewer but longer GC runs - the server will periodically "hang" for a very short time. So setting it too high is not recommended.
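As a rough sketch, the relevant notes.ini entries for a 2 GB heap with verbose GC logging would look something like the lines below. HTTPJVMMaxHeapSizeSet=1 is commonly suggested alongside the heap setting so the server does not recalculate the value, but verify both parameters against your Domino version before relying on them:

    HTTPJVMMaxHeapSize=2048M
    HTTPJVMMaxHeapSizeSet=1
    JavaVerboseGC=1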

Is DataSnap Optimized for responding to more than 1k users at the same time?

We want to start a big multi-tier application. The server-side application must respond to more than 1000 users at the same time. We want to build the server application with the 64-bit compiler and the client side with the 32-bit compiler. In this case, we don't know whether DataSnap can serve all of these clients without problems.
The server computer is very powerful (multi-processor and more than 16GB of RAM) and the database management system is Firebird 2.5.
You need a way to perform realistic load tests.
For the Firebird database, you can simulate concurrent users with the free Apache JMeter tool. It can run SQL statements and record their execution time statistics (average, min/max, etc.). So you could, for example, create a thread group with twenty different SQL queries, and then run twenty threads, each of which performs these queries sequentially.
JMeter lets you define time limits on SQL queries and treats it as an error if a query exceeds its limit. You can then try to find the maximum client count at which the overall error rate is still less than (for example) five percent.
You also need to know how high the expected database load will be, and you need a test database of realistic size, not just a couple of records. Some queries, such as reports, may cause higher load - include these in the simulation too, as they can affect overall performance. In JMeter, you can create a second thread group, running in parallel with the first, for these long-running statements with different settings (fewer simulated clients).
Testing the database will show whether there is already a bottleneck in this area. For example, the result could be that the database can serve twenty clients with a total average transaction rate of 20 TPS (transactions per second), which means one client executes one transaction per second. This TPS value will decrease as the user count rises.
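If you prefer scripting this instead of (or alongside) JMeter, the same idea fits in a short Python sketch: a fixed number of worker threads run the query mix in a loop, response times are recorded, and any query slower than the limit counts as an error. The fdb driver, DSN, credentials and queries below are placeholders chosen for illustration:

    # Sketch: N concurrent clients run a fixed query mix against Firebird and report
    # average latency plus the share of queries exceeding a time limit.
    # The fdb package, DSN, credentials and queries are placeholders - adapt them.
    import statistics
    import threading
    import time

    import fdb  # pip install fdb (Firebird DB-API driver)

    DSN, USER, PASSWORD = "dbserver:/data/app.fdb", "SYSDBA", "masterkey"
    QUERIES = ["SELECT COUNT(*) FROM customers", "SELECT COUNT(*) FROM orders"]
    CLIENTS, ITERATIONS, LIMIT_SECONDS = 20, 50, 2.0

    timings, errors, lock = [], [], threading.Lock()

    def client_worker():
        con = fdb.connect(dsn=DSN, user=USER, password=PASSWORD)
        cur = con.cursor()
        for _ in range(ITERATIONS):
            for sql in QUERIES:
                start = time.perf_counter()
                cur.execute(sql)
                cur.fetchall()
                elapsed = time.perf_counter() - start
                with lock:
                    timings.append(elapsed)
                    if elapsed > LIMIT_SECONDS:
                        errors.append(sql)
        con.close()

    threads = [threading.Thread(target=client_worker) for _ in range(CLIENTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(f"queries: {len(timings)}, avg {statistics.mean(timings):.3f}s, "
          f"max {max(timings):.3f}s, error rate {len(errors) / len(timings):.1%}")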
Related question: Firebird usage in big projects which also has a link to http://www.firebirdsql.org/en/case-studies-catalog/
Regarding DataSnap client load simulation: this can be done with a scripted client, which runs a predefined set of statements / commands over the connection.
To run a high number of load-test clients simultaneously, you could use a service like Amazon Elastic Compute Cloud (EC2) to launch clones of your test machine image, saving you hardware costs. But of course I would start with a small client machine that simply runs ten or twenty scripted clients.
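As a sketch of such a scripted client driver: the loop below steps up the number of simulated clients and stops once the error rate crosses the threshold discussed above. The URL is a placeholder - DataSnap servers can expose methods over HTTP/REST, or you can swap the request function for whatever protocol your clients actually speak:

    # Sketch: step-load driver - keep adding scripted clients until the error rate
    # passes the threshold. The URL is a placeholder for your own server endpoint.
    import concurrent.futures
    import urllib.error
    import urllib.request

    URL = "http://testserver:8080/datasnap/rest/TServerMethods/Ping"  # placeholder
    TIMEOUT_SECONDS, MAX_ERROR_RATE = 5.0, 0.05

    def one_request() -> bool:
        try:
            with urllib.request.urlopen(URL, timeout=TIMEOUT_SECONDS) as resp:
                return resp.status == 200
        except (urllib.error.URLError, TimeoutError):
            return False

    for clients in (10, 20, 50, 100, 200):
        with concurrent.futures.ThreadPoolExecutor(max_workers=clients) as pool:
            results = list(pool.map(lambda _: one_request(), range(clients * 20)))
        error_rate = 1 - sum(results) / len(results)
        print(f"{clients} clients: error rate {error_rate:.1%}")
        if error_rate > MAX_ERROR_RATE:
            print("stopping - error rate above threshold")
            break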
As far as I know, DataSnap is based on Indy, and Indy's connection handling model - one thread per connection - is not very scalable and is very resource-consuming. Even using Indy's thread pools is not an option, I think. Also, in 32-bit Windows, for example, there is a limit on the number of threads you can create (2000, IIRC). Anyway, using many threads is not good and hurts server performance (for reference, see the Windows Internals book, the Windows Performance Team blog, etc.).
A scalable, robust, professional application server would use I/O completion ports (IOCP) for data processing. But I don't know whether DataSnap can take advantage of them.
UPDATE:
At CodeRage 7 I asked similar scalability questions. Here are the answers:
Q: Recently there was a question on StackOverflow about DataSnap's scalability/performance. So can DS handle, for example, 2000 or more concurrent user requests at the network and application level?
A: The scalability is based on the scalability of TCP/HTTP/HTTPS and the number of connections allowed by your server operating system, and also on the memory and hardware you employ. There is no specific limit in DataSnap.
My comment: While this is true, Indy's connection handling model, i.e. one thread per connection, introduces a bottleneck, especially on 32-bit Windows (2000 threads max). On Win64 it should be less of a problem, but again, this way of handling data flow leads to performance degradation.
Q: Does DataSnap support some kind of load balancing?
A: Not directly. You can do this in code in your DataSnap server(s).
My comment: I've found a very good paper on implementing failover/load balancing in DataSnap on Andreano Lanusse's blog.
Q: Does DataSnap support I/O completion ports for better scalability?
This question of mine was left unanswered.
Hope this helps!
UPDATE2:
I found a very interesting post on DataSnap performance: DataSnap analysis based on Speed & Stability tests
UPDATE3:
DataSnap, Deployment, Performance, and More (Marco Cantu)
Monitoring and control of connections in DataSnap XE2 - translated into English
Monitoring and control of connections in DataSnap XE2 - original
When the specifications for a system are written, you need to be very precise about multiple users.
For example: you create a website, and the client expects 15,000 unique users.
Then the client usually comes up with the requirement that the system must support 15,000 simultaneous users, which is very naive.
You'll need a more detailed specification than that.
Usually it's more sensible to say something like: for 99% of the requests, 99% of the users get a response within 5 seconds on average.
In normal usage, you'll never see all users send a request within the same second. If at some point they all arrive within the same minute (also very unlikely), you'll have a lot fewer concurrent users.
Even for websites with tens of thousands of users, where most of them connect on a daily basis, the web server is idle most of the time; once in a while it jumps to 5% or, in extreme cases, to 20%. If we really had to serve all of these users at once we'd be in trouble, but that never happens, and it's not realistic to provision a server for such loads.
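To turn a vague "15,000 users" figure into something you can actually test against, a back-of-the-envelope calculation is usually enough - essentially Little's Law (requests in flight = arrival rate times time per request). The traffic figures below are invented purely for illustration:

    # Rough capacity estimate from assumed traffic figures (all numbers are invented).
    daily_users = 15_000        # unique users per day
    visits_per_user = 2         # sessions per user per day
    requests_per_visit = 20     # page/API requests per session
    peak_factor = 10            # peak hour carries roughly 10x the average hourly traffic
    avg_response_seconds = 0.5  # average server time per request

    requests_per_day = daily_users * visits_per_user * requests_per_visit
    avg_rps = requests_per_day / (24 * 3600)
    peak_rps = avg_rps * peak_factor
    # Little's Law: requests in flight = arrival rate * time each request spends in the system
    concurrent_requests = peak_rps * avg_response_seconds

    print(f"average: {avg_rps:.1f} req/s, peak: {peak_rps:.1f} req/s, "
          f"about {concurrent_requests:.0f} requests in flight at peak")

Even with generous assumptions, 15,000 daily users works out to roughly 70 requests per second at peak and only a few dozen requests in flight - a very different target from 15,000 simultaneous connections.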

Windows Mobile memory corruption

Does the WM operating system protect processes' memory from one another?
Can one badly written application crash some other application just by mistakenly writing over the first one's memory?
Windows Mobile, at least in all current incarnations, is built on Windows CE 5.0 and therefore uses CE 5.0's memory model (which is the same as it was in CE 3.0). The OS doesn't actually do a lot to protect process memory, but it does enough to generally keep processes from interfering with one another. It's not hard and fast, though.
CE processes run in "slots", of which there are 32. The currently running process gets swapped to slot zero, and its addresses are re-based to zero (so all memory in the running process effectively has 2 addresses: the slot 0 address and its non-zero slot address). These addresses are protected (though there's a simple API call to cross the boundary). This means that pointer corruption, etc. will not step on other apps - but if you want to, you still can.
CE also has the concept of shared memory. All processes have access to this area and it is 100% unprotected. Your app may be using shared memory without realizing it - the memory manager can give you a shared address without you specifically asking, depending on your allocation and its size. If you have shared memory, then yes, any process can access that data, including corrupting it, and you will get no error or warning in either process.
Does the WM operating system protect processes' memory from one another?
Yes.
Can one badly written application crash some other application just by mistakenly writing over the first one's memory?
No (but it might do other things, like use up all the 'disk' space).
Even if you're a device driver, there's an API you must invoke explicitly to get permission to write to memory that's owned by a different process.
While ChrisW's answer is technically correct, my experience of Windows Mobile is that it is much easier to crash the entire device from an application than it is on the desktop. I could guess at a few reasons why this is the case:
The operating system is often much more heavily OEMed than desktop Windows; that is, the amount of manufacturer-specific low-level code can be very high, which leads to manufacturer-specific bugs at a level that can cause bad crashes. On many devices it is common to see a new firmware revision every month or so, where the revisions are fixes to such bugs.
Resources are scarcer, and an application that exhausts all available resources is liable to cause a crash.
The protection mechanisms and architecture vary quite a bit. The device I'm currently working with is SH4-based, while you mostly see ARM, x86 and the odd MIPS CPU.

Monitoring CPU Core Usage on Terminal Servers

I have multi-core Windows 2003 terminal servers. I'm looking for a way to monitor individual CPU core usage on these servers. It is possible for an end user to have a runaway process (e.g. Internet Explorer or Outlook). The core for that process may spike to near 100% while the other cores stay 'normal'. The overall CPU usage on the server is just the average of all the cores, so if 7 of the cores on an 8-core server are idle and the 8th is running at 100%, overall usage shows as 1/8 = 12.5%.
What utility can I use to monitor multiple servers? If the CPU usage for a core is "high", what would I use to determine the offending process, and then how could I automatically kill that process if it is on the 'approved kill process' list?
A product from http://www.packettrap.com/ called PT360 would be perfect, except it uses SNMP to get data, and SNMP appears to give only total CPU usage, not broken out by individual core. Take a look at their Dashboard option with the CPU gauge 'gadget'. That's exactly what I need, if only it worked at the core level.
Any ideas?
Individual CPU usage is available through the standard Windows performance counters. You can monitor this in perfmon.
However, it won't give you the result you are looking for. Unless a thread/process has been explicitly bound to a single CPU, a runaway process will not spike one core to 100% while all the others idle. The runaway process will bounce around between all the processors. I don't know why Windows schedules threads this way; presumably there is no gain from forcing affinity, and some loss due to having to handle interrupts on particular cores.
You can see this easily enough just in task manager. Watch the individual CPU graphs when you have a single compute bound process running.
You can give Spotlight on Windows a try. You can graphically drill into all sorts of performance and load indicators. It's freeware.
perfmon from Microsoft can monitor each individual CPU. perfmon also works remotely, and you can monitor various aspects of Windows.
I'm not sure it helps to find runaway processes, because the Windows scheduler does not always execute a process on the same CPU -> on your 8-CPU machine you will see 12.5% usage on all CPUs if one process runs away.
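If you want something scriptable rather than a dashboard, Python's psutil package can report per-core usage and per-process CPU, and you can layer the 'approved kill process' list on top. The sketch below is only illustrative (psutil, the threshold, and the kill list are all assumptions you would replace), and it keeps the caveat from the answers above in mind: a runaway process usually bounces between cores, so the busiest-process check matters more than which core happens to be hot:

    # Sketch: watch per-core CPU; when a core stays hot, find the busiest process
    # and terminate it only if its image name is on an approved kill list.
    import time
    import psutil  # pip install psutil

    CORE_THRESHOLD = 90.0                            # percent
    APPROVED_KILL = {"iexplore.exe", "outlook.exe"}  # example policy - yours will differ

    while True:
        per_core = psutil.cpu_percent(interval=5, percpu=True)
        print("per-core usage:", per_core)
        if max(per_core) < CORE_THRESHOLD:
            continue
        # take two samples one second apart to get per-process CPU usage
        procs = list(psutil.process_iter(["name"]))
        for p in procs:
            try:
                p.cpu_percent()  # prime the per-process counter
            except psutil.NoSuchProcess:
                pass
        time.sleep(1)
        usage = []
        for p in procs:
            try:
                usage.append((p.cpu_percent(), p))
            except psutil.NoSuchProcess:
                pass
        cpu, worst = max(usage, key=lambda item: item[0])
        name = (worst.info.get("name") or "").lower()
        print(f"hot core; busiest process is {name} at {cpu:.0f}%")
        if name in APPROVED_KILL:
            worst.terminate()  # or worst.kill() for a hard stop
            print(f"terminated {name}")

Run it on each terminal server, or wrap it in whatever remote-execution mechanism you already use for those machines.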
