Monitoring CPU Core Usage on Terminal Servers - monitoring

I have windows 2003 terminal servers, multi-core. I'm looking for a way to monitor individual CPU core usage on these servers. It is possible for an end-user to have a run-away process (e.g. Internet Explorer or Outlook). The core for that process may spike to near 100% leaving the other cores 'normal'. Thus, the overall CPU usage on the server is just the total of all the cores or if 7 of the cores on a 8 core server are idle and the 8th is running at 100% then 1/8 = 12.5% usage.
What utility can I use to monitor multiple servers ? If the CPU usage for a core is "high" what would I use to determine the offending process and then how could I automatically kill that process if it was on the 'approved kill process' list?
A product from http://www.packettrap.com/ called PT360 would be perfect except they use SMNP to get data and SMNP appears to only give total CPU usage, it's not broken out by an individual core. Take a look at their Dashboard option with the CPU gauge 'gadget'. That's exactly what I need if only it worked at the core level.
Any ideas?

Individual CPU usage is available through the standard windows performance counters. You can monitor this in perfmon.
However, it won't give you the result you are looking for. Unless a thread/process has been explicitly bound to a single CPU then a run-away process will not spike one core to 100% while all the others idle. The run-away process will bounce around between all the processors. I don't know why windows schedules threads this way, presumably because there is no gain from forcing affinity and some loss due to having to handle interrupts on particular cores.
You can see this easily enough just in task manager. Watch the individual CPU graphs when you have a single compute bound process running.

You can give Spotlight on Windows a try. You can graphically drill into all sorts of performance and load indicators. Its freeware.

perfmon from Microsoft can monitor each individual CPU. perfmon also works remote and you can monitor farious aspects of Windows.
I'm not sure if it helps to find run-away processes because the Windows scheduler dos not execute a process always on the same CPU -> on your 8 CPU machine you will see 12.5 % usage on all CPU's if one process runs away.

Related

Perfino System Requirements

We're planning to evaluate and eventually potentially purchase perfino. I went quickly through the docs and cannot find the system requirements for the installation. Also I cannot find it's compatibility with JBoss 7.1. Can you provide details please?
There are no hard system requirements for disk space, it depends on the amount of business transactions that you're recording. All data will be consolidated, so the database reaches a maximum size after a while, but it's not possible to say what that size will be. Consolidation times can be configured in the general settings.
There are also no hard system requirements for CPU and physical memory. A low-end machine will have no problems monitoring 100 JVMs, but the exact details again depend on the amount of monitored business transactions.
JBoss 7.1 is supported. "Supported" means that web service and EJB calls can be tracked between JVMs, otherwise all application servers work with perfino.
I haven't found any official system requirements, but this is what we figured out experimentally.
We collect about 10,000 transactions a minute from 8 JVMs. We have a lot of distinct and long SQL queries. We use AWS machine with 2 VCPUs and 8GB RAM.
When the Perfino GUI is not being used, the CPU load is low. However, for the GUI to work properly, we had to modify perfino_service.vmoptions:
-Xmx6000m. Before that we had experienced multiple OutOfMemoryError in Perfino when filtering in the transactions view. After changing the memory settings, the GUI is running fine.
This means that you need a machine with about 8GB RAM. I guess this depends on the number of distinct transactions you collect. Our limit is high, at 30,000.
After 6 weeks of usage, there's 7GB of files in the perfino directory. Perfino can clear old recordings after a configurable time.

Best setting for HTTPJVMMaxHeapSize in Domino 8.5.3 64 Bit

I am trying to find a definitive answer as to what the best setting is for the JVM Heap Size in Domino 8.5.3 FP4 - 64 Bit for Windows.
I know that by default it's set to 1024M. Some web sites have suggested that it's recommended to be 1G / 1024M - but that's the default setting so it that as good as it gets?
Other sites have said 25% of available RAM.
My Domino server has 12GB RAM available. It's currently got HTTPJVMMaxHeapSize = 1024M and Task Manager tells me that around 7GB of RAM is in use and nhttp.exe is using around 1.1GB. I want to increase this Heap to 2GB or 3GB if possible - will there be any issues doing to?
I'm running Windows Server 2008 R2 Standard Edition.
Speaking from an XPages perspective:
1 - Understand the dynamics of the working set and hardware
That is, understand what the application code, server runtime, and hardware profile is doing when processing a given working set (ie: the XPages application(s) within the server). Is the application coded in a non-optimal manner in terms of lifecycle execution and memory usage? Is the application making use of memory or disk persistence for component tree serialization? Is the server assigned an adequate amount of JVM memory? Is the hardware providing enough CPU and memory?
2 - Profile and monitor the working set with upper limit loads
To fully answer some of the questions in #1, detailed performance and scalability profiling must be carried out using tools like the XPages Toolbox and Eclipse Memory Analyzer. Furthermore, test the working set using Rational Performance Tester (or some other performance testing tool) to mimic real life concurrent workloads in a test environment. This allows you to set up a test environment where you can hit your application with (n) number of concurrent users using automation and collect that all invaluable data on health etc.
3 - Analyse the profile information to identify bottlenecks within the working set
Remember your working set can be one or more applications. Each doing something different, and having different load requirements. Be specific about the task at hand - do you want to tune the system more generally for all applications (for an average scale) or do you want the server to be fully optimized for a specific application (for a targeted scale)?
4 - Optimize the working set where applicable
Get in and make changes to the XSP / Java / ServerSide JavaScript code where applicable - use your knowledge of the XPages Request Processing Lifecycle and also look out for those hungry JVM memory consumers under high end load scenarios. Always favor disk persistence (disk storage is cheaper than RAM!) and code your custom Java objects and Managed Beans accordingly to cope with serialization and restoration. And make the trade-offs between function and speed in these scenario's where high scalability ends up burning CPU... a smarter UX with targetted functions / actions etc.
5 - Scale the hardware where applicable
Be prepared to increase cores, clock speeds, RAM, disk storage based on the needs of the working set - a cyclic approach to profiling, monitoring, and optimizing the working set will shed more and more light on the suitability of the hardware as this process evolves.
5 - Repeat from #2 until the working set and hardware performs and scales to the expected requirements / load expectancy of the system
I would recommend to set server's notes.ini param JavaVerboseGC=1 with console output into console.log in IBM_TECHNICAL_SUPPORT directory. After some time, take that log file and use IBM Support Assistant with tool IBM Monitoring and Diagnostic Tools for Java - Garbage Collection and Memory Visualizer (GCMV).
This could help how to interpret collected data: http://www-01.ibm.com/support/docview.wss?uid=swg27013824&aid=1.
You definitely should do that if you get OutOfMemoryException.
Heap size set too high can lead to fewer GC runs but taking longer time - server will periodically "hang" for very short time. So too high setting is not recommended.

Mirrored queue performance factors

We operate two dual-node brokers, each broker having quite different queues and workloads. Each box has 24 cores (H/T) worth of Xeon E5645 # 2.4GHz with 48GB RAM, connected by Gigabit LAN with ~150μs latency, running RHEL 5.6, RabbitMQ 3.1, Erlang R16B with HiPE off. We've tried with HiPE on but it made no noticeable performance impact, and was very crashy.
We appear to have hit a ceiling for our message rates of between 1,000/s and 1,400/s both in and out. This is broker-wide, not per-queue. Adding more consumers doesn't improve throughput overall, just gives that particular queue a bigger slice of this apparent "pool" of resource.
Every queue is mirrored across the two nodes that make up the broker. Our publishers and consumers connect equally to both nodes in a persistant way. We notice an ADSL-like asymmetry in the rates too; if we manage to publish a high rate of messages the deliver rate drops to high double digits. Testing with an un-mirrored queue has much higher throughput, as expected. Queues and Exchanges are durable, messages are not persistent.
We'd like to know what we can do to improve the situation. The CPU on the box is fine, beam takes a core and a half for 1 process, then another 80% each of two cores for another couple of processes. The rest of the box is essentially idle. We are using ~20GB of RAM in userland with system cache filling the rest. IO rates are fine. Network is fine.
Is there any Erlang/OTP tuning we can do? delegate_count is the default 16, could someone explain what this does in a bit more detail please?
This is difficult to answer without knowing more about how your producers and consumers are configured, which client library you're using and so on. As discussed on irc (http://dev.rabbitmq.com/irclog/index.php?date=2013-05-22) a minute ago, I'd suggest you attempt to reproduce the topology using the MulticastMain java load test tool that ships with the RabbitMQ java client. You can configure multiple producers/consumers, message sizes and so on. I can certainly get 5Khz out of a two-node cluster with HA on my desktop, so this may be a client (or application code) related issue.

Passenger server upgrade: Processor (CPU) Cores VS Ram?

I went through documentation of Passenger to find out how many application instances it can run with respect to hardware configuration. Documentation only talks about RAM
The optimal value depends on your system’s hardware and the server’s average load. You should experiment with different values. But generally speaking, the value should be at least equal to the number of CPUs (or CPU cores) that you have. If your system has 2 GB of RAM, then we recommend a value of 30. If your system is a Virtual Private Server (VPS) and has about 256 MB RAM, and is also running other services such as MySQL, then we recommend a value of 2.
It says minimum value can be number of CPU/CPU Cores we have. I have a VPS with one VCPU & 1GB RAM & my service provider has an option to just upgrade the RAM. I'm wondering how far I can just keep upgrading only RAM? How important it is to upgrade number of CPUs?
Quick Answer
Depends on what resources are the bottleneck for your app.
Long answer
You'll need to factor in a few things:
How much CPU time does your app need?
How much RAM does any given instance of your app use at peak load?
Does your app spend a lot of time doing IO intensive tasks? (ie: db and file reads/writes, network communication)
There can be other things to factor in, but your bottlenecks will probably be one of the above. If RAM is your main bottleneck, by all means use your newly available RAM. However, if it turns out that your app is being slowed down by CPU availability or flooded IO, no amount of RAM is going to speed things up.
On the topic of CPU cores; my understanding is that the main Apache process that runs Passenger is a single threaded process. Apache spawns new threads to handle concurrency on an as-needed basis. Each additional CPU core theoretically allows you to spawn x*n threads, where x is the number of threads you can optimally run under a single CPU core and n is the number of CPU cores available to Apache.
Disclaimer: I'm not very well read on Passenger internals; though this logic usually holds true for other kinds of Apache configurations.

Web application Load Tests: What metrics to look at?

During a stress/load test of a ASP.NET app hosted in IIS, what should I be monitoring on the app server?
For example, the built in utility performance monitor in windows has a huge list of counters that I can monitor. But, I don't even know what half of these counters actually mean? I know I want to look at things like memory, processor, network....but it is pretty general.
How can I successfully find a problem area?
What counters some of you guys have used in the past?
These metrics we watch to determine if requests are being serviced promptly and the volume is scaling linearly with the applied load:
Queued Requests
Current Requests
Requests Executing
Requests Succeeded
Requests/sec
We will also watch these to look for application problems
Errors/sec
Unhandled Execution Errors/sec
To monitor the VM memory, we look at:
CLR Heap Size
CLR Generation 0, 1 & 2 Garbage collections
CLR Percent Time in GC
For locking conditions, we watch:
CLR Lock Contentions
CLR Lock Contention/sec
CLR Lock Contention Queue Length
Depending on the application we might look at others, like thread counts, but the above are the ones we look at most frequently.

Resources