Perfino server running out of memory daily

We've been running Perfino on non-prod hosts successfully for the past year.
Recently we added 9 new DEV hosts and now the Perfino service is running out of memory and crashing every day.
We've thrown absurd amounts of memory at the Perfino service but it doesn't seem to help.
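Concretely, "throwing memory at it" has meant raising the service JVM's heap ceiling. A minimal sketch of that, assuming the server picks up JVM flags from a .vmoptions file in its install directory and runs as a system service (the path, file name, and service name below are illustrative, not confirmed Perfino specifics):

# Raise the heap ceiling for the monitoring server's JVM
# (path, file name, and service name are assumptions -- check your install)
echo "-Xmx8g" >> /opt/perfino/perfino.vmoptions
service perfino restart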
It's currently monitoring 955 VMs across 11 hosts.
Ideas on how to proceed?
Many thanks,
David

Related

Rails 5 app on EC2 keeps shutting down every few days.

We upgraded from a t2.micro to a t2.small EC2 instance when I noticed the Rails app was shutting down and I'd have to restart Unicorn.
Since EC2 doesn't report memory utilization out of the box, I installed the Perl monitoring scripts per the AWS docs, and I can see that we hit 87% memory utilization in the last hour, even though we get very little traffic.
What are the main issues that could be causing this?
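For reference, the AWS-documented setup boils down to a cron entry driving the monitoring script. A minimal sketch, assuming the scripts were unpacked to ~/aws-scripts-mon (the path and schedule are illustrative):

# Push memory metrics to CloudWatch every 5 minutes
*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --mem-used --mem-avail --from-cron

With the metric graphed over a few days, you can at least see whether usage climbs steadily (suggesting a leak) or spikes with particular requests.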

hardware requirements for PlasticSCM server

I'm evaluating PlasticSCM on a VMware machine with 4GB RAM and a 4-core CPU. Since I ported our trunk into the server (about 6GB of data), the service has run out of memory (started swapping). I've increased the VM RAM to 6GB. This is actually more than I'd like to load the host system with, since I've also got VMs for the PlasticSCM client, the TeamCity server, and a TeamCity agent.
I was trying to find a spec detailing the hardware requirements for running the PlasticSCM server, one that accounts for scaling. So far, I've only found the minimum requirements (512MB RAM etc.) and the system information from your heavy-load and scale test. As far as I can see, it's all about RAM. :)
Anyway, is there a detailed spec with recommendations for the hardware to use?
P.S.: Of course, if we switched to Plastic we'd run the service on a real machine instead of a VM.

Tomcat6 constant crashes

We had 5 applications on a Linode (Ubuntu 10.04, 32-bit) with 1GB of RAM. Recently we moved one of the applications off that Linode to another one with 512MB. The application is built on Java EE and was working quite stably on the old server. On the new server, however, Tomcat (version 6 on both servers) crashes every now and then without any logs.
The only differences on the new server are that we are using nginx as the web server instead of Apache2, and that the new server runs Ubuntu 12, 64-bit. I have no reason to suspect a memory leak, because the application was behaving well on the old server. Are there any Tomcat optimizations that would prevent this kind of crash? I also doubt the cause is traffic load (even though the new server has less RAM), because Tomcat still crashes in the middle of the night, when there are only about 10 concurrent users. Any insight into the problem would be appreciated.
I checked the RAM usage: Tomcat constantly occupies about 60% of memory, then all of a sudden it crashes and drops to 0. I run a bash script as a cron job every 5 minutes on the new server to check whether Tomcat is down and restart it automatically. Could that be causing the issue? The script is below:
if [ "$(/etc/init.d/tomcat6 status)" == " * Tomcat servlet engine is not running." ]; then /etc/init.d/tomcat6 start; fi
Please note, I am not an expert at server configuration. I can just about set up a server, install what's needed, and get the required things running.
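A side note on diagnosing this: when a JVM dies on Linux without leaving any Java logs, the kernel OOM killer is a frequent culprit, and it records its kills in the kernel log. A quick, hedged check:

# Look for OOM-killer activity; the victim process is usually "java"
dmesg | grep -i -E 'out of memory|killed process'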
You moved your app from a 32-bit HotSpot JVM to a 64-bit OpenJDK JVM, and the new server has less RAM.
First I would try installing the same 32-bit HotSpot JVM on the new server and see if the crashes still occur. If they do, I would start adding more memory and adjust -Xmx etc. accordingly.
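To make "adjust -Xmx" concrete: on Ubuntu, the tomcat6 package reads its JVM flags from /etc/default/tomcat6. A minimal sketch, with values sized so Tomcat plus the OS fit in a 512MB box (the exact numbers are illustrative and need tuning against your app):

# In /etc/default/tomcat6 -- cap the heap and permgen
JAVA_OPTS="-Djava.awt.headless=true -Xms64m -Xmx256m -XX:MaxPermSize=64m"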
I upgraded the RAM to 1GB, switched to Ubuntu 12, 32-bit, reinstalled the 32-bit JVM, and now the server works like a charm. I was unable to zero in on the root cause, but the most likely culprit is either the 64-bit OS or the 64-bit JVM eating too much memory. Thanks for your help.

aws memory high usage

Recently I set up a micro instance on EC2 with GlassFish and MySQL on Windows.
I deployed my WAR and was able to access my site through HTTP.
I changed my application, redeployed the WAR, and that also worked.
When I was about to redeploy the WAR for the 4th or 5th time, the application got deployed (I saw the message in the log file), but I was unable to access the site through HTTP.
Then I tried the command "asadmin list-applications" and got the following message:
Error occurred during initialization of VM
Could not reserve enough space for object heap
After that I was not able to connect to my instance through RDP and had to reboot; I could access it again after that. I started the servers again (GlassFish, MySQL), but no luck.
I noticed that memory usage is around 90% or more, while CPU usage is low.
Now I cannot access my site through HTTP. What should I do?
Thanks in advance!
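Given the "Could not reserve enough space for object heap" error, one hedged stopgap is to shrink GlassFish's heap so the domain fits on a micro instance. A sketch using asadmin (the -Xmx values are illustrative; list the current options first, since your domain's defaults may differ, and quoting details can vary by shell and GlassFish version):

# See what -Xmx the domain currently uses
asadmin list-jvm-options
# Swap the heap ceiling for one a micro instance can actually reserve
asadmin delete-jvm-options "-Xmx512m"
asadmin create-jvm-options "-Xmx256m"

Restart the domain afterwards for the change to take effect.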
Honestly, there are a couple of issues working against you here:
1) Windows requires FAR more RAM than Ubuntu to run at a minimally decent level.
2) GlassFish has a much larger footprint than Tomcat or Jetty.
Is there any particular reason you need Windows? For example, does your server need to run executables for file processing or something like that outside the JVM? Most would agree that Linux (Ubuntu or otherwise) will give you much better performance and stability for running an app server like GlassFish in any environment.

Resque: Slow worker startup and Forking

I'm currently moving my application from a Linode setup to EC2. Redis is installed on a remote instance, with various worker instances interacting with the queue. That's all going great.
My problem is the amount of time it takes for a worker to be 'instantiated', together with slow forking. Starting a worker usually takes between 30 seconds and a minute (from god.rb starting the worker rake task to the worker actively picking up jobs from the queue). I could live with that, but I never saw such wait times on my current Linode production box, so I believe it's a symptom of a bigger problem. The next issue is that jobs that took a second or less in my previous environment now seem to take about 5 to 10 times longer.
I'm assuming this must be some sort of issue with my Ubuntu install on EC2? One notable difference is that I'm running REE 1.8.7-2010.01 in my new setup, and REE 1.8.6 on the old Linode boxes.
Anyone else experienced these issues?
It turns out I had overestimated the CPU power of an EC2 small instance. I moved my workers to a large instance and all is well.
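For anyone hitting the same wall: one quick way to check whether an EC2 instance is CPU-starved is the steal-time column in vmstat, which counts cycles the hypervisor handed to other tenants instead of your VM:

# Sample CPU stats every 5 seconds, 3 samples; a consistently high "st"
# (steal) column means the instance is not getting the CPU it asks for
vmstat 5 3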
