How to diagnose slow startup time in Cloud Run containers? - ruby-on-rails

I am running some services with Google Cloud Run. While performance has been satisfactory, there's a recurrent issue with extremely slow startup time, which leads to occasionally dropped requests when new containers can't spin up in time.
Currently, with the first-generation execution environment and startup CPU boost enabled, Google's dashboard reports around 18 to 50 seconds of startup time. The image is based on ruby:3.0.2 and runs a Ruby on Rails 6 application. In a development environment, startup (timed from run to the container accepting requests) never seems to take more than 5 seconds.
I want to know what tools are available to diagnose this issue, and if there are any obvious pitfalls with my specific case that I might be missing.
I've tried playing around with the service's configuration options, to no avail. The biggest suspect is a startup bash script that handles migrations on the first boot, and asset compiling on development. However, I've tried building with an empty script, and the problem persists. I also think the container images might be too large (around 700 MB), but I haven't gotten around to slimming them down, nor have I found evidence that this is the problem.
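One way to narrow this down is to time each boot phase from inside the container and compare the numbers against what the dashboard reports. Below is a minimal sketch, not a definitive tool: `BootTimer` and the phase labels are hypothetical names made up for illustration, and the only API it relies on is Ruby's `Process.clock_gettime`.

```ruby
# Minimal boot-phase timer. BootTimer is a hypothetical helper for
# illustration; it is not part of Rails or Cloud Run.
class BootTimer
  attr_reader :phases

  def initialize(out: $stderr)
    @out = out
    @phases = []
  end

  # Runs the block, records its duration on the monotonic clock,
  # logs one line per phase, and returns the block's value.
  def measure(label)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @phases << [label, elapsed]
    @out.puts format("[boot] %-20s %8.3fs", label, elapsed)
    result
  end

  # Sum of all measured phases, in seconds.
  def total
    @phases.sum { |_label, elapsed| elapsed }
  end
end
```

Assuming a standard Rails layout, one place to try it would be around the `require_relative "config/environment"` line in `config.ru`, for example `timer.measure("rails environment") { require_relative "config/environment" }`. If the measured phases add up to only a few seconds while the dashboard still shows 18 to 50, the time is probably being spent before Ruby starts (image pull, container start, the entrypoint script) rather than in the application boot itself.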

Related

Passenger processes stuck maxing CPU after hitting 100%

The Setup:
* Ubuntu 18.04 LTS
* Apache 2.4.29
* Passenger 6.0.16
* Ruby 2.3.8
* Rails 4.2.x
I have both staging and prod servers with the same setup on AWS EC2; they are both running the same kernel/build. I upgraded the Ruby/Rails version of my app from Ruby 2.1.x -> 2.3.8, and Rails 4.0 -> 4.2, first on staging then on production.
On staging, everything was working fine; pages were loading quickly and without issue. On prod, pages would start by loading quickly but would soon degrade. The user CPU would max out at 99%+, eventually causing the app to go down and become unresponsive. The only solution was to restart Apache, roughly every 30 minutes.
After a LOT of digging and testing, top -c showed that Passenger RubyApp would hit 100% CPU and soon after would stay "locked" at max CPU for each process, even if no one was using the site. I've been trying to change different settings both in Apache and Passenger but nothing seems to work. Effectively, as soon as we get a few people hitting the site in a particular way, ANY of the spun Passenger processes that hit 100% end up staying fairly high and either don't shut off or don't exit and burn CPU, as if there were some IO issue.
Right now Passenger and Apache configs are exactly the same on staging/prod and are the defaults.
Screenshots of the example top in prod with a few users using it.
And roughly same amount of people using on staging.
Staging looks far more accurate in terms of a Rails app -- I'd expect to see higher memory use than CPU. AWS Support was also baffled, as prod is on an XL and staging is on a Micro instance, and the AWS kernel versions were the same. Here's AWS monitoring around CPU usage... prod was updated on the 20th, but not a lot of people used it over the weekend, and really became a problem on Monday during working hours.
Any ideas why this is happening on one server and not the other? It's no particular request that causes it; literally any request (or 2-3 requests coming in tandem) will cause the CPU to spike to 100% and get stuck.
TIA.
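When a Ruby process sits at 100% CPU with no traffic, a dump of every thread's backtrace usually shows where it is spinning. The following is a minimal sketch of that technique, under assumptions: `dump_thread_backtraces` is a hypothetical helper name, and the choice of SIGTTIN assumes nothing else in the process already uses that signal.

```ruby
# Writes the status and backtrace of every live thread to `io`.
# dump_thread_backtraces is a hypothetical helper, not a Passenger API.
def dump_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} status=#{thread.status.inspect}"
    # backtrace can be nil for a thread that has already finished
    (thread.backtrace || []).each { |line| io.puts "    #{line}" }
  end
  io.flush
end

# Assumption: SIGTTIN is unused in this process. After this runs,
# `kill -TTIN <pid>` prints the dump to the process's stderr.
Signal.trap("TTIN") { dump_thread_backtraces }
```

Loaded from an initializer, this would let you send `kill -TTIN` to one of the stuck PIDs shown by `top -c` and read the backtraces from wherever that process's stderr is logged. Several dumps taken a few seconds apart that all point at the same line would suggest a busy loop at that location.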

node webpack hangs. How to debug?

I am trying to build the ORO Platform JS assets. In a non-Docker environment it works like a charm, but in Docker (either during docker build or at container execution) the build process stops and hangs at 100% CPU.
67% [0] building 1416/1470 modules 54 active ... ndles/orotask/sidebar_widgets/assigned_tasks/css/styles.scss
The build does not necessarily hang on the same file, and it occasionally succeeds.
I've tried to reduce the process to a minimum by removing Happy, and tested with --max-old-space-size=4096, but no luck.
Sources: https://github.com/oroinc/platform/tree/master/build
How would you recommend debugging this?
Thanks
There is a known issue where a Node.js process hangs when it is run as the root user. As far as I know, there is no workaround for now. Consider using another user to build the assets.
If that's not the case, please review the Troubleshooting section in OroAssetBundle; that might help.

Docker could not start because I do not have enough memory. How to solve it?

I enrolled in an HTML/CSS/JavaScript course and I need Docker Desktop installed and working on my laptop. The problem is that I cannot start it because I do not have enough memory; the error appears every time I try to start it. I have tried to solve it by lowering the Docker Engine settings, freeing up some memory with RAMMap, and switching Windows to performance mode, but unfortunately the error is still there.
The laptop that I work on has only 2 GB of RAM. Is there a solution to start Docker?

aws memory high usage

Recently I configured a micro instance in EC2 running GlassFish and MySQL on Windows.
I deployed my WAR and I was able to access my site through HTTP.
I changed my application and redeployed the WAR, and it also worked.
When I redeployed the WAR for the 4th or 5th time, the application got deployed (I saw the message in the log file), but I was unable to access the site through HTTP.
Then I tried the command "asadmin list-applications" and I got the following message.
Error occurred during initialization of VM
Could not reserve enough space for object heap
After that I was not able to connect to my instance through RDP and had to reboot. I was able to access it again after that. I started the servers again (GlassFish and MySQL), but no luck.
I noticed that the memory usage is around 90% or more. CPU usage is low.
Now I cannot access my site through HTTP. What should I do?
Thanks in advance !
Honestly, there are a couple issues working against you here:
1) Windows requires FAR more RAM than Ubuntu to run at a minimum decent level.
2) GlassFish has a much larger footprint than Tomcat or Jetty.
Is there any particular reason you need Windows? Like is there a specific need that your server run some executables for file processing or something like that outside the JVM? Most would agree that Linux (Ubuntu or other) will give you much better results in performance and stability to run an App Server like GlassFish in any environment.

Resque: Slow worker startup and Forking

I'm currently moving my application from a Linode setup to EC2. Redis is currently installed on a remote instance with various worker instances interacting with the queue. That's all going fantastically.
My problem is with the amount of time it takes for a worker to be 'instantiated', and with slow forking. Starting a worker will usually take between 30 seconds and a minute (from god.rb starting the worker rake task to the worker actively starting work on the queue). I could live with that, but I've not experienced such a wait time on my current Linode production box, so I believe it's a symptom of a bigger problem. The next issue is that jobs that took a second or less in my previous environment now seem to take about 5 to 10 times longer.
I'm assuming this must be some sort of issue with my Ubuntu install on EC2? One notable difference is that I'm running REE 1.8.7-2010.01 in my new setup, and REE 1.8.6 on the old Linode boxes.
Anyone else experienced these issues?
It turns out I had overestimated the CPU power of an EC2 small instance. Moved my workers to a large instance and all is well.
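A quick way to check this kind of CPU gap before moving workers is to run the same small CPU-bound script on both machines and compare timings. This is only a rough sketch: `fib` and `cpu_probe` are hypothetical names, and naive Fibonacci is a stand-in workload, not a model of what the Resque jobs actually do.

```ruby
require "benchmark"

# Hypothetical CPU-bound workload: naive recursive Fibonacci.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Returns the best (lowest) wall-clock time in seconds over `runs`
# executions of fib(n); taking the minimum reduces noise from other
# processes on the host.
def cpu_probe(n = 24, runs = 3)
  Array.new(runs) { Benchmark.realtime { fib(n) } }.min
end
```

Running `puts cpu_probe` on the old box and on the new instance gives two numbers to compare. A ratio in the same 5x to 10x range as the job slowdown would point at raw CPU rather than at the Ubuntu install or the Ruby version.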
