My production Rails app runs on an AWS EC2 instance, served by Apache through Phusion Passenger, and I am facing two recurring problems with it.
Sometimes the instance becomes unreachable via SSH and the Rails application cannot be reached from a browser; this is probably a memory issue. I have created swap space, but it has not helped.
Sometimes the Rails app gets shut down and the application/current folder gets removed.
The first can only be fixed by stopping the AWS instance and starting it again, the second by redeploying the application.
Any suggestions on what could be causing this? Or, even more importantly, how can I fix it once and for all?
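A quick way to confirm whether memory pressure really is the culprit, assuming you can SSH in while the instance is still responsive, is to check for OOM-killer activity and Passenger's per-process memory use:

# current memory and swap usage
free -m

# evidence of the kernel OOM killer having terminated processes
dmesg | grep -iE 'oom|killed process'

# per-process memory usage of the Ruby processes Passenger manages
sudo passenger-memory-stats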
The issue
I am using the same container (with similar resources) in two projects: production and staging. Both have custom domains set up with Cloudflare DNS and are in the same region. The container build is done in a completely different project, and IAM is used to handle access to these containers. In both projects, all 5 services have a concurrency of 80 and a 300-second timeout.
Everything was working fine until 3 days ago, but since yesterday almost all Cloud Run services on staging (thankfully) have started throwing 503s, randomly and for most requests. Some of the services had not even been deployed for a week. The same containers are running fine in the production project, with no issues.
Ruled out causes
Anything to do with Cloudflare (I tried the URL Cloud Run provides directly, and it has the same 503 issue)
Anything with the build or the containers (I tried the demo hello-world container in Go, and it has the issue too)
Resources: I tried giving it 1 GB of RAM and 2 CPUs, but the problem persisted
Issues with deployment (I deployed multiple branches; that didn't help)
Issues in the code (I routed traffic to a revision from 2-3 days ago, but the issue was still there)
Issues at the service level (I used the same container to create a completely new service, and it also had the issue)
Possible causes
Something on Cloud Run or the Cloud Run load balancer
Maybe some env vars, but that also doesn't seem to be the issue
Response Codes
I ran a quick check with vegeta (30 seconds at 10 rps) against the same container on staging and production, for a static file path; the response codes are below:
Staging
Production
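For reference, the check was roughly the following vegeta invocation (flags inferred from the description above; the target URL is a placeholder):

echo "GET https://my-service-xxxxx-uc.a.run.app/static/app.css" | vegeta attack -rate=10 -duration=30s | vegeta report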
If anyone has any insights on this it would help greatly.
Based on your explanation, I can't tell what's going on. You explained what doesn't work, but didn't point out what does work (does your app run locally? Are you able to run a hello-world sample application?).
So I'll recommend some debugging tips.
If you're getting an HTTP 5xx status code, first check your application's logs. Is it printing ANY logs? Are there logs for the request? Was your application deployed with a "verbose" logging setting?
Try hitting your *.run.app domain directly. If it's not working, then it's not a domain, DNS, or Cloudflare issue; try debugging and/or redeploying your app, and deploy something that works first. If the *.run.app domain works, then the issue is not in Cloud Run.
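A concrete way to run both of those checks, assuming the gcloud CLI and a service named my-service (name and region are placeholders):

# print the service's *.run.app URL and hit it directly, bypassing Cloudflare/DNS
URL=$(gcloud run services describe my-service --region us-central1 --format 'value(status.url)')
curl -v "$URL/"

# read recent request and application logs for the service
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="my-service"' --limit 50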
Make sure you aren't using Cloudflare in proxy mode (i.e., your DNS record should resolve to Cloud Run, not to Cloudflare's proxy), as there's currently a known issue with certificate issuance/renewals when domains are behind Cloudflare.
Beyond these, if a redeploy seems to solve your problem, try redeploying. It's quite likely that some configuration has recently diverged between the two projects.
See Cloud Run Troubleshooting
https://cloud.google.com/run/docs/troubleshooting
Do you see 503 errors under high load?
The Cloud Run (fully managed) load balancer strives to distribute incoming requests over the necessary amount of container instances. However, if your container instances are using a lot of CPU to process requests, the container instances will not be able to process all of the requests, and some requests will be returned with a 503 error code.
To mitigate this, try lowering the concurrency. Start from concurrency = 1 and gradually increase it to find an acceptable value. Refer to Setting concurrency for more details.
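If you want to experiment with this, the concurrency can be changed per service, for example (service name and region are placeholders):

gcloud run services update my-service --concurrency 1 --region us-central1

Then raise it step by step while watching the 503 rate.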
I'm trying to host a Spigot Minecraft server on Heroku using Docker. I know that Heroku doesn't really support raw TCP, so I use ngrok (localhost tunneling) to get around this. The image is based on the official openjdk 8-jre image; it starts Spigot and ngrok, then grabs the ngrok address and uploads it to a pastebin service called ix.io. Everything works fine when I run the Docker image locally, but when I try to run it on Heroku it says:
Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
Stopping process with SIGKILL
State changed from starting to crashed
The complete log can be found here: https://gist.githubusercontent.com/paperbenni/6c1f4567dbf02cda299230eeb3391fc0/raw/7832444ed358131c9c6c57e330baa62b74cd113e/heroko%2520docker%2520spigot%2520logs
What is going on here? Does there have to be some sort of web service (something like nginx that can be reached from a browser) for the app to be considered valid? I don't really get what's happening here.
Side note: there are some memory errors in the logs. The container runs fine locally when limited to 512 MB of RAM, so maybe someone could help me out with that too.
Check that your port is right and make sure that you still have Heroku run time (free dyno hours) left. That could be the cause. Hope I helped! :)
EDIT: When you run your server, it runs on "dynos". Free dynos only get 512 MB of RAM. If your world uses more than that, consider making a smaller world or lowering the RAM the server is allowed to use.
Lower RAM: https://www.spigotmc.org/threads/server-optimization-lowering-ram.10999/
The memory limit itself is set with the -Xmx flag on the java command that starts the server; world-size and view-distance tweaks live in server.properties and spigot.yml.
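To be clear about the R10 part: Heroku kills the web dyno if it does not bind to $PORT within the boot window, no matter what else is running. A rough workaround sketch for the start script (assuming something like Python 3 is available in the image to act as a dummy listener; file names and flags are placeholders):

#!/bin/sh
# dummy HTTP listener on $PORT so Heroku's web-process check passes;
# nothing useful is served here
python3 -m http.server "$PORT" &

# then start the tunnel and the Spigot server as before
./ngrok tcp 25565 &
exec java -Xmx400M -jar spigot.jar nogui

Alternatively, declaring the process as a worker dyno instead of a web dyno sidesteps the $PORT requirement entirely, since only web dynos are expected to bind it.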
Heroku spins down containers for free accounts when the app isn't accessed for a day. For our system, deployed on Dokku, we have production, staging, as well as developer containers running the same app. Today I noticed a Dokku app hang indefinitely mid-deploy on our dev VM. After investigating, I discovered that the issue was due to insufficient VM memory. After I killed a few containers, the container started successfully. For reference, there are almost 60 containers deployed on our dev box now, but only about 5 of them are being actively used. Often, our devs deploy multiple versions of the same app when testing. Sometimes these apps are no longer needed (in which case we can simply remove them), but more often than not, they'll need to be accessed again a week or two later.
To save resources on our VMs, we would like to spin down dev containers, especially since there are likely to be multiple instances of the same app.
Is this possible with Dokku? If I simply stop containers that haven't been accessed for a while (using the docker stop command), the user accessing the app later will be greeted with a 404 page. What I would like to do instead is show a loading indicator to the user until the container has spun up again.
With plain Dokku commands this is not possible at the moment. You could use dokku ps:stop, and then try something like: when nginx returns a 502 for the app, run a shell script that starts it again. Of course, that will still show the 502 error to the user the first time.
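A minimal sketch of the manual version (app name is a placeholder):

# stop an idle dev app; dokku's nginx config stays in place and will return 502 for it
dokku ps:stop my-dev-app

# bring it back before the next test session (or from a script triggered by the 502)
dokku ps:start my-dev-app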
I have an Amazon Elastic Beanstalk environment (Puma, Nginx): 64bit Amazon Linux 2014.09 v1.0.9 running Ruby 2.1 (Puma).
Suddenly, when I deployed my project, it produced the following error in my terminal:
ERROR: Timed out while waiting for command to Complete
Note: this didn't happen before.
I can see the event in the console, and this is the log:
Update environment operation is complete, but with command timeouts. Try increasing the timeout period. For more information, see troubleshooting documentation.
I've already tried increasing the timeout, without success:
option_settings:
- namespace: aws:elasticbeanstalk:command
option_name: Timeout
value: 1800
The health status takes a long time to turn green (approx. 20 minutes), and then it takes another long stretch to update the instance with the new changes (approx. another 20 minutes). I have only one instance.
How can I see other logs?
This seems like a rather common problem with Elastic Beanstalk. In short, your EC2 instance has gone haywire. What you can do is terminate the EC2 instance from the EC2 dashboard; the load balancer / auto scaling group will start a new instance, and that may solve your problem. To minimise downtime, you can start the new instance first and then terminate the older one. Just be aware that you will lose any ephemeral data and may have to reinstall certain dependencies (if they are not in your .ebextensions).
Let me know if you need any more help. Do check out the AWS Elastic Beanstalk forum.
Cheers,
biobirdman
The problem was the RAM on the instance, so I had to replace it with a bigger instance type.
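On the side question of seeing other logs: assuming the EB CLI is installed, the usual starting points are:

# pull the environment's recent log bundle (eb-activity.log, web server and Puma logs, etc.)
eb logs

# or SSH into the instance and inspect /var/log directly
eb ssh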
I've got a few Rails apps running under different vhosts on a single small EC2 instance. My automated deployment process for each involves running some rake tasks (migration, asset compilation, etc.), staging everything into a versioned directory, and symlinking the web root to it. I'm serving the apps with Apache + Passenger. During this process (and the subsequent Passenger restart), Ruby processes eat up 100% of the CPU. I understand why this is happening, but I need a way to throttle these processes down so that the other apps on the instance aren't impacted as significantly as they currently are.
I don't know if you've already come across this, but Rubber exists to make EC2 deployment more convenient: https://github.com/wr0ngway/rubber
There is also a Railscast on it at: http://railscasts.com/episodes/347-rubber-and-amazon-ec2
Hopefully, these two resources will help you somewhere.
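On the throttling itself (independent of Rubber), one low-effort option is to run the deploy-time tasks at reduced CPU priority so Passenger's serving processes keep getting scheduled; a sketch:

# run deploy-time rake tasks at the lowest CPU (and, optionally, I/O) priority
nice -n 19 bundle exec rake db:migrate
nice -n 19 ionice -c3 bundle exec rake assets:precompile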