Guidance needed - how to track down an nginx-docker-rails performance issue

My Rails application is deployed on Amazon Elastic Beanstalk using Docker. Web requests flow into an nginx web server that forwards them to the Thin Rails server running inside the Docker container. Somewhere along the way there's a bottleneck. Roughly once every 50 requests, nginx reports a serving time about 40x higher than the time the Thin server itself reports.
Here's an example:
NGINX (7490ms):
146.185.56.206 - silo [18/Mar/2015:13:35:55 +0000] "GET /needs/117.json?iPhone HTTP/1.1" 200 2114 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36" 7.490 7.490 .
THIN (rails server): 171ms
2015-03-18 13:35:48.349 [INFO] method=GET path=/needs/117.json format=json controller=needs action=show status=200 duration=171.96 view=109.73 db=29.20 time=2015-03-18 13:35:47
Can anyone offer some guidance on how to troubleshoot such a situation? How do I find the source of the response-time difference? I'm guessing it could be nginx, Docker, Thin, or Linux itself.

It sounds like one of the Thin processes is under load doing a heavy task while nginx keeps sending requests to the busy one. If the problem were queuing inside Thin, a request would take a short time to be processed itself but a long time to reach the top of the queue. So first, check the other requests that arrived before or around the slow one.
Second, if you are proxying through an upstream (http://nginx.org/en/docs/http/ngx_http_upstream_module.html), nginx exposes variables such as $upstream_response_time that you can add to your access log.
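A minimal sketch of such a log format (the format name and log path here are illustrative, not from your setup):

# Log both the total time nginx spent on the request ($request_time) and
# the time the upstream (Thin) took to respond ($upstream_response_time).
# A big gap between the two points at nginx/Docker networking; near-equal
# values point at the app itself.
log_format upstream_timing '$remote_addr [$time_local] "$request" $status '
                           'request_time=$request_time upstream_time=$upstream_response_time';
access_log /var/log/nginx/timing.log upstream_timing;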
Third, you could try to reproduce a similar setup in dev/QA and run a stress test against it. If you manage to reproduce the slowness consistently, you can watch the number of requests queued on each backend (e.g. http://comments.gmane.org/gmane.comp.lang.ruby.unicorn.general/848).
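If you do that on Linux, one quick way to watch a backend's accept queue during the stress test (assuming Thin listens on port 3000; adjust to your ports):

# For LISTEN sockets, ss shows in Recv-Q the connections the kernel has
# accepted but the server has not yet picked up; persistently non-zero
# values mean requests are queuing in front of that process.
$ watch -n1 "ss -lnt | grep ':3000'"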

Related

AWS ALB returns 502 and request doesn't reach targets

We have an ALB load balancing two targets, but we are experiencing 502 responses from the ALB quite often, which interrupts the UI with an error. Our application is UI-based, runs on Node.js, and uses socket.io.
Sample entry from the ALB access log:
https 2019-06-10T09:29:46.987095Z app/DES-G1-USE4-ext-elb/7c8fddfc050d66f6 184.51.199.55:55418 10.72.72.155:8888 0.000 8.697 -1 502 - 876 485 "GET https://designer-use4.genesyscloud.com:443/socket.io/?EIO=3&transport=polling&t=Mj0kC0w HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:us-east-1:361190373704:targetgroup/DES-G1-USE4-tg-desg/4cc8feabb9ee8f89 "Root=1-5cfe2302-b19c6d5059c2fc6096e048e0" "-" "session-reused" 0 2019-06-10T09:29:38.289000Z "forward" "-" "-"
Here, -1 502 - means the request was forwarded to the backend by the ALB, but the target never responded and the connection between the ALB and the target was closed somehow, as per https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html.
In our application log, there is no trace of this request ever reaching our application.
Please help us debug this issue from the ALB side.
Make sure that your application responds with a 200-399 status code on the / path. The behavior you are describing often occurs because the load balancer health check cannot verify that your application is running correctly. Also, make sure your server can respond to that path over plain HTTP (HTTP/2 is not yet supported).
Potential causes of this:
firewall: make sure the security groups of your application server and the ELB allow traffic to each other on the designated port.
health check: the load balancer will remove an application server that fails its health checks a few times. Make sure that the application server is attached and showing as healthy on the dashboard; the commands below can help verify both points.
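A quick way to check both from a shell (the target group ARN below is a placeholder; the target address is taken from the access log above):

# Ask the ALB what it thinks of each registered target, including the
# reason code when a target is marked unhealthy.
$ aws elbv2 describe-target-health \
    --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/0123456789abcdef

# Hit the health-check path directly from a host inside the same VPC,
# bypassing the ALB, to confirm the app answers over plain HTTP.
$ curl -v http://10.72.72.155:8888/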

SVN Server Not Responding to Write Requests

I am in the process of setting up an SVN repo served through an Apache web server. I was able to get the repo created and configured without too many problems, and I can reach it via the browser, so I think the Apache configuration is correct. The problem comes when I try to do the initial commit. When I run the commit command in the terminal, it hangs for several minutes before returning svn: E175012: Connection timed out. The initial commit is a single file, less than 100 KB. Stranger still, after the command times out, it leaves behind an httpd process that uses 90% of the CPU.
I did some research to see if I could solve the problem myself, but so far nothing has worked. I was able to use Charles Proxy to monitor the HTTP requests and it looks like the svn client is sending the POST but it is never receiving a response from the server. After the default timeout (10 minutes) the client gives up and displays the timeout error.
I also tried setting up the repo using svnserve instead of Apache. I was able to read and write to it over svn://. However, the code I am working on expects to communicate with the repo over HTTP, so I still need to figure out what the problem is with Apache.
Does anyone know what could be causing this issue? Are there any other steps I can take to troubleshoot the problem for myself?
[Update]
I checked the logs for my Apache server. Here is what I see when I run the commit:
_myip_ - - [28/Feb/2017:10:04:04 -0500] "OPTIONS /my/repo HTTP/1.1" 200 190 "-" "SVN/1.9.5 (x86_64-apple-darwin16.1.0) serf/1.3.9"
_myip_ - - [28/Feb/2017:10:04:04 -0500] "OPTIONS /my/repo HTTP/1.1" 200 97 "-" "SVN/1.9.5 (x86_64-apple-darwin16.1.0) serf/1.3.9"
[Update 2]
In an attempt to further narrow down the cause, I tried setting up a different Apache server in a Linux virtual machine. That server worked perfectly, and I was even able to read/write to it from OS X. So it would seem the issue is something specific to the Apache server on OS X.
Please try this:
$ sudo chmod -R 775 /var/lib/svn
Reference URL: https://gotechnies.com/setup-svn-server-ubuntu/
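A related check worth doing, assuming the repository layout from that guide (the path is an example, and the Apache user differs by platform: www-data on Ubuntu, _www on macOS):

# The repository must be writable by the user Apache runs as; if it is
# only readable, checkouts work but commits (writes) hang or fail.
$ sudo chown -R www-data:www-data /var/lib/svn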

Would Rails' production.log log an HTTP 408 request?

I'm running some Rails servers behind unicorn, which is behind nginx.
I can see a handful of POST requests with a status of 408 in nginx's access.log.
123.45.67.890 - - [17/Mar/2016:01:23:45 +1100] "POST /collections/ABC/items/DEFGH HTTP/1.1" 408 0 "http://catalog.paradisec.org.au/collections/ABC/items/DEFGH/edit" "MY FAVOURITE BROWSER"
But I can't see anything from the same time in unicorn.log (which doesn't log an awful lot of stuff) or production.log (which logs a fair amount of stuff).
Should Rails be logging anything in an HTTP 408 scenario?
production.log only logs requests that actually reach Rails. It seems to me that the 408 is generated by nginx itself; this can happen if your unicorn workers are all busy. So in this case the request reached neither unicorn nor Rails.
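For reference, a 408 from nginx is governed by its client timeouts; these are the standard directives, shown here with nginx's default values:

# nginx returns 408 Request Timeout when the client is too slow sending
# the request headers or body; both directives default to 60 seconds.
client_header_timeout 60s;
client_body_timeout   60s;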

3x thin servers in apache, but 2 in config file - what would thin do?

I just noticed an issue on our production server where the Apache balancer was configured thusly:
<Proxy balancer://thin_cluster>
  BalancerMember http://127.0.0.1:6000
  BalancerMember http://127.0.0.1:6001
  BalancerMember http://127.0.0.1:6002
  ProxySet lbmethod=bybusyness maxattempts=1 timeout=30
</Proxy>
But the thin config file only specified 2 servers:
thin.yml (condensed for brevity)
address: 127.0.0.1
port: 6000
servers: 2 # <-- wrong!!
The number of thin servers was increased from 2 to 3 about six months ago, but whoever increased it only did so in the Apache config file and forgot to bump the servers count in thin.yml. The reason I started looking into this is that every third request to the application had been noticeably slow. I'm assuming this is why.
The question I have is: what would Thin actually do under these conditions? Why did the application still work? Surely every third request should have died outright rather than somehow coping with the situation.
Thanks in advance.
Thin doesn't know or care about the Apache configuration. It only adheres to its own config and will only spawn 2 servers as a result.
The reason every third request was slow is probably Apache rerouting the request. Since the two Thin servers are using ports 6000 and 6001, the BalancerMember pointing at port 6002 has no server behind it; the port is simply not in use.
Apache still tries sending requests there, because it also doesn't know whether a server is listening on that address/port. It then waited for a timeout (bounded by the timeout=30 in your ProxySet line) since no response was given, and then rerouted the request to one of the other ports (6000 or 6001).
Apache doesn't permanently drop the unreachable server, because the outage might just be temporary. You can probably change this behavior with some settings (at least that is possible in nginx); one such Apache setting is sketched below.
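As a sketch: the retry parameter on a BalancerMember is a standard mod_proxy worker parameter that controls how long a failed worker stays out of rotation before Apache tries it again (60 seconds is mod_proxy's default):

# Keep a backend that failed out of rotation for 60s before retrying it.
BalancerMember http://127.0.0.1:6002 retry=60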
You should either remove the third BalancerMember from the Apache config or add another Thin server.
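If you add the server, the fix on the Thin side is a one-line change to the config shown above; restart Thin afterwards so the third instance actually spawns:

# thin.yml - servers must match the number of BalancerMember entries in Apache
address: 127.0.0.1
port: 6000
servers: 3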

Passenger error when lots of concurrent calls

While running a load test I found Passenger throwing the error below when lots of concurrent requests first hit the server; on the client side this shows up as a 502. However, after some 1000-2000 requests, it works fine.
2013/07/23 11:22:46 [error] 14131#0: *50226 connect() to /tmp/passenger.1.0.14107/generation-0/request failed (11: Resource temporarily unavailable) while connecting to upstream, client: 10.251.18.167, server: 10.*, request: "GET /home HTTP/1.0", upstream: "passenger:/tmp/passenger.1.0.14107/generation-0/request:", host: hostname
Server details:
Passenger 4.0.10
Ruby 1.9.3/2.0
EC2 m1.xlarge (64-bit, 4 cores, 15 GB RAM)
Ubuntu 12.04 LTS
It's a web server that serves dynamic pages for a Rails application.
Can somebody suggest what the issue might be?
A "temporarily unavailable" error in that context means the socket backlog is full. That can happen if your app cannot handle your requests fast enough. What happens is that the queue grows and grows, until it's full, and then you start getting those errors. In the mean time your users' response times grow and grow until they get an error. This is probably an application-level problem so it's best to try starting there. Try figuring out why your app is slow, at which request it is slow, and fix that. Or maybe you need to scale to more servers.
