What is the difference between a concurrent connection and a concurrent request? - load-testing

I am trying to do some load testing, and I was told that as parameters for testing, I should include both the number of concurrent requests and the number of concurrent connections. I really don't understand how there can be multiple requests on a given connection. When a client requests a webpage from a server, it first opens a connection, sends a request, gets a response, and then closes the connection. What am I missing here?
UPDATE:
I meant to ask how it was possible for a single connection to have multiple requests concurrently (meaning simultaneously.) Otherwise, what would be the point of measuring both concurrent requests and concurrent connections? Would counting both of them be helpful in knowing how many connections are idle at a time? I realize that a single connection can handle more than one request consecutively, sorry for the confusion.

HTTP supports a feature called pipelining, which allows the browser to send multiple requests to the server over a single connection without waiting for the responses. The server must support this. IIRC, the server has to send a specific response to the request that indicates "yeah, I'll answer this request, and you can go ahead and send other requests while you're waiting". Last time I looked (many years ago), Firefox was the only browser that supported pipelining and it was turned off by default.
It is also worth noting that even without pipelining, concurrent connections is not equal to concurrent requests, because you can have open connections that are currently idle (no requests pending).
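To make the distinction concrete, here is a rough Python sketch of pipelining at the socket level: two requests are written to one connection before either response is read. Treat it purely as an illustration (example.com is a placeholder, and many servers and proxies will ignore or reject pipelined requests):

    # Illustration only: write two HTTP/1.1 requests on one TCP connection
    # before reading either response (pipelining). Many servers/proxies do
    # not support this. "example.com" is a placeholder host.
    import socket

    first = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    second = b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"

    with socket.create_connection(("example.com", 80), timeout=10) as sock:
        sock.sendall(first + second)    # both requests are in flight at once
        chunks = []
        while True:
            chunk = sock.recv(4096)     # responses come back in order
            if not chunk:               # server closes after the 2nd response
                break
            chunks.append(chunk)
    print(b"".join(chunks)[:200])

A load test would typically count the above as two concurrent requests but only one connection.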

A server may keep a single connection open to serve multiple requests. See http://en.wikipedia.org/wiki/HTTP_persistent_connection. It describes HTTP persistent (also called keep-alive) connections. The idea is that if you make multiple requests, it removes some of the overhead of setting up and tearing down a new connection.
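For comparison, here is a minimal keep-alive sketch using Python's standard library: the two requests are sent one after the other (not concurrently), but both reuse the same TCP connection. The host and paths are placeholders:

    # Sketch: two sequential requests over one persistent (keep-alive)
    # connection; http.client reuses the TCP socket as long as the server
    # keeps it open. Host and paths are placeholders.
    import http.client

    conn = http.client.HTTPConnection("example.com", 80, timeout=10)
    for path in ("/", "/about"):
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()      # drain the body before reusing the socket
        print(path, resp.status, len(body))
    conn.close()                # one connection carried both requests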

Related

Is it possible for a TIdServerContext to be used more than once at the same time?

Now I know that on an Indy HTTP Server (TIdHTTPServer), the TIdServerContext is re-used for multiple requests incoming from a particular client. However, while designing how things work, I need to know whether it is possible for multiple requests to overlap each other using the same context class.
For example, imagine typing a URL in a browser and pressing refresh over and over. What I see happen is multiple context classes get created. However, I'm afraid that somewhere, the same context instance might be used to handle two requests at the same time.
Is it possible for that to happen? Or is it safe to say that one instance will never process multiple requests at the same time? I'm almost sure it's the latter, considering the context is its own thread, but I need to be sure.
Now I know that on an Indy HTTP Server (TIdHTTPServer), the TIdServerContext is re-used for multiple requests incoming from a particular client.
Only if the client and server are using HTTP keep-alives so multiple requests can be sent over a single TCP connection. Otherwise, the connection is closed after each response.
However, while designing how things work, I need to know whether it is possible for multiple requests to overlap each other using the same context class.
No. Indy context objects are created on a per-connection basis, they are run on a single thread at a time, and HTTP 1.1 and earlier requests are processed one at a time per connection (HTTP 2 allows multiple requests in parallel, but Indy does not implement HTTP 2 at this time).
For example, imagine typing a URL in a browser and pressing refresh over and over. What I see happen is multiple context classes get created
On a refresh, the browser is closing the current connection and creating a new one. Closing the connection is the only way to cancel a pending request that has not completed yet.
However, I'm afraid that somewhere, the same context instance might be used to handle two requests at the same time.
That is not possible.
Is it possible for that to happen?
No.
Or is it safe to say that one instance will never process multiple requests at the same time?
Yes. It may process multiple requests during its lifetime, but not in parallel.
I'm almost sure it's the latter, considering the context is its own thread
The context is not a thread. More accurately, the context represents a specific connection, which happens to be serviced by only one thread. Indy can re-use threads (if you assign a thread-pooling scheduler to the server), where a given thread may service multiple contexts during its lifetime. But Indy does not re-use a context for multiple connections.
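This is not Indy or Delphi, but as a rough analogy of that model in Python: one handler object per accepted connection, serviced by one thread, with requests on that connection processed strictly one at a time (the port and the echo protocol are made up for the example):

    # Rough analogy only (not Indy): each accepted connection gets its own
    # handler instance and thread, and that thread processes requests on the
    # connection one at a time, never in parallel.
    import socketserver

    class ConnectionHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # This loop spans the lifetime of one connection ("context").
            for line in self.rfile:                 # one request at a time
                request = line.strip()
                if not request:
                    break
                self.wfile.write(b"echo: " + request + b"\n")

    class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
        daemon_threads = True

    if __name__ == "__main__":
        with ThreadedServer(("127.0.0.1", 8080), ConnectionHandler) as server:
            server.serve_forever()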

How do I make HTTP requests in Rails while still servicing many requests per minute?

I'm trying to scale up an app server to process over 20,000 requests per minute.
When I stress-test, the server easily handles 20,000 RPM or more for most requests.
But requests that need to make an external HTTP request (e.g., Facebook Login) bring the server down to a crawl (3,000 RPM).
I conceptually understand the limitations of my current environment -- 3 load-balanced servers with 4 Unicorn workers per server can only handle 12 requests at a time, even if all of them are waiting on HTTP requests.
What are my options for scaling this better? I'd like to handle many more connections at once.
Possible solutions as I understand it:
Brute force: use more Unicorn workers (i.e., more RAM) and more servers.
Push all the blocking operations into background/worker processes to free up the web processes. Clients will need to poll periodically to find when their request has completed.
Move to Puma instead of Unicorn (and probably to Rubinius from MRI), so that I can use threads instead of processes -- which may(??) improve memory usage per connection, and therefore allow the number of workers to be increased.
Fundamentally, what I'm looking for is: Is there a better way to increase the number of blocked/queued requests a single worker can handle so that I can increase the number of connections per server?
For example, I've heard discussion of using Thin with EventMachine. Does this open up the possibility of a Rails worker that can put down the web request it's currently working on (because that one is waiting on an external server) and then pick up another request while it waits? If so, is this a worthwhile avenue to pursue for performance compared with Unicorn and Puma? (Does it strongly depend on the runtime activities of the app?)
Unicorn is a single-threaded, multi-process synchronous app server. It's not a good match for this kind of processing.
It sounds like your application is I/O bound. This argues for an event-oriented daemon to process your requests.
I'd recommend trying EventMachine with the em-http-request and em-http-server libraries.
This will allow you to service both incoming requests to the HTTP server and outgoing HTTP service calls asynchronously.
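The answer above is about Ruby/EventMachine; as a language-agnostic sketch of the same event-driven idea, here it is in Python's asyncio, where a single thread keeps several outbound HTTP calls in flight at once instead of tying up one worker per call (the hosts are placeholders):

    # Sketch of the event-driven idea: one thread, many outbound HTTP calls
    # in flight at the same time. While one call waits on the network, the
    # event loop services the others. Hosts below are placeholders.
    import asyncio

    async def fetch_head(host: str) -> str:
        reader, writer = await asyncio.open_connection(host, 80)
        writer.write(
            f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode()
        )
        await writer.drain()
        status_line = await reader.readline()   # e.g. b"HTTP/1.1 200 OK\r\n"
        writer.close()
        await writer.wait_closed()
        return f"{host}: {status_line.decode().strip()}"

    async def main() -> None:
        hosts = ["example.com", "example.org", "example.net"]
        # All three calls overlap; total time is roughly the slowest call,
        # not the sum of all of them.
        results = await asyncio.gather(*(fetch_head(h) for h in hosts))
        for line in results:
            print(line)

    asyncio.run(main())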

Timeout for a synchronous web-service call

I'm wondering: is it nonsense to put a timeout on a synchronous web-service call? I mean, if there is a risk that the server does not respond, shouldn't I use an asynchronous call instead?
(I'm using Jersey)
Thanks!
I'd always advise setting a connection timeout and a read timeout on any and all outbound network requests, as waiting indefinitely for an answer could eventually consume all your threads and make your app server unresponsive.
In my experience it's not unusual at all to have partners' web-service calls fail to respond within 60 seconds (which is quite generous).
Handling read timeouts can be tricky for write operations, though, as you can't tell whether the other system eventually recorded the request or not. In such a situation, your partner hopefully provides an idempotent API, allowing you to retry at a later time without risk of duplicate execution. Otherwise it may require manual communication with your partner.
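The question is about Jersey (Java), but the principle is client-agnostic. As a sketch of the same idea with Python's requests library: the timeout tuple sets the connection timeout and the read timeout separately, and a timeout on a write operation is treated as "outcome unknown" (the endpoint and payload are placeholders):

    # Sketch (not Jersey): separate connect and read timeouts on an outbound
    # call. Without them, a hung partner service can pin this thread forever.
    # Endpoint and payload are placeholders.
    import requests

    try:
        resp = requests.post(
            "https://partner.example.com/api/orders",  # placeholder endpoint
            json={"order_id": "12345"},                # placeholder payload
            timeout=(5, 60),   # 5 s to connect, 60 s to wait for each read
        )
        resp.raise_for_status()
    except requests.Timeout:
        # For a write operation we cannot tell whether the partner recorded
        # the request. If the API is idempotent, retry later; otherwise this
        # may need (possibly manual) reconciliation with the partner.
        print("timed out; schedule a retry or reconciliation")
    except requests.RequestException as exc:
        print(f"request failed: {exc}")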

Deferring blocking Rails requests

I found a question that explains how Play Framework's await() mechanism works in 1.2. Essentially if you need to do something that will block for a measurable amount of time (e.g. make a slow external http request), you can suspend your request and free up that worker to work on a different request while it blocks. I am guessing once your blocking operation is finished, your request gets rescheduled for continued processing. This is different than scheduling the work on a background processor and then having the browser poll for completion, I want to block the browser but not the worker process.
Regardless of whether or not my assumptions about Play are true to the letter, is there a technique for doing this in a Rails application? I guess one could consider this a form of long polling, but I didn't find much advice on that subject other than "use node".
I had a similar question about long requests that block workers from taking other requests. It's a problem for all web applications. Even Node.js may not be able to solve the problem of a worker consuming too much time, or it could simply run out of memory.
A web application I worked on has a web interface that sends requests to a Rails REST API; the Rails controller then has to call a Node REST API that runs a heavy, time-consuming task to get some data back. A request from Rails to Node.js could take 2-3 minutes.
We are still trying out different approaches, but maybe the following could work for you, or you can adapt some of the ideas; I would love to get some feedback too (a rough sketch of this flow follows the steps below):
The frontend makes a request to the Rails API with a generated identifier [A] within the same session (this identifier helps to identify previous requests from the same user session).
The Rails API proxies the frontend request and the identifier [A] to the Node.js service.
The Node.js service adds this job to a queue system (e.g. RabbitMQ or Redis); the message contains the identifier [A]. (How you do this depends on your own scenario; it also assumes a worker will consume the queued job and save the results.)
If the same request is sent again, then depending on the requirement you can either kill the current job with the same identifier [A] and schedule/queue the latest request, ignore the latest request while waiting for the first one to complete, or make whatever other decision fits your business requirements.
The frontend can send periodic REST requests to check whether the data processing for identifier [A] has completed or not; these requests are lightweight and fast.
Once Node.js completes the job, you can either push the result through a message subscription system or wait for the next status-check request and return the result to the frontend.
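Here is a rough, framework-free sketch of that flow in Python, with an in-memory dict standing in for the real queue and result store, and made-up function names (submit_job, check_status) standing in for the REST endpoints:

    # Rough sketch of the enqueue-then-poll flow, framework-free and
    # in-memory. In a real system the dict below would be RabbitMQ/Redis
    # plus a result store, and submit_job / check_status would sit behind
    # REST endpoints.
    import threading
    import time
    import uuid

    jobs = {}                      # job_id -> {"status": ..., "result": ...}
    jobs_lock = threading.Lock()

    def run_heavy_task(job_id: str, payload: dict) -> None:
        time.sleep(5)              # stand-in for the 2-3 minute Node.js task
        with jobs_lock:
            jobs[job_id] = {"status": "done", "result": f"processed {payload}"}

    def submit_job(session_id: str, payload: dict) -> str:
        # Identifier [A]: one job id per submission, tied to the user session.
        job_id = f"{session_id}:{uuid.uuid4()}"
        with jobs_lock:
            jobs[job_id] = {"status": "queued", "result": None}
        threading.Thread(target=run_heavy_task, args=(job_id, payload),
                         daemon=True).start()
        return job_id

    def check_status(job_id: str) -> dict:
        # This is the cheap call the frontend polls every few seconds.
        with jobs_lock:
            return jobs.get(job_id, {"status": "unknown", "result": None})

    if __name__ == "__main__":
        job = submit_job("session-1", {"report": "contributions"})
        while check_status(job)["status"] != "done":
            time.sleep(1)          # frontend polling interval
        print(check_status(job)["result"])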
You can also use a load balancer, e.g. an Amazon load balancer or HAProxy. 37signals has a blog post and video about using HAProxy to offload some long-running requests so that they do not block shorter ones.
GitHub uses a similar strategy to handle long requests for generating commit/contribution visualisations. They also set a limit on polling time; if it takes too long, GitHub displays a message saying it is taking too long and has been cancelled.
YouTube has a nice message for longer queued tasks: "This is taking longer than expected. Your video has been queued and will be processed as soon as possible."
I think this is just one solution. You can also take a look at the EventMachine gem, which helps to improve performance by handling parallel or async requests.
Since this kind of problem may involve one or more services, think about the possibility of improving performance between those services (e.g. database, network, message protocol, etc.). If caching may help, try caching frequent requests, or pre-calculate results.

How to handle pending connections to a server that is designed to handle a limited number of connections at a time

I wonder what is the best approach to handle the following scenario:
I have a server that is designed to handle only 10 connections at a time, during which the server is busy interacting with the clients. However, while the server is busy, there may be new clients who want to connect (as part of the next 10 connections that the server is going to accept). The server should only accept the new connections after it finishes with all of the previous 10 clients.
Now, I would like to have an automatic way for the pending clients to wait and connect to the server once it becomes available (i.e. finished with the previous 10 clients).
So far, I can think of two approaches: 1. have a file watch on the client side, so that the client will watch for a file written by the server: when the server finishes with 10 clients, it will write the file, and the pending clients will know it's time to connect; 2. make the pending clients try to connect to the server every 5-10 seconds or so until success, and have the server return a message indicating whether it is ready.
Any other suggestion would be much welcome. Thanks.
Of the two options you provide, I am inclined toward the 2nd option of "Pinging" the server. I think it is more complicated to have the server write a file to the client triggering another attempt.
I would think that you should be able to have the client wait and simply send it a READY signal when its turn comes. Keep a running queue of connection requests (from Socket.Connection.EndPoint, I believe). When one socket completes, accept the next socket off the queue.
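Here is a rough Python sketch of that idea: the server services at most 10 clients at once, later clients simply wait in the listen backlog, and each client gets a READY message as soon as it is accepted (the port, messages, and "interaction" are placeholders):

    # Sketch of the server-side queue idea: at most MAX_ACTIVE clients are
    # serviced at once; everyone else waits in the listen backlog until a
    # slot frees up, at which point the server accepts them and sends READY.
    import socket
    import threading

    MAX_ACTIVE = 10
    slots = threading.Semaphore(MAX_ACTIVE)

    def serve_client(conn: socket.socket) -> None:
        try:
            conn.sendall(b"READY\n")        # tell the client its turn has come
            data = conn.recv(4096)          # stand-in for the real interaction
            conn.sendall(b"DONE: " + data)
        finally:
            conn.close()
            slots.release()                 # free the slot for a waiting client

    def main() -> None:
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("0.0.0.0", 9000))
        server.listen(128)                  # pending clients queue up here
        while True:
            slots.acquire()                 # block until fewer than 10 active
            conn, _addr = server.accept()
            threading.Thread(target=serve_client, args=(conn,),
                             daemon=True).start()

    if __name__ == "__main__":
        main()

On the client side, option 2 then collapses to connect-and-block-until-READY, optionally retrying after a short delay if the connection is refused.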
