I am building a web scraping application. It should scrape a complex web site with concurrent HttpWebRequests from a single host to a single target web server.
The application should run on Windows Server 2008.
A single HttpWebRequest for data can take from 1 to 4 minutes to complete (because of long-running DB operations on the target server).
I need at least 100 parallel requests to the target web server, but I have noticed that as soon as I use more than 2-3 long-running requests I get serious performance issues (request timeouts/hanging).
How many concurrent requests can I have in this scenario from a single host to a single target web server? Can I use thread pools in the application to run parallel HttpWebRequests to the server? Will I have any issues with the default outbound HTTP connection/request limits? What about request timeouts when I reach the outbound connection limit? What would be the best setup for my scenario?
Any help would be appreciated.
Thanks
By default, .NET follows the HTTP/1.1 specification's recommendation and limits the user agent to 2 concurrent connections per server. That is the limit you are hitting.
Increase the limit by setting ServicePointManager.DefaultConnectionLimit.
You can also set it per service point via ServicePointManager.GetServicePoint(url).ConnectionLimit.
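For example, a minimal sketch in C# (the limit values, timeout, and target URL are placeholders for your own scenario):

    // Sketch: raise the outbound connection limit before issuing requests.
    using System;
    using System.Net;

    class ScraperSetup
    {
        static void Main()
        {
            // Raise the process-wide default (applies to new service points).
            ServicePointManager.DefaultConnectionLimit = 100;

            // Or raise it only for the one target server you are scraping.
            var sp = ServicePointManager.GetServicePoint(new Uri("http://target-server.example.com/"));
            sp.ConnectionLimit = 100;

            // Long-running requests (1-4 minutes) also need a generous timeout,
            // otherwise queued requests time out while waiting for a free connection.
            var request = (HttpWebRequest)WebRequest.Create("http://target-server.example.com/data");
            request.Timeout = 5 * 60 * 1000; // 5 minutes, in milliseconds
        }
    }

The per-service-point setting only affects connections to that one host, which matches the single-target-server scenario here.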
I have a Rails front-end server, which receives multiple requests from users and then forwards them to a backend server.
The backend server processes the requests asynchronously and notifies the front-end server as it finishes each one.
I use Redis pub/sub to communicate between the two servers. In particular, for each request coming from users, I create a new Redis client that subscribes to a single channel (say, scoring_channel).
However, if I have 100 users making requests at the same time, each of the Redis subscribers will hold one thread.
Does this affect my server performance? If I have a constraint on maximum number of threads (e.g., Heroku allows max 256 threads), how should I avoid this issue?
This will not affect Redis server performance, since Redis is never blocked by pub/sub.
On the client side, you should use a non-blocking API instead of the blocking version to reduce the number of threads.
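As an illustration only (the question is about Rails, but the pattern is the same in any client), here is a sketch using the .NET StackExchange.Redis client: one multiplexed connection delivers messages through a callback, instead of one blocked thread per subscriber.

    // Sketch of a non-blocking subscriber; the channel name is taken from the question.
    using System;
    using StackExchange.Redis;

    class SubscriberExample
    {
        static void Main()
        {
            // One shared connection for the whole process.
            var mux = ConnectionMultiplexer.Connect("localhost");
            var sub = mux.GetSubscriber();

            // The callback runs when a message arrives; no thread sits blocked waiting.
            sub.Subscribe("scoring_channel", (channel, message) =>
            {
                Console.WriteLine($"Request finished: {message}");
            });

            Console.ReadLine(); // keep the process alive for the demo
        }
    }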
I'm trying to scale up an app server to process over 20,000 requests per minute.
When I stress-test the requests, most of them are easily handled at 20,000 RPM or more.
But requests that need to make an external HTTP request (e.g., Facebook Login) bring the server down to a crawl (3,000 RPM).
I conceptually understand the limitations of my current environment -- 3 load-balanced servers with 4 unicorn workers per server can only handle 12 requests at a time, even if all of them are waiting on HTTP requests.
What are my options for scaling this better? I'd like to handle many more connections at once.
Possible solutions as I understand it:
Brute force: use more Unicorn workers (i.e., more RAM) and more servers.
Push all the blocking operations into background/worker processes to free up the web processes. Clients will need to poll periodically to find when their request has completed.
Move to Puma instead of Unicorn (and probably to Rubinius from MRI), so that I can use threads instead of processes -- which may(??) improve memory usage per connection, and therefore allow the number of workers to be increased.
Fundamentally, what I'm looking for is: Is there a better way to increase the number of blocked/queued requests a single worker can handle so that I can increase the number of connections per server?
For example, I've heard discussion of using Thin with EventMachine. Does this open up the possibility of a Rails worker that can put down the web request it's currently working on (because that one is waiting on an external server) and pick up another request while it waits? If so, is this a worthwhile avenue to pursue for performance compared with Unicorn and Puma? (Does it strongly depend on the runtime activities of the app?)
Unicorn is a single-threaded, multi-process synchronous app server. It's not a good match for this kind of processing.
It sounds like your application is I/O bound. This argues for an event-oriented daemon to process your requests.
I'd recommend trying EventMachine with the em-http-request and em-http-server libraries.
This will allow you to service both incoming HTTP requests and outgoing HTTP service calls asynchronously.
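Those libraries are Ruby-specific, but the underlying idea can be sketched in C# terms: start many outbound HTTP calls without blocking and await them together, so a handful of threads can keep hundreds of slow external calls in flight. The URLs below are placeholders.

    // Sketch: overlap many I/O-bound external calls instead of blocking one worker per call.
    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;

    class FanOutExample
    {
        static async Task Main()
        {
            using var client = new HttpClient();

            // Hypothetical external calls (e.g., a login provider); URLs are placeholders.
            var urls = Enumerable.Range(0, 100)
                .Select(i => $"https://external.example.com/check?user={i}");

            // Start all requests without blocking, then wait for them together.
            var tasks = urls.Select(u => client.GetStringAsync(u)).ToList();
            var bodies = await Task.WhenAll(tasks);

            Console.WriteLine($"Completed {bodies.Length} external calls concurrently.");
        }
    }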
I am trying to do some load testing and I was told that as parameters for testing, I should include both the number of concurrent requests and the number of concurrent connections. I really don't understand how there can be multiple requests on a given connection. When a client requests a webpage from a server, it first opens a connection, sends a request, gets a response, and then closes the connection. What am I missing here?
UPDATE:
I meant to ask how it was possible for a single connection to have multiple requests concurrently (meaning simultaneously.) Otherwise, what would be the point of measuring both concurrent requests and concurrent connections? Would counting both of them be helpful in knowing how many connections are idle at a time? I realize that a single connection can handle more than one request consecutively, sorry for the confusion.
HTTP supports a feature called pipelining, which allows the browser to send multiple requests to the server over a single connection without waiting for the responses. The server must support this. IIRC, the server has to send a specific response to the request that indicates "yeah, I'll answer this request, and you can go ahead and send other requests while you're waiting". Last time I looked (many years ago), Firefox was the only browser that supported pipelining and it was turned off by default.
It is also worth noting that even without pipelining, concurrent connections is not equal to concurrent requests, because you can have open connections that are currently idle (no requests pending).
A server may keep a single connection open to serve multiple requests. See http://en.wikipedia.org/wiki/HTTP_persistent_connection. It describes HTTP persistent (also called keep-alive) connections. The idea is that if you make multiple requests, it removes some of the overhead of setting up and tearing down a new connection.
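To make the distinction concrete, here is a small C# sketch (example.com is a placeholder): two sequential requests reuse one keep-alive connection, so at that point you have issued two requests over a single concurrent connection.

    // Sketch: two sequential requests over one persistent (keep-alive) connection.
    using System;
    using System.IO;
    using System.Net;

    class KeepAliveExample
    {
        static void Main()
        {
            for (int i = 0; i < 2; i++)
            {
                var request = (HttpWebRequest)WebRequest.Create("http://example.com/");
                request.KeepAlive = true; // the default; shown here for clarity

                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    reader.ReadToEnd();
                }
            }

            // One connection, two requests: concurrent connections != concurrent requests.
            var sp = ServicePointManager.FindServicePoint(new Uri("http://example.com/"));
            Console.WriteLine($"Connections currently open to the server: {sp.CurrentConnections}");
        }
    }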
I believe the max number of PUT requests to Amazon's SimpleDB is 300?
What happens when I throw 500 or 1,000 requests to it? Is it queued on the Amazon side, do I get 504's or should I build my own queuing server on EC2?
The max request volume is not a fixed number, but a combination of factors. There is a per-domain throttling policy but there seems to be some room for bursting requests before throttling kicks in. Also, every SimpleDB node handles many domains and every domain is handled by multiple nodes. The load on the node handling your request also contributes to your max request volume. So you can get higher throughput (in general) during off-peak hours.
If you send more requests than SimpleDB is willing or able to service, you will get back a 503 HTTP code. 503 Service unavailable responses are business as usual and should be retried. There is no request queuing going on within SimpleDB.
If you want to get the absolute max available throughput you have to be able to (or have a SimpleDB client that can) micro manage your request transmission rate. When the 503 response rate reaches about 10% you have to back off your request volume and subsequently build it back up. Also, spreading the requests across multiple domains is the primary means of scaling.
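As a rough sketch of that retry advice (plain HTTP calls in C#, not the actual SimpleDB SDK; the attempt count and backoff values are made up for illustration):

    // Sketch: treat 503 as "back off and try again", roughly doubling the delay each time.
    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    class RetryExample
    {
        static async Task<string> PutWithBackoffAsync(HttpClient client, string url, string body)
        {
            var delay = TimeSpan.FromMilliseconds(100);

            for (int attempt = 0; attempt < 6; attempt++)
            {
                // Build fresh content per attempt so retries are safe to resend.
                var response = await client.PutAsync(url, new StringContent(body));
                if (response.StatusCode != HttpStatusCode.ServiceUnavailable)
                {
                    response.EnsureSuccessStatusCode();
                    return await response.Content.ReadAsStringAsync();
                }

                // 503: back off before the next attempt.
                await Task.Delay(delay);
                delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);
            }

            throw new Exception("Still throttled after retries; reduce request volume.");
        }
    }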
I wouldn't recommend building your own queuing server on EC2. I would try to get SimpleDB to handle the request volume directly. An extra layer could smooth things out, but it won't let you handle higher load.
I would use the work done at Netflix as an inspiration for high throughput writes:
http://practicalcloudcomputing.com/post/313922691/5-steps-simpledb-performance
Sorry if this might seem obvious. I've observed that a web request on my Rails app uses 30-33% of CPU every time. For example, if I load a web page, then 30% of CPU is used. Does that mean that my box can only handle 3 concurrent web requests, and will stall if there are more than 3 (i.e., CPU usage hits 100%)?
If so, does that also mean that if I want to handle more than 3 concurrent web requests, then I'll have to get more servers to handle the load using a load balancer? (e.g. to handle 6 concurrent web requests, I'll need 2 servers; for 9 concurrent requests, I'll need 3 servers; for 12, I'll need 4 servers -- and so on?)
I think you should start with load tests. I wouldn't trust manual testing that much.
Load tests tell you how long the response takes for each client, and how many clients simply time out.
Also you will be able to measure the improvements objectively for any changes that you make.
Look at ab, or httperf; there are many other tools available.
Stephan
Your Apache or Nginx in front of the Passenger will queue requests until a Passenger worker becomes available. You can limit the number of concurrent workers so your server never stalls (but new visitors will have to wait longer until it's their turn).
It's difficult to tell based on this information. It depends very much on the web server stack you're using and which environment you're running. Different servers (Mongrel, Webrick, Apache using various mechanisms, Unicorn) all have different memory characteristics. Different environments (development vs. test vs. production) all exhibit radically different memory usage characteristics.