Is it bad to use unicorn without nginx? Why?

I read that nginx is fast at serving static content, handling slow clients, and making redirects.
Why is nginx + unicorn better than running unicorn alone and scaling the number of unicorn workers when needed?
Do you have any numbers showing how much faster nginx is at each of these things (redirecting, proxying, serving static content)?

As Heroku DevCenter claims, Unicorn workers are vulnerable to slow clients.
Each worker can process only a single request at a time, and if the client is not ready to accept the entire response (a "slow client"), the Unicorn worker blocks while sending the response and cannot handle the next request. Each Unicorn worker also takes up a substantial amount of RAM (again, see Heroku, which claims 2-4 processes fit in 512 MiB of RAM), so you cannot rely on simply adding workers: the worker count is also the number of clients it takes to render your application inoperable by pretending to have slow connections.
When behind nginx, Unicorn can dump the entire response into nginx's buffer and immediately switch to handling the next request.
In short, nginx with a single Unicorn worker behind it is much more reliable than a bunch of Unicorn workers exposed directly.
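To see why a worker gets stuck on a slow client, here is a minimal Ruby sketch (this is not Unicorn code). It assumes a Unix-like system. A UNIXSocket.pair stands in for the worker-to-client connection, and the 16 MiB body size is an arbitrary choice for illustration. The "client" end never reads, so the kernel accepts only as much of the response as its socket buffers can hold:

```ruby
require "socket"

# One end plays the app worker, the other a client that never reads.
worker_side, client_side = UNIXSocket.pair

response = "x" * (16 * 1024 * 1024) # hypothetical 16 MiB response body

# A non-blocking write returns how many bytes the kernel accepted.
accepted = worker_side.write_nonblock(response)
puts "kernel accepted #{accepted} of #{response.bytesize} bytes"

worker_side.close
client_side.close
```

A blocking write, which is what a synchronous worker does, would sit at this point until the client drained the rest. With nginx in front, nginx's buffer absorbs that wait instead of the worker.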
NB for folks using ancient Rubies: if you run a set of Unicorn workers, consider migrating to at least Ruby 2.0 to reduce RAM consumption by sharing common data across forked processes (ref).


Nginx + Unicorn : difference between both worker_processes

I have a Rails app running in an Nginx + Unicorn environment.
I see that both have a "worker_processes" setting in their config files, and I wonder what the optimal configuration is.
If I have 4 cores, should I put 4 for both? Or 1 for nginx and 4 for unicorn?
(By the way, I am using sidekiq too, so what about sidekiq concurrency?)
Nginx is an event-based server. This means that one operating system (OS) process can manage a very large number of connections. This is possible because the usual state of a connection is waiting: while one connection waits for the other side, or is sending or receiving a packet of data, nginx can work on other connections. One nginx worker can handle thousands or even tens of thousands of connections, so even worker_processes 1 can be enough.
More nginx workers let it use more CPU cores (which can matter if nginx is the main CPU consumer). More workers also help if nginx does a lot of disk I/O.
Summary: you can safely start with 1 worker and increase up to the number of CPU cores.
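The event-based idea can be sketched in a few lines of Ruby. This is only an illustration of the technique, not how nginx is implemented (nginx is written in C and uses OS facilities such as epoll or kqueue). Three pipes stand in for three client connections, and one loop serves whichever of them is ready:

```ruby
# Three pipes stand in for three client connections.
connections = Array.new(3) { IO.pipe }
readers = connections.map(&:first)

# Simulate clients that send some data and then hang up.
connections.each_with_index do |(_reader, writer), i|
  writer.write("hello from client #{i}\n")
  writer.close
end

received = []
until readers.empty?
  # Sleep until at least one connection has something to read.
  ready, = IO.select(readers)
  ready.each do |io|
    begin
      received << io.read_nonblock(1024)
    rescue EOFError
      # The client hung up: stop watching this connection.
      io.close
      readers.delete(io)
    end
  end
end

puts received
```

A single process handles all three connections because it only touches a connection when there is work to do on it.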
A Unicorn worker is a little different from an nginx worker because one worker = one request. The number of unicorn workers determines how many Ruby processes execute at the same time, and the right number depends on your application.
For example, suppose your application is CPU-bound (doing only math). In this case, a number of workers greater than the number of CPU cores can cause problems.
But a typical application works with a database and sleeps while waiting for the database to answer. If the database is on another server (so query processing does not consume our CPU), Ruby sleeps and the CPU sits idle. In this case we can increase the number of workers to CPU*3 ... CPU*5, or even CPU*20.
Summary: the best way to find this number is load testing your real application. Set the number of unicorn workers and start a load test with the same level of concurrency. If the server copes well, increase the number of workers and test again.
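As a starting point for that load test, one common heuristic (an assumption on my part, not something from the Unicorn docs) is to scale the core count by the ratio of waiting time to CPU time per request, then cap the result by available memory. All the numbers below are hypothetical:

```ruby
# Rough starting value for the worker count; confirm it with a load test.
#   cores              - number of CPU cores
#   cpu_ms / wait_ms   - time a typical request spends computing vs waiting on I/O
#   ram_mb / worker_mb - memory budget and memory footprint of one worker
def starting_worker_count(cores:, cpu_ms:, wait_ms:, ram_mb:, worker_mb:)
  by_cpu = (cores * (1 + wait_ms.to_f / cpu_ms)).floor
  by_ram = ram_mb / worker_mb # integer division: whole workers that fit in RAM
  [[by_cpu, by_ram].min, 1].max
end

# CPU-bound app: no waiting, so one worker per core.
puts starting_worker_count(cores: 4, cpu_ms: 50, wait_ms: 0, ram_mb: 4096, worker_mb: 256)

# I/O-bound app: CPU*5 would be 20, but only 16 workers fit in memory.
puts starting_worker_count(cores: 4, cpu_ms: 20, wait_ms: 80, ram_mb: 4096, worker_mb: 256)
```

With four times as much waiting as computing, the CPU-side estimate lands at CPU*5, which matches the range mentioned above. The memory cap is what keeps the server out of swap.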
Sidekiq concurrency is similar to unicorn workers. If the tasks are CPU-bound, set the number of threads close to the number of CPU cores. If they are I/O-bound, the number of threads can be greater than the number of CPU cores. The other tasks on the server (like unicorn) also matter. Just remember that the number of CPU cores does not change if you run sidekiq on the same server as unicorn :)
Summary: same as unicorn.
There is no absolute best answer. (If there was, the software would tune itself automatically.)
The best possible solution depends on your operating system, environment, processor, memory, disks, the OS buffer cache, the caching policy in nginx, hit rates, your application, and probably many other factors.
Which, not very surprisingly, is actually what the documentation of nginx says, too:
The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it).
As for unicorn, a quick search for "worker_processes unicorn" reveals the following as the first hit:
worker_processes should be scaled to the number of processes your backend system(s) can support. DO NOT scale it to the number of external network clients your application expects to be serving. unicorn is NOT for serving slow clients, that is the job of nginx.
worker_processes should be at least the number of CPU cores on a dedicated server (unless you do not have enough memory). If your application has occasionally slow responses that are /not/ CPU-intensive, you may increase this to workaround those inefficiencies.
Never, ever, increase worker_processes to the point where the system runs out of physical memory and hits swap. Production servers should never see heavy swap activity.
sets the current number of #worker_processes to nr. Each worker process will serve exactly one client at a time. You can increment or decrement this value at runtime by sending SIGTTIN or SIGTTOU respectively to the master process without reloading the rest of your Unicorn configuration. See the SIGNALS document for more information.
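The runtime adjustment described in that last quote can be sketched in Ruby. The helper names here are my own (hypothetical), and master_pid is whatever PID your deployment records for the Unicorn master, for example from its pid file. Only the TTIN and TTOU signals come from the documentation quoted above:

```ruby
# Signals needed to move a Unicorn master from `current` to `target` workers:
# each TTIN adds one worker, each TTOU removes one.
def resize_signals(current, target)
  diff = target - current
  Array.new(diff.abs, diff.positive? ? "TTIN" : "TTOU")
end

# Sends those signals to the master process (hypothetical helper).
def resize_unicorn(master_pid, current, target)
  resize_signals(current, target).each { |sig| Process.kill(sig, master_pid) }
end

p resize_signals(4, 6) # => ["TTIN", "TTIN"]
p resize_signals(4, 3) # => ["TTOU"]
```

Calling resize_unicorn(pid, 4, 6) would then ask a running master to go from 4 to 6 workers without reloading its configuration.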
In summary:
for nginx, it is best to keep it at or below the number of CPUs (and I'd probably not count the hyperthreaded ones, especially if you have other stuff running on the same server) and/or disks,
whereas for unicorn, it looks like it has to be at least the number of CPUs; if you have sufficient memory, and depending on your workload, you may want to increase it well beyond the raw number of CPUs.
The general rule of thumb is to use one worker process per core that your server has. So setting worker_processes to 4 would be optimal in your scenario in both config files (worker_processes 4; in nginx, worker_processes 4 in Unicorn), as in these examples, nginx first and then Unicorn:
# One worker process per CPU core is a good guideline.
worker_processes 1;
# The number of worker processes you have here should equal the number of CPU
# cores your server has.
worker_processes (ENV['RAILS_ENV'] == 'production' ? 4 : 1)
More information on the Sidekiq concurrency can be found here:
You can tune the amount of concurrency in your sidekiq process. By default, one sidekiq process creates 25 threads. If that's crushing your machine with I/O, you can adjust it down:
sidekiq -c 10
Don't set the concurrency higher than 50. I've seen stability issues with concurrency of 100, for example. Note that ActiveRecord has a connection pool which needs to be properly configured in config/database.yml to work well with heavy concurrency. Set the pool setting to something close or equal to the number of threads:
adapter: mysql2
database: foo_production
pool: 25
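A quick sanity check of that pool advice can be sketched in Ruby. The pool_sufficient? helper is hypothetical (it is not part of Sidekiq or Rails), the YAML mirrors the fragment above with an assumed "production" key, and the fallback of 5 is ActiveRecord's default pool size:

```ruby
require "yaml"

# Assumed layout of config/database.yml, based on the fragment above.
DATABASE_YML = <<~YAML
  production:
    adapter: mysql2
    database: foo_production
    pool: 25
YAML

# True when the connection pool can give every Sidekiq thread its own connection.
def pool_sufficient?(yaml_text, env, concurrency)
  config = YAML.safe_load(yaml_text)
  pool = Integer(config.fetch(env).fetch("pool", 5))
  pool >= concurrency
end

puts pool_sufficient?(DATABASE_YML, "production", 25) # default sidekiq concurrency
puts pool_sufficient?(DATABASE_YML, "production", 50) # sidekiq -c 50
```

If the second check is false for the concurrency you actually run, threads will queue up waiting for a database connection.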

How do I make HTTP requests in Rails while still servicing many requests per minute?

I'm trying to scale up an app server to process over 20,000 requests per minute.
When I stress-test the requests, most types of request easily sustain 20,000 RPM or more.
But requests that need to make an external HTTP request (e.g., Facebook Login) slow the server to a crawl (3,000 RPM).
I conceptually understand the limitations of my current environment -- 3 load-balanced servers with 4 unicorn workers per server can only handle 12 requests at a time, even if all of them are waiting on HTTP requests.
What are my options for scaling this better? I'd like to handle many more connections at once.
Possible solutions as I understand it:
Brute force: use more unicorn workers (ie, more RAM) and more servers.
Push all the blocking operations into background/worker processes to free up the web processes. Clients will need to poll periodically to find when their request has completed.
Move to Puma instead of Unicorn (and probably to Rubinius from MRI), so that I can use threads instead of processes -- which may(??) improve memory usage per connection, and therefore allow the number of workers to be increased.
Fundamentally, what I'm looking for is: Is there a better way to increase the number of blocked/queued requests a single worker can handle so that I can increase the number of connections per server?
For example, I've heard discussion of using Thin with EventMachine. Does this open up the possibility of a Rails worker that can put down the web request it's currently working on (because that one is waiting on an external server) and then picks up another request while it's waiting? If so, is this a worthwhile avenue to pursue for performance compared with Unicorn and Puma? (Does it strongly depend on the runtime activities of the app?)
Unicorn is a single-threaded, multi-process synchronous app server. It's not a good match for this kind of processing.
It sounds like your application is I/O bound. This argues for an event-oriented daemon to process your requests.
I'd recommend trying EventMachine with em-http-request and em-http-server.
This will allow you to service both incoming requests to the http server and outgoing HTTP service calls asynchronously.
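A back-of-the-envelope check of the numbers in the question shows how hard the one-request-per-worker limit bites. This is only a rough sketch: it assumes all 12 workers are fully busy during the stress test, and the helper names are made up for illustration:

```ruby
# Average time each request must occupy a worker to produce an observed RPM,
# assuming every worker is busy all the time.
def implied_ms_per_request(workers, observed_rpm)
  workers * 60_000 / observed_rpm
end

# Workers needed to sustain a target RPM at a given time per request.
def workers_needed(target_rpm, ms_per_request)
  (target_rpm * ms_per_request).fdiv(60_000).ceil
end

workers = 3 * 4 # 3 servers x 4 unicorn workers, from the question

slow_ms = implied_ms_per_request(workers, 3_000)
puts "external-HTTP requests hold a worker for about #{slow_ms} ms"
puts "workers needed for 20,000 RPM at that speed: #{workers_needed(20_000, slow_ms)}"
```

Under those assumptions each slow request holds a worker for roughly a quarter of a second, almost all of it spent waiting. That waiting time is what an event-driven server can overlap and a synchronous worker cannot.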

Ruby rack app on 1 CPU - how many processes should I run?

I'm quite new to Ruby web apps (coming from Java).
I have a VPS with 1 CPU and 2GB of RAM and would like to play with some Rails/Sinatra stuff.
I'm using Ruby 2.1.0 MRI.
How does the number of CPUs map to the number of web server processes I need to run? I use puma as a web server and have the default threads (0,16) set up. But I noticed there is also a "workers" option that forks additional processes to better handle multiple requests.
Do I understand correctly that for such a setup (1 CPU) there is no point in running 2 web server processes? Is the only reasonable setup 1 process with threads?
Oh now this is a pretty big question!
The number of processes and threads isn't necessarily linked to the number of CPUs. It's more a matter of the amount of memory available, the number of concurrent requests, and the amount of 'locking' stuff that's going on.
If you're going to have long running requests that block other requests, then having additional processes can help with that. You can still have more than one process with a single CPU.
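Here is a small Ruby sketch of why concurrency is not tied to the CPU count when requests mostly wait. It uses plain threads and sleep as a stand-in for waiting on a database or an external service; the request count and wait time are arbitrary, and this is not Puma code:

```ruby
# Stand-in for a request that spends its time waiting on I/O.
# While a thread sleeps, MRI lets the other threads run.
def handle_request(wait_seconds)
  sleep(wait_seconds)
  :done
end

requests = 8
wait = 0.1

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = Array.new(requests) { Thread.new { handle_request(wait) } }.map(&:value)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("%d requests in %.2fs (one at a time would take about %.2fs)",
            requests, elapsed, requests * wait)
```

Even on a single core the waits overlap, so the batch finishes in roughly the time of one request. For CPU-heavy requests on MRI the picture is different, since only one thread runs Ruby code at a time.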
There are a number of different servers in Ruby that handle scaling in different ways; Unicorn, Puma, and Thin are some of them. Doing a search on Unicorn vs Puma vs Thin can turn up some useful blog posts on the topic.
The TL;DR answer is, it depends!

Thin vs Unicorn on Heroku

Just wanted to get people's opinions on using Unicorn vs Thin as a Rails server. Most of the articles/benchmarks I found online seem very incomplete, so it would be nice to have a centralized place to discuss it.
Unicorn is a multi-process server, while Thin is an event-based/non-blocking server. Event-based servers are great... if your code is asynchronous/non-blocking, but vanilla Rails is blocking. So unless you use non-blocking Rails libraries, I really don't see the advantage of using Thin. Even worse, in a non-blocking server, a blocking call inside your I/O loop blocks the entire loop, and no more requests can be handled until the blocking call returns. Blocking libraries are going to slow Thin down!
Why did Heroku choose Thin as their default server (for Cedar)? They are smart guys, so I'm sure they had a reason.
Below is a link that suggests replacing Thin with 4 Unicorn workers, which makes perfect sense to me.
4 Unicorn workers on Heroku
Thin is easy to configure - not optimal, but it just works in the Heroku environment.
Unicorn can be more efficient, but it needs to be configured: How many workers? Preload App? What do you pick?
I have released Unicorn Heroku apps with workers set to 3, 5 and 8, based on how big each app is. How much code there is, how much memory is used, and how much traffic you get all go into picking this number, and you need to monitor over time to make sure you got the number right and your app isn't running out of memory.
Preload false - this will make your app start slower, but when Unicorn restarts a worker, this is 'safer' with network connections (memcache, postgres, mongo etc)
Preload true - this is better, but you need to handle server re-connections correctly in the pre and post fork code.
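For reference, a Unicorn config along these lines might look like the sketch below. The values are placeholders to tune per app, reading the worker count from a WEB_CONCURRENCY environment variable is an assumption borrowed from common Heroku practice, and the ActiveRecord calls only apply if your app uses ActiveRecord. Other connections (memcache, mongo and so on) would need their own reconnect code in the same hooks:

```ruby
# config/unicorn.rb (sketch; values are placeholders to tune per app)
worker_processes Integer(ENV.fetch("WEB_CONCURRENCY", "3"))
timeout 30
preload_app true

before_fork do |_server, _worker|
  # The master's database connection must not be shared with forked workers.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |_server, _worker|
  # Each worker opens its own connection after the fork.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```

With preload_app false the two hooks become unnecessary, at the cost of each worker loading the whole app on its own.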
Thin has none of these issues out of the box, but you only get one process of execution.
Summary: It's really hard to configure Unicorn out of the box to work well (or at all) for everyone, whereas Thin can just work to get people running with fewer support requests.
Recently (only a few months ago) the folks behind Phusion Passenger added support for Heroku. This is definitely an alternative you should try to see if it fits your needs.
It is blazing fast even with 1 dyno, and the drop in response time is palpable.
A simple Passenger Ruby Heroku Demo is hosted on GitHub.
The main benefits that Passenger on Heroku claims are:
Static asset acceleration through Nginx - Don't let your Ruby app serve static assets, let Nginx do it for you and offload your app for the really important tasks. Nginx will do a much better job.
Multiple worker processes - Instead of running only one worker on a dyno, Phusion Passenger runs multiple workers on a single dyno, thus utilizing its resources to its fullest and giving you more bang for the buck. This approach is similar to Unicorn's. But unlike Unicorn, Phusion Passenger dynamically scales the number of worker processes based on current traffic, thus freeing up resources when they're not necessary.
Memory optimizations - Phusion Passenger uses less memory than Thin and Unicorn. It also supports copy-on-write virtual memory in combination with code preloading, thus making your app use even less memory when run on Ruby 2.0.
Request/response buffering - The included Nginx buffers requests and responses, thus protecting your app against slow clients (e.g. mobile devices on mobile networks) and improving performance.
Out-of-band garbage collection - Ruby's garbage collector is slow, but why bother your visitors with long response times? Fix this by running garbage collection outside of the normal request-response cycle! This concept, first introduced by Unicorn, has been improved upon: Phusion Passenger ensures that only one request at the same time is running out-of-band garbage collection, thus eliminating all the problems Unicorn's out-of-band garbage collection has.
JRuby support - Unicorn's a better choice than Thin, but it doesn't support JRuby. Phusion Passenger does.
Hope this helps.
Heroku does not use intelligent routing - it will randomly assign jobs to dynos regardless of whether the dyno is busy. Thus, if your dyno cannot handle multiple jobs at once, you will get latency (perhaps massive latency) even if you are paying for lots of other dynos that are free. " That's right — if your app needs 80 dynos with an intelligent router, it needs 4,000 with a random router. "
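The effect of random routing can be illustrated with a toy Ruby simulation. This is a heavily simplified model of my own, not Heroku's router: all jobs arrive at once, nothing completes in the meantime, and "intelligent" routing is modelled as sending each job to the least-loaded dyno. The dyno and job counts and the random seed are arbitrary:

```ruby
# Length of the longest per-dyno queue for a list of dyno assignments.
def deepest_queue(assignments, dyno_count)
  counts = Array.new(dyno_count, 0)
  assignments.each { |dyno| counts[dyno] += 1 }
  counts.max
end

dynos = 50
jobs = 50
rng = Random.new(42) # fixed seed so the run is repeatable

# Random routing: each job lands on a dyno regardless of how busy it is.
random_routing = Array.new(jobs) { rng.rand(dynos) }

# "Intelligent" routing: each job goes to the currently least-loaded dyno.
loads = Array.new(dynos, 0)
balanced_routing = Array.new(jobs) do
  target = loads.index(loads.min)
  loads[target] += 1
  target
end

random_deepest = deepest_queue(random_routing, dynos)
balanced_deepest = deepest_queue(balanced_routing, dynos)
puts "deepest queue with random routing: #{random_deepest}"
puts "deepest queue with least-loaded routing: #{balanced_deepest}"
```

With as many dynos as jobs, least-loaded routing never queues anything, while random routing stacks several jobs on some dynos and leaves others idle. On a single-request server like Thin, every stacked job waits for the one ahead of it.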
Heroku says they are working on this, and their plan is to make it easier to use Unicorn. They basically said "Oops, we didn't notice that this was a problem for a few years... and now that we look, it's definitely a problem for Thin... so I guess you need to use a different program than the one we've been pushing all this time."
From the official Heroku explanation (second link above):
"Rails, in fact, does not yet reliably support concurrent request handling. This leaves Rails developers unable to leverage the additional concurrency capabilities offered by the Cedar stack, unless they move to a concurrent web server like Puma or Unicorn.
Rails apps deployed to Cedar with Thin can rather quickly end up with request queuing problems. Because the Cedar router no longer does any queuing on behalf of the app, requests queued at the dyno must wait until the single Rails process works its way through the queue. Many customers have run into this issue and we failed to take action and provide them with a better approach to deploying Rails apps on Cedar."
Also of interest is that their performance tools, including New Relic, have not been reporting time spent in the dyno queue.

Slow Client connection blocks Mongrel

I have an Apache + HAProxy + Mongrel setup for my Rails application. When I hit a particular server page, Mongrel takes around 100ms to process the request, and I get the page in around 5 secs due to data transmission time on my slow home connection.
Now I see that during these 5 secs of data transmission, Mongrel does not serve any other request. I am surprised, as that means Mongrel is serving the response HTML to the client and is blocked until the client receives it. Shouldn't serving the response be the job of Apache?
This puts a serious bottleneck on the number of requests Mongrel can serve, as that would depend on the speed of the client connection. Is there any way for the HTML generated by Mongrel to be served by Apache/HAProxy or another web server like nginx?
I wonder how other high-traffic sites manage this?
Most sites that use Mongrel run lots of them, because they do block in the way you are experiencing.
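To put rough numbers on it, using the figures from the question (about 100 ms of processing and about 5 s of transfer), here is a small Ruby calculation. It is a simplification that assumes every client is as slow as the one described and that a buffering front end takes the response off the backend's hands almost instantly:

```ruby
# Requests one single-request-at-a-time backend can serve per minute,
# given how long each request keeps it busy.
def requests_per_minute(busy_ms)
  60_000.0 / busy_ms
end

processing_ms = 100   # time spent generating the page
transfer_ms   = 5_000 # time spent sending it to a slow client

blocked_rpm  = requests_per_minute(processing_ms + transfer_ms)
buffered_rpm = requests_per_minute(processing_ms)

puts format("blocked on the client: %.1f requests/min", blocked_rpm)
puts format("response buffered by a front end: %.1f requests/min", buffered_rpm)
```

That is roughly 12 requests per minute per backend versus 600, which is why people either run many Mongrels or put a buffering server in front of them.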
You'll probably want to look into Passenger instead, as it is the way to go these days.
Mongrel itself is multi-threaded, but Rails can process only one request at a time by default, although this can be changed by config. In the case of Mongrel, use mongrel-cluster.
FYI, Passenger also sets up a pool of applications, but it is nicer to deploy, has better press, and is more popular right now.
