Monitoring Mongrel queue length - ruby-on-rails

I have an Apache + HAProxy + Mongrel cluster setup. I want to receive alerts whenever my Mongrel queue length gets too high.
How do I get the current Mongrel queue length and make it available to alerting tools such as Monit and Nagios?
I know that HAProxy has information about the Mongrel queues, since it intelligently sends requests to the least busy Mongrel in the cluster. I wonder how it finds out? I need a similar mechanism to generate alerts and/or restart Mongrels when such a condition arises.

Add this to your haproxy config
stats uri /haproxy/hastats
Then use lynx to get the stats like this:
(assuming haproxy runs on port 10000 - adjust to suit)
lynx --dump http://my-server:10000/haproxy/hastats
There will be a line for each of your server entries in the haproxy config file, telling you whether it's up or down and how long its queue is, like this:
Server Queue Sessions Errors
Name Weight Status Act. Bck. Curr. Max. Curr. Max. Limit Cumul. Conn. Resp. Sec. Check Down
primary 1 UP Y - 0 0 68 386 - 134385861 207 699 0 7028 150
secondary 1 UP Y - 0 0 71 248 - 134464984 216 551 0 7129 98
Now all you need is a script to get the current queue (column 6) and feed it into nagios, and you're away!

New Relic's RPM product (www.newrelic.com) maintains information on Mongrel queue length. They have an API that you may be able to use to get near real-time feedback on queue length and adjust load balancing accordingly.
You can get more information on the API at: https://newrelic.tenderapp.com/faqs/docs/data-api
Hopefully that provides some help.

Related

Provide server-side metrics on APIs usage

I have a service deployed through a stack on a swarm.
Let's say :
someStack :
SomeServer : ...
myApplication: ....
Above this, there is a Traefik server that lets me call the various services, but also maps different URLs to APIs/sub-services, since the service above actually provides many sub-services such as (seen from the container's point of view):
/myApplication/getUsers/For/Area/51
/myApplication/getUsers/Admins
/myApplication/ping/enclyclopedia&code=42
/myApplication/bricks/list&code=0937
along with other endpoints from other stacks / services (/otherApplication/toto, /yaApp/titi, etc.)
The matching endpoints, as seen from Traefik, are:
/users&area=51
/getadmins
/ask&code=42
/listbricks&code=0937
These work well. Now, I would like to be able to produce stats on the usage of each endpoint (e.g. using Grafana), relative to myApplication's overall stats.
Something like :
/users : 57 % of myApplication calls, 33 % of myApplication total response time, 15 % of myApplication total errors
/getadmins : 33 % of myApplication calls, 7 % of myApplication total response time, 85 % of myApplication total errors
/ask : 7 % of myApplication calls, 40 % of myApplication total response time, 0 % of myApplication total errors
/listbricks : 3 % of myApplication calls, 20 % of myApplication total response time, 0 % of myApplication total errors
The metrics I have so far are those provided by cAdvisor and Traefik itself. I am using Prometheus to pull them and build metrics on top of them.
Regarding Traefik's metrics, I can't see any that match my need...
I do not own 'myApplication', and thus cannot really implement any kind of instrumentation from the inside (or at least not in a trivial way).
I could also build metrics over Traefik's access logs, but I am mainly wondering whether such metrics, or some trick with the existing metrics, could let me produce such stats on my application's usage.
Any idea?

Erlang change VM process initial size. Tune Erlang VM

First I have to mention that I run on a CentOS 7 machine tuned to support 1 million connections. I tested with a simple C server and client and I connected 512000 clients. I could have connected more, but I did not have enough RAM to spawn more Linux client machines, since from one machine I can open 65536 connections; 8 machines * 64000 connections each = 512000.
I made a simple Erlang server to which I want to connect 1 million or half a million clients, using the same C client. The problem I'm having now is memory related. For each successful gen_tcp:accept call I spawn a process. Around 50000 open connections cost me 3.7 GB of RAM on the server, whereas with the C server I could open 512000 connections using 1.9 GB of RAM. It is true that on the C server I did not create a process after accept to handle stuff, I just called accept again in a while loop, but even so... people on the web have done this Erlang thing with less memory (ejabberd, Riak).
I presume that the flags I pass to the Erlang VM should do the trick. From what I read in the documentation and on the web, this is what I have: erl +K true +Q 64200 +P 134217727 -env ERL_MAX_PORTS 40960000 -env ERTS_MAX_PORTS 40960000 +a 16 +hms 1024 +hmbs 1024
This is the server code, I open 1 listener that monitors port 5001 by calling start(1, 5001).
start(Num,LPort) ->
    case gen_tcp:listen(LPort,[{reuseaddr, true},{backlog,9000000000}]) of
        {ok, ListenSock} ->
            start_servers(Num,ListenSock),
            {ok, Port} = inet:port(ListenSock),
            Port;
        {error,Reason} ->
            {error,Reason}
    end.

start_servers(0,_) ->
    ok;
start_servers(Num,LS) ->
    spawn(?MODULE,server,[LS,0]),
    start_servers(Num-1,LS).

server(LS, Nr) ->
    io:format("before accept ~w~n",[Nr]),
    case gen_tcp:accept(LS) of
        {ok,S} ->
            io:format("after accept ~w~n",[Nr]),
            spawn(?MODULE,server,[LS,Nr+1]),
            proc_lib:hibernate(?MODULE, loop, [S]);
        Other ->
            io:format("accept returned ~w - goodbye!~n",[Other]),
            ok
    end.

loop(S) ->
    ok = inet:setopts(S,[{active,once}]),
    receive
        {tcp,S, _Data} ->
            Answer = <<"ok">>, % Not implemented in this example
            gen_tcp:send(S,Answer),
            proc_lib:hibernate(?MODULE, loop, [S]);
        {tcp_closed,S} ->
            io:format("Socket ~w closed [~w]~n",[S,self()]),
            ok
    end.
Given this configuration, the beam consumed about 2.5 GB of memory just on start, without even your module loaded.
However, if you reduce the maximum number of processes to a reasonable value, like +P 60000 for a 50 000-connection test, memory consumption drops sharply.
With a 60 000 process limit, the VM used only 527 MB of virtual memory on start.
I tried to reproduce your test, but unfortunately I was only able to launch 30 000 netcats on my system before running out of memory (because of the client jobs). Even then, I only observed the VM's memory consumption grow to 570 MB.
So my suggestion is that your numbers come from the high startup memory consumption, not from a great number of open connections. Either way, you should pay attention to how the stats change as the number of open connections increases, rather than to absolute values.
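For example, you can take quick snapshots from the Erlang shell before and after the clients connect and compare the deltas (erlang:memory/1 and erlang:system_info/1 are standard BIFs; this is only a sketch of the idea):
erlang:memory(total).              % total bytes currently allocated by the VM
erlang:system_info(process_count). % number of processes currently alive
Dividing the growth of memory(total) by the growth of process_count gives a rough per-connection cost that is independent of the startup overhead.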
I finally used the following configuration for my benchmark:
erl +K true +Q 64200 +P 60000 -env ERL_MAX_PORTS 40960000 -env ERTS_MAX_PORTS 40960000 +a 16 +hms 1024 +hmbs 1024
Then I launched the clients with the command:
for i in `seq 1 50000`; do nc 127.0.0.1 5001 & done
Apart from the tuning you have already done, you can adjust the TCP buffers as well. By default they take the OS default values, but you can pass {recbuf, Size} and {sndbuf, Size} to gen_tcp:listen. It may reduce the memory footprint significantly.
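For instance, the listen call in your start/2 could set explicitly small kernel buffers (the 4 KB values below are just an assumption for illustration; pick the smallest size your protocol tolerates):
gen_tcp:listen(LPort, [{reuseaddr, true},
                       {backlog, 9000000000},
                       {recbuf, 4096},    % per-socket kernel receive buffer, in bytes
                       {sndbuf, 4096}]).  % per-socket kernel send buffer, in bytes
Sockets returned by gen_tcp:accept inherit these options from the listen socket, so every accepted connection gets the smaller buffers.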

Configuring zabbix to monitor ping from a server

I am new to Zabbix. I would like to monitor ping from my server, and I want to activate a trigger if the host becomes unresponsive to ping or the ping time exceeds 20 milliseconds.
I don't know how to configure the trigger expression to suit my needs. Please help. Thanks.
I used
type -> Simple check
key -> icmppingsec
Type of information -> Numeric(Float)
Units -> s
Flexible interval -> 10 secs, from 7:00-24:00
According to the simple check documentation, the icmppingsec item returns the ping time in seconds, or 0 if the host is not reachable. Therefore, your trigger can be as follows:
{Template ICMP Ping:icmppingsec.avg(5m)} > 0.020 |
{Template ICMP Ping:icmppingsec.max(5m)} = 0
If you are using at least Zabbix 2.4, you should use or instead of | (see What's new in Zabbix 2.4.0).
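With the 2.4 syntax the same trigger would therefore read (only the operator changes):
{Template ICMP Ping:icmppingsec.avg(5m)} > 0.020 or
{Template ICMP Ping:icmppingsec.max(5m)} = 0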
Note also that there is no point in using a "1-7,00:00-24:00" flexible interval. You can just put "10" into the "Update interval (in sec)" field.

Need help in understanding httperf num_calls and num_conns

When I run httperf with the following options, the output is easy to understand.
Options: make a total of 10 connections (num-conns) at a rate of 10 connections/second (rate), with 2 request calls per connection (num-calls).
Output: 10 connections with 20 request calls
httperf -v --server www.example.com --wlog=n,$HOME/tmp/reqs.txt_httperf --rate=10 --num-conns=10 --num-calls=2 --hog
Total: connections 10 requests 20 replies 10 test-duration 1.575 s
However, when I use the following options, the httperf output is confusing.
Options: make a total of 4 connections (num-conns) at a rate of 10 connections/second (rate), with 6 request calls per connection (num-calls).
httperf -v --server www.example.com --wlog=n,$HOME/tmp/reqs.txt_httperf --rate=10 --num-conns=4 --num-calls=6 --hog
Total: connections 4 requests 8 replies 4 test-duration 0.455 s
It seems like when num-calls is greater than num-conns, the number of requests made is 2*num-conns.
I am not following how num-calls can be greater than num-conns. Am I missing anything?
The reason num-calls can be greater than num-conns is that on each connection you can make multiple HTTP transactions (a.k.a. "calls"). With num-conns = 4 and 2 transactions actually completed per connection (the server most likely closed each connection after 2 requests, despite num-calls = 6), the total is 4 * 2 = 8 requests, which is what you saw.
Hope this helps.

Erlang/OTP - Timing Applications

I am interested in benchmarking different parts of my program for speed. I have tried using info(statistics) and erlang:now().
I need to know down to the microsecond what the average speed is. I don't know why I am having trouble with a script I wrote.
It should be able to start anywhere and end anywhere. I ran into a problem when I tried starting it on a process that may be running up to four times in parallel.
Is there anyone who already has a solution to this issue?
EDIT:
Willing to give a bounty if someone can provide a script to do it. It needs to work across multiple spawned processes. I cannot accept a function like timer, at least not in the implementations I have seen: it only covers one process, and even then major editing is necessary for a full test of a full program. Hope I made it clear enough.
Here's how to use eprof, likely the easiest solution for you:
First you need to start it, like most applications out there:
23> eprof:start().
{ok,<0.95.0>}
Eprof supports two profiling modes. You can call it and ask it to profile a certain function, but we can't use that because other processes will mess everything up. We need to start the profiling manually and tell it when to stop (this is why you won't have an easy script, by the way).
24> eprof:start_profiling([self()]).
profiling
This tells eprof to profile everything that will be run and spawned from the shell. New processes will be included here. I will run some arbitrary multiprocessing function I have, which spawns about 4 processes communicating with each other for a few seconds:
25> trade_calls:main_ab().
Spawned Carl: <0.99.0>
Spawned Jim: <0.101.0>
<0.100.0>
Jim: asking user <0.99.0> for a trade
Carl: <0.101.0> asked for a trade negotiation
Carl: accepting negotiation
Jim: starting negotiation
... <snip> ...
We can now tell eprof to stop profiling once the function is done running.
26> eprof:stop_profiling().
profiling_stopped
And we want the logs. Eprof will print them to screen by default. You can ask it to also log to a file with eprof:log(File). Then you can tell it to analyze the results. We tell it to collapse the run time from all processes into a single table with the option total (see the manual for more options):
27> eprof:analyze(total).
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- --- ---- [----------]
io:o_request/3 46 0.00 0 [ 0.00]
io:columns/0 2 0.00 0 [ 0.00]
io:columns/1 2 0.00 0 [ 0.00]
io:format/1 4 0.00 0 [ 0.00]
io:format/2 46 0.00 0 [ 0.00]
io:request/2 48 0.00 0 [ 0.00]
...
erlang:atom_to_list/1 5 0.00 0 [ 0.00]
io:format/3 46 16.67 1000 [ 21.74]
erl_eval:bindings/1 4 16.67 1000 [ 250.00]
dict:store_bkt_val/3 400 16.67 1000 [ 2.50]
dict:store/3 114 50.00 3000 [ 26.32]
And you can see that most of the time (50%) is spent in dict:store/3. 16.67% is taken by outputting the result, and another 16.67% is taken by erl_eval (which is what you get for running short functions in the shell -- parsing them takes longer than running them).
You can then start digging from there. That's the basics of profiling run times with Erlang. Handle it with care: eprof can be quite a load for functions that run for too long, especially on a production system.
You can use eprof or fprof.
The normal way to do this is with timer:tc. Here is a good explanation.
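For a single call, a minimal sketch of timer:tc (lists:seq/2 is just a stand-in workload here):
{MicroSecs, Result} = timer:tc(lists, seq, [1, 100000]),  % returns {runtime in microseconds, return value}
io:format("lists:seq/2 took ~p microseconds~n", [MicroSecs]).
Note that timer:tc/3 measures one call in one process, which is exactly the limitation mentioned in the question; for timing work that spans several spawned processes, the eprof/fprof approaches above are a better fit.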
I can recommend you this tool: https://github.com/virtan/eep
You will get something like this https://raw.github.com/virtan/eep/master/doc/sshot1.png as a result.
Step-by-step instructions for profiling all processes on a running system:
On target system:
1> eep:start_file_tracing("file_name"), timer:sleep(20000), eep:stop_tracing().
$ scp -C $PWD/file_name.trace desktop:
On desktop:
1> eep:convert_tracing("file_name").
$ kcachegrind callgrind.out.file_name
