How to determine which script is being executed in a PHP-FPM process (FastCGI)

I am running nginx + PHP-FPM. Is there any way to see what each of the PHP processes is doing? Something like the extended mod_status in Apache, where I can see that the Apache process with PID x is processing URL y. I'm not sure whether the PHP process knows the URL, but the script path and name would be sufficient.

After some hours of googling and browsing the PHP.net bug tracker I found the solution. It has been available since PHP 5.3.8 or 5.3.9, but doesn't seem to be documented. Based on feature request #54577, the status page supports the option full, which displays the status of each worker separately. So the URL would be, for example, http://server.com/php-status?full, and sample output looks like:
pid: 22816
state: Idle
start time: 22/Feb/2013:15:03:42 +0100
start since: 10933
requests: 28352
request duration: 1392
request method: GET
request URI: /ad.php?zID=597
content length: 0
user: -
script: /home/web/server.com/ad/ad.php
last request cpu: 718.39
last request memory: 1310720
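Because the output is plain text, it is easy to query from the command line and filter for the fields you care about. A minimal sketch, assuming the status path is /php-status as in the example above (adjust the host and path to your setup):
curl -s "http://server.com/php-status?full" | grep -E '^(pid|state|request URI|script):'
This prints, for each worker, its PID, its state, and the request URI / script it is currently handling.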

PHP-FPM has a built-in status monitor, though it's not as detailed as mod_status. From the php-fpm config file /etc/php-fpm.d/www.conf (on CentOS 6):
; The URI to view the FPM status page. If this value is not set, no URI will be
; recognized as a status page. By default, the status page shows the following
; information:
; accepted conn - the number of request accepted by the pool;
; pool - the name of the pool;
; process manager - static or dynamic;
; idle processes - the number of idle processes;
; active processes - the number of active processes;
; total processes - the number of idle + active processes.
; The values of 'idle processes', 'active processes' and 'total processes' are
; updated each second. The value of 'accepted conn' is updated in real time.
; Example output:
; accepted conn: 12073
; pool: www
; process manager: static
; idle processes: 35
; active processes: 65
; total processes: 100
; By default the status page output is formatted as text/plain. Passing either
; 'html' or 'json' as a query string will return the corresponding output
; syntax. Example:
; http://www.foo.bar/status
; http://www.foo.bar/status?json
; http://www.foo.bar/status?html
; Note: The value must start with a leading slash (/). The value can be
; anything, but it may not be a good idea to use the .php extension or it
; may conflict with a real PHP file.
; Default Value: not set
;pm.status_path = /status
If you enable this, you can then pass that path from nginx to your PHP-FPM socket/port and view the status page.
nginx.conf:
location /status {
    include fastcgi_params;
    fastcgi_pass unix:/var/lib/php/php-fpm.sock;
}
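Since the status page exposes internal details, it is worth restricting who can reach it. A slightly fuller location block might look like this (a sketch only, assuming the same socket path and that access should be limited to localhost; the extra fastcgi_param lines just make sure PHP-FPM sees the status path regardless of what fastcgi_params provides):
location /status {
    allow 127.0.0.1;
    deny all;
    include fastcgi_params;
    fastcgi_param SCRIPT_NAME /status;
    fastcgi_param SCRIPT_FILENAME /status;
    fastcgi_pass unix:/var/lib/php/php-fpm.sock;
}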

The cgi-fcgi command line is more convenient:
SCRIPT_NAME=/status \
SCRIPT_FILENAME=/status \
REQUEST_METHOD=GET \
cgi-fcgi -bind -connect 127.0.0.1:9000
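To get the per-worker view from the command line as well, you can pass the full option through QUERY_STRING (a sketch, assuming PHP-FPM listens on 127.0.0.1:9000 and pm.status_path is /status):
SCRIPT_NAME=/status \
SCRIPT_FILENAME=/status \
QUERY_STRING=full \
REQUEST_METHOD=GET \
cgi-fcgi -bind -connect 127.0.0.1:9000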

You can use strace to show the scripts being run - and many other things - in real time. It's pretty verbose, but it can give you a good overall picture of what's going on:
# switch php-fpm7.0 for process you're using
sudo strace -f $(pidof php-fpm7.0 | sed 's/\([0-9]*\)/\-p \1/g')
The above will attach to the forked processes of PHP-FPM. Use -p to attach to a particular PID.
This will get you the script path. To get the URLs, you would look at your nginx / Apache access logs.
As a side note, to see the syscalls and which ones are taking longest:
sudo strace -c -f $(pidof php-fpm7.0 | sed 's/\([0-9]*\)/\-p \1/g')
Wait a while, then hit Ctrl-C.
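If the full trace is too noisy, limiting it to file-open syscalls makes the executed scripts easier to spot. A sketch, assuming the same php-fpm7.0 process name (older systems may report open instead of openat; strace writes to stderr, hence the redirect):
# only show file opens, filtered down to PHP scripts
sudo strace -f -e trace=open,openat $(pidof php-fpm7.0 | sed 's/\([0-9]*\)/\-p \1/g') 2>&1 | grep '\.php'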

Related

docker-compose healthcheck retry frequency != interval

I recently set up healthchecks in my docker-compose config.
It is doing great and I like it. Here's a typical example:
services:
  app:
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s
My container is quite slow to boot, hence the 30-second start_period.
But it doesn't really fit my expectations: I don't need a check every 5 seconds, but I do need to know as soon as possible when the container is ready for the first time, for my orchestration. And since my start_period is only approximate, if the container is not ready yet at the first check, I have to wait a full interval before the retry.
What I'd like to have is:
While the container is not healthy, retry every 5 seconds
Once it is healthy, check every 1 minute
Isn't there a way to achieve this out of the box with docker-compose?
I could write a custom script to achieve this, but I'd rather have a native solution if it is possible.
Unfortunately, this is not possible out of the box.
All the durations you set are final. They can't be changed depending on the container state.
However, according to the documentation, the probe does not seem to wait for the start_period to finish before running your test. The only effect is that any failure happening during the start_period will not be counted as an error.
Below is the sentence that makes me think that:
start_period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.
I encourage you to test whether this is really the case, as I've never paid close attention to whether the healthcheck runs during the start period or not.
And if it is the case, you can probably increase your start_period if you're unsure about the duration and also increase the interval in order to find a good compromise.
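For example, a compromise along those lines might look like this (values are only illustrative; tune them to how long your container really takes to boot):
services:
  app:
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 60s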
I wrote a script that does this, though I'd rather find a native solution:
#!/bin/sh
# Marker file whose mtime records the last successful healthcheck
HEALTHCHECK_FILE="/root/.healthchecked"
COMMAND=${*?"Usage: healthcheck_retry <COMMAND>"}
if [ -r "$HEALTHCHECK_FILE" ]; then
    LAST_HEALTHCHECK=$(date -r "$HEALTHCHECK_FILE" +%s)
    # GNU date alternative: FIVE_MINUTES_AGO=$(date -d 'now - 5 minutes' +%s)
    FIVE_MINUTES_AGO=$(( $(date +%s) - 5 * 60 ))
    echo "Healthcheck file present"
    # If the last success is recent enough, skip re-running the check
    if [ "$LAST_HEALTHCHECK" -gt "$FIVE_MINUTES_AGO" ]; then
        echo "Healthcheck too recent"
        exit 0
    fi
fi
if $COMMAND; then
    echo "\"$COMMAND\" succeeded: updating file"
    touch "$HEALTHCHECK_FILE"
    exit 0
else
    echo "\"$COMMAND\" failed: exiting"
    exit 1
fi
Which I use: test: /healthcheck_retry.sh curl -fsS localhost:4000/healthcheck
The pain is that I need to make sure the script is available in every container, so I have to create an extra volume for this:
image: postgres:11.6-alpine
volumes:
  - ./scripts/utils/healthcheck_retry.sh:/healthcheck_retry.sh
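Put together, a service wired up this way ends up looking roughly like the sketch below (the pg_isready test is only an example for a Postgres container; substitute whatever check your service needs):
services:
  db:
    image: postgres:11.6-alpine
    volumes:
      - ./scripts/utils/healthcheck_retry.sh:/healthcheck_retry.sh
    healthcheck:
      test: /healthcheck_retry.sh pg_isready -U postgres
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s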

PowerShell remote invocation mysteriously hangs

I have created a series of functions that collect all the IIS configuration about a site. When run locally on a server they execute without issue (albeit slowly), but when I run them remotely using Invoke-Command in PowerShell 2, the run mysteriously stops approximately 15-20 seconds into the process. It generally stalls on the same request, but not always. The same commands executed locally work without any issues. No exception is raised; it just hangs indefinitely.
I can post the code if necessary; however, it is several hundred lines, so I'm really looking for guidance on how to investigate a problem like this, or whether anyone has encountered something similar.
Comparing IISConfig between [targetserver] and localhost.
Checking Installed IIS version on [targetserver]:
IIS major version : 7
IIS minor version : 5
IIS7+ detected, using WebAdmin module and IIS metabase
Name Value
---- -----
name Default Web Site
id 1
serverAutoStart True
state 1
Site Configuration:
Name Path PSPath Handlers_Ac Access_sslF Asp_AppAllo Asp_AppAllo Asp_limits_ Asp_EnableP Asp_limits_
cessFlags lags wClientDebu wDebugging bufferingLi arentPaths queueTimeou
g mit t
---- ---- ------ ----------- ----------- ----------- ----------- ----------- ----------- -----------
Default ... IIS:Site... WebAdmin... Read,Script False False 25000000 True 00:00:00
WebApp VDir: /MyApp, App Pool: MyApp
App pool Configuration:
AppPoolID Enable32Bit managedPipe managedRunt AppPoolName AppPoolAuto processMode processMode processMode recycling_l
AppOnWin64 lineMode imeVersion Start l_idleTimeo l_identityT l_UserName ogEventOnRe
ut ype cycle
--------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
False Classic v2.0 MyApp True 00:20:00 LocalSer... Time,Req...
Analyzing web directories for /MyApp, this could take a while....
Initial Collection Completed, found 141... took 0.9516122 seconds
0 C:\inetpub\wwwroot\MyApp\Core
1 C:\inetpub\wwwroot\MyApp\Core\AdminTools
2 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Cache
3 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Extra
4 C:\inetpub\wwwroot\MyApp\Core\AdminTools\HTTPPostTest
5 C:\inetpub\wwwroot\MyApp\Core\AdminTools\IISAdmin
6 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Profiling
7 C:\inetpub\wwwroot\MyApp\Core\AdminTools\RecordTestData
8 C:\inetpub\wwwroot\MyApp\Core\AdminTools\ScrambleTest
9 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Sessions
Analyzed 10 so far... took 6.7236862 seconds, remaining time 88.08028922 seconds
Current Folder: C:\inetpub\wwwroot\MyApp\Core\AdminTools\Sessions
10 C:\inetpub\wwwroot\MyApp\Core\AdminTools\SoapTest
11 C:\inetpub\wwwroot\MyApp\Core\AdminTools\StaticContent
Sometimes it makes it to 15 or so. I tried from my laptop and from one server to another and the behavior is the same.
Here is the loop which is hanging:
$start = [System.DateTime]::Now
$numanalyzed = 0
if ($true) #skip to test
{
    # loop through all physical folders as it is much faster
    foreach ($folder in $folders)
    {
        write-host $numanalyzed $folder.fullname
        # figure out the virtual path to the folder
        $iis7vwebfolderpath = $folder.FullName.Replace($iis7webapp.PhysicalPath, $iis7VDirWebApppath)
        #Get-item $iis7vwebfolderpath | gm
        $iis7VWebDirConfigItem = Get-LNOSIIS7ConfigForPSPath -PSPath $iis7vwebfolderpath
        # add new item to list
        $iis7VWebDirConfig += $iis7VWebDirConfigItem
        # increment counter and report out progress every 10
        $numAnalyzed++
        if ($numanalyzed % 10 -eq 0)
        {
            $end = [System.DateTime]::Now
            $timeSoFar = (New-TimeSpan -Start $Start -End $End).TotalSeconds
            $timeremaining = ($folders.Count - $numAnalyzed) * ($timeSoFar / $numanalyzed)
            "Analyzed {0} so far... took {1} seconds, remaining time {2} seconds" -f $numanalyzed,$timeSoFar,$timeremaining | write-host
            "Current Folder: {0}" -f $folder.FullName | Write-Host
        }
    }
}
$end = [System.DateTime]::Now
"Processed web dirs: {0} took {1} seconds" -f $iis7VWebDirConfig.Count,(New-TimeSpan -Start $Start -End $End).TotalSeconds | Write-Host
The function I'm having performance problems with is covered in a separate question, which includes its source code:
web-administration vs WMI to query web directory properties performance problems
In my case, it seemed my PowerShell call froze due to the Idle-Timeout expiration (the call runs for a very long time).
Setting IdleTimeout value to a sufficiently long duration fixed my issue.
Once again, query the current configuration using
winrm get winrm/config/winrs
And set the timeout using
winrm set winrm/config/winrs '@{IdleTimeout="18000000"}'
I think I may have discovered the problem; I started getting some odd failures in other parts of the script:
[SEVERNAME] Processing data from remote server SERVERNAME failed with the following error message: The WSMan provider host process did not return a proper response. A provider in the host process may have behaved improperly. For more information, see the about_Remote_Troubleshooting Help topic.
+ CategoryInfo : OpenError: (SERVERNAME:String) [], PSRemotingTransportException
+ FullyQualifiedErrorId : 1726,PSSessionStateBroken
and
Processing data for a remote command failed with the following error message: Not enough storage is available to complete this operation. For more information, see the about_Remote_Troubleshooting Help topic.
+ CategoryInfo : OperationStopped (System.Manageme...pressionSyncJob:PSInvokeExpressionSyncJob) [], PSRemotingTransportException
+ FullyQualifiedErrorId : JobFailure
This led me to the following site: http://www.gsx.com/blog/bid/83018/Troubleshooting-unknown-PowerShell-error-messages
The following recommendations seem to have cleared up most of the problems, although I still have some testing to do.
Excerpt from site below:
As the first error message specifies, an overflow of memory in the remote session has occurred. Open a PowerShell prompt on the remote server and display the configuration of winrs using:
winrm get winrm/config/winrs
Check the "MaxMemoryPerShellMB" value. It is set by default to 150 MB on Windows Server 2008 R2 and Windows 7. This is something that Microsoft changed in Windows Server 2012 and Windows 8 to 1024 MB.
In order to resolve this issue, you need to increase the value to at least 512 MB with the following command:
winrm set winrm/config/winrs `@`{MaxMemoryPerShellMB=`"512`"`}
As an FYI, if Invoke-Command always hangs:
Try a simple command against the system:
Invoke-Command -ComputerName XXXXX -ScriptBlock { Get-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion }
Start the Windows Remote Management Service (on that system)
Check for the listening port:
netstat -aon | findstr "5985"
TCP 0.0.0.0:5985 0.0.0.0:0 LISTENING 4
TCP [::]:5985 [::]:0 LISTENING 4
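As a quick sanity check of the remoting prerequisites, something like the following can help (a sketch; Test-WSMan only verifies that the WinRM listener answers, it does not validate credentials):
# on the target system: make sure the WinRM service is running
Start-Service WinRM
# from the calling machine: check that the remote WinRM endpoint responds
Test-WSMan -ComputerName XXXXX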

Supervisord as Windows Service on Cygwin

I am attempting to run Celery as a Windows service using Supervisord. I followed the configuration laid out on the Celery site and [here][1]. I have set up a virtual environment to run supervisord through Cygwin. I have highlighted the lines I think are most important (with **). It appears supervisord and RabbitMQ are working; the problem is with Celery.
I set up the service with the commands:
$ cygrunsrv --install supervisord --path /usr/bin/python --args "/usr/bin/supervisord -n -c /usr/etc/supervisord.conf"
$ supervisord
UPDATED: I now have the following in my supervisord.log file:
2014-08-07 12:46:40,676 INFO exited: celery (exit status 1; not expected)
2014-08-07 12:47:07,187 INFO Increased RLIMIT_NOFILE limit to 1024
2014-08-07 12:47:07,238 INFO RPC interface 'supervisor' initialized
2014-08-07 12:47:07,251 INFO daemonizing the supervisord process
2014-08-07 12:47:07,253 INFO supervisord started with pid 7508
2014-08-07 12:47:08,272 INFO spawned: 'celery' with pid 8056
**2014-08-07 12:47:08,833 INFO success: celery entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)**
The config file is:
[inet_http_server] ; inet (TCP) server disabled by default
port=127.0.0.1:8072 ; (ip_address:port specifier, *:port for all iface)
username = user
password = 123
[supervisord]
logfile= /home/HBA/venv/logFiles/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
;user=HBA ; (default is current user, required if root)
childlogdir=/tmp ; ('AUTO' child log dir, default $TEMP)
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=http://127.0.0.1:8072 ; use an http:// url to specify an inet socket
[program:celery]
command= celery worker -A runLogProject --loglevel=INFO ; the program (relative uses PATH, can take args)
directory= /home/HBA/venv/runLogProject
environment=PATH="/home/HBA/venv/;/home/HBA/venv/Scripts/"
numprocs=1
stdout_logfile= /home/HBA/venv/logFiles/%(program_name)s/worker.log ; stdout log path, NONE for none; default AUTO
stderr_logfile= /home/HBA/venv/logFiles/%(program_name)s/worker.log ; stderr log path, NONE for none; default AUTO
autostart=true ; start at supervisord start (default: true)
autorestart=true ; whether/when to restart (default: unexpected)
startsecs=0
stopwaitsecs=1000
killasgroup=true
My celery log file gives me:
**[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-4' pid:12284 exited with 'signal -1'
[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-3' pid:4432 exited with 'signal -1'
[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-2' pid:9120 exited with 'signal -1'
[2014-08-07 19:46:40,584: ERROR/MainProcess] Process 'Worker-1' pid:6280 exited with 'signal -1'**
C:\Python27\lib\site-packages\celery\apps\worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.
The pickle serializer is a security concern as it may give attackers
the ability to execute any command. It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.
If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
You must only enable the serializers that you will actually use.
warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
[2014-08-07 19:47:08,822: WARNING/MainProcess] C:\Python27\lib\site-packages\celery\apps\worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.
The pickle serializer is a security concern as it may give attackers
the ability to execute any command. It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.
If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
You must only enable the serializers that you will actually use.
warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
**[2014-08-07 19:47:08,944: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2014-08-07 19:47:08,954: INFO/MainProcess] mingle: searching for neighbors
[2014-08-07 19:47:09,963: INFO/MainProcess] mingle: all alone**
C:\Python27\lib\site-packages\celery\fixups\django.py:236: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2014-08-07 19:47:09,982: WARNING/MainProcess] C:\Python27\lib\site-packages\celery\fixups\django.py:236: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2014-08-07 19:47:09,982: WARNING/MainProcess] celery@CORONADO ready.
I solved my issue using the following command: /home/HBA/venv/Scripts/celery worker -A runLogProject --loglevel=INFO
My biggest issue was an unfamiliarity with virtual environments. I needed to make sure the files were in the correct folders within the venv.
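In supervisord terms, that amounts to pointing the program at the celery executable inside the virtualenv instead of relying on PATH. A sketch of the relevant part of the [program:celery] section, using the paths from the config above (adjust to your own venv layout):
[program:celery]
command=/home/HBA/venv/Scripts/celery worker -A runLogProject --loglevel=INFO
directory=/home/HBA/venv/runLogProject
autostart=true
autorestart=true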

Jenkins - Posting results to a external monitoring job is adding garbage to the build job log

I have an external monitoring job to which I'm pushing the result of another job with curl, based on this link:
Monitoring external jobs
After I create the job, I just need to run a curl command with the body encoded in hex to the specified URL; a build will then be created and the output added to it. But what I get instead is part of my output in clear text and the rest in weird characters, like so:
Started
Asking akamai to purge this urls:
http://xxx/sites/all/modules/custom/uk.png http://aaaaaasites/all/modules/custom/flags/jp.png
<html><head><title>401 Unauthorized</title> </h�VC��&�G����CV�WF��&��VC�������R&R��BWF��&��VBF�66W72F�B&W6�W&6S�����&�G�����F����F�RW&�F �6�V6�7FGW2�bF�R&WVW7B�2��F�RF��RF�v�B�2��6�Ɩ�r&6�w&�V�B��"F�6�V6�7FGW2�bF�RF�6�W#�v�F��rf�"���F�W&vRF��6O request please keep in mind this is an estimated time
Waiting for another 60 seconds
Asking akamai to purge this urls:
...
..
..
This is how I'm doing it:
export output=`cat msg.out|xxd -c 256 -ps`
curl -k -X POST -d "<run><log encoding=\"hexBinary\">$output</log><result>0</result> <duration>2000</duration></run>" https://$jenkinsuser:$jenkinspass@127.0.0.1/jenkins/job/akamai_purge_results/postBuildResult -H'.crumb:c775f3aa15464563456346e'
If I cat that file it is all fine, and even if I edit it with vi I can't see any problem with it.
Do you guys have any idea how to fix this?
Could it be a problem with the hex encoding? (I tried hex encode/decode pages with the result of xxd and they look fine.)
Thanks.
I had the same issue, and stumbled across this: http://blog.markfeeney.com/2010/01/hexbinary-encoding.html
From that page, you can get the encoding you need via this command:
echo "Hello world" | hexdump -v -e '1/1 "%02x"'
48656c6c6f20776f726c640a
An excerpt from the explanation:
So what the hell is that? -v means don't suppress any duplicate data
in the output, and -e is the format string. hexdump's very particular
about the formatting of the -e argument; so careful with the quotes.
The 1/1 means for every 1 byte encountered in the input, apply the
following formatting pattern 1 time. Despite this sounding like the
default behaviour in the man page, the 1/1 is not optional. /1 also
works, but the 1/1 is very very slightly more readable, IMO. The
"%02x" is just a standard-issue printf-style format code.
So in your case, you would do this (removing 'export' in favor of a plain variable assignment):
OUTPUT=`cat msg.out | hexdump -v -e '1/1 "%02x"'`
curl -k -X POST -d "<run><log encoding=\"hexBinary\">$OUTPUT</log><result>0</result> <duration>2000</duration></run>" https://$jenkinsuser:$jenkinspass@127.0.0.1/jenkins/job/akamai_purge_results/postBuildResult -H'.crumb:c775f3aa15464563456346e'
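If you want to rule out the encoding itself before blaming Jenkins, you can round-trip it (a sketch; xxd -r -p converts the hex string back to binary so it can be compared with the original file):
cat msg.out | hexdump -v -e '1/1 "%02x"' | xxd -r -p | diff - msg.out && echo "hex encoding round-trips cleanly"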

Monitoring URLs with Nagios

I'm trying to monitor actual URLs, and not only hosts, with Nagios, as I operate a shared server with several websites, and I don't think it's enough just to monitor the basic HTTP service (I'm including at the very bottom of this question a small explanation of what I'm envisioning).
(Side note: please note that I have Nagios installed and running inside a chroot on a CentOS system. I built nagios from source, and have used yum to install into this root all dependencies needed, etc...)
I first found check_url, but after installing it into /usr/lib/nagios/libexec, I kept getting a "return code of 255 is out of bounds" error. That's when I decided to start writing this question (but wait! There's another plugin I decided to try first!)
After reviewing This Question, which had almost exactly the same problem I'm having with check_url, I decided to open a new question on the subject because
a) I'm not using NRPE with this check
b) I tried the suggestions made on the earlier question to which I linked, but none of them worked. For example...
./check_url some-domain.com | echo $0
returns "0" (which indicates the check was successful)
I then followed the debugging instructions on Nagios Support to create a temp file called debug_check_url, and put the following in it (to then be called by my command definition):
#!/bin/sh
echo `date` >> /tmp/debug_check_url_plugin
echo $* >> /tmp/debug_check_url_plugin
/usr/local/nagios/libexec/check_url $*
Assuming I'm not in "debugging mode", my command definition for running check_url is as follows (inside command.cfg):
'check_url' command definition
define command{
command_name check_url
command_line $USER1$/check_url $url$
}
(Incidentally, you can also view what I was using in my service config file at the very bottom of this question)
Before publishing this question, however, I decided to give 1 more shot at figuring out a solution. I found the check_url_status plugin, and decided to give that one a shot. To do that, here's what I did:
mkdir /usr/lib/nagios/libexec/check_url_status/
downloaded both check_url_status and utils.pm
Per the user comment / review on the check_url_status plugin page, I changed "lib" to the proper directory of /usr/lib/nagios/libexec/.
Run the following:
./check_url_status -U some-domain.com
When I run the above command, I kept getting the following error:
bash-4.1# ./check_url_status -U mydomain.com
Can't locate utils.pm in @INC (@INC contains: /usr/lib/nagios/libexec/ /usr/local/lib/perl5 /usr/local/share/perl5 /usr/lib/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib/perl5 /usr/share/perl5) at ./check_url_status line 34.
BEGIN failed--compilation aborted at ./check_url_status line 34.
So at this point, I give up, and have a couple of questions:
Which of these two plugins would you recommend? check_url or check_url_status?
(After reading the description of check_url_status, I feel that this one might be the better choice. Your thoughts?)
Now, how would I fix my problem with whichever plugin you recommended?
At the beginning of this question, I mentioned I would include a small explanation of what I'm envisioning. I have a file called services.cfg which is where I have all of my service definitions located (imagine that!).
The following is a snippet of my service definition file, which I wrote to use check_url (because at that time, I thought everything worked). I'll build a service for each URL I want to monitor:
###
# Monitoring Individual URLs...
#
###
define service{
host_name {my-shared-web-server}
service_description URL: somedomain.com
check_command check_url!somedomain.com
max_check_attempts 5
check_interval 3
retry_interval 1
check_period 24x7
notification_interval 30
notification_period workhours
}
I was making things WAY too complicated.
The built-in / installed by default plugin, check_http, can accomplish what I wanted and more. Here's how I have accomplished this:
My Service Definition:
define service{
host_name myers
service_description URL: my-url.com
check_command check_http_url!http://my-url.com
max_check_attempts 5
check_interval 3
retry_interval 1
check_period 24x7
notification_interval 30
notification_period workhours
}
My Command Definition:
define command{
command_name check_http_url
command_line $USER1$/check_http -I $HOSTADDRESS$ -u $ARG1$
}
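If some of the sites are name-based virtual hosts or served over HTTPS, check_http can cover that too. A variant command definition might look like this (a sketch using standard check_http options, not part of the original setup; pass the hostname as the first argument and the path as the second):
define command{
command_name check_https_url
command_line $USER1$/check_http -H $ARG1$ -u $ARG2$ --ssl
}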
The better way to monitor URLs is by using WebInject, which can be used with Nagios.
The problem below is due to the Perl utils package not being installed; try installing it.
bash-4.1# ./check_url_status -U mydomain.com Can't locate utils.pm in @INC (@INC contains:
You can write a script plugin. It is easy; you only have to check the URL with something like:
`curl -Is $URL -k | grep HTTP | cut -d ' ' -f2`
$URL is what you pass to the script as a parameter.
Then check the result: if the code is greater than 399 you have a problem; otherwise everything is OK. Then exit with the right exit code and print the message for Nagios.
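A minimal sketch of such a plugin, following the logic described above (a hypothetical script; it only looks at the HTTP status code and returns the standard Nagios OK/CRITICAL exit codes):
#!/bin/sh
# Usage: check_url_code.sh <url>
URL=$1
# Grab just the status code from the first response header line
CODE=$(curl -Is -k "$URL" | head -n1 | tr -d '\r' | cut -d ' ' -f2)
# No numeric code at all means the site is unreachable
case "$CODE" in
    ''|*[!0-9]*) echo "CRITICAL - $URL did not return a valid status"; exit 2 ;;
esac
if [ "$CODE" -gt 399 ]; then
    echo "CRITICAL - $URL returned $CODE"
    exit 2
else
    echo "OK - $URL returned $CODE"
    exit 0
fi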

Resources