I am running a Docker container for Bosun. I want to integrate Graphite metrics with Bosun.
What configuration changes are needed for this?
#kyle-brandt's answer is okay and I gave it an upvote, but neither it nor the Bosun docs really explain how to use a Graphite instance that you don't host yourself, e.g. hostedgraphite.com. Using the docs and some trial and error I figured things out, so here it goes:
Make a Graphite API key: http://docs.hostedgraphite.com/advanced/access-keys.html (you should whitelist IP addresses). Let's say you got https://www.hostedgraphite.com/deadbeef/431-831/graphite/.
Create bosun.conf with:
tsdbHost = localhost:4242
stateFile = /data/bosun.state
graphiteHost = https://www.hostedgraphite.com/deadbeef/431-831/graphite/render
Start the Docker container:
docker run -d \
-p 80:8070 \
--name=bosun \
-v `pwd`/bosun.conf:/data/bosun.conf \
stackexchange/bosun
Note that I didn't map port 4242 because I'm getting my data only from hostedgraphite.com, and I mapped container port 8070 to host port 80 so that I don't have to specify a port when opening Bosun in the browser.
Adding expressions: The docs say to use GraphiteQuery, but that didn't work for me; graphite worked instead. For example: graphite("my.long.metric.name.for.some.method", "10m", "", ""). There is also an example graphite alert in the examples part of the documentation (thanks #kyle-brandt).
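For illustration, here is a minimal sketch of an alert built around such a query (the alert name, thresholds, and metric are placeholders; you would normally also attach a template and notification):

alert graphite_example {
    # graphite() returns a seriesSet; avg() reduces it to a number
    $q = avg(graphite("my.long.metric.name.for.some.method", "10m", "", ""))
    warn = $q > 300
    crit = $q > 500
}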
As per the documentation you linked, you must set the graphiteHost in the config:
graphiteHost: an ip, hostname, ip:port, hostname:port or a URL; defaults to standard http/https ports and to the “/render” path. Any non-zero path (even “/”) overrides the path.
The graphing page and items page in Bosun only work with OpenTSDB as the backend. However, you can still use the expression page, dashboard, and config editor. When you use expressions that return a seriesSet, as the Graphite query functions do, you will see a graph tab on the expression page. You can also use the .Graph and .GraphAll template functions with Graphite. So it is largely functional.
There is also an example graphite alert in the examples part of the documentation.
Scenario:
You write a program in R or Python which needs to run on Linux or Windows, and you want to send (JSON-structured and unstructured) std-out and (mostly unstructured) std-error from this program to a Fluentd instance. Adding a new program or starting another instance should not require updating the Fluentd configuration, and the applications will not (yet) be running in a Docker environment.
Question:
How to send "logs" from a bunch of programs to an fluentd instance, without the need to perform curl calls for every log entry that your application was originally writing to std-out?
When a UDP or TCP connection' is necessary for the application to run, it seems to become less easy to debug, and any dependency of your program that returns std-out will be required to be parsed, just to get it's logging passed through.
Thoughts:
Alternatively, the question could be: how do you accept a 'connection' object which can point either to a file or to a TCP connection, so that switching between std-out and a TCP destination is a matter of changing a single value?
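For what it's worth, a minimal sketch of that idea in Python (the function name and destinations are mine, not from any library) is a single factory that returns a writable object, so the destination is one configuration value:

import socket
import sys

def open_log_sink(dest):
    """Return a writable file-like object for dest: "-" means std-out,
    "host:port" a TCP destination, anything else a file path."""
    if dest == "-":
        return sys.stdout
    if ":" in dest:
        host, port = dest.rsplit(":", 1)
        return socket.create_connection((host, int(port))).makefile("w")
    return open(dest, "a")

# Switching between std-out, a file, and TCP is then one value:
sink = open_log_sink("my_out_file.txt")  # or "-", or "localhost:24224" (fluentd's default forward port)
sink.write('{"msg": "hello"}\n')
sink.flush()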
I like the 'tail' input plugin, which could be what I am looking for, but then:
the original log file never appears to stop growing (will the tail position value reset when the file is simply removed? I couldn't find this behaviour documented), and
it seems to require reconfiguring Fluentd for every new program that you start on that server (if it logs to another file); I would much prefer to keep that configuration on the program side...
I built an EFK stack with the Docker log driver set to fluentd, which does not seem to have a solid solution either, but without Docker I already get kind of stuck setting up a basic configuration (not referring to fluent.conf here).
TL;DR
std-out -> fluentd: Redirect the program output to a file when launching your program. On Linux, use logrotate; you will love it.
Windows: use fluent-bit.
App side config: use single (or predictable) log locations, and the fluentd/fluent-bit 'in_tail' plugin.
Logging in general:
It's recommended to always write application output to a file. If std-out must be captured, redirect the program's output to a file at startup. For more flexibility in the Fluentd configuration, pipe std-out and std-error to separate files (just like Apache does):
My_program.exe Do some crazy stuff > my_out_file.txt 2> my_error_file.txt
This opens up the option for Fluentd to read from these files.
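For example, a minimal Fluentd in_tail source for those two files might look like this (a sketch; the paths, pos_file location, and tag are assumptions):

<source>
  @type tail
  # tail both the std-out and std-error files from the example above
  path /var/log/myapp/my_out_file.txt,/var/log/myapp/my_error_file.txt
  pos_file /var/log/td-agent/myapp.pos
  tag myapp.raw
  <parse>
    @type none
  </parse>
</source>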
Windows:
For Windows systems, use fluent-bit; it likely solves the issue of aggregating Windows OS program logs, as support for Windows was implemented fairly recently.
fluent-bit supports:
the 'tail' plugin, which records the 'inode' value (a unique, rename-insensitive file pointer) and the 'index' (called 'pos' in the full-blown 'fluentd' application) in an SQLite3 database, and deals with unprocessable data by assigning it to a certain key ('log' by default)
It works on Windows machines, but note that it cannot buffer to disk, so make sure a lost connection, or any other issue with the output, is re-established or fixed in time so that you will not run into OOM issues.
Appl. side config:
The tail plugin can monitor a folder, which makes it practical to keep the configuration on the side of your program: just make sure you write the logs of your different applications to a predictable directory.
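As a sketch (the directory, database path, and tag are assumptions), a fluent-bit tail input watching such a predictable directory could look like this:

[INPUT]
    Name    tail
    # the wildcard picks up every new application that logs into this folder,
    # so no fluent-bit reconfiguration is needed per program
    Path    C:\logs\*.log
    DB      C:\fluent-bit\tail.db
    Tag     app.*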
Fluent-bit setup/config:
For Linux, just use fluentd (unless you need more than 100,000 messages per second, which is where fluent-bit becomes your only choice).
For Windows, install fluent-bit and make it run as a daemon (an almost funny solution).
There are 2 execution methods:
Providing configuration directly via the commandline
Using a config file (example included in zip), and referring to it with the -c flag.
Directly from commandline
Some example executions (without using a configuration file):
PS .\bin\fluent-bit.exe -i winlog -p "channels=Setup,Windows PowerShell" -p "db=./test.db" -o stdout -m '*'
-i declares the input method. Currently, only a few plugins have been implemented; see the help output below.
PS fluent-bit.exe --help
Available Options
-b --storage_path=PATH specify a storage buffering path
-c --config=FILE specify an optional configuration file
-f, --flush=SECONDS flush timeout in seconds (default: 5)
-F --filter=FILTER set a filter
-i, --input=INPUT set an input
-m, --match=MATCH set plugin match, same as '-p match=abc'
-o, --output=OUTPUT set an output
-p, --prop="A=B" set plugin configuration property
-R, --parser=FILE specify a parser configuration file
-e, --plugin=FILE load an external plugin (shared lib)
-l, --log_file=FILE write log info to a file
-t, --tag=TAG set plugin tag, same as '-p tag=abc'
-T, --sp-task=SQL define a stream processor task
-v, --verbose increase logging verbosity (default: info)
-s, --coro_stack_size Set coroutines stack size in bytes (default: 98302)
-q, --quiet quiet mode
-S, --sosreport support report for Enterprise customers
-V, --version show version number
-h, --help print this help
Inputs
tail Tail files
dummy Generate dummy data
statsd StatsD input plugin
winlog Windows Event Log
tcp TCP
forward Fluentd in-forward
random Random
Outputs
counter Records counter
datadog Send events to DataDog HTTP Event Collector
es Elasticsearch
file Generate log file
forward Forward (Fluentd protocol)
http HTTP Output
influxdb InfluxDB Time Series
null Throws away events
slack Send events to a Slack channel
splunk Send events to Splunk HTTP Event Collector
stackdriver Send events to Google Stackdriver Logging
stdout Prints events to STDOUT
tcp TCP Output
flowcounter FlowCounter
Filters
aws Add AWS Metadata
expect Validate expected keys and values
record_modifier modify record
rewrite_tag Rewrite records tags
throttle Throttle messages using sliding window algorithm
grep grep events by specified field values
kubernetes Filter to append Kubernetes metadata
parser Parse events
nest nest events by specified field values
modify modify records by applying rules
lua Lua Scripting Filter
stdout Filter events to STDOUT
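For completeness, the winlog example above expressed as a config file (run with the -c flag) might look roughly like this; a sketch using the classic INI-style format:

[SERVICE]
    Flush    5

[INPUT]
    Name        winlog
    Channels    Setup,Windows PowerShell
    DB          ./test.db

[OUTPUT]
    Name     stdout
    Match    *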
I have been using InfluxDB (server version 1.7.5) with the InfluxQL language for some time now. Unfortunately, InfluxQL does not allow me to perform any form of joins, so I need to use InfluxDB's new scripting language Flux instead.
The manual states that I have to enable Flux in /etc/influxdb/influxdb.conf by setting flux-enabled=true, which I have done. I restarted the server to make sure it picked up the new settings, and started the Influx command-line tool with -type=flux.
I then get a different user interface than when I use InfluxQL. So far so good. I can also set and read variables, etc. So I can set:
> dummy = 1
> dummy
1
However, when I try any form of query against the tables, such as: from(bucket:"db_OxyFlux-test/autogen")
I always get
Error: Flux query service disabled. Verify flux-enabled=true in the [http] section of the InfluxDB config.
: 403 Forbidden
I found the manual for Flux rather lacking in basic details of schema exploration, so I am not sure whether it is just my query that raises this error or whether something else is going wrong. I tested this both on my own home machine and on our remote work server, and I get the same results.
Re: Vilix
Thank you, this led me in the right direction.
I realised that InfluxDB does not automatically read the config file (which is not very intuitive), but your solution also forces me to start the daemon by hand each time. After some more googling I used:
sudo influxd config -config /etc/influxdb/influxdb.conf
So hopefully the daemon will now start automatically each time on startup, rather than me having to do this by hand.
I had the same issue, and the solution is to start influxd with the -config option:
influxd -config /etc/influxdb/influxdb.conf
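For reference, the setting the error message refers to lives in the [http] section of /etc/influxdb/influxdb.conf:

[http]
  # Flux is disabled by default in 1.7.x; this is the line the 403 error points at
  flux-enabled = true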
I am new to prometheus/alertmanager.
I have created a cron job which executes a shell script every minute. This shell script generates a "test.prom" file (with a gauge metric in it) in the directory assigned to the --textfile.collector.directory argument of node-exporter. I verified (using curl http://localhost:9100/metrics) that node-exporter exposes the custom metric correctly.
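For reference, a minimal test.prom along those lines might look like this (the value is a placeholder):

# HELP test_metric A test gauge written by the cron job
# TYPE test_metric gauge
test_metric 1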
When I run a query against that custom metric in the Prometheus dashboard, it does not show any results (it says no data found).
I could not figure out why the query against the metric exposed via the node-exporter textfile collector fails. Any clues what I missed? Also, please let me know how to check and ensure that Prometheus scraped my custom metric test_metric.
My query in the Prometheus dashboard is test_metric != 0, which did not give any results, even though I exposed test_metric via the node-exporter textfile collector.
Any help is appreciated!
BTW, node-exporter is running as a Docker container in a Kubernetes environment.
I had a similar situation, but it was not a configuration problem.
Instead, my data included timestamps:
# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 53.87 1541426242
network_connectivity_rtt{host="hop_1"} 58.8 1541426242
network_connectivity_rtt{host="hop_2"} 21.93 1541426242
network_connectivity_rtt{host="hop_3"} 71.69 1541426242
PNE (the Prometheus node-exporter) was picking them up without any problem once I reloaded it. As Prometheus is running under systemd, I had to check the logs like this:
journalctl --system -u prometheus.service --follow
There I read this line:
msg="Error on ingesting samples that are too old or are too far into the future"
Once I removed the timestamps, values started appearing. This led me to read in more detail about timestamps, and I found out they have to be in milliseconds. So this format is now OK:
# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 50.47 1541429581376
network_connectivity_rtt{host="hop_1"} 3.38 1541429581376
network_connectivity_rtt{host="hop_2"} 11.2 1541429581376
network_connectivity_rtt{host="hop_3"} 20.72 1541429581376
I hope it helps someone else.
It's my bad: I had not included scrape instructions for node-exporter in the prometheus.yaml file. It worked after including them.
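For anyone hitting the same thing, a minimal static scrape config is sketched below (the job name is a placeholder; in a Kubernetes environment you would more likely use kubernetes_sd_configs):

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']  # node-exporter's default port, as used above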
This issue happens because of stale metrics.
Let's say you wrote your metric to the file at 13:00. By default, Prometheus considers a metric stale after 5 minutes, so it might have disappeared by the time you run your query.
I would like to use an nginx front server on my Synology NAS for reverse-proxying purposes. The goal is to provide a facade for the non-standard port numbers used by the various webservers hosted on the NAS. nginx should be listening on port 80, otherwise all this wouldn't make any sense.
However, DSM comes out of the box with an Apache server that is already listening on port 80. What it does is really silly: it simply redirects to port 5000, which is the entry point to the NAS web manager (DSM).
What I would like to do is disable this functionality, making port 80 available for my nginx server. How can I do this?
Since Google also redirects here for recent Synology DSM versions, I'll answer for DSM 6 (based on http://tonylawrence.com/posts/unix/synology/freeing-port-80/).
From DSM 6 on, nginx is used as the HTTP server and redirection place. The following commands will leave nginx in place, but run it on port 8880 instead of 80.
ssh into your Synology
sudo -s
cd /usr/syno/share/nginx
Make a backup of server.mustache, DSM.mustache, WWWService.mustache
cp server.mustache server.mustache.bak
cp DSM.mustache DSM.mustache.bak
cp WWWService.mustache WWWService.mustache.bak
sed -i "s/80/8880/g" server.mustache
sed -i "s/80/8880/g" DSM.mustache
sed -i "s/80/8880/g" WWWService.mustache
Optionally, you can also move 443 to 8881:
sed -i "s/443/8881/g" server.mustache
sed -i "s/443/8881/g" DSM.mustache
sed -i "s/443/8881/g" WWWService.mustache
Quit the shell (e.g., via Ctrl+D)
Go to the Control Panel and change any setting (e.g. Application Portal -> Reverse Proxy to forward http://YOURSYNOLOGYHOSTNAME:80 to http://localhost:8181; 8181 is the port suggested by the pi-hole-on-DSM tutorial).
tl;dr Edit /usr/syno/etc/synoservice.d/httpd-user.cfg to look like:
{
"init_job_map":{"upstart":["httpd-user"]},
"user_controllable":"no",
"mtu_sensitive":"yes",
"auto_start":"no"
}
Then edit the stop on runlevel to be [0123456] in /etc/init/httpd-user.conf:
Syno-Server> cat /etc/init/httpd-user.conf
description "start httpd-user daemon"
author "Development Infrastructure Team"
console log
reload signal SIGUSR1
start on syno.share.ready and syno.network.ready
stop on runlevel [0123456]
...
... then reboot.
Background information
The answer given by Backslash36 is not the easiest solution, and it may also be more difficult to maintain. Here I give a solution that also doesn't involve starting Web Station, which most other solutions demand. Note: for updated documentation see here, which gives a lot of general info about Synology systems.
It is important to note that newer DSM versions (> 5.x) use upstart, so much of the previous documentation is no longer correct. There are two httpd jobs which run by default on Synology machines:
httpd-sys : serves the administration page(s) and is located on 5000/5001 by default.
httpd-user : this, somewhat confusingly, always runs even if the webstation program is not enabled.
If webstation:
is enabled: then this program serves the user webpages.
is not enabled: then this program sets /usr/syno/synoman/phpsrc/web as its DocumentRoot (/usr/syno/synoman/phpsrc/web/index.cgi -> /usr/syno/synoman/webman/index.cgi), meaning that a call to http://address.of.my.dsm will invoke the index.cgi file. This CGI file is what drives the redirect to port 5000 (or whatever you have set admin_port to be).
From the command line, you can check what the [secure_]admin_port is set to:
Syno-Server> get_key_value /etc/synoinfo.conf admin_port
5184
Syno-Server> get_key_value /etc/synoinfo.conf secure_admin_port
5185
where I have set mine differently.
OK, now to the solution. The best approach is simply to stop the httpd-user daemon from starting; this is presumably what you want anyway (e.g. to start another server like 'nginx' in a Docker container). To do this, edit the relevant upstart configuration file:
Syno-Server> cat /usr/syno/etc/synoservice.d/httpd-user.cfg
{
"init_job_map":{"upstart":["httpd-user"]},
"user_controllable":"no",
"mtu_sensitive":"yes",
"auto_start":"no"
}
so that the "auto_start" entry is "no" (as it is above). It will presumably be "yes" on your machine and by default. Then edit the stop on runlevel to be [0123456] in /etc/init/httpd-user.conf:
Syno-Server> cat /etc/init/httpd-user.conf
description "start httpd-user daemon"
author "Development Infrastructure Team"
console log
reload signal SIGUSR1
start on syno.share.ready and syno.network.ready
stop on runlevel [0123456]
...
This last step ensures that the httpd-user service does actually start but then automatically stops; this is necessary because a number of other services depend on it starting. Reboot your machine and you will see that nothing is listening (or forwarding) on port 80 anymore.
Done! It was tricky, but now I have it working just fine. Here is how I did it.
What follows requires connecting to the NAS with ssh, and may not be recommended if you want to keep the warranty on your product (even though it's completely safe IMHO).
TL;DR: In the following files, replace all occurrences of port 80 by a non-standard port (for example, 8080). This will release port 80 and make it available for whatever you want.
/etc/httpd/conf/httpd.conf
/etc/httpd/conf/httpd.conf-user
/etc/httpd/conf/httpd.conf-sys
/etc.defaults/httpd/conf/httpd.conf-user
/etc.defaults/httpd/conf/httpd.conf-sys
Note that modifying a subset of these files is probably sufficient (I could observe that the first one is actually generated from several of the others). I guess modifying the files in /etc.defaults/ would be enough, but if not, the worst-case scenario is to modify all of those files and you will be just fine.
Once this is done, don't forget to restart your NAS!
For those interested in how I found out
I'm not that familiar with the Linux filesystem, and even less with Apache configuration. But I knew that scripts dealing with startup processes are located in /etc/init, so the Apache server performing the redirection would certainly be launched from there.
This is where I had to get my hands dirty. I ran cat <filename> | grep 80 on the files in that directory I considered relevant, hoping to find a configuration line that sets a port number to 80.
That intuition paid off: /etc/init/httpd-user.conf contained the line echo "DocumentRoot \"/usr/syno/synoman/phpsrc/web\"" >> "${HttpdConf}" #port 80 to 5000. Bingo!
Looking at the top of the file, I discovered that the HttpdConf variable was referring to /etc/httpd/conf/httpd.conf. This is where the actual configuration was taking place.
From there it was relatively straightforward, even for the Jon Snows out there who know nothing about Apache configuration. The trick was to notice that httpd.conf is instantiated from a template at startup (so changing that file alone was not enough). Performing find / -name "*httpd.conf*", combined with some grep 80, gave me the list of files to modify.
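Combined into a single command, that search looks something like this (a sketch; tighten the grep pattern as needed):

find / -name "*httpd.conf*" -exec grep -Hn "80" {} \; 2>/dev/null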
Looking back, all of this seems obvious, of course.
However, I wish Synology gave us more flexibility, so we don't have to perform dirty hacks like this...
I have a Grails app that moved to a new subnet with a change to the DNS. As a result, the logout functionality stopped working. When I inspect the network using Chrome, I get this message under request headers: CAUTION: Provisional headers are shown.
This means the request to retrieve that resource was never made, so the headers being shown are not the real thing.
The logout function is executing this action
package edu.example.performanceevaluations
import org.codehaus.groovy.grails.plugins.springsecurity.SpringSecurityUtils
class LogoutController {
    def index = {
        // Put any pre-logout code here
        redirect uri: SpringSecurityUtils.securityConfig.logout.filterProcessesUrl // '/j_spring_security_logout'
    }
}
I would greatly appreciate a direction to look in.
As suggested by that link, run chrome://net-internals and see if you get anywhere.
If you are still lost, I would suggest two-way debugging. If you have Linux, find something related to your traffic and run something like tcpdump, or, if that's too complex, install and run ngrep -W byline -d any port 8080 -q, then look for patterns to see what is going on.
With ngrep/tcpdump, look for the old IP or subnet across the entire traffic and see if anything is still trying to get through (this is all best done on the Grails app server, of course).
(I'm unsure of the port; possibly 8080 or whatever other clear-text port your app may be running on.)
Look for your IP in the Apache logs: does the request hit the actual server when you log out?
Has the application been restarted since the subnet change? It could have cached the old endpoint in the running Java process:
pgrep java|awk '{print "netstat -plant "$1" |grep "$1 }'|/bin/sh
or
pgrep java|awk '{print " lsof -p "$1" |grep -i listen"}'|/bin/sh
I personally think something somewhere needs to be restarted, since it's hooking into a cache of something.
Also check the hosts files of any end machines involved and ensure nothing has the previous subnet physically configured in there.