I am using Grafana Cloud. I have a dashboard variable to create a list of nodes:
node -> label_values(agent_hostname)
Then I have my query to graph CPU for the selected hostname:
instance:node_cpu_utilisation:rate5m{agent_hostname="$node"}
This works fine for a single host, but I would like to have lines for several servers on one graph. I have 'Include All' and 'Multi-value' switched on for the variable. At the moment, when I choose a second or third server from my nodes variable, the graph shows 'No data'. Do I need to amend my variable so that it renders with a pipe (OR) symbol between values? And if so, how would I do that?
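A likely fix, for reference: with 'Multi-value' enabled, Grafana interpolates the variable for Prometheus as a regex alternation like (host1|host2), which the equality matcher = never matches. The selector would need the regex matcher =~ instead:

instance:node_cpu_utilisation:rate5m{agent_hostname=~"$node"}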
I still haven't solved this, but I came up with the following workaround. I added the following to the grafana-agent.yaml file so that the agent's remote_write adds an environment and a role label for each machine:
prometheus_remote_write:
  - url: {{ prometheus_url }}
    write_relabel_configs:
      - source_labels: [__address__]
        regex: '.*'
        target_label: instance
        replacement: {{ ansible_hostname }}
      - source_labels: [__address__]
        regex: '.*'
        target_label: environment
        replacement: {{ env | default('legacy') }}
      - source_labels: [__address__]
        regex: '.*'
        target_label: role
        replacement: {{ role | default('none') }}
Then in the dashboard variables I added queries for the new labels, defined the same way as the node variable above (role -> label_values(role), environment -> label_values(environment)).
Then in the query for each metric I added the role and environment; for example, for load:
node_load15{role="$role", environment="$environment"}
This allows me to show load for several machines on the one graph and also allows me to easily switch between environments and clusters using the variables drop-down at the top.
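Note that if the role and environment variables are themselves multi-value, the same regex-matcher caveat from above should apply, so the query would presumably become:

node_load15{role=~"$role", environment=~"$environment"}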
I deployed Neo4j in my AKS cluster using the standalone Helm chart.
It all gets deployed and my Node.js server connects to Neo4j correctly.
However, queries throw the Neo4jError: Unknown function 'apoc.convert.fromJsonMap' error, so APOC is clearly missing.
I followed the procedure described here https://neo4j.com/docs/operations-manual/current/kubernetes/configuration/#operations-installing-plugins and my values are below.
The only difference I can find is that in the guide APOC core is enabled afterwards by upgrading the Helm chart, while I'm installing with the option already enabled.
Looking at https://neo4j.com/docs/apoc/current/config/ I saw
As of Neo4j v.5.0, APOC config settings are no longer supported in the neo4j.conf file. Please move all apoc.* settings to apoc.conf. It is also possible to set the config settings using environment variables.
so since neo4j-standalone is using version 4.4.16, I moved the APOC settings from apoc.conf to neo4j.conf, but the APOC procedures are still not found by the queries.
Is there something I'm missing in the configuration to enable APOC?
Thank you very much.
neo4j-db:
  # neo4j-standalone:
  nameOverride: "neo4j"
  fullnameOverride: 'neo4j'
  neo4j:
    # Name of your cluster
    name: "fixit-neo4j" # this will be the label: app: value for the service selector
    password: "password"
    ##
    passwordFromSecret: ""
    passwordFromSecretLookup: false
    edition: "community"
    acceptLicenseAgreement: "yes"
    offlineMaintenanceModeEnabled: false
    resources:
      cpu: "1000m"
      memory: "2Gi"
  volumes:
    data:
      mode: 'volumeClaimTemplate'
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        storageClassName: neo4j-sc-data
        resources:
          requests:
            storage: 4Gi
    backups:
      mode: 'share' # share an existing volume (e.g. the data volume)
      share:
        name: 'logs'
    logs:
      mode: 'volumeClaimTemplate'
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        storageClassName: neo4j-sc-logs
        resources:
          requests:
            storage: 4Gi
  services:
    # A ClusterIP service with the same name as the Helm Release name should be used for Neo4j Driver connections
    # originating inside the Kubernetes cluster.
    default:
      # Annotations for the K8s Service object
      annotations: { }
    # A LoadBalancer Service for external Neo4j driver applications and Neo4j Browser
    neo4j:
      ### this would create cluster-neo4j svc
      enabled: false
  # env:
  #   NEO4J_PLUGINS: '["graph-data-science"]'
  config:
    server.bolt.enabled: "true"
    server.bolt.tls_level: "REQUIRED"
    server.bolt.listen_address: "0.0.0.0:7687"
    dbms.ssl.policy.bolt.client_auth: "NONE"
    dbms.ssl.policy.bolt.enabled: "true"
    server.directories.plugins: "/var/lib/neo4j/labs"
    dbms.security.procedures.unrestricted: "apoc.*"
    server.config.strict_validation.enabled: "false"
    dbms.security.procedures.allowlist: "gds.*,apoc.*"
  apoc_config:
    apoc.trigger.enabled: "true"
    apoc.jdbc.neo4j.url: "jdbc:foo:bar"
    apoc.import.file.enabled: "true"
  startupProbe:
    failureThreshold: 1000
    periodSeconds: 50
  ssl:
    # setting per "connector" matching neo4j config
    bolt:
      privateKey:
        secretName: tls-secret
        subPath: tls.key
      publicCertificate:
        secretName: tls-secret
        subPath: tls.crt
      trustedCerts:
        sources: [ ]
      revokedCerts:
        sources: [ ]
OK, after looking at quite a few issues on the same subject, I found that a common solution to this problem was to add dbms.directories.plugins: "/var/lib/neo4j/labs" and dbms.config.strict_validation: "false" to the config section, which, as I understand it, mirrors these settings for both the server and the dbms. It did indeed work, but it's odd that the official guide doesn't mention it. The mirrored settings make sense (tell both the server and the dbms where to look for plugins), but they should still be documented. I see so many posts about this, which suggests the documentation is not clear enough. It's easy to take things for granted: because this need to mirror the plugin location for both the server AND the dbms is not stated anywhere in the docs, I, like many others, assumed the dbms was already configured with the same location as server.directories.plugins: "/var/lib/neo4j/labs" (which the docs do say to configure) and didn't add it. Nobody's perfect, I guess. I hope they update the docs for future devs' sake; meanwhile, this answer may be helpful.
So the correct configuration is:
env:
  NEO4J_PLUGINS: '["graph-data-science"]'
config:
  server.bolt.enabled: 'true'
  server.bolt.tls_level: 'REQUIRED'
  server.bolt.listen_address: '0.0.0.0:7687'
  dbms.ssl.policy.bolt.client_auth: 'NONE'
  dbms.ssl.policy.bolt.enabled: 'true'
  ## apoc
  server.directories.plugins: '/var/lib/neo4j/labs'
  server.config.strict_validation.enabled: 'false'
  dbms.security.procedures.unrestricted: 'apoc.*'
  dbms.security.procedures.allowlist: 'gds.*,apoc.*'
  ### additional needed dbms config mirroring server config
  dbms.directories.plugins: "/var/lib/neo4j/labs"
  dbms.config.strict_validation: "false"
apoc_config:
  apoc.trigger.enabled: "true"
  apoc.jdbc.neo4j.url: "jdbc:foo:bar"
  apoc.import.file.enabled: "true"
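For completeness, a values change like this is applied with a plain Helm upgrade; the release and chart names below are assumptions based on the question, so adjust them to your setup:

helm upgrade fixit-neo4j neo4j/neo4j-standalone -f values.yaml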
It seems the docs are missing the step of installing the APOC plugin itself. Change the following line to include APOC as well:
NEO4J_PLUGINS: '["graph-data-science", "apoc"]'
and you should be good
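A quick way to verify the plugin actually loaded is to call an APOC function from Cypher; apoc.version() ships with APOC core:

RETURN apoc.version();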
I am new to Prometheus, and I have been trying to set up the blackbox exporter to monitor my server with the http_post_2xx module rather than the http_2xx module. Despite much research on the internet, I still haven't figured it out.
Here is the background: I used to check whether my website was available with Postman. After sending a POST request I could see manually whether the status was 200 OK or there was no response at all. This is ineffective and irresponsible, as I should not find out about an outage from my website's visitors rather than from my own monitoring. Therefore, I turned to Prometheus.
The blackbox exporter seems like my solution. I set it up on my server, and the config file looks like this:
modules:
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{text: "hi"}'
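For reference, the module can be tested directly against the exporter before involving Prometheus; with the exporter on its default port 9115 and the target from this setup, something like this should return probe_success 1 for a healthy endpoint:

curl 'http://localhost:9115/probe?target=10.0.100.130:2001&module=http_post_2xx'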
I configured prometheus.yml this way:
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_post_2xx]
  static_configs:
    - targets:
        - 10.0.100.130:2001
        - 10.0.100.130:2002 # The IP addresses I want to monitor
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 10.0.100.130:9115 # The blackbox exporter's real host:port
The dashboard I use is 5345, but I got something like this:
(screenshot: the dashboard panels show the HTTP Status Code as N/A)
I do not know why the HTTP Status Code is N/A or No while the status from Postman is 200 OK. Is there anything wrong with my configuration?
I am using Ansible to gather information from remote nodes and will then use this information to update the relevant RPMs.
The issue I am having is collecting the version numbers of various applications and writing them to a file.
Playbook:
---
- name: Check Application Versions
  hosts: kubernetes
  tasks:
    - name: Check K8S version.
      shell: kubectl --version
      register: k8s_version
    - debug: msg="{{ k8s_version.stdout }}"
Inventory file:
[kubernetes]
172.29.219.102
172.29.219.105
172.29.219.104
172.29.219.103
Output:
TASK [debug] *******************************************************************
ok: [172.29.219.102] => {
"msg": "Kubernetes v1.4.0"
}
ok: [172.29.219.103] => {
"msg": "Kubernetes v1.4.0"
}
ok: [172.29.219.105] => {
"msg": "Kubernetes v1.4.0"
}
ok: [172.29.219.104] => {
"msg": "Kubernetes v1.4.0"
}
The above portion is simple and works. Now I want to write this output to a file, something like:
Kubernetes v1.4.0
Kubernetes v1.4.0
Kubernetes v1.4.0
Kubernetes v1.4.0
So I added the line below:
- local_action: copy content={{ k8s_version.stdout_lines }} dest=/tmp/test
My /tmp/test looks like:
# cat /tmp/test
["Kubernetes v1.4.0"]
There is only one value here, so I tried something different:
- local_action: lineinfile dest=/tmp/foo line="{{ k8s_version.stdout }}" insertafter=EOF
This resulted in:
# cat /tmp/foo
Kubernetes v1.4.0
I'm trying to figure out why I only see one value when I should see the versions from every node in my inventory file. What am I doing wrong?
What am I doing wrong?
The lineinfile module does not perform the action "add a line to a file"; instead it ensures a given line is present in the file. If all your target nodes have the same version, it won't add the same line multiple times.
On the other hand, with copy, each host overwrote the file in turn, which is why only one value was left.
If you need to register values for all hosts, you can for example create a template which will loop over hosts in the kubernetes group:
- copy:
    content: "{% for host in groups.kubernetes %}{{ hostvars[host].k8s_version.stdout }}\n{% endfor %}"
    dest: /tmp/test
  delegate_to: localhost
  run_once: true
Another way would be to extract the values from hostvars with map, but given that you want the values from the kubernetes host group only, I'm not sure it would be prettier. And having a for loop in the template allows you to easily add host names.
According to this post:
Ansible register result of multiple commands
your desired variable is in k8s_version.results. To access it, you need to work with a template where you just iterate over it:
- local_action: template src=my_nodes.j2 dest=/tmp/test
And the template templates/my_nodes.j2:
{% for res in k8s_version.results %}
{{ res.stdout }}
{% endfor %}
The complete playbook would then be:
---
- name: Check Application Versions
  hosts: kubernetes
  tasks:
    - name: Check K8S version.
      shell: kubectl --version
      register: k8s_version
    - local_action: template src=my_nodes.j2 dest=/tmp/test
I'm new to Prometheus and I'm trying to monitor some extend OIDs I created with the snmp_exporter, but it doesn't work as expected.
My script just does an echo $VALUE (the value is an integer or a string).
I have this in my snmpd.conf:
extend value-return-test /usr/local/bin/script.sh
I generated its OID:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendResult.\"value-return-test\" -On
Now I'm able to see all the SNMP extend objects linked to my configuration:
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendObjects |grep value-return-test
Now, here is my Prometheus configuration, prometheus.yml:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'snmp'
    metrics_path: /snmp
    params:
      module: [tests]
    static_configs:
      - targets:
          - 127.0.0.1 # SNMP device - add your IPs here
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116 # SNMP exporter.
and my snmp.yaml:
tests:
  walk:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
With that configuration I'm not able to get my values on the page http://localhost:9116/snmp?target=127.0.0.1&module=tests. All I get is:
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 0.004676028
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 0
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 0.004477656
However, if I put my configuration into another block like if_mib, I am able to get the values, but they end up in the wrong place:
As you can see, I got the value "1" instead of "6".
I also tried the snmp_exporter generator, but I'm not able to build it:
$ go build
# github.com/prometheus/snmp_exporter/generator
./net_snmp.go:6:38: fatal error: net-snmp/net-snmp-config.h: No such file or directory
compilation terminated.
Thanks for your help
If you are able to change snmpd.conf, that implies you have enough control over the machine to run the node exporter. I'd suggest using the node exporter's textfile collector to expose this data, rather than spending time figuring out the intricacies of how SNMP and MIBs work.
In general, you should prefer the node/WMI exporters over SNMP where possible.
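A minimal sketch of that approach (the directory path and metric name are placeholders, not anything from the question): the script writes a .prom file into a directory the node exporter watches.

# cron job / script: write the value in Prometheus exposition format,
# then rename atomically so the collector never reads a half-written file
echo 'value_return_test 42' > /var/lib/node_exporter/textfile/value_return_test.prom.tmp
mv /var/lib/node_exporter/textfile/value_return_test.prom.tmp /var/lib/node_exporter/textfile/value_return_test.prom

# start the node exporter pointed at that directory
node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile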
Using the get parameter instead of walk worked for me (get fetches the exact OIDs listed, whereas walk expects a subtree to iterate over):
tests:
  get:
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
    - 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
  metrics:
    - name: snmp_test1
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.1
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
    - name: snmp_test2
      oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.23.109.97.105.108.45.113.117.101.117.101.45.115.101.110.100.105.110.103.45.114.97.116.101.2
      type: DisplayString
      indexes:
        - labelname: ifIndex
          type: Integer32
I am using Ansible with the docker_container module to deploy a web app to various environments. When the target host is a non-production server, I set the links option with a variable, e.g.:
links: "{{ var_db_link }}"
... and this works great, when var_db_link is actually set.
Problem: I need to be able to leave this unset when deploying to a production host, because in that case the DB is never going to be a linked container. I was expecting that if it was not set, Ansible would ignore the links option and not try to use it. Instead, it uses the unset value, which produces the error: FAILED! => {"changed": false, "failed": true, "msg": "Error creating container: 500 Server Error: Internal Server Error (\"{\"message\":\"Could not get container for \"}\")"}
Question: Is this even possible? (Can Ansible be told not to use an option when it has no value set?) Or should I (unfortunately) create separate roles/playbooks for each situation (i.e., with the links option set and without it)?
This line in the referenced commit gives a clue as to the solution:
Instead of trying to set it to null, just set it to an empty list:
links: "{{ [] if var_db_link == '' else var_db_link }}"
The behavior you want looks like what is described in Omitting Parameters.
You should do:
links: "{{ var_db_link | default(omit) }}"
You will need Ansible 1.8 or above.
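In context, the task might look like this (container and image names here are placeholders):

- docker_container:
    name: webapp
    image: example/webapp:latest
    links: "{{ var_db_link | default(omit) }}"

When var_db_link is undefined, default(omit) causes Ansible to drop the links parameter entirely, exactly as if it had never been passed to the module.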