Automate Grafana custom dashboards in Rancher Monitoring - monitoring

We are using Rancher and Racher tools for monitoring. i.e. Prometheus and Grafana on Rancher.
Rancher Monitoring
I am looking for automation of custom dashboard import in the Grafana. I referred to the Grafana documentation for provisioning here
Also referred to the answer of the question here. - stackoverflow
The automation needs to be done via updating dashboardProviders in values.yml or enabling sidecar.
Refering Grafana helm chart here values.yml
But this does not seem to be working for Grafana in Rancher Monitoring. Please refer to Rancher Grafana chart here -
Rancher Monitoring Grafana values.yml. The Rancher version does not have dashboardProviders or sideCar.
My questions are -
Is there any way to add dashboardProviders Or sidecar in Rancher Grafana to automate the dashboards?
Is there any other way, to automate the deployment of the dashboards of Grafana in Rancher.

I had the same issue.
After investigation of grafana-helm-charts it seams that you should add two additional blocks.
First is grafana.dashboardProviders and second grafana.dashboards eg:
grafana:
enabled: true
plugins:
- grafana-worldmap-panel
- grafana-piechart-panel
dashboardProviders:
external.yaml:
apiVersion: 1
providers:
- name: 'external'
orgId: 1
folder: ''
type: file
disableDeletion: false
editable: false
options:
path: /var/lib/grafana/dashboards/external
dashboards:
external:
nginx-ingress-prod:
gnetId: 9614
datasource: prod_cluster
The external word in above example is a name of provider and need to match to this one used in grafana.dashboard section.
Also the path in grafana.dashboardProviders.options.path should contains the provider name at the end as it is used by download_dashboards.sh script that download the json body of the dashboard.
Another way of adding own dashboard is by using config map with proper label like it is described in create-persistent-grafana-dashboard but for me this way did not work until I added the dashboard-provider: external label beside grafana_dashboard: "1" eg:
---
apiVersion: v1
kind: ConfigMap
metadata:
labels:
grafana_dashboard: "1"
dashboard-provider: external
name: nginx-ingress
namespace: cattle-dashboards
data:
nginx-ingress.json: |-
{...}

For non-regular and manual generation of the Grafana dashboards from the Prometheus metrics sample (i.e Micrometer format sample) you can use this service:
http://eljah.tatar/micrometer2grafana/

Related

Openshift: any deployment resulted in Application is not available

Fist time deploying to OpenShift (actually minishift in my Windows 10 Pro). Any sample application I deploied successfully resulted in:
From Web Console I see a weird message "Build #1 is pending" although I saw it was successfully from PowerShell
I found someone fixing similiar issue changing to 0.0.0.0 (enter link description here) but I give a try and it isn't the solution in my case.
Here are the full logs and how I am deploying
PS C:\to_learn\docker-compose-to-minishift\first-try> oc new-app https://github.com/openshift/nodejs-ex warning: Cannot check if git requires authentication.
--> Found image 93de123 (16 months old) in image stream "openshift/nodejs" under tag "10" for "nodejs"
Node.js 10.12.0
---------------
Node.js available as docker container is a base platform for building and running various Node.js applications and frameworks. Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.
Tags: builder, nodejs, nodejs-10.12.0
* The source repository appears to match: nodejs
* A source build using source code from https://github.com/openshift/nodejs-ex will be created
* The resulting image will be pushed to image stream tag "nodejs-ex:latest"
* Use 'start-build' to trigger a new build
* WARNING: this source repository may require credentials.
Create a secret with your git credentials and use 'set build-secret' to assign it to the build config.
* This image will be deployed in deployment config "nodejs-ex"
* Port 8080/tcp will be load balanced by service "nodejs-ex"
* Other containers can access this service through the hostname "nodejs-ex"
--> Creating resources ...
imagestream.image.openshift.io "nodejs-ex" created
buildconfig.build.openshift.io "nodejs-ex" created
deploymentconfig.apps.openshift.io "nodejs-ex" created
service "nodejs-ex" created
--> Success
Build scheduled, use 'oc logs -f bc/nodejs-ex' to track its progress.
Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
'oc expose svc/nodejs-ex'
Run 'oc status' to view your app.
PS C:\to_learn\docker-compose-to-minishift\first-try> oc get bc/nodejs-ex -o yaml apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
annotations:
openshift.io/generated-by: OpenShiftNewApp
creationTimestamp: 2020-02-20T20:10:38Z
labels:
app: nodejs-ex
name: nodejs-ex
namespace: samplepipeline
resourceVersion: "1123211"
selfLink: /apis/build.openshift.io/v1/namespaces/samplepipeline/buildconfigs/nodejs-ex
uid: 1003675e-541d-11ea-9577-080027aefe4e
spec:
failedBuildsHistoryLimit: 5
nodeSelector: null
output:
to:
kind: ImageStreamTag
name: nodejs-ex:latest
postCommit: {}
resources: {}
runPolicy: Serial
source:
git:
uri: https://github.com/openshift/nodejs-ex
type: Git
strategy:
sourceStrategy:
from:
kind: ImageStreamTag
name: nodejs:10
namespace: openshift
type: Source
successfulBuildsHistoryLimit: 5
triggers:
- github:
secret: c3FoC0RRfTy_76WEOTNg
type: GitHub
- generic:
secret: vlKqJQ3ZBxfP4HWce_Oz
type: Generic
- type: ConfigChange
- imageChange:
lastTriggeredImageID: 172.30.1.1:5000/openshift/nodejs#sha256:3cc041334eef8d5853078a0190e46a2998a70ad98320db512968f1de0561705e
type: ImageChange
status:
lastVersion: 1

scdf 2.1 k8s config security context non root no fs writable

I need config scdf2 skipper , scdf and app pods to run without root and no write into filesystem pod .
i made changes into config yamls
data:
application.yaml: |-
spring:
cloud:
skipper:
server:
platform:
kubernetes:
accounts:
default:
namespace: default
deploymentServiceAccountName: scdf2-server-data-flow
securityContext:
runAsUser: 2000
allowPrivilegeEscalation: false
limits:
Colla
And scdf start runs with user "2000", (there is a problem with writeable local maven repo, fixed with a pvc nfs)...
But, the app pods always starts as root user, no 2000 users.
I've change skipper-config with securitycontext, .. any clues?
TX
What you set as deploymentServiceAccountName is one of the Kubernetes deployer properties that can be used for deploying streaming applications or launching task applications.
Looks like the above configuration is not applied to your SCDF or Skipper server configuration properties as they should at least get applied when deploying applications.
For the SCDF server and Skipper servers, in your SCDF/Skipper server deployment configurations, you need to explicitly set your serviceAccountName (not as deploymentServiceAccountName as its name suggests, the deploymentServiceAccountName is internally converted into the actual serviceAccountName for the respective stream/task apps when they get deployed).
We got it. We use it into skipp/scdf deploy, not in pods deploymente.
Your request:
Into scdf / skipper cfg deployment got:
spec:
containers:
- name: {{ template "scdf.fullname" . }}-server
image: {{ .Values.server.image }}:{{ .Values.server.version }}
imagePullPolicy: {{ .Values.server.imagePullPolicy }}
volumeMounts:
...
serviceAccountName: {{ template "scdf.serviceAccountName" . }}
Do you tell me to change config map scdf/skipper to task and streams? Another property into or before config about deployment
How is it relation about "serviceaccount" and user running process into pod?
How related serviceaccount with running process user "2000"
I cant understand.
Please help, it is very important to running without root and no use local filesystem from pod excepts "tmp" files

How to scrape Jenkins metrics using Prometheus Operator

I'm using Kube-prometheus with Prometheus-Operator to monitor my K8s cluster. I've deployed Jenkins on my cluster and want to start to get metrics here using ServiceMonitor.
I've installed the Prometheus plugin which exposes the metrics using /prometheus or by /metrics/API_KEY/metrics, this works fine if I create a new static job. However, if I want to use ServiceMonitor, it does not work.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
k8s-app: jenkins
name: jenkins
namespace: monitoring
spec:
endpoints:
- interval: 30s
port: http
path: /metrics/y1H6G16T-DhqpHdW9XwHWnP9FWAXMMfy4XnXVnyoIOEV3-gPJZKN284OFUcVkPxL/metrics
selector:
matchLabels:
jenkins: main
I don't know about ServiceMonitor, but I monitor my Jenkins instance without any problem, using annotations on Jenkins' service :
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/prometheus"
I'm using kube-prometheus-stack v12.8.0 (formerly known as the prometheus-operator helm chart).
To make prometheus-operator detect external serviceMonitors (like the one provided by Jenkins), you'll have to configure two things:
configure it to scan other namespaces:
serviceMonitorNamespaceSelector
matchLabels:
prometheus: please-scan-this-namespace-too
note: alternatively, you can leave it to {} so that all namespaces are scanned
configure it to also select the serviceMonitors detected in these other namespaces:
serviceMonitorSelector:
matchLabels:
release: prometheus-operator
note: even though the documentation states that if you leave serviceMonitorSelector to {}, it will select all serviceMonitors, it does not seem to work.
And finally, you'd still need to add these labels to 1) the namespaces and 2) serviceMonitors that you want prometheus to adopt.

GCP Kubernetes workload "Does not have minimum availability"

Background: I'm trying to set up a Bitcoin Core regtest pod on Google Cloud Platform. I borrowed some code from https://gist.github.com/zquestz/0007d1ede543478d44556280fdf238c9, editing it so that instead of using Bitcoin ABC (a different client implementation), it uses Bitcoin Core instead, and changed the RPC username and password to both be "test". I also added some command arguments for the docker-entrypoint.sh script to forward to bitcoind, the daemon for the nodes I am running. When attempting to deploy the following three YAML files, the dashboard in "workloads" shows bitcoin has not having minimum availability. Getting the pod to deploy correctly is important so I can send RPC commands to the Load Balancer. Attached below are my YAML files being used. I am not very familiar with Kubernetes, and I'm doing a research project on scalability which entails running RPC commands against this pod. Ask for relevant logs and I will provide them in seperate pastebins. Right now, I'm only running three machines on my cluster, as I'm am still setting this up. The zone is us-east1-d, machine type is n1-standard-2.
Question: Given these files below, what is causing GCP Kubernetes Engine to respond with "Does not have minimum availability", and how can this be fixed?
bitcoin-deployment.sh
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
namespace: default
labels:
service: bitcoin
name: bitcoin
spec:
strategy:
type: Recreate
replicas: 1
template:
metadata:
labels:
service: bitcoin
spec:
containers:
- env:
- name: BITCOIN_RPC_USER
valueFrom:
secretKeyRef:
name: test
key: test
- name: BITCOIN_RPC_PASSWORD
valueFrom:
secretKeyRef:
name: test
key: test
image: ruimarinho/bitcoin-core:0.17.0
name: bitcoin
ports:
- containerPort: 18443
protocol: TCP
volumeMounts:
- mountPath: /data
name: bitcoin-data
resources:
requests:
memory: "1.5Gi"
command: ["./entrypoint.sh"]
args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
restartPolicy: Always
volumes:
- name: bitcoin-data
gcePersistentDisk:
pdName: disk-bitcoincore-1
fsType: ext4
bitcoin-secrets.yml
apiVersion: v1
kind: Secret
metadata:
name: bitcoin
type: Opaque
data:
rpcuser: dGVzdAo=
rpcpass: dGVzdAo=
bitcoin-srv.yml
apiVersion: v1
kind: Service
metadata:
name: bitcoin
namespace: default
spec:
ports:
- port: 18443
targetPort: 18443
selector:
service: bitcoin
type: LoadBalancer
externalTrafficPolicy: Local
I have run into this issue several times. The solutions that I used:
Wait. Google Cloud does not have enough resource available in the Region/Zone that you are trying to launch into. In some cases this took an hour to an entire day.
Select a different Region/Zone.
An example was earlier this month. I could not launch new resources in us-west1-a. I think just switched to us-east4-c. Everything launched.
I really do not know why this happens under the covers with Google. I have personally experienced this problem three times in the last three months and I have seen this problem several times on StackOverflow. The real answer might be a simple is that Google Cloud is really started to grow faster than their infrastructure. This is a good thing for Google as I know that they are investing in major new reasources for the cloud. Personally, I really like working with their cloud.
There could be many reasons for this failure:
Insufficient resources
Liveliness probe failure
Readiness probe failure
I encountered this error within GKE.
The reason was the pod was not about to find the configmap due to name mismatch. So make sure all the resources are discoverable by the pod.
The error message you mentioned isn't directly pointing to a stockout; it's more of resources unavailable within the cluster. You can try again after adding another node to the cluster etc. Also, this troubleshooting guide suggests if your Nodes have enough resources but you still have Does not have minimum availability message, check if the Nodes have SchedulingDisabled or Cordoned status: in this case they don't accept new pods.
Please, check your logs https://console.cloud.google.com/logs you might be surprised that your app is been failing.
I faced with the same issue when my spring-boot application failed to start due to my spring-boot configuration mistake.
Also in the args you use:
args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
should it be "-rpcpassport" or "-rpcpassword" ?

Kubernetes scaling based on network utilization or requests per second

Is there any way to scale Kubernetes nodes based on network utilization and not based on memory or CPU?
Let's say for example you are sending thousands of requests to a couple of nodes behind a load balancer. The CPU is not struggling or the memory, but because there are thousands of requests per second you would need additional nodes to serve this. How can you do this in Google Cloud Kubernetes?
I have been researching around but I can't seem to find any references to this type of scaling, and I am guessing I am not the only one to come across this problem. So I am wondering if any of you knows of any best practice solutions.
I guess the ideal solution would be to have one pod per node receiving requests and creating more nodes based on more requests and scale up or down based on this.
This is possible and you have to use Prometheus Adaptor to configure custom rules to generate Custom Metrics.
This link has more details on how to setup prometheus, install adaptor and apply configuration with custom metrics..
I've implement this on my gke cluster using this custom metrics.
This the example of my HPA configuration :
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: hpa-name
namespace: your-namespace
annotations:
metric-config.external.prometheus-query.prometheus/interval: 30s
metric-config.external.prometheus-query.prometheus/prometheus-server: http://your-prometheus-server-ip
metric-config.external.prometheus-query.prometheus/istio-requests-total: |
sum(rate(istio_requests_total{reporter="destination", destination_workload="deployment-name", destination_service_namespace="your-namespace"}[2m]))
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: deployment-name
minReplicas: 1
maxReplicas: 10
metrics:
- type: External
external:
metric:
name: prometheus-query
selector:
matchLabels:
query-name: istio-requests-total
target:
type: AverageValue
averageValue: 7
I think HPA(Horizontal Pod Autoscaler) along with Cluster Autoscaler will do the magic.
Have a look at this - https://medium.com/google-cloud/kubernetes-autoscaling-with-istio-metrics-76442253a45a

Resources