New Relic Infrastructure agent on an AKS node pool (azure-aks)

I would like to install and configure the New Relic Infrastructure agent on an AKS node pool. I have explored several links but couldn't find anything helpful. Could anyone help me get started with this?

There is an official Helm chart that deploys the New Relic Infrastructure agent as a DaemonSet.
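If the Helm route suits you, a minimal sketch of the install looks like the following. The repo URL, chart name and value keys are taken from New Relic's public helm-charts repository; verify them against the current New Relic docs, and substitute your own license key and cluster name for the placeholders.
# Hedged sketch: install the New Relic Infrastructure agent as a DaemonSet via Helm.
# <YOUR_LICENSE_KEY> and <YOUR_CLUSTER_NAME> are placeholders you must supply.
helm repo add newrelic https://helm-charts.newrelic.com
helm repo update
helm install newrelic-infrastructure newrelic/newrelic-infrastructure \
  --namespace newrelic --create-namespace \
  --set licenseKey=<YOUR_LICENSE_KEY> \
  --set cluster=<YOUR_CLUSTER_NAME>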

I got my solution working via a DaemonSet and ConfigMap with a node installer. The links below really helped me, though not through Terraform, as AKS won't run a custom script on the node pool via Terraform (see: Can I have a custom script executed in an AKS node group?).
Reference links: https://medium.com/@patnaikshekhar/initialize-your-aks-nodes-with-daemonsets-679fa81fd20e
https://github.com/patnaikshekhar/AKSNodeInstaller
daemonset.yml
apiVersion: v1
kind: Namespace
metadata:
  name: node-installer
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: installer
  namespace: node-installer
spec:
  selector:
    matchLabels:
      job: installer
  template:
    metadata:
      labels:
        job: installer
    spec:
      hostPID: true
      restartPolicy: Always
      containers:
        - image: patnaikshekhar/node-installer:1.3
          name: installer
          securityContext:
            privileged: true
          volumeMounts:
            - name: install-script
              mountPath: /tmp
            - name: host-mount
              mountPath: /host
      volumes:
        - name: install-script
          configMap:
            name: sample-installer-config
        - name: host-mount
          hostPath:
            path: /tmp/install
sampleconfigmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-installer-config
  namespace: node-installer
data:
  install.sh: |
    #!/bin/bash
    # install newrelic-infra
    echo "license_key: #{NEW_RELIC_LICENSE_KEY}#" | sudo tee -a /etc/newrelic-infra.yml
    echo "enabled: #{NEW_RELIC_INFRA_AGENT_ENABLED}#" | sudo tee -a /etc/newrelic-infra.yml
    curl -s https://download.newrelic.com/infrastructure_agent/gpg/newrelic-infra.gpg | sudo apt-key add -
    printf "deb https://download.newrelic.com/infrastructure_agent/linux/apt bionic main" | sudo tee -a /etc/apt/sources.list.d/newrelic-infra.list
    sudo apt-get update -y
    sudo apt-get install newrelic-infra -y
    sudo systemctl status newrelic-infra
    echo "Newrelic infra agent installation is done"
    # enable log forwarding
    echo "logs:" | sudo tee -a /etc/newrelic-infra/logging.d/logs.yml
    echo "  - name: log-files-in-folder" | sudo tee -a /etc/newrelic-infra/logging.d/logs.yml
    echo "    file: /var/log/onefc/*/*.newrelic.log" | sudo tee -a /etc/newrelic-infra/logging.d/logs.yml
    echo "    max_line_kb: 256" | sudo tee -a /etc/newrelic-infra/logging.d/logs.yml
    # trigger log forwarding
    sudo newrelic-infra-ctl
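To roll this out, apply the two manifests and then watch the installer pods; a quick sketch, with names matching the manifests above:
kubectl apply -f sampleconfigmap.yml
kubectl apply -f daemonset.yml
kubectl -n node-installer get pods -o wide            # one installer pod per node in the pool
kubectl -n node-installer logs -l job=installer -f    # watch the install.sh output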

Related

GitHub Action creating Docker Host Context

Here is my attempt at creating a Docker Host Context via GitHub Actions:
name: CICD
on:
  push:
    branches:
      - main
      - staging
  workflow_dispatch:
jobs:
  build_and_deploy_monitoring:
    concurrency: monitoring
    runs-on: [self-hosted, linux, X64]
    steps:
      - uses: actions/checkout@v2
      - name: Save secrets to mon.env files
        run: |
          echo "DATA_SOURCE_NAME=${{ secrets.DB_DATASOURCE }}" >> mon.env
          echo "GF_SECURITY_ADMIN_USER=${{ secrets.GF_ADMIN_USER }}" >> mon.env
          echo "GF_SECURITY_ADMIN_PASSWORD=${{ secrets.GF_ADMIN_PASS }}" >> mon.env
          echo "DISCORD_TOKEN=${{ secrets.DISCORD_TOKEN }}" >> mon.env
          echo "PROMCORD_PREFIX=promcord_" >> mon.env
          echo "DB_CONNECTION_STRING=${{ secrets.DBC_STRING }}" >> mon.env
      # - name: Setup SSH stuff
      #   run: |
      #     sudo mkdir -p ~/.ssh/
      #     sudo echo "${{ secrets.SSH_KEY }}" >> ~/.ssh/tempest
      #     sudo chmod 0400 ~/.ssh/tempest
      #     sudo echo "${{ secrets.KNOWN_HOSTS }}" >> ~/.ssh/known_hosts
      #     sudo echo -e "Host ${{ secrets.SSH_HOST }}\n\tHostName ${{ secrets.SSH_HOST }}\n\tUser ${{ secrets.SSH_USER }}\n\tIdentityFile ~/.ssh/tempest" >> ~/.ssh/config
      - name: Install docker-compose
        run: sudo pip install docker-compose
      - name: Create context for docker host
        run: docker context create remote --docker
      - name: Set default context for docker
        run: docker context use remote
      - name: Always build the monitoring stack
        run: COMPOSE_PARAMIKO_SSH=1 COMPOSE_IGNORE_ORPHANS=1 docker-compose --context remote -f docker-compose-monitoring.yml up --build -d
The output is:
Run docker context create remote --docker
docker context create remote --docker
shell: /usr/bin/bash -e {0}
/actions-runner/actions-runner/_work/_temp/05fc146a-237e-4a92-b27d-796451184c0c.sh: line 1: docker: command not found
Error: Process completed with exit code 127.
I am trying to create a workflow that brings up a Docker Compose stack for some monitoring tools. I have set up self-hosted GitHub runners to do this, and everything has been successful until the Docker host context step. The error is given above. Can I get some help, as I am completely stumped?
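The exit code 127 ("docker: command not found") means the Docker CLI is simply not installed on the self-hosted runner host, so one likely fix is to install it there; a hedged sketch, assuming a Debian/Ubuntu runner. Note also that docker context create ... --docker normally needs an endpoint such as host=ssh://user@host, so the context step will need that once the CLI is present.
# On the self-hosted runner host (assumes Debian/Ubuntu):
curl -fsSL https://get.docker.com | sudo sh      # installs the Docker Engine and CLI
sudo usermod -aG docker "$(whoami)"              # let the runner user reach the daemon
docker version                                   # should now print client/server versions

# The context step also needs an endpoint; the host value below is illustrative:
docker context create remote --docker "host=ssh://user@remote-host"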

Run shell script or custom data on AKS node pool via terraform

I would like to run a shell script or custom data on an AKS node pool via a Terraform script. I have run a shell script via custom data on a VMSS (virtual machine scale set) through Terraform, and I would like to run the same shell script on an AKS node pool. I have searched many links and approaches but couldn't find a solution for this. Is there any recommended way to do it? I have been trying to find a proper solution for a month. Appreciate your help.
I got my solution via a DaemonSet and ConfigMap with a node installer, exactly as described in the answer above; see the daemonset.yml and sampleconfigmap.yml manifests there. It can't be done through Terraform, as AKS won't accept a custom script for the node pool (Can I have a custom script executed in an AKS node group?).
Reference links: https://medium.com/@patnaikshekhar/initialize-your-aks-nodes-with-daemonsets-679fa81fd20e
https://github.com/patnaikshekhar/AKSNodeInstaller

Restarting Docker daemon on host node from within Kubernetes pod

Goal: Restart Docker daemon on GKE
Issue: Cannot connect to bus
Background
While on Google Kubernetes Engine (GKE), I am attempting to restart the host node's Docker daemon in order to enable Nvidia GPU telemetry for Kubernetes on nodes that have a GPU. I have correctly isolated just the GPU nodes, and I am able to run every command on the host node by having a DaemonSet run an initContainer, following the "Automatically bootstrapping GKE nodes with DaemonSets" guide.
During runtime, however, the following pod does not allow me to connect to the Docker daemon:
apiVersion: v1
kind: Pod
metadata:
  name: debug
  namespace: gpu-monitoring
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cloud.google.com/gke-accelerator
                operator: Exists
  containers:
    - command:
        - sleep
        - "86400"
      env:
        - name: ROOT_MOUNT_DIR
          value: /root
      image: docker.io/ubuntu:18.04
      imagePullPolicy: IfNotPresent
      name: node-initializer
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /root
          name: root
        - mountPath: /scripts
          name: entrypoint
        - mountPath: /run
          name: run
  volumes:
    - hostPath:
        path: /
        type: ""
      name: root
    - configMap:
        defaultMode: 484
        name: nvidia-container-toolkit-installer-entrypoint
      name: entrypoint
    - hostPath:
        path: /run
        type: ""
      name: run
The container runs as user 0, while the users present in /run/user are 1003 and 1002.
In order to verify connectivity and interactions with the root Kubernetes (k8s) node, the following is run:
root@debug:/# chroot "${ROOT_MOUNT_DIR}" ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 226124 9816 ? Ss Oct13 0:27 /sbin/init
The Issues
Both images
When attempting to interact with the underlying Kubernetes (k8s) node to restart the Docker daemon, I get the following:
root@debug:/# ls /run/dbus
system_bus_socket
root@debug:/# ROOT_MOUNT_DIR="${ROOT_MOUNT_DIR:-/root}"
root@debug:/# chroot "${ROOT_MOUNT_DIR}" systemctl status docker
Failed to connect to bus: No data available
When attempting to start dbus on the host node:
root@debug:/# export XDG_RUNTIME_DIR=/run/user/`id -u`
root@debug:/# export DBUS_SESSION_BUS_ADDRESS="unix:path=${XDG_RUNTIME_DIR}/bus"
root@debug:/# chroot "${ROOT_MOUNT_DIR}" /etc/init.d/dbus start
Failed to connect to bus: No data available
Image: solita/ubuntu-systemd
When trying to run commands using the same k8s pod config, except inside the solita/ubuntu-systemd image, the following are the results:
root@debug:/# /etc/init.d/dbus start
[....] Starting dbus (via systemctl): dbus.serviceRunning in chroot, ignoring request: start
. ok
Configuration Variations Attempted
I have tried to change the following, in pretty much every combination, to no avail:
Image to docker.io/solita/ubuntu-systemd:18.04
Add shareProcessNamespace: true
Add the following mounts: /dev, /proc, /sys
Restrict /run to /run/dbus, and /run/systemd
So the answer is a weird workaround that was not fully expected. In order to restart the Docker daemon, first punch a firewall hole so that pods can connect to the host node. Next, install the Cloud SDK in the pod, use gcloud compute ssh to SSH into the node, and restart Docker via a remote SSH command:
apt-get update
apt-get install -y \
apt-transport-https \
curl \
gnupg \
lsb-release \
ssh
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
apt-get update
apt-get install -y google-cloud-sdk
CLUSTER_NAME="$(curl -sS http://metadata/computeMetadata/v1/instance/attributes/cluster-name -H "Metadata-Flavor: Google")"
NODE_NAME="$(curl -sS http://metadata.google.internal/computeMetadata/v1/instance/name -H 'Metadata-Flavor: Google')"
FULL_ZONE="$(curl -sS http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google' | awk -F "/" '{print $4}')"
MAIN_ZONE=$(echo $FULL_ZONE | sed 's/\(.*\)-.*/\1/')
gcloud compute ssh \
--internal-ip $NODE_NAME \
--zone=$FULL_ZONE \
-- "sudo systemctl restart docker"

Kubernetes - statefulSet and volume permissions

I am trying to create a StatefulSet like the one below, where I run an init container to apply permissions to the data volume before using it in the main container, but I get a permissions error as shown below.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
spec:
  serviceName: postgresql-headless
  replicas: 1
  selector:
    matchLabels:
      app: awx
  template:
    metadata:
      name: postgresql
      labels:
        app: awx
    spec:
      securityContext:
        fsGroup: 1001
      serviceAccountName: awx
      initContainers:
        - name: init-chmod-data
          image: docker.local/data/awx/bitnami/minideb/minideb:1.0
          command:
            - /bin/sh
            - -cx
            - |
              echo "current user id: `id`"
              mkdir -p /bitnami/postgresql/data
              chmod 700 /bitnami/postgresql/data
              find /bitnami/postgresql/data -mindepth 1 -maxdepth 1 -not -name ".snapshot" -not -name "lost+found" | \
                xargs chown -R 1001:1001
          securityContext:
            runAsUser: 1001
          volumeMounts:
            - name: data
              mountPath: /bitnami/postgresql/data
              subPath: ""
      containers:
        - name: postgresql
          image: docker.local/bitnami/postgresql:11.6.0-debian-10-r5
          securityContext:
            runAsUser: 1001
          env:
            - name: POSTGRESQL_PASSWORD
              value: "p@ssw0rd"
          volumeMounts:
            - name: data
              mountPath: /bitnami/postgresql/data
              subPath: ""
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: cinder
When I run this spec, it fails in the init container:
kubectl -n mynamespace logs postgresql-0 -c init-chmod-data
+ id
current user id: uid=1001(postgresql) gid=1001(postgresql) groups=1001(postgresql)
+ echo current user id: uid=1001(postgresql) gid=1001(postgresql) groups=1001(postgresql)
+ mkdir -p /bitnami/postgresql/data
+ chmod 700 /bitnami/postgresql/data
chmod: changing permissions of '/bitnami/postgresql/data': Operation not permitted
However, when I run the image used in the init container locally in Docker, I am able to change these permissions:
sudo docker image ls | grep 1.0 | grep minideb
docker.local/data/awx/bitnami/minideb/minideb 1.0 698636b178a6 2 hours ago 53.7MB
sudo docker run -it --name minideb 698636b178a6
postgresql@248dcad0e738:/$ mkdir -p /bitnami/postgresql/data
postgresql@248dcad0e738:/$ chmod 700 /bitnami/postgresql/data
postgresql@248dcad0e738:/$
The minideb image has been modified as shown below because I can't run containers as root:
FROM docker.local/bitnami/minideb:stretch
USER 0
RUN groupadd --gid 1001 postgresql && useradd --uid 1001 --gid 1001 postgresql
RUN mkdir -p /bitnami/postgresql ; chown -R 1001:1001 /bitnami/postgresql
USER 1001
Any idea what I am doing wrong?
Thank you!
Fixed after removing the stale PVC.
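For anyone hitting the same thing: volumeClaimTemplates keep their PVCs when a StatefulSet is recreated, so an old claim with the old ownership gets reattached. A hedged sketch of the cleanup, assuming the namespace and names from the spec above (PVC names follow the <template>-<statefulset>-<ordinal> convention):
kubectl -n mynamespace get pvc                       # look for the stale claim
kubectl -n mynamespace delete pvc data-postgresql-0  # remove it (the data on it is lost!)
kubectl -n mynamespace delete pod postgresql-0       # pod is recreated with a fresh volume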

Jenkins pipeline exception - Docker not found

I have created a GKE cluster and installed Jenkins on it. Now I am running a pipeline; I have a Jenkinsfile that builds a Docker image, but when I run the pipeline it throws an exception that Docker was not found.
1) Created GKE Cluster
2) Installed Jenkins
3) Added Docker hub credentials
4) Added access key for gitlab
Jenkinsfile:
stage('Build Docker Image') {
  when {
    branch 'master'
  }
  steps {
    script {
      echo 'Before docker run'
      sh 'docker --version'
      app = docker.build("sarab321/test-pipeline")
      echo 'docker run successfully'
    }
  }
}
Please see the exception below
apiVersion: "v1"
kind: "Pod"
metadata:
annotations: {}
labels:
jenkins: "slave"
jenkins/cd-jenkins-slave: "true"
name: "default-d7qdb"
spec:
containers:
- args:
- "59c323186a77b4be015362977ec64e4838001b6d77c0f372bec7cda7cf93f9b2"
- "default-d7qdb"
env:
- name: "JENKINS_SECRET"
value: "59c323186a77b4be015362977ec64e4838001b6d77c0f372bec7cda7cf93f9b2"
- name: "JENKINS_TUNNEL"
value: "cd-jenkins-agent:50000"
- name: "JENKINS_AGENT_NAME"
value: "default-d7qdb"
- name: "JENKINS_NAME"
value: "default-d7qdb"
- name: "JENKINS_URL"
value: "http://cd-jenkins.default.svc.cluster.local:8080"
image: "jenkins/jnlp-slave:3.27-1"
imagePullPolicy: "IfNotPresent"
name: "jnlp"
resources:
limits:
memory: "512Mi"
cpu: "1"
requests:
memory: "256Mi"
cpu: "500m"
securityContext:
privileged: false
tty: false
volumeMounts:
- mountPath: "/var/run/docker.sock"
name: "volume-0"
readOnly: false
- mountPath: "/home/jenkins"
name: "workspace-volume"
readOnly: false
workingDir: "/home/jenkins"
nodeSelector: {}
restartPolicy: "Never"
serviceAccount: "default"
volumes:
- hostPath:
path: "/var/run/docker.sock"
name: "volume-0"
- emptyDir:
medium: ""
name: "workspace-volume"
docker version
/home/jenkins/workspace/TestPipeline_master@tmp/durable-5dd73d2b/script.sh: 1: /home/jenkins/workspace/TestPipeline_master@tmp/durable-5dd73d2b/script.sh: docker: not found
It doesn't look like Docker is installed on your build agent, which is running inside a container using the "jenkins/jnlp-slave:3.27-1" image. I have examples of how I've installed the Docker CLI in the Jenkins LTS image at: https://github.com/sudo-bmitch/jenkins-docker
That image includes the following steps to make the docker integration portable:
installs the Docker CLI
installs gosu (needed since the entrypoint will start as root)
configures the jenkins user to be a member of the docker group
includes an entrypoint that adjusts the docker GID inside the container to match the GID of the mounted /var/run/docker.sock
The actual docker CLI install is performed in the following lines:
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add - \
&& add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/debian \
$(lsb_release -cs) \
stable" \
&& apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
docker-ce-cli${DOCKER_CLI_VERSION}
You can take the entrypoint.sh and Dockerfile, change the base image (FROM) in the Dockerfile and the original entrypoint script called from within entrypoint.sh, and point them at the jnlp-slave equivalents.
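As a quick sanity check after rebuilding, something like the following confirms the Docker CLI is present in the customized agent image before pointing the Jenkins pod template at it (the image tag is illustrative, not a real published image):
docker build -t my-registry/jnlp-docker-agent:latest .
docker run --rm --entrypoint docker \
  -v /var/run/docker.sock:/var/run/docker.sock \
  my-registry/jnlp-docker-agent:latest version   # runs `docker version` inside the image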
