Best practice for JSON logging in Kubernetes - docker

I'm developing an app whose logs contain custom fields for metric purposes.
Therefore, we produce the logs in JSON format and send them to an Elasticsearch cluster.
We're currently working on migrating the app from a local Docker node to our organization's Kubernetes cluster.
Our cluster uses Fluentd as a DaemonSet, to output logs from all pods to our Elasticsearch cluster.
The setup is similar to this: https://medium.com/kubernetes-tutorials/cluster-level-logging-in-kubernetes-with-fluentd-e59aa2b6093a
I'm trying to figure out what's the best practice to send logs from our app. My two requirements are:
That the logs are formatted correctly in JSON format. I don't want them to be nested in the msg field of the persisted document.
That I can run kubectl logs -f <pod> and view the logs in readable text format.
Currently, if I don't do anything and let the DaemonSet send the logs, it'll fail both requirements.
The best solution I thought about is to ask the administrators of our Kubernetes cluster to replace the Fluentd logging with Fluentbit.
Then I can configure my deployment like this:
apiVersion: apps/v1
kind: Deployment
metadata:
name: example-app
labels:
app: example-app
annotations:
fluentbit.io/parser-example-app: json
fluentbit.io/exclude-send-logs: "true"
spec:
selector:
matchLabels:
app: example-app
template:
metadata:
labels:
app: example-app
spec:
containers:
- name: example-app
image: myapp:1.0.0
volumeMounts:
- name: app-logs
mountPath: "/var/log/app"
- name: tail-logs
image: busybox
args: [/bin/sh, -c, 'tail -f /var/log/example-app.log']
volumeMounts:
- name: app-logs
mountPath: "/var/log/app"
volumes:
- name: app-logs
emptyDir: {}
Then the logs are sent to the Elasticsearch in correct JSON format, and I can run kubectl logs -f example-app -c tail-logs to view them in a readable format.
Is this the best practice though? Am I missing a simpler solution?
Is there an alternative supported by Fluentd?
I'll be glad to here your opinion :)

There isn't really a good option here that isn't going to chew massive amounts of CPU. The closest things I can suggest other than the solution you mentioned above is inverting it where the main output stream is unformatted and you run Fluent* (usually Bit) are a sidecar on a secondary file stream. That's no better though.
Really most of us just make the output be in JSON format and on the rare occasions we need to manually poke at logs outside of the normal UI (Kibana, Grafana, whatever), we just deal with the annoyance.
You could also theoretically make your "human" format sufficiently machine parsable to allow for querying. The usual choice there is "logfmt", aka key=value pairs. So my log lines on logfmt-y services look like timestamp=2021-05-15T03:48:05.171973Z level=info event="some message" otherkey=1 foo="bar baz". That's simple enough to read by hand but also can be parsed efficiently.

Related

How can I set JNDI configuration in a Docker overrides.yaml file?

If I have a java configuration bean, saying:
package com.mycompany.app.configuration;
// whatever imports
public class MyConfiguration {
private String someConfigurationValue = "defaultValue";
// getters and setters etc
}
If I set that using jetty for local testing I can do so using a config.xml file in the following form:
<myConfiguration class="com.mycompany.app.configuration.MyConfiguration" context="SomeContextAttribute">
<someConfigurationValue>http://localhost:8080</someConfigurationValue>
</myConfiguration>
However in the deployed environment in which I need to test, I will need to use docker to set these configuration values, we use jboss.
Is there a way to directly set these JNDI values? I've been looking for examples for quite a while but cannot find any. This would be in the context of a yaml file which is used to configure a k8 cluster. Apologies for the psuedocode, I would post the real code but it's all proprietary so I can't.
What I have so far for the overrides.yaml snippet is of the form:
env:
'MyConfig.SomeContextAttribute':
class_name: 'com.mycompany.app.configuration.MyConfiguration'
someConfigurationValue: 'http://localhost:8080'
However this is a complete guess.
You can achieve it by using ConfigMap.
A ConfigMap is an API object used to store non-confidential data in key-value pairs. Pods can consume ConfigMaps as environment variables, command-line arguments, or as configuration files in a volume.
First what you need to create ConfigMap from your file using command as below:
kubectl create configmap <map-name> <data-source>
Where <map-name> is the name you want to assign to the ConfigMap and <data-source> is the directory, file, or literal value to draw the data from. You can read more about it here.
Here is an example:
Download the sample file:
wget https://kubernetes.io/examples/configmap/game.properties
You can check what is inside this file using cat command:
cat game.properties
You will see that there are some variables in this file:
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30r
Create the ConfigMap from this file:
kubectl create configmap game-config --from-file=game.properties
You should see output that ConfigMap has been created:
configmap/game-config created
You can display details of the ConfigMap using command below:
kubectl describe configmaps game-config
You will see output as below:
Name: game-config
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
game.properties:
----
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
You can also see how yaml of this ConfigMap will look using:
kubectl get configmaps game-config -o yaml
The output will be similar:
apiVersion: v1
data:
game.properties: |-
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
kind: ConfigMap
metadata:
creationTimestamp: "2022-01-28T12:33:33Z"
name: game-config
namespace: default
resourceVersion: "2692045"
uid: 5eed4d9d-0d38-42af-bde2-5c7079a48518
Next goal is connecting ConfigMap to Pod. It could be added in yaml file of Podconfiguration.
As you can see under containersthere is envFrom section. As name is a name of ConfigMapwhich I created in previous step. You can read about envFrom here
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: test-container
image: nginx
envFrom:
- configMapRef:
name: game-config
Create a Pod from yaml file using:
kubectl apply -f <name-of-your-file>.yaml
Final step is checking environment variables in this Pod using below command:
kubectl exec -it test-pod -- env
As you can see below, there are environment variables from simple file which I downloaded in the first step:
game.properties=enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
The way to do this is as follows:
If you are attempting to set a value that looks like this in terms of fully qualified name:
com.mycompany.app.configuration.MyConfiguration#someConfigurationValue
Then that will look like the following in a yaml file:
com_mycompany_app_configuration_MyConfiguration_someConfigurationValue: 'blahValue'
It really is that simple. It does need to be set as an environment variable in the yaml, but I'm not sure whether it needs to be under env: or if that's specific to us.
I don't think there's a way of setting something in YAML that in XML would be an attribute, however. I've tried figuring that part out, but I haven't been able to.

What is the proper image url to be entered in the yaml file for KNATIVE deployment

I am trying to complete the KNative tutorial for deploying this tutorial : https://knative.dev/docs/serving/samples/hello-world/helloworld-ruby/
I have a url upon completion however, the page is not reachable. I am getting 404 not found.
When I run: kubectl get all , I get the following:
NAME CONFIG NAME K8S SERVICE NAME GENERATION READY REASON ACTUAL REPLICAS DESIRED REPLICAS
revision.serving.knative.dev/helloworld-go-00001 helloworld-go helloworld-go-00001 1 True 0 0
revision.serving.knative.dev/sample-app-00001 sample-app 1 False ContainerMissing
revision.serving.knative.dev/sample-app-00002 sample-app 2 False ContainerMissing
Which leaves me to believe that there is something wrong with the image url specified in my yaml file.
My yaml looks like this:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: sample-app
namespace: default
spec:
template:
spec:
containers:
- image: docker.io/adet4ever/sample-app
env:
- name: TARGET
value: "Ruby Sample v1"
I noticed that the only time a different app works is when I am ableto wget the image url like in the example here:https://docs.openshift.com/container-platform/4.1/serverless/getting-started-knative-services.html
I cannot wget this url and not sure why: docker.io/adet4ever/sample-app
I created a docker hub account and pushed the image. I dont know if I am missing anything else.
Thanks for helping as I have spent 2 days trying to fix this problem.

container labels in kubernetes

I am building my docker image with jenkins using:
docker build --build-arg VCS_REF=$GIT_COMMIT \
--build-arg BUILD_DATE=`date -u +"%Y-%m-%dT%H:%M:%SZ"` \
--build-arg BUILD_NUMBER=$BUILD_NUMBER -t $IMAGE_NAME\
I was using Docker but I am migrating to k8.
With docker I could access those labels via:
docker inspect --format "{{ index .Config.Labels \"$label\"}}" $container
How can I access those labels with Kubernetes ?
I am aware about adding those labels in .Metadata.labels of my yaml files but I don't like it that much because
- it links those information to the deployment and not the container itself
- can be modified anytime
...
kubectl describe pods
Thank you
Kubernetes doesn't expose that data. If it did, it would be part of the PodStatus API object (and its embedded ContainerStatus), which is one part of the Pod data that would get dumped out by kubectl get pod deployment-name-12345-abcde -o yaml.
You might consider encoding some of that data in the Docker image tag; for instance, if the CI system is building a tagged commit then use the source control tag name as the image tag, otherwise use a commit hash or sequence number. Another typical path is to use a deployment manager like Helm as the principal source of truth about deployments, and if you do that there can be a path from your CD system to Helm to Kubernetes that can pass along labels or annotations. You can also often set up software to know its own build date and source control commit ID at build time, and then expose that information via an informational-only API (like an HTTP GET /_version call or some such).
I'll add another option.
I would suggest reading about the Recommended Labels by K8S:
Key Description
app.kubernetes.io/name The name of the application
app.kubernetes.io/instance A unique name identifying the instance of an application
app.kubernetes.io/version The current version of the application (e.g., a semantic version, revision hash, etc.)
app.kubernetes.io/component The component within the architecture
app.kubernetes.io/part-of The name of a higher level application this one is part of
app.kubernetes.io/managed-by The tool being used to manage the operation of an application
So you can use the labels to describe a pod:
apiVersion: apps/v1
kind: Pod # Or via Deployment
metadata:
labels:
app.kubernetes.io/name: wordpress
app.kubernetes.io/instance: wordpress-abcxzy
app.kubernetes.io/version: "4.9.4"
app.kubernetes.io/managed-by: helm
app.kubernetes.io/component: server
app.kubernetes.io/part-of: wordpress
And use the downward api (which works in a similar way to reflection in programming languages).
There are two ways to expose Pod and Container fields to a running Container:
1 ) Environment variables.
2 ) Volume Files.
Below is an example for using volumes files:
apiVersion: v1
kind: Pod
metadata:
name: kubernetes-downwardapi-volume-example
labels:
version: 4.5.6
component: database
part-of: etl-engine
annotations:
build: two
builder: john-doe
spec:
containers:
- name: client-container
image: k8s.gcr.io/busybox
command: ["sh", "-c"]
args: # < ------ We're using the mounted volumes inside the container
- while true; do
if [[ -e /etc/podinfo/labels ]]; then
echo -en '\n\n'; cat /etc/podinfo/labels; fi;
if [[ -e /etc/podinfo/annotations ]]; then
echo -en '\n\n'; cat /etc/podinfo/annotations; fi;
sleep 5;
done;
volumeMounts:
- name: podinfo
mountPath: /etc/podinfo
volumes: # < -------- We're mounting in our example the pod's labels and annotations
- name: podinfo
downwardAPI:
items:
- path: "labels"
fieldRef:
fieldPath: metadata.labels
- path: "annotations"
fieldRef:
fieldPath: metadata.annotations
Notice that in the example we accessed the labels and annotations that were passed and mounted to the /etc/podinfo path.
Beside labels and annotations, the downward API exposed multiple additional options like:
The pod's IP address.
The pod's service account name.
The node's name and IP.
A Container's CPU limit , CPU request , memory limit, memory request.
See full list in here.
(*) A nice blog discussing the downward API.
(**) You can view all your pods labels with
$ kubectl get pods --show-labels
NAME ... LABELS
my-app-xxx-aaa pod-template-hash=...,run=my-app
my-app-xxx-bbb pod-template-hash=...,run=my-app
my-app-xxx-ccc pod-template-hash=...,run=my-app
fluentd-8ft5r app=fluentd,controller-revision-hash=...,pod-template-generation=2
fluentd-fl459 app=fluentd,controller-revision-hash=...,pod-template-generation=2
kibana-xyz-adty4f app=kibana,pod-template-hash=...
recurrent-tasks-executor-xaybyzr-13456 pod-template-hash=...,run=recurrent-tasks-executor
serviceproxy-1356yh6-2mkrw app=serviceproxy,pod-template-hash=...
Or viewing only specific label with $ kubectl get pods -L <label_name>.

GCP Kubernetes workload "Does not have minimum availability"

Background: I'm trying to set up a Bitcoin Core regtest pod on Google Cloud Platform. I borrowed some code from https://gist.github.com/zquestz/0007d1ede543478d44556280fdf238c9, editing it so that instead of using Bitcoin ABC (a different client implementation), it uses Bitcoin Core instead, and changed the RPC username and password to both be "test". I also added some command arguments for the docker-entrypoint.sh script to forward to bitcoind, the daemon for the nodes I am running. When attempting to deploy the following three YAML files, the dashboard in "workloads" shows bitcoin has not having minimum availability. Getting the pod to deploy correctly is important so I can send RPC commands to the Load Balancer. Attached below are my YAML files being used. I am not very familiar with Kubernetes, and I'm doing a research project on scalability which entails running RPC commands against this pod. Ask for relevant logs and I will provide them in seperate pastebins. Right now, I'm only running three machines on my cluster, as I'm am still setting this up. The zone is us-east1-d, machine type is n1-standard-2.
Question: Given these files below, what is causing GCP Kubernetes Engine to respond with "Does not have minimum availability", and how can this be fixed?
bitcoin-deployment.sh
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
namespace: default
labels:
service: bitcoin
name: bitcoin
spec:
strategy:
type: Recreate
replicas: 1
template:
metadata:
labels:
service: bitcoin
spec:
containers:
- env:
- name: BITCOIN_RPC_USER
valueFrom:
secretKeyRef:
name: test
key: test
- name: BITCOIN_RPC_PASSWORD
valueFrom:
secretKeyRef:
name: test
key: test
image: ruimarinho/bitcoin-core:0.17.0
name: bitcoin
ports:
- containerPort: 18443
protocol: TCP
volumeMounts:
- mountPath: /data
name: bitcoin-data
resources:
requests:
memory: "1.5Gi"
command: ["./entrypoint.sh"]
args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
restartPolicy: Always
volumes:
- name: bitcoin-data
gcePersistentDisk:
pdName: disk-bitcoincore-1
fsType: ext4
bitcoin-secrets.yml
apiVersion: v1
kind: Secret
metadata:
name: bitcoin
type: Opaque
data:
rpcuser: dGVzdAo=
rpcpass: dGVzdAo=
bitcoin-srv.yml
apiVersion: v1
kind: Service
metadata:
name: bitcoin
namespace: default
spec:
ports:
- port: 18443
targetPort: 18443
selector:
service: bitcoin
type: LoadBalancer
externalTrafficPolicy: Local
I have run into this issue several times. The solutions that I used:
Wait. Google Cloud does not have enough resource available in the Region/Zone that you are trying to launch into. In some cases this took an hour to an entire day.
Select a different Region/Zone.
An example was earlier this month. I could not launch new resources in us-west1-a. I think just switched to us-east4-c. Everything launched.
I really do not know why this happens under the covers with Google. I have personally experienced this problem three times in the last three months and I have seen this problem several times on StackOverflow. The real answer might be a simple is that Google Cloud is really started to grow faster than their infrastructure. This is a good thing for Google as I know that they are investing in major new reasources for the cloud. Personally, I really like working with their cloud.
There could be many reasons for this failure:
Insufficient resources
Liveliness probe failure
Readiness probe failure
I encountered this error within GKE.
The reason was the pod was not about to find the configmap due to name mismatch. So make sure all the resources are discoverable by the pod.
The error message you mentioned isn't directly pointing to a stockout; it's more of resources unavailable within the cluster. You can try again after adding another node to the cluster etc. Also, this troubleshooting guide suggests if your Nodes have enough resources but you still have Does not have minimum availability message, check if the Nodes have SchedulingDisabled or Cordoned status: in this case they don't accept new pods.
Please, check your logs https://console.cloud.google.com/logs you might be surprised that your app is been failing.
I faced with the same issue when my spring-boot application failed to start due to my spring-boot configuration mistake.
Also in the args you use:
args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
should it be "-rpcpassport" or "-rpcpassword" ?

distribute docker containers evenly with kubectl

If I create 3 nodes in a cluster, how do I distribute the docker containers evenly across the containers? For example, if I create a cluster of 3 nodes with 8 cpus on each node, I've determined through performance profiling that I get the best performance when I run one container per cpu.
gcloud container clusters create mycluster --num-nodes 3 --machine-type n1-standard-8
kubectl run myapp --image=gcr.io/myproject/myapp -r 24
When I ran kubectl above, it put 11 containers on the first node, 10 on the second, and 3 on the third. How to I make it so that it is 8 each?
Both your and jpapejr's solutions seem like they'd work, but using a nodeSelector to force scheduling to a single node has the downside of requiring multiple RCs for a single application and making that application less resilient to a node failure. The idea of a custom scheduler is nice but has the downside of the amount of work to write and maintain that code.
I think another possible solution would be to set runtime constraints in your pod spec that might get you near to what you want. Based on this newly merged doc with examples of runtime contraints, I think you could set resources.requests.cpu in the pod spec part of the RC and get close to a CPU-per-pod:
apiVersion: v1
kind: Pod
metadata:
name: myapp
spec:
containers:
- name: myapp
image: myregistry/myapp:v1
resources:
requests:
cpu: "1000m"
That docs has other good examples of how requests and limits differ and interact. There may be a combination that gives you what you want and also keeps your application at proper capacity when an individual node fails.
If I'm not mistaken, what you see is the expectation. If you want finer grained control over pod placement you probably want a customer scheduler.
In my case, I want to put a fixed number of containers in each node. I am able to do this by labeling each node and then using a nodeSelector with a config. Ignore that fact that I mislabeled the 3rd node, here is my setup:
kubectl label nodes gke-n3c8-7d9f8163-node-dol5 node=1
kubectl label nodes gke-n3c8-7d9f8163-node-hmbh node=2
kubectl label nodes gke-n3c8-7d9f8163-node-kdc4 node=3
That can be automated doing:
kubectl get nodes --no-headers | awk '{print NR " " $1}' | xargs -l bash -c 'kubectl label nodes $1 node=$0'
apiVersion: v1
kind: ReplicationController
metadata:
name: nginx
spec:
replicas: 8
selector:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
nodeSelector:
node: "1"
containers:
- name: nginx
image: nginx

Resources