Kubernetes Deployment dynamic port forwarding - docker

I am moving a Docker image from Docker to a K8s Deployment. I have auto-scale rules enabled, so it starts with 5 replicas but can go to 12. The image starts perfectly on K8s, with a Service in front to cluster the Deployment.
Now each container has its own JVM, with a Prometheus app exposing its stats. In Docker this is no problem, because the port that serves the Prometheus info is created dynamically starting at port 8000; the docker-compose.yml grows the port by 1 based on how many instances are started.
The problem is that I can't find a way to do this in a K8s [deployment].yml file. Because Deployment pods are dynamic, I would have thought there would be some way to set a starting HOST port to be incremented based on how many containers are started.
Maybe I am looking at this the wrong way, so any clarification would be helpful; meanwhile I will keep searching Google for any info on such a thing.

Well, after reading and reading and reading so much, I came to the conclusion that K8s is not responsible for opening ports for a Docker image or providing ingress to your app on some arbitrary port; that is not its job. A K8s Deployment just deploys the Pods you requested. You can set the ports option under DEPLOYMENT -> SPEC -> CONTAINERS -> PORTS, which, just like in Docker, is only informational. But it allows you to query (JSONPath or jq) for all Pods (containers) with a Prometheus port available. That lets you rebuild the "targets" value in the prometheus.yml file, and having those targets makes them available to Grafana for a dashboard.
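A minimal sketch of that lookup, using a captured kubectl JSON snippet in place of the live API call (the 8055 port and the pod IPs are just example values; real `kubectl get pods -o json` output is pretty-printed, which is why the full script below uses jq instead of this crude text match):

```shell
#!/usr/bin/env bash
# Find pod IPs whose containers declare the Prometheus containerPort.
# `sample` stands in for: kubectl get pods --all-namespaces -o json
sample='{"items":[
  {"spec":{"containers":[{"ports":[{"containerPort":8055}]}]},"status":{"podIP":"10.32.0.7"}},
  {"spec":{"containers":[{"ports":[{"containerPort":3000}]}]},"status":{"podIP":"10.32.0.9"}}
]}'
# Keep only entries that mention the Prometheus port, then pull out the podIP.
printf '%s\n' "$sample" \
  | grep '"containerPort":8055' \
  | sed -En 's/.*"podIP":"([0-9.]+)".*/\1/p'   # -> 10.32.0.7
```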
That's it, pretty easy. I was overcomplicating something because I did not understand it. I am including a script I QUICKLY wrote to get something going. USE AT YOUR OWN RISK.
By the way, I use Pod and Container interchangeably.
#!/usr/bin/env bash
#set -x
_MyappPrometheusPort=8055
_finalIpsPortArray=()
_prometheusyamlFile=prometheus.yml
cd /docker/images/prometheus
#######################################################################################################################################################
#One container on the K8s System is weave and it holds the subnet we need to validate against.
#weave-net-lwzrk 2/2 Running 8 (7d3h ago) 9d 192.168.2.16 accl-ffm-srv-006 <none> <none>
_weavenet=$(kubectl get pod -n kube-system -o wide | grep weave | cut -d ' ' -f1 )
echo "_weavenet: $_weavenet"
#The default subnet is the one that lets us know the container is part of the Kubernetes network.
# Range: 10.32.0.0/12
# DefaultSubnet: 10.32.0.0/12
_subnet=$( kubectl exec -n kube-system $_weavenet -c weave -- /home/weave/weave --local status | sed -En "s/^(.*)(DefaultSubnet:\s)(.*)?/\3/p" )
echo "_subnet: $_subnet"
_cidr2=$( echo "$_subnet" | cut -d '/' -f2 )
echo "_cidr2: /$_cidr2"
#######################################################################################################################################################
#This is an array of the currently monitored containers that prometheus was started with.
#We will remove any containers from the array that fit the K8s Weave Net subnet with the myapp prometheus port.
_targetLineFound_array=($( egrep '^\s{1,20}-\s{0,5}targets\s{0,5}:\s{0,5}\[.*\]' $_prometheusyamlFile | sed -En "s/(.*-\stargets:\s\[)(.*)(\]).*/\2/p" | tr "," "\n"))
for index in "${_targetLineFound_array[@]}"
do
_ip="${index//\'/$''}"
_ipTocheck=$( echo $_ip | cut -d ':' -f1 )
_portTocheck=$( echo $_ip | cut -d ':' -f2 )
#We need to check if the IP is within the subnet mask attained from K8s.
#The port must also be the prometheus port in case some other port is used also for Prometheus.
#This means the IP should be removed since we will put the list of IPs from
#K8s currently in production by Deployment/AutoScale rules.
#Network: 10.32.0.0/12
_isIpWithinSubnet=$( ipcalc $_ipTocheck/$_cidr2 | sed -En "s/^(.*)(Network:\s+)([0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?)(\/[0-9]{1}[0-9]{1}.*)?/\3/p" )
if [[ "$_isIpWithinSubnet/$_cidr2" == "$_subnet" && "$_portTocheck" == "$_MyappPrometheusPort" ]]; then
echo "IP managed by K8s will be deleted: _isIpWithinSubnet: ($_ip) $_isIpWithinSubnet"
else
_finalIpsPortArray+=("$_ip")
fi
done
#######################################################################################################################################################
#This is an array of the currently running myapp containers with an available prometheus port.
#From this list we will add them to the prometheus file to be available for Grafana monitoring.
readarray -t _currentK8sIpsArr < <( kubectl get pods --all-namespaces --chunk-size=0 -o json | jq '.items[] | select(.spec.containers[].ports != null) | select(.spec.containers[].ports[].containerPort == '$_MyappPrometheusPort' ) | .status.podIP' )
for index in "${!_currentK8sIpsArr[@]}"
do
_addIPToMonitoring=${_currentK8sIpsArr[index]//\"/$''}
echo "IP Managed by K8s as myapp app with prometheus currently running will be added to monitoring: $_addIPToMonitoring"
_finalIpsPortArray+=("$_addIPToMonitoring:$_MyappPrometheusPort")
done
######################################################################################################################################################
#we need to recreate this string and sed it into the file
#- targets: ['192.168.2.13:3201', '192.168.2.13:3202', '10.32.0.7:8055', '10.32.0.8:8055']
_finalPrometheusTargetString="- targets: ["
i=0
# Iterate the loop to read and print each array element
for index in "${!_finalIpsPortArray[@]}"
do
((i=i+1))
_finalPrometheusTargetString="$_finalPrometheusTargetString '${_finalIpsPortArray[index]}'"
if [[ $i != ${#_finalIpsPortArray[@]} ]]; then
_finalPrometheusTargetString="$_finalPrometheusTargetString,"
fi
done
_finalPrometheusTargetString="$_finalPrometheusTargetString]"
echo "$_finalPrometheusTargetString"
sed -i -E "s/(.*)-\stargets:\s\[.*\]/\1$_finalPrometheusTargetString/" ./$_prometheusyamlFile
docker-compose down
sleep 4
docker-compose up -d
echo "All changes were made. Exiting"
exit 0

Ideally, you should be using the average JVM usage across all the replicas. There is no point in creating a different deployment with a different port if you are running the same single Docker image across all the replicas.
I think keeping a single Deployment, with resource requirements set on it, would be the best practice.
You can get the JVM average of all the running replicas:
sum(jvm_memory_max_bytes{area="heap", app="app-name",job="my-job"}) / sum(kube_pod_status_phase{phase="Running"})
Since you are running the same Docker image across all replicas, and the K8s Service manages the load balancing by default, average utilization is a reasonable option to monitor.
Still, if you want to filter and get per-replica values, you can create different Deployments (not at all a good way) or use StatefulSets.
You can also filter the data by hostname (Pod name) in Prometheus, so you get each replica's usage.
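For that per-replica view, a sketch of the query (the metric and label names here are assumptions; they depend on your exporter and relabeling setup):

```
# Heap usage per replica. `pod` may be `kubernetes_pod_name` in your setup,
# and app="app-name" is a placeholder label.
sum by (pod) (jvm_memory_used_bytes{area="heap", app="app-name"})
```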

Related

Pass docker host ip as env var into devcontainer

I am trying to pass an environment variable into my devcontainer that is the output of a command run on my dev machine. I have tried the following in my devcontainer.json with no luck:
"initializeCommand": "export DOCKER_HOST_IP=\"$(ifconfig | grep -E '([0-9]{1,3}.){3}[0-9]{1,3}' | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d: | head -n1)\"",
"containerEnv": {
"DOCKER_HOST_IP1": "${localEnv:DOCKER_HOST_IP}",
"DOCKER_HOST_IP2": "${containerEnv:DOCKER_HOST_IP}"
},
and
"runArgs": [
"-e DOCKER_HOST_IP=\"$(ifconfig | grep -E '([0-9]{1,3}.){3}[0-9]{1,3}' | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d: | head -n1)\""
],
(the point of the ifconfig/grep piped command is to provide me with the IP of my docker host which is running via Docker for Desktop (Mac))
Some more context
Within my devcontainer I am running some kubectl deployments (to a cluster running on Docker for Desktop) where I would like to configure a hostAlias for a pod (docs) such that that pod will direct requests to https://api.cancourier.local to the ip of the docker host (which would then hit an ingress I have configured for that CNAME).
I could just pass in the output of the ifconfig command to my kubectl command when running from within the devcontainer. The problem is that I get two different results from this depending on whether I am running it on my host (10.0.0.89) or from within the devcontainer (10.1.0.1). 10.0.0.89 in this case is the "correct" IP as if I curl this from within my devcontainer, or my deployed pod, I get the response I'd expect from my ingress.
I'm also aware that I could just use the name of my k8s service (in this case api) to communicate between pods, but this isn't ideal. As for why, I'm running a Next.js application in a pod. The Next.js app on this pod has two "contexts":
my browser - the app serves up static HTML/JS to my browser where communicating with https://api.cancourier.local works fine
on the pod itself - running some things (i.e. _middleware) on the pod itself, where the pod does not currently know what https://api.cancourier.local is
What I was doing to temporarily get around this was to have a separate config on the pod, one for the "browser context" and the other for things running on the pod itself. This is less than ideal as when I go to deploy this Next.js app (to Vercel) it won't be an issue (as my API will be deployed on some publicly accessible CNAME). If I can accomplish what I was trying to do above, I'd be able to avoid this.
So I didn't end up finding a way to pass the output of a command run on the host machine as an env var into my devcontainer. However I did find a way to get the "correct" docker host IP and pass this along to my pod.
In my devcontainer.json I have this:
"runArgs": [
// https://stackoverflow.com/a/43541732/3902555
"--add-host=api.cancourier.local:host-gateway",
"--add-host=cancourier.local:host-gateway"
],
which augments the devcontainer's /etc/hosts with:
192.168.65.2 api.cancourier.local
192.168.65.2 cancourier.local
then in my Makefile where I store my kubectl commands I am simply running:
deploy-the-things:
DOCKER_HOST_IP = $(shell cat /etc/hosts | grep 'api.cancourier.local' | awk '{print $$1}')
helm upgrade $(helm_release_name) $(charts_location) \
--install \
--namespace=$(local_namespace) \
--create-namespace \
-f $(charts_location)/values.yaml \
-f $(charts_location)/local.yaml \
--set cwd=$(HOST_PROJECT_PATH) \
--set dockerHostIp=$(DOCKER_HOST_IP) \
--debug \
--wait
then within my helm chart I can use the following for the pod running my Next.js app:
hostAliases:
- ip: {{ .Values.dockerHostIp }}
hostnames:
- "api.cancourier.local"
I highly recommend following this tutorial: Container environment variables
In this tutorial, 2 methods are mentioned:
Adding individual variables
Using an env file
Choose whichever is more comfortable for you. Good luck!

Erlang: Why does inet:gethostbyname deliver 2 addresses whereas inet:getaddr delivers just one

The problem
I used inet:gethostbyname(Hoststr) in a Docker environment with a couple of containers for over a year without problems. Due to non-recoverable read errors on the SSD -- which, according to the provider, is perfectly fine -- I was forced to reinstall from scratch with a CentOS image.
After restore I get a crash which boils down to
3> inet:gethostbyname("www").
{ok,{hostent,"www",
["www"],
inet,4,
[{10,0,1,205},{10,0,1,180}]}}
obviously because I get 2 IPs.
getaddr works fine:
4> inet:getaddr("www", inet).
{ok,{10,0,1,205}}
Question
I can replace gethostbyname with getaddr, no problem, but I would like to know why I get 2 IPs in the first place and how this misbehavior could creep in.
PHP does just fine:
id=$(docker ps -a | grep "vx_www.1\." | grep -v "xited" | awk '{print $1}') && docker exec -it $id ash
php > echo gethostbyname('www');
10.0.1.205
Is it a docker problem?
The problem may lie on the docker side, as both addresses ping.
It gets even more interesting (from the host):
/ # ip a | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b"
127.0.0.1
10.0.1.205
10.0.1.255
172.18.0.13
172.18.255.255
10.0.8.33
10.0.8.255
10.0.0.162
10.0.0.255
10.0.9.19
10.0.9.255
This should show addresses from within the docker system, if I understand correctly, and the latter address is not in the list. But it is somewhere:
# docker inspect sa3oizstflg3 | grep "10.0.1"
"Addr": "10.0.1.180/24"
What is a VirtualIP?
Actually I get this address with the ID of www as VirtualIP
together with a bunch of others
"VirtualIPs": [
{
"NetworkID": "y3rf5yes97yjs1rfzpzf4aou8",
"Addr": "10.0.0.148/24"
},
{
"NetworkID": "swagio8bzavl2bf5u5bivmt13",
"Addr": "10.0.1.180/24"
},
{
"NetworkID": "tnmpad21shpvsps6fps5m4own",
"Addr": "10.0.8.5/24"
},
{
"NetworkID": "mz9aogx7jxit8x2bflgpyh5lh",
"Addr": "10.0.9.2/24"
}
The same container listens to 2 different IPs
Taking a different container with PHP on board, I get the second address given by inet:gethostbyname("www") for the same container, so both seem to be correct and usable:
# id=$(docker ps -a | grep "vx_wsm.1\." | grep -v "xited" | awk '{print $1}') && docker exec -it $id ash
/ # php -a
Interactive shell
php > echo gethostbyname('www');
10.0.1.180
Now I am confused. Anybody knows what is happening here?
So inet:gethostbyname seems not to be wrong, but rather more correct.
Erlang question
As an addendum: I am not that proficient in Erlang. In my code it reads:
get_ip_web(Web) -> % e.g. www => 100.0.1.226
[case X of $, -> $.; _ -> X end || X <- lists:flatten(io_lib:format("~p",[element(6, element(2, inet:gethostbyname(Web)))])), X=/=${, X=/=$}].
How to rewrite this fine piece of code to pick one of the two addresses, but working also with only one result?
Well, this is mostly an academic question as I didn't understand this one-liner at all initially -- my comment was incomprehensible. This is no longer the case, but I still struggle with constructs easy to handle in other languages, especially if there is no routine for a long time.
For your information: I replaced the one-liner above with this much simpler one-liner:
inet:ntoa(element(2,inet:getaddr(Web, inet)))
Erlang:
From the code, inet:getaddr/2 just gets the first ip from inet:getaddrs/2, which in turn gets them from gethostbyname.
You should use the hostent record instead of element (that is, only if you stay with functions that return a hostent; I'd rather use getaddr as you did in the end, since it's actually the same code):
-module(test).
-include_lib("kernel/include/inet.hrl").
-export([get_ip/0]).
get_ip() ->
{ok, #hostent{h_addr_list = [IP|_]}} = inet:gethostbyname("www.stackoverflow.com"),
inet:ntoa(IP).
Docker:
If you run ip a on the host, you'll get only the IPs of the host, not the ones used by containers. Usually the host has an IP in each of the bridges that make up the different Docker networks, so it's in the same range as the containers.
Regarding the several ips, I don't have experience with docker swarm (which it seems that you're using), only with kubernetes.
My guess is that you have a service exposed on several Docker networks that shares its name (www) with the container that implements it. Thus, on the network of the www container, DNS resolves www both to the container IP and to the service VirtualIP address. Maybe you can find the virtual IPs in iptables/nftables?

How to monitor and log networking latency between group of docker containers?

I have a setup of 10 Docker containers from different images in a swarm on 3 machines. I need to monitor and log network latency / packet delays between each pair of containers. Is there a right tool for it?
I can implement something like
while true; do for host in "${my_hosts[@]}"; do ping -c 1 "$host" >> latency.log; done; done
and launch it on each machine, tailing latency.log into a monitoring system like Prometheus. But it feels like reinventing a square wheel.
I hope I understand what you need; I'm implementing something like this myself.
I tested Netdata with Prometheus and Grafana, and Metricbeat/Filebeat with Elasticsearch and Kibana.
We chose Elasticsearch (the ELK stack) because the same DB can handle both metrics and textual data.
Hope this gives you some direction.
What I have at the end is a setup that:
Shares hosts between containers by volume,
Measures latency feeding hosts to fping,
Writes fping output to log file,
Serves this log file to Prometheus by mtail.
I've implemented wrapper around fping to let it work with mtail:
#!/usr/bin/env bash
# It wraps `fping -lDe` to give output for multiple hosts one line at time (for `mtail` parser)
# Default `fping -lDe` behavior produce X lines at time where X = number of hosts to ping
# This waits for hosts file with `# docker` section as described in usage guide
echo "Measuring time delays to docker hosts from '$1'"
# take hostnames after '# docker' comment line
hosts=$(cat $1 | sed -n '/# docker/,$p' | sed 1,1d)
trap "exit" INT # exit loop by SIGINT
# start `fping` and write it's output to stdout line by line
stdbuf -oL fping -lDe $hosts |
while IFS= read -r line
do
echo $line
done
And there is mtail parser for the log file:
gauge delay by host
gauge loss by host
# [<epoch_time>] <host> : [2], 84 bytes, <delay> ms (0.06 avg, 0% loss)
/\[(?P<epoch_time>\d+\.\d+)\] (?P<host>[\w\d\.]*) : \[\d+\], \d+ \w+, (?P<delay>\d+\.\d+) ms \(\d+\.\d+ avg, (?P<loss>\d+)% loss\)/ {
delay[$host]=$delay
loss[$host]=$loss
}
Now you can add fping and mtail to your images to serve the delay and loss metrics to Prometheus.
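On the Prometheus side this is just a static scrape job; a minimal sketch, assuming mtail runs on each machine on its default HTTP port (3903), with node1..node3 as placeholder hostnames:

```yaml
scrape_configs:
  - job_name: 'container-latency'
    static_configs:
      - targets: ['node1:3903', 'node2:3903', 'node3:3903']
```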
References:
mtail: https://github.com/google/mtail
fping: https://fping.org/

How to find Kubernetes pod that's handling the request

I am running 2 pods and a service with Type: NodePort, to load balance the requests between pods. I want to know when I send a request to the service, which pod the request is forwarded to. Is there a way to find this, because looking at the response, it looks like all the requests are handled by same pod.
A Kubernetes Service load-balances across its endpoint Pods; in the default iptables mode, kube-proxy picks a backend at random so that each Pod gets an equal share. When you create a Service, the corresponding iptables rules are generated on the node.
To be sure, ssh into the node and run iptables-save | less. Search for the name of the service. In the example below, a Service microbot load-balances the microbot deployment with 3 replicas. There should be 2 entries in your case, since you have just 2 pods.
-A KUBE-SVC-LX5ZXALLN4UQ7ZFL -m comment --comment "default/microbot:" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-OZCDYTQTC3KQGJK5
-A KUBE-SVC-LX5ZXALLN4UQ7ZFL -m comment --comment "default/microbot:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SKIRAXBCCQB5R4MV
-A KUBE-SVC-LX5ZXALLN4UQ7ZFL -m comment --comment "default/microbot:" -j KUBE-SEP-SPMPNZCOIJIRSNNQ
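The probabilities look uneven, but they work out to an equal split: each rule's --probability only applies to traffic that fell through the preceding rules. A quick sketch of the arithmetic for the 3-backend case above:

```shell
#!/usr/bin/env bash
# Effective share per backend when the iptables rules are evaluated in order:
# rule 1 matches with p=1/3; rule 2 sees the remaining 2/3 and matches half
# of it; rule 3 catches everything left over.
awk 'BEGIN {
  remaining = 1.0
  split("0.33333 0.50000 1.00000", p, " ")   # probabilities from the rules
  for (i = 1; i <= 3; i++) {
    share = remaining * p[i]
    remaining -= share
    printf "backend %d effective share: %.3f\n", i, share
  }
}'
```

So 1/3 of the traffic matches rule one, half of the remaining 2/3 matches rule two, and the final rule catches the last third.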
If the iptables output doesn't look like the above, it's likely that your Service is not configured properly. Like what Heidi said, the pods are not associated with the Service.
Most likely because of your context and lack of namespace you can't see the logs.
Try kubectl get pods -o wide --all-namespaces | grep <pod> to get the information of which namespace the pod is in and the node IP address.
Then feed the namespace and pod name into the command below to get a running tail of the last 100 lines of the log:
kubectl --namespace <namespace> logs --tail 100 -f <pod>
There is also the remote chance that the pod is not associated with the service. To check run kubectl describe services --namespace <namespace> <service> and look for the app name in the Selector: section
You can also exec into the container and see if the port is accessible or bound on the pod itself. If it isn't listening or answering, most likely it's due to the service not being associated with the application in the namespace.
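If the container image is too minimal for ss or netstat, /proc/net/tcp works everywhere, though it lists ports in hex. A sketch using a captured /proc/net/tcp line so it runs standalone (inside the pod you would read the real file; 8055 is just this thread's example port):

```shell
#!/usr/bin/env bash
# Check whether a port is in LISTEN state (st field 0A) via /proc/net/tcp.
# A captured sample line stands in for the real file.
port=8055
hexport=$(printf '%04X' "$port")           # 8055 -> 1F77
sample='  0: 00000000:1F77 00000000:0000 0A 00000000:00000000 00:00000000 00000000'
if printf '%s\n' "$sample" | grep -q ":${hexport} .* 0A "; then
  echo "port ${port} is listening"         # -> port 8055 is listening
fi
```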
You can look at the application log file, if you're using one. If you print anything to stdout, use kubectl logs <pod> to see the message.
For testing, you can include the pod hostname in the response.
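One common way to do that is the Downward API, which can inject the pod name into the container as an environment variable the app can echo back; a minimal sketch for the container spec (POD_NAME is an arbitrary name):

```yaml
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
```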

Reach host with Docker Compose

I have a Docker Compose v2 file which starts a container. I locally run a service on port 3001. I want to reach this service from the Docker container.
The Docker Compose file looks like this:
version: '2'
services:
my-thingy:
image: my-image:latest
#network_mode: host #DOES not help
environment:
- THE_HOST_I_WANT_TO_CONNECT_TO=http://127.0.0.1:3001
ports:
- "3010:3010"
Now, how can I reach THE_HOST_I_WANT_TO_CONNECT_TO?
What I tried is:
Setting network_mode to host. This did not work. 127.0.0.1 could not be reached.
I can also see that I can reach the host from the container if I use the local IP of the host. A quick hack would be to use something like ifconfig | grep broadcast | awk '{print $2}' to obtain the IP and substitute that in Docker Compose. Since this IP can change on reconnect and different setups can have different ifconfig results, I am looking for a better solution.
I've used another hack/workaround from comments in the Docker issue #1143. Seems to Work For Me™ for the time being... Specifically, I've added the following lines to my Dockerfile:
# - net-tools contains netstat, used to discover IP of Docker host server.
# NOTE: the netstat trick is to make Docker host server accessible
# from inside Docker container under name 'dockerhost'. Unfortunately,
# as of 2016.10, there's no official/robust way to do this when Docker host
# has no public IP/DNS entry. What is used here is built based on:
# - https://github.com/docker/docker/issues/1143#issuecomment-39364200
# - https://github.com/docker/docker/issues/1143#issuecomment-46105218
# See also:
# - http://stackoverflow.com/q/38936738/98528
# - https://github.com/docker/docker/issues/8395#issuecomment-200808798
# - https://github.com/docker/docker/issues/23177
RUN apt-get update && apt-get install -y net-tools
CMD (netstat -nr | grep '^0\.0\.0\.0' | awk '{print $2" dockerhost"}' >> /etc/hosts) && \
...old CMD...
With this, I can use dockerhost as the name of the host where Docker is installed. As mentioned above, this is based on:
https://github.com/docker/docker/issues/1143#issuecomment-39364200
(...) One way is to rely on the fact that the Docker host is reachable through the address of the Docker bridge, which happens to be the default gateway for the container. In other words, a clever parsing of ip route ls | grep ^default might be all you need in that case. Of course, it relies on an implementation detail (the default gateway happens to be an IP address of the Docker host) which might change in the future. (...)
https://github.com/docker/docker/issues/1143#issuecomment-46105218
(...) A lot of people like us are looking for a little tidbit like this
netstat -nr | grep '^0\.0\.0\.0' | awk '{print $2}'
where netstat -nr means:
Netstat prints information about the Linux networking subsystem.
(...)
--route , -r
Display the kernel routing tables.
(...)
--numeric , -n
Show numerical addresses instead of trying to determine symbolic host, port or user names.
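The same idea works with ip route, whose default line names the gateway, i.e. the Docker host as seen from inside a container. A sketch run against a captured sample (replace the here-string with live ip route output; the addresses shown are typical bridge-network defaults, not guaranteed):

```shell
#!/usr/bin/env bash
# Pull the default gateway out of `ip route` output; inside a container
# this is normally the Docker bridge IP of the host.
sample='default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2'
gateway=$(printf '%s\n' "$sample" | awk '/^default/ {print $3}')
echo "dockerhost is $gateway"   # -> dockerhost is 172.17.0.1
```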
This is a known issue with Docker Compose: see Document how to connect to Docker host from container #1143. The suggested solution of a dockerhost entry in /etc/hosts is not implemented.
I went for the solution with a shell variable as also suggested in a comment by amcdl on the issue:
Create a LOCAL_XX_HOST variable: export LOCAL_XX_HOST="http://$(ifconfig en0 inet | grep "inet " | awk -F'[: ]+' '{ print $2 }'):3001".
Then, for example, refer to this variable in docker-compose like this:
my-thingy:
image: my-image:latest
environment:
- THE_HOST_I_WANT_TO_CONNECT_TO=${LOCAL_XX_HOST}
