Mysterious Filebeat 7 X-Pack issue using Docker image - docker

I've also posted this as a question on the official Elastic forum, but that doesn't seem super frequented.
https://discuss.elastic.co/t/x-pack-check-on-oss-docker-image/198521
At any rate, here's the query:
We're running a managed AWS Elasticsearch cluster — not ideal, but that's life — and run most of the rest of our stuff on Kubernetes. We recently upgraded our cluster to Elasticsearch 7, so I wanted to upgrade the Filebeat service we have running on the Kubernetes nodes to capture logs.
I've specified image: docker.elastic.co/beats/filebeat-oss:7.3.1 in my DaemonSet configuration, but I still see
Connection marked as failed because the onConnect callback failed:
request checking for ILM availability failed:
401 Unauthorized: {"Message":"Your request: '/_xpack' is not allowed."}
in the logs. Same thing when I've tried other 7.x images. A bug? Or something that's new in v7?
The license file is the Apache License, and the build shown by filebeat version inside the container is a4be71b90ce3e3b8213b616adfcd9e455513da45.

It turns out that starting in one of the 7.x versions, Filebeat turns on index lifecycle management (ILM) checks by default. ILM is an X-Pack feature, so enabling it by default means that Filebeat performs an X-Pack check by default.
This can be fixed by adding setup.ilm.enabled: false to the Filebeat configuration. So, not a bug per se in the OSS Docker build.
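For anyone else hitting this, a minimal sketch of the relevant filebeat.yml fragment (the Elasticsearch endpoint below is a placeholder, not the real domain):
# Disable the ILM check so the OSS build stops probing /_xpack
setup.ilm.enabled: false

output.elasticsearch:
  hosts: ["https://my-es-domain.eu-west-1.es.amazonaws.com:443"]  # placeholder AWS ES endpoint
  protocol: "https"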

Related

docker: Error response from daemon: manifest for gcr.io/google_containers/hyperkube-amd64:v1.24.2 not found

Following this guide:
https://jamesdefabia.github.io/docs/getting-started-guides/docker/
and both
export K8S_VERSION=$(curl -sS https://storage.googleapis.com/kubernetes-release/release/stable.txt)
and
export K8S_VERSION=$(curl -sS https://storage.googleapis.com/kubernetes-release/release/latest.txt)
fail at the docker run stage with a "not found" error, e.g.:
docker: Error response from daemon: manifest for gcr.io/google_containers/hyperkube-amd64:v1.24.2 not found: manifest unknown: Failed to fetch "v1.24.2" from request "/v2/google_containers/hyperkube-amd64/manifests/v1.24.2".
Any suggestions?
Check the repo of hyperkube and use an available tag:
https://console.cloud.google.com/gcr/images/google-containers/global/hyperkube-amd64
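As an optional sketch (assuming the gcloud CLI is installed and authenticated), you can also list the tags that actually exist from the command line before exporting K8S_VERSION:
# List a handful of available tags for the hyperkube image
gcloud container images list-tags gcr.io/google_containers/hyperkube-amd64 --limit=10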
As mentioned by #zerkms and #vladtkachuk, the Google hyperkube image is not available anymore. As noted in the documentation:
Hyperkube, an all-in-one binary for Kubernetes components, is now deprecated and will not be built by the Kubernetes project going forward. Several older beta API versions are deprecated in 1.19 and will be removed in version 1.22. We will provide a follow-on update since this means 1.22 will likely end up being a breaking release for many end users.
Setting up a local Kubernetes environment as your development environment is the recommended option, no matter your situation, because this setup can create a safe and agile application-deployment process.
Fortunately, there are multiple platforms that you can try out to run Kubernetes locally, and they are all open source and available under the Apache 2.0 license.
Minikube's primary goals are to be the best tool for local Kubernetes application development and to support all Kubernetes features that fit.
kind runs local Kubernetes clusters using Docker container "nodes."
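As a minimal sketch (the cluster name is arbitrary and Docker is assumed to be running), the docker run step from the guide is replaced by bringing up a local cluster with one of those tools:
# kind: creates a cluster whose nodes are Docker containers
kind create cluster --name dev
kubectl cluster-info --context kind-dev

# or minikube, which can also use the Docker driver
minikube start --driver=docker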

Unable to reach registry-1.docker.io from Kind cluster node on WSL2

I am setting up an Airflow k8s cluster using a kind deployment on a WSL2 setup. When I execute the standard helm install $RELEASE_NAME apache-airflow/airflow --namespace $NS it fails. Further investigation shows that the cluster worker node cannot connect to registry-1.docker.io.
Error log for one of the image pulls:
Failed to pull image "redis:6-buster": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/redis:6-buster": failed to resolve reference "docker.io/library/redis:6-buster": failed to do request: Head "https://registry-1.docker.io/v2/library/redis/manifests/6-buster": dial tcp: lookup registry-1.docker.io on 172.19.0.1:53: no such host
I can access all other websites from this node, e.g. google.com, yahoo.com, merriam-webster.com, etc.; even docker.com works. This issue is specific to registry-1.docker.io.
All the search results and links seem to be about general internet connectivity issues.
Current solution:
If I manually change /etc/resolv.conf on the kind worker node to point to the nameserver IP from /etc/resolv.conf of the WSL2 Debian host, then it works.
But this is a dynamic cluster and node, and I cannot do this every time. I am currently searching for a way to make it part of the cluster configuration: something that works just by running kind create cluster, after which kubectl and helm work by default.
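For what it's worth, here is a sketch of scripting that manual workaround right after cluster creation (the cluster name and nameserver detection are placeholders for illustration, not my exact setup):
# Copy the WSL2 host's nameserver into every kind node's resolv.conf
HOST_DNS=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)
for node in $(kind get nodes --name airflow); do
  docker exec "$node" sh -c "echo 'nameserver $HOST_DNS' > /etc/resolv.conf"
done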
However, I am more interested in figuring out why this network setup fails specifically for registry-1.docker.io. Is there some configuration that can be done to avoid pointing DNS at the host IP or Google DNS? The current network configuration seems to work for pretty much the rest of the internet.
I have documented all the steps and investigation details, including some of the network configuration details, in a GitHub repository. If you need any further information to help solve the issue, please let me know. I will keep updating the GitHub documentation as I make progress.
Setup:
Windows 11 with WSL2, without Docker Desktop
WSL2 image: Debian bullseye (11) with Docker Engine on Linux
Docker version : 20.10.2
Kind version : 0.11.1
Kind image: kindest/node:v1.20.7@sha256:cbeaf907fc78ac97ce7b625e4bf0de16e3ea725daf6b04f930bd14c67c671ff9
I am not sure if this is an answer or not. After spending two days trying to find a solution, I decided to change the node image version. The kind release page lists 1.21 as the latest image for kind 0.11.1. I had problems even starting the cluster with 1.21, and 1.20 hit this strange DNS issue, so I went with 1.23. Everything worked fine with that image.
However, to my surprise, when I changed the cluster configuration back to 1.20, the DNS issue was gone. So I do not know what changed due to switching the image, but I cannot reproduce the issue again! Maybe this will help someone else.
I believe I have found the correct workaround for this bug: switching iptables to legacy mode fixed it for me.
https://github.com/docker/for-linux/issues/1406#issuecomment-1183487816
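For reference, a sketch of that workaround on a Debian-based WSL2 distro (assuming the standard iptables-legacy alternatives are installed):
# Switch iptables and ip6tables to the legacy backend
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
# Restart the Docker engine so it picks up the change, then recreate the cluster
sudo service docker restart
kind delete cluster && kind create cluster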

how can I ensure traffic is being served only on my new cloud run service deployment?

I am seeing in my Stackdriver logs that I am running an old revision of a Cloud Run service, while the console shows only one newer revision as live.
In this case I have a Cloud Run container built in one GCP project but deployed in a second project, using the fully specified image name. (My attempts at Terraform were sidetracked into an auth catch-22, so until I have restored determination, I am deploying manually.) I don't know if this wrinkle of the two projects is relevant to my question; otherwise all the auth works.
In brief, why may I be seeing old deployments receiving traffic minutes after the new deployment is made? Even more than 30 minutes later traffic is still reaching the old deployment.
There are a few things to take into account here:
Try explicitly telling Cloud Run to migrate all traffic to the latest revision. You can do that by adding the following to your .yaml file
metadata:
  name: SERVICE
spec:
  ...
  traffic:
  - latestRevision: true
    percent: 100
Try always adding the :latest tag when building a new image, so instead of only having, let's say, gcr.io/project/newimage, it would be gcr.io/project/newimage:latest. This way you ensure the latest image is being used rather than a previously, automatically assigned tag.
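A minimal sketch of that flow (the service name and region are placeholders):
docker build -t gcr.io/project/newimage:latest .
docker push gcr.io/project/newimage:latest
gcloud run deploy my-service --image gcr.io/project/newimage:latest --region europe-west1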
If neither fixes your issue, then please provide the logs, as there might be something useful that indicates the root cause. (Also let us know if you are using any caching config.)
You can tell Cloud Run to route all traffic to the latest instance with gcloud run services update-traffic [SERVICE NAME] --to-latest. That will route all traffic to the latest deployment, and update the traffic allocation as you deploy new instances.
You might not want to use this if you need to validate the service after deployment and before "opening the floodgates", or if you're doing canary deployments.
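If you do want to validate or canary first, you can split traffic explicitly instead; a sketch with placeholder service and revision names:
# Send 10% of traffic to a new revision as a canary
gcloud run services update-traffic my-service --to-revisions my-service-00042-abc=10
# Once validated, promote everything to the latest revision
gcloud run services update-traffic my-service --to-latest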

Timeouts accessing services on swarm published ports

We're using Docker in Swarm mode to host a number of services. Recently we've hit an issue where we get connection timeouts intermittently (sometimes as much as every other request) when trying to access some services.
We've upgraded the environment to the latest version of Docker (currently Docker version 17.03.0-ce, build 3a232c8), done a staggered reboot of all servers (trying to maintain uptime if possible even though this environment is technically a test environment) and tried stopping / starting services as well, but the issue still persists.
I'm confident the issue is not related to the services running in Docker, as we're seeing it on various services which had until recently been running without issue. I think it's more likely an environmental issue, or some problem with Docker's internal routing in the overlay network, but I'm not sure how to prove or solve this.
Any advice on how to diagnose or solve this would be greatly appreciated!
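(Not from the original thread, but one hedged starting point for diagnosis: swarm's published ports rely on the overlay/ingress network, so it is worth confirming the required ports are reachable between nodes and inspecting the ingress network itself.)
# Swarm mode needs 2377/tcp (management), 7946/tcp+udp (gossip) and 4789/udp (VXLAN)
# open between all nodes; confirm nothing is filtering them, then check the listeners:
sudo ss -tulpn | grep -E ':2377|:7946'
# Inspect the ingress overlay network that backs published ports
docker network inspect ingress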

how can i launch the kafka scheduler using marathon in minimesos?

I'm trying to launch the kafka-mesos framework scheduler using the Docker container as prescribed at https://github.com/mesos/kafka/tree/master/src/docker#running-image-in-marathon, via the Marathon implementation running in minimesos (I would like to add a minimesos tag, but don't have the points). The app is registered and can be seen in the Marathon console, but it remains in the Waiting state and the Deployment GUI says that it is trying to ScaleApplication.
I've tried looking for /var/log files in the marathon and mesos-master containers that might show why this is happening. Initially I thought it may have been because the image was not pulled, so I added "forcePullImage": true to the JSON app configuration, but it still waits. I've also changed the networking from HOST to BRIDGE on the assumption that this is consistent with the minimesos caveats at http://minimesos.readthedocs.org/en/latest/ .
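For context, a hedged sketch of the relevant part of the Marathon app definition I am describing (the id, image name, and resource numbers are illustrative placeholders, not my exact config):
{
  "id": "kafka-mesos-scheduler",
  "cpus": 0.5,
  "mem": 512,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "<kafka-mesos-scheduler-image>",
      "network": "BRIDGE",
      "forcePullImage": true
    }
  }
}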
In the mesos log I do see:
I0106 20:07:15.259790 15 master.cpp:4967] Sending 1 offers to framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000 (marathon) at scheduler-575c233a-8bc3-413f-b070-505fcf138ece#172.17.0.6:39111
I0106 20:07:15.266100 9 master.cpp:3300] Processing DECLINE call for offers: [ 5e1508a8-0024-4626-9e0e-5c063f3c78a9-O77 ] for framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000 (marathon) at scheduler-575c233a-8bc3-413f-b070-505fcf138ece#172.17.0.6:39111
I0106 20:07:15.266633 9 hierarchical.hpp:1103] Recovered ports(*):[33000-34000]; cpus(*):1; mem(*):1001; disk(*):13483 (total: ports(*):[33000-34000]; cpus(*):1; mem(*):1001; disk(*):13483, allocated: ) on slave 5e1508a8-0024-4626-9e0e-5c063f3c78a9-S0 from framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000
I0106 20:07:15.266770 9 hierarchical.hpp:1140] Framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000 filtered slave 5e1508a8-0024-4626-9e0e-5c063f3c78a9-S0 for 2mins
I0106 20:07:16.261010 11 hierarchical.hpp:1521] Filtered offer with ports(*):[33000-34000]; cpus(*):1; mem(*):1001; disk(*):13483 on slave 5e1508a8-0024-4626-9e0e-5c063f3c78a9-S0 for framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000
I0106 20:07:16.261245 11 hierarchical.hpp:1326] No resources available to allocate!
I0106 20:07:16.261335 11 hierarchical.hpp:1421] No inverse offers to send out!
but I'm not sure if this is relevant since it does not correlate to the resource settings in the Kafka App config. The GUI shows that no tasks have been created.
I do have ten mesosphere/inky docker tasks running alongside the attempted Kafka deployment. This may be a configuration issue specific to the Kafka docker image; I just don't know the best way to debug it. Perhaps it is a case of increasing the log levels in a config file, or it may be an environment variable or network setting. I'm digging into it and will update my progress, but any suggestions would be appreciated.
thanks!
Thanks for trying this out! I am looking into this and you can follow progress on this issue at https://github.com/ContainerSolutions/minimesos/issues/188 and https://github.com/mesos/kafka/issues/172
FYI I got Mesos Kafka installed on minimesos via a quickstart shell script. See this PR on Mesos Kafka https://github.com/mesos/kafka/pull/183
It does not use Marathon and the minimesos install command yet. That is the next step.
