An EKS app running on Fargate worker nodes is failing to connect to an ElastiCache Redis cluster within the same subnet

I have an app running on an EKS (Kubernetes) cluster. The cluster was created with the eksctl tool, and I'm running Fargate only. The app needs to connect to an ElastiCache Redis cluster, which I spun up within the same subnet as the Fargate workers. The connection errors out with:
{ Error: Redis connection to my-redis.kptb5s.ng.0001.use1.cache.amazonaws.com:6379 failed - connect ETIMEDOUT 192.168.116.58:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1107:14)
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '192.168.116.58',
  port: 6379 }
How can I troubleshoot this? I need to get this connection to redis working. What are the most likely issues?

The most likely reason (based on the above) is that the security group setup does not allow traffic to flow from the pod to the Redis instance on port 6379. Assuming the EKS cluster SG associated with the pod allows all outbound traffic, I would focus on the SG assigned to the Redis cluster endpoint, which needs to allow inbound traffic from the EKS cluster security group.
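A minimal sketch of that check and fix with the AWS CLI, assuming hypothetical IDs (sg-0aaaredis for the Redis SG, sg-0bbbeks for the EKS cluster SG, cluster name my-cluster; replace with your own):

# Which SG is attached to the Redis cluster? (cache cluster ID taken from the endpoint above)
aws elasticache describe-cache-clusters --cache-cluster-id my-redis \
  --query 'CacheClusters[0].SecurityGroups'

# Which SG does the EKS cluster use? (cluster name is a placeholder)
aws eks describe-cluster --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId'

# Allow inbound Redis traffic (TCP 6379) from the EKS cluster SG into the Redis SG.
aws ec2 authorize-security-group-ingress --group-id sg-0aaaredis \
  --protocol tcp --port 6379 --source-group sg-0bbbeks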

Related

I am trying to deploy the Microsoft Fluid Framework on an AWS EKS cluster but the pods go into CrashLoopBackOff

When I get the logs for one of the pods with the CrashLoopBackOff status,
kubectl logs alfred
it returns the following errors:
error: alfred service exiting due to error {"label":"winston","timestamp":"2021-11-08T07:02:02.324Z"}
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26) {
errno: 'ENOTFOUND',
code: 'ENOTFOUND',
syscall: 'getaddrinfo',
hostname: 'mongodb'
} {"label":"winston","timestamp":"2021-11-08T07:02:02.326Z"}
error: Client Manager Redis Error: getaddrinfo ENOTFOUND redis {"errno":"ENOTFOUND","code":"ENOTFOUND","syscall":"getaddrinfo","hostname":"redis","stack":"Error: getaddrinfo ENOTFOUND redis\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26)","label":"winston","timestamp":"2021-11-08T07:02:02.368Z"}
I am new to Kubernetes and AWS EKS. Looking forward to your help. Thanks.
If you look at the error, it is failing at getaddrinfo, the function that resolves a DNS name in order to connect to an external service. Here it is trying to reach a Redis cluster, so it seems your EKS cluster does not have connectivity to it.
However, if you are running Redis as part of your EKS cluster, make sure to provide/update the Kubernetes service DNS name in the application code, or set it as an environment variable just before deployment.
In this case it is both redis and mongodb: as the error says, you are providing the hostnames redis and mongodb, which will not resolve to IP addresses unless they are mapped in the /etc/hosts file, which they are not.
Provide the correct hostnames and the pods will come up. This is the root cause.
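For the environment-variable route, a minimal sketch (the deployment name alfred and the variable names MONGO_HOST/REDIS_HOST are illustrative; use whatever names the application actually reads):

# Point the app at the in-cluster Service DNS names (assumes the default namespace).
kubectl set env deployment/alfred \
  MONGO_HOST=mongodb.default.svc.cluster.local \
  REDIS_HOST=redis.default.svc.cluster.local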
The errors above were being generated because mongo and redis were not exposed by a Service. After I created service.yaml files for the instances, the above errors went away. AWS EKS deploys containers in pods which are scattered across different nodes. In order to let mongodb communicate from one pod to another you must expose a Service, i.e. a stable "frontend", for the mongodb deployment.
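A minimal sketch of that fix with kubectl instead of hand-written service.yaml files, assuming Deployments named redis and mongodb exist in the application's namespace (names and ports are assumptions, not taken from the original post):

# Create ClusterIP Services so the hostnames "redis" and "mongodb" resolve in-cluster.
kubectl expose deployment redis --name=redis --port=6379 --target-port=6379
kubectl expose deployment mongodb --name=mongodb --port=27017 --target-port=27017

# Verify the Services exist and have endpoints.
kubectl get svc redis mongodb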

minikube status Unknown, Windows 10, docker

I was trying to see the dashboard; it previously worked fine...
Now when I run minikube dashboard I get:
λ minikube dashboard
X Exiting due to GUEST_STATUS: state: unknown state "minikube": docker container inspect minikube --format=: exit status 1
stdout:
stderr:
Error: No such container: minikube
*
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ * If the above advice does not help, please let us know: │
│ https://github.com/kubernetes/minikube/issues/new/choose │
│ │
│ * Please attach the following file to the GitHub issue: │
│ * - C:\Users\JOSELU~1\AppData\Local\Temp\minikube_dashboard_dc37e18dac9641f7847258501d0e823fdfb0604c_0.log │
│ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
With minikube status
λ minikube status
E0604 13:13:20.260421 27600 status.go:258] status error: host: state: unknown state "minikube": docker container inspect minikube --format={{.State.Status}}: exit status 1
stdout:
stderr:
Error: No such container: minikube
E0604 13:13:20.261425 27600 status.go:261] The "minikube" host does not exist!
minikube
type: Control Plane
host: Nonexistent
kubelet: Nonexistent
apiserver: Nonexistent
kubeconfig: Nonexistent
With the command minikube profile list
λ minikube profile list
|----------|-----------|---------|--------------|------|---------|---------|-------|
| Profile | VM Driver | Runtime | IP | Port | Version | Status | Nodes |
|----------|-----------|---------|--------------|------|---------|---------|-------|
| minikube | docker | docker | 192.168.49.2 | 8443 | v1.20.2 | Unknown | 1 |
|----------|-----------|---------|--------------|------|---------|---------|-------|
Now...
What could have happened?
What would be the best solution?
Thanks...
Remove unused data:
docker system prune
Clear minikube's local state:
minikube delete
Start the cluster:
minikube start --driver=<driver_name>
(In your case the driver name is docker, as per the minikube profile list info you shared.)
Check the cluster status:
minikube status
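To double-check the diagnosis before (or after) these steps, you can ask Docker directly whether the minikube container still exists (this assumes the docker driver, per the profile list above):

# Prints nothing if the container really is gone, which matches the
# "No such container: minikube" error above.
docker ps -a --filter "name=minikube" --format "{{.Names}}: {{.Status}}"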
Use the following documentation for more information:
https://docs.docker.com/engine/reference/commandline/system_prune/#examples
https://v1-18.docs.kubernetes.io/docs/tasks/tools/install-minikube/

Jenkins build script can't git pull from remote repo

Started by user sabari k
Building in workspace /var/lib/jenkins/workspace/actualdairy
[actualdairy] $ /bin/sh -xe /tmp/jenkins4465259595371700187.sh
+ echo hello
+ cd
+ ./actualDairy/deploy.sh
Pseudo-terminal will not be allocated because stdin is not a terminal.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-36-generic x86_64)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
System information as of Thu Oct 25 20:05:25 UTC 2018
System load: 0.09 Processes: 90
Usage of /: 8.7% of 24.06GB Users logged in: 1
Memory usage: 38% IP address for eth0:
Swap usage: 0% IP address for eth1:
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
43 packages can be updated.
6 updates are security updates.
Welcome to DigitalOcean's One-Click Node.js Droplet.
To keep this Droplet secure, the UFW firewall is enabled.
All ports are BLOCKED except 22 (SSH), 80 (HTTP), and 443 (HTTPS).
To get started, visit http://do.co/node1804
To delete this message of the day: rm -rf /etc/update-motd.d/99-one-click
mesg: ttyname failed: Inappropriate ioctl for device
git@bitbucket.org: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
npm WARN optional Skipping failed optional dependency /chokidar/fsevents:
npm WARN notsup Not compatible with your operating system or architecture: fsevents#1.2.4
Use --update-env to update environment variables
[PM2] Applying action restartProcessId on app [all](ids: 0)
[PM2] www ✓
┌──────────┬────┬─────────┬──────┬───────┬────────┬─────────┬────────┬─────┬──────────┬──────┬──────────┐
│ App name │ id │ version │ mode │ pid │ status │ restart │ uptime │ cpu │ mem │ user │ watching │
├──────────┼────┼─────────┼──────┼───────┼────────┼─────────┼────────┼─────┼──────────┼──────┼──────────┤
│ www │ 0 │ 0.0.2 │ fork │ 23688 │ online │ 65 │ 0s │ 0% │ 5.4 MB │ root │ disabled │
└──────────┴────┴─────────┴──────┴───────┴────────┴─────────┴────────┴─────┴──────────┴──────┴──────────┘
Use pm2 show <id|name> to get more details about an app
Finished: SUCCESS
I don't know why the git pull is not working. My deploy script is:
#!/bin/bash
ssh root@ipaddress <<EOF
cd actualdairy
git pull
npm install
pm2 restart all
exit
EOF
I added the remote server's public key in Bitbucket but it's not pulling from the repo, saying Permission denied (publickey).
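One hedged check (an assumption about the setup, not from the original post): log in to the droplet as the same user the script uses (root here) and verify that non-interactive SSH auth to Bitbucket works with the key you added:

# Bitbucket should report successful authentication; if this also prints
# "Permission denied (publickey)", the key root actually offers is not the
# one registered in Bitbucket.
ssh -T git@bitbucket.org

# See which identities ssh offers during the handshake.
ssh -vT git@bitbucket.org 2>&1 | grep -Ei 'offering|identity file'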

Issues with Kubernetes multi-master using kubeadm on premises

Following the Kubernetes v1.11 documentation, I have managed to set up Kubernetes high availability using kubeadm, with stacked control plane nodes and 3 masters running on-premises on CentOS 7 VMs. With no load balancer available, I used Keepalived to set a failover virtual IP (10.171.4.12) for the apiserver, as described in the Kubernetes v1.10 documentation. As a result, my "kubeadm-config.yaml" used to bootstrap the control planes had the following header:
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "10.171.4.12"
api:
controlPlaneEndpoint: "10.171.4.12:6443"
etcd:
...
The configuration went fine, with the following warning appearing when bootstrapping all 3 masters:
[endpoint] WARNING: port specified in api.controlPlaneEndpoint
overrides api.bindPort in the controlplane address
And this Warning when joining Workers:
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will
not be used, because the following required kernel modules are not
loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs
support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{}
nf_conntrack_ipv4:{}] you can solve this problem with following
methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
Afterwards, basic tests succeed:
When the active master is stopped, Keepalived fails over to another master and the apiserver remains accessible (all kubectl commands succeed).
When stopping the main Master (with highest Keepalived preference), the deployment of apps is successful (tested with Kubernetes bootcamp) and everything syncs properly with the main Master when it is back online.
Kubernetes bootcamp application runs successfully, and all master & worker nodes respond properly when the service exposing bootcamp with NodePort is curled.
Successfully deployed docker-registry as per https://github.com/kubernetes/ingress-nginx/tree/master/docs/examples/docker-registry
But then come these issues:
Nginx Ingress Controller pod fails to run and enters state CrashLoopBackOff (refer to events below)
After installing helm and tiller on any master, all commands using "helm install" or "helm list" fail to execute (refer to command outputs below)
I am running Kubernetes v1.11.1 but kubeadm-config.yaml mentions 1.11.0, is this something I should worry about?
Should I not follow the official documentation and go for other alternatives, such as the one described at https://medium.com/@bambash/ha-kubernetes-cluster-via-kubeadm-b2133360b198?
Note: same issue with a new Kubernetes HA installation using the latest version 1.11.2 (three masters + one worker) and the latest nginx ingress controller release 0.18.0 deployed.
-- Nginx controller pod events & logs:
Normal Pulled 28m (x38 over 2h) kubelet, node3.local Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1" already present on machine
Warning Unhealthy 7m (x137 over 2h) kubelet, node3.local Liveness probe failed: Get http://10.240.3.14:10254/healthz: dial tcp 10.240.3.14:10254: connect: connection refused
Warning BackOff 2m (x502 over 2h) kubelet, node3.local Back-off restarting failed container
nginx version: nginx/1.13.12
W0809 14:05:46.171066 5 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0809 14:05:46.171748 5 main.go:191] Creating API client for https://10.250.0.1:443
-- helm command outputs:
# helm install ...
Error: no available release name found
# helm list
Error: Get https://10.250.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.250.0.1:443: i/o timeout
-- kubernetes service & endpoints:
# kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.250.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443
Session Affinity: None
Events: <none>
# kubectl get endpoints --all-namespaces
NAMESPACE NAME ENDPOINTS AGE
default bc-svc 10.240.3.27:8080 6d
default kubernetes 10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443 7d
ingress-nginx default-http-backend 10.240.3.24:8080 4d
kube-system kube-controller-manager <none> 7d
kube-system kube-dns 10.240.2.4:53,10.240.2.5:53,10.240.2.4:53 + 1 more... 7d
kube-system kube-scheduler <none> 7d
kube-system tiller-deploy 10.240.3.25:44134 5d
Problems were solved when I switched my pod network from Flannel to Calico.
(tested on Kubernetes 1.11.0; will repeat tests tomorrow on latest k8s version 1.11.2)
As you can see in the Kubernetes client-go code, IP address and port are read from environment variables inside a container:
host, port := os.Getenv("KUBERNETES_SERVICE_HOST"), os.Getenv("KUBERNETES_SERVICE_PORT")
You can check these variables by running the following command against any healthy pod:
$ kubectl exec <healthy-pod-name> -- printenv | grep SERVICE
I think the cause of the problem is that the variables KUBERNETES_SERVICE_HOST:KUBERNETES_SERVICE_PORT are set to 10.250.0.1:443 instead of 10.171.4.12:6443.
Could you confirm it by checking these variables in your cluster?
Important Additional Notes:
After running a couple of labs, I got the same issue with:
- new Kubernetes HA installation using the latest version 1.11.2 (three masters + one worker) and nginx latest ingress controller release 0.18.0.
- standalone Kubernetes master with few workers using version 1.11.1 (one master + two workers) and nginx latest ingress controller release 0.18.0.
- but with a standalone Kubernetes master version 1.11.0 (one master + two workers), nginx ingress controller 0.17.1 worked with no complaints, while 0.18.0 complained that the readiness probe failed but the pod still went into the running state.
=> As a result, I think the issue may be related to how Kubernetes releases 1.11.1 and 1.11.2 interpret the health probes.

Connect CI Runner to Docker network

I have the following configuration:
Dockerized GitLab (named gitlab)
Dockerized gitlab-ci-multi-runner (linked to gitlab and named gitlab-runners).
┌──────────────────────┐ ┌─────────┐
│ 172.12.x.x │ │172.13.x.│
┌┴──────────┬┬──────────┴┐┌┴─────────┴┐
│ GitLab ││ GitLab ││ GitLab │
│ ││ Runners ││ Runners │
│ ││ ││ │
└───────────┘└───────────┘└───────────┘
│ │ │ ▲
│ │ │ ╱
│ │ │ ╱
│ │ ▼ ╱
───────┴────────────┴────────────────────
I successfully registered a runner in GitLab, but when I try to run a build I cannot manage to connect the Docker container of the project (spawned by gitlab-runners) to my GitLab container; therefore, when the project container tries to clone the project, it is not able to resolve the name http://gitlab/. I tried to use the parameter links = ["network-name:gitlab"] in the .toml file of my runner, but this leads to:
API error (500) Could not get container for <network name>.
Any clues?
Here is my .toml:
concurrent = 1
check_interval = 0
[[runners]]
name = "d4cf95ba5a90"
url = "http://gitlab/ci"
token = "9e6c2edb5832f92512a69df1ec4464"
executor = "docker"
[runners.docker]
tls_verify = false
image = "node:4.2.2"
privileged = false
disable_cache = false
volumes = ["/cache"]
links = ["evci_default:gitlab"]
[runners.cache]
The only solution I found is to add the IP of the Docker host to extra_hosts in config.toml:
extra_hosts = ["host:192.168.137.1"]
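A minimal sketch for finding that IP, assuming the containers are attached to the evci_default network referenced in the .toml above (adjust the network name to your setup):

# The network's gateway is the Docker host's address on that network; use it
# as the IP value in extra_hosts.
docker network inspect evci_default --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'

# Or, for the default bridge network:
docker network inspect bridge --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'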
