How to run Jenkins with Docker on Kubernetes?

I’m attempting to execute a Jenkins & Docker CLI container on Kubernetes. Here are my steps:
I create the pod using:
kubectl --kubeconfig my-kubeconfig.yml run my-jenkins-pod --image=trion/jenkins-docker-client --restart=Never
Which creates a pod based on the image https://hub.docker.com/r/trion/jenkins-docker-client
I create the deployment using:
kubectl --kubeconfig my-kubeconfig.yml apply -f /kub/kube
/kub/kube contains jenkins-deployment.yaml, which I have configured as:
apiVersion: v1
kind: Service
metadata:
  name: my-jenkins-pod
spec:
  ports:
    - protocol: "TCP"
      port: 50000
      targetPort: 5001
  selector:
    app: my-jenkins-pod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-jenkins-pod
spec:
  selector:
    matchLabels:
      app: my-jenkins-pod
  replicas: 1
  template:
    metadata:
      labels:
        app: my-jenkins-pod
    spec:
      containers:
        - name: ml-services
          image: trion/jenkins-docker-client
          ports:
            - containerPort: 5001
To access the Jenkins container I expose the IP using:
kubectl --kubeconfig my-kubeconfig.yml expose deployment my-jenkins-pod --type=LoadBalancer --name=my-jenkins-pod-public
To get the external IP of the Jenkins service I use:
kubectl --kubeconfig my-kubeconfig.yml get services my-jenkins-pod-public
Which returns:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-jenkins-pod-public LoadBalancer 9.5.52.28 161.61.222.16 5001:30878/TCP 10m
To test I open the URL at location:
http://161.61.222.16:5001/
Which returns:
This page isn’t working
161.61.222.16 didn’t send any data.
ERR_EMPTY_RESPONSE
It seems the service has started, but are the port mappings incorrect?
The log of the pod my-jenkins-pod contains:
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
2021-04-03 11:15:42.899+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @274ms to org.eclipse.jetty.util.log.JavaUtilLog
2021-04-03 11:15:43.012+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
2021-04-03 11:15:44.369+0000 [id=1] WARNING o.e.j.s.handler.ContextHandler#setContextPath: Empty contextPath
2021-04-03 11:15:44.416+0000 [id=1] INFO org.eclipse.jetty.server.Server#doStart: jetty-9.4.39.v20210325; built: 2021-03-25T14:42:11.471Z; git: 9fc7ca5a922f2a37b84ec9dbc26a5168cee7e667; jvm 1.8.0_282-b08
2021-04-03 11:15:44.653+0000 [id=1] INFO o.e.j.w.StandardDescriptorProcessor#visitServlet: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
2021-04-03 11:15:44.695+0000 [id=1] INFO o.e.j.s.s.DefaultSessionIdManager#doStart: DefaultSessionIdManager workerName=node0
2021-04-03 11:15:44.695+0000 [id=1] INFO o.e.j.s.s.DefaultSessionIdManager#doStart: No SessionScavenger set, using defaults
2021-04-03 11:15:44.696+0000 [id=1] INFO o.e.j.server.session.HouseKeeper#startScavenging: node0 Scavenging every 660000ms
2021-04-03 11:15:45.081+0000 [id=1] INFO hudson.WebAppMain#contextInitialized: Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
2021-04-03 11:15:45.203+0000 [id=1] INFO o.e.j.s.handler.ContextHandler#doStart: Started w.@24f43aa3{Jenkins v2.286,/,file:///var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
2021-04-03 11:15:45.241+0000 [id=1] INFO o.e.j.server.AbstractConnector#doStart: Started ServerConnector@4f0f2942{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2021-04-03 11:15:45.241+0000 [id=1] INFO org.eclipse.jetty.server.Server#doStart: Started @2616ms
2021-04-03 11:15:45.245+0000 [id=21] INFO winstone.Logger#logInternal: Winstone Servlet Engine running: controlPort=disabled
2021-04-03 11:15:46.479+0000 [id=26] INFO jenkins.InitReactorRunner$1#onAttained: Started initialization
2021-04-03 11:15:46.507+0000 [id=26] INFO jenkins.InitReactorRunner$1#onAttained: Listed all plugins
2021-04-03 11:15:47.654+0000 [id=27] INFO jenkins.InitReactorRunner$1#onAttained: Prepared all plugins
2021-04-03 11:15:47.660+0000 [id=26] INFO jenkins.InitReactorRunner$1#onAttained: Started all plugins
2021-04-03 11:15:47.680+0000 [id=27] INFO jenkins.InitReactorRunner$1#onAttained: Augmented all extensions
2021-04-03 11:15:48.620+0000 [id=26] INFO jenkins.InitReactorRunner$1#onAttained: System config loaded
2021-04-03 11:15:48.621+0000 [id=26] INFO jenkins.InitReactorRunner$1#onAttained: System config adapted
2021-04-03 11:15:48.621+0000 [id=27] INFO jenkins.InitReactorRunner$1#onAttained: Loaded all jobs
2021-04-03 11:15:48.622+0000 [id=27] INFO jenkins.InitReactorRunner$1#onAttained: Configuration for all jobs updated
2021-04-03 11:15:48.704+0000 [id=40] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$0: Started Download metadata
2021-04-03 11:15:48.722+0000 [id=40] INFO hudson.util.Retrier#start: Attempt #1 to do the action check updates server
2021-04-03 11:15:49.340+0000 [id=26] INFO jenkins.install.SetupWizard#init:
*************************************************************
Jenkins initial setup is required. An admin user has been created and a password generated.
Please use the following password to proceed to installation:
ab5dbf74145c405fb5a33456d4b97436
This may also be found at: /var/jenkins_home/secrets/initialAdminPassword
*************************************************************
2021-04-03 11:16:08.107+0000 [id=27] INFO jenkins.InitReactorRunner$1#onAttained: Completed initialization
2021-04-03 11:16:08.115+0000 [id=20] INFO hudson.WebAppMain$3#run: Jenkins is fully up and running
2021-04-03 11:16:08.331+0000 [id=40] INFO h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.tasks.Maven.MavenInstaller
2021-04-03 11:16:08.332+0000 [id=40] INFO hudson.util.Retrier#start: Performed the action check updates server successfully at the attempt #1
2021-04-03 11:16:08.334+0000 [id=40] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$0: Finished Download metadata. 19,626 ms
Is the Jenkins server started on port 8080? I'm asking because of this log message:
2021-04-03 11:15:45.241+0000 [id=1] INFO o.e.j.server.AbstractConnector#doStart: Started ServerConnector@4f0f2942{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
I tried changing jenkins-deployment.yaml to point at port 8080 instead of 50000, resulting in this updated jenkins-deployment.yaml:
apiVersion: v1
kind: Service
metadata:
  name: my-jenkins-pod
spec:
  ports:
    - protocol: "TCP"
      port: 8080
But the same error is returned when I attempt to access http://161.61.222.16:5001/
Are my port mappings incorrect? Is this the correct way to add an existing Docker image from Docker Hub to a Kubernetes cluster?
Update:
The result of the command kubectl describe services my-jenkins-pod-public is:
Name: my-jenkins-pod-public
Namespace: default
Labels: <none>
Annotations: kubernetes.digitalocean.com/load-balancer-id: d46ae9ae-6e8a-4fd8-aa58-43c08310059a
Selector: app=my-jenkins-pod
Type: LoadBalancer
IP Families: <none>
IP: 10.245.152.228
IPs: 10.245.152.228
LoadBalancer Ingress: 161.61.222.16
Port: <unset> 5001/TCP
TargetPort: 5001/TCP
NodePort: <unset> 30878/TCP
Endpoints: 10.214.12.12:5001
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Trying to access http://161.61.222.16:30878/ via browser returns:
This site can’t be reached. 159.65.211.46 refused to connect. Try:
Checking the connection Checking the proxy and the firewall
ERR_CONNECTION_REFUSED
Trying to access http://161.61.222.16:5001/ via browser returns:
This page isn’t working
161.61.222.16 didn’t send any data.
ERR_EMPTY_RESPONSE
It seems port 5001 is exposed and accessible but is not sending any data.
I also tried accessing 10.214.12.12 on ports 5001 & 30878 but both requests time out.

You need to use http://161.61.222.16:30878/ from outside the host that runs the containers. Port 5001 is only accessible inside the cluster via the internal IP (9.5.52.28 in your case). Whenever you expose your deployment, a NodePort (by default in the range 30000-32767) is automatically assigned to the service for external requests (you can also define it manually).
For the service details, run the command below; its output will give you the NodePort and other details.
kubectl describe services my-service
Please check the related Kubernetes documentation.
Also, you have configured the service with target port 5001, but Jenkins is listening on 8080 as far as I can see in the logs. Try changing the service's targetPort from 5001 to 8080.
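For reference, a corrected Service block for this setup could look like the sketch below (an assumption on my side: the Jenkins container keeps listening on 8080 and you still want the Service to expose port 50000 inside the cluster; adjust if not):
apiVersion: v1
kind: Service
metadata:
  name: my-jenkins-pod
spec:
  ports:
    - protocol: "TCP"
      port: 50000       # port exposed inside the cluster
      targetPort: 8080  # port Jenkins actually listens on
  selector:
    app: my-jenkins-pod
The my-jenkins-pod-public LoadBalancer created with kubectl expose copied the old container port (5001 -> 5001, as your describe output shows), so you would also need to delete and recreate it, for example:
kubectl --kubeconfig my-kubeconfig.yml expose deployment my-jenkins-pod --type=LoadBalancer --port=8080 --target-port=8080 --name=my-jenkins-pod-public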

Related

Failed to connect to spark-master:7077

I am trying to deploy my spark application on Kubernetes. I followed the below steps:
Installed spark-kubernetes-operator:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install gcp-spark-operator spark-operator/spark-operator
Created a spark-app.py
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

if __name__ == "__main__":
    spark = SparkSession.builder.appName('spark-on-kubernetes-test').getOrCreate()

    data2 = [("James","","Smith","36636","M",3000),
        ("Michael","Rose","","40288","M",4000),
        ("Robert","","Williams","42114","M",4000),
        ("Maria","Anne","Jones","39192","F",4000),
        ("Jen","Mary","Brown","","F",-1)
    ]

    schema = StructType([
        StructField("firstname", StringType(), True),
        StructField("middlename", StringType(), True),
        StructField("lastname", StringType(), True),
        StructField("id", StringType(), True),
        StructField("gender", StringType(), True),
        StructField("salary", IntegerType(), True)
    ])

    df = spark.createDataFrame(data=data2, schema=schema)
    df.printSchema()
    df.show(truncate=False)

    print("program is completed !")
Then I created the new image with my application:
FROM bitnami/spark
USER root
WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY spark-app.py .
Then I created the spark-application.yaml file:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: pyspark-app
namespace: default
spec:
type: Python
mode: cluster
image: "test/spark-k8s-app:1.0"
imagePullPolicy: Always
mainApplicationFile: local:///app/spark-app.py
sparkVersion: 3.3.0
restartPolicy:
type: OnFailure
onFailureRetries: 3
onFailureRetryInterval: 10
onSubmissionFailureRetries: 5
onSubmissionFailureRetryInterval: 20
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
serviceAccount: spark
labels:
version: 3.3.0
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
executor:
cores: 1
instances: 1
memory: "512m"
labels:
version: 3.3.0
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
But when I try to deploy the YAML file, I get the error below:
$ kubectl logs pyspark-app-driver
13:04:01.12
13:04:01.12 Welcome to the Bitnami spark container
13:04:01.12 Subscribe to project updates by watching https://github.com/bitnami/containers
13:04:01.12 Submit issues and feature requests at https://github.com/bitnami/containers/issues
13:04:01.12
22/09/24 13:04:04 INFO SparkContext: Running Spark version 3.3.0
22/09/24 13:04:04 INFO ResourceUtils: ==============================================================
22/09/24 13:04:04 INFO ResourceUtils: No custom resources configured for spark.driver.
22/09/24 13:04:04 INFO ResourceUtils: ==============================================================
22/09/24 13:04:04 INFO SparkContext: Submitted application: ml_framework
22/09/24 13:04:04 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/09/24 13:04:04 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
22/09/24 13:04:04 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/09/24 13:04:04 INFO SecurityManager: Changing view acls to: root
22/09/24 13:04:04 INFO SecurityManager: Changing modify acls to: root
22/09/24 13:04:04 INFO SecurityManager: Changing view acls groups to:
22/09/24 13:04:04 INFO SecurityManager: Changing modify acls groups to:
22/09/24 13:04:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users
with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
22/09/24 13:04:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/09/24 13:04:05 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
22/09/24 13:04:05 INFO SparkEnv: Registering MapOutputTracker
22/09/24 13:04:05 INFO SparkEnv: Registering BlockManagerMaster
22/09/24 13:04:05 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/09/24 13:04:05 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/09/24 13:04:05 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/09/24 13:04:05 INFO DiskBlockManager: Created local directory at /var/data/spark-019ba05b-dba8-4350-a281-ffa35b54d840/blockmgr-20d5c478-8e93-42f6-85a9-9ed070f50b2b
22/09/24 13:04:05 INFO MemoryStore: MemoryStore started with capacity 117.0 MiB
22/09/24 13:04:05 INFO SparkEnv: Registering OutputCommitCoordinator
22/09/24 13:04:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/09/24 13:04:06 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
22/09/24 13:04:10 WARN TransportClientFactory: DNS resolution failed for spark-master:7077 took 4005 ms
22/09/24 13:04:10 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master spark-master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:107)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Failed to connect to spark-master:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
... 4 more
Caused by: java.net.UnknownHostException: spark-master
at java.net.InetAddress.getAllByName0(InetAddress.java:1287)
at java.net.InetAddress.getAllByName(InetAddress.java:1199)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
22/09/24 13:04:26 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
22/09/24 13:04:30 WARN TransportClientFactory: DNS resolution failed for spark-master:7077 took 4006 ms
22/09/24 13:04:30 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master spark-master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:107)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Failed to connect to spark-master:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
... 4 more
Caused by: java.net.UnknownHostException: spark-master
at java.net.InetAddress.getAllByName0(InetAddress.java:1287)
at java.net.InetAddress.getAllByName(InetAddress.java:1199)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
How can I resolve this?

Can't access minikube service using NodePort from host on Mac

I'm trying to deploy a single web application to Minikube on my Mac, and then access it in the browser. I'm trying to use the simplest of setups, but it's not working, I just get a "connection refused" error and I can't figure out why.
This is what I'm trying:
$ minikube start --insecure-registry=docker.example.com:5000
😄 minikube v1.12.3 on Darwin 10.14.6
✨ Using the docker driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🔄 Restarting existing docker container for "minikube" ...
🐳 Preparing Kubernetes v1.18.3 on Docker 19.03.8 ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: default-storageclass, storage-provisioner
🏄 Done! kubectl is now configured to use "minikube"
$ eval $(minikube -p minikube docker-env)
$ docker build -t web-test .
Sending build context to Docker daemon 16.66MB
Step 1/3 : FROM docker.example.com/library/openjdk:11-jdk-slim
11-jdk-slim: Pulling from library/openjdk
bf5952930446: Pull complete
092c9b8e633f: Pull complete
0b793152b850: Pull complete
7900923f09cb: Pull complete
Digest: sha256:b5d8f95b23481a9d9d7e73c108368de74abb9833c3fae80e6bdfa750663d1b97
Status: Downloaded newer image for docker.example.com/library/openjdk:11-jdk-slim
---> de8b1b4806af
Step 2/3 : COPY target/web-test-0.0.1-SNAPSHOT.jar app.jar
---> 6838e3db240a
Step 3/3 : ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","app.jar"]
---> Running in 550bf762bf2d
Removing intermediate container 550bf762bf2d
---> ce1468d1ff10
Successfully built ce1468d1ff10
Successfully tagged web-test:latest
$ kubectl apply -f web-test-service.yaml
service/web-test unchanged
$ kubectl apply -f web-test-deployment.yaml
deployment.apps/web-test configured
$ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-test-6bb45ffc54-8mxbc 1/1 Running 0 16m 172.18.0.2 minikube <none> <none>
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 16m
web-test NodePort 10.102.19.201 <none> 8080:31317/TCP 16m
$ minikube ip
127.0.0.1
$ curl http://127.0.0.1:31317
curl: (7) Failed to connect to 127.0.0.1 port 31317: Connection refused
$ kubectl logs web-test-6bb45ffc54-8mxbc
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.3.3.RELEASE)
2020-08-26 14:45:32.692 INFO 1 --- [ main] com.example.web.WebTestApplication : Starting WebTestApplication v0.0.1-SNAPSHOT on web-test-6bb45ffc54-8mxbc with PID 1 (/app.jar started by root in /)
2020-08-26 14:45:32.695 INFO 1 --- [ main] com.example.web.WebTestApplication : No active profile set, falling back to default profiles: default
2020-08-26 14:45:34.041 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2020-08-26 14:45:34.053 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2020-08-26 14:45:34.053 INFO 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.37]
2020-08-26 14:45:34.135 INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2020-08-26 14:45:34.135 INFO 1 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 1355 ms
2020-08-26 14:45:34.587 INFO 1 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2020-08-26 14:45:34.797 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2020-08-26 14:45:34.810 INFO 1 --- [ main] com.example.web.WebTestApplication : Started WebTestApplication in 2.808 seconds (JVM running for 3.426)
$ minikube ssh
docker@minikube:~$ curl 10.102.19.201:8080
Up and Running
docker@minikube:~$
As you can see, the web app is up and running, and I can access it from inside the cluster by doing a minikube ssh, but from outside the cluster, it won't connect. These are my service and deployment manifests:
web-test-service.yaml:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: web-test
  name: web-test
spec:
  type: NodePort
  ports:
    - nodePort: 31317
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    app: web-test
web-test-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web-test
  name: web-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-test
  strategy: {}
  template:
    metadata:
      labels:
        app: web-test
    spec:
      containers:
        - image: web-test
          imagePullPolicy: Never
          name: web-test
          ports:
            - containerPort: 8080
      restartPolicy: Always
status: {}
Anyone have any idea what I'm doing wrong? Or perhaps how I could try to diagnose the issue further? I have also tried deploying an ingress, but that doesn't work either.
You mostly face this issue when minikube ip returns 127.0.0.1 (as it does with the docker driver). It should work if you use the node's internal IP from kubectl get node -o wide instead of 127.0.0.1.
A much easier approach, from the official reference docs, is to get the URL with minikube service web-test --url and use it in the browser, or run minikube service web-test to open the URL in the browser directly.
Your deployment YAMLs and everything else look good, and they should not have any issue when deployed to a remote cluster.
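For example (a sketch; the node's internal IP and the URL minikube prints will differ on your machine):
$ kubectl get node -o wide          # note the INTERNAL-IP column
$ curl http://<node-internal-ip>:31317
$ minikube service web-test --url   # prints a URL that is reachable from the host
$ curl $(minikube service web-test --url)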
It seems that this is related to the default docker driver used when you start minikube. To avoid these problems you can force a specific driver (e.g. "virtualbox"). To do so, follow these steps:
Remove old minikube with:
minikube delete
Start minikube with virtualbox driver:
minikube start --memory=4096 --driver=virtualbox
Run minikube ip. You'll see an output like 192.168.99.100.
Then, create again the Pods and the service and it should work properly.
I've found this info in this issue: https://github.com/kubernetes/minikube/issues/7344#issuecomment-703225254
You can access a Service from minikube with minikube service web-test
https://kubernetes.io/docs/tutorials/hello-minikube/#create-a-service
Edit:
If you have a deployment, you can expose it with the following command:
minikube kubectl -- expose deployment your-deployment --port 80 --type=LoadBalancer
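A side note not in the original answer: with a LoadBalancer service on minikube, the EXTERNAL-IP stays <pending> until you run minikube tunnel in a separate terminal:
$ minikube tunnel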
In case you have not already stumbled across it: there is a broader way to access a NodePort service that applies in general, rather than relying on minikube-specific constructs:
$ k get service -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP
default nginx LoadBalancer 10.43.228.207 172.22.0.240 80:30467/TCP 11h
$ kubectl port-forward --address 0.0.0.0 service/nginx 8082:80
Then from a different host on my network, I do: curl [the host running minikube]:8082
Forwarding from 0.0.0.0:8082 -> 80
Handling connection for 8082
Then you can connect from a different host as well.
The Docker Desktop UI for Mac and Windows provides an easier alternative to minikube: you can simply activate the Kubernetes feature in the Docker Desktop UI.
Once it is set up, right-click the Docker Desktop icon > Kubernetes.
To verify now that your deployment/service works properly:
kubectl apply -f /file.yaml
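A quick way to check the result afterwards (a sketch, assuming the web-test manifests from the question; Docker Desktop's Kubernetes makes NodePorts reachable on localhost):
$ kubectl config use-context docker-desktop
$ kubectl get pods
$ kubectl get svc web-test
$ curl http://localhost:31317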
One checkpoint to keep in mind for ports:
targetPort: 80
must match the port the container actually listens on (the one exposed in your Dockerfile or docker-compose file). If the ports are mismatched you won’t be able to access the service.
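For example (a sketch, not from the original answer): if the Dockerfile says EXPOSE 8080 and the app listens on 8080, the Service ports should line up like this:
ports:
  - port: 8080        # port the Service exposes inside the cluster
    targetPort: 8080  # port the container actually listens on
    nodePort: 31317   # optional fixed NodePort for external access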
Never rely on NodePort access with minikube on Mac: with the default docker driver the NodePort is not reachable directly from the host, so you will get ECONNREFUSED no matter what you try. Just use the docker-desktop context instead, kill minikube, and then re-apply your services. Minikube is only there to further confuse people who are learning Kubernetes.

Docker image works, Kubernetes Pod not working. Ubuntu. Log: /bin/sh: [npm,start]: not found

I'm taking a course that uses Kubernetes and am running into an error when I try to create a pod in Kubernetes.
I'm using Ubuntu, AMD64
I installed microk8s.kubectl following these instructions (https://ubuntu.com/kubernetes/install)
Here's my Dockerfile which runs correctly when I use only Docker.
FROM node:alpine
WORKDIR /app
COPY package.json ./
RUN npm install
COPY ./ ./
CMD ["npm", "start"]
Here's my posts.yaml file, verbatim from the course I'm taking:
apiVersion: v1
kind: Pod
metadata:
  name: posts
spec:
  containers:
    - name: posts
      image: emendoza1986/blog_posts:0.0.1
output from kubectl get pods
NAME READY STATUS RESTARTS AGE
posts 0/1 CrashLoopBackOff 6 10m
output from kubectl logs posts
/bin/sh: [npm,start]: not found
output from kubectl describe pod posts
Name: posts
Namespace: default
Priority: 0
Node: desktope/192.168.0.18
Start Time: Thu, 23 Jul 2020 10:58:40 -0700
Labels: <none>
Annotations: Status: Running
IP: 10.1.87.20
IPs:
IP: 10.1.87.20
Containers:
posts:
Container ID: containerd://acb403c53759670370959cfa2cc0939f53126aee889e1f6dc2e831bc4dc22c3c
Image: emendoza1986/blog_posts:0.0.1
Image ID: docker.io/emendoza1986/blog_posts@sha256:f69b30cf0382d4c273643ac11c505378854b966063974cc57d187718cc0b0fd5
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 127
Started: Thu, 23 Jul 2020 10:58:59 -0700
Finished: Thu, 23 Jul 2020 10:58:59 -0700
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-2fm2c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-2fm2c:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-2fm2c
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 48s default-scheduler Successfully assigned default/posts to desktope
Normal Pulled 29s (x3 over 47s) kubelet, desktope Container image "emendoza1986/blog_posts:0.0.1" already present on machine
Normal Created 29s (x3 over 47s) kubelet, desktope Created container posts
Normal Started 29s (x3 over 47s) kubelet, desktope Started container posts
Warning BackOff 12s (x4 over 45s) kubelet, desktope Back-off restarting failed container
output from microk8s.status
microk8s is running
addons:
dashboard: enabled
dns: enabled
metrics-server: enabled
cilium: disabled
fluentd: disabled
gpu: disabled
helm: disabled
helm3: disabled
host-access: disabled
ingress: disabled
istio: disabled
jaeger: disabled
knative: disabled
kubeflow: disabled
linkerd: disabled
metallb: disabled
prometheus: disabled
rbac: disabled
registry: disabled
storage: disabled
output from microk8s inspect
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Building the report tarball
Report tarball is at /var/snap/microk8s/1503/inspection-report-20200723_112646.tar.gz
I see the error coming from the log but I haven't been able to find a solution. Thank you for your help!
Thank you for the helpful comments. Originally I had my Dockerfile as
CMD ['npm', 'start'].
I had fixed it locally to
CMD ["npm", "start"]
but I didn't push the new version to Docker Hub.
Pushing the new version fixed the problem.
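For reference, the rebuild-and-push cycle looks roughly like this (a sketch, assuming the same image name and tag as in posts.yaml; bumping the tag instead of reusing 0.0.1 avoids a node keeping a stale cached image):
$ docker build -t emendoza1986/blog_posts:0.0.1 .
$ docker push emendoza1986/blog_posts:0.0.1
$ kubectl delete pod posts
$ kubectl apply -f posts.yaml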

Kubernetes Calico node 'XXXXXXXXXXX' already using IPv4 Address XXXXXXXXX, CrashLoopBackOff

I used the AWS Kubernetes Quickstart to create a Kubernetes cluster in a VPC and private subnet: https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/doc/heptio-kubernetes-on-the-aws-cloud.pdf. It was running fine for a while. I have Calico installed on my Kubernetes cluster. I have two nodes and a master. The calico pods on the master are running fine, the ones on the nodes are in crashloopbackoff state:
NAME READY STATUS RESTARTS AGE
calico-etcd-ztwjj 1/1 Running 1 55d
calico-kube-controllers-685755779f-ftm92 1/1 Running 2 55d
calico-node-gkjgl 1/2 CrashLoopBackOff 270 22h
calico-node-jxkvx 2/2 Running 4 55d
calico-node-mxhc5 1/2 CrashLoopBackOff 9 25m
Describing one of the crashed pods:
ubuntu@ip-10-0-1-133:~$ kubectl describe pod calico-node-gkjgl -n kube-system
Name: calico-node-gkjgl
Namespace: kube-system
Node: ip-10-0-0-237.us-east-2.compute.internal/10.0.0.237
Start Time: Mon, 17 Sep 2018 16:56:41 +0000
Labels: controller-revision-hash=185957727
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.0.0.237
Controlled By: DaemonSet/calico-node
Containers:
calico-node:
Container ID: docker://d89979ba963c33470139fd2093a5427b13c6d44f4c6bb546c9acdb1a63cd4f28
Image: quay.io/calico/node:v3.1.1
Image ID: docker-pullable://quay.io/calico/node@sha256:19fdccdd4a90c4eb0301b280b50389a56e737e2349828d06c7ab397311638d29
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 18 Sep 2018 15:14:44 +0000
Finished: Tue, 18 Sep 2018 15:14:44 +0000
Ready: False
Restart Count: 270
Requests:
cpu: 250m
Liveness: http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: kubeadm,bgp
CALICO_DISABLE_FILE_LOGGING: true
CALICO_K8S_NODE_REF: (v1:spec.nodeName)
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_IPV4POOL_IPIP: Always
FELIX_IPV6SUPPORT: false
FELIX_IPINIPMTU: 1440
FELIX_LOGSEVERITYSCREEN: info
IP: autodetect
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
install-cni:
Container ID: docker://b37e0ec7eba690473a4999a31d9f766f7adfa65f800a7b2dc8e23ead7520252d
Image: quay.io/calico/cni:v3.1.1
Image ID: docker-pullable://quay.io/calico/cni@sha256:dc345458d136ad9b4d01864705895e26692d2356de5c96197abff0030bf033eb
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Running
Started: Mon, 17 Sep 2018 17:11:52 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 17 Sep 2018 16:56:43 +0000
Finished: Mon, 17 Sep 2018 17:10:53 +0000
Ready: True
Restart Count: 1
Environment:
CNI_CONF_NAME: 10-calico.conflist
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
calico-cni-plugin-token-b7sfl:
Type: Secret (a volume populated by a Secret)
SecretName: calico-cni-plugin-token-b7sfl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
:NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m (x6072 over 22h) kubelet, ip-10-0-0-237.us-east-2.compute.internal Back-off restarting failed container
The logs for the same pod:
ubuntu@ip-10-0-1-133:~$ kubectl logs calico-node-gkjgl -n kube-system -c calico-node
2018-09-18 15:14:44.605 [INFO][8] startup.go 251: Early log level set to info
2018-09-18 15:14:44.605 [INFO][8] startup.go 269: Using stored node name from /var/lib/calico/nodename
2018-09-18 15:14:44.605 [INFO][8] startup.go 279: Determined node name: ip-10-0-0-237.us-east-2.compute.internal
2018-09-18 15:14:44.609 [INFO][8] startup.go 101: Skipping datastore connection test
2018-09-18 15:14:44.610 [INFO][8] startup.go 352: Building new node resource Name="ip-10-0-0-237.us-east-2.compute.internal"
2018-09-18 15:14:44.610 [INFO][8] startup.go 367: Initialize BGP data
2018-09-18 15:14:44.614 [INFO][8] startup.go 564: Using autodetected IPv4 address on interface ens3: 10.0.0.237/19
2018-09-18 15:14:44.614 [INFO][8] startup.go 432: Node IPv4 changed, will check for conflicts
2018-09-18 15:14:44.618 [WARNING][8] startup.go 861: Calico node 'ip-10-0-0-237' is already using the IPv4 address 10.0.0.237.
2018-09-18 15:14:44.618 [WARNING][8] startup.go 1058: Terminating
Calico node failed to start
So it seems like there is a conflict finding the node IP address, or Calico thinks the IP is already assigned to another node. Doing a quick search, I found this thread: https://github.com/projectcalico/calico/issues/1628. I see that this should be resolved by setting IP_AUTODETECTION_METHOD to can-reach=DESTINATION, which I'm assuming would be "can-reach=10.0.0.237". This config is an environment variable set on the calico/node container. I have been attempting to shell into the container itself, but kubectl tells me the container is not found:
ubuntu@ip-10-0-1-133:~$ kubectl exec calico-node-gkjgl --stdin --tty /bin/sh -c calico-node -n kube-system
error: unable to upgrade connection: container not found ("calico-node")
I suspect this is because Calico is unable to assign IPs. So I logged onto the host and attempted to shell into the container using docker:
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/bash
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
So I guess there is no shell to execute in the container, which explains why Kubernetes couldn't exec into it. I tried running commands externally to list environment variables, but I haven't been able to find any; I could be running these commands wrong, however:
root@ip-10-0-0-237:~# docker inspect -f '{{range $index, $value := .Config.Env}}{{$value}} {{end}}' k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 printenv IP_AUTODETECTION_METHOD
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"printenv\": executable file not found in $PATH"
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/env
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/env\": stat /bin/env: no such file or directory"
Okay, so maybe I am going about this the wrong way. Should I attempt to change the Calico config files using Kubernetes and redeploy it? Where can I find these on my system? I haven't been able to find where to set the environment variables.
If you look at the Calico docs, IP_AUTODETECTION_METHOD already defaults to first-found.
My guess is that the IP address is not being released by a previous 'run' of Calico, or it is simply a bug in the v3.1.1 version of Calico.
Try:
Delete your Calico pods that are in a CrashLoopBackOff loop:
kubectl -n kube-system delete pod calico-node-gkjgl calico-node-mxhc5
Your pods will be re-created and hopefully initialize.
Upgrade Calico to v3.1.3 or the latest version. Follow these docs. My guess is that Heptio's Calico installation is using the etcd datastore.
Try to understand how Heptio's AWS AMIs work and see if there are any issues with them. This might take some time so you could contact their support as well.
Try a different method to install Kubernetes with Calico. Well documented on https://kubernetes.io
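If you still want to set IP_AUTODETECTION_METHOD as the question suggests, one way (a sketch, assuming the stock calico-node DaemonSet in kube-system) is:
kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=can-reach=10.0.0.237
or edit the DaemonSet (kubectl -n kube-system edit daemonset calico-node) and add the variable to the calico-node container's env list:
- name: IP_AUTODETECTION_METHOD
  value: "can-reach=10.0.0.237"
Either way the DaemonSet controller rolls the pods with the new environment.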
For me, what worked was to remove leftover Docker networks on the nodes.
I had to list the current networks on each node with docker network list and then remove the unneeded ones with docker network rm <networkName>.
After doing that, the Calico pods were running fine.
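If dropping every unused network on a node is acceptable in your environment (a side note, not part of the original answer), the same cleanup can be done in one step per node:
docker network prune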

Running kubernetes autoscaler

I have a replication controller running with the following spec:
apiVersion: v1
kind: ReplicationController
metadata:
  name: owncloud-controller
spec:
  replicas: 1
  selector:
    app: owncloud
  template:
    metadata:
      labels:
        app: owncloud
    spec:
      containers:
        - name: owncloud
          image: adimania/owncloud9-centos7
          ports:
            - containerPort: 80
          volumeMounts:
            - name: userdata
              mountPath: /var/www/html/owncloud/data
          resources:
            requests:
              cpu: 400m
      volumes:
        - name: userdata
          hostPath:
            path: /opt/data
Now I create an HPA using the autoscale command:
$ kubectl autoscale rc owncloud-controller --max=5 --cpu-percent=10
I have also started Heapster using the kubectl run command:
$ kubectl run heapster --image=gcr.io/google_containers/heapster:v1.0.2 --command -- /heapster --source=kubernetes:http://192.168.0.103:8080?inClusterConfig=false --sink=log
After all this, the autoscaling never kicks in. From the logs, it seems that the actual CPU utilization is not being reported.
$ kubectl describe hpa owncloud-controller
Name: owncloud-controller
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 26 May 2016 14:24:51 +0530
Reference: ReplicationController/owncloud-controller/scale
Target CPU utilization: 10%
Current CPU utilization: <unset>
Min replicas: 1
Max replicas: 5
ReplicationController pods: 1 current / 1 desired
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
44m 8s 92 {horizontal-pod-autoscaler } Warning FailedGetMetrics failed to get CPU consumption and request: metrics obtained for 0/1 of pods
44m 8s 92 {horizontal-pod-autoscaler } Warning FailedComputeReplicas failed to get CPU utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
What am I missing here?
Most probably Heapster is running in the wrong namespace ("default"). HPA expects Heapster to be in the "kube-system" namespace. Please add --namespace=kube-system to the kubectl run heapster command.
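Applied to the command from the question, that would look like this (a sketch; only the namespace flag is added, the source and sink flags are unchanged):
$ kubectl run heapster --namespace=kube-system --image=gcr.io/google_containers/heapster:v1.0.2 --command -- /heapster --source=kubernetes:http://192.168.0.103:8080?inClusterConfig=false --sink=log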
I installed Heapster under the namespace "kube-system" and it worked. After starting Heapster, make sure it is running before you use HPA for your application.
How to run Heapster with Kubernetes cluster
I put all the files here: https://gitlab.com/abushoeb/kubernetes/tree/master/heapster. They are collected from the official Kubernetes repository, with minor changes.
How to run Heapster
Go to the heapster directory where you have grafana.yaml, heapster.yaml and influxdb.yaml and run the following command:
$ kubectl create -f .
How to stop Heapster
Go to the same heapster directory and then run the following command:
$ kubectl delete -f .
How to check Heapster is running
You can access the Heapster metric model from the pod where Heapster is running to make sure Heapster is working. It can be accessed via a web browser at http://heapster-pod-ip:heapster-service-port/api/v1/model/metrics/. The same result can be seen by executing the following command:
$ curl -L http://heapster-pod-ip:heapster-service-port/api/v1/model/metrics/
If you see the list of metrics then Heapster is running correctly. You can also browse the Grafana dashboard to see it (find the IP of the pod where Grafana is running and then access http://grafana-pod-ip:grafana-service-port).
The full documentation of the Heapster Metric Model is available here.
Also, just run kubectl cluster-info and see if it shows results like this:
Kubernetes master is running at https://cluster-ip:6443
Heapster is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/heapster
kubernetes-dashboard is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
monitoring-grafana is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
monitoring-influxdb is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb
Check InfluxDB
You can also check whether InfluxDB has data in it. Install the InfluxDB client on your local machine to connect to the InfluxDB database.
$ influx -host <cluster-ip> -port <influxdb-service-port>
Some sample InfluxDB queries:
show databases
use db-name
show measurements
select value from "cpu/node_capacity"
Reference and Help
https://github.com/kubernetes/heapster/blob/master/docs/influxdb.md
https://github.com/kubernetes/heapster/blob/master/docs/debugging.md
https://blog.kublr.com/how-to-utilize-the-heapster-influxdb-grafana-stack-in-kubernetes-for-monitoring-pods-4a553f4d36c9
http://www.dasblinkenlichten.com/installing-cadvisor-and-heapster-on-bare-metal-kubernetes/
http://blog.arungupta.me/kubernetes-monitoring-heapster-influxdb-grafana/
