RabbitMQ rabbits cannot find each other on Kubernetes - docker

To create a RabbitMQ cluster within my kubernetes cluster, I use kubernetes objects definition from the following source: https://wesmorgan.svbtle.com/rabbitmq-cluster-on-kubernetes-with-statefulsets
which are:
---
apiVersion: v1
kind: Service
metadata:
# Expose the management HTTP port on each node
name: rabbitmq-management
labels:
app: rabbitmq
spec:
ports:
- port: 15672
name: http
selector:
app: rabbitmq
type: NodePort # Or LoadBalancer in production w/ proper security
---
apiVersion: v1
kind: Service
metadata:
# The required headless service for StatefulSets
name: rabbitmq
labels:
app: rabbitmq
spec:
ports:
- port: 5672
name: amqp
- port: 4369
name: epmd
- port: 25672
name: rabbitmq-dist
clusterIP: None
selector:
app: rabbitmq
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: rabbitmq
spec:
serviceName: "rabbitmq"
replicas: 5
template:
metadata:
labels:
app: rabbitmq
spec:
containers:
- name: rabbitmq
image: rabbitmq:3.6.6-management-alpine
lifecycle:
postStart:
exec:
command:
- /bin/sh
- -c
- >
if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi;
until rabbitmqctl node_health_check; do sleep 1; done;
if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit#rabbitmq-0;
rabbitmqctl start_app;
fi;
rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
env:
- name: RABBITMQ_ERLANG_COOKIE
valueFrom:
secretKeyRef:
name: rabbitmq-config
key: erlang-cookie
ports:
- containerPort: 5672
name: amqp
volumeMounts:
- name: rabbitmq
mountPath: /var/lib/rabbitmq
volumeClaimTemplates:
- metadata:
name: rabbitmq
annotations:
volume.alpha.kubernetes.io/storage-class: anything
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi # make this bigger in production
With this configuration, pods cannot find each other on the network with the following error:
unable to connect to epmd (port 4369) on rabbitmq-1: nxdomain (non-existing domain)

Related

issues setting up traefik with kubernetes using a simple container

Not sure what I am missing, trying to set up a simple traefik environment with kubernetes proxying the errm/cheese:cheddar docker container to cheddar.minikube
Prerequisite:
have minikube setup
git clone # personal repo that is now deleted. see solution below
# setup.sh will delete current minikube environment then recreate it
./setup.sh
# add IP to minikube
echo `minikube ip` cheddar.minikube | sudo tee -a /etc/hosts
after running
minikube delete
minikube start
kubectl apply -f https://raw.githubusercontent.com/traefik/traefik/v2.9/docs/content/reference/dynamic-configuration/kubernetes-crd-definition-v1.yml
kubectl apply -f https://raw.githubusercontent.com/traefik/traefik/v2.9/docs/content/reference/dynamic-configuration/kubernetes-crd-rbac.yml
kubectl apply -f traefik-deployment.yaml -f traefik-whoami.yaml
with...
traefik-deployment.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
namespace: default
name: traefik-ingress-controller
---
kind: Deployment
apiVersion: apps/v1
metadata:
namespace: default
name: traefik
labels:
app: traefik
spec:
replicas: 1
selector:
matchLabels:
app: traefik
template:
metadata:
labels:
app: traefik
spec:
hostNetwork: true
serviceAccountName: traefik-ingress-controller
containers:
- name: traefik
image: traefik:v2.9
args:
- --api.insecure
- --accesslog
- --entrypoints.web.Address=:80
- --entrypoints.websecure.Address=:443
- --providers.kubernetescrd
ports:
- name: web
containerPort: 8000
# hostPort: 80
- name: websecure
containerPort: 4443
# hostPort: 443
- name: admin
containerPort: 8080
# hostPort: 8080
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
---
apiVersion: v1
kind: Service
metadata:
name: traefik
spec:
ports:
- protocol: TCP
name: web
port: 80
- protocol: TCP
name: admin
port: 8080
- protocol: TCP
name: websecure
port: 443
selector:
app: traefik
traefik-whoami.yaml:
kind: Deployment
apiVersion: apps/v1
metadata:
namespace: default
name: whoami
labels:
app: whoami
spec:
replicas: 2
selector:
matchLabels:
app: whoami
template:
metadata:
labels:
app: whoami
spec:
containers:
- name: whoami
image: traefik/whoami
ports:
- name: web
containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: whoami
spec:
ports:
- protocol: TCP
name: web
port: 80
selector:
app: whoami
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: simpleingressroute
namespace: default
spec:
entryPoints:
- web
routes:
- match: PathPrefix(`/notls`)
kind: Rule
services:
- name: whoami
port: 80
I was able to get a simple container working with traefik in kubernetes at:
echo `minikube ip`/notls

Rabbitmq pod on kubernetes struck in pod initialization state

I am running 3 nodes rabbitmq cluster on the Kubernetes. Kubernetes cluster is running on the AWS spot instances and somehow one of the Kubernetes nodes got terminated unexpectedly on with one of the Rabbitmq pods was running. Now the pod git scheduled ona another node and Since then my rabbitmq pod is stuck in the pod initialization state.
Kubernetes event says "FailedPostStartHook".
Logs:
9m46s Warning FailedPostStartHook pod/rabbitmq-0 Exec lifecycle hook ([/bin/sh -c until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done; rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
]) for Container "rabbitmq" in Pod "rabbitmq-0_devops(c96c1a6e-bf9a-450d-828d-ed0e8a0ad949)" failed - error: command '/bin/sh -c until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done; rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
' exited with 137: Error: unable to perform an operation on node 'rabbit#rabbitmq-0.rabbitmq-service.devops.svc.cluster.local'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit#rabbitmq-0.rabbitmq-service.devops.svc.cluster.local
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit#rabbitmq-0.rabbitmq-service.devops.svc.cluster.local']
Kubernetes statefulset manifest:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: rabbitmq
namespace: devops
spec:
podManagementPolicy: OrderedReady
replicas: 3
revisionHistoryLimit: 3
selector:
matchLabels:
app: rabbitmq
serviceName: rabbitmq-service
template:
metadata:
annotations:
labels:
app: rabbitmq
name: rabbitmq
spec:
containers:
- env:
- name: HOSTNAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: RABBITMQ_USE_LONGNAME
value: "true"
- name: RABBITMQ_BASIC_AUTH
valueFrom:
secretKeyRef:
key: password
name: rabbitmq
- name: RABBITMQ_NODENAME
value: rabbit#$(HOSTNAME).rabbitmq-service.$(NAMESPACE).svc.cluster.local
- name: K8S_SERVICE_NAME
value: rabbitmq-service
- name: RABBITMQ_DEFAULT_USER
value: admin
- name: RABBITMQ_DEFAULT_PASS
valueFrom:
secretKeyRef:
key: password
name: rabbitmq
- name: RABBITMQ_ERLANG_COOKIE
value: some-cookie
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: rabbitmq:3.8.1-management-alpine
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- /bin/sh
- -c
- |
until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done; rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
livenessProbe:
exec:
command:
- rabbitmqctl
- status
failureThreshold: 3
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
name: rabbitmq
ports:
- containerPort: 4369
protocol: TCP
- containerPort: 5672
protocol: TCP
- containerPort: 5671
protocol: TCP
- containerPort: 25672
protocol: TCP
- containerPort: 15672
protocol: TCP
readinessProbe:
exec:
command:
- rabbitmqctl
- status
failureThreshold: 3
initialDelaySeconds: 20
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
resources:
limits:
cpu: "2"
memory: 3Gi
requests:
cpu: "1"
memory: 2Gi
volumeMounts:
- mountPath: /var/lib/rabbitmq/
name: rabbitmq-data
- mountPath: /etc/rabbitmq
name: config
dnsPolicy: ClusterFirst
initContainers:
- command:
- /bin/bash
- -euc
- |
rm -f /var/lib/rabbitmq/.erlang.cookie
cp /rabbitmqconfig/rabbitmq.conf /etc/rabbitmq/rabbitmq.conf
cp /rabbitmqconfig/enabled_plugins /etc/rabbitmq/enabled_plugins
image: rabbitmq:3.8.1-management-alpine
imagePullPolicy: Always
name: copy-rabbitmq-config
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /rabbitmqconfig
name: rabbitmq-configmap
- mountPath: /etc/rabbitmq
name: config
- mountPath: /var/lib/rabbitmq
name: rabbitmq-data
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: rabbitmq
serviceAccountName: rabbitmq
terminationGracePeriodSeconds: 10
volumes:
- configMap:
defaultMode: 420
items:
- key: rabbitmq.conf
path: rabbitmq.conf
- key: enabled_plugins
path: enabled_plugins
name: rabbitmq-configmap
name: rabbitmq-configmap
- emptyDir: {}
name: config
updateStrategy:
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
name: rabbitmq-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: gp2
volumeMode: Filesystem
Things I have tried:
Logged into the struck pod and executed(This command just struck without any response)
rabbitmqctl stop_app
Tried deleting the pod forcefully but no luck.
Logged into the struck pod and executed
rabbitmqctl reset
Logged into the struck pod and executed
rabbitmqctl force_boot
Logged into the struck pod and executed
rm /var/log/rabbitmq/*
None of the above things helped.
Please note that the other 2 rabbitmq nodes are running fine and serving the traffic and showing the failed node as up:
rabbitmq-2 rabbitmq 2021-07-04 12:19:07.233 [info] <0.490.0> node 'rabbit#rabbitmq-0.rabbitmq-service.devops.svc.cluster.local' up
rabbitmq-1 rabbitmq 2021-07-04 12:19:07.208 [info] <0.494.0> node 'rabbit#rabbitmq-0.rabbitmq-service.devops.svc.cluster.local' up
Running the rollout restart of statefulset command worked for me.
kubectl rollout restart statefulset rabbitmq -n devops
After this command the rabbitmq cluster is up and running and all the three nodes joined the cluster without any issue.
Once this is done its required to restart the applications which are connecting to this rabbitmq cluster.

Convert Kafka docker run command to Kubernetes

The Kubernetes yaml below creates resources as expected and there are no errors/restarts. External requests to the kafka broker at localhost:80005 hang indefinitely. My understanding is that the docker notion of a "network" is achieved in Kubernetes via services. In this case, I have a NodePort service which brings traffic into the cluster and then a ClusterIp port which handles communication between containers. Based on the non-Kubernetes, docker run version of this Zookeeper/Kafka configuration, I suspect my issue lies somewhere in my service and/or advertised listeners. The internet is replete with docker compose files and helm charts but I don't seem to be able to find a combination of configurations that work.
Kafka/Zookeeper YAML:
# Zookeeper
apiVersion: apps/v1
kind: Deployment
metadata:
name: zookeeper-deploy
spec:
replicas: 1
selector:
matchLabels:
app: zookeeper-app
template:
metadata:
labels:
app: zookeeper-app
spec:
containers:
- name: zookeeper-app
image: confluentinc/cp-zookeeper:5.4.1
ports:
- containerPort: 2181
env:
- name: ZOOKEEPER_TICK_TIME
value: "2000"
- name: ZOOKEEPER_CLIENT_PORT
value: "2181"
---
apiVersion: v1
kind: Service
metadata:
name: zk-cluster-svc
labels:
app: zk
spec:
type: ClusterIP
selector:
app: zookeeper-app
ports:
- port: 2181
targetPort: 2181
protocol: TCP
---
# Kafka
apiVersion: apps/v1
kind: Deployment
metadata:
name: kafka-deploy
spec:
replicas: 1
selector:
matchLabels:
app: kafka-app
template:
metadata:
labels:
app: kafka-app
spec:
containers:
- name: kafka-app
image: confluentinc/cp-kafka:5.4.1
ports:
- containerPort: 9092
- containerPort: 29092
env:
- name: KAFKA_BROKER_ID
value: "1"
- name: KAFKA_ZOOKEEPER_CONNECT
value: "zk-cluster-svc:2181"
- name: KAFKA_ADVERTISED_LISTENERS
value: "INSIDE://kafka-cluster-svc:29092,OUTSIDE://localhost:30005"
- name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
value: "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT"
- name: KAFKA_INTER_BROKER_LISTENER_NAME
value: "INSIDE"
- name: KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR
value: "1"
---
apiVersion: v1
kind: Service
metadata:
name: kafka-service
labels:
app: kafka-service
spec:
type: NodePort
ports:
- port: 9092
nodePort: 30005
protocol: TCP
selector:
app: kafka-app
---
apiVersion: v1
kind: Service
metadata:
name: kafka-cluster-svc
labels:
app: kafka-app
spec:
selector:
app: kafka-app
ports:
- port: 29092
targetPort: 29092
protocol: TCP
^ This is based on a working docker run version of these containers:
docker network create kafzoonet
docker run --name app-zookeeper -d --network="kafzoonet" --env ZOOKEEPER_TICK_TIME=2000 --env ZOOKEEPER_CLIENT_PORT=2181 confluentinc/cp-zookeeper:5.4.1
docker run --name app-kafka -p 30005:9092 -d --network="kafzoonet" --env KAFKA_BROKER_ID=1 --env KAFKA_ZOOKEEPER_CONNECT="app-zookeeper:2181" --env KAFKA_ADVERTISED_LISTENERS="PLAINTEXT://app-kafka:29092,PLAINTEXT_HOST://localhost:30005" --env KAFKA_LISTENER_SECURITY_PROTOCOL_MAP="PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT" --env KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT --env KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 confluentinc/cp-kafka:5.4.1
Are the services configured correctly for a single Kafka node? It appears that Kafka is able to reach Zookeeper but for some reason when a client connects, it isn't getting the correct listener echoed back. Unfortunately I don't have any error messages to support this as the client hangs and neither the Kafka or Zookeeper pods log anything.

Kubernetes: Modeling Jobs/Cron tasks for Postgres + Tomcat application

I work on an open source system that is comprised of a Postgres database and a tomcat server. I have docker images for each component. We currently use docker-compose to test the application.
I am attempting to model this application with kubernetes.
Here is my first attempt.
apiVersion: v1
kind: Pod
metadata:
name: dspace-pod
spec:
volumes:
- name: "pgdata-vol"
emptyDir: {}
- name: "assetstore"
emptyDir: {}
- name: my-local-config-map
configMap:
name: local-config-map
containers:
- image: dspace/dspace:dspace-6_x
name: dspace
ports:
- containerPort: 8080
name: http
protocol: TCP
volumeMounts:
- mountPath: "/dspace/assetstore"
name: "assetstore"
- mountPath: "/dspace/config/local.cfg"
name: "my-local-config-map"
subPath: local.cfg
#
- image: dspace/dspace-postgres-pgcrypto
name: dspacedb
ports:
- containerPort: 5432
name: http
protocol: TCP
volumeMounts:
- mountPath: "/pgdata"
name: "pgdata-vol"
env:
- name: PGDATA
value: /pgdata
I have a configMap that is setting the hostname to the name of the pod.
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2016-02-18T19:14:38Z
name: local-config-map
namespace: default
data:
local.cfg: |-
dspace.dir = /dspace
db.url = jdbc:postgresql://dspace-pod:5432/dspace
dspace.hostname = dspace-pod
dspace.baseUrl = http://dspace-pod:8080
solr.server=http://dspace-pod:8080/solr
This application has a number of tasks that are run from the command line.
I have created a 3rd Docker image that contains the jars that are needed on the command line.
I am interested in modeling these command line tasks as Jobs in Kubernetes. Assuming that is a appropriate way to handle these tasks, how do I specify that a job should run within a Pod that is already running?
Here is my first attempt at defining a job.
apiVersion: batch/v1
kind: Job
#https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
metadata:
name: dspace-create-admin
spec:
template:
spec:
volumes:
- name: "assetstore"
emptyDir: {}
- name: my-local-config-map
configMap:
name: local-config-map
containers:
- name: dspace-cli
image: dspace/dspace-cli:dspace-6_x
command: [
"/dspace/bin/dspace",
"create-administrator",
"-e", "test#test.edu",
"-f", "test",
"-l", "admin",
"-p", "admin",
"-c", "en"
]
volumeMounts:
- mountPath: "/dspace/assetstore"
name: "assetstore"
- mountPath: "/dspace/config/local.cfg"
name: "my-local-config-map"
subPath: local.cfg
restartPolicy: Never
The following configuration has allowed me to start my services (tomcat and postgres) as I hoped.
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2016-02-18T19:14:38Z
name: local-config-map
namespace: default
data:
# example of a simple property defined using --from-literal
#example.property.1: hello
#example.property.2: world
# example of a complex property defined using --from-file
local.cfg: |-
dspace.dir = /dspace
db.url = jdbc:postgresql://dspacedb-service:5432/dspace
dspace.hostname = dspace-service
dspace.baseUrl = http://dspace-service:8080
solr.server=http://dspace-service:8080/solr
---
apiVersion: v1
kind: Service
metadata:
name: dspacedb-service
labels:
app: dspacedb-app
spec:
type: NodePort
selector:
app: dspacedb-app
ports:
- protocol: TCP
port: 5432
# targetPort: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: dspacedb-deploy
labels:
app: dspacedb-app
spec:
selector:
matchLabels:
app: dspacedb-app
template:
metadata:
labels:
app: dspacedb-app
spec:
volumes:
- name: "pgdata-vol"
emptyDir: {}
containers:
- image: dspace/dspace-postgres-pgcrypto
name: dspacedb
ports:
- containerPort: 5432
name: http
protocol: TCP
volumeMounts:
- mountPath: "/pgdata"
name: "pgdata-vol"
env:
- name: PGDATA
value: /pgdata
---
apiVersion: v1
kind: Service
metadata:
name: dspace-service
labels:
app: dspace-app
spec:
type: NodePort
selector:
app: dspace-app
ports:
- protocol: TCP
port: 8080
targetPort: 8080
name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: dspace-deploy
labels:
app: dspace-app
spec:
selector:
matchLabels:
app: dspace-app
template:
metadata:
labels:
app: dspace-app
spec:
volumes:
- name: "assetstore"
emptyDir: {}
- name: my-local-config-map
configMap:
name: local-config-map
containers:
- image: dspace/dspace:dspace-6_x-jdk8-test
name: dspace
ports:
- containerPort: 8080
name: http
protocol: TCP
volumeMounts:
- mountPath: "/dspace/assetstore"
name: "assetstore"
- mountPath: "/dspace/config/local.cfg"
name: "my-local-config-map"
subPath: local.cfg
After applying the configuration above, I have the following results.
$ kubectl get services -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
dspace-service NodePort 10.104.224.245 <none> 8080:32459/TCP 3s app=dspace-app
dspacedb-service NodePort 10.96.212.9 <none> 5432:30947/TCP 3s app=dspacedb-app
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22h <none>
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dspace-deploy-c59b77bb8-mr47k 1/1 Running 0 10m
dspacedb-deploy-58dd85f5b9-6v2lf 1/1 Running 0 10
I was pleased to see that the service name can be used for port forwarding.
$ kubectl port-forward service/dspace-service 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
I am also able to run the following job using the defined service names in the configMap.
apiVersion: batch/v1
kind: Job
metadata:
name: dspace-create-admin
spec:
template:
spec:
volumes:
- name: "assetstore"
emptyDir: {}
- name: my-local-config-map
configMap:
name: local-config-map
containers:
- name: dspace-cli
image: dspace/dspace-cli:dspace-6_x
command: [
"/dspace/bin/dspace",
"create-administrator",
"-e", "test#test.edu",
"-f", "test",
"-l", "admin",
"-p", "admin",
"-c", "en"
]
volumeMounts:
- mountPath: "/dspace/assetstore"
name: "assetstore"
- mountPath: "/dspace/config/local.cfg"
name: "my-local-config-map"
subPath: local.cfg
restartPolicy: Never
Results
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dspace-create-admin-kl6wd 0/1 Completed 0 5m
dspace-deploy-c59b77bb8-mr47k 1/1 Running 0 10m
dspacedb-deploy-58dd85f5b9-6v2lf 1/1 Running 0 10m
I still have some work to do persisting the volumes.

DNS not working with Kubernetes PetSet

Ok, following the examples and documentation on the Kubernetes website along with extensive research on Google, I still cannot get DNS resolution between the containers within my Pod.
I have a Service and a PetSet with 2 containers defined. When I deploy the PetSet and Service, they start and run successfully, but if I attempt to ping the host of one of my containers from the other by hostname or by the full domain name I get destination unreachable. I can ping by IP address though.
Here is my Kubernetes configuration file:
apiVersion: v1
kind: Service
metadata:
name: ml-service
labels:
app: marklogic
annotations:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
#restartPolicy: OnFailure
clusterIP: None
selector:
app: marklogic
ports:
- protocol: TCP
port: 7997
#nodePort: 31997
name: ml7997
- protocol: TCP
port: 8000
#nodePort: 32000
name: ml8000
# ... More ports defined
#type: NodePort
---
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
name: marklogic
spec:
serviceName: "ml-service"
replicas: 2
template:
metadata:
labels:
app: marklogic
annotations:
pod.alpha.kubernetes.io/initialized: "true"
spec:
terminationGracePeriodSeconds: 30
containers:
- name: 'marklogic'
image: "{local docker registry ip}:5000/dcgs-sof/ml8-docker-final:v1"
imagePullPolicy: Always
command: ["/opt/entry-point.sh", "-l", "/opt/mlconfig.sh"]
ports:
- containerPort: 7997
name: ml7997
- containerPort: 8000
name: ml8000
- containerPort: 8001
name: ml8001
- containerPort: 8002
name: ml8002
- containerPort: 8040
name: ml8040
- containerPort: 8041
name: ml8041
- containerPort: 8042
name: ml8042
- containerPort: 8050
name: ml8050
- containerPort: 8051
name: ml8051
- containerPort: 8060
name: ml8060
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
lifecycle:
preStop:
exec:
command: ["/etc/init.d/MarkLogic stop"]
volumeMounts:
- name: ml-data
mountPath: /data
volumeClaimTemplates:
- metadata:
name: ml-data
annotations:
volume.alpha.kubernetes.io/storage-class: anything
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
I commented out the type: NodePort definition as I thought that might be the culprit, but still no success.
Additionally, if I run docker#minikube:/$ docker exec b4d21c4bc065 /bin/bash -c 'nslookup marklogic-1.marklogic.default.svc.cluster.local' it cannot resolve the name.
What am I missing???
You are resolving the wrong domain name.
See http://kubernetes.io/docs/user-guide/petset/#network-identity
You should try to resolve:
marklogic-0.ml-service.default.svc.cluster.local
If everything is within the default namespace, the DNS name is:
<pod_name>.<svc_name>.default.svc.cluster.local

Resources