Issues with running a consul docker health check - docker

I am running the progrium/consul container with the gliderlabs/registrator container. I am trying to create health checks to monitor whether my docker containers are up or down. However, I noticed some very strange behavior with the health check I was able to make. Here is the command I used to create the health check:
curl -v -X PUT http://$CONSUL_IP_ADDR:8500/v1/agent/check/register -d @/home/myUserName/health.json
Here is my health.json file:
{
  "id": "docker_stuff",
  "name": "echo test",
  "docker_container_id": "4fc5b1296c99",
  "shell": "/bin/bash",
  "script": "echo hello",
  "interval": "2s"
}
First, I noticed that this check would automatically delete the service whenever the container was stopped properly, but would do nothing when the container was stopped improperly (i.e. during a node failure).
Second, I noticed that the docker_container_id did not matter at all: this health check attached itself to every container running on the consul node it was registered with.
I would just like a working tcp or http health check to run for every docker container running on a consul node (yes, I know my json file above runs a script; I just created that one by following the documentation example). I just want consul to be able to tell whether a container is stopped or running, and I don't want my services to delete themselves when a health check fails. How would I do this?
Note: I find the consul documentation on Agent Health Checks very lacking, vague and inaccurate, so please don't just link to it and tell me to go read it. I am looking for a full explanation of exactly how to set up docker health checks the right way.
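For reference, the same /v1/agent/check/register endpoint also accepts TCP and HTTP check definitions; below is a minimal sketch of a TCP check (the id, name, and port are placeholders, not taken from the question) plus a way to list what actually got registered:
# Sketch only: a TCP check against a port the container publishes on the host
cat > /home/myUserName/tcp-check.json <<'EOF'
{
  "id": "web-tcp",
  "name": "web port reachable",
  "tcp": "localhost:8080",
  "interval": "10s",
  "timeout": "1s"
}
EOF
curl -v -X PUT http://$CONSUL_IP_ADDR:8500/v1/agent/check/register -d @/home/myUserName/tcp-check.json
# List the checks the agent currently knows about, with their status
curl http://$CONSUL_IP_ADDR:8500/v1/agent/checks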
Update: Here is how to start consul servers with the most current version of the official consul container (right now it's the dev version; soon I'll update this with the production version):
#bootstrap server
docker run -d \
-p 8300:8300 \
-p 8301:8301 \
-p 8301:8301/udp \
-p 8302:8302 \
-p 8302:8302/udp \
-p 8400:8400 \
-p 8500:8500 \
-p 53:53/udp \
--name=dev-consul0 consul agent -dev -ui -client 0.0.0.0
#its IP address will then be the IP of the host machine
#let's say it's 172.17.0.2
#start the other two consul servers, without web ui
docker run -d --name=dev-consul1 \
-p 8300:8300 \
-p 8301:8301 \
-p 8301:8301/udp \
-p 8302:8302 \
-p 8302:8302/udp \
-p 8400:8400 \
-p 8500:8500 \
-p 53:53/udp \
consul agent -dev -join=172.17.0.2
docker run -d --name=dev-consul2 \
-p 8300:8300 \
-p 8301:8301 \
-p 8301:8301/udp \
-p 8302:8302 \
-p 8302:8302/udp \
-p 8400:8400 \
-p 8500:8500 \
-p 53:53/udp \
consul agent -dev -join=172.17.0.2
# then here are your clients
docker run -d --net=host --name=client0 \
-e 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}' \
consul agent -bind=$(hostname -i) -retry-join=172.17.0.2
https://hub.docker.com/r/library/consul/
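A quick way to confirm the cluster actually formed is to list members from any of the containers (a minimal check, assuming the container names above):
# dev-consul0, dev-consul1, dev-consul2 and client0 should all show as "alive"
docker exec dev-consul0 consul members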

The progrium/consul image has an old version of consul (https://hub.docker.com/r/progrium/consul/tags/) and currently seems to be unmaintained.
Please use the official image with a current version of consul instead: https://hub.docker.com/r/library/consul/tags/
You can also use registrator to register checks in consul alongside your service, e.g.
SERVICE_[port_]CHECK_SCRIPT=nc $SERVICE_IP $SERVICE_PORT | grep OK
More examples: http://gliderlabs.com/registrator/latest/user/backends/#consul
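For instance, an HTTP check can be attached to a registered service purely through environment variables on the container itself (a sketch; the image, service name and /health path are placeholders):
# Registrator picks up the SERVICE_* variables and registers the check in consul
docker run -d --name web -p 8000:80 \
  -e "SERVICE_NAME=web" \
  -e "SERVICE_CHECK_HTTP=/health" \
  -e "SERVICE_CHECK_INTERVAL=15s" \
  -e "SERVICE_CHECK_TIMEOUT=1s" \
  nginx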

A solution that sidesteps the consul containers entirely is to install consul directly on the host machine. This can be done by following these steps from https://sonnguyen.ws/install-consul-and-consul-template-in-ubuntu-14-04/:
sudo apt-get update -y
sudo apt-get install -y unzip curl
sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
sudo unzip consul_0.6.4_linux_amd64.zip
sudo rm consul_0.6.4_linux_amd64.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /opt/consul
cd /opt/consul
sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_web_ui.zip
sudo unzip consul_0.6.4_web_ui.zip
sudo rm consul_0.6.4_web_ui.zip
sudo mkdir -p /etc/consul.d/
sudo wget https://releases.hashicorp.com/consul-template/0.14.0/consul-template_0.14.0_linux_amd64.zip
sudo unzip consul-template_0.14.0_linux_amd64.zip
sudo rm consul-template_0.14.0_linux_amd64.zip
sudo chmod a+x consul-template
sudo mv consul-template /usr/bin/consul-template
sudo nohup consul agent -server -bootstrap-expect 1 \
-data-dir /tmp/consul -node=agent-one \
-bind=$(hostname -i) \
-client=0.0.0.0 \
-config-dir /etc/consul.d \
-ui-dir /opt/consul/ &
echo 'Done with consul install!!!'
Then, after you do this, create your consul health check json files; info on how to do that can be found here. After you create your json files, just put them in the /etc/consul.d directory and reload consul with consul reload. If after the reload consul does not add your new health checks, then there is something wrong with the syntax of your json files: go back, edit them, and try again. A sample check file is sketched below.
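For example, a simple TCP check definition dropped into /etc/consul.d could look like this (a sketch; the check name and port are placeholders for whatever your container publishes):
sudo tee /etc/consul.d/web-tcp-check.json > /dev/null <<'EOF'
{
  "check": {
    "id": "web-tcp",
    "name": "web container port 8080 reachable",
    "tcp": "localhost:8080",
    "interval": "10s",
    "timeout": "1s"
  }
}
EOF
# Pick up the new definition without restarting the agent
consul reload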

Related

GCC builds under teamcity docker agent

I'm trying out TeamCity to build GCC binaries with docker agents on CentOS. I set up a docker agent to connect to the builder2 TC server.
$ docker pull jetbrains/teamcity-agent
$ mkdir -p /mnt/builders/teamcity/agent1/conf
$ mkdir -p /mnt/builders/teamcity/agent/work
$ mkdir -p /mnt/builders/teamcity/agent/system
docker run -it --name agent1 \
-e SERVER_URL="http://builder2:8111" \
-e AGENT_NAME="builder2_agent1" \
--hostname builder2_agent \
--dns="xx.xxx.xx.xx" \
-v /mnt/builders/teamcity/agent1/conf:/data/teamcity_agent/conf \
-v /mnt/builders/teamcity/agent/work:/opt/buildagent/work \
-v /mnt/builders/teamcity/agent/system:/opt/buildagent/system \
--network='BuilderNetwork' \
jetbrains/teamcity-agent
All of that works fine, but in order to make a build you must enable the devtoolset like this:
scl enable devtoolset-10 "/bin/bash"
$ which make
/opt/rh/devtoolset-10/root/usr/bin/make
So how is this done with a docker agent? Should these tools be built into the image, or do you expose the /opt/rh dir to the container? Also, if you were to expose the volume, how do you install /usr/bin/scl (i.e. the rh package scl-utils-20130529-19.el7.x86_64) into the docker container? Does it even make sense to run an agent in docker for this?
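If the toolchain ends up baked into (or mounted into) the agent image, the usual trick for non-interactive builds is to enable the collection inside the build step rather than opening a shell; a sketch, assuming scl-utils and devtoolset-10 are present in the container:
# Enable the collection for the current (non-interactive) shell, then build
source scl_source enable devtoolset-10
which make   # -> /opt/rh/devtoolset-10/root/usr/bin/make
make -j"$(nproc)"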

NiFi: Why Does My User Have Insufficient Permissions?

I am following the steps in the "Standalone Instance, Two-Way SSL" section of https://hub.docker.com/r/apache/nifi. However, when I visit the NiFi page, my user has insufficient permissions. Below is the process I am using:
Generate self-signed certificates
mkdir conf
docker exec \
-ti toolkit \
/opt/nifi/nifi-toolkit-current/bin/tls-toolkit.sh \
standalone \
-n 'nifi1.bluejay.local' \
-C 'CN=admin,OU=NIFI'
docker cp toolkit:/opt/nifi/nifi-current/nifi-cert.pem conf
docker cp toolkit:/opt/nifi/nifi-current/nifi-key.key conf
docker cp toolkit:/opt/nifi/nifi-current/nifi1.bluejay.local conf
docker cp toolkit:/opt/nifi/nifi-current/CN=admin_OU=NIFI.p12 conf
docker cp toolkit:/opt/nifi/nifi-current/CN=admin_OU=NIFI.password conf
docker stop toolkit
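(The docker exec above assumes a container named toolkit is already running. Judging from the /opt/nifi/nifi-toolkit-current path, it was started from an image that bundles the TLS toolkit, e.g. apache/nifi; a hypothetical way to keep such a container alive for the exec and cp steps would be:)
# Hypothetical helper: keep a container alive so it can be exec'd into and copied from
docker run -d --name toolkit --entrypoint sleep apache/nifi 1000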
Import client certificate to browser
Import the .p12 file into your browser.
Update /etc/hosts
Add "127.0.0.1 nifi1.bluejay.local" to the end of your /etc/hosts file.
Define a NiFi network
docker network create --subnet=10.18.0.0/16 nifi
Run NiFi in a container
docker run -d \
-e AUTH=tls \
-e KEYSTORE_PATH=/opt/certs/keystore.jks \
-e KEYSTORE_TYPE=JKS \
-e KEYSTORE_PASSWORD=$(grep keystorePasswd conf/nifi1.bluejay.local/nifi.properties | cut -d'=' -f2) \
-e TRUSTSTORE_PATH=/opt/certs/truststore.jks \
-e TRUSTSTORE_PASSWORD=$(grep truststorePasswd conf/nifi1.bluejay.local/nifi.properties | cut -d'=' -f2) \
-e TRUSTSTORE_TYPE=JKS \
-e INITIAL_ADMIN_IDENTITY="CN=admin,OU=NIFI" \
-e NIFI_WEB_PROXY_CONTEXT_PATH=/nifi \
-e NIFI_WEB_PROXY_HOST=nifi1.bluejay.local \
--hostname nifi1.bluejay.local \
--ip 10.18.0.10 \
--name nifi \
--net nifi \
-p 8443:8443 \
-v $(pwd)/conf/nifi1.bluejay.local:/opt/certs:ro \
-v /data/projects/nifi-shared:/opt/nifi/nifi-current/ls-target \
apache/nifi
Visit Page
When you visit https://localhost:8443/nifi, you'll be asked to select a certificate. Select the certificate (e.g. admin) that you imported.
At this point, I am seeing:
Insufficient Permissions
Unknown user with identity 'CN=admin, OU=NIFI'. Contact the system administrator.
In the examples I am seeing, there is no mention of this issue or how to resolve it.
How are permissions assigned to the Initial Admin Identity?
You are missing a space in the line
-e INITIAL_ADMIN_IDENTITY="CN=admin,OU=NIFI"
See the error message: NiFi reports the user identity as 'CN=admin, OU=NIFI', with a space after the comma, and the initial admin identity has to match that string exactly.
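A sketch of the fix under that assumption: NiFi only applies INITIAL_ADMIN_IDENTITY when it generates users.xml and authorizations.xml for the first time, so the existing container has to be removed before re-running with the corrected value.
# Remove the old container so NiFi regenerates users.xml and authorizations.xml
docker rm -f nifi
# ...then repeat the docker run command above, changing only this flag so the
# identity matches the error message exactly (note the space after the comma):
#   -e INITIAL_ADMIN_IDENTITY="CN=admin, OU=NIFI" \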

Using Pumba to shut down the container connection permanently

I am trying to use Pumba to isolate a container from the docker network. I am on Windows, and the command I am using is the following.
docker run \
-d \
--name pumba \
--network docker_default \
-v //var/run/docker.sock:/var/run/docker.sock \
gaiaadm/pumba netem \
--tc-image="gaiadocker/iproute2" \
--duration 1000s \
loss \
-p 100 \
753_mycontainer_1
I start the container that I want to isolate using docker-compose, with the restart property set to always. I want Pumba to keep blocking the container's networking after each restart as well.
How can I achieve this behavior?
Thanks.
I managed to achieve the result by letting docker restart the pumba container. I reduced the duration parameter to 30s, which is the average time it takes for my 753_mycontainer_1 container to stop itself and restart.
In this way, the two containers restart more or less synchronously, producing a real chaos test in which the 753_mycontainer_1 container randomly loses the network.
docker run \
-d \
--name pumba \
--restart always \
--network docker_default \
-v //var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
netem \
--tc-image="gaiadocker/iproute2" \
--duration 30s \
loss \
-p 100 \
753_mycontainer_1

Kafka with Docker, 3 nodes on different hosts - Broker may not be available

I'm playing with the wurstmeister/kafka image on three different docker hosts, whose IPs are
10.1.1.11
10.1.1.12
10.1.1.13
I ran these commands to start the image:
10.1.1.11:
sudo docker run --name kafka -p 9092:9092 --restart always \
-e KAFKA_BROKER_ID="1" \
-e KAFKA_ADVERTISED_HOST_NAME="10.1.1.11" \
-e KAFKA_ADVERTISED_PORT="9092" \
-e KAFKA_ZOOKEEPER_CONNECT="0.0.0.0:2181,10.1.1.12:2181,10.1.1.13:2181" \
-d wurstmeister/kafka
10.1.1.12:
sudo docker run --name kafka -p 9092:9092 --restart always \
-e KAFKA_BROKER_ID="2" \
-e KAFKA_ADVERTISED_HOST_NAME="10.1.1.12" \
-e KAFKA_ADVERTISED_PORT="9092" \
-e KAFKA_ZOOKEEPER_CONNECT="10.1.1.11:2181,0.0.0.0:2181,10.1.1.13:2181" \
-d wurstmeister/kafka
10.1.1.13:
sudo docker run --name kafka -p 9092:9092 --restart always \
-e KAFKA_BROKER_ID="3" \
-e KAFKA_ADVERTISED_HOST_NAME="10.1.1.13" \
-e KAFKA_ADVERTISED_PORT="9092" \
-e KAFKA_ZOOKEEPER_CONNECT="10.1.1.11:2181,10.1.1.12:2181,0.0.0.0:2181" \
-d wurstmeister/kafka
When I run those commands, the broker started by the first command always shows the "Broker may not be available" warning; the other two do not.
I tested with a kafka producer as well: if a host shows this problem, sending a message fails; if it does not, sending succeeds.
When I restart the image on 10.1.1.11, the problem goes away there, but then 10.1.1.12 starts showing the same problem, and so on.
Every fix I have found for this problem says to set KAFKA_ADVERTISED_HOST_NAME to the docker host, but I already did that.
I have no idea why this problem appears.
My Zookeeper command on 10.1.1.11:
sudo docker run --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 \
--restart always \
-e ZOO_MY_ID="1" \
-e ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=10.1.1.12:2888:3888 server.3=10.1.1.13:2888:3888" \
-d zookeeper:latest
Solution from OP:
The problem was that the firewall blocked the docker containers from connecting to the docker host, so I couldn't telnet to the docker host from inside a docker container.
The solution was to add a rule to iptables:
sudo iptables -I INPUT 1 -i <docker-bridge-interface> -j ACCEPT
I found the solution in https://github.com/moby/moby/issues/24370
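On a default bridge setup the interface is typically docker0, so the rule ends up looking something like this (a sketch; confirm the actual interface name on your hosts first):
# List bridge interfaces if unsure (docker0 is the default bridge)
ip -brief link | grep -i docker
# Allow traffic arriving from that bridge into the host, so containers can
# reach services (like ZooKeeper on 2181) bound on the docker host
sudo iptables -I INPUT 1 -i docker0 -j ACCEPT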

Multi-host Validating Peer Cluster Setup

I am attempting to create a cluster of Hyperledger validating peers, each running on a different host, but it does not appear to be functioning properly.
After starting the root node and 3 peer nodes, this is the output of running peer network list on the root node, vp0:
{"peers":[{"ID":{"name":"vp1"},"address":"172.17.0.2:30303","type":1},{"ID":{"name":"vp2"},"address":"172.17.0.2:30303","type":1},{"ID":{"name":"vp3"},"address":"172.17.0.2:30303","type":1}]}
This is the output from the same command on one of the peers, vp3:
{"peers":[{"ID":{"name":"vp0"},"address":"172.17.0.2:30303","type":1},{"ID":{"name":"vp3"},"address":"172.17.0.2:30303","type":1}]}
All of the peers only list themselves and the root, vp0, in their lists.
This is the log output from the root node, vp0: https://gist.github.com/mikezaccardo/f139eaf8004540cdfd24da5a892716cc
This is the log output from one of the peer nodes, vp3: https://gist.github.com/mikezaccardo/7379584ca4f67bce553c288541e3c58e
This is the command I'm running to create the root node:
nohup sudo docker run --name=$HYPERLEDGER_PEER_ID \
--restart=unless-stopped \
-i \
-p 5000:5000 \
-p 30303:30303 \
-p 30304:30304 \
-p 31315:31315 \
-e CORE_VM_ENDPOINT=http://172.17.0.1:4243 \
-e CORE_PEER_ID=$HYPERLEDGER_PEER_ID \
-e CORE_PEER_ADDRESSAUTODETECT=true \
-e CORE_PEER_NETWORKID=dev \
-e CORE_PEER_VALIDATOR_CONSENSUS_PLUGIN=pbft \
-e CORE_PBFT_GENERAL_MODE=classic \
-e CORE_PBFT_GENERAL_N=$HYPERLEDGER_CLUSTER_SIZE \
-e CORE_PBFT_GENERAL_TIMEOUT_REQUEST=10s \
joequant/hyperledger /bin/bash -c "rm config.yaml; cp /usr/share/go-1.6/src/github.com/hyperledger/fabric/consensus/obcpbft/config.yaml .; peer node start" > $HYPERLEDGER_PEER_ID.log 2>&1&
And this is the command I'm running to create each of the other peer nodes:
nohup sudo docker run --name=$HYPERLEDGER_PEER_ID \
--restart=unless-stopped \
-i \
-p 30303:30303 \
-p 30304:30304 \
-p 31315:31315 \
-e CORE_VM_ENDPOINT=http://172.17.0.1:4243 \
-e CORE_PEER_ID=$HYPERLEDGER_PEER_ID \
-e CORE_PEER_DISCOVERY_ROOTNODE=$HYPERLEDGER_ROOT_NODE_ADDRESS:30303 \
-e CORE_PEER_ADDRESSAUTODETECT=true \
-e CORE_PEER_NETWORKID=dev \
-e CORE_PEER_VALIDATOR_CONSENSUS_PLUGIN=pbft \
-e CORE_PBFT_GENERAL_MODE=classic \
-e CORE_PBFT_GENERAL_N=$HYPERLEDGER_CLUSTER_SIZE \
-e CORE_PBFT_GENERAL_TIMEOUT_REQUEST=10s \
joequant/hyperledger /bin/bash -c "rm config.yaml; cp /usr/share/go-1.6/src/github.com/hyperledger/fabric/consensus/obcpbft/config.yaml .; peer node start" > $HYPERLEDGER_PEER_ID.log 2>&1&
HYPERLEDGER_PEER_ID is vp0 for the root node and vp1, vp2, ... for the peer nodes, HYPERLEDGER_ROOT_NODE_ADDRESS is the public IP address of the root node, and HYPERLEDGER_CLUSTER_SIZE is 4.
This is the Docker image that I am using: github.com/joequant/hyperledger
Is there anything obviously wrong with my commands? Should the actual public IP addresses of the peers be showing up as opposed to just 172.17.0.2? Are my logs helpful / is any additional information needed?
Any help or insight would be greatly appreciated, thanks!
I've managed to get a noops cluster working in which all nodes discover each other and chaincodes successfully deploy.
I made a few fixes since my post above:
I now use the mikezaccardo/hyperledger-peer image, a fork of yeasy/hyperledger-peer, instead of joequant/hyperledger.
I changed:
-e CORE_PEER_ADDRESSAUTODETECT=true \
to:
-e CORE_PEER_ADDRESS=$HOST_ADDRESS:30303 \
-e CORE_PEER_ADDRESSAUTODETECT=false \
so that each peer would advertise its public IP rather than its private one.
And I properly tag my image as the official base image:
sudo docker tag mikezaccardo/hyperledger:latest hyperledger/fabric-baseimage:latest
Finally, for context, this is all related to my development of a blueprint for Apache Brooklyn which deploys a Hyperledger Fabric cluster. That repository, which contains all of the code mentioned in this post and answer, can be found here: https://github.com/cloudsoft/brooklyn-hyperledger.
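After those changes, running peer network list again should show each peer's public address instead of 172.17.0.2. A quick way to check (a sketch; vp0 is just the container name from the commands above, and this assumes the peer binary is on the container's PATH):
# Each entry's "address" field should now be <public IP>:30303
sudo docker exec vp0 peer network list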
