Apache Kafka in Docker AND VirtualBox VM

I'm trying to use Apache Kafka, specifically the Docker image wurstmeister/kafka-docker, inside a virtual machine, and to connect to the Kafka broker running in the Docker containers from the VM's host system.
I will describe the setup in more detail:
My host system is Ubuntu 14.04.3 LTS 64-bit (kernel 3.13) running on an ordinary computer. I have a complete and rather complex structure of various Docker containers interacting with each other. In order not to disturb, or better said, to encapsulate this whole structure, running the Docker images directly on the host system is not an option. Another reason is that the host needs different Python libraries, which conflict with the Python library versions required by docker-compose (which is used to start the different Docker images).
Therefore the desired solution is to set up a virtual machine with VirtualBox (guest system: Ubuntu 16.04.1 LTS) and to run the Docker environment entirely inside this VM. This has the distinct advantage that the VM itself can be configured exactly according to the requirements of the Docker structure.
As mentioned above, one of the Docker images provides the Kafka and ZooKeeper functionality used for communication and messaging. That means the .yaml file that sets up the container running this image forwards all necessary Kafka and ZooKeeper ports to the host system of the Docker environment (which is the guest system of the VirtualBox VM). To also make the Docker environment visible on the host system, I forwarded all these ports in the VirtualBox network settings (Network -> Adapter NAT -> Advanced -> Port Forwarding). The behaviour of the system is as follows:
When I run the Docker environment (including Kafka), I can connect, consume and produce from within the VM using the recommended standard Kafka shell scripts (producer & consumer API).
When I run a Kafka and ZooKeeper server directly ON THE VM guest system, I can connect from outside the VM (the host), and produce and consume over the producer & consumer API.
When I run Kafka in the Docker environment, I can connect from the host system outside the VM, meaning I can see all topics and get information about topics, while also seeing some debug output from Kafka and ZooKeeper in the Docker logs.
What is unfortunately not possible is to produce or consume any message from the host system to/from the Docker Kafka. The producer API throws a "Batch expired" exception, and the consumer returns with a "ClosedChannelException".
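For context, the calls from the host side look roughly like this; the broker address, ZooKeeper address and topic name are placeholders, not the exact values from my setup:
# produce from the host through the forwarded broker port (Kafka 0.9-era flags)
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# consume from the host through the forwarded ZooKeeper port
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning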
I have done a lot of googling and found many hints on how to solve similar problems. Most of them refer to the advertised.host.name parameter in kafka-server.properties, which is accessible via KAFKA_ADVERTISED_HOST_NAME in the .yaml file. For example, https://stackoverflow.com/a/35374057/3770602 refers to that parameter when exactly these two errors occur. Unfortunately none of the scenarios described there involve both Docker AND a VM.
Furthermore, modifying this parameter does not seem to have any effect at all. Although I am not very familiar with Docker and Kafka, I can see a problem here: the Kafka consumer and producer receive the local IP of the Docker environment, which is 10.0.2.15 in the NAT case, as the broker address to use. But this IP is not visible from outside the VM. Consequently the solution would probably be a changed setup using bridged mode in the VirtualBox networking. The strange thing is that a bridged connection (of course) gives the VM its own IP via DHCP, which then leads to a Docker Kafka that is not accessible from either the VM OR the host system. This last behaviour seems very odd to me.
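For illustration, a hedged sketch of what the bridged-mode variant might look like with wurstmeister/kafka, where 192.168.1.50 stands in for whatever IP the VM gets via DHCP (an assumption) and the ZooKeeper address is a placeholder:
# advertise an address that clients outside the VM can actually reach
$ docker run -d --name kafka -p 9092:9092 \
    -e KAFKA_ADVERTISED_HOST_NAME=192.168.1.50 \
    -e KAFKA_ADVERTISED_PORT=9092 \
    -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
    wurstmeister/kafka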
To conclude, my question is whether anybody has experience with this or a similar scenario and can tell me how to configure the Docker Kafka, the VirtualBox VM and the host system. I have really tried a lot of different settings for the consumer and producer calls, with no success. Of course Docker experts, Kafka experts, or both, are welcome to answer as well.
If you need additional information or you can provide some hints, please let me know.
Thanks in advance

Related

Connect from container to a service on the host (docker for mac)

I have a somewhat complex situation and am probably out of luck here, but here's hoping. This is part of a large development project, so my options for what changes I can make are somewhat limited.
I have a virtual machine running a k8s cluster. That cluster has an http service that is exposed via ingress, and is available, on my local machine, at develop.com, via an /etc/hosts entry on the host mac.
I have a container, necessarily (see above) separate from the cluster, which needs access to this service. This container uses an env var, SERVICE_HOST to configure its requests.
What is the simplest way to provide a value that the standalone container can resolve to reach my cluster? Ideally something other than ngrok, which is simple but is complicated by the fact that it's already in use in this setup to allow the cluster to reach the standalone container! I'd much prefer to make this work without premium features...
I'm aware of the --net=host concept, but it doesn't work on a macOS host.
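One hedged direction, assuming a reasonably recent Docker for Mac (which provides the host.docker.internal alias) and assuming the ingress is reachable on the Mac itself; the image name is a placeholder:
# point the standalone container at the Mac host instead of the in-cluster name
$ docker run -d -e SERVICE_HOST=host.docker.internal my-standalone-image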

Figuring out the IP address of a service for dockerized Consul

I am building a microservices based application and would like to use Consul as service registry. All in all I have three scenarios:
All the services run on the host.
All the services run on the host, but Consul runs in Docker.
All the services and Consul run in Docker.
Now I have the problem of how to register the services with their IP addresses, because I need to figure out an IP address for each service that is reachable by Consul (e.g., for the health checks):
If everything runs on the same host, it's pretty easy: Simply use 127.0.0.1, and you're done.
If everything (including Consul) runs in Docker, I could use hostname -i from within the Docker containers to figure out their external IP and hand it over to Consul. This works, but I wonder if there is a better way to solve this? (Ideally, the solution should also work in the same way on Kubernetes.)
If the services run on the host but Consul runs in Docker, I currently have no idea at all. Basically, Consul needs the host's IP address to be able to talk to the services, but I can only detect this from within the Consul container (by resolving host.docker.internal). But first, this does not work from outside the container, and second, it only works for Docker for Mac / Windows, not e.g. with Kubernetes.
How could I solve these issues?
PS: I would like to avoid using a container such as Registrator by Gliderlabs, since I have doubts about how well it works on Kubernetes, and it also won't help with the mixed Docker / host scenario.
If you're using Kubernetes, you might start by checking whether its built-in service registry meets your needs. There's generally no direct path to reach a pod via its node's IP address, so the setup you describe won't really work well. (I might consider Consul for a key/value store, but I wouldn't reach for it as a service registry in Kubernetes land.)
In plain multi-host Docker land, this is one of the few situations I've found where host networking is appropriate. Start Consul with --net=host, or the equivalent option in Docker Compose or whatever other orchestration tool you use. Then Consul will believe "its" IP address is the host's, and if you have automated TCP probes of well-known ports, you can scan every service running on the host and discover e.g. a MySQL service on port 3306, whether it runs in a container or natively on the host.
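A minimal sketch of that host-networking setup, assuming the stock consul image and a single-server agent; the bind address is a placeholder:
# run the Consul agent directly on the host's network stack
$ docker run -d --name consul --net=host consul agent -server -bootstrap-expect=1 -bind=<host-ip> -client=0.0.0.0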
With this setup, servicename.service.consul will resolve to some physical host's IP address. If you have a Docker container pointing at its current host for DNS service, that lookup will route the service to some host, maybe the same one, but this has worked reliably for me in the past.
Note that the relevant hostnames will be different in different environments: servicename.service.consul for a Consul-based setup, servicename.namespacename.svc.cluster.local in Kubernetes, maybe localhost in a developer-desktop environment. You need to make sure this is configurable, most straightforwardly via an environment variable.
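As a hedged illustration of that kind of configuration, with the variable and binary names chosen purely for the example:
$ SERVICE_HOST=servicename.service.consul ./my-service                    # Consul-based setup
$ SERVICE_HOST=servicename.namespacename.svc.cluster.local ./my-service   # Kubernetes
$ SERVICE_HOST=localhost ./my-service                                      # developer desktop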

Easy, straightforward, robust way to make host port available to Docker container?

It is really easy to mount directories into a docker container. How can I just as easily "mount a port into" a docker container?
Example:
I have a MySQL server running on my local machine. To connect to it from a docker container I can mount the mysql.sock socket file into the container. But let's say for some reason (like intending to run a MySQL slave instance) I cannot use mysql.sock to connect and need to use TCP.
How can I accomplish this most easily?
Things to consider:
I may be running Docker natively if I'm using Linux, but I may also be running it in a VM if I'm on Mac or Windows, through Docker Machine or Docker for Mac/Windows (Beta). The answer should handle both scenarios seamlessly, without me as the user having to decide which solution is right depending on my specific Docker setup.
Simply assigning the container to the host network is often not an option, so that's unfortunately not a proper solution.
Potential solution directions:
1) I understand that setting up proper local DNS and making the Docker container (network) talk to it might be a proper, robust solution. If there is such a DNS service that can be set up with 1, max 2 commands and then "just work", that might be something.
2) Essentially what's needed here is that something will listen on a port inside the container and like a sort of proxy route traffic between the TCP/IP participants. There's been discussion on this closed Docker GH issue that shows some ip route command-line magic, but that's a bit too much of a requirement for many people, myself included. But if there was something akin to this that was fully automated while understanding Docker and, again, possible to get up and running with 1-2 commands, that'd be an acceptable solution.
I think you can run your container with the --net=host option. In this case the container binds to the host's network stack and can access all the ports on your local machine.
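A hedged illustration of that suggestion (this applies to native Linux Docker; as noted above, host networking behaves differently on Mac/Windows, and the image tag and credentials are placeholders):
# with --net=host, 127.0.0.1 inside the container is the host's loopback
$ docker run --rm -it --net=host mysql:5.7 mysql -h 127.0.0.1 -P 3306 -u root -p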

Docker host information and cluster

I am setting up a simple cluster using Docker on several hosts. Before using Docker, the processes were simply started with an argument giving the address of a config server. The first thing each process does is connect to the config server, get the addresses (host and port) of all the other services, and register itself with its host (and several different ports, one for each of the services it provides).
However, it does not seem to be possible to dockerize this workflow. Since a process in a container apparently cannot determine the address and ports of its host (based on, for example, How to get the IP address of the docker host from inside a docker container), it does not know what to register itself as. Is this really not possible?
If not, are there any alternative ways this sort of setup is intended to be run using docker?
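One common workaround is to let the host inject its own address and ports when it starts the container; a minimal sketch, with the variable names, port and image name chosen for illustration:
# determine the host's primary IP on the host side and hand it to the container
$ HOST_IP=$(hostname -I | awk '{print $1}')
$ docker run -d -p 8080:8080 -e CONFIG_SERVER=<config-server-address> -e ADVERTISE_HOST=$HOST_IP -e ADVERTISE_PORT=8080 my-service-image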

How to link Docker services across hosts?

Docker allows servers from multiple containers to connect to each other via links and service discovery. However, from what I can see this service discovery is host-local. I would like to implement a service that uses other services hosted on a different machine.
There have been several approaches to solving this problem in Docker, such as CoreOS's jumpers, host-local services that essentially proxy to the other machine, and a whole bunch of github projects for managing Docker deployments that appear to have attempted to support this use-case.
Given the pace of development it is hard to follow what current best practices are. Therefore my question is essentially:
What (if any) is the current predominant method for linking across hosts in Docker, and
Are there any plans for supporting this functionality directly in the Docker system?
Update
Docker has recently announced a new tool called Swarm for Docker orchestration.
Swarm allows you to "join" multiple Docker daemons: you first create a swarm, start a swarm manager on one machine, and have the Docker daemons "join" the swarm manager using the swarm's identifier. The Docker client connects to the swarm manager as if it were a regular Docker server.
When a container is started with Swarm, it is automatically assigned to a free node that meets any constraints that have been defined. The following example is taken from the blog post:
$ docker run -d -P -e constraint:storage=ssd mysql
One of the supported constraints is "node", which allows you to pin a container to a specific hostname. Swarm also resolves links across nodes.
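A hedged sketch of that node constraint, in the same style as the example above; node-1 is a placeholder, and the exact constraint syntax varied between early Swarm versions:
$ docker run -d -P -e constraint:node==node-1 mysql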
In my testing I got the impression that Swarm doesn't yet work very well with volumes at a fixed location (or at least the process of linking them is not very intuitive), so this is something to keep in mind.
Swarm is now in beta phase.
Until recently, the Ambassador Pattern was the only Docker-native approach to remote-host service discovery. This pattern can still be used and doesn't require any magic beyond plain Docker in that the pattern consists of one or more additional containers that act as proxies.
Additionally, there are several third-party extensions to make Docker cluster-capable. Third-party solutions include:
Connecting the Docker network bridges on two hosts; lightweight and various solutions exist, but generally with some caveats
DNS-based discovery e.g. with skydock and SkyDNS
Docker management tools such as Shipyard, and Docker orchestration tools. See this question for an extensive list: How to scale Docker containers in production
UPDATE 3
Libswarm has been renamed to Swarm and is now a separate application.
Here is the demo from the GitHub page to use as a starting point:
# create a cluster
$ swarm create
6856663cdefdec325839a4b7e1de38e8
# on each of your nodes, start the swarm agent
# <node_ip> doesn't have to be public (eg. 192.168.0.X),
# as long as the other nodes can reach it, it is fine.
$ swarm join --token=6856663cdefdec325839a4b7e1de38e8 --addr=<node_ip:2375>
# start the manager on any machine or your laptop
$ swarm manage --token=6856663cdefdec325839a4b7e1de38e8 --addr=<swarm_ip:swarm_port>
# use the regular docker cli
$ docker -H <swarm_ip:swarm_port> info
$ docker -H <swarm_ip:swarm_port> run ...
$ docker -H <swarm_ip:swarm_port> ps
$ docker -H <swarm_ip:swarm_port> logs ...
...
# list nodes in your cluster
$ swarm list --token=6856663cdefdec325839a4b7e1de38e8
http://<node_ip:2375>
UPDATE 2
The official approach is now to use libswarm; see a demo here.
UPDATE
There is a nice gist on Open vSwitch-based host-to-host communication in Docker using the same approach.
To allow service discovery there is an interesting approach based on DNS called skydock.
There is also a screencast.
Here is also a nice article using the same pieces of the puzzle, but additionally adding VLANs on top:
http://fbevmware.blogspot.it/2013/12/coupling-docker-and-open-vswitch.html
The patching has nothing to do with the robustness of the solution. Docker is really only a sort of DSL on top of Linux containers, and both solutions in these articles simply bypass some of Docker's automatic settings and fall back directly to Linux containers.
So you can use the solutions safely and wait until you can do it in a simpler way once Docker implements it.
Weave is a new Docker virtual network technology that acts as a virtual ethernet switch over TCP/UDP - all you need is a Docker container running Weave on your host.
What's interesting here is:
Instead of links, use static IPs/hostnames in your virtual network
Hosts don't need full connectivity, a mesh is formed based on what peers are available, and packets will be routed multi-hop to where they need to go
This leads to interesting scenarios like:
Create a virtual network across the WAN, none of the Docker containers will know or care what actual network they sit in
Move your containers to different physical Docker hosts, and Weave will detect the peers accordingly
For example, there's a guide on how to create a multi-node Cassandra cluster across your laptop and a few cloud (EC2) hosts with two commands per host. I launched a CoreOS cluster with AWS CloudFormation, installed Weave on each node in /home/core, plus on my laptop's Vagrant Docker VM, and got a cluster up in under an hour. My laptop is firewalled, but Weave seemed to be okay with that; it just connects out to its EC2 peers.
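A hedged sketch of the classic Weave workflow of that era; the peer address, the 10.2.1.0/24 subnet and the container names are assumptions:
# on the first host
$ weave launch
$ weave run 10.2.1.1/24 -d --name cass-1 cassandra
# on each additional host, point Weave at the first one
$ weave launch <host1-ip>
$ weave run 10.2.1.2/24 -d --name cass-2 cassandra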
Update
Docker 1.12 contains the so-called swarm mode and also adds a service abstraction. These features may not be mature enough for every use case, but I suggest you keep an eye on them. Swarm mode at least helps in a multi-host setup, though it doesn't necessarily make linking easier. The Docker-internal DNS server (since 1.11) helps you address containers by name, provided the names are well known - meaning that the generated names in a Swarm context won't be so easy to address.
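For orientation, a hedged sketch of swarm mode; the addresses, the join token and the service details are placeholders:
# on the manager
$ docker swarm init --advertise-addr <manager-ip>
# on each worker, using the join token printed by the command above
$ docker swarm join --token <worker-token> <manager-ip>:2377
# services get a stable name that the internal DNS server resolves
$ docker service create --name web --replicas 2 -p 80:80 nginx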
With the Docker 1.9 release you get built-in multi-host networking. They also provide an example script to easily provision a working cluster.
You'll need a K/V store (e.g. Consul) that allows state to be shared across the different Docker engines on every host. Every Docker engine needs to be configured with that K/V store, and you can then use Swarm to connect your hosts.
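A hedged sketch of that engine configuration in the Docker 1.9 era; the Consul address and the interface name are assumptions:
# on every host, start the engine pointed at the shared K/V store
$ docker daemon --cluster-store=consul://<consul-host>:8500 --cluster-advertise=eth0:2376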
Then you create a new overlay network like this:
$ docker network create --driver overlay my-network
Containers can now be run with the network name as a run parameter:
$ docker run -itd --net=my-network busybox
They can also be connected to a network when already running:
$ docker network connect my-network my-container
More details are available in the documentation.
The following article nicely describes how to connect Docker containers on multiple hosts: http://goldmann.pl/blog/2014/01/21/connecting-docker-containers-on-multiple-hosts/
It is possible to bridge several Docker subnets together using Open vSwitch or Tinc. I have prepared Gists to show how to do it:
Open vSwitch: https://gist.github.com/noteed/8656989
Tinc: https://gist.github.com/noteed/11031504
The advantage I see in this solution over the --link option and the ambassador pattern is that I find it more transparent: there is no need for additional containers and, more importantly, no need to expose ports on the host. Actually I think of the --link option as a temporary hack before Docker gets a nicer story about multi-host (or multi-daemon) setups.
Note: I know there is another answer pointing to my first Gist but I don't have enough karma to edit or comment on that answer.
As mentioned above, Weave is definitely a viable solution for linking Docker containers across hosts. Based on my own experience with it, it is fairly straightforward to set up. It now also has a DNS service, which lets you address containers by their DNS names.
On the other hand, there are CoreOS's Flannel and Juniper's OpenContrail for wiring containers across hosts.
It seems like Docker Swarm 1.14 allows you to:
assign a hostname to a container using the --hostname flag, but I haven't been able to make it work; containers are not able to ping each other by their assigned hostnames.
assign services to machines using --constraint 'node.hostname == <host>'
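A hedged sketch of those flags in swarm mode; the service name, hostname and node value are placeholders:
$ docker service create --name web --hostname web-1 --constraint 'node.hostname == <host>' nginx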
