Spark driver / executors in docker containers with port translation - docker

I'm trying to setup a spark standalone cluster on a bunch of docker containers in a private cloud. The executor processes, running in nodes different from the driver's node, are not able to connect back to the driver because the host port that is exposed (randomly assigned port in this case) is different from the port number that is advertised by the driver process. The same problem occurs with the blockmanager service started in each executor / driver - the advertised port is not available for communication from the outside.
Please suggest if there is a configuration that spark provides that allows us to differentiate the advertised port number from the actual port number (couldn't find any so far). The host name / port number that is visible from the outside can be obtained by environment variables that are automatically set when the container starts up, so if there is such a configuration (say spark.driver.advertisePort / spark.blockManager.advertisePort) then I can populate these values.
Note: we can get around this by using host networking or port mapping with same internal and external ports in the docker containers. But neither are allowed in the private cloud infrastructure that I'm trying to deploy the cluster on.

Related

Docker Swarm, how to communicate to other services through their "hostname" only?

I have some experience with Docker Compose and container linking. In a non-swarm environment, you could easily connect from, e.g, the web container to the db_mysql container using its name (for example, in PHP I can configure the MySQL connection to be:
$dsn = 'mysql:host=db_mysql;
I am having a hard time understanding how that works with Docker in Swarm mode, especially considering the "replicas" and "load balancing" mechanisms.
Let's say I have 5 different Docker Machines, each having a different public IP, participating in a Swarm. I also have a web service and a db service that's replicated across these 5 different machines (1 instance per each machine).
My question is: how do I make any of the 5 web containers, communicate to any of the 5 db_mysql containers without forcing these web containers to have knowledge of any Docker Machine public IPs or the fact that these containers live within a Swarm?
You use the service name. This will resolve in DNS to either a VIP or the 5 ip addresses (one for each replica) of the service. Under the covers, the VIP uses IPVS to round robin to one of the healthy replicas without suffering from stale DNS issues. You can also get all the replica IP addresses using service_name.tasks even if you use the default VIP.
In Docker's DNS implementation, you can resolve the container name, and any network alias. The network alias includes the service name with DNSRR (used by docker-compose without swarm). Or the service name resolves to a VIP in swarm mode. The hostname of the container does not resolve, likely because it can change outside of the control (and therefore knowledge) of the docker engine.
Using Docker version 19.03.5 the correct DNS name to query in order to obtain all the IP addresses of the replica of a service is the following:
tasks.<service-name>

Remote Docker container by hostname

How do you access remote Docker container by its hostname?
I need to access remote Docker containers by its hostnames (or some constant IP's) for development and testing purposes. I have tried:
looking for any DNS approach (have not found any clues),
importing /ets/hosts (probably impossible),
creating tunnes (only this works but it is very time consuming).
It's the same as running any other process on a host, Docker or not Docker: you access it via the host name or IP address of the host and the port the service is listening on (the first port of the docker run -p argument). Docker containers don't have externally visible individual IP addresses any more than non-Docker HTTP or ssh daemons do.
If you do have DNS infrastructure available to you, you could set up CNAME records to resolve particular service names to the specific hosts that are running them.
One solution that may help you is some sort of service registry; in the past I've used Consul with some success. You can configure Consul with some health checks or other probes ("look for an HTTP service on port 12345 that answers GET / calls"), and it will provide its own DNS service ("okay, http://whatevername.service.consul:12345/ will reach your service on whichever hosts it happens to be running on").
Nothing in the Docker infrastructure specifically helps this. Using /etc/hosts is distinctly not a best practice: the name-to-IP mapping needs to be kept in sync across all machines and you'll start wishing you had a network service to publish it for you, which is exactly what DNS is for.

Docker app not available with host IP when using automatic port mapping

I am deploying a eureka server on a VM(say host external IP is a.b.c.d) as a docker image. Trying this in 2 ways.
1.I am running the docker image without explicit port mapping : docker run -p 8671 test/eureka-server
Then running docker ps command shows the port mapping as : 0.0.0.0:32769->8761/tcp
Try accessing the eureka server from outside of the VM with http://a.b.c.d:32769 , its not available.
2.I am running the docker image with explicit port mapping : docker run -p 8761:8761 test/eureka-server
Then running docker ps command shows the port mapping as : 0.0.0.0:8761->8761/tcp
Try accessing the eureka server from outside of the VM with http://a.b.c.d:8761 , its available.
Why in the first case the eureka server is not available from out side the host machine even if there is a random port(32769) assigned by docker.
Is it necessary to have explicit port mapping to have docker app available from external network ?
Since you're looking for access from the outside world to the host via the mapped port you'll need to ensure that the source traffic is allowed to reach that port on the host and given protocol. I'm not a network security specialist, but I'd suggest that opening up an entire range of ports simply because you don't know which port docker will pick would be a bad idea. If you can, I'd say pick a port and explicitly map it and ensure the firewall allows access to that port from the appropriate source address(es) e.g. ALLOW TCP/8671 in from 10.0.1.2/32 as an example - obviously your specific address range will vary on your network configuration. Docker compose may help you keep this consistent (as will other orchestration technologies like Kubernetes). In addition if you use cloud hosting services like AWS you may be able to leverage VPC security groups to help you whitelist source traffic to the port without knowing all possible source IP addresses ahead of time.
You either have the firewall blocking this port, or from wherever you are making the requests, for certain ports your outgoing traffic is disabled, so your requests never leave your machine.
Some companies do this. They leave port 80, 443, and couple of more for their intranet, and disable all other destination ports.

Docker 1.12 Port Fowarding Services Across Nodes

So I've got a Plex server running on my Docker swarm!! If I kill a node magically it'll start Plex somewhere else. This is great! Now comes the fun part...
With old-school containers I would just port forward port 32400 on my router to the server that was running Plex and it would work find. Now that Plex can run in multiple different places I need to figure out how to forward the port to some static resource. I could use HAProxy to bind some bridge interface and run it on every node to provide failover...but I'd like to see if there's an easier way to accomplish this.
What's the best way to forward ports to services in Docker Swarm?
Port forwarding is built into the new swarm mode. There's a section on load balancing in the documentation:
The swarm manager uses ingress load balancing to expose the services
you want to make available externally to the swarm. The swarm manager
can automatically assign the service a PublishedPort or you can
configure a PublishedPort for the service in the 30000-32767 range.
External components, such as cloud load balancers, can access the
service on the PublishedPort of any node in the cluster whether or not
the node is currently running the task for the service. All nodes in
the swarm cluster route ingress connections to a running task
instance.
Swarm mode has an internal DNS component that automatically assigns
each service in the swarm a DNS entry. The swarm manager uses internal
load balancing to distribute requests among services within the
cluster based upon the DNS name of the service.
Update
The following article discusses how to integrate a proxy load balancer into the docker engine
https://technologyconversations.com/2016/08/01/integrating-proxy-with-docker-swarm-tour-around-docker-1-12-series/

How can I expose kubernetes services running within docker?

What I want to do is run kubernetes within docker and expose the kubernetes services externally. I followed the docs on getting kubernetes running within docker. As long as I connect from the localhost, I can access my services. However, connecting from a different computer doesn't work. If I spin up a docker image directly, then I can access it. Only things running within kubernetes aren't exposed. Is this possible?
Ensure your nodes have externally reachable IP addresses.
Then create a service of type NodePort:
https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/services.md#type-nodeport
And direct traffic to nodes at the allocated port.

Resources