Spark with docker swarm: Illegal character in hostname

I have a Spark cluster and a Hadoop cluster, both built with Docker swarm and attached to the same network. I wrote a simple WordCount example in Scala:
val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
val data = spark.sparkContext.textFile("hdfs://10.0.3.16:8088/Sample.txt")
val counts = data.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.foreach(println)
When I run the code in the Spark master node's container, the IP address is replaced by the container's name and an error occurs:
Illegal character in hostname at index 12: hdfs://spark_namenode.1.ywlf9yx9hcm4duhxnywn91i35.spark_overlay:9000
I cannot change the container name, because that is not allowed in Docker swarm.
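For context, the failure comes from Java's URI parser, which rejects the underscore in the swarm task hostname (index 12 of hdfs://spark_namenode...). Below is a minimal sketch of one possible mitigation, assuming the namenode's RPC port is 9000 as reported in the error (8088 above is normally the YARN web UI port) and that the HDFS client should prefer IP addresses over swarm-generated hostnames; neither setting comes from the original post:

val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
// Keep underscore-containing hostnames out of HDFS URIs: address the namenode by IP
// and its RPC port, and ask the client to use datanode IPs rather than hostnames.
spark.sparkContext.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "false")
val data = spark.sparkContext.textFile("hdfs://10.0.3.16:9000/Sample.txt")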

Related

Docker error: standard_init_linux.go:228: exec user process caused: exec format error

I was able to build a multiarch image successfully from an M1 MacBook, which is arm64.
Here's my Dockerfile. I'm trying to run the image on a Raspberry Pi (aarch64/arm64) and I get this error when running it: standard_init_linux.go:228: exec user process caused: exec format error
Editing the post with the Python file as well:
FROM frolvlad/alpine-python3
RUN pip3 install docker
RUN mkdir /hoster
WORKDIR /hoster
ADD hoster.py /hoster/
CMD ["python3", "-u", "hoster.py"]
#!/usr/bin/python3
import docker
import argparse
import shutil
import signal
import time
import sys
import os

label_name = "hoster.domains"
enclosing_pattern = "#-----------Docker-Hoster-Domains----------\n"
hosts_path = "/tmp/hosts"
hosts = {}

def signal_handler(signal, frame):
    global hosts
    hosts = {}
    update_hosts_file()
    sys.exit(0)

def main():
    # register the exit signals
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)

    args = parse_args()
    global hosts_path
    hosts_path = args.file

    dockerClient = docker.APIClient(base_url='unix://%s' % args.socket)
    events = dockerClient.events(decode=True)

    # get running containers
    for c in dockerClient.containers(quiet=True, all=False):
        container_id = c["Id"]
        container = get_container_data(dockerClient, container_id)
        hosts[container_id] = container

    update_hosts_file()

    # listen for events to keep the hosts file updated
    for e in events:
        if e["Type"] != "container":
            continue

        status = e["status"]
        if status == "start":
            container_id = e["id"]
            container = get_container_data(dockerClient, container_id)
            hosts[container_id] = container
            update_hosts_file()

        if status == "stop" or status == "die" or status == "destroy":
            container_id = e["id"]
            if container_id in hosts:
                hosts.pop(container_id)
                update_hosts_file()

def get_container_data(dockerClient, container_id):
    # extract all the info with the docker api
    info = dockerClient.inspect_container(container_id)
    container_hostname = info["Config"]["Hostname"]
    container_name = info["Name"].strip("/")
    container_ip = info["NetworkSettings"]["IPAddress"]
    if info["Config"]["Domainname"]:
        container_hostname = container_hostname + "." + info["Config"]["Domainname"]

    result = []
    for values in info["NetworkSettings"]["Networks"].values():
        if not values["Aliases"]:
            continue
        result.append({
            "ip": values["IPAddress"],
            "name": container_name,
            "domains": set(values["Aliases"] + [container_name, container_hostname])
        })

    if container_ip:
        result.append({"ip": container_ip, "name": container_name, "domains": [container_name, container_hostname]})

    return result

def update_hosts_file():
    if len(hosts) == 0:
        print("Removing all hosts before exit...")
    else:
        print("Updating hosts file with:")

    for id, addresses in hosts.items():
        for addr in addresses:
            print("ip: %s domains: %s" % (addr["ip"], addr["domains"]))

    # read all the lines of the original file
    lines = []
    with open(hosts_path, "r+") as hosts_file:
        lines = hosts_file.readlines()

    # remove all the lines after the known pattern
    for i, line in enumerate(lines):
        if line == enclosing_pattern:
            lines = lines[:i]
            break

    # remove all the trailing newlines on the line list
    if lines:
        while lines[-1].strip() == "":
            lines.pop()

    # append all the domain lines
    if len(hosts) > 0:
        lines.append("\n\n" + enclosing_pattern)
        for id, addresses in hosts.items():
            for addr in addresses:
                lines.append("%s %s\n" % (addr["ip"], " ".join(addr["domains"])))
        lines.append("#-----Do-not-add-hosts-after-this-line-----\n\n")

    # write it to the auxiliary file
    aux_file_path = hosts_path + ".aux"
    with open(aux_file_path, "w") as aux_hosts:
        aux_hosts.writelines(lines)

    # replace /etc/hosts with the aux file, making the update atomic
    shutil.move(aux_file_path, hosts_path)

def parse_args():
    parser = argparse.ArgumentParser(description='Synchronize running docker container IPs with host /etc/hosts file.')
    parser.add_argument('socket', type=str, nargs="?", default="tmp/docker.sock", help='The docker socket to listen for docker events.')
    parser.add_argument('file', type=str, nargs="?", default="/tmp/hosts", help='The /etc/hosts file to sync the containers with.')
    return parser.parse_args()

if __name__ == '__main__':
    main()
A "multiarch" Python interpreter built on MacOS is intended to target MacOS-on-Intel and MacOS-on-Apple's-arm64.
There is absolutely no binary compatibility with Linux-on-Apple's-arm64, or with Linux-on-aarch64. You can't run MacOS executables on Linux, no matter if the architecture matches or not.
This happens when you build the image on a machine (host) whose operating system/platform is different from the platform on which you want to spin up the containers.
The solution is to build the Docker image on the same kind of machine/operating system that needs to run it, i.e. the one that will spin up the container(s).
In my case I built Node.js, Python, Nginx, Redis and Postgres images on an OSX host and was trying to spin up containers from those images on an Ubuntu/Debian host.
I solved it by building the images on Ubuntu/Debian and spinning up the containers on the same platform (Ubuntu/Debian).

Jupyterhub SwarmSpawner select worker node

I have set up a Docker swarm with multiple worker nodes.
My current JupyterHub setup with SwarmSpawner works fine: I am able to deploy a single-user Docker image chosen by the user before spawning, using _options_form_default in my jupyterhub_config.py.
What I would like now is to give users the possibility to select the swarm worker node (hostname) on which their single-user JupyterHub image will be spawned, because our worker nodes have different hardware specs (GPUs, RAM, processors, etc.) and users know in advance the name of the host they would like to use.
Is it possible to determine the node on which to spawn the image?
My current swarm has, for example, 1 master node: "master" and 3 worker nodes: "node1", "node2", "node3" (those are their hostnames, as shown in the HOSTNAME column of the output of docker node ls on the master node).
So what I would like is that, just as in the image below, users get a dropdown selection of the swarm worker nodes' hostnames on which to spawn their JupyterHub image, with a question such as "Select the server name".
Ok so I actually figured out how to do that.
Here is the relevant part in my jupyterhub_config.py:
# assumes SwarmSpawner is imported at the top of jupyterhub_config.py:
from dockerspawner import SwarmSpawner

class CustomFormSpawner(SwarmSpawner):

    # Shows frontend form to user for host selection
    # The option values should correspond to the hostnames
    # that appear in the `docker node ls` command output
    def _options_form_default(self):
        return """
        <label for="hostname">Select your desired host</label>
        <select name="hostname" size="1">
          <option value="node1">node1 - GPU: RTX 2070 / CPU: 40</option>
          <option value="node2">node2 - GPU: GTX 1080 / CPU: 32</option>
        </select>
        """

    # Retrieve the selected choice and set the swarm placement constraints
    def options_from_form(self, formdata):
        options = {}
        options['hostname'] = formdata['hostname']
        hostname = ''.join(formdata['hostname'])
        self.extra_placement_spec = {'constraints': ['node.hostname==' + hostname]}
        return options

c.JupyterHub.spawner_class = CustomFormSpawner

spark app socket communication between container on docker spark cluster

So I have a Spark cluster running in Docker using Docker Compose. I'm using docker-spark images.
Then I add 2 more containers: one behaves as a server (plain Python) and one as a client (Spark Streaming app). They both run on the same network.
For the server (plain Python) I have something like:
import socket

# create the listening socket (added so the sketch is complete)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 9009))
s.listen(1)
print("Waiting for TCP connection...")
while True:
    conn, addr = s.accept()
    # Do and send stuff
And for my client (Spark app) I have something like:
# imports added so the sketch is self-contained
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf()
conf.setAppName("MyApp")
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 2)
ssc.checkpoint("my_checkpoint")
# read data from port 9009
dataStream = ssc.socketTextStream(PORT, 9009)
# What's PORT's value?
So what is PORT's value? Is it the IP address value from docker inspect of the container?
Okay, so I found that I can use the IP of the container, as long as all my containers are on the same network.
So I check the IP by running
docker inspect <container_id>
check the IP, and use that as the host for my socket.
Edit:
I know it's kinda late, but I just found out that I can actually use the container's name, as long as the containers are on the same network.
Another edit:
I made these changes in my docker-compose file:
container-1:
  image: image-1
  container_name: container-1
  networks:
    - network-1

container-2:
  image: image-2
  container_name: container-2
  ports:
    - "8000:8000"
  networks:
    - network-1
and then in my script (container 2):
conf = SparkConf()
conf.setAppName("MyApp")
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 2)
ssc.checkpoint("my_checkpoint")
# read data from port 9009
dataStream = ssc.socketTextStream("container-1", 9009) #Put container's name here
I also expose the socket port in the Dockerfile; I don't know whether that has an effect or not.

I can attach to docker swarm as worker but can not attach as manager

After sudo docker swarm join --token XXXXX YYY.YYY.YYY.YYY:2377 I can attach to the swarm as a worker successfully. Then I leave the swarm from the secondary/slave node and try again with the manager token, and receive:
Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Unknown desc = could not connect to prospective new cluster member using its advertised address: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Both nodes are directly connected to one another, and the firewall is not running on either node. What could be the reason for this issue?
You can add the node as a worker and then promote it to the manager role:
docker swarm join --token XXXXX YYY.YYY.YYY.YYY:2377
And on the manager node:
docker node promote SECOND_MANAGER_HOSTNAME
For me, port 2377 was not open as a TCP destination in the subnet's ingress rule. I am using VMs in different VPCs.

HBase + TestContainers - Port Remapping

I am trying to use Test Containers to run an integration test against HBase launched in a Docker container. The problem I am running into may be a bit unique to how a client interacts with HBase.
When the HBase Master starts in the container, it stores its hostname:port in Zookeeper so that clients can find it. In this case, it stores "localhost:16000".
In my test case running outside the container, the client retrieves "localhost:16000" from Zookeeper and cannot connect. The connection fails because the port has been remapped by TestContainers to some other random port, other than 16000.
Any ideas how to overcome this?
(1) One idea is to find a way to tell the HBase Client to use the remapped port, ignoring the value it retrieved from Zookeeper, but I have yet to find a way to do this.
(2) If I could get the HBase Master to write the externally accessible host:port in Zookeeper that would also fix the problem. But I do not believe the container itself has any knowledge about how Test Containers is doing the port remapping.
(3) Perhaps there is a different solution that Test Containers provides for this sort of situation?
You can take a look at KafkaContainer's implementation, where we start a Socat (fast TCP proxy) container first to acquire a semi-random port and use it later to configure the target container.
The algorithm is:
1. In doStart, first start Socat targeting the original container's network alias & port, like 12345
2. Get the mapped port (it will be something like 32109, pointing to 12345)
3. Make the original container (e.g. with environment variables) use the mapped port in addition to the original one, or, if only one port can be configured, see CouchbaseContainer for the more advanced option
4. Return Socat's host & port to the client
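A rough Scala sketch of steps 1-2 (not from the original answer), assuming Testcontainers' SocatContainer helper, a shared network, and an HBase master reachable under the network alias "hbase" on port 16000 (both assumed):

import org.testcontainers.containers.{Network, SocatContainer}

val network = Network.newNetwork()

// Socat joins the same network as the HBase container and forwards a fixed,
// known port to the HBase master's alias and port.
val socat = new SocatContainer()
  .withNetwork(network)
  .withTarget(16000, "hbase")
socat.start()

// The externally reachable address a client should use instead of the
// "localhost:16000" value stored in Zookeeper:
val masterHost = socat.getHost
val masterPort = socat.getMappedPort(16000)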
We built a new image of HBase to be compliant with Testcontainers.
Use this image:
docker run --env HBASE_MASTER_PORT=16000 --env HBASE_REGION_PORT=16020 jcjabouille/hbase-standalone:2.4.9
Then create this container (in Scala here):
private[test] class GenericHbase2Container
    extends GenericContainer[GenericHbase2Container](
      DockerImageName.parse("jcjabouille/hbase-standalone:2.4.9")
    ) {

  private val randomMasterPort: Int = FreePortFinder.findFreeLocalPort(18000)
  private val randomRegionPort: Int = FreePortFinder.findFreeLocalPort(20000)
  private val hostName: String = InetAddress.getLocalHost.getHostName

  val hbase2Configuration: Configuration = HBaseConfiguration.create

  addExposedPort(randomMasterPort)
  addExposedPort(randomRegionPort)
  addExposedPort(2181)

  withCreateContainerCmdModifier { cmd: CreateContainerCmd =>
    cmd.withHostName(hostName)
    ()
  }

  waitingFor(Wait.forLogMessage(".*0 row.*", 1))
  withStartupTimeout(Duration.ofMinutes(10))
  withEnv("HBASE_MASTER_PORT", randomMasterPort.toString)
  withEnv("HBASE_REGION_PORT", randomRegionPort.toString)
  setPortBindings(Seq(s"$randomMasterPort:$randomMasterPort", s"$randomRegionPort:$randomRegionPort").asJava)

  override protected def doStart(): Unit = {
    super.doStart()
    hbase2Configuration.set("hbase.client.pause", "200")
    hbase2Configuration.set("hbase.client.retries.number", "10")
    hbase2Configuration.set("hbase.rpc.timeout", "3000")
    hbase2Configuration.set("hbase.client.operation.timeout", "3000")
    hbase2Configuration.set("hbase.client.scanner.timeout.period", "10000")
    hbase2Configuration.set("zookeeper.session.timeout", "10000")
    hbase2Configuration.set("hbase.zookeeper.quorum", "localhost")
    hbase2Configuration.set("hbase.zookeeper.property.clientPort", getMappedPort(2181).toString)
  }
}
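For completeness, a hypothetical usage sketch (not from the original answer) showing how a test might obtain a client connection from the container above; ConnectionFactory is the standard HBase client API, everything else is assumed:

import org.apache.hadoop.hbase.client.ConnectionFactory

val container = new GenericHbase2Container
container.start()

// hbase2Configuration already points at the mapped Zookeeper port set in doStart,
// so a client connection can be created directly from it.
val connection = ConnectionFactory.createConnection(container.hbase2Configuration)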
More details here: https://hub.docker.com/r/jcjabouille/hbase-standalone
