I have set up a Docker swarm with multiple worker nodes.
My current JupyterHub setup with SwarmSpawner works fine: I am able to deploy single-user Docker images based on an image the user selects before spawning, using _options_form_default in my jupyterhub_config.py.
What I would like now is to give users the possibility to select the swarm worker node name (hostname) on which they would like to spawn their single-user JupyterHub image, because our worker nodes have different hardware specs (GPUs, RAM, processors, etc.) and users know in advance the name of the host they would like to use.
Is it possible to choose the node on which the image is spawned?
My current swarm has, for example, 1 master node ("master") and 3 worker nodes ("node1", "node2", "node3"); those are their hostnames, as they appear in the HOSTNAME column of the docker node ls output on the master node.
So what I would like is for users to get a dropdown selection of the swarm worker node hostnames on which they would like to spawn their JupyterHub image, with a prompt such as: "Select the server name".
Ok so I actually figured out how to do that.
Here is the relevant part in my jupyterhub_config.py:
from dockerspawner import SwarmSpawner  # assuming dockerspawner's SwarmSpawner is in use

class CustomFormSpawner(SwarmSpawner):
    # Show a frontend form to the user for host selection.
    # The option values should correspond to the hostnames
    # that appear in the `docker node ls` command output.
    def _options_form_default(self):
        return """
        <label for="hostname">Select your desired host</label>
        <select name="hostname" size="1">
          <option value="node1">node1 - GPU: RTX 2070 / CPU: 40</option>
          <option value="node2">node2 - GPU: GTX 1080 / CPU: 32</option>
        </select>
        """

    # Retrieve the selected choice and set the swarm placement constraint
    def options_from_form(self, formdata):
        options = {}
        options['hostname'] = formdata['hostname']
        hostname = ''.join(formdata['hostname'])
        self.extra_placement_spec = {'constraints': ['node.hostname==' + hostname]}
        return options
c.JupyterHub.spawner_class = CustomFormSpawner
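If you would rather not hardcode the node list, here is a rough sketch of how the dropdown could be built from the swarm itself with the docker Python SDK. This is my own variation, not part of the original config; it assumes the hub can reach a manager's Docker socket (which SwarmSpawner already needs) and that dockerspawner's SwarmSpawner is in use.

# Hypothetical variant: build the <option> list from the swarm's worker nodes
# instead of hardcoding hostnames.
import docker
from dockerspawner import SwarmSpawner

class DynamicFormSpawner(SwarmSpawner):
    def _options_form_default(self):
        client = docker.from_env()
        options = []
        for node in client.nodes.list(filters={'role': 'worker'}):
            hostname = node.attrs['Description']['Hostname']
            options.append('<option value="{0}">{0}</option>'.format(hostname))
        return (
            '<label for="hostname">Select your desired host</label>'
            '<select name="hostname" size="1">' + ''.join(options) + '</select>'
        )

    def options_from_form(self, formdata):
        hostname = formdata['hostname'][0]
        self.extra_placement_spec = {'constraints': ['node.hostname==' + hostname]}
        return {'hostname': hostname}

# c.JupyterHub.spawner_class = DynamicFormSpawner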
Related
I am moving a Docker image from Docker to a K8s Deployment. I have auto-scale rules on, so it starts with 5 replicas but can go to 12. The Docker image on K8s starts perfectly, with a K8s Service in front to cluster the Deployment.
Now each container has its own JVM, which has a Prometheus app retrieving its stats. In Docker this is no problem, because the port that serves the Prometheus info is created dynamically starting at port 8000, so the docker-compose.yml grows the port by 1 based on how many images are started.
The problem is that I can't find how to do this in a K8s [deployment].yml file. Because Deployment pods are dynamic, I would have thought there would be some way to set a starting HOST port to be incremented based on how many containers are started.
Maybe I am looking at this the wrong way, so any clarification would be helpful; meanwhile I will keep searching Google for any info on such a thing.
Well, after reading and reading and reading so much, I came to the conclusion that K8s is not responsible for opening ports for a Docker image or providing ingress to your app on some weird port; that is not its responsibility. A K8s Deployment just deploys the Pods you requested. You can set the ports option on DEPLOYMENT -> SPEC -> CONTAINERS -> PORTS, which, just like in Docker, is only informational. But it allows you to JSONPath-query for all PODS (containers) with a Prometheus port available. That in turn lets you rebuild the "targets" value in the prometheus.yml file. Having those targets makes them available to Grafana to create a dashboard.
That's it, pretty easy. I was complicating something because I did not understand it. I am including a script I QUICKLY wrote to get something going. USE AT YOUR OWN RISK.
By the way, I use Pod and Container interchangeably.
#!/usr/bin/env bash
#set -x
_MyappPrometheusPort=8055
_finalIpsPortArray=()
_prometheusyamlFile=prometheus.yml
cd /docker/images/prometheus
#######################################################################################################################################################
#One container on the K8s System is weave and it holds the subnet we need to validate against.
#weave-net-lwzrk 2/2 Running 8 (7d3h ago) 9d 192.168.2.16 accl-ffm-srv-006 <none> <none>
_weavenet=$(kubectl get pod -n kube-system -o wide | grep weave | cut -d ' ' -f1 )
echo "_weavenet: $_weavenet"
#The default subnet is the one that lets us know the container is part of the kubernetes network.
# Range: 10.32.0.0/12
# DefaultSubnet: 10.32.0.0/12
_subnet=$( kubectl exec -n kube-system $_weavenet -c weave -- /home/weave/weave --local status | sed -En "s/^(.*)(DefaultSubnet:\s)(.*)?/\3/p" )
echo "_subnet: $_subnet"
_cidr2=$( echo "$_subnet" | cut -d '/' -f2 )
echo "_cidr2: /$_cidr2"
#######################################################################################################################################################
#This is an array of the currently monitored containers that prometheus was started with.
#We will remove any containers from the array that fit the K8s Weavenet subnet with the myapp prometheus port.
_targetLineFound_array=($( egrep '^\s{1,20}-\s{0,5}targets\s{0,5}:\s{0,5}\[.*\]' $_prometheusyamlFile | sed -En "s/(.*-\stargets:\s\[)(.*)(\]).*/\2/p" | tr "," "\n"))
for index in "${_targetLineFound_array[@]}"
do
_ip="${index//\'/$''}"
_ipTocheck=$( echo $_ip | cut -d ':' -f1 )
_portTocheck=$( echo $_ip | cut -d ':' -f2 )
#We need to check if the IP is within the subnet mask attained from K8s.
#The port must also be the prometheus port in case some other port is used also for Prometheus.
#This means the IP should be removed since we will put the list of IPs from
#K8s currently in production by Deployment/AutoScale rules.
#Network: 10.32.0.0/12
_isIpWithinSubnet=$( ipcalc $_ipTocheck/$_cidr2 | sed -En "s/^(.*)(Network:\s+)([0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?)(\/[0-9]{1}[0-9]{1}.*)?/\3/p" )
if [[ "$_isIpWithinSubnet/$_cidr2" == "$_subnet" && "$_portTocheck" == "$_MyappPrometheusPort" ]]; then
echo "IP managed by K8s will be deleted: _isIpWithinSubnet: ($_ip) $_isIpWithinSubnet"
else
_finalIpsPortArray+=("$_ip")
fi
done
#######################################################################################################################################################
#This is an array of the current running myapp App containers with a prometheus port that is available.
#From this list we will add them to the prometheus file to be available for Grafana monitoring.
readarray -t _currentK8sIpsArr < <( kubectl get pods --all-namespaces --chunk-size=0 -o json | jq '.items[] | select(.spec.containers[].ports != null) | select(.spec.containers[].ports[].containerPort == '$_MyappPrometheusPort' ) | .status.podIP' )
for index in "${!_currentK8sIpsArr[#]}"
do
_addIPToMonitoring=${_currentK8sIpsArr[index]//\"/$''}
echo "IP Managed by K8s as myapp app with prometheus currently running will be added to monitoring: $_addIPToMonitoring"
_finalIpsPortArray+=("$_addIPToMonitoring:$_MyappPrometheusPort")
done
######################################################################################################################################################
#we need to recreate this string and sed it into the file
#- targets: ['192.168.2.13:3201', '192.168.2.13:3202', '10.32.0.7:8055', '10.32.0.8:8055']
_finalPrometheusTargetString="- targets: ["
i=0
# Iterate the loop to read and print each array element
for index in "${!_finalIpsPortArray[#]}"
do
((i=i+1))
_finalPrometheusTargetString="$_finalPrometheusTargetString '${_finalIpsPortArray[index]}'"
if [[ $i != ${#_finalIpsPortArray[#]} ]]; then
_finalPrometheusTargetString="$_finalPrometheusTargetString,"
fi
done
_finalPrometheusTargetString="$_finalPrometheusTargetString]"
echo "$_finalPrometheusTargetString"
sed -i -E "s/(.*)-\stargets:\s\[.*\]/\1$_finalPrometheusTargetString/" ./$_prometheusyamlFile
docker-compose down
sleep 4
docker-compose up -d
echo "All changes were made. Exiting"
exit 0
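For reference, here is a rough Python equivalent of the pod-discovery step in the script (the kubectl + jq call); it only sketches the idea and assumes kubectl is configured on the host.

# Sketch: list pod IPs that expose the app's Prometheus port, the same
# discovery the shell script does with kubectl + jq.
import json
import subprocess

PROM_PORT = 8055  # same value as _MyappPrometheusPort in the script above

raw = subprocess.run(
    ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

targets = []
for item in json.loads(raw)["items"]:
    ports = [
        p.get("containerPort")
        for c in item["spec"].get("containers", [])
        for p in (c.get("ports") or [])
    ]
    pod_ip = item.get("status", {}).get("podIP")
    if PROM_PORT in ports and pod_ip:
        targets.append("{}:{}".format(pod_ip, PROM_PORT))

print(targets)  # e.g. ['10.32.0.7:8055', '10.32.0.8:8055']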
Ideally, you should be using the average JVM usage across all the replicas. There is no point in creating a different deployment with a different port if you are running the same Docker image across all the replicas.
I think keeping a single Deployment, with resource requirements set on the Deployment, would be the best practice.
You can get the JVM average across all the running replicas:
sum(jvm_memory_max_bytes{area="heap", app="app-name",job="my-job"}) / sum(kube_pod_status_phase{phase="Running"})
As you are running the same Docker image across all replicas and the K8s Service will by default manage the load balancing, average utilization is a reasonable option to monitor.
Still, if you want to filter and get different values, you can create different deployments (not a good way at all) or use StatefulSets.
You can also filter the data by hostname (pod name) in Prometheus, so you will get each replica's usage.
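For example, here is a small sketch of pulling those per-replica values over the Prometheus HTTP API; the Prometheus URL and the pod/instance label names are assumptions about your scrape config, not something taken from the setup above.

# Sketch: print one heap value per replica by querying Prometheus directly.
import requests

PROM_URL = "http://localhost:9090"  # hypothetical Prometheus address
QUERY = 'jvm_memory_max_bytes{area="heap", app="app-name", job="my-job"}'

resp = requests.get(PROM_URL + "/api/v1/query", params={"query": QUERY})
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    _timestamp, value = series["value"]
    # Each time series keeps its own pod/instance labels, i.e. one line per replica.
    print(labels.get("pod", labels.get("instance", "?")), value)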
I have a Spark cluster and a Hadoop cluster which were built with Docker swarm. They are on the same network. I wrote a simple WordCount example in Scala:
val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
val data = spark.sparkContext.textFile("hdfs://10.0.3.16:8088/Sample.txt")
val counts = data.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.foreach(println)
When I run the code in the Spark master node container, the IP address gets replaced by the name of the container and an error occurs:
Illegal character in hostname at index 12: hdfs://spark_namenode.1.ywlf9yx9hcm4duhxnywn91i35.spark_overlay:9000
I cannot change the container name, because that is not allowed in Docker swarm.
I'm trying to create multiple user-defined bridge networks, but it seems Docker can create only 31 user-defined bridge networks per host machine. So, can I have more than 31 networks on my machine? Does anyone know how I can do that, i.e. how to configure my host machine? Thank you for your time!
Have a look at this:
This is due to the fact that Docker uses a hardcoded list of broad network ranges – 172.17-31.x.x/16 and 192.168.x.x/20 – for the bridge network driver.
You can get around this by manually specifying the subnet for each network created:
for net in {1..50}
do
docker network create -d bridge --subnet=172.18.${net}.0/24 net${net}
done
Since Docker version 18.06 the allocation ranges can be customized in the daemon configuration like so:
/etc/docker/daemon.json
{
  "default-address-pools": [
    {"base": "172.17.0.0/16", "size": 24}
  ]
}
This example would create a pool of /24 subnets out of a /16 range, increasing the bridge network limit from 31 to 255. More pools could be added as necessary. Note that this limits the number of attached containers per network to about 254.
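As a quick sanity check on that arithmetic, here is a small sketch using Python's standard ipaddress module (nothing Docker-specific, it just counts subnets and usable addresses):

# Count the /24 subnets available in the /16 pool and the usable
# addresses each /24 leaves for containers (network/broadcast excluded).
import ipaddress

pool = ipaddress.ip_network("172.17.0.0/16")
subnets = list(pool.subnets(new_prefix=24))
print(len(subnets))                   # 256 possible /24 networks in the pool
print(len(list(subnets[0].hosts())))  # 254 usable addresses per /24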
Yes, it is possible to extend the network limit. Edit /etc/docker/daemon.json:
{
  "default-address-pools": [
    {
      "base": "172.17.0.0/12",
      "size": 16
    },
    {
      "base": "192.168.0.0/16",
      "size": 20
    },
    {
      "base": "10.99.0.0/16",
      "size": 24
    }
  ]
}
(add the parameter if it does not exist), then run sudo service docker restart.
The first two are the default Docker address pools; the last is one of the private network ranges.
With this change you have an additional 255 networks; new networks are allocated from the added 10.99.0.0/16 pool once the default ranges are used up.
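To check that new networks really do come out of the configured pools, here is a quick throwaway sketch with the docker Python SDK (not part of the answer above):

# Create a temporary bridge network and print the subnet Docker picked for it.
import docker

client = docker.from_env()
net = client.networks.create("pool-check", driver="bridge")
net.reload()
print(net.attrs["IPAM"]["Config"])  # shows which pool the new subnet came from
net.remove()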
I'm running a JGroups application inside Docker containers. The containers are running across two nodes (A & B), and they are all connected using a Docker swarm mode overlay network.
I referred to https://docs.docker.com/engine/swarm/networking/
Here is what I did and observed:
The JGroups bind address is set to the overlay network IP of the container.
Containers running on the same node form a cluster.
I used nslookup to ensure that the overlay network IP of a container running on node A is reachable from a container running on node B.
docker node ls displays the nodes correctly, and I am able to schedule container instances successfully.
Docker Version: Docker version 17.03.1-ce, build c6d412e
OS: Ubuntu 16.04.2
Cloud: AWS (Port 7946 TCP/UDP and 4789 UDP are open)
JGroups: 3.6.6
JGroups XML:
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:org:jgroups"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
<TCP bind_port="7600"
recv_buf_size="${tcp.recv_buf_size:5M}"
send_buf_size="${tcp.send_buf_size:5M}"
max_bundle_size="64K"
max_bundle_timeout="30"
use_send_queues="true"
sock_conn_timeout="300"
timer_type="new3"
timer.min_threads="4"
timer.max_threads="10"
timer.keep_alive_time="3000"
timer.queue_max_size="500"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="8"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="true"
thread_pool.queue_max_size="10000"
thread_pool.rejection_policy="discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="1"
oob_thread_pool.max_threads="8"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="discard"/>
<JDBC_PING connection_url="jdbc:mysql://${database.host:localhost}:3306/mydb"
connection_username="root"
connection_password="root"
connection_driver="com.mysql.jdbc.Driver"
initialize_sql="CREATE TABLE JGROUPSPING (
own_addr varchar(200) NOT NULL,
bind_addr varchar(200) NOT NULL,
created timestamp NOT NULL,
cluster_name varchar(200) NOT NULL,
ping_data varbinary(5000) DEFAULT NULL,
PRIMARY KEY (`own_addr`,`cluster_name`))"
insert_single_sql="INSERT INTO JGROUPSPING (own_addr, bind_addr, created, cluster_name, ping_data) values (?,
'${jgroups.bind_addr:127.0.0.1}',now(), ?, ?)"
delete_single_sql="DELETE FROM JGROUPSPING WHERE own_addr=? AND cluster_name=?"
select_all_pingdata_sql="SELECT ping_data FROM JGROUPSPING WHERE cluster_name=?"
/>
<MERGE3 min_interval="10000"
max_interval="30000"/>
<FD_SOCK/>
<FD timeout="3000" max_tries="3" />
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK2 use_mcast_xmit="false"
discard_delivered_msgs="true"/>
<UNICAST3 />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="4M"/>
<pbcast.GMS print_local_addr="true" join_timeout="2000"
view_bundling="true"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
<RSVP resend_interval="2000" timeout="10000"/>
<pbcast.STATE_TRANSFER/>
</config>
Any help is appreciated.
You probably need to use host networking (--net=host) or use external_addr in the transport in addition to bind_addr.
An alternative if you have DNS available is DNS_PING [2].
For details take a look at [1].
[1] https://github.com/belaban/jgroups-docker
[2] http://www.jgroups.org/manual4/index.html#_dns_ping
I am facing a huge problem deploying a dashDB Local cluster. After a successful deployment, the following error comes up when trying to create a single table or launch a query. Furthermore, the web server is not working properly, unlike in the previous SMP deployment.
Cannot connect to database "BLUDB" on node "20" because the difference
between the system time on the catalog node and the virtual timestamp
on this node is greater than the max_time_diff database manager
configuration parameter.. SQLCODE=-1472, SQLSTATE=08004,
DRIVER=4.18.60
I followed the official deployment guide, so the following were double-checked:
each physical machine's and Docker container's /etc/hosts file contains all IPs, fully qualified and simple hostnames
there is an NFS share preconfigured and mounted at /mnt/clusterfs on every single server
none of the servers showed an error during the docker logs --follow dashDB phase
the nodes configuration file is located in the /mnt/clusterfs directory
After starting dashDB with the following command:
docker exec -it dashDB start
it looks as it should (see below), but the error can be found in /opt/ibm/dsserver/logs/dsserver.0.log.
#
--- dashDB stack service status summary ---
##################################################################### Redirecting to /bin/systemctl status slapd.service
SUMMARY
LDAPrunning: SUCCESS
dashDBtablesOnline: SUCCESS
WebConsole : SUCCESS
dashDBconnectivity : SUCCESS
dashDBrunning : SUCCESS
#
--- dashDB high availability status ---
#
Configuring dashDB high availability ... Stopping the system Stopping
datanode dashdb02 Stopping datanode dashdb01 Stopping headnode
dashdb03 Running sm on head node dashdb03 .. Running sm on data node
dashdb02 .. Running sm on data node dashdb01 .. Attempting to activate
previously failed nodes, if any ... SM is RUNNING on headnode dashdb03
(ACTIVE) SM is RUNNING on datanode dashdb02 (ACTIVE) SM is RUNNING on
datanode dashdb01 (ACTIVE) Overall status : RUNNING
After several redeployments nothing has changed. Please help me figure out what I am doing wrong.
Many Thanks, Daniel
Always make sure the NTP service is started on every single cluster node before starting the Docker containers; otherwise the time synchronization will not take effect inside them.
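Before starting the containers, it can also help to verify that the node clocks actually agree. Here is a small sketch that compares clocks across the hosts over SSH (passwordless SSH to each host is an assumption; the hostnames are taken from the HA status output above and should be adjusted to your cluster):

# Report clock drift of each cluster node relative to this machine.
import subprocess
import time

NODES = ["dashdb01", "dashdb02", "dashdb03"]  # adjust to your hosts

def remote_epoch(host):
    # 'date +%s' prints Unix epoch seconds on the remote host
    out = subprocess.run(["ssh", host, "date", "+%s"],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

local = int(time.time())
for host in NODES:
    drift = remote_epoch(host) - local
    print("{}: drift vs. this node = {} seconds".format(host, drift))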