Vert.x high availability is not working - docker

Brief description
I'm just getting started with Vert.x and I wanted to try the high-availability feature with a little toy example. In my setup I have a fat-JAR application which is deployed to several Docker containers. The application programmatically creates a Vert.x instance and starts one verticle called ContainerVerticle. This runs an HTTP server and acts as a "launcher": when a "SPAWN" command is received, it deploys another verticle called AppVerticle in high-availability mode. The idea is that I run this on 3 containers and then kill the JVM on one of them, which should redeploy the AppVerticle to another Docker container.
Actual result: the verticles can talk to each other over the event bus, and the cluster itself seems to work correctly: according to the log files the members see each other. However, when I kill one of the containers, its AppVerticle is not redeployed.
More details
(All source code is written in Kotlin)
Vert.x initialization:
val hzConfig = Config()
val mgr = HazelcastClusterManager(hzConfig) // empty config -> use default
val hostAddress = getAddress() // get the local ip address (not localhost!)
val options = VertxOptions()
.setClustered(true)
.setClusterHost(hostAddress)
.setClusterPort(18001)
.setClusterManager(mgr)
//.setQuorumSize(2)
.setHAEnabled(true)
val eventBusOptions = EventBusOptions()
eventBusOptions
.setClustered(true)
.setHost(hostAddress)
.setPort(18002)
options.setEventBusOptions(eventBusOptions)
Vertx.clusteredVertx(options) { res ->
if (res.succeeded()) {
val vertx = res.result()
vertx.deployVerticle(ContainerVerticle::class.java.name,
DeploymentOptions()
.setHa(false)) // ContainerVerticle should not restart
}
}
ContainerVerticle (our 'launcher')
class ContainerVerticle : AbstractVerticle() {
...
override fun start(startFuture: Future<Void>?) {
val router = createRouter()
val port = config().getInteger("http.port", 8080)
vertx.eventBus().consumer<Any>("mynamspace.container.spawn") { message ->
val appVerticleID = message.body()
log.info(" - HANDLE SPAWN message \"${appVerticleID}\"")
val appVerticleConfig = JsonObject().put("ID", appVerticleID)
vertx.deployVerticle(AppVerticle::class.java.name, // Deploy the APP!!!
DeploymentOptions()
.setConfig(appVerticleConfig)
.setInstances(1)
.setHa(true))
}
vertx.createHttpServer()... // omitted (see github link)
}
private fun createRouter(): Router { ... } // omitted (see github link)
val handlerRoot = Handler<RoutingContext> { routingContext ->
val cmd = routingContext.bodyAsString
val tokens = cmd.split(" ")
if (tokens[0] == "spawn") {
vertx.eventBus().send("mynamspace.container.spawn", tokens[1]) // round-robin
routingContext.response().end("Successfully handled command ${cmd}\n")
} else if (tokens[0] == "send") {
vertx.eventBus().send("mynamspace.app.${tokens[1]}", tokens[2])
routingContext.response().end("success\n")
} else {
routingContext.response().end("ERROR: Unknown command ${cmd}\n")
}
}
}
The last part: the AppVerticle:
class AppVerticle : AbstractVerticle() {
var timerID = 0L
override fun start(startFuture: Future<Void>?) {
val id = config().getString("ID")
log.info(" SPAWNED app verticle \"${id}\"")
vertx.eventBus().consumer<Any>("mynamspace.app.${id}") { message ->
val cmd = message.body()
log.info(" - app verticle \"${id}\" handled message ${cmd}")
}
timerID = vertx.setPeriodic(1000) {
log.info(" - app verticle \"${id}\" is alive")
}
}
}
Running
Open 3 terminals and run 3 Docker instances. Minor details: we re-map port 8080 to three different host ports (8081, 8082, 8083) and give the containers unique names: cont1, cont2, cont3.
docker run --name "cont1" -it --rm -p 8081:8080 -v $PWD/build/libs:/app anapsix/alpine-java java -jar /app/vertxhaeval-1.0-SNAPSHOT-all.jar
docker run --name "cont2" -it --rm -p 8082:8080 -v $PWD/build/libs:/app anapsix/alpine-java java -jar /app/vertxhaeval-1.0-SNAPSHOT-all.jar
docker run --name "cont3" -it --rm -p 8083:8080 -v $PWD/build/libs:/app anapsix/alpine-java java -jar /app/vertxhaeval-1.0-SNAPSHOT-all.jar
Observation 1
The cluster members seem to see each other because of the following message:
Members [3] {
Member [172.17.0.2]:5701 - 1d50394c-cf11-4bd7-877e-7e06e2959940 this
Member [172.17.0.3]:5701 - 3fa2cff4-ba9e-431b-9c4e-7b1fd8de9437
Member [172.17.0.4]:5701 - b9a3114a-7c15-4992-b609-63c0f22ed388
}
Also, we can spawn the AppVerticle:
curl -d "spawn -={Application-1}=-" -XPOST http://localhost:8083
The event bus seems to work correctly, because we can see that the spawn message gets delivered to the ContainerVerticle instances in round-robin fashion.
Observation 2 - the problem
Now let's try to kill the container running the AppVerticle (assuming it runs in cont2):
docker kill --signal=SIGKILL cont2
The remaining containers seem to react to that event; the log file has something like this:
Aug 14, 2018 8:18:45 AM com.hazelcast.internal.cluster.ClusterService
INFO: [172.17.0.4]:5701 [dev] [3.8.2] Removing Member [172.17.0.2]:5701 - fbe67a02-80a3-4207-aa10-110fc09e0607
Aug 14, 2018 8:18:45 AM com.hazelcast.internal.cluster.ClusterService
INFO: [172.17.0.4]:5701 [dev] [3.8.2]
Members [2] {
Member [172.17.0.3]:5701 - 8b93a822-aa7f-460d-aa3e-568e0d85067c
Member [172.17.0.4]:5701 - b0ecea8e-59f1-440c-82ca-45a086842004 this
}
However, the AppVerticle does NOT get redeployed.
The full source code is available on github:
https://github.com/conceptacid/vertx-ha-eval

I spent several hours debugging this, but finally found it.
So here is the solution:
Your verticle's start method header is:
override fun start(startFuture: Future<Void>?)
You're overriding the start method variant that gives you a future, which Vert.x waits on after starting the verticle. Vert.x will wait forever for this future to complete, because you never call
startFuture.complete()
at the end of the method.
As a result, the verticle is never added to the HAManager's verticle list and therefore will not be redeployed.
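For reference, here is a minimal sketch of the fix, reusing the handlers from the AppVerticle above and simply completing the future once start-up is done (the completion could equally be chained to an asynchronous step such as an HTTP server start):
override fun start(startFuture: Future<Void>?) {
    val id = config().getString("ID")
    log.info(" SPAWNED app verticle \"${id}\"")
    vertx.eventBus().consumer<Any>("mynamspace.app.${id}") { message ->
        log.info(" - app verticle \"${id}\" handled message ${message.body()}")
    }
    timerID = vertx.setPeriodic(1000) {
        log.info(" - app verticle \"${id}\" is alive")
    }
    startFuture?.complete() // signal successful start-up so the HAManager registers the verticle
}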
Alternatively, you can use
override fun start()
as the method header if your verticle only does simple, synchronous start-up.
Hope this helps.

Related

How to define a rule to capture alerts when any manual command gets executed inside the container on Falco

Installed Falco drivers on the host.
I'm able to capture alerts for specific conditions, such as when a process is spawned or when a script is executed inside the container. But the requirement is to trigger an alert whenever any manual command is executed inside the container.
Is there any custom condition we can use to generate an alert whenever any command gets executed inside a container?
I expected the condition below to capture an alert whenever the command line contains a newline character (Enter pressed inside a container) or the executed command ends with .sh, but this didn't work.
- rule: shell_in_container
desc: notice shell activity within a container
condition: >
container.id != host and
proc.cmdline contains "\n" or
proc.cmdline endswith ".sh"
output: >
shell in a container
(user=%user.name container_id=%container.id container_name=%container.name
shell=%proc.name parent=%proc.pname source_ip=%fd.rip cmdline=%proc.cmdline)
priority: WARNING
Your question made me go and read about Falco (I learned something new today). After installing Falco and reading its documentation, I found a solution that seems to work.
- rule: shell_in_container
desc: notice shell activity within a container
condition: >
container.id != host and
proc.cmdline != ""
output: >
shell in a container
(user=%user.name container_id=%container.id container_name=%container.name
shell=%proc.name parent=%proc.pname source_ip=%fd.rip cmdline=%proc.cmdline)
priority: WARNING
The rule below generates alerts whenever a manual command is executed inside a container (exec with bash or sh), with all the required fields in the output.
Support for the pod IP is expected in Falco version 0.35; work is in progress (https://github.com/falcosecurity/libs/pull/708). The field will be called container.ip (effectively the pod IP, since all containers share the network stack of the pod), and container.cni.json will give a complete view in case you have dual-stack and multiple interfaces.
- rule: shell_in_container
desc: notice shell activity within a container
condition: >
container.id != host and
evt.type = execve and
(proc.pname = bash or
proc.pname = sh) and
proc.cmdline != bash
output: >
(user=%user.name command=%proc.cmdline timestamp=%evt.datetime.s container_id=%container.id container_name=%container.name pod_name=%k8s.pod.name proc_name=%proc.name proc_pname=%proc.pname res=%evt.res)
priority: informational
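If you want a quick way to check a rule like this (a sketch of my own, not from the answers above; the rules file path and container name are placeholders), you can point Falco at the rules file and then exec a command through sh in any running container:
# Hypothetical smoke test: load the custom rule file in the foreground
sudo falco -r /etc/falco/rules.d/shell_in_container.yaml
# ...then, in another terminal, trigger the rule by exec-ing a command via sh
docker exec -it <some-running-container> sh -c 'ls /'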

error when pulling a docker container using singularity in nextflow

I am making a very short workflow in which I use a tool called salmon for my analysis.
On the HPC that I am working on, I cannot install this tool, so I decided to pull the container from BioContainers.
We do not have Docker installed on the HPC (I also do not have permission to install it), but we have Singularity instead.
So I have to pull the Docker container (from quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0) using Singularity.
The workflow management system that I am working with is Nextflow.
This is the short workflow I made (index.nf):
#!/usr/bin/env nextflow
nextflow.preview.dsl=2
container = 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'
shell = ['/bin/bash', '-euo', 'pipefail']
process INDEX {
script:
"""
salmon index \
-t /hpc/genome/gencode.v39.transcripts.fa \
-i index \
"""
}
workflow {
INDEX()
}
I run it using this command:
nextflow run index.nf -resume
But I got this error:
salmon: command not found
Do you know how I can fix the issue?
You are so close! All you need to do is move these directives into your nextflow.config or declare them at the top of your process body:
container = 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'
shell = ['/bin/bash', '-euo', 'pipefail']
My preference is to use a process selector to assign the container directive. So for example, your nextflow.config might look like:
process {
shell = ['/bin/bash', '-euo', 'pipefail']
withName: INDEX {
container = 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'
}
}
singularity {
enabled = true
// not strictly necessary, but highly recommended
cacheDir = '/path/to/singularity/cache'
}
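For completeness, the other option mentioned at the start (declaring the directive at the top of the process body) is just a container line inside the process; a minimal sketch with the same image, in case you prefer keeping it next to the process definition:
process INDEX {
    // same image as above, declared as a process directive instead of in nextflow.config
    container 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'

    """
    salmon index -t /hpc/genome/gencode.v39.transcripts.fa -i index
    """
}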
And your index.nf might then look like:
nextflow.enable.dsl=2
params.transcripts = '/hpc/genome/gencode.v39.transcripts.fa'
process INDEX {
input:
path fasta
output:
path 'index'
"""
salmon index \\
-t "${fasta}" \\
-i index \\
"""
}
workflow {
transcripts = file( params.transcripts )
INDEX( transcripts )
}
If run using:
nextflow run -ansi-log false index.nf
You should see the following results:
N E X T F L O W ~ version 21.04.3
Launching `index.nf` [clever_bassi] - revision: d235de22c4
Pulling Singularity image docker://quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0 [cache /path/to/singularity/cache/quay.io-biocontainers-salmon-1.2.1--hf69c8f4_0.img]
[8a/279df4] Submitted process > INDEX

How does /dev work in Docker?

I had some issues with /dev/random and I replaced it with /dev/urandom on some of my servers:
lrwxrwxrwx 1 root root 12 Jul 22 21:04 /dev/random -> /dev/urandom
I have since replaced some of my infrastructure with Docker. I thought it would be sufficient to replace /dev/random on my host machine. But when starting the service, I quickly noticed that some RNG operations blocked because the container still used the original implementation of /dev/random.
I wrote a little test program to prove this behavior:
package main
import (
"fmt"
"os"
)
func main() {
f, err := os.Open("/dev/random")
if err != nil {
panic(err)
}
chunk := make([]byte, 1024)
index := 0
for {
index = index + 1
_, err := f.Read(chunk)
if err != nil {
panic(err)
}
fmt.Println("iteration ", index)
}
}
Executing this program on the host machine (which has the symlink) works as expected - it does not block and runs until I shut it down.
When running this in a container, it will block after the first iteration (at least on my machine).
I can obviously fix this problem by mounting my random file into the container:
docker run -it --rm -v /dev/random:/dev/random bla
But this is not the point of this question. I want to know the following:
How does Docker set up the devices listed in /dev?
Why is it not just using (some of) the device files of the host machine?
Docker never uses any of the host system's filesystem unless you explicitly instruct it to. The device files in /dev are whatever (if anything) is baked into the image.
Also note that the conventional Unix device model is that a device "file" is either a character- or block-special device, and a pair of major and minor device numbers, and any actual handling is done in the kernel. Relatedly, the host's kernel is shared across all Docker containers. So if you fix your host's broken implementation of /dev/random then your containers will inherit this fix too; but merely changing what's in the host's /dev will have no effect.
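By way of illustration (my example, not from the answer above; the image and device mapping are arbitrary), the supported way to hand a specific host device node to a container is to pass it explicitly, for example with --device:
# Hypothetical example: expose the host's /dev/urandom as /dev/random inside the
# container so reads never block, then read a few KB to verify.
docker run --rm --device /dev/urandom:/dev/random alpine \
    dd if=/dev/random of=/dev/null bs=1k count=4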

How to get Task ID from within ECS container?

Hello, I am interested in retrieving the Task ID from inside a running container that lives on an EC2 host machine.
The AWS ECS documentation states there is an environment variable ECS_CONTAINER_METADATA_FILE with the location of this data, but it will only be set/available if the ECS_ENABLE_CONTAINER_METADATA variable is set to true upon cluster/EC2 instance creation. I don't see where this can be done in the AWS console.
Also, the docs state that this can be done by setting it to true on the host machine, but that would require restarting the ECS agent.
Is there any other way to do this without having to go into the EC2 instance to set this and restart the agent?
This doesn't work for newer Amazon ECS container agent versions anymore, and in fact it's now much simpler and also enabled by default. Please refer to the documentation, but here's a TL;DR:
If you're using Amazon ECS container agent version 1.39.0 or higher, you can just do this inside the Docker container:
curl -s "$ECS_CONTAINER_METADATA_URI_V4/task" \
| jq -r ".TaskARN" \
| cut -d "/" -f 3
Here's a list of container agent releases, but if you're using :latest – you're definitely fine.
The technique I'd use is to set the environment variable in the container definition.
If you're managing your tasks via CloudFormation, the relevant YAML looks like so:
Taskdef:
Type: AWS::ECS::TaskDefinition
Properties:
...
ContainerDefinitions:
- Name: some-name
...
Environment:
- Name: AWS_DEFAULT_REGION
Value: !Ref AWS::Region
- Name: ECS_ENABLE_CONTAINER_METADATA
Value: 'true'
This technique helps you keep everything straightforward and reproducible.
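Once that variable is set, the agent writes a JSON metadata file into the container and exposes its path as $ECS_CONTAINER_METADATA_FILE (as mentioned in the question). A hedged sketch of pulling the task ID out of it, assuming jq is available in the image and the newer ARN format used in the answer above:
# Hypothetical sketch: read the metadata file written by the ECS agent and keep the
# last segment of the TaskARN field, i.e. the task ID.
jq -r '.TaskARN' "$ECS_CONTAINER_METADATA_FILE" | cut -d '/' -f 3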
If you need metadata programmatically and don't have access to the metadata file, you can query the agent's metadata endpoint:
curl http://localhost:51678/v1/metadata
Note that if you're getting this information as a running task, you may not be able to connect to the loopback device, but you can connect to the EC2 instance's own IP address.
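For example (a sketch of my own, assuming the EC2 instance metadata endpoint is reachable from the task), you could resolve the instance's private IP first and then query the introspection port on it:
# Hypothetical sketch: get the EC2 host's private IP from the instance metadata
# service, then hit the ECS agent introspection API on that address instead of localhost.
HOST_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
curl -s "http://${HOST_IP}:51678/v1/metadata"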
We set it with the so-called user data, which is executed when the machine starts. There are multiple ways to set it, for example: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html#user-data-console
It could look like this:
#!/bin/bash
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=ecs-staging
ECS_ENABLE_CONTAINER_METADATA=true
EOF
Important: Adjust the ECS_CLUSTER above to match your cluster name, otherwise the instance will not connect to that cluster.
Previous answers are correct, here is another way of doing this:
From the EC2 instance where the container is running, run this command:
curl http://localhost:51678/v1/tasks | python -mjson.tool |less
From the AWS ECS CLI documentation:
Command:
aws ecs list-tasks --cluster default
Output:
{
"taskArns": [
"arn:aws:ecs:us-east-1:<aws_account_id>:task/0cc43cdb-3bee-4407-9c26-c0e6ea5bee84",
"arn:aws:ecs:us-east-1:<aws_account_id>:task/6b809ef6-c67e-4467-921f-ee261c15a0a1"
]
}
To list the tasks on a particular container instance
This example command lists the tasks of a specified container instance, using the container instance UUID as a filter.
Command:
aws ecs list-tasks --cluster default --container-instance f6bbb147-5370-4ace-8c73-c7181ded911f
Output:
{
"taskArns": [
"arn:aws:ecs:us-east-1:<aws_account_id>:task/0cc43cdb-3bee-4407-9c26-c0e6ea5bee84"
]
}
My ECS solution, as bash and Python snippets. Logging calls print debug output by writing to sys.stderr, while print() is used to pass the value back to the shell script.
#!/bin/bash
TASK_ID=$(python3.8 get_ecs_task_id.py)
echo "TASK_ID: ${TASK_ID}"
Python script - get_ecs_task_id.py
import json
import logging
import os
import sys
import requests
# logging configuration
# file_handler = logging.FileHandler(filename='tmp.log')
# redirecting to stderr so I can pass back extracted task id in STDOUT
stdout_handler = logging.StreamHandler(stream=sys.stderr)
# handlers = [file_handler, stdout_handler]
handlers = [stdout_handler]
logging.basicConfig(
level=logging.INFO,
format="[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s",
handlers=handlers,
datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)
def get_ecs_task_id(host):
path = "/task"
url = host + path
headers = {"Content-Type": "application/json"}
r = requests.get(url, headers=headers)
logger.debug(f"r: {r}")
d_r = json.loads(r.text)
logger.debug(d_r)
ecs_task_arn = d_r["TaskARN"]
ecs_task_id = ecs_task_arn.split("/")[2]
return ecs_task_id
def main():
logger.debug("Extracting task ID from $ECS_CONTAINER_METADATA_URI_V4")
logger.debug("Inside get_ecs_task_id.py, redirecting logs to stderr")
logger.debug("so that I can pass the task id back in STDOUT")
host = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
ecs_task_id = get_ecs_task_id(host)
# This print statement passes the string back to the bash wrapper, don't remove
logger.debug(ecs_task_id)
print(ecs_task_id)
if __name__ == "__main__":
main()

k6 using docker with mounted volume errors on run, with "accepts 1 arg(s), received 2"

I'm trying to run a perf test in my CI environment using the k6 Docker image, and a simple single script file works fine. However, I want to break my tests down into multiple JS files. In order to do this, I need to mount a volume in Docker so I can import local modules.
The volume seems to be mounting correctly, with my command
docker run --env-file ./test/performance/env/perf.list -v \
`pwd`/test/performance:/perf -i loadimpact/k6 run - /perf/index.js
k6 seems to start, but immediately errors with
time="2018-01-17T13:04:17Z" level=error msg="accepts 1 arg(s), received 2"
Locally, my file system looks something like
/toychicken
/test
/performance
/env
- perf.list
- index.js
- something.js
And the index.js looks like this
import { check, sleep } from 'k6'
import http from 'k6/http'
import something from '/perf/something'
export default () => {
const r = http.get(`https://${__ENV.DOMAIN}`)
check(r, {
'status is 200': r => r.status === 200
})
sleep(2)
something()
}
You need to remove the "-" after run in the Docker command. The "-" instructs k6 to read from stdin, but in this case you want to load the main JS file from the file system. That's why it complains that it receives two args, one being the "-" and the second being the path to index.js (the error message could definitely be more descriptive).
You'll also need to add .js to the '/perf/something' import.
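Putting both fixes together, the command and the import would look something like this (same image and paths as in the question):
docker run --env-file ./test/performance/env/perf.list -v \
`pwd`/test/performance:/perf -i loadimpact/k6 run /perf/index.js
and in index.js:
import something from '/perf/something.js'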
