Error when pulling a Docker container using Singularity in Nextflow

I am making a very short workflow in which I use a tool called salmon for my analysis.
On the HPC that I am working on, I cannot install this tool, so I decided to pull the container from BioContainers.
On the HPC we do not have Docker installed (I also do not have permission to install it), but we have Singularity instead.
So I have to pull the Docker container (from quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0) using Singularity.
The workflow management system that I am working with is Nextflow.
This is the short workflow I made (index.nf):
#!/usr/bin/env nextflow
nextflow.preview.dsl=2

container = 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'
shell = ['/bin/bash', '-euo', 'pipefail']

process INDEX {

    script:
    """
    salmon index \
        -t /hpc/genome/gencode.v39.transcripts.fa \
        -i index \
    """
}

workflow {
    INDEX()
}
I run it using this command:
nextflow run index.nf -resume
But I got this error:
salmon: command not found
Do you know how I can fix the issue?

You are so close! All you need to do is move these directives into your nextflow.config or declare them at the top of your process body:
container = 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'
shell = ['/bin/bash', '-euo', 'pipefail']
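Declared at the top of the process body instead, that could look something like the minimal sketch below (based on your INDEX process; only the container directive is shown here, with the shell option kept in the config):
process INDEX {

    // sketch: container directive declared inside the process body
    container 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'

    script:
    """
    salmon index \\
        -t /hpc/genome/gencode.v39.transcripts.fa \\
        -i index
    """
}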
My preference, though, is to use a process selector to assign the container directive. For example, your nextflow.config might look like this:
process {
    shell = ['/bin/bash', '-euo', 'pipefail']

    withName: INDEX {
        container = 'quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0'
    }
}

singularity {
    enabled = true

    // not strictly necessary, but highly recommended
    cacheDir = '/path/to/singularity/cache'
}
And your index.nf might then look like:
nextflow.enable.dsl=2

params.transcripts = '/hpc/genome/gencode.v39.transcripts.fa'

process INDEX {

    input:
    path fasta

    output:
    path 'index'

    """
    salmon index \\
        -t "${fasta}" \\
        -i index \\
    """
}

workflow {
    transcripts = file( params.transcripts )
    INDEX( transcripts )
}
If run using:
nextflow run -ansi-log false index.nf
You should see the following results:
N E X T F L O W ~ version 21.04.3
Launching `index.nf` [clever_bassi] - revision: d235de22c4
Pulling Singularity image docker://quay.io/biocontainers/salmon:1.2.1--hf69c8f4_0 [cache /path/to/singularity/cache/quay.io-biocontainers-salmon-1.2.1--hf69c8f4_0.img]
[8a/279df4] Submitted process > INDEX

Related

Airflow - failing XCOM push when using Alpine image

I want to run a KubernetesPodOperator in Airflow that reads a file and sends its content to XCom.
The definition looks like this:
read_file = DefaultKubernetesPodOperator(
    image = 'alpine:3.16',
    cmds = ['bash', '-cx'],
    arguments = ['cat file.json >> /airflow/xcom/return.json'],
    name = 'some-name',
    task_id = 'some_name',
    do_xcom_push = True,
    image_pull_policy = 'IfNotPresent',
)
But I am getting: INFO - stderr from command: cat: can't open '/***/xcom/return.json': No such file or directory
When I use ubuntu:22.04 it works, but I want to make it faster by using the smaller Alpine image. Why is it not working with Alpine, and how can I overcome that?

Vert.x high availability is not working

Brief description
I'm just getting started with Vert.x and I wanted to try the high-availability feature with a little toy example. In my setup I have a fat-jar application which is deployed to several Docker containers. The application programmatically creates an instance of Vert.x and starts one verticle called ContainerVerticle. This runs an HTTP server and acts as a "launcher": when a "SPAWN" command is received, it deploys another verticle called AppVerticle in high-availability mode. The idea is that I want to run this on 3 containers and then kill the JVM on one of them; that should redeploy the AppVerticle to another Docker container.
Actual result: the verticles can talk to each other using the event bus, and the cluster seems to work correctly: according to the log file, the members see each other. However, when I kill one verticle, it does not get redeployed.
More details
(All source code is written in Kotlin)
Vert.x initialization:
val hzConfig = Config()
val mgr = HazelcastClusterManager(hzConfig) // empty config -> use default
val hostAddress = getAddress() // get the local ip address (not localhost!)

val options = VertxOptions()
    .setClustered(true)
    .setClusterHost(hostAddress)
    .setClusterPort(18001)
    .setClusterManager(mgr)
    //.setQuorumSize(2)
    .setHAEnabled(true)

val eventBusOptions = EventBusOptions()
eventBusOptions
    .setClustered(true)
    .setHost(hostAddress)
    .setPort(18002)
options.setEventBusOptions(eventBusOptions)

Vertx.clusteredVertx(options) { res ->
    if (res.succeeded()) {
        val vertx = res.result()
        vertx.deployVerticle(ContainerVerticle::class.java.name,
            DeploymentOptions()
                .setHa(false)) // ContainerVerticle should not restart
    }
}
ContainerVerticle (our 'launcher')
class ContainerVerticle : AbstractVerticle() {
    ...

    override fun start(startFuture: Future<Void>?) {
        val router = createRouter()
        val port = config().getInteger("http.port", 8080)

        vertx.eventBus().consumer<Any>("mynamspace.container.spawn") { message ->
            val appVerticleID = message.body()
            log.info(" - HANDLE SPAWN message \"${appVerticleID}\"")
            val appVerticleConfig = JsonObject().put("ID", appVerticleID)

            vertx.deployVerticle(AppVerticle::class.java.name, // Deploy the APP!!!
                DeploymentOptions()
                    .setConfig(appVerticleConfig)
                    .setInstances(1)
                    .setHa(true))
        }

        vertx.createHttpServer()... // omitted (see github link)
    }

    private fun createRouter(): Router { ... } // omitted (see github link)

    val handlerRoot = Handler<RoutingContext> { routingContext ->
        val cmd = routingContext.bodyAsString
        val tokens = cmd.split(" ")
        if (tokens[0] == "spawn") {
            vertx.eventBus().send("mynamspace.container.spawn", tokens[1]) // round-robin
            routingContext.response().end("Successfully handled command ${cmd}\n")
        } else if (tokens[0] == "send") {
            vertx.eventBus().send("mynamspace.app.${tokens[1]}", tokens[2])
            routingContext.response().end("success\n")
        } else {
            routingContext.response().end("ERROR: Unknown command ${cmd}\n")
        }
    }
}
The last part: the AppVerticle:
class AppVerticle : AbstractVerticle() {
    var timerID = 0L

    override fun start(startFuture: Future<Void>?) {
        val id = config().getString("ID")
        log.info(" SPAWNED app verticle \"${id}\"")

        vertx.eventBus().consumer<Any>("mynamspace.app.${id}") { message ->
            val cmd = message.body()
            log.info(" - app verticle \"${id}\" handled message ${cmd}")
        }

        timerID = vertx.setPeriodic(1000) {
            log.info(" - app verticle \"${id}\" is alive")
        }
    }
}
Running
Open 3 terminals and run 3 Docker instances. Minor details: here we re-map port 8080 to three different host ports (8081, 8082, 8083), and we also give unique names to the containers: cont1, cont2, cont3.
docker run --name "cont1" -it --rm -p 8081:8080 -v $PWD/build/libs:/app anapsix/alpine-java java -jar /app/vertxhaeval-1.0-SNAPSHOT-all.jar
docker run --name "cont2" -it --rm -p 8082:8080 -v $PWD/build/libs:/app anapsix/alpine-java java -jar /app/vertxhaeval-1.0-SNAPSHOT-all.jar
docker run --name "cont2" -it --rm -p 8083:8080 -v $PWD/build/libs:/app anapsix/alpine-java java -jar /app/vertxhaeval-1.0-SNAPSHOT-all.jar
Observation 1
The cluster members seem to see each other because of the following message:
Members [3] {
Member [172.17.0.2]:5701 - 1d50394c-cf11-4bd7-877e-7e06e2959940 this
Member [172.17.0.3]:5701 - 3fa2cff4-ba9e-431b-9c4e-7b1fd8de9437
Member [172.17.0.4]:5701 - b9a3114a-7c15-4992-b609-63c0f22ed388
}
Also, we can spawn the AppVerticle:
curl -d "spawn -={Application-1}=-" -XPOST http://localhost:8083
The event bus seems to work correctly, because we can see that the spawn message gets delivered to a ContainerVerticle in round-robin fashion.
Observation 2 - the problem
Now let's try to kill the verticle (assuming it runs in cont2):
docker kill --signal=SIGKILL cont2
The remaining containers seem to react to that event; the log file contains something like this:
Aug 14, 2018 8:18:45 AM com.hazelcast.internal.cluster.ClusterService
INFO: [172.17.0.4]:5701 [dev] [3.8.2] Removing Member [172.17.0.2]:5701 - fbe67a02-80a3-4207-aa10-110fc09e0607
Aug 14, 2018 8:18:45 AM com.hazelcast.internal.cluster.ClusterService
INFO: [172.17.0.4]:5701 [dev] [3.8.2]
Members [2] {
Member [172.17.0.3]:5701 - 8b93a822-aa7f-460d-aa3e-568e0d85067c
Member [172.17.0.4]:5701 - b0ecea8e-59f1-440c-82ca-45a086842004 this
}
However, the AppVerticle does NOT get redeployed.
The full source code is available on github:
https://github.com/conceptacid/vertx-ha-eval
I spent several hours debugging this, but finally found it.
So here is the solution:
Your verticle start method header is:
override fun start(startFuture: Future<Void>?)
You're overriding the start method that gives you a future which will be waited on after the verticle has started. Vert.x waits forever for the completion of this future, because you never call
startFuture.complete()
at the end of the method.
So the verticle is never added to the verticle list of the HAManager and therefore will not be redeployed.
Alternatively, you can use
override fun start()
as method header if your verticle does a simple, synchronous start-up.
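For example, applied to the AppVerticle from the question, the asynchronous variant might look roughly like this sketch (body abbreviated):
import io.vertx.core.AbstractVerticle
import io.vertx.core.Future

class AppVerticle : AbstractVerticle() {
    override fun start(startFuture: Future<Void>?) {
        val id = config().getString("ID")
        vertx.eventBus().consumer<Any>("mynamspace.app.${id}") { message ->
            // handle the message as before
        }
        // Signal that start-up has finished, so the HAManager registers the verticle
        startFuture?.complete()
    }
}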
Hope this helps.

How to get Task ID from within ECS container?

Hello, I am interested in retrieving the task ID from within a running container which lives on an EC2 host machine.
The AWS ECS documentation states there is an environment variable ECS_CONTAINER_METADATA_FILE with the location of this data, but it will only be set/available if the ECS_ENABLE_CONTAINER_METADATA variable is set to true upon cluster/EC2 instance creation. I don't see where this can be done in the AWS console.
Also, the docs state that this can be done by setting the variable to true on the host machine, but that would require restarting the Docker agent.
Is there any other way to do this without having to go into the EC2 instance to set this and restart the Docker agent?
This doesn't work for newer Amazon ECS container agent versions anymore; in fact, it's now much simpler and also enabled by default. Please refer to the documentation, but here's a TL;DR:
If you're using Amazon ECS container agent version 1.39.0 or higher, you can just do this inside the Docker container:
curl -s "$ECS_CONTAINER_METADATA_URI_V4/task" \
| jq -r ".TaskARN" \
| cut -d "/" -f 3
Here's a list of container agent releases, but if you're using :latest – you're definitely fine.
The technique I'd use is to set the environment variable in the container definition.
If you're managing your tasks via CloudFormation, the relevant YAML looks like this:
Taskdef:
  Type: AWS::ECS::TaskDefinition
  Properties:
    ...
    ContainerDefinitions:
      - Name: some-name
        ...
        Environment:
          - Name: AWS_DEFAULT_REGION
            Value: !Ref AWS::Region
          - Name: ECS_ENABLE_CONTAINER_METADATA
            Value: 'true'
This technique helps you keep everything straightforward and reproducible.
If you need metadata programmatically and don't have access to the metadata file, you can query the agent's metadata endpoint:
curl http://localhost:51678/v1/metadata
Note that if you're getting this information as a running task, you may not be able to connect to the loopback device, but you can connect to the EC2 instance's own IP address.
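A rough sketch of that approach (assuming default bridge networking, so the container can reach the EC2 instance metadata service at 169.254.169.254, and that the agent's introspection port 51678 is reachable on the instance address, as suggested above):
#!/bin/bash
# Look up the EC2 host's private IP via instance metadata,
# then query the ECS agent's introspection endpoint on that address.
HOST_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
curl -s "http://${HOST_IP}:51678/v1/metadata"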
We set it with the so-called user data, which is executed at the start of the machine. There are multiple ways to set it, for example: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html#user-data-console
It could look like this:
#!/bin/bash
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=ecs-staging
ECS_ENABLE_CONTAINER_METADATA=true
EOF
Important: Adjust the ECS_CLUSTER above to match your cluster name, otherwise the instance will not connect to that cluster.
Previous answers are correct, here is another way of doing this:
From the EC2 instance where the container is running, run this command:
curl http://localhost:51678/v1/tasks | python -mjson.tool |less
From the AWS ECS CLI documentation:
Command:
aws ecs list-tasks --cluster default
Output:
{
    "taskArns": [
        "arn:aws:ecs:us-east-1:<aws_account_id>:task/0cc43cdb-3bee-4407-9c26-c0e6ea5bee84",
        "arn:aws:ecs:us-east-1:<aws_account_id>:task/6b809ef6-c67e-4467-921f-ee261c15a0a1"
    ]
}
To list the tasks on a particular container instance
This example command lists the tasks of a specified container instance, using the container instance UUID as a filter.
Command:
aws ecs list-tasks --cluster default --container-instance f6bbb147-5370-4ace-8c73-c7181ded911f
Output:
{
    "taskArns": [
        "arn:aws:ecs:us-east-1:<aws_account_id>:task/0cc43cdb-3bee-4407-9c26-c0e6ea5bee84"
    ]
}
My ECS solution, as bash and Python snippets. The logging calls print debug output to sys.stderr, while print() is used to pass the value back to the shell script:
#!/bin/bash
TASK_ID=$(python3.8 get_ecs_task_id.py)
echo "TASK_ID: ${TASK_ID}"
Python script - get_ecs_task_id.py
import json
import logging
import os
import sys

import requests

# logging configuration
# file_handler = logging.FileHandler(filename='tmp.log')
# redirecting to stderr so I can pass back extracted task id in STDOUT
stdout_handler = logging.StreamHandler(stream=sys.stderr)
# handlers = [file_handler, stdout_handler]
handlers = [stdout_handler]

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s",
    handlers=handlers,
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)


def get_ecs_task_id(host):
    path = "/task"
    url = host + path
    headers = {"Content-Type": "application/json"}
    r = requests.get(url, headers=headers)
    logger.debug(f"r: {r}")
    d_r = json.loads(r.text)
    logger.debug(d_r)
    ecs_task_arn = d_r["TaskARN"]
    ecs_task_id = ecs_task_arn.split("/")[2]
    return ecs_task_id


def main():
    logger.debug("Extracting task ID from $ECS_CONTAINER_METADATA_URI_V4")
    logger.debug("Inside get_ecs_task_id.py, redirecting logs to stderr")
    logger.debug("so that I can pass the task id back in STDOUT")
    host = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    ecs_task_id = get_ecs_task_id(host)
    # This print statement passes the string back to the bash wrapper, don't remove
    logger.debug(ecs_task_id)
    print(ecs_task_id)


if __name__ == "__main__":
    main()

nix-build: bash permission denied

I'm trying to learn to write Nix expressions, and I thought about doing my own very simple "Hello World!" (as is tradition).
So I have a directory with only this default.nix file:
{ pkgs ? import <nixpkgs> {} }:

derivation {
  system = "x86_64-linux";
  name = "simple-bash-derivation-helloworld";
  builder = pkgs.bash;
  args = [ "-c" "echo 'Hello World' > $out" ];
}
Here is what I get when I try to build it:
nix-build
these derivations will be built:
/nix/store/3grmahx3ih4c50asj84p7xnpqpj32n5s-simple-bash-derivation-helloworld.drv
building path(s) ‘/nix/store/6psl3rc92311w37c1n6nj0a6jac16hv1-simple-bash-derivation-helloworld’
while setting up the build environment: executing ‘/nix/store/wb34dgkpmnssjkq7yj4qbjqxpnapq0lw-bash-4.4-p12’: Permission denied
builder for ‘/nix/store/3grmahx3ih4c50asj84p7xnpqpj32n5s-simple-bash-derivation-helloworld.drv’ failed with exit code 1
error: build of ‘/nix/store/3grmahx3ih4c50asj84p7xnpqpj32n5s-simple-bash-derivation-helloworld.drv’ failed
Removing the args line yields the same issue.
Why do I get a permission issue?
What would be the correct way to make a simple derivation just doing a bash echo?
Please note that this is a learning exercise: I do not want to use stdenv.mkDerivation here for example.
I am running nix-env (Nix) 1.11.9 on an Ubuntu 16.04 system.
Thanks in advance.
Try running the ls command on /nix/store/wb34dgkpmnssjkq7yj4qbjqxpnapq0lw-bash-4.4-p12 and you will see it's a directory rather than an executable file (it points to $out of the pkgs.bash derivation). If you want to refer to the bash binary, you would use:
builder = "${pkgs.bash}/bin/bash";

How to edit "Version: xxx" from a script to automate a debian package build?

The Debian control file has a line like this (among many others):
Version: 1.1.0
We are using Jenkins to build our application as a .deb package. In Jenkins we are doing something like this:
cp -r $WORKSPACE/p1.1/ourap/scripts/ourapp_debian $TARGET/
cd $TARGET
fakeroot dpkg-deb --build ourapp_debian
We would like to do something like this in our control file:
Packages: ourapp
Version: 1.1.$BUILD_NUMBER
but obviously this is not possible.
So we need something like a sed script to find the line starting with Version: and replace anything after it with a constant plus the BUILD_NUMBER env var which Jenkins creates.
We have tried things like this:
$ sed -i 's/xxx/$BUILD_NUMBER/g' control
then put "Version: xxx" in our file, but this doesn't work, and there must be a better way?
Any ideas?
We don't use the changelog, as this package will be installed on servers which no one has access to; the changelogs are Word docs given to the customer.
We don't use or need any of the Debian helper tools.
Create two files:
f.awk
function vp(s) { # return 1 for a string with version info
    return s ~ /[ \t]*Version:/
}

function upd() { # an example of version number update function
    v[3] = ENVIRON["BUILD_NUMBER"]
}

vp($0) {
    gsub("[^.0-9]", "")  # get rid of everything but `.' and digits
    split($0, v, "[.]")  # split version info into array `v' elements
    upd()
    printf "Version: %s.%s.%s\n", v[1], v[2], v[3]
    next                 # done with this line
}

{ # print the rest without modifications
    print
}
f.example
rest1
Version: 1.1.0
rest2
Run the command
BUILD_NUMBER=42 awk -f f.awk f.example
Expected output is
rest1
Version: 1.1.42
rest2
Using sed (note the double quotes, so that $BUILD_NUMBER gets expanded):
sed -ri "s/(Version.*\.)[0-9]*/\1$BUILD_NUMBER/g" <control file>
OR
sed -ni "/Version/{s/[0-9]*$/$BUILD_NUMBER/};p" <control file>
