Rule in snakemake using singularity: unterminated quoted string - docker

I'm running a snakemake pipeline that for a specific rule loads a container:
rule counts:
    params:
        transcriptome=os.environ["INDEX"],
        outdir= (os.environ["OUTDIR"] + "/counts/"),
        indir= (os.environ["INDIR"] + "{sample}"),
        name = lambda wildcards: SAMPLES[wildcards.sample]
    output:
        (os.environ["OUTDIR"] + "counts/" + "{sample}" + "/outs/web_summary.html")
    container:
        "docker://marcusczi/cellranger_clean"
    shell:
        """
        cellranger count --id={wildcards.sample} --transcriptome={params.transcriptome} --fastqs={params.indir} --sample={params.name}
        mkdir -p {params.outdir}
        mv ./{wildcards.sample}/ {params.outdir}
        """
The dry run looks fine, and I'm sure the rule itself works (I tried it without the container). However, when I run it with the container I get this error:
Activating singularity image /some/path/.snakemake/singularity/c288fbc3fef5771f055a688c6678c24d.simg
/bin/sh: syntax error: unterminated quoted string
[ 1.228141] reboot: Power down
And then it waits for the missing files, and fails.
I think the answer to this situation might be related to this previous question, but I have tried everything I can think of in terms of escaping characters (except for the wildcards and variables within curly brackets, because I'm guessing those should be fine, and if not, why am I even using snakemake :-( ). The directory paths I'm using are valid and exist, and the name and the wildcard "sample" have the form "sample_123", nothing fancy.
It's also worth saying that there are no single or double quotes in any of these variables.
Thank you!!
Software and OS:
I am on macOS Catalina 10.15.5, running snakemake 5.20.1, and I have been using the beta version of Singularity for macOS (3.3.0-rc.1.658.g7427b73f1.dirty).
Running singularity outside Snakemake:
I tried running the container with singularity outside snakemake. The software that I'm trying to run starts, but then complains that there is no space left on disk (which is not true). I'm running singularity as sudo singularity run -B "$(pwd):$(pwd)" docker://marcusczi/cellranger_clean
I think this latest error means either 1) I'm not running singularity as I should, or 2) the message does not reflect what is actually happening, since cellranger (the software I'm trying to run) often has misleading error messages.
Minimal reproducible example:
If you install snakemake, you should be able to reproduce my error by running snakemake -j1 --use-singularity in the same directory as the Snakefile.
Snakefile:
rule all:
    input:
        "output.txt"
rule counts:
    output:
        "output.txt"
    container:
        "docker://marcusczi/cellranger_clean"
    shell:
        """
        cellranger count --help
        echo "hurray!" > {output}
        """

Related

xonsh "which" equivalent - how to test if a (subprocess mode) command is available?

I would like to test (from xonsh) if a command is available or not. If I try this from the xonsh command prompt:
which bash
Then it works:
user@server ~ $ which bash
/usr/bin/bash
But it does not work from xonsh script:
#!/usr/bin/env xonsh
$RAISE_SUBPROC_ERROR = True
try:
    which bash
    print("bash is available")
except:
    print("bash is not available")
Because it results in this error:
NameError: name 'which' is not defined
I understand that which is a shell builtin, i.e. it is not an executable file. But it is available at the xonsh command prompt, so why is it not available inside an xonsh script? The ultimate question is this: how can I test (from an xonsh script) if a (subprocess mode) command is available or not?
import shutil
print(shutil.which('bash'))
While nagylzs' answer led me to the right solution, I found it inadequate.
shutil.which defaults to os.environ['PATH']. On my machine, the default os.environ['PATH'] doesn't contain the active PATH recognized by xonsh.
~ $ os.environ['PATH']
'/usr/bin:/bin:/usr/sbin:/sbin'
I found I needed to pass $PATH to reliably resolve 'which' in the xonsh environment.
~ $ $PATH[:2]
['/opt/google-cloud-sdk/bin', '/Users/jaraco/.local/bin']
~ $ import shutil
~ $ shutil.which('brew', path=os.pathsep.join($PATH))
'/opt/homebrew/bin/brew'
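The same idea can be wrapped in a small helper. This is a minimal sketch in plain Python, so it also runs outside xonsh; the name command_available and its search_dirs parameter are illustrative and not part of xonsh or shutil.

```python
import os
import shutil


def command_available(name, search_dirs=None):
    """Return the full path of `name`, or None if it cannot be found.

    `search_dirs` is a hypothetical parameter: a list of directories to
    search (for example xonsh's $PATH). When it is omitted, shutil.which
    falls back to os.environ["PATH"].
    """
    path = os.pathsep.join(search_dirs) if search_dirs is not None else None
    return shutil.which(name, path=path)
```

Inside an xonsh script, command_available('bash', $PATH) could then stand in for the which bash call, with a None result meaning the command is not available.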
The latest version of xonsh includes a built-in which command. Unfortunately, the version included will emit an error on stdout if the target isn't found, a behavior that is not great for non-interactive use.
As mentioned in another answer, which exists in the current version of xonsh (0.13.4 as of 15/12/2022) so your script would work. However, it outputs its own error message so it's necessary to redirect stderr to get rid of it.
Also, unless you redirect its stdout as well (using all>), it might be a good idea to capture its output, so the final version would look like this:
#!/usr/bin/env xonsh
$RAISE_SUBPROC_ERROR = True
try:
    bash = $(which bash err> /dev/null)
    print(f"bash is available: {bash}")
except:
    print("bash is not available")

How to use Snakemake container for htslib (bgzip + tabix)

I have a pipeline which uses a global singularity image and rule-based conda wrappers.
However, some of the tools don't have wrappers (i.e. htslib's bgzip and tabix).
Now I need to learn how to run jobs in containers.
The official documentation says:
"Allowed image urls entail everything supported by singularity (e.g., shub:// and docker://)."
Now I've tried the following image from singularity hub but I get an error:
minimal reproducible example:
config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
GENOME_INDEX=config["REF_GENOME"]+".fai"
VEP_ANNOT=config["GENOME_ANNOTATION"]+".gz"
VEP_ANNOT_INDEX=config["GENOME_ANNOTATION"]+".gz.tbi"
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
    input:
        expand('{REF_DIR}/{GENOME_ANNOTATION}{ext}', REF_DIR=dirs_dict["REF_DIR"], GENOME_ANNOTATION=config["GENOME_ANNOTATION"], ext=['', '.gz', '.gz.tbi']),
        expand('{REF_DIR}/{REF_GENOME}{ext}', REF_DIR=dirs_dict["REF_DIR"], REF_GENOME=config["REF_GENOME"], ext=['','.fai']),
rule download_references:
    params:
        ref_genome=config["REF_GENOME"],
        genome_annotation=config["GENOME_ANNOTATION"],
        ref_dir=dirs_dict["REF_DIR"]
    output:
        os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),
        os.path.join(dirs_dict["REF_DIR"],config["GENOME_ANNOTATION"]),
        os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT),
        os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT_INDEX)
    resources:
        mem=80000,
        time=45
    log:
        os.path.join(dirs_dict["LOG_DIR"],"references","download.log")
    singularity:
        "shub://biocontainers/tabix"
    shell: """
        cd {params.ref_dir}
        wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.genomic.fa.gz
        bgzip -d {params.ref_genome}.gz
        wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.annotations.gff3.gz
        bgzip -d {params.genome_annotation}.gz
        grep -v "#" {params.genome_annotation} | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > {params.genome_annotation}.gz
        tabix -p gff {params.genome_annotation}.gz
        """
rule index_reference:
    input:
        os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"])
    output:
        os.path.join(dirs_dict["REF_DIR"],GENOME_INDEX)
    resources:
        mem=2000,
        time=30,
    log:
        os.path.join(dirs_dict["LOG_DIR"],"references", "faidx_index.log")
    wrapper:
        "0.64.0/bio/samtools/faidx"
Error
Building DAG of jobs...
Pulling singularity image shub://biocontainers/tabix.
WorkflowError:
Failed to pull singularity image from shub://biocontainers/tabix:
FATAL: While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/singularity.py", line 88, in pull
It appears this is a problem with the container?
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull shub://biocontainers/tabix
FATAL: While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub
In fact, I experience this problem with other biocontainers containers.
For example, I also need a container for bowtie2 indexing, and this is the error I get from biocontainers/bowtie2, compared with another developer's container for the same tool, comics/bowtie2:
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://biocontainers/bowtie2
FATAL: While making image from oci registry: failed to get checksum for docker://biocontainers/bowtie2: Error reading manifest latest in docker.io/biocontainers/bowtie2: manifest unknown: manifest unknown
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://comics/bowtie2
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob a02a4930cb5d done
Does anyone know why?
Biocontainers does not allow latest as tag for their containers, and therefore you will need to specify the tag to be used.
From their doc:
The BioContainers community had decided to remove the latest tag. Then, the following command docker pull biocontainers/crux will fail. Read more about this decision in Getting started with Docker
When no tag is specified, it defaults to latest tag, which of course is not allowed here. See here for bowtie2's tags. Usage like this will work:
singularity pull docker://biocontainers/bowtie2:v2.4.1_cv1
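Because a missing tag silently becomes latest, it can help to check image URIs before handing them to Snakemake. The following is a rough sketch; has_explicit_tag is a hypothetical helper, not part of Snakemake or Singularity, and it only inspects the text of the URI without contacting any registry.

```python
def has_explicit_tag(image_uri):
    """Return True if a container URI names a tag or digest explicitly."""
    # Drop the scheme, e.g. "docker://" or "shub://".
    _, _, reference = image_uri.rpartition("://")
    # A digest reference looks like name@sha256:...
    if "@" in reference:
        return True
    # Only the last path component can carry a tag; an earlier colon
    # would be a registry port such as localhost:5000/image.
    last_component = reference.rsplit("/", 1)[-1]
    return ":" in last_component


for uri in ("docker://biocontainers/bowtie2",
            "docker://biocontainers/bowtie2:v2.4.1_cv1"):
    print(uri, has_explicit_tag(uri))
```

The first URI is reported as untagged and the second as tagged, which matches the failing and working pull commands above.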
Using another container solves the issue; however, the fact that I'm getting errors from biocontainers is troubling, given that these containers are very common and used as examples in the literature, so I will award the top answer to whoever can solve that specific issue.
As it turns out, using stackleader/bgzip-utility solves the issue of actually running this rule in a container.
container:
    "docker://stackleader/bgzip-utility"
Once again, for those coming to this post, it's probably best to test any container first before running snakemake, e.g. singularity pull docker://stackleader/bgzip-utility.

How to set environment variable in Mac 10.14.6 Mojave with 'Application Support' in pathfile?

I'm having trouble setting an environment variable that has a pathfile containing a space ' ' character.
Before you ask, I've already tried enclosing the whole path in double quotes, in single quotes, and in no quotes but escaping the space with a backslash.
Could it have something to do with the encoding? The variable would be:
export A_MEDIA="/Users/polo/Library/Application Support/Anki2/me/collection.media"
Once I source ~/.bash_profile, I try cd $A_MEDIA (with or without quoting the name of the variable). The response is:
-bash: cd: /Users/polo/Library/Application: No such file or directory
It's as if bash didn't know how to interpret that space between 'Application' and 'Support'. It thinks the path goes from a folder named Application to a folder named Support. It just doesn't see them as a single folder name. Any help? Please?
Works for me so I strongly suspect what you showed us in your problem statement is not what you're actually doing. I normally use fish so this shows me setting the env var before starting bash to show that it correctly inherits the var and also setting and using it inside bash:
12:21 macbook opencv3 ~ > set -x A_MEDIA $HOME/Library/Application\ Support/Dock/
12:22 macbook opencv3 ~ > bash
running .bashrc
bash-5.0$ cd $A_MEDIA
bash: cd: too many arguments
bash-5.0$ cd "$A_MEDIA"
bash-5.0$ pwd
/Users/krader/Library/Application Support/Dock
bash-5.0$ export B_MEDIA="$HOME/Library/Application Support/Gitter"
bash-5.0$ cd $B_MEDIA
bash: cd: too many arguments
bash-5.0$ cd "$B_MEDIA"
bash-5.0$ pwd
/Users/krader/Library/Application Support/Gitter
bash-5.0$ exit
Note that in a POSIX shell like bash you should almost always use double-quotes around a var expansion so that if it contains whitespace the expanded value is not split on that whitespace.
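The word splitting behind this advice can be reproduced in Python with the standard shlex module, which tokenizes a string using POSIX shell rules. This is only an illustration of the splitting, using the path from the question; it does not run bash.

```python
import shlex

path = "/Users/polo/Library/Application Support/Anki2/me/collection.media"

# Unquoted: the space splits the path into two separate arguments,
# like `cd $A_MEDIA` in bash.
unquoted = shlex.split("cd " + path)

# Quoted: shlex.quote wraps the value so it stays one argument,
# like `cd "$A_MEDIA"`.
quoted = shlex.split("cd " + shlex.quote(path))

print(len(unquoted))  # 3 tokens: cd plus the two halves of the path
print(len(quoted))    # 2 tokens: cd and the full path
```

The unquoted case is what produces the "too many arguments" message in the transcript above.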

snakemake: MissingOutputException within docker

I am trying to run a pipeline within a Docker container using snakemake. I am having a problem using the sortmerna tool to produce the {sample}_merged_sorted_mRNA and {sample}_merged_sorted outputs from the control_merged.fq and treated_merged.fq input files.
Here is my Snakefile:
SAMPLES = ["control","treated"]
for smp in SAMPLES:
    print("Sample " + smp + " will be processed")
rule final:
    input:
        expand('/output/{sample}_merged.fq', sample=SAMPLES),
        expand('/output/{sample}_merged_sorted', sample=SAMPLES),
        expand('/output/{sample}_merged_sorted_mRNA', sample=SAMPLES),
rule sortmerna:
    input: '/output/{sample}_merged.fq',
    output: merged_file='/output/{sample}_merged_sorted_mRNA', merged_sorted='/output/{sample}_merged_sorted',
    message: """---SORTING---"""
    shell:
        '''
        sortmerna --ref /usr/share/sortmerna/rRNA_databases/silva-bac-23s-id98.fasta,/ usr/share/sortmerna/rRNA_databases/index/silva-bac-23s-id98: --reads {input} --paired_in -a 16 --log --fastx --aligned {output.merged_file} --other {output.merged_sorted} -v
        '''
When running this I get:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 57 of /input/Snakefile:
Missing files after 5 seconds:
/output/control_merged_sorted_mRNA
/output/control_merged_sorted
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /input/.snakemake/log/2018-11-05T091643.911334.snakemake.log
I tried to increase the latency with --latency-wait but I get the same result. Funny thing is that two output files control_merged_sorted_mRNA.fq and control_merged_sorted.fq are produced but the program fails and exits. The version of snakemake is 5.3.0. Any help?
snakemake fails because the outputs described by the rule sortmerna are not produced. This is not a latency problem, it is a problem with your outputs.
Your rule sortmerna expects as output:
/output/control_merged_sorted_mRNA
and
/output/control_merged_sorted
but the program you are using (I know nothing about sortmerna) is apparently producing
/output/control_merged_sorted_mRNA.fq
and
/output/control_merged_sorted.fq
Check whether the values you pass to --aligned and --other on the command line are the real names of the files produced, or only basenames to which the program appends the suffix .fq. If it is the latter, I suggest you use:
rule final:
    input:
        expand('/output/{sample}_merged.fq', sample=SAMPLES),
        expand('/output/{sample}_merged_sorted', sample=SAMPLES),
        expand('/output/{sample}_merged_sorted_mRNA', sample=SAMPLES),
rule sortmerna:
    input:
        '/output/{sample}_merged.fq',
    output:
        merged_file='/output/{sample}_merged_sorted_mRNA.fq',
        merged_sorted='/output/{sample}_merged_sorted.fq'
    params:
        merged_file_basename='/output/{sample}_merged_sorted_mRNA',
        merged_sorted_basename='/output/{sample}_merged_sorted'
    message: """---SORTING---"""
    shell:
        """
        sortmerna --ref /usr/share/sortmerna/rRNA_databases/silva-bac-23s-id98.fasta,/usr/share/sortmerna/rRNA_databases/index/silva-bac-23s-id98: --reads {input} --paired_in -a 16 --log --fastx --aligned {params.merged_file_basename} --other {params.merged_sorted_basename} -v
        """

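If you would rather not write each path twice in the sortmerna rule, the basename can be derived from the declared output. A minimal sketch in plain Python; strip_suffix is a hypothetical helper, and the assumption that sortmerna appends .fq comes from the file names observed in the question, not from its documentation.

```python
def strip_suffix(path, suffix=".fq"):
    """Return `path` without a trailing `suffix`; raise if it is absent."""
    if not path.endswith(suffix):
        raise ValueError(f"{path!r} does not end with {suffix!r}")
    return path[: -len(suffix)]


output_file = "/output/control_merged_sorted_mRNA.fq"
print(strip_suffix(output_file))  # /output/control_merged_sorted_mRNA
```

In a Snakefile this could feed a params function that receives the rule's output, for example merged_file_basename=lambda wildcards, output: strip_suffix(output.merged_file).
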
How to escape Jenkins parameterized build variables

I use Jenkins ver. 1.522 and I want to pass a long string with spaces and quotes as a parameter in the parameterized build section. The job only runs a python script.
My problem is that I can't find a way to escape my string so that jenkins passes it correctly to the script.
Assuming...
string: fixVersion in ("foo") AND issuetype in (Bug, Improvement) AND resolution = Fixed ORDER BY resolution ASC, assignee ASC, key DESC
variable name: bar
script name: coco.py
When I run the script in the terminal, everything is fine: python coco.py --option 'fixVersion in ("foo") AND issuetype in (Bug, Improvement) AND resolution = Fixed ORDER BY resolution ASC, assignee ASC, key DESC'
When I run the same script from Jenkins using the parameterized build and try to escape the variable so that it ends up as a single parameter to the Python script, Jenkins escapes it oddly.
In my jenkins job I call the script: python coco.py --option \'${BAR}\'
and it ends up as:
python coco.py --option '"fixVersion' in '('\''foo'\'')' AND issuetype in '(Bug,' 'Improvement)' in '(Production,' 'Stage)' AND resolution = Fixed ORDER BY resolution ASC, assignee ASC, key 'DESC"'
I also tried \"${BAR}\", \"$BAR\",\'$BAR\'
What is the right way to achieve this?
Try
python coco.py --option "${BAR}"
Alternatively, if you need the single quotes surrounding everything
python coco.py --option \'"${BAR}"\'
In the cases you listed, bash will treat the spaces as delimiters. Putting the double quotes around a variable will preserve the whitespace in a string. Example
aString='foo bar'
for x in $aString; do echo $x; done
# foo
# bar
for x in "$aString"; do echo $x; done
# foo bar
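When the caller is itself a Python program rather than a shell step, the splitting can be sidestepped entirely: passing the arguments as a list to subprocess.run involves no shell, so no quoting is needed. This is a sketch under that assumption; the child process here is a stand-in for coco.py that only reports how many arguments it received after --option.

```python
import subprocess
import sys

query = ('fixVersion in ("foo") AND issuetype in (Bug, Improvement) '
         'AND resolution = Fixed ORDER BY resolution ASC, assignee ASC, key DESC')

# Stand-in for coco.py: sys.argv is ['-c', '--option', <query>], so
# subtracting 2 leaves the number of arguments that follow --option.
child_code = "import sys; print(len(sys.argv) - 2)"

result = subprocess.run(
    [sys.executable, "-c", child_code, "--option", query],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # 1: the whole query arrived as one argument
```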
I am using Jenkins v1.606 and ran into this same issue!
The issue I saw was that user-defined string parameters passed into an execution shell were not formatted properly when the value contained one or more spaces. What you have to watch out for is relying on the 'output' log: Jenkins will not display the string parameter value properly in the log.
Example (correct format for containing spaces):
docker exec -i container-base /bin/bash -c "cd /container/path/to/code/ && ./gradlew test_xml -P DISPLAY_NAME='${DISPLAY_NAME}' -P USERNAME='${USERNAME}' -P SERVER_NAME='${SERVER_NAME}'"
Jenkins Output of string (notice the string values format):
+ docker exec -i container-base /bin/bash -c 'cd /container/path/to/code/ && ./gradlew test_xml -P DISPLAY_NAME='\''VM10 USER D33PZ3R0'\'' -P USERNAME='\''d33pz3r0@stackoverflow.com'\'' -P SERVER_NAME='\''stackoverflow.com'\'''
Conclusion:
In my example, the literal command was enclosed in <">, and the parameters inside it were surrounded with <'> to escape the literal command string and control the Jenkins string syntax. Remember not to rely only on your Jenkins output log, as it led me astray for an entire day while I fought with this! The same should apply to your issue: you do not need to escape with \' or other escape characters. Hope this helps!!
