I am trying to align my samples to a reference genome using bwa mem.
I have over 300 samples, which I index through a metadata file,
but something isn't working.
The loop I'm using is this (in SLURM):
#SBATCH --export=ALL # export all environment variables to the batch job
#SBATCH -D . # set working directory to .
(...)
# Commands
module load BWA/0.7.17-foss-2018a
module load SAMtools/1.3.1-foss-2018a
module load BCFtools/1.6-intel-2017b
reference=/gpfs/ts0/home/jn378/mussels/snp/genomes/gallo_v6.snail.svg
input_reads=/gpfs/ts0/home/jn378/mussels/snp/3.fastp
align=/gpfs/ts0/home/jn378/mussels/snp/5.bam-files
metadata=/gpfs/ts0/home/jn378/mussels/snp/SNP-array-metadata.txt
read1=( `cat $metadata | cut -f 4` )
read1_array=$input_reads/${read1[(($SLURM_ARRAY_TASKID))]}
read2=( `cat $metadata | cut -f 5` )
read2_array=$input_reads/${read2[(($SLURM_ARRAY_TASKID))]}
outbam=( `cat $metadata | cut -f 1` )
out=${outbam[(($SLURM_ARRAY_TASKID))]}
echo "reference" $reference
echo "read1" $read1_array
echo "read2" $read2_array
echo "alignment" $align/${out}_unsorted.raw.sam
#### Align with bwa mem ###
####bwa mem -t 4 $reference $read1_array $read2_array > ${align}/${out}_unsorted.raw.sam
but I keep getting this error:
bwa.sh: line 34: (()): syntax error: operand expected (error token is "))")
Could someone help me with this issue?
Many thanks!
The error message you get
bwa.sh: line 34: (()): syntax error: operand expected (error token is "))")
is because the variable you want to use is named SLURM_ARRAY_TASK_ID and not SLURM_ARRAY_TASKID. The latter is not set and the expression expands to
read1_array=$input_reads/${read1[(())]}
which Bash cannot parse.
So replace SLURM_ARRAY_TASKID with SLURM_ARRAY_TASK_ID and it should be ok.
Also note that the double parentheses are not needed, and it is always a good idea to double quote variables of paths in case some contain special chars, so you can write
read1_array="$input_reads/${read1[$SLURM_ARRAY_TASK_ID]}"
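As a minimal sketch of the corrected lookup: the file names and the input directory below are made-up stand-ins for the metadata columns, and SLURM_ARRAY_TASK_ID is set by hand only so the snippet runs outside a job. The `${var:?message}` guard is an optional extra that stops the script with a readable message when the variable is missing.

```shell
#!/bin/bash
# Hypothetical stand-ins for the metadata column and the input directory.
input_reads=/tmp/reads
read1=(s0_R1.fq.gz s1_R1.fq.gz s2_R1.fq.gz)

# Slurm sets this inside an array job; assigned here only for illustration.
SLURM_ARRAY_TASK_ID=1

# Abort with a clear message if the variable is unset or empty.
: "${SLURM_ARRAY_TASK_ID:?not running inside a Slurm array job}"

# Plain index, quoted path.
read1_array="$input_reads/${read1[$SLURM_ARRAY_TASK_ID]}"
echo "read1 $read1_array"
```

Bash arrays are zero-based, so task ID 0 selects the first entry that cut returns.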
The exit status section of the grep manual reports:
EXIT STATUS
The exit status is 0 if selected lines are found, and 1 if not
found. If an error occurred the exit status is 2. (Note: POSIX
error handling code should check for '2' or greater.)
But the command:
echo ".
..
test.zip"|grep -vE '^[.]'
echo $?
echo "test.zip
test.txt"|grep -vE '^[.]'
echo $?
The value returned is always 0. I would have expected 1 and 0. What am I doing wrong?
Remember that grep is line based: if any line is selected, you have a match. In your first case, test.zip is selected. More precisely, you used -v, so you asked for lines that do not match your pattern, and test.zip does exactly that. As a result, your grep call was successful. Compare
$ grep -vE '^[.]' <<<$'.\na'; echo $?
a
0
with
$ grep -vE '^[.]' <<<$'.\n.'; echo $?
1
Note how the first command outputs the line a, that is it has found a match, which is why the exit status is 0. Compare that with the second example, where no line was matched.
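The same comparison can be written as a small script. This is only a sketch: any_visible is a hypothetical helper name, and -q is added to suppress the output so that only the exit status is used.

```shell
#!/bin/sh
# Succeeds (exit status 0) when at least one input line does NOT start with a dot.
any_visible() {
    grep -qvE '^[.]'
}

# Every line starts with a dot, so -v selects nothing and grep returns 1.
if printf '%s\n' '.' '..' | any_visible; then first=found; else first=none; fi

# test.zip does not start with a dot, so -v selects it and grep returns 0.
if printf '%s\n' '.' '..' 'test.zip' | any_visible; then second=found; else second=none; fi

echo "dots only: $first, with test.zip: $second"
```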
References
<<< is a here string:
Here Strings
A variant of here documents, the format is:
[n]<<<word
The word undergoes brace expansion, tilde expansion, parameter and
variable expansion, command substitution, arithmetic expansion, and
quote removal. Pathname expansion and word splitting are not per-
formed. The result is supplied as a single string, with a newline
appended, to the command on its standard input (or file descriptor n if
n is specified).
$ cat <<<'hello world'
hello world
$'1\na' is used to get a multi line input (\n is replaced by newline within $'string', for more see man bash).
$ echo $'1\na'
1
a
What is wrong with this Makefile?
I want to compile some lua files to check if there are any unexpected globals defined. I'm doing this by grepping the output of luac -l and then ignoring known globals.
So for a given lua file everything is OK if grep doesn't find anything, having ignored known lua globals.
As grep's return status code is 0 if it does find something and 1 if it doesn't I want to force an error if the status code from the grep is 0 and allow everything to continue if it isn't.
The Makefile is like this
IGNORE_GLOBALS = "dofile\|string\|tostring\|tonumber\|math\|io\|type\|os\|table\|pairs\|next\|require"
all: $(patsubst src/common/%.lua, %.lua, $(wildcard src/common/*.lua))
%.lua:
	@echo check $@
	@luac -l src/common/$@ | grep '.ETGLOBAL' | grep -v $(IGNORE_GLOBALS) && $(error Unexpected globals in $@) || echo "No unexpected globals in $@"
But when I run it immediately quits on the first file, which happens to have no unexpected globals with
Makefile:10: *** Unexpected globals in chat-cmd.lua. Stop.
line 10 is surprisingly the line before, i.e.
@echo check $@
Interestingly if I replace $(error ...) with echo ..., as in
@luac -l src/common/$@ | grep '.ETGLOBAL' | grep -v $(IGNORE_GLOBALS) && echo "Unexpected globals in $@" || echo "No unexpected globals in $@"
it behaves as intended.
As @siffiejoe says in the comment, $(error) is a make function, and it is expanded when the recipe as a whole is being evaluated (you can think of it like hoisting if that helps).
So as soon as the recipe needs to be run (and the first line executed) the $(error) call is evaluated.
Note: In the shell X && Y || Z is not a ternary operation. Z will be run if X succeeds and Y fails as well as when X fails. This doesn't matter here as echo cannot really fail but in general is worth paying attention to.
You want to use something more like @! lua ... | grep -v $(IGNORE_GLOBALS) || { echo 'Unexpected globals in $@'; exit 1; } there. This doesn't spit out the "everything's ok" message but removes the X && Y || Z ternary issue.
If you wanted to keep that message the simplest thing to do would be to move to an actual if statement.
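A rough sketch of that if statement in plain shell follows. The function name check_listing, the shortened ignore list, and the sample listing lines are all made up for illustration; in a real recipe the input would come from luac -l, and the whole thing would have to be written as a single logical recipe line.

```shell
#!/bin/sh
# Shortened, hypothetical ignore list (extended regex syntax).
ignore='dofile|string|tostring|tonumber'

# $1: file name used in messages; stdin: a listing such as `luac -l` produces.
check_listing() {
    if grep '.ETGLOBAL' | grep -vE "$ignore"; then
        # grep -v selected (and printed) at least one global not on the list.
        echo "Unexpected globals in $1" >&2
        return 1
    else
        echo "No unexpected globals in $1"
    fi
}

# Made-up listing line that only touches an ignored global.
printf '%s\n' '1 [1] GETGLOBAL 0 ; tostring' | check_listing ok.lua
```

Because both outcomes are separate branches of the if, the "no unexpected globals" message can never run after a failed error branch, which is the trap in X && Y || Z.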
I've got a small script called "onewhich". Its purpose is to behave like which, except that it will only give the FIRST occurrence of any executables specified as options, as found in the order they'd appear in the path.
So for example, if my path is /opt/bin:/usr/bin:/bin, and I have both /opt/bin/runme and /usr/bin/runme, then the command onewhich runme would return /opt/bin/runme.
But if I also have a /usr/bin/doit, then the command onewhich doit runme would return /usr/bin/doit instead.
The idea is to walk through the path, check for each executable specified, and if it exists, show it and exit.
Here's the script so far.
#!/bin/sh
for what in "$@"; do
    for loc in `echo "${PATH}" | awk -vRS=: 1`; do
        if [ -f "${loc}/${what}" ]; then
            echo "${loc}/${what}"
            exit 0
        fi
    done
done
exit 1
The problem is, I want to be better about PATH directories with special characters. Every second shell question here on StackOverflow talks about how bad it is to parse paths with tools like awk and sed. There's even a bash faq entry about it. (Proviso: I'm not using bash for this, but the recommendation is still valid.)
So I tried rewriting the script to separate paths in a pipe, like this:
#!/bin/sh
for what in "$@"; do
    echo "${PATH}" | awk -vRS=: 1 | while read loc ; do
        if [ -f "${loc}/${what}" ]; then
            echo "${loc}/${what}"
            exit 0
        fi
    done
done
exit 1
I'm not sure if this gives me any real advantage (since $loc is still inside quotes), but it also doesn't work because for some reason, the exit 0 seems to be ignored. Or ... it exits something (the sub-shell with the while loop that terminates the pipe, maybe), but the script exits with a value of 1 every time.
What's a better way to step through directories in ${PATH} without the risk that special characters will confuse things?
Alternately, am I reinventing the wheel? Is there maybe a way to do this that's built in to existing shell tools?
This needs to run in both Linux and FreeBSD, which is why I'm writing it in Bourne instead of bash.
Thanks.
This doesn't directly answer your question, but does eliminate the need to parse PATH at all:
onewhich () {
    for what in "$@"; do
        which "$what" 2>/dev/null && break
    done
}
This just calls which on each command on the input list until it finds a match.
To parse PATH, you can simply set IFS=':'.
if [ "${IFS+set}" = set ]; then
    # Only preserve the value of IFS if it is currently set
    OLDIFS=$IFS
fi
IFS=":"
for f in $PATH; do # Do not quote $PATH, to allow word splitting
    echo "$f"
done
if [ "${OLDIFS+set}" = set ]; then
    IFS=$OLDIFS
else
    unset IFS
fi
The above will fail if any of the directories in PATH actually contain colons.
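One more caveat with the unquoted expansion: it is also subject to pathname expansion, so an entry containing * or ? could be globbed. Here is a sketch that guards against that, assuming a hypothetical helper named list_path_dirs that takes the PATH-style string as its first argument. The function body is a subshell, so neither IFS nor set -f leaks into the caller and nothing needs to be saved or restored.

```shell
#!/bin/sh
# Print one PATH entry per line. The parentheses make the body a subshell.
list_path_dirs() (
    IFS=:
    set -f    # disable globbing so entries such as /opt/* stay literal
    for dir in $1; do
        # An empty PATH entry means the current directory.
        printf '%s\n' "${dir:-.}"
    done
)

list_path_dirs "/opt/my tools:/usr/bin::/bin"
```

Entries with spaces survive because only the colon is used for splitting.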
Your first method looks to me as if it should work. In practical terms, if it's really the $PATH you'll be searching, it's unlikely you'll have spaces and newlines embedded in directories there. If you do, it's probably time to refactor.
But still, I don't think you're at risk from the possibility of bad names clobbering your loop, since you're wrapping variables in quotes. At worst, I suspect you might miss the odd valid executable, but I can't see how the script would generate errors. (I don't see how the script would miss valid executables, and I haven't tested - I'm just saying I don't see problems at first glance.)
As for your second question, about the loop, I think you've hit the nail on the head. When you run a pipe like this | that | while condition; do things; done, the while loop runs in its own shell at the end of the pipe. Exiting that shell may terminate the actions of the pipe, but that only brings you back to the parent shell, which has its own thread of execution that terminates with exit 1.
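A small sketch that makes the subshell visible; the exit value 7 is arbitrary and chosen only so it can be recognised afterwards. This describes sh, and bash with its default settings; shells that run the last pipeline stage in the current shell behave differently.

```shell
#!/bin/sh
# The while loop is the last stage of a pipeline, so it runs in a subshell.
# `exit 7` ends that subshell only; the value becomes the pipeline's status.
status=0
printf '%s\n' a b | while read -r item; do
    exit 7
done || status=$?

# The parent script is still alive at this point.
echo "parent still running, pipeline status: $status"
```

If the script should stop as well, the parent has to act on the pipeline's status itself, for example by testing $status after the loop and calling exit there.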
As for a better way to do this, I would consider which.
#!/bin/sh
for what in "$@"; do
    which "$what"
done | head -1
And if you really want the exit values as well:
#!/bin/sh
for what in "$@"; do
    which "$what" && exit 0
done
exit 1
The second might even use fewer resources, as it doesn't have to open a file handle and pipe through head.
You can also split your path using IFS. For example, if you wanted to wrap your loops the other way around, you could do this:
#!/bin/sh
IFS=":"
for loc in $PATH; do
    for what in "$@"; do
        if [ -x "$loc"/"$what" ]; then
            echo "$loc"/"$what"
            exit 0
        fi
    done
done
exit 1
Note that under normal circumstances, you might want to save the old value of $IFS, but you seem to be doing things in a stand-alone script, so the "new" value gets thrown out when the script exits.
All the above code is untested. YMMV.
Another way to get around the need to parse PATH at all is to run the builtin type command in a new shell with a stripped environment (i.e. there simply are no functions or aliases to look up; cf. env -i sh -c 'type cmd 2>/dev/null').
# using `cmd` instead of $(cmd) for portability
onewhich() {
    ec=0 # exit code
    for cmd in "$@"; do
        command -p env -i PATH="$PATH" sh -c '
            export LC_ALL=C LANG=C
            cmd="$1"
            path="`type "$cmd" 2>/dev/null`"
            if [ X"$path" = "X" ]; then
                printf "%s\n" "error: command \"${cmd}\" not found in PATH" 1>&2
                exit 1
            else
                case "$path" in
                    *\ /*)
                        path="/${path#*/}"
                        printf "%s\n" "$path";;
                    *)
                        printf "%s\n" "error: no disk file: $path" 1>&2
                        exit 1;;
                esac
                exit 0
            fi
        ' _ "$cmd"
        [ $? != 0 ] && ec=1
    done
    return "$ec" # 0 only if every lookup succeeded
}
onewhich awk ls sed
onewhich builtin
onewhich if
Since which on success returns two full command paths if two commands are specified as arguments, exit 0 in the first onewhich script above aborts the program prematurely. In addition, if two commands are specified as arguments to which, the exit code of which is set to 1 even if only one command lookup failed (cf. which awk sedxyz ls; echo $?). To mimic this behaviour of the which command it is necessary to toggle on/off two variables (cnt and nomatches below).
onewhich() (
    IFS=":"
    nomatches=0
    for cmd in "$@"; do
        cnt=0
        for loc in $PATH ; do
            if [ $cnt = 0 ] && [ -x "$loc"/"$cmd" ]; then
                echo "$loc"/"$cmd"
                cnt=1
            fi
        done
        [ $cnt = 0 ] && nomatches=1
    done
    [ $nomatches = 1 ] && exit 1 || exit 0 # exit 1: at least one cmd was not in PATH
)
onewhich awk ls sed
onewhich awk lsxyz sed
onewhich builtin
onewhich if
Is it possible to make javac output only the error locations and the error messages, and hide the source code dump?
Now I get:
$ javac t.java
t.java:1: <identifier> expected
class {
^
t.java:2: reached end of file while parsing
bar
^
t.java:4: reached end of file while parsing
^
3 errors
I want to get only:
$ javac ... t.java
t.java:1: <identifier> expected
t.java:2: reached end of file while parsing
t.java:4: reached end of file while parsing
I think there is no flag you could pass to javac, but you can simply filter the output through any program which removes the superfluous lines. Here is an example with grep:
javac t.java 2>&1 | egrep '^[a-zA-Z0-9_/]+\.java:[0-9]+: '
You might have to change the part matching the file name if you have strange letters in your file name - this seems to work for the ASCII subset.
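To try the pattern without running the compiler, here is a sketch that feeds it a few of the lines from the question. filter_javac is a hypothetical wrapper name, and grep -E is used as the equivalent spelling of egrep.

```shell
#!/bin/sh
# Keep only lines of the form "file.java:LINE: message".
filter_javac() {
    grep -E '^[a-zA-Z0-9_/]+\.java:[0-9]+: '
}

# Simulated compiler output (taken from the question), not a real javac run.
sample=$(printf '%s\n' \
    't.java:1: <identifier> expected' \
    'class {' \
    '      ^' \
    '3 errors' | filter_javac)
echo "$sample"
```

With a real compiler the call would be javac t.java 2>&1 | filter_javac.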
I get a permission denied error when executing a shell command from the Ruby console.
The same shell command works from the shell.
From the shell:
tests@tests-workstation:~$ "`grep '^datadir=' /etc/mysql/my.cnf | cut -f 2 -d '='`/db_backups"
bash: /db_backups: is a directory
tests@tests-workstation:~$
From the Ruby console:
>> %x["`grep '^datadir=' /etc/mysql/my.cnf | cut -f 2 -d '='`/db_backups"]
sh: /db_backups: Permission denied
=> ""
Any idea?
You're trying to execute a directory and the shells are saying no; bash says no by saying "/db_backups: is a directory" whereas sh says "/db_backups: Permission denied". If you just execute the backticked part:
grep '^datadir=' /etc/mysql/my.cnf | cut -f 2 -d '='
You'll almost certainly see no output at all and the reason is probably that your regular expression is too tight, something like this:
grep '^[ ]*datadir[ ]*=' /etc/mysql/my.cnf | cut -f2 -d'='
Would serve you better; the character classes contain a space and a tab.
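Since a literal tab inside brackets is easy to lose when copying, the POSIX class [[:space:]] is a safer way to spell the same idea. A sketch using sed follows; the temporary file and its contents are invented so the snippet can run without a real my.cnf.

```shell
#!/bin/sh
# Hypothetical config file with leading spaces and a tab before the "=".
cnf=$(mktemp)
printf '[mysqld]\n  datadir\t= /var/lib/mysql\n' > "$cnf"

# Strip everything up to and including "=" plus surrounding whitespace,
# print only lines where that substitution happened, keep the first one.
datadir=$(sed -n 's/^[[:space:]]*datadir[[:space:]]*=[[:space:]]*//p' "$cnf" | head -n 1)
backup_dir="$datadir/db_backups"
echo "$backup_dir"

rm -f "$cnf"
```

Unlike cut -f2 -d'=', this also drops the whitespace after the equals sign, so the result can be used as a path directly.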
Now that you're looking for the right things we can move on to why it still won't work. The %x[] quoter tries to execute its argument using the shell. When you feed the backticked grep stuff:
`grep '^[ ]*datadir[ ]*=' /etc/mysql/my.cnf | cut -f2 -d'='`/db_backups
to the shell, you should get a directory name that ends with /db_backups but you can't execute a directory. I think you want this to produce the directory name:
d = %x[echo `grep '^[ ]*datadir[ ]*=' /etc/mysql/my.cnf | cut -f2 -d'='`/db_backups].strip
Note the leading echo and the .strip call on the returned string. The .strip is necessary to remove the newline from what echo produces.
I think you're going through a lot of trouble for something that could easily be done with just a couple lines of Ruby:
dir = nil
File.open('/etc/mysql/my.cnf').each do |line|
  if(m = line.match(/^\s*datadir\s*=\s*(\S+)/))
    dir = m[1] + '/db_backups'
    break
  end
end
You could probably tighten that up a bit if you wanted but I think that that's at least less confusing than putting shell backticks inside Ruby backticks.
It looks like you just want to get field 2 from the file. Then just do it in Ruby using split:
File.open("file").each do |line|
  if line[/^datadir/]
    print line.split("=",2)[1]
  end
end
There is no need to specifically shell out to call grep. This is inefficient and non-portable.