Error in GNU parallel dynamic string replacement - gnu-parallel

I have more than 50 file pairs with names in the following format: AA-7R-76L1.clean.R1.fastq.gz, AA-7R-76L1.clean.R2.fastq.gz
I tried to use parallel in the following way:
parallel --plus echo {%R..fastq.gz} ::: *.fastq.gz |parallel 'repair.sh in1={}.R1.fastq.gz in2={}.R2.fastq.gz out1={}.repd.R1.fastq.gz out2={}.repd.R2.fastq.gz outs={}.singletons.fastq.gz repair'
The --plus echo step should strip R1.fastq.gz / R2.fastq.gz from each name to capture the sample name, i.e. HB-7R-25L0.clean, and then feed it to repair.sh.
The problem is that the first parallel outputs the entire file name instead of just the sample name, so in1 and in2 become AA-7R-76L1.clean.R1.fastq.gz.R1.fastq.gz and AA-7R-76L1.clean.R2.fastq.gz.R2.fastq.gz
What is the error here?

Something like:
$ parallel --plus --dry-run 'repair.sh in1={} in2={/R1/R2} out1={/R1/fixed.R1} out2={/R1/fixed.R2} outs={%.R1.fastq.gz}_singletons.fastq repair' ::: *R1.fastq.gz
(This assumes that R1 and R2 do not appear in the part of the name matched by *.)
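To check the naming logic without parallel, here is a minimal bash sketch of the same idea: loop over only the R1 files and derive every other name from the sample name. The helper name print_repair_cmds is hypothetical. The output names follow the question's repd/singletons pattern, and the sketch assumes the file names contain no spaces or shell metacharacters. Like --dry-run, it only prints the commands.

```shell
# Hypothetical helper: print one repair.sh command per R1 file (dry run).
# Assumes names like SAMPLE.R1.fastq.gz with a matching SAMPLE.R2.fastq.gz.
print_repair_cmds() {
  local r1 sample
  for r1 in "$@"; do
    # Same effect as parallel's {%.R1.fastq.gz}: strip the suffix.
    sample=${r1%.R1.fastq.gz}
    printf 'repair.sh in1=%s in2=%s out1=%s out2=%s outs=%s repair\n' \
      "$r1" "$sample.R2.fastq.gz" "$sample.repd.R1.fastq.gz" \
      "$sample.repd.R2.fastq.gz" "$sample.singletons.fastq.gz"
  done
}
```

Usage: print_repair_cmds *.R1.fastq.gz. Once the printed commands look right, they can be piped into parallel (print_repair_cmds *.R1.fastq.gz | parallel), because GNU parallel executes each input line as a command when it is not given a command of its own. Deriving the R2 name from the stripped sample name, rather than replacing R1, also sidesteps the caveat about R1 appearing elsewhere in the name.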

Related

How to fix 'Unable to open [{2}]' error in Gnu Parallel

I want to parallelize an image processing step which uses two programs at the same time. My code works fine for a single image but when I try to parallelize it, it fails.
The two programs I am using are fx and getkey from the USGS Integrated Software for Imagers and Spectrometers (ISIS). I use fx to perform an arithmetic operation on my input image ('f1' in the code below) and write the result to a new file (the 'to' parameter). getkey outputs the value of a requested keyword, which is a number in this case.
In the following code, I am subtracting the output of getkey from my input image, f1, and writing the result to a new file, which is defined by the 'to' parameter. This code works as I expect it to:
fx f1=W1660432760_1_overclocks_average_lwps5.cub to=testing_fx2.cub equation=f1-$(getkey from=W1660432760_1_overclocks_average_lwps5_stats.txt grpname=results keyword=average)
The problem comes when I try to parallelize it. The following code gives an error, saying 'Unable to open [{2}].'
parallel fx f1={1} to={1.}_minus_avg.cub equation=f1-$(getkey from={2} grpname=results keyword=average) ::: $(find *lwps5.cub) ::: $(find *stats.txt)
The result I am expecting is an output image with pixel values that are smaller by the getkey value compared to the input image.
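The unquoted $(getkey ...) is expanded by your own shell before parallel even starts, so getkey runs once with the literal argument {2}, hence the error. Quoting it hands it to parallel unexpanded, and it is then evaluated in each job after {2} has been replaced.
If the two inputs should be combined in all ways: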
parallel fx f1={1} to={1.}_minus_avg.cub 'equation=f1-$(getkey from={2} grpname=results keyword=average)' ::: *lwps5.cub ::: *stats.txt
If the two inputs should be linked:
parallel fx f1={1} to={1.}_minus_avg.cub 'equation=f1-$(getkey from={2} grpname=results keyword=average)' ::: *lwps5.cub :::+ *stats.txt
If neither of these solves your issue, make a shell function that takes two arguments:
doit() {
arg1="$1"
arg2="$2"
# Do all your stuff with getkey and fx
}
export -f doit
# all combinations
parallel doit ::: *lwps5.cub ::: *stats.txt
# or linked
parallel doit ::: *lwps5.cub :::+ *stats.txt
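One way the doit body could be filled in is sketched below. It assumes fx and getkey accept exactly the parameters used in the question's single-image command, and it derives the output name the way {1.} does (by removing the extension). Because getkey now runs inside the function, it only ever sees the real stats file name.

```shell
# Sketch of doit: subtract the average from a stats file from one image.
# Assumes the fx/getkey parameters from the question's working command.
doit() {
  local cube="$1" stats="$2" avg
  # getkey runs per job, after parallel has substituted the file name.
  avg=$(getkey from="$stats" grpname=results keyword=average) || return
  # ${cube%.*} drops the extension, like parallel's {1.}.
  fx f1="$cube" to="${cube%.*}_minus_avg.cub" equation="f1-$avg"
}
export -f doit
```

Declaring avg with local on a separate line keeps getkey's exit status visible, so a failed lookup skips the fx call instead of producing an equation like f1-.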

How to quote each argument from gnu parallel?

Given some pipe-delimited content:
Test|One|Two|Three
Again|||Another
And a bash function:
function print_last() {
echo "$4"
}
export -f print_last
And the parallel command: parallel -C "\|" print_last :::: data.tsv
My expected output is:
Three
Another
However, Another never prints because the function only receives two arguments for that row of data. This is caused by the empty cells in the tabular data. My data will have blank cells and a varying number of columns.
So, without changing my command to include numbered arguments (print_last "{1}" "{2}" "{3}" "{4}"), how can I ensure that blank values are sent to the function?
Since your function is called print_last maybe it will be enough to simply get the last element:
parallel -C "\|" echo {-1} :::: data.tsv
Otherwise, abuse the fact that -X repeats the context around {}:
parallel -C "\|" -X print_last \"\"{} :::: data.tsv
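For comparison, the underlying point (empty fields must reach the function as empty arguments) can be shown in plain bash without parallel. This is a minimal sketch; the helper name call_with_fields is hypothetical. It relies on two bash behaviors: with a non-whitespace IFS such as |, consecutive delimiters produce empty fields, and a quoted "${fields[@]}" expansion passes each of them as a separate, possibly empty, argument.

```shell
# Hypothetical helper: split each |-delimited line of stdin into fields
# and call the given command with those fields as separate arguments.
call_with_fields() {
  local line
  local -a fields
  while IFS= read -r line; do
    # With a non-whitespace IFS, "a||b" splits into "a", "", "b".
    IFS='|' read -r -a fields <<< "$line"
    # Quoting the array expansion keeps empty fields as empty arguments.
    "$@" "${fields[@]}"
  done
}

print_last() {
  echo "$4"
}
```

Usage: call_with_fields print_last < data.tsv should print Three and Another, because the Again row now arrives as four arguments.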

How to make the output of Maxima cleaner?

I want to make use of Maxima as the backend to solve some computations used in my LaTeX input file.
I did the following steps.
Step 1
Download and install Maxima.
Step 2
Create a batch file named cas.bat (for example) as follows.
rem cas.bat
echo off
set PATH=%PATH%;"C:\Program Files (x86)\Maxima-5.31.2\bin"
maxima --very-quiet -r %1 > solution.tex
Save the batch file in the same directory as the input file below, just for the sake of simplicity.
Step 3
Create the input file named main.tex (for example) as follows.
% main.tex
\documentclass[preview,border=12pt,12pt]{standalone}
\usepackage{amsmath}
\def\f(#1){(#1)^2-5*(#1)+6}
\begin{document}
\section{Problem}
Evaluate $\f(x)$ for $x=\frac 1 2$.
\section{Solution}
\immediate\write18{cas "x: 1/2;tex(\f(x));"}
\input{solution}
\end{document}
Step 4
Compile the input file with pdflatex -shell-escape main and you will get nicely typeset output.
Step 5
Done.
Questions
Apparently the output of Maxima is as follows. I don't know how to make it cleaner.
solution.tex
1
-
2
$${{15}\over{4}}$$
false
Now, my questions are:
How do I remove such text?
How do I obtain just \frac{15}{4} without $$...$$?
(1) To suppress output, terminate input expressions with dollar sign (i.e. $) instead of semicolon (i.e. ;).
(2) To get just the TeX-ified expression sans the environment delimiters (i.e. $$), call tex1 instead of tex. Note that tex1 returns a string, which you have to print yourself (while tex prints it for you).
Combining these ideas with the stuff you showed, I think your program could look like this:
"x: 1/2$ print(tex1(\f(x)))$"
I think you might find the Maxima mailing list helpful. I'm pretty sure there have been several attempts to create a system such as the one you describe. You can also look at the documentation.
I couldn't find any way to completely clean up Maxima's output within Maxima itself. It always echoes the input line, and always writes some whitespace after the output. The following is an example of a perl script that accomplishes the cleanup.
#!/usr/bin/perl
use strict;
my $var = $ARGV[0];
my $expr = $ARGV[1];
sub do_maxima_to_tex {
my $m = shift;
my $c = "maxima --batch-string='exptdispflag:false; print(tex1($m))\$'";
my $e = `$c`;
my @x = split(/\(%i\d+\)/,$e); # output contains stuff like (%i1)
my $f = pop @x; # remove everything before the echo of the last input
while ($f=~/\A /) {$f=~s/\A .*\n//} # remove echo of input, which may be more than one line
$f =~ s/\\\n//g; # maxima breaks latex tokens in the middle at end of line; fix this
$f =~ s/\n/ /g; # if multiple lines, get it into one line
$f =~ s/\s+\Z//; # get rid of final whitespace
return $f;
}
my $e1 = do_maxima_to_tex("diff($expr,$var,1)");
my $e2 = do_maxima_to_tex("diff($expr,$var,2)");
print <<TEX;
The first derivative is \$$e1\$. Differentiating a second time,
we get \$$e2\$.
TEX
If you name this script a.pl, then doing
a.pl z 3*z^4
outputs this:
The first derivative is $12\,z^3$. Differentiating a second time,
we get $36\,z^2$.
For the OP's application, a script like this one could be what is invoked by the write18 in the latex file.
If you really want to use LaTeX then the maxiplot package is the answer. It provides a maxima environment inside of which you enter Maxima commands. When you process your LaTeX file a Maxima batch file is generated. Process this file with Maxima and process your LaTeX file again to typeset the equations generated by Maxima.
If you would rather have 2D math input with live typesetting, then use TeXmacs. It is a cross-platform document authoring environment (a word processor on steroids, if you like) that includes plugins for Maxima, Mathematica and many more scientific computing tools. If you need to, or if you are not satisfied with its typesetting, you can export your document to LaTeX.
I know this is a very old post, and the answers here are excellent for the question the OP asked. Like the OP, I used the --very-quiet and -r command-line options for a long time, but in Maxima version 5.43.2 they behave differently; see the question "maxima command line v5.43 is behaving differently than v5.41". I am adding this cross-reference so that, when you incorporate these answers into your own solution, you also account for the changed behavior of those command-line flags.

extract a line from a file using csh

I am writing a csh script that will extract a line from a file xyz.
The xyz file contains a number of lines of code, and the line I am interested in appears 2-3 lines into the file.
I tried the following code
set product1 = `grep -e '<product_version_info.*/>' xyz`
I want the script, as soon as it finds that line, to save it in a variable as a string and stop reading the file immediately, i.e. it should not read any further after extracting the line.
Please help !!
grep has an -m or --max-count flag that tells it to stop after a specified number of matches. Hopefully your version of grep supports it.
set product1 = `grep -m 1 -e '<product_version_info.*/>' xyz`
From the grep man page:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
As an alternative, you can always use the command below to check only the first few lines (since the line always occurs in the first 2-3 lines):
set product1 = `head -3 xyz | grep -e '<product_version_info.*/>'`
I think you're asking to return the first matching line in the file. If so, one solution is to pipe the grep result to head
set product1 = `grep -e '<product_version_info.*/>' xyz | head -1`
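The "stop at the first match" behavior can be checked in POSIX sh (rather than csh) with the small sketch below. The function names are hypothetical. grep -m 1 is supported by GNU and BSD grep but is not required by POSIX, so the awk variant is offered as a portable fallback: it prints the first matching line and then exits, so it stops reading the file there.

```shell
# Hypothetical helpers: print the first line of FILE matching PATTERN, then stop.
# grep -m 1 works with GNU and BSD grep; awk's exit is a portable fallback.
first_match_grep() { grep -m 1 -e "$1" -- "$2"; }
first_match_awk()  { awk -v re="$1" '$0 ~ re { print; exit }' "$2"; }
```

In the csh script, the awk form can be used the same way as the grep lines above, e.g. set product1 = `awk '/<product_version_info.*\/>/ { print; exit }' xyz`, where the / inside the awk regex literal has to be escaped.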

Unable to manipulate a byte array

I'm trying to pass a byte array from inside my rails app into another ruby script (still inside my rails app), for example:
`./app/animations/fade.sh "\x01\x01\x04\x00" &`
Yields ArgumentError (string contains null byte)
I'm stumped on how to form this string and then pass it to my script, which will use it like this:
#sp.write ["#{ARGV[0]}", "f", "\x12"]
I'd like to form the string (on my rails app) like this if possible:
led = "\x01#{led.id}\x04\x00"
But I keep getting ArgumentError (string contains null byte) error. Is there a way I can form this string from elements in my rails app, then pass it to my external script?
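You should just pass the data in through standard input, not the command line. Command-line arguments are handed to the program as NUL-terminated C strings, so they cannot contain a NUL byte; standard input has no such restriction. You can use IO.popen for this purpose: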
IO.popen("./app/animations/fade.sh", "w+") do |f|
f.write "\x01\x01\x04\x00"
end
And on the reading side:
input = $stdin.read
#sp.write [input, "f", "\x12"]
(By the way, it's more common to name Ruby scripts .rb instead of .sh; if fade.sh is meant to be a Ruby script, as I assume from the syntax you used in its example contents, you might want to name it fade.rb)
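A quick way to convince yourself that raw bytes, including NUL, survive a pipe is the minimal shell sketch below. The function name send_bytes is hypothetical. It uses printf's portable octal escapes to write the four bytes 01 01 04 00.

```shell
# Hypothetical helper: write the raw bytes 01 01 04 00 (the last is NUL).
# printf's octal escapes are portable; \000 emits a real NUL byte.
send_bytes() {
  printf '\001\001\004\000'
}
```

Piping it into od -An -tx1 shows all four bytes, and send_bytes | ./app/animations/fade.sh would deliver them to a script that reads $stdin as above. By contrast, capturing the same bytes with "$(send_bytes)" in bash drops the NUL, which is the shell-side version of the "string contains null byte" problem.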
You could use base64 to pass the byte string around:
$ cat > test.sh
echo $1 | base64 -d
$ chmod a+x test.sh
and then from ruby:
irb
>> require 'base64'
=> true
>> `./test.sh "#{Base64.encode64 "\x01\x01\x04\x00"}"`
=> "\x01\x01\x04\x00"
Can your script accept input from STDIN instead? Perhaps using read.
If you can't do this, you could encode your null and escape your encoding.
E.g. 48656c6c6f0020576f726c64 could be encoded as 48656c6c6f20012020576f726c64,
which in turn would be decoded again if both sides agree that 2020=20 and 2001=00.
Update: I think encoding is what you'll have to do, because I tried using read and it turned out to be a little too difficult. There's probably another option, but I don't see it yet.
Here's my script and two test runs:
dlamblin$ cat test.sh
echo "reading two lines of input, first line is length of second."
read len
read ans
echo "C string length of second line is:" ${#ans}
for ((c=0; c<$len; c++))
do
/bin/echo -n "${ans:$c:1},"
done
echo ' '
exit
dlamblin$ echo -e '12\0012Hello \0040World' | sh test.sh
reading two lines of input, first line is length of second.
C string length of second line is: 12
H,e,l,l,o, , ,W,o,r,l,d,
dlamblin$ echo -e '12\0012Hello \0000World' | sh test.sh
reading two lines of input, first line is length of second.
C string length of second line is: 5
H,e,l,l,o,,,,,,,,
#Octals \0000 \0012 \0040 are NUL NL and SP respectively
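The escape encoding suggested in this answer (20 becomes 2020, 00 becomes 2001) can be sketched in bash on lowercase hex strings. The function names are hypothetical. Walking the string two characters at a time matters: a plain text substitution would also match "20" straddling two bytes, as in the "05 76" inside 576f.

```shell
# Hypothetical helpers for the escape scheme: 20 -> 2020, 00 -> 2001.
# Input is a lowercase hex string; it is processed one byte (2 chars) at a time.
escape_hex() {
  local hex=$1 out= i byte
  for ((i = 0; i < ${#hex}; i += 2)); do
    byte=${hex:i:2}
    case $byte in
      20) out+=2020 ;;
      00) out+=2001 ;;
      *)  out+=$byte ;;
    esac
  done
  printf '%s\n' "$out"
}

unescape_hex() {
  local hex=$1 out= i byte next
  for ((i = 0; i < ${#hex}; i += 2)); do
    byte=${hex:i:2}
    if [[ $byte == 20 ]]; then
      # 20 is the escape byte: the following byte says what it stands for.
      next=${hex:i+2:2}
      i=$((i + 2))
      case $next in
        20) out+=20 ;;
        01) out+=00 ;;
        *)  echo "bad escape: 20$next" >&2; return 1 ;;
      esac
    else
      out+=$byte
    fi
  done
  printf '%s\n' "$out"
}
```

Since the encoded form never contains a 00 byte, it can travel as a command-line argument; the receiving side runs the inverse step before using the data.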
