Input file:
6 31236622 HLA_C*05:01:01:01 A T . PASS AF=0.07724;MAF=0.07724;R2=0.98466;IMPUTED GT:DS:HDS:GP 1|0:0.999:0.999,0.000:0.001,0.999,0.000 0|0:0:0,0:1,0,0 1|1:1.994:0.995,1.000:0.000,0.006,0.994
6 29910248 HLA_A*01:01 A T . PASS AF=0.15969;MAF=0.15969;R2=0.97333;IMPUTED GT:DS:HDS:GP 0|0:0:0,0:1,0,0 1|0:1.000:1.000,0.000:0.000,1.000,0.000 0|0:0:0,0:1,0,0
6 31322134 HLA_B*55:01 A T . PASS AF=0.01091;MAF=0.01091;R2=0.94511;IMPUTED GT:DS:HDS:GP 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0
6 31322132 HLA_B*55 A T . PASS AF=0.01091;MAF=0.01091;R2=0.94485;IMPUTED GT:DS:HDS:GP 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0
6 31322006 HLA_B*44:02:01:01 A T . PASS AF=0.08074;MAF=0.08074;R2=0.97706;IMPUTED GT:DS:HDS:GP 1|0:0.999:0.999,0.000:0.001,0.999,0.000 0|0:0:0,0:1,0,0 1|1:1.997:0.998,0.999:0.000,0.003,0.997
I want to parse a specific number from each column after the "GT:DS:HDS:GP" column, specifically, the numbers after "x|x:". So desired output is:
0.999, 0, 1.994
0, 1.000, 0
0, 0, 0
0, 0, 0
0.999, 0, 1.997
To parse the desired values from (e.g.) line 4, I can use:
awk -F: '{for (i=5; i<=NF; i+=3) printf "%s%s", $i, (i+3 <= NF ? ", " : ORS)}'
Line 5 would require:
awk -F: '{for (i=9; i<=NF; i+=3) printf "%s%s", $i, (i+3 <= NF ? ", " : ORS)}'
So the problem with the input file is that column 3 (space delimited) contains a variable number of colons, which makes colons a poor delimiter for this particular input file (but the desired values are surrounded by colons!)
I thought about using "|" as the delimiter, with substr($i,3,?), but the desired values have an inconsistent number of digits (hence the "?").
Is there a flexible awk code to get the desired output?
You may try this awk:
awk -v OFS=', ' '$9 == "GT:DS:HDS:GP" {for (i=10; i<=NF; ++i) if ($i ~ /^[0-9]+\|[0-9]+:/ && split($i, a, /:/)) printf "%s", (i == 10 ? "" : OFS) a[2]; print ""}' file
0.999, 0, 1.994
0, 1.000, 0
0, 0, 0
0, 0, 0
0.999, 0, 1.997
An expanded form:
awk -v OFS=', ' '
$9 == "GT:DS:HDS:GP" {
    for (i=10; i<=NF; ++i)
        if ($i ~ /^[0-9]+\|[0-9]+:/ && split($i, a, /:/))
            printf "%s", (i == 10 ? "" : OFS) a[2]
    print ""
}' file
Why do you care about the space-delimited columns at all?
awk '{ sub(/.* GT:DS:HDS:GP */, "");
i = split($0, n, /[0-9]\|[0-9]:/);
sep = "";
for(x=2; x<=i; x++) {
sub(/:.*/, "", n[x]); printf("%s%s", sep, n[x]); sep=", " }
printf "\n"; }' file
We successively pick apart each line, first by removing everything through GT:DS:HDS:GP from the line, then by splitting the remaining string into n on the specified delimiter, and then cleaning up the resulting fields by removing everything after the first colon in each, and printing the result. (We skip the first one, which only contains the useless short or empty string before the first delimiter.)
Output for your sample:
0.999, 0, 1.994
0, 1.000, 0
0, 0, 0
0, 0, 0
0.999, 0, 1.997
I have no idea what these fields stand for, so I just picked single-letter variable names; you can probably improve readability by giving the variables more descriptive names.
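For comparison, the same extraction can be sketched in Python. This is only an illustration, assuming the columns are whitespace-separated and that the wanted value is always the second colon-separated item of each sample column; the function name dosages is made up for this example.

```python
def dosages(line):
    """Return the value after "x|x:" for every sample column of one line."""
    fields = line.split()
    # Sample columns start right after the FORMAT column "GT:DS:HDS:GP".
    start = fields.index("GT:DS:HDS:GP") + 1
    # Each sample looks like "1|0:0.999:...", so item 1 is the wanted number.
    return [sample.split(":")[1] for sample in fields[start:]]


sample_line = (
    "6 31236622 HLA_C*05:01:01:01 A T . PASS "
    "AF=0.07724;MAF=0.07724;R2=0.98466;IMPUTED GT:DS:HDS:GP "
    "1|0:0.999:0.999,0.000:0.001,0.999,0.000 0|0:0:0,0:1,0,0 "
    "1|1:1.994:0.995,1.000:0.000,0.006,0.994"
)
print(", ".join(dosages(sample_line)))  # prints: 0.999, 0, 1.994
```

Because the line is split on whitespace first, the colons inside column 3 never get in the way.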
I'm trying to use im4java to generate a sample image with text drawn on top of a tiled pattern image. My code:
ConvertCmd convertCmd = new ConvertCmd();
IMOperation imOperation = new IMOperation();
imOperation.size(564, 564);
imOperation.tile(patternImg);
imOperation.background("none");
imOperation.stroke("black");
imOperation.strokewidth(2);
imOperation.fill("white");
imOperation.gravity("center");
imOperation.pointsize(40);
imOperation.border(3, 3);
imOperation.label(generateImageRequestDTO.getMainText());
imOperation.composite();
imOperation.addImage(absolutePathWorkDir + "/" + "test.jpg");
convertCmd.run(imOperation);
Which generates this script:
convert \
-size "564x564" -tile "/var/images/patterns/1.jpg" \
-background "none" -stroke "black" -strokewidth "2" -fill "white" \
-gravity "center" -pointsize "40" -border "3x3" -label "This is some text" \
-composite "/var/images/workdir/test.jpg"
Which is almost what I need. This is the code which I'm trying to generate:
convert \
-size "564x564" tile:"/var/images/patterns/1.jpg" \
-background "none" -stroke "black" -strokewidth "2" -fill "white" \
-gravity "center" -pointsize "40" -border "3x3" label:"This is some text" \
-composite "/var/images/workdir/test.jpg"
Basically:
-tile ==> tile:
-label ==> label:
What am I missing here?
Try adding the tile: and label: prefixes in the imOperation.addImage method, like imOperation.addImage("tile:"+patternImg); – emcconville
Thanks a lot, @emcconville! That works!
Hi all.
I need help. I have a signal like this one:
/\
/\ / \
/ \ /\ / \
0 ---------------------------------------
/ \ / \ / \ /
\/ \ / \ /
\/ \/
and I need to detect all peaks (negative and positive). All values are floats, and I get a new one every 66 ms. I want to know the time between two peaks. I think I need to store all values in an array with a timestamp from the last peak. Does anyone have a better approach?
Thanks.
To discover a peak you might want to discover a change in direction.
You would not necessarily have to store the values in an array.
Pseudocode:
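//every frame (a new value arrives every 66 ms):
frameIncrement++;
currentDir = currentVal - prevVal;
if ((prevDir < 0 && currentDir > 0) || (prevDir > 0 && currentDir < 0)) {
    //change in direction: prevVal was a peak
    time = frameIncrement * 66; //ms since the previous peak
    frameIncrement = 0;
}
if (currentDir != 0) prevDir = currentDir; //keep the last direction across flat stretches
prevVal = currentVal;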
Hope this helps!
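The direction-change idea can be tried out with a small Python sketch. It assumes the samples arrive at a fixed 66 ms spacing as described in the question; the function name peak_intervals and the parameter period_ms are invented for this example, and no noise filtering is attempted.

```python
def peak_intervals(samples, period_ms=66):
    """Return the time in ms between consecutive peaks (maxima and minima)."""
    intervals = []
    prev_val = None
    prev_dir = 0          # last non-zero direction seen
    last_peak_idx = None  # index of the most recent peak
    for idx, val in enumerate(samples):
        if prev_val is not None:
            cur_dir = val - prev_val
            turned = (prev_dir < 0 < cur_dir) or (prev_dir > 0 > cur_dir)
            if turned:
                peak_idx = idx - 1  # the previous sample was the peak
                if last_peak_idx is not None:
                    intervals.append((peak_idx - last_peak_idx) * period_ms)
                last_peak_idx = peak_idx
            if cur_dir != 0:
                prev_dir = cur_dir  # flat stretches keep the old direction
        prev_val = val
    return intervals


signal = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0, 1.0, 3.0, 2.0]
print(peak_intervals(signal))  # prints: [198, 198]
```

Only the previous value, the previous direction, and the index of the last peak are kept, so no array of past samples is needed. On a noisy real signal every small wiggle would count as a peak, so some smoothing or a minimum-amplitude threshold would probably be required.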
I am using the adler32 function from zlib to calculate the weak checksum of a chunk of memory x (4096 bytes in length). Everything is fine, but now I would like to perform a rolling checksum if the chunks from different files do not match. However, I am not sure how to write a function that does this on the value returned by adler32 in zlib. So if the checksum does not match, how do I calculate the rolling checksum using the original checksum, x + 1 byte and x + 4096 + 1? Basically, I am trying to build an rsync implementation.
Pysync has implemented rolling on top of zlib's Adler32 like this:
_BASE=65521 # largest prime smaller than 65536
_NMAX=5552 # largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1
_OFFS=1 # default initial s1 offset
import zlib
class adler32:
    def __init__(self,data=b''):
        value = zlib.adler32(data,_OFFS)
        self.s2, self.s1 = (value >> 16) & 0xffff, value & 0xffff
        self.count=len(data)
    def update(self,data):
        value = zlib.adler32(data, (self.s2<<16) | self.s1)
        self.s2, self.s1 = (value >> 16) & 0xffff, value & 0xffff
        self.count = self.count+len(data)
    def rotate(self,x1,xn):
        # x1 is the byte leaving the window, xn the byte entering it
        x1,xn=ord(x1),ord(xn)
        self.s1=(self.s1 - x1 + xn) % _BASE
        self.s2=(self.s2 - self.count*x1 + self.s1 - _OFFS) % _BASE
    def digest(self):
        return (self.s2<<16) | self.s1
    def copy(self):
        n=adler32()
        n.count,n.s1,n.s2=self.count,self.s1,self.s2
        return n
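The rotate step can be checked against zlib directly. The following is a minimal Python 3 sketch, not part of Pysync: the function name roll and the test data are made up for illustration, and the window length of 4096 is taken from the question.

```python
import zlib

BASE = 65521  # modulus used by Adler-32


def roll(checksum, window_len, out_byte, in_byte):
    """Slide an Adler-32 checksum one byte forward over a fixed-size window."""
    s1 = checksum & 0xFFFF
    s2 = (checksum >> 16) & 0xFFFF
    s1 = (s1 - out_byte + in_byte) % BASE
    # The trailing "- 1" accounts for Adler-32 starting s1 at 1.
    s2 = (s2 - window_len * out_byte + s1 - 1) % BASE
    return (s2 << 16) | s1


data = bytes(range(256)) * 40  # 10240 bytes of sample data
window = 4096
checksum = zlib.adler32(data[:window])
for start in range(1, 200):
    checksum = roll(checksum, window, data[start - 1], data[start + window - 1])
    # The rolled value must equal a checksum computed from scratch.
    assert checksum == zlib.adler32(data[start:start + window])
```

Each roll costs a few arithmetic operations instead of a full pass over 4096 bytes, which is the point of using a rolling checksum for the weak match.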
But as Peter stated, rsync does not use Adler32 directly, but a faster variant of it.
The code of the rsync tool is a bit hard to read, but check out librsync. It's a completely separate project and it's much more readable. Take a look at rollsum.c and rollsum.h. There is an efficient implementation of the variant in C macros:
/* the Rollsum struct type*/
typedef struct _Rollsum {
unsigned long count; /* count of bytes included in sum */
unsigned long s1; /* s1 part of sum */
unsigned long s2; /* s2 part of sum */
} Rollsum;
#define ROLLSUM_CHAR_OFFSET 31
#define RollsumInit(sum) { \
(sum)->count=(sum)->s1=(sum)->s2=0; \
}
#define RollsumRotate(sum,out,in) { \
(sum)->s1 += (unsigned char)(in) - (unsigned char)(out); \
(sum)->s2 += (sum)->s1 - (sum)->count*((unsigned char)(out)+ROLLSUM_CHAR_OFFSET); \
}
#define RollsumRollin(sum,c) { \
(sum)->s1 += ((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
(sum)->s2 += (sum)->s1; \
(sum)->count++; \
}
#define RollsumRollout(sum,c) { \
(sum)->s1 -= ((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
(sum)->s2 -= (sum)->count*((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
(sum)->count--; \
}
#define RollsumDigest(sum) (((sum)->s2 << 16) | ((sum)->s1 & 0xffff))
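To see how these macros behave, here is a rough Python transcription. It is a sketch for experimenting, not librsync's own code: the class and helper names are invented, and each half is masked to 16 bits on every step (an assumption that yields the same low 16 bits as the unsigned C arithmetic).

```python
CHAR_OFFSET = 31  # same value as ROLLSUM_CHAR_OFFSET above
MASK = 0xFFFF     # keep each half of the sum to 16 bits


class Rollsum:
    def __init__(self):
        self.count = self.s1 = self.s2 = 0

    def rollin(self, c):
        """Add byte c at the end of the window."""
        self.s1 = (self.s1 + c + CHAR_OFFSET) & MASK
        self.s2 = (self.s2 + self.s1) & MASK
        self.count += 1

    def rotate(self, out, in_):
        """Drop byte out from the front and add byte in_ at the end."""
        self.s1 = (self.s1 + in_ - out) & MASK
        self.s2 = (self.s2 + self.s1 - self.count * (out + CHAR_OFFSET)) & MASK

    def digest(self):
        return (self.s2 << 16) | self.s1


def rollsum_of(chunk):
    """Build a Rollsum by rolling in every byte of chunk."""
    r = Rollsum()
    for byte in chunk:
        r.rollin(byte)
    return r


r = rollsum_of(b"abcd")
r.rotate(ord("a"), ord("e"))
assert r.digest() == rollsum_of(b"bcde").digest()
```

Unlike Adler-32 there is no modulo by a prime here, only wrap-around arithmetic, which is what makes this variant cheaper.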
I have a text file which has hex values, one value per line. A file has many such values, one below another. I need to do some analysis of the values, for which I need to put some kind of delimiter/marker, say a '#', in this file before line numbers 32, 47, 62, 77... The difference between two line numbers in this pattern is always 15.
I am trying to do it using awk. I tried a few things, but they didn't work.
What is the command in awk to do it?
Any other solution involving some other language/script/tool is also welcome.
Thank you.
-AD
This is how you can use AWK for it,
awk '{ if (FNR >= 32 && (FNR - 32) % 15 == 0) \
         {printf "#%s\n", $0} \
       else {print $0} \
}' inputfile.txt > outputfile.txt
How it works,
input lines are called records and FNR is an AWK variable that counts them
FNR >= 32 leaves the first 31 records untouched
(FNR - 32) % 15 == 0 is true for records 32, 47, 62, ... and those get a # prefix
$0 prints the record (the line) as is
You can also type everything on a single command line, dropping the trailing '\'.
Or, you can use it as an AWK file,
# File: comment.awk
{
    # records 32, 47, 62, ... get a # prefix
    if (FNR >= 32 && (FNR - 32) % 15 == 0) {
        printf "#%s\n", $0
    } else {
        print $0
    }
}
And run it as,
awk -f comment.awk inputfile.txt > outputfile.txt
Hope this will help you to use more AWK.
Python:
f_in = open("file.txt")
f_out = open("file_out.txt","w")
offset = 4 # 0 <= offset < 15 ; first marker after fourth line in this example
for num,line in enumerate(f_in):
    if not (num-offset) % 15:
        f_out.write("#\n")
    f_out.write(line)
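For the exact line numbers in the question (32, 47, 62, ...), a variant that works on a list of lines might look like the sketch below. The function name add_markers and its parameters are made up for this example, and the marker is written on its own line before the target line, as in the snippet above.

```python
def add_markers(lines, first=32, step=15, marker="#"):
    """Return lines with marker inserted before line first, first+step, ..."""
    out = []
    for number, line in enumerate(lines, start=1):  # 1-based line numbers
        if number >= first and (number - first) % step == 0:
            out.append(marker)
        out.append(line)
    return out


result = add_markers([str(i) for i in range(1, 51)])
print(result[30:34])  # prints: ['31', '#', '32', '33']
```

The number >= first check matters: without it, Python's modulo on negative numbers would also place markers before lines 2 and 17.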
Haskell:
offset = 31;
chunk_size = 15;
main = do
  {
    (h, t) <- fmap (splitAt offset . lines) getContents;
    mapM_ putStrLn h;
    mapM_ ((putStrLn "#" >>) . mapM_ putStrLn) $
      map (take chunk_size) $
      takeWhile (not . null) $
      iterate (drop chunk_size) t;
  }