Biopython: Extract CDS from modified GenBank records? - biopython

I have some basic familiarity with Python and have been extracting coding sequences from GenBank records. However, I'm unsure how to handle records where the coding sequence has been modified (e.g. to correct internal stop codons). An example of such a sequence is this GenBank record (accession: XM_021385495.1, in case the link does not work).
In this example, I can translate the two coding sequences that I can access, but both have internal stop codons - and according to the notes also indels! This is the way I have accessed the CDS:
1 - gb_record.seq
2 - cds.location.extract(gb_record) for each feature where feature.type == "CDS"
However, I need the sequence that has been corrected. As far as I can tell, I think I need to use the "transl_except" tags in the CDS feature but I am at a loss how to do this.
I wonder if anybody might be able to provide an example or some insight of how to do this?
Thanks
Jo

I've got some demo code written in python3 that should help explain this GenBank record.
import re
aa_convert_codon_di = {
'A':['[GRSK][CYSM].'],
'B':['[ARWM][ARWM][CTYWKSM]', '[GRSK][ARWM][TCYWKSM]'],
'C':['[TYWK][GRSK][TCYWKSM]'],
'D':['[GRSK][ARWM][TCYWKSM]'],
'E':['[GRSK][ARWM][AGRSKWM]'],
'F':['[TYWK][TYWK][CTYWKSM]'],
'G':['[GRSK][GRSK].'],
'H':['[CYSM][ARWM][TCYWKSM]'],
'I':['[ARWM][TYWK][^G]'],
'J':['[ARWM][TYWK][^G]', '[CYSM][TYWK].', '[TYWK][TYWK][AGRSKWM]'],
'K':['[ARWM][ARWM][AGRSKWM]'],
'L':['[CYSM][TYWK].', '[TYWK][TYWK][AGRSKWM]'],
'M':['[ARWM][TYWK][GRSK]'],
'N':['[ARWM][ARWM][CTYWKSM]'],
'O':['[TYWK][ARWM][GRSK]'],
'P':['[CYSM][CYSM].'],
'Q':['[CYSM][ARWM][AGRSKWM]'],
'R':['[CYSM][GRSK].', '[ARWM][GRSK][GARSKWM]'],
'S':['[TYWK][CYSM].', '[ARWM][GRSK][CTYWKSM]'],
'T':['[ARWM][CYSM].'],
'U':['[TYWK][GRSK][ARWM]'],
'V':['[GRSK][TYWK].'],
'W':['[TYWK][GRSK][GRSK]'],
'X':['...'],
'Y':['[TYWK][ARWM][CTYWKSM]'],
'Z':['[CYSM][ARWM][AGRSKWM]','[GRSK][ARWM][AGRSKWM]'],
'_':['[TYWK][ARWM][AGRSKWM]', '[TYWK][GRSK][ARWM]'],
'*':['[TYWK][ARWM][AGRSKWM]', '[TYWK][GRSK][ARWM]'],
'x':['[TYWK][ARWM][AGRSKWM]', '[TYWK][GRSK][ARWM]']}
dna_convert_aa_di = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*',
'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W'}
dna_str = "ATGACCGAGGTGCAAGACCTTGCACTTGGATTTGTTGAACCTCATGAGGTTCCCCTGGGCCCCTGGACATCGCCTTTTTCCAGCGTTCCACCAGAGACTTCACCCAACTGCTGTGACTTTTCAAACATCATTGAGAGCGGCTTGATACAGTTAGGCCACTCTCGCAGCTGTGAAGTTGTGAAGGCAAACTCCAGCGACCCATTCCTTCTTCCTTCAGAAAAGCAACTCGAGGAGCAGCGGGAGGAAACCCAGCTCTATCCTGCAGCGAGCGGGGCTGCGCAAGAGGCAGGTGCTGCTCTCACGGCCCGAAGGCAGCTCCGAGCTGCCGGGTGCGGTCACGTCAGCGGCCGAGCTGCCCGGCGGGGTGTGCATAAGAGCGAGCTATATGTGCTGCGTGTCATCACGGAGCCTTTCAAGTCCCTCCCTCCTTCTCCACTGCTGGGGCTGCAGTGGGCACCGGGCAGGAGGAGCGGCCGCAGCCCCGCGGGGGTGGGACGAGTCTCTGGGGGCTGCGCCACTTGGAAGATTTGCATTGGGTACATTGATAGCATTGTGATTGATGGCCTATTTAATACCATAATGTGTTCTTTAGATTTCTTTTTGGAGAACTCAGAAGAAAATTTGAAGCCAGCTCCACTTTTTCCAGCACAAATGACCCTTACTGGCACAGAAATTCATTTTAAACTTTCTCTAGATAAAGAGGCTGATGATGGCTTTTATGACCTTATGGATGAACTACTGGGTGATATTTTCCGAATGTCTGCCCAAGTGAAGAGACTAGAAGCCCACCTGGAATCAGAACATTAGGAGGACTATATGAACAGTGTGTTTGATCTGTCTGAACTCAGGCAGGAGAGTATGGAGAGAGTAATAAACGTCACCAACAAGGCCTTGAAGTACAGAAGATCTCATGATAGCTATGCTTATCTCTGACTAGAGGATCAGCTTGAGTTTATGAGGCAATTTCTTCCTTGTGCTCGTGGTTTAATGTCCACACAGATATCTCTTACTGGCATCCCACTACTAAACTGTGTAAAAAGCAGGCAAGAAAGAAACTAGTTTAAATAACTTCCTATTTATGAAAATCTCTGTGTTCAGATGAGTAAGTTTGAAGACCCAAGAATTTTTGAAAGCTGGTTTAAGGTGATTATGAAGCCTTTCAAAATGACACTTCTAAACATTACTAAGAAGTGGAGCTGGATGTTTAAGTAGTACACTATAGAAATAATAAGATTGAGTCTGAATGACTTCAAAGACTTTATAAAAGTGACAGATGCTGGACTTCAAAGAGGGAGGCATTATTGTGCACTGGCAGAAATCACCGGTCACCTCTTGGCTGTGAAAGAGAGGCAGACAGCTGCTGGTGAATCCTTTGAACCTTTAAAAGAANTTGTTGCATTGTTGGAAAGCTACAGACAGAAGATGCCAGATCAAGTTTGCATCCAGTGTCAAATCAGTTGTATCCTGGGAGCCTTTAAGGGTTATGTACTTCTGGTTGGAGTAGGTGGTAGTGATAAATGAAGCTTGTCAAGGCTGGCAGCATGCATCTCTTCCCTGGAGGTCTTTTAAATCATATGGAAGAAAGACCATGAGAGCAAGAACCTGAAGGTAGATGTTGCCAGTTTGTGCATCAAGACTGGTGCCAAGAACATGCCCACAGTGTTTTTGCTGACAGATGCCCAGGTTCCAGATGAACGCTTTCTTGTGCTGATTAATGACTTGTTGGCATCAAGAGATCTTCCTGATCTGTTCAGTGGTGAAGATGAGGAGGGCAAAGTTGCAGGAGTCAGAAAAGAAGTCNNCCTGGGCTTGATGGACACCACAGAAAGCTGCTGGAGGTGGTTCTTTGGTAGAGCGCAGCAGCTGTTAAAAGTGTATGGTGAAGTAGAGTCGAAATGTTGTGCACTGGTCCAGGCAAATACAAAATTAGCAACAGCTAAAGAGAATCTAGAAACAATCTTGAAAAAGCTTATTTCTGAAAATGTGCAT
TGGAGCCAATCTGTTGAAAACCTCAAAGCATAAAAGAAAACTGTACTCAAGGATGTTACATCAGCAGCAGCGTTTGCATCTTTCTTTGGAGCCTTCACAAAACCATATAGTCAAGAACAGATGGAACATTTCTGGATTCTTTCTCTAAAGTCACAGGAGTGTCCTGTTCCTGTGATAGAGGGGCCAGACTCTGCCATCCTGATGAATGATGCTCCAAGAGCAGCACAGAGTAACAAGAGTCTGCTTGCTGATAGGGTGTCAGCAGAAAATGCCACTGCTCTGACACACTGTGAGCAGGGCCCTCTGATGATAGATCCCCAGAAACAGGGAATTGAATGGACACAGAATAAATACAGAACTGACTTTAAAGTCATGCATCTAGGAGAGAATGGTTATGTGTGTACTATTGATACAGCTTTGGCTTGTGGAGAGATTATACTAATTGAAAACATGGCTGAATCTATCGATCTCTTACTTGATCCCCTAACTGGAAGACATACAGGTAAAAGGGGAAGGAATACTTGCGCAATCAGAATTTCTTGAAGACAAAAAAAAAAAAAGTGTGAATTCTACAGGAATTTCCATCTCATCCTTCACACTAAGCTGGCTAACCCTCCCTGCAAGCCAGAGCTTNAGGCTCAGACCACTCTCATTATTTTCACAGATACCAGGGGCAGGCTGGAAGAACAGCTGTTGGCTGAGGTGGTGAGTGCTGAAAGGCCTGACTTGGAAAACCATACGTCAGCACTGGCGAAACAGAAGAGTGTCTCTGAAATCAAGCCCAAGCAGCTTGAGGACAACATGCTGCTCAGTCTGTCAGCTGCCCAGAGCACTTTTGTAGGTGACAGTGAACTTGAAGAGAAATTCAAGTCAACTGCAGGAGAAATGATTGTCCGCCCACATGTTCACAGCTTCTTATTTTGGCAAAAAGCTTCCACTGTAGACTCTGGAAGATTTCATATCTCTTTAGGACAAGGGCAGGAGATGGTTGTGGAGNGACAACTTGAGAAGGCTGCCAAGCCTGGCCACTGGCTTCTTCTCCAAAATATTAATGTGGTAGCCAAGTGGCTAGGAACCTTGGAAAAACTCCTCGAGCAATAGAGTGAAGAAAGTCACTGGTATTTCCGTGTCTTCACTAGTGCTGAACCAGCTCCAGCCCCAGAAGAGCACATCATTCTTCAAGGAGTACTTGAAAACTGAATTAAAATTACCAGACTATCAATAACACTGCCAGTTGTTAAGTGGATAAATGTATTCCTTTTTTTCCTTTGGCAGGATACCCTTGAACTGTGTGGCAAAGAACAGGAATTTAAGAGCATTCTTTTCTCCCTTCGTTATTTTCACACCCGTGTTGCCAGCAGACTCATTTGGCCTTCCAGGCTGCAATTAAGATACCCATACAATACTAGAGATCTCACTGTTTGCATCAGTGTGCCCTGCAACTATTTAGACACTTACACAGAGGTCAGACGCAGTGGTCAGAAAAACAAGTCTATAAAATCAGCTGATTCCAACCCTTAG"
aa_str = "MTEVQDLALGFVEPHEVPLGPWTSPFSSVPPETSPNCCDFSNIIESGLIQLGHSRSCEVVKANSSDPFLLPSEKQLEEQREETQLYPAASGAAQEAGAALTARRQLRAAGCGHVSGRAARRGVHKSELYVLRVITEPFKSLPPSPLLGLQWAPGRRSGRSPAGVGRVSGGCATWKICIGYIDSIVIDGLFNTIMCSLDFFLENSEENLKPAPLFPAQMTLTGTEIHFKLSLDKEADDGFYDLMDELLGDIFRMSAQVKRLEAHLESEHXEDYMNSVFDLSELRQESMERVINVTNKALKYRRSHDSYAYLXLEDQLEFMRQFLPCARGLMSTQISLTGIPLLNCVKSRQERNXFKXLPIYENLCVQMSKFEDPRIFESWFKVIMKPFKMTLLNITKKWSWMFKXYTIEIIRLSLNDFKDFIKVTDAGLQRGRHYCALAEITGHLLAVKERQTAAGESFEPLKEXVALLESYRQKMPDQVCIQCQISCILGAFKGYVLLVGVGGSDKXSLSRLAACISSLEVFXIIWKKDHESKNLKVDVASLCIKTGAKNMPTVFLLTDAQVPDERFLVLINDLLASRDLPDLFSGEDEEGKVAGVRKEVXLGLMDTTESCWRWFFGRAQQLLKVYGEVESKCCALVQANTKLATAKENLETILKKLISENVHWSQSVENLKAXKKTVLKDVTSAAAFASFFGAFTKPYSQEQMEHFWILSLKSQECPVPVIEGPDSAILMNDAPRAAQSNKSLLADRVSAENATALTHCEQGPLMIDPQKQGIEWTQNKYRTDFKVMHLGENGYVCTIDTALACGEIILIENMAESIDLLLDPLTGRHTGKRGRNTCAIRISXRQKKKKCEFYRNFHLILHTKLANPPCKPELXAQTTLIIFTDTRGRLEEQLLAEVVSAERPDLENHTSALAKQKSVSEIKPKQLEDNMLLSLSAAQSTFVGDSELEEKFKSTAGEMIVRPHVHSFLFWQKASTVDSGRFHISLGQGQEMVVEXQLEKAAKPGHWLLLQNINVVAKWLGTLEKLLEQXSEESHWYFRVFTSAEPAPAPEEHIILQGVLENXIKITRLSITLPVVKWINVFLFFLWQDTLELCGKEQEFKSILFSLRYFHTRVASRLIWPSRLQLRYPYNTRDLTVCISVPCNYLDTYTEVRRSGQKNKSIKSADSN"
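mod_dna_str = ""
mod_aa_str = aa_str[:]
start = 0
# Walk the nucleotide sequence codon by codon, consuming one residue of the
# published translation each time the codon is compatible with it.
for index in range(start, len(dna_str), 3):
    codon = dna_str[index:index+3]
    if len(mod_aa_str) == 0:
        break
    if codon in dna_convert_aa_di and dna_convert_aa_di[codon] == mod_aa_str[0]:
        mod_aa_str = mod_aa_str[1:]
    else:
        # No exact match: fall back to the degenerate pattern for this residue
        # and report the position, the pattern and the codon.
        codon_match = "|".join(aa_convert_codon_di[mod_aa_str[0]])
        if len(re.findall(codon_match, codon)) > 0:
            print(index, codon_match, codon)
            mod_aa_str = mod_aa_str[1:]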
Code output:
804 ... TAG
930 ... TGA
1056 ... TAG
1065 ... TAA
1209 ... TAG
1389 ... NTT
1518 ... TGA
1566 ... TAA
1800 ... NNC
2019 ... TAA
2529 ... TGA
2622 ... NAG
2985 ... NGA
3087 ... TAG
3186 ... TGA
From the note section of the CDS, we have: "inserted 5 bases in 4 codons; deleted 2 bases in 2 codons; substituted 11 bases at 11 genomic stop codons".
How does this relate to our output? The reading frame never changes, suggesting that the 2 deleted bases are absent from the given nucleotide sequence. Five unknown nucleotides (N) exist in 4 codons (unknown amino acid, X). The authors of the sequence have accounted for indels. Eleven premature stop codons are present, which are simply translated as unknown amino acids. The "transl_except" tags match the locations of the premature stop codons. The nucleotides at these sites have not been altered. The authors provide XP_021241170 as a possible corrected translation product, but it's still very bad.
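To make the role of the "transl_except" tags concrete, here is a minimal sketch in plain Python of translating a CDS while overriding the codons they name. It assumes the qualifier strings look like (pos:START..END,aa:LABEL) with 1-based record coordinates, and that the CDS is a single forward-strand interval. The function name, the small AA_LABELS map and the toy sequence are made up for illustration; with Biopython the strings would presumably come from feature.qualifiers.get("transl_except", []).

```python
import re

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical, deliberately incomplete map from transl_except labels to letters.
AA_LABELS = {"OTHER": "X", "Sec": "U", "Pyl": "O", "Trp": "W", "TERM": "*"}

TRANSL_EXCEPT_RE = re.compile(r"\(pos:(\d+)\.\.(\d+),aa:(\w+)\)")


def translate_with_exceptions(cds_seq, cds_start, transl_except):
    """Translate cds_seq, overriding the codons named in transl_except.

    cds_start is the 1-based record coordinate of the first CDS base.
    Assumes a single forward-strand interval (no joins, no complement).
    """
    overrides = {}
    for entry in transl_except:
        match = TRANSL_EXCEPT_RE.fullmatch(entry.replace(" ", ""))
        if match is None:
            continue  # e.g. complement(...) positions are not handled here
        first_base = int(match.group(1))
        codon_index = (first_base - cds_start) // 3
        overrides[codon_index] = AA_LABELS.get(match.group(3), "X")

    protein = []
    for index in range(0, len(cds_seq) - 2, 3):
        codon = cds_seq[index:index + 3].upper()
        codon_index = index // 3
        if codon_index in overrides:
            protein.append(overrides[codon_index])
        else:
            # Codons containing ambiguity codes such as N fall back to X.
            protein.append(CODON_TABLE.get(codon, "X"))
    # Drop a single terminal stop so the result has no trailing '*'.
    if protein and protein[-1] == "*":
        protein.pop()
    return "".join(protein)


# Toy record: the CDS occupies bases 11..22, with an internal TAG at 14..16.
toy = translate_with_exceptions("ATGTAGGCCTAA", 11, ["(pos:14..16,aa:OTHER)"])
print(toy)  # MXA
```

The nucleotides themselves are left untouched, which matches the observation above: only the reported translation changes at the flagged positions.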

Related

AWK (or similar) - change 2 lines below the matching pattern

I have a problem that I think it's easiest to solve with awk but I wrapped my head around it.
Inside a file I have repeating output like this:
....
Name="BgpIpv4RouteConfig_XXX">
<Ipv4NetworkBlock id="13726"
StartIpList="x.y.z.t"
PrefixLength="30"
NetworkCount="10000"
... other output
then this block will repeat.
a) I want to match on BgpIpv4Route.*, then skip 2 lines (the "n" keyword of awk), and when reaching PrefixLength:
- either replace it with a random value in (25,30)
or
- better, but I guess harder (I had no idea how to keep track of what was used and loop over /25../30): first occurrence /25, second one /26, and so on up to /30, then roll back to /25
b) then, on the next line, calculate NetworkCount from the new value of PrefixLength as 65536 / 2^(32-PrefixLength)
e.g. if PrefixLength on this occurrence was replaced with /25, then NetworkCount on the line following it = 65536 / 2^7 = 65536 / 128 = 512
I found some examples with inserting/changing a line after one that matched (or with a counter variable X lines below the match) but I got a bit confused with the value generation part and also with the changing of two lines where one is depending on the other.
Not sure I made any sense...my head is a bit overwhelmed with what I'm finding everywhere right now.
Thanks in advance!
This should do it:
$ awk 'BEGIN {q="\""; FS=OFS="="; n=split("25=26=27=28=29=30",ps)}
/BgpIpv4Route/ {c=c%n+1}
/PrefixLength/ {$2=q ps[c] q}
/NetworkCount/ {$2=q 65536/2^(32-ps[c]) q}1' file
You could perhaps reduce the computation by changing the expression to 2^(ps[c]-16), since 65536 = 2^16.
If there are free-standing PrefixLength and NetworkCount attributes elsewhere in the file, you may need to restrict these rules to each BgpIpv4Route context.
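The arithmetic behind the awk one-liner can be checked with a short Python sketch. The function name network_count is hypothetical; the cycle over 25..30 mirrors the c=c%n+1 counter, and the assertion confirms that 65536 / 2^(32-p) equals 2^(p-16) for every prefix length used.

```python
from itertools import cycle, islice


def network_count(prefix_length, total=65536):
    """Number of networks: total / 2**(32 - prefix_length)."""
    return total // 2 ** (32 - prefix_length)


# Cycle /25, /26, ..., /30 and wrap around, one value per matched block.
prefixes = cycle(range(25, 31))
for p in islice(prefixes, 8):
    # Same value as the shortcut 2**(p - 16), because 65536 == 2**16.
    assert network_count(p) == 2 ** (p - 16)
    print(p, network_count(p))
```

The first line printed is 25 512, which agrees with the worked example in the question.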

Lua Patterns - World of Warcraft Vanilla

I'm trying to get some data from the chat of the game but I can't figure out the pattern.
It's for an AddOn for a World of Warcraft Vanilla (private server).
gsub function:
http://wowprogramming.com/docs/api/gsub
http://wowwiki.wikia.com/wiki/API_gsub
I have been doing well with this explanation but now there's a part where I have something like this:
variable = gsub(string, "([%d+d]+)?...", "")
I don't know what the pattern should be since the string can be like one the following examples:
2d17h6m31s
1d8h31m40s
22h40m4s
8h6m57s
5m25s
37s
The "([%d+d]+)?" is actually several attempts of mine put together.
I did read about the magic characters ( ) . % + - * ? [ ^ $ but there are still some that I don't understand. A brief summary explanation would be great!
The important part of how the chat looks like:
Edit (ktb's comment):
Question: How can I take the full "99d23h59m59s" (^(.*s) didn't do the trick)?
In 99d23h59m59s, the 99 can be from 1 to 99 and always has a d right after it, but the <number>d part as a whole is optional. The same goes for <number>h (the number ranges from 1 to 24) and <number>m (the number ranges from 1 to 59). There's always an "ago" at the end.
Update:
/run for key in pairs(string)do ChatFrame1:AddMessage(key)end
With that command I got the names of all the string.functionName() functions. Here's the list:
string.sub()
string.gfind()
string.rep()
string.gsub()
string.char()
string.dump()
string.find()
string.upper()
string.len()
string.format()
string.byte()
string.lower()
Information update:
Unlike several other scripting languages, Lua does not use POSIX regular expressions (regexp) for pattern matching. The main reason for this is size: A typical implementation of POSIX regexp takes more than 4,000 lines of code. This is bigger than all Lua standard libraries together. In comparison, the implementation of pattern matching in Lua has less than 500 lines. Of course, the pattern matching in Lua cannot do all that a full POSIX implementation does. Nevertheless, pattern matching in Lua is a powerful tool and includes some features that are difficult to match with standard POSIX implementations.
Source.
Unlike some other systems, in Lua a modifier can only be applied to a character class; there is no way to group patterns under a modifier. For instance, there is no pattern that matches an optional word (unless the word has only one letter). Usually you can circumvent this limitation using some of the advanced techniques that we will see later.
Source.
I can't find the "advanced techniques" mentioned in the quote above. I only found this, which I'm not sure about yet.
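function get_time_stamp(str)
  -- Reverse the string so the mandatory seconds field comes first and anchors the match.
  local s,m,h,d = string.match(str:reverse(),"oga s(%d*)m?(%d*)h?(%d*)d?(%d*)")
  -- An empty capture is "" (truthy in Lua), so test for it explicitly before falling back to 0.
  local function fix(v) return (v and v ~= "") and v:reverse() or 0 end
  return fix(d), fix(h), fix(m), fix(s)
end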
local day,hour,minute,second = get_time_stamp("2d17h6m31s ago")
print (day,hour,minute,second) -- output: 2 17 6 31
day,hour,minute,second = get_time_stamp("5m25s ago")
print (day,hour,minute,second) -- output: 0 0 5 25
If you are wondering why I use reverse: we know for sure that seconds will always exist, but the other units may not. Without reverse, we can't tell which capture holds which unit in the output of string.match. For example, if you did local d,h,m,s = string.match("5m25s ago","(%d*)d?(%d*)h?(%d*)m?(%d+)s ago"), then print(d,h,m,s) would say that days was 5 and seconds was 25. In reverse, we know with certainty the order of the captures.
I ran into the same pattern limitations several years ago with a WoW addon. It took a bit of searching, but I dug up my parsing function.
parse_duration.lua
--
-- string:parseDuration() - parse a pseudo ISO-8601 duration of the form
-- [nd][nh][nm][ns], where 'n' is the numerical value of the time unit and
-- suffix designates time unit as follows: 'd' - days, 'h' - hours,
-- 'm' - minutes, and, 's' - seconds. Unspecified time units have a value
-- of 0.
--
function string:parseDuration()
  local ts = {d=0, h=0, m=0, s=0}
  for v in self:lower():gfind("%d+[dhms]") do
    ts[v:sub(-1)] = tonumber(v:sub(1,-2))
  end
  return ts
end
The following tests your sample data.
duration_utest.lua
require "parse_duration"
local function main()
  local testSet = {
    "2d17h6m31s ago something happened",
    "1d8h31m40s ago something happened",
    "22h40m4s ago something happened",
    "8h6m57s ago something happened",
    "5m25s ago something happened",
    "37s ago something happened",
    "10d6s alias test 1d2h3m4s should not be parsed"
  }
  for i,testStr in ipairs(testSet) do
    -- Extract timestamp portion
    local tsPart = testStr:match("%S+")
    local ts = tsPart:parseDuration()
    io.write( tsPart, " -> { ")
    for k,v in pairs(ts) do
      io.write(k,":",v," ")
    end
    io.write( "}\n" )
  end
end
main()
Results
2d17h6m31s -> { m:6 d:2 s:31 h:17 }
1d8h31m40s -> { m:31 d:1 s:40 h:8 }
22h40m4s -> { m:40 d:0 s:4 h:22 }
8h6m57s -> { m:6 d:0 s:57 h:8 }
5m25s -> { m:5 d:0 s:25 h:0 }
37s -> { m:0 d:0 s:37 h:0 }
10d6s -> { m:0 d:10 s:6 h:0 }
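For comparison, the same token-scanning idea (collect every number-plus-unit token and default missing units to 0) can be sketched outside Lua. The Python version below is only an illustration of the technique, not something usable inside the WoW client; the function name parse_duration is hypothetical.

```python
import re


def parse_duration(text):
    """Return a dict with d/h/m/s counts; units that are absent stay at 0."""
    ts = {"d": 0, "h": 0, "m": 0, "s": 0}
    # Each match is a (number, unit) pair such as ("17", "h").
    for number, unit in re.findall(r"(\d+)([dhms])", text.lower()):
        ts[unit] = int(number)
    return ts


for sample in ["2d17h6m31s", "22h40m4s", "37s"]:
    print(sample, "->", parse_duration(sample))
```

Because each token carries its own unit letter, no optional groups are needed, which is what sidesteps the Lua pattern limitation quoted in the question.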

AQL different results from stream UDF depending on output style (table, json)

I'm trying to create an aggregation (map | reduce) with a UDF, but something is wrong at the very beginning. In Aerospike I have a set with bin 'u' (secondary index) and bin 'v', which is a list of objects (auctions with transaction lists and other auction data), and I have a stream UDF to aggregate the internal structure of 'v':
function trans_sum_by_years(s)
  local function transform(rec)
    local l = map()
    local x = map()
    local trans, auctions = 0, 0
    for i in list.iterator(rec['v'] or list()) do
      auctions = auctions + 1
      for t in list.iterator(i['t'] or list()) do
        trans = trans + 1
        date = os.date("*t", t['ts'])
        if l[date['year']] ~= nil then
          l[date['year']] = l[date['year']] + t['price'] * t['qty']
        else
          l[date['year']] = t['price'] * t['qty']
        end
      end
    end
    x.auctions = auctions
    x.trans = trans
    x.v = l
    return x
  end
  return s : map(transform)
end
The problem is that the output is very different depending on whether the output mode is set to table or json. In the first case everything seems OK:
{"trans":594, "auctions":15, "v":{2010:1131030}}
{"trans":468, "auctions":68, "v":{2011:1472976, 2012:5188}}
......
In the second, I get an empty object from the internal record aggregation:
{
"trans_sum_b...": {
"trans": 389,
"auctions": 89,
"v": {}
}
},
{
"trans_sum_b...": {
"trans": 542,
"auctions": 30,
"v": {}
}
}
.....
I prefer json output and wasted a couple of hours trying to find out why I get an empty 'v' field, without success. So my question is "what the hell is going on" ;-) If my code is correct, what is wrong with the json output such that I don't see the results? If my code is wrong, why is it wrong, and why does the table output show what I need?
@user1875438 Your code is correct. It seems that there is a bug in aql.
My result is the same as yours: the v field is empty when using json mode.
I used tcpdump to capture the responses of aerospike-server when running these two commands and found that the responses are the same, so I think it's very likely that there is a bug in the aql tool.
0x0050: 0001 0000 0027 0113 0007 5355 4343 4553 .....'....SUCCES
0x0060: 5383 a603 7472 616e 7301 a903 6175 6374 S...trans...auct
0x0070: 696f 6e73 01a2 0376 81cd 07ce 01 ions...v.....
01:57:38.255065 IP localhost.hbci > localhost.57731: Flags [P.], seq 98:128, ack 144, win 42853, options [nop,nop,TS val 976630236 ecr 976630223], length 30
0x0000: 4500 0052 55f8 4000 4006 0000 7f00 0001 E..RU.@.@.......
I just posted an issue here.
The answer is simple as hell. But I'm new to Aerospike/Lua and I didn't trust my knowledge, so I searched for the error everywhere but within the AQL/UDF area. The problem is more fundamental and has to do with the specification of JSON itself.
Keys in JSON have to be strings! So tostring(date['year']) solves the problem.
Another question is whether this is a bug or a feature :-) If Aerospike's map type allows integer keys, should there be automatic key conversion from integer to string to satisfy the JSON specification or not? IMHO there should be, but some people would probably disagree, claiming that the map type is not meant for integer keys...
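The JSON constraint itself is easy to see with any JSON encoder. The Python sketch below is only an analogy: Python's json module happens to coerce integer keys to strings, whereas the aql output above apparently drops them, and the sample values are copied from the table output in the question.

```python
import json

by_year = {2010: 1131030, 2011: 1472976}

# Python's encoder silently turns the integer keys into strings.
encoded = json.dumps(by_year, sort_keys=True)
print(encoded)  # {"2010": 1131030, "2011": 1472976}

# A round trip therefore does not give the integer keys back.
decoded = json.loads(encoded)
print(list(decoded))  # ['2010', '2011']

# Converting the keys up front, as tostring(date['year']) does in the UDF,
# produces the same document without relying on encoder-specific behaviour.
explicit = {str(year): total for year, total in by_year.items()}
assert json.dumps(explicit, sort_keys=True) == encoded
```

Converting keys explicitly keeps the result independent of how a particular tool treats non-string keys.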

H5PY Writes Very Slow

I have an h5py dataset like the one below. I want to index the records by string instead of by numeric value, so that, e.g., I would be able to get the value of the first record with dset[dset.attrs['id1']].
I am trying to write the attributes with the code below, but it is extremely slow. If I do a %timeit dset.attrs[rid] = idx in the loop a single write is about 310ms. The strings I am writing are 36 characters. I have about 100k records I need to write, which would take about 9 hours. Something must be terribly wrong? Also the CPU is pegged.
import h5py

ids = ['id1', 'id2', 'id3']
h5 = h5py.File("/tmp/ds.h5", "w")
dset = h5.create_dataset("lds", (100000, ), dtype='float32')
for idx, rid in enumerate(ids): # loop takes forever
    dset.attrs[rid] = idx # takes about ~310ms
EDIT
Minimal "working" example.
for idx, rid in enumerate(range(10)):
    %timeit dset.attrs[str(rid)] = idx
10 loops, best of 3: 470 ms per loop
10 loops, best of 3: 470 ms per loop
...
Nearly 0.5 second for a single write.
Use the value 'latest' for the libver parameter when opening the file. This is a lot faster. For example:
h5py.File('ds.h5', 'w', libver='latest')
See here: https://github.com/h5py/h5py/issues/705

Output of Erlang bit packing

I am not able to understand bit packing in erlang.
Suppose:
R=4, G=6 and B=8
then why is the output like this:
<< R:5,G:5,B:6 >>
output: <<33,136>>.
I don't get it. Can anyone please explain?
<< R:5,G:5,B:6 >>
This expression allocates 5, 5 and 6 bits to R, G and B respectively, 16 bits in total, so the result is a 2-byte binary. To see why the bytes are 33 and 136, work backwards: convert the numbers 33 and 136 to binary form:
integer_to_list(33,2).
integer_to_list(136,2).
"100001"
"10001000"
We get the strings above. Since each byte of a binary is 8 bits wide, pad the representation of 33 with zeros on the left:
L2=lists:append("00",lists:append(integer_to_list(33,2),integer_to_list(136,2))).
"0010000110001000"
Now decode. The third argument of lists:sublist/3 is the number of bits to take:
V1 = list_to_integer(lists:sublist(L2,5),2).
V2 = list_to_integer(lists:sublist(L2,6,5),2).
V3 = list_to_integer(lists:sublist(L2,11,6),2).
4
6
8
Sorry for my English, I hope I explained it clearly.
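The same packing can be reproduced with shifts and masks. The Python sketch below is an illustration rather than Erlang; the function names pack_rgb and unpack_rgb are hypothetical, and the masks assume that a value too wide for its field keeps only its low-order bits.

```python
def pack_rgb(r, g, b):
    """Pack r (5 bits), g (5 bits) and b (6 bits) into two big-endian bytes."""
    value = ((r & 0x1F) << 11) | ((g & 0x1F) << 6) | (b & 0x3F)
    return value.to_bytes(2, "big")


def unpack_rgb(data):
    """Reverse pack_rgb: split 16 bits back into 5-, 5- and 6-bit fields."""
    value = int.from_bytes(data, "big")
    return (value >> 11) & 0x1F, (value >> 6) & 0x1F, value & 0x3F


packed = pack_rgb(4, 6, 8)
print(list(packed))                                    # [33, 136]
print(format(int.from_bytes(packed, "big"), "016b"))   # 0010000110001000
print(unpack_rgb(packed))                              # (4, 6, 8)
```

Reading the 16-bit string as 00100 | 00110 | 001000 gives 4, 6 and 8, while cutting it at the byte boundary gives 00100001 (33) and 10001000 (136).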
