AQL different results from stream UDF depending on output style (table, json) - lua

I'm trying to create an aggregation (map | reduce) with a UDF, but something is going wrong at the very beginning. In Aerospike I have a set with a bin 'u' (secondary index) and a bin 'v', which is a list of objects (auctions with transaction lists and other auction data), and I have a stream UDF to aggregate the internal structure of 'v':
function trans_sum_by_years(s)
    local function transform(rec)
        local l = map()
        local x = map()
        local trans, auctions = 0, 0
        for i in list.iterator(rec['v'] or list()) do
            auctions = auctions + 1
            for t in list.iterator(i['t'] or list()) do
                trans = trans + 1
                date = os.date("*t", t['ts'])
                if l[date['year']] ~= nil then
                    l[date['year']] = l[date['year']] + t['price'] * t['qty']
                else
                    l[date['year']] = t['price'] * t['qty']
                end
            end
        end
        x.auctions = auctions
        x.trans = trans
        x.v = l
        return x
    end
    return s : map(transform)
end
The problem is that the output is very different depending on whether the output mode is set to table or json. In the first case everything seems OK:
{"trans":594, "auctions":15, "v":{2010:1131030}}
{"trans":468, "auctions":68, "v":{2011:1472976, 2012:5188}}
......
In the second case I get an empty object for the inner map:
{
  "trans_sum_b...": {
    "trans": 389,
    "auctions": 89,
    "v": {}
  }
},
{
  "trans_sum_b...": {
    "trans": 542,
    "auctions": 30,
    "v": {}
  }
}
.....
I prefer json output and have wasted a couple of hours trying to find out why I get an empty 'v' field, without success. So my question is "what the hell is going on" ;-) If my code is correct, what is wrong with the json output that keeps me from seeing the results? If my code is wrong, why is it wrong, and why does the table output give me exactly what I need?

#user1875438 Your code is correct. It seems there is a bug in aql.
My result is the same as yours: the 'v' field is empty in json mode.
I used tcpdump to capture the aerospike-server responses for both commands and found that the responses are identical, so I think it's very likely the bug is in the aql tool.
159 0x0050: 0001 0000 0027 0113 0007 5355 4343 4553 .....'....SUCCES
160 0x0060: 5383 a603 7472 616e 7301 a903 6175 6374 S...trans...auct
161 0x0070: 696f 6e73 01a2 0376 81cd 07ce 01 ions...v.....
162 01:57:38.255065 IP localhost.hbci > localhost.57731: Flags [P.], seq 98:128, ack 144, win 42853, options [nop,nop,TS val 976630236 ecr 976630223], length 30
163 0x0000: 4500 0052 55f8 4000 4006 0000 7f00 0001 E..RU.#.#.......
I just posted an issue here.

The answer is simple as hell. But I'm new to Aerospike/Lua and I didn't trust my own knowledge, so I searched for the error everywhere except the AQL/UDF area. The problem is more fundamental and comes down to the JSON specification itself.
Keys in JSON have to be strings! So using tostring(date['year']) as the map key solves the problem.
The other question is whether this is a bug or a feature :-) If Aerospike's map type allows integer keys, should there be an automatic key conversion from integer to string to satisfy the JSON specification or not? IMHO there should be, but some people will probably disagree, claiming that the map type is not meant for integer keys...
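For reference, a minimal sketch of the corrected transform (the same logic as the UDF above, only with the year converted to a string before it is used as a map key):
local function transform(rec)
    local l = map()
    local x = map()
    local trans, auctions = 0, 0
    for i in list.iterator(rec['v'] or list()) do
        auctions = auctions + 1
        for t in list.iterator(i['t'] or list()) do
            trans = trans + 1
            local date = os.date("*t", t['ts'])
            local year = tostring(date['year'])   -- string key, so the json output can render it
            l[year] = (l[year] or 0) + t['price'] * t['qty']
        end
    end
    x.auctions = auctions
    x.trans = trans
    x.v = l
    return x
end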

Related

Lua length of Frame for Parsing

I have a binary file which shows gibberish if I open it in Notepad.
I am working on a plugin for Wireshark.
My problem is this: I am reading in a file and need to find 'V' '0' '0' '1' (0x56 0x30 0x30 0x31) in it, because that marks the start of a header, which means there is a packet inside. I need to do this for the whole file, like parsing. Each frame should also start with V 0 0 1, not end with it.
I currently have code where I search for 0x7E and parse from there. What I need is the length of the frame: when V 0 0 1 is found, the length from that V up to the position right before the next V 0 0 1 in the file, so that I can add it to a captured length and get positions that Wireshark can work with.
For example, my imperfect code for working with 0x7E:
local line = file:read()
local len = 0
for c in (line or ''):gmatch ('.') do
    len = len + 1
    if c:byte() == 0x7E then
        break
    end
end
if not line then
    return false
end
frame.captured_length = len
Here there is also the problem that the frame ends with 0x7E, which is wrong. I need something that works properly for 'V' '0' '0' '1'. Maybe I need to use string.find?
Please help me!
That's an example of how my file looks if I use the hex editor in Visual Studio Code.
Lua has some neat pattern tools. Here's a summary of the relevant ones:
(...) captures the text matched inside the parentheses and gives it to us.
-, +, *, ? mean "optionally match as little as possible", "mandatorily match as much as possible", "optionally match as much as possible", and "optionally match only once", respectively.
^ and $ anchor the pattern to the start or the end of the string, respectively.
We'll be using this universal input and output to test with:
local output = {}
local input = "V001Packet1V001Packet2oooV001aaandweredonehere"
The easiest way to do this is probably to recursively split the string into two parts: one ending at the character before "V", the other starting at the character after "1". We'll use a pattern which captures the parts before and after V001:
local this, next = string.match(input, "(.-)V001(.*)")
print(this,next) --> "", "Packet1V001Packet2..."
Simple enough. Now we need to do it again, and we also need to eliminate the first empty packet, because it's a quirk of the pattern. We can probably just say that any empty this string should not be added:
if this ~= "" then
table.insert(output, this)
end
Now, the last packet will return nil for both this and next, because there will not be another V001 at the end. We can prepare for that by simply adding the last part of the string when the pattern does not match.
All put together:
local function doStep(str)
    local this, next = string.match(str, "(.-)V001(.*)")
    print(this, next)
    if this then
        -- There are still more packets left
        if this ~= "" then
            -- Only add non-empty chunks (the match before the first V001 is empty)
            table.insert(output, this)
        end
        if next ~= "" then
            -- There is more out there!
            doStep(next)
        end
    else
        -- We are the last survivor.
        table.insert(output, str)
    end
end
Of course, this can be improved, but it should be a good starting point. To prove it works, this script:
doStep(input)
print(table.concat(output, "; "))
prints this:
Packet1; Packet2ooo; aaandweredonehere
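Since the question also mentions string.find: here is a minimal alternative sketch of my own (the helper frame_lengths is hypothetical and untested against real capture files) that uses plain string.find to report the offset and length of each frame delimited by "V001":
local function frame_lengths(data)
    local frames = {}
    local start = data:find("V001", 1, true)             -- plain find, no pattern matching
    while start do
        local nextStart = data:find("V001", start + 4, true)
        local len = (nextStart or #data + 1) - start      -- up to the next header, or to end of data
        table.insert(frames, { offset = start, length = len })
        start = nextStart
    end
    return frames
end

for _, f in ipairs(frame_lengths("V001Packet1V001Packet2oooV001aaandweredonehere")) do
    print(f.offset, f.length)   --> 1 11, then 12 14, then 26 21
end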

Biopython: Extract CDS from modified GenBank records?

I have some basic familiarity with python and have been extracting coding sequences from genbank records. However, I'm unsure how to handle records where the coding sequence has been modified (e.g. owing to correcting internal stop codons). An example of such a sequence is this genbank record (or accession: XM_021385495.1 if the link does not work).
In this example, I can translate the two coding sequences that I can access, but both have internal stop codons - and according to the notes also indels! This is the way I have accessed the CDS:
1 - gb_record.seq
2 - cds.location.extract(gb_record) for where feature == "CDS"
However, I need the sequence that has been corrected. As far as I can tell, I think I need to use the "transl_except" tags in the CDS feature but I am at a loss how to do this.
I wonder if anybody might be able to provide an example or some insight of how to do this?
Thanks
Jo
I've got some demo code written in python3 that should help explain this GenBank record.
import re
aa_convert_codon_di = {
'A':['[GRSK][CYSM].'],
'B':['[ARWM][ARWM][CTYWKSM]', '[GRSK][ARWM][TCYWKSM]'],
'C':['[TYWK][GRSK][TCYWKSM]'],
'D':['[GRSK][ARWM][TCYWKSM]'],
'E':['[GRSK][ARWM][AGRSKWM]'],
'F':['[TYWK][TYWK][CTYWKSM]'],
'G':['[GRSK][GRSK].'],
'H':['[CYSM][ARWM][TCYWKSM]'],
'I':['[ARWM][TYWK][^G]'],
'J':['[ARWM][TYWK][^G]', '[CYSM][TYWK].', '[TYWK][TYWK][AGRSKWM]'],
'K':['[ARWM][ARWM][AGRSKWM]'],
'L':['[CYSM][TYWK].', '[TYWK][TYWK][AGRSKWM]'],
'M':['[ARWM][TYWK][GRSK]'],
'N':['[ARWM][ARWM][CTYWKSM]'],
'O':['[TYWK][ARWM][GRSK]'],
'P':['[CYSM][CYSM].'],
'Q':['[CYSM][ARWM][AGRSKWM]'],
'R':['[CYSM][GRSK].', '[ARWM][GRSK][GARSKWM]'],
'S':['[TYWK][CYSM].', '[ARWM][GRSK][CTYWKSM]'],
'T':['[ARWM][CYSM].'],
'U':['[TYWK][GRSK][ARWM]'],
'V':['[GRSK][TYWK].'],
'W':['[TYWK][GRSK][GRSK]'],
'X':['...'],
'Y':['[TYWK][ARWM][CTYWKSM]'],
'Z':['[CYSM][ARWM][AGRSKWM]','[GRSK][ARWM][AGRSKWM]'],
'_':['[TYWK][ARWM][AGRSKWM]', '[TYWK][GRSK][ARWM]'],
'*':['[TYWK][ARWM][AGRSKWM]', '[TYWK][GRSK][ARWM]'],
'x':['[TYWK][ARWM][AGRSKWM]', '[TYWK][GRSK][ARWM]']}
dna_convert_aa_di = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*',
'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W'}
dna_str = "ATGACCGAGGTGCAAGACCTTGCACTTGGATTTGTTGAACCTCATGAGGTTCCCCTGGGCCCCTGGACATCGCCTTTTTCCAGCGTTCCACCAGAGACTTCACCCAACTGCTGTGACTTTTCAAACATCATTGAGAGCGGCTTGATACAGTTAGGCCACTCTCGCAGCTGTGAAGTTGTGAAGGCAAACTCCAGCGACCCATTCCTTCTTCCTTCAGAAAAGCAACTCGAGGAGCAGCGGGAGGAAACCCAGCTCTATCCTGCAGCGAGCGGGGCTGCGCAAGAGGCAGGTGCTGCTCTCACGGCCCGAAGGCAGCTCCGAGCTGCCGGGTGCGGTCACGTCAGCGGCCGAGCTGCCCGGCGGGGTGTGCATAAGAGCGAGCTATATGTGCTGCGTGTCATCACGGAGCCTTTCAAGTCCCTCCCTCCTTCTCCACTGCTGGGGCTGCAGTGGGCACCGGGCAGGAGGAGCGGCCGCAGCCCCGCGGGGGTGGGACGAGTCTCTGGGGGCTGCGCCACTTGGAAGATTTGCATTGGGTACATTGATAGCATTGTGATTGATGGCCTATTTAATACCATAATGTGTTCTTTAGATTTCTTTTTGGAGAACTCAGAAGAAAATTTGAAGCCAGCTCCACTTTTTCCAGCACAAATGACCCTTACTGGCACAGAAATTCATTTTAAACTTTCTCTAGATAAAGAGGCTGATGATGGCTTTTATGACCTTATGGATGAACTACTGGGTGATATTTTCCGAATGTCTGCCCAAGTGAAGAGACTAGAAGCCCACCTGGAATCAGAACATTAGGAGGACTATATGAACAGTGTGTTTGATCTGTCTGAACTCAGGCAGGAGAGTATGGAGAGAGTAATAAACGTCACCAACAAGGCCTTGAAGTACAGAAGATCTCATGATAGCTATGCTTATCTCTGACTAGAGGATCAGCTTGAGTTTATGAGGCAATTTCTTCCTTGTGCTCGTGGTTTAATGTCCACACAGATATCTCTTACTGGCATCCCACTACTAAACTGTGTAAAAAGCAGGCAAGAAAGAAACTAGTTTAAATAACTTCCTATTTATGAAAATCTCTGTGTTCAGATGAGTAAGTTTGAAGACCCAAGAATTTTTGAAAGCTGGTTTAAGGTGATTATGAAGCCTTTCAAAATGACACTTCTAAACATTACTAAGAAGTGGAGCTGGATGTTTAAGTAGTACACTATAGAAATAATAAGATTGAGTCTGAATGACTTCAAAGACTTTATAAAAGTGACAGATGCTGGACTTCAAAGAGGGAGGCATTATTGTGCACTGGCAGAAATCACCGGTCACCTCTTGGCTGTGAAAGAGAGGCAGACAGCTGCTGGTGAATCCTTTGAACCTTTAAAAGAANTTGTTGCATTGTTGGAAAGCTACAGACAGAAGATGCCAGATCAAGTTTGCATCCAGTGTCAAATCAGTTGTATCCTGGGAGCCTTTAAGGGTTATGTACTTCTGGTTGGAGTAGGTGGTAGTGATAAATGAAGCTTGTCAAGGCTGGCAGCATGCATCTCTTCCCTGGAGGTCTTTTAAATCATATGGAAGAAAGACCATGAGAGCAAGAACCTGAAGGTAGATGTTGCCAGTTTGTGCATCAAGACTGGTGCCAAGAACATGCCCACAGTGTTTTTGCTGACAGATGCCCAGGTTCCAGATGAACGCTTTCTTGTGCTGATTAATGACTTGTTGGCATCAAGAGATCTTCCTGATCTGTTCAGTGGTGAAGATGAGGAGGGCAAAGTTGCAGGAGTCAGAAAAGAAGTCNNCCTGGGCTTGATGGACACCACAGAAAGCTGCTGGAGGTGGTTCTTTGGTAGAGCGCAGCAGCTGTTAAAAGTGTATGGTGAAGTAGAGTCGAAATGTTGTGCACTGGTCCAGGCAAATACAAAATTAGCAACAGCTAAAGAGAATCTAGAAACAATCTTGAAAAAGCTTATTTCTGAAAATGTGCATTGGAGCCAATCTGTTGAAAACCTCAAAGCATAAAAGAAAACTGTACTCAAGGATGTTACATCAGCAGCAGCGTTTGCATCTTTCTTTGGAGCCTTCACAAAACCATATAGTCAAGAACAGATGGAACATTTCTGGATTCTTTCTCTAAAGTCACAGGAGTGTCCTGTTCCTGTGATAGAGGGGCCAGACTCTGCCATCCTGATGAATGATGCTCCAAGAGCAGCACAGAGTAACAAGAGTCTGCTTGCTGATAGGGTGTCAGCAGAAAATGCCACTGCTCTGACACACTGTGAGCAGGGCCCTCTGATGATAGATCCCCAGAAACAGGGAATTGAATGGACACAGAATAAATACAGAACTGACTTTAAAGTCATGCATCTAGGAGAGAATGGTTATGTGTGTACTATTGATACAGCTTTGGCTTGTGGAGAGATTATACTAATTGAAAACATGGCTGAATCTATCGATCTCTTACTTGATCCCCTAACTGGAAGACATACAGGTAAAAGGGGAAGGAATACTTGCGCAATCAGAATTTCTTGAAGACAAAAAAAAAAAAAGTGTGAATTCTACAGGAATTTCCATCTCATCCTTCACACTAAGCTGGCTAACCCTCCCTGCAAGCCAGAGCTTNAGGCTCAGACCACTCTCATTATTTTCACAGATACCAGGGGCAGGCTGGAAGAACAGCTGTTGGCTGAGGTGGTGAGTGCTGAAAGGCCTGACTTGGAAAACCATACGTCAGCACTGGCGAAACAGAAGAGTGTCTCTGAAATCAAGCCCAAGCAGCTTGAGGACAACATGCTGCTCAGTCTGTCAGCTGCCCAGAGCACTTTTGTAGGTGACAGTGAACTTGAAGAGAAATTCAAGTCAACTGCAGGAGAAATGATTGTCCGCCCACATGTTCACAGCTTCTTATTTTGGCAAAAAGCTTCCACTGTAGACTCTGGAAGATTTCATATCTCTTTAGGACAAGGGCAGGAGATGGTTGTGGAGNGACAACTTGAGAAGGCTGCCAAGCCTGGCCACTGGCTTCTTCTCCAAAATATTAATGTGGTAGCCAAGTGGCTAGGAACCTTGGAAAAACTCCTCGAGCAATAGAGTGAAGAAAGTCACTGGTATTTCCGTGTCTTCACTAGTGCTGAACCAGCTCCAGCCCCAGAAGAGCACATCATTCTTCAAGGAGTACTTGAAAACTGAATTAAAATTACCAGACTATCAATAACACTGCCAGTTGTTAAGTGGATAAATGTATTCCTTTTTTTCCTTTGGCAGGATACCCTTGAACTGTGTGGCAAAGAACAGGAATTTAAGAGCATTCTTTTCTCCCTTCGTTATTTTCACACCCGTGTTGCCAGCAGACTCATTTGGCCTTCCAGGCTGCAATTAAGATACCCATACAATACTAGAGATCTCACTGTTTGCATCAGTGTGCCCTGCAACTATTTAGACACTTACACAGAGGTCAGACGCAGTGGTCAGAAAAACAAGTCTATAAAATCAGCTGATTCCAACCCTTAG"
aa_str = "MTEVQDLALGFVEPHEVPLGPWTSPFSSVPPETSPNCCDFSNIIESGLIQLGHSRSCEVVKANSSDPFLLPSEKQLEEQREETQLYPAASGAAQEAGAALTARRQLRAAGCGHVSGRAARRGVHKSELYVLRVITEPFKSLPPSPLLGLQWAPGRRSGRSPAGVGRVSGGCATWKICIGYIDSIVIDGLFNTIMCSLDFFLENSEENLKPAPLFPAQMTLTGTEIHFKLSLDKEADDGFYDLMDELLGDIFRMSAQVKRLEAHLESEHXEDYMNSVFDLSELRQESMERVINVTNKALKYRRSHDSYAYLXLEDQLEFMRQFLPCARGLMSTQISLTGIPLLNCVKSRQERNXFKXLPIYENLCVQMSKFEDPRIFESWFKVIMKPFKMTLLNITKKWSWMFKXYTIEIIRLSLNDFKDFIKVTDAGLQRGRHYCALAEITGHLLAVKERQTAAGESFEPLKEXVALLESYRQKMPDQVCIQCQISCILGAFKGYVLLVGVGGSDKXSLSRLAACISSLEVFXIIWKKDHESKNLKVDVASLCIKTGAKNMPTVFLLTDAQVPDERFLVLINDLLASRDLPDLFSGEDEEGKVAGVRKEVXLGLMDTTESCWRWFFGRAQQLLKVYGEVESKCCALVQANTKLATAKENLETILKKLISENVHWSQSVENLKAXKKTVLKDVTSAAAFASFFGAFTKPYSQEQMEHFWILSLKSQECPVPVIEGPDSAILMNDAPRAAQSNKSLLADRVSAENATALTHCEQGPLMIDPQKQGIEWTQNKYRTDFKVMHLGENGYVCTIDTALACGEIILIENMAESIDLLLDPLTGRHTGKRGRNTCAIRISXRQKKKKCEFYRNFHLILHTKLANPPCKPELXAQTTLIIFTDTRGRLEEQLLAEVVSAERPDLENHTSALAKQKSVSEIKPKQLEDNMLLSLSAAQSTFVGDSELEEKFKSTAGEMIVRPHVHSFLFWQKASTVDSGRFHISLGQGQEMVVEXQLEKAAKPGHWLLLQNINVVAKWLGTLEKLLEQXSEESHWYFRVFTSAEPAPAPEEHIILQGVLENXIKITRLSITLPVVKWINVFLFFLWQDTLELCGKEQEFKSILFSLRYFHTRVASRLIWPSRLQLRYPYNTRDLTVCISVPCNYLDTYTEVRRSGQKNKSIKSADSN"
mod_dna_str = ""
mod_aa_str = aa_str[:]
start = 0
for index in range(start, len(dna_str), 3):
    codon = dna_str[index:index+3]
    if len(mod_aa_str) == 0:
        break
    if codon in dna_convert_aa_di and dna_convert_aa_di[codon] == mod_aa_str[0]:
        mod_aa_str = mod_aa_str[1:]
    else:
        codon_match = "|".join(aa_convert_codon_di[mod_aa_str[0]])
        if len(re.findall(codon_match, codon)) > 0:
            print(index, codon_match, codon)
            mod_aa_str = mod_aa_str[1:]
Code output:
804 ... TAG
930 ... TGA
1056 ... TAG
1065 ... TAA
1209 ... TAG
1389 ... NTT
1518 ... TGA
1566 ... TAA
1800 ... NNC
2019 ... TAA
2529 ... TGA
2622 ... NAG
2985 ... NGA
3087 ... TAG
3186 ... TGA
From the note section of the CDS, we have: "inserted 5 bases in 4 codons; deleted 2 bases in 2 codons; substituted 11 bases at 11 genomic stop codons".
How does this relate to our output? The reading frame never changes, suggesting that the 2 deleted bases are already absent from the given nucleotide sequence. Five unknown nucleotides (N) occur in 4 codons (each giving an unknown amino acid, X). So the authors of the sequence have already accounted for the indels. Eleven premature stop codons are present, and they are simply translated as unknown amino acids. The "transl_except" tags match the locations of these premature stop codons; the nucleotides at those sites have not been altered. The authors provide XP_021241170 as a possible corrected translation product, but it's still very bad.

Lua Patterns - World of Warcraft Vanilla

I'm trying to get some data from the chat of the game but I can't figure out the pattern.
It's for an AddOn for World of Warcraft Vanilla (on a private server).
gsub function:
http://wowprogramming.com/docs/api/gsub
http://wowwiki.wikia.com/wiki/API_gsub
I have been doing well with this explanation but now there's a part where I have something like this:
variable = gsub(string, "([%d+d]+)?...", "")
I don't know what the pattern should be since the string can be like one the following examples:
2d17h6m31s
1d8h31m40s
22h40m4s
8h6m57s
5m25s
37s
The "([%d+d]+)?" is actually multiple attempts of mine put in together.
I did read about the magic characters ( ) . % + - * ? [ ^ $ but there's still some that I don't understand. If I could get a simple resume explanation it would be great!
The important part of what the chat looks like:
Edit (ktb's comment):
Question: How can I capture the full "99d23h59m59s" (^(.*s) didn't do the trick)?
In 99d23h59m59s, the 99 can be anything from 1 to 99 and always has a d right after it, but the whole <number>d part is optional. The same goes for <number>h (range 1 to 24) and <number>m (range 1 to 59). There's always an "ago" at the end.
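(A minimal sketch of one way to grab just the whole duration token, assuming the text always has " ago" right after it; the parsing answers below go further and split out the individual units:)
local msg = "99d23h59m59s ago"
local duration = string.match(msg, "(%S+) ago")   -- works even if more text follows "ago"
print(duration)   --> 99d23h59m59s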
Update:
/run for key in pairs(string)do ChatFrame1:AddMessage(key)end
With that command I got the names of all the string.functionName() functions; here's the list:
string.sub()
string.gfind()
string.rep()
string.gsub()
string.char()
string.dump()
string.find()
string.upper()
string.len()
string.format()
string.byte()
string.lower()
Information update:
Unlike several other scripting languages, Lua does not use POSIX regular expressions (regexp) for pattern matching. The main reason for this is size: A typical implementation of POSIX regexp takes more than 4,000 lines of code. This is bigger than all Lua standard libraries together. In comparison, the implementation of pattern matching in Lua has less than 500 lines. Of course, the pattern matching in Lua cannot do all that a full POSIX implementation does. Nevertheless, pattern matching in Lua is a powerful tool and includes some features that are difficult to match with standard POSIX implementations.
Source.
Unlike some other systems, in Lua a modifier can only be applied to a character class; there is no way to group patterns under a modifier. For instance, there is no pattern that matches an optional word (unless the word has only one letter). Usually you can circumvent this limitation using some of the advanced techniques that we will see later.
Source.
I can't find the "advanced techniques" mentioned in the quote above. I only found this, which I'm not sure about yet.
function get_time_stamp(str)
    local s,m,h,d = string.match(str:reverse(),"oga s(%d*)m?(%d*)h?(%d*)d?(%d*)")
    return d and d:reverse() or 0, h and h:reverse() or 0, m and m:reverse() or 0, s and s:reverse() or 0
end
local day,hour,minute,second = get_time_stamp("2d17h6m31s ago")
print (day,hour,minute,second) -- output: 2 17 6 31
day,hour,minute,second = get_time_stamp("5m25s ago")
print (day,hour,minute,second) -- output: 0 0 5 25
If you are wondering why I use reverse, it's because we know for sure that seconds will always exist but the other units won't. If we don't reverse the string, we can't tell which units the numbers returned by string.match belong to. For example, if you did local d,h,m,s = string.match("5m25s ago","(%d*)d?(%d*)h?(%d*)m?(%d+)s ago") then print(d,h,m,s) would say that days was 5 and seconds were 25. With the reversed string we know with absolute certainty the order of the output.
I ran into the same pattern limitations several years ago with a WoW addon. It took a bit of searching, but I dug up my parsing function.
parse_duration.lua
--
-- string:parseDuration() - parse a pseudo ISO-8601 duration of the form
-- [nd][nh][nm][ns], where 'n' is the numerical value of the time unit and
-- suffix designates time unit as follows: 'd' - days, 'h' - hours,
-- 'm' - minutes, and, 's' - seconds. Unspecified time units have a value
-- of 0.
--
function string:parseDuration()
    local ts = {d=0, h=0, m=0, s=0}
    for v in self:lower():gfind("%d+[dhms]") do
        ts[v:sub(-1)] = tonumber(v:sub(1,-2))
    end
    return ts
end
The following tests your sample data.
duration_utest.lua
require "parse_duration"
local function main()
local testSet = {
"2d17h6m31s ago something happened",
"1d8h31m40s ago something happened",
"22h40m4s ago something happened",
"8h6m57s ago something happened",
"5m25s ago something happened",
"37s ago something happened",
"10d6s alias test 1d2h3m4s should not be parsed"
}
for i,testStr in ipairs(testSet) do
-- Extract timestamp portion
local tsPart = testStr:match("%S+")
local ts = tsPart:parseDuration()
io.write( tsPart, " -> { ")
for k,v in pairs(ts) do
io.write(k,":",v," ")
end
io.write( "}\n" )
end
end
main()
Results
2d17h6m31s -> { m:6 d:2 s:31 h:17 }
1d8h31m40s -> { m:31 d:1 s:40 h:8 }
22h40m4s -> { m:40 d:0 s:4 h:22 }
8h6m57s -> { m:6 d:0 s:57 h:8 }
5m25s -> { m:5 d:0 s:25 h:0 }
37s -> { m:0 d:0 s:37 h:0 }
10d6s -> { m:0 d:10 s:6 h:0 }

Lua: Hexadecimal Word to Binary Conversion

I'm attempting to create a Lua program to monitor periodic status pings of a slave device. The slave device sends its status as 16-bit hexadecimal words, which I need to convert to binary, since each bit pertains to a property of the device. I can receive the input string, and I have a table containing 16 keys, one for each parameter. But I am having a difficult time understanding how to convert the hexadecimal word into a string of 16 bits so I can monitor it.
Here is a basic sketch of the function I'm starting to work on.
function slave_Status(IP,Port,Name)
    status = path:read(IP,Port)
    sTable = {}
    if status then
        sTable.ready=bit32.rshift(status:byte(1), 0)
        sTable.paused=bit32.rshift(status:byte(1), 1)
        sTable.emergency=bit32.rshift(status:byte(1), 2)
        sTable.started=bit32.rshift(status:byte(1), 3)
        sTable.busy=bit32.rshift(status:byte(1), 4)
        sTable.reserved1=bit32.rshift(status:byte(1), 5)
        sTable.reserved2=bit32.rshift(status:byte(1), 6)
        sTable.reserved3=bit32.rshift(status:byte(1), 7)
        sTable.reserved4=bit32.rshift(status:byte(2), 0)
        sTable.delay1=bit32.rshift(status:byte(2), 1)
        sTable.delay2=bit32.rshift(status:byte(2), 2)
        sTable.armoff=bit32.rshift(status:byte(2), 3)
        sTable.shieldoff=bit32.rshift(status:byte(2), 4)
        sTable.diskerror=bit32.rshift(status:byte(2), 5)
        sTable.conoff=bit32.rshift(status:byte(2), 6)
        sTable.envoff=bit32.rshift(status:byte(2), 7)
    end
end
I hope this approach is understandable. I'd like to receive the hex string, for example 0x18C2, turn it into 0001 1000 1100 0010, and place each bit into the proper key. Then later in the function I would monitor whether a given bit has changed for the better or worse.
If I run a similar function in Terminator on Linux and print out the pairs, I get the following output:
49
24
12
6
3
1
0
0
56
28
14
7
3
1
0
0
This is where I don't understand how to take each value and turn it into bits.
I'm pretty new to this, so I don't doubt there is an easier way to do it. If I need to explain further, I will try.
tonumber(s, 16) will convert a hex representation to a number, and string.char will return the symbol/byte representation of a number. Check this recent SO answer for an example of how they can be used; the solution in that answer may work for you.
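For illustration, a quick sketch using the 0x18C2 value from the question:
print(tonumber("18C2", 16))     --> 6338
print(tonumber("0x18C2"))       --> 6338 (the 0x prefix is recognized without an explicit base)
print(string.char(0x41, 0x42))  --> AB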
I'd approach this in a different fashion than the one suggested by Paul.
First, create a table storing the properties of devices:
local tProperty = {
    "ready",
    "paused",
    "emergency",
    "started",
    "busy",
    "reserved1",
    "reserved2",
    "reserved3",
    "reserved4",
    "delay1",
    "delay2",
    "armoff",
    "shieldoff",
    "diskerror",
    "conoff",
    "envoff",
}
Then, since your device sends the data as 0xYYYY, you can call tonumber on it directly: the 0x prefix makes Lua interpret the string as hexadecimal (if the value already arrives as a number, skip that step). Use a function to store each bit in a table:
function BitConvert( sInput )
    local tReturn, iNum = {}, tonumber( sInput ) -- optionally pass 16 as second argument to tonumber
    while iNum > 0 do
        table.insert( tReturn, 1, iNum % 2 )
        iNum = math.floor( iNum / 2 )
    end
    for i = #tProperty - #tReturn, 1, -1 do
        table.insert( tReturn, 1, 0 )
    end
    return tReturn
end
And then, map both the tables together:
function Map( tKeys, tValues )
    local tReturn = {}
    for i = 1, #tKeys do
        tReturn[ tKeys[i] ] = tValues[i]
    end
    return tReturn
end
In the end, you would have:
function slave_Status( IP, Port, Name )
    local status = path:read( IP, Port )
    local sTable = Map( tProperty, BitConvert(status) )
end

Attoparsec allocates a ton of memory on large 'take' call

So I am writing a packet sniffing app. Basically I want it to sniff for TCP sessions, then parse them to see if they are HTTP, and, if they are and have the right content type, etc., save them as files on my hard drive.
To that end, I wanted it to be efficient. Since the current HTTP library is string based, I will be dealing with large files, and I only really need to parse HTTP responses, I decided to roll my own parser in attoparsec.
When I finished my program, I found that while parsing a 9 MB HTTP response containing a wav file, profiling showed it allocating a gigabyte of memory while trying to parse out the body of the response. Looking at HTTP.prof I see some lines:
httpBody Main 362 1 0.0 0.0 93.8 99.3
take Data.Attoparsec.Internal 366 1201 0.0 0.0 93.8 99.3
takeWith Data.Attoparsec.Internal 367 3603 0.0 0.0 93.8 99.3
demandInput Data.Attoparsec.Internal 375 293 0.0 0.0 93.8 99.2
prompt Data.Attoparsec.Internal 378 293 0.0 0.0 93.8 99.2
+++ Data.Attoparsec.Internal 380 586 93.8 99.2 93.8 99.2
So as you can see, somewhere within httpBody, take is called 1201 times, causing 500+ (+++) bytestring concatenations, which results in an absurd amount of memory allocation.
Here's the code. n is just the content length of the HTTP response, if there is one; if there isn't, it just tries to take everything.
I wanted it to return a lazy bytestring made of chunks of 1000 or so bytes, but even if I change it to just take n and return a strict bytestring, it still has those allocations in it (and it uses 14 gigs of memory).
httpBody n = do
    x <- if n > 0
             then AC.take n
             else AC.takeWhile (\_ -> True)
    if B.length x == 0
        then return Nothing
        else return (Just x)
I was reading a blog by the guy who did combinatorrent and he was having the same issue, but I never heard of a resolution. Has anyone ever run across this problem before or found a solution?
Edit: Okay, well I left this up the entire day and got nothing. After researching the problem, I don't think there is a way to do it without adding a lazy bytestring accessor to attoparsec. I also looked at all the other libraries, and they either lacked bytestring support or had other problems.
So I found a workaround. If you think about an HTTP response, it goes: headers, newline, newline, body. Since the body is last, and parsing returns a tuple with both what you parsed and what remains of the bytestring, I can skip parsing the body inside attoparsec and instead pluck the body straight off the bytestring that is left over.
parseHTTPs bs = if P.length results == 0
                    then Nothing
                    else Just results
  where results = foldParse (bs, [])

foldParse (bs, rs) = case ACL.parse httpResponse bs of
    ACL.Done rest r -> addBody (rest, rs) r
    otherwise       -> rs

addBody (rest, rs) http = foldParse (rest', rs')
  where
    contentlength = ((read . BU.toString) (maybe "0" id (hdrContentLength (rspHeaders http))))
    rest' = BL.drop contentlength rest
    rs' = rs ++ [http { rspBody = body' }]
    body'
        | contentlength == 0  = Just rest
        | BL.length rest == 0 = Nothing
        | otherwise           = Just (BL.take contentlength rest)

httpResponse = do
    (code, desc) <- statusLine
    hdrs <- many header
    endOfLine
    -- body <- httpBody ((read . BU.toString) (maybe "0" id (hdrContentLength parsedHeaders)))
    return Response { rspCode = code, rspReason = desc, rspHeaders = parseHeaders hdrs, rspBody = undefined }
Edit: I actually finished up this project. It works like a charm. It isn't cabalized properly, but if someone wants to view the entire source, you can find it at https://github.com/onmach/Audio-Sniffer.
combinatorrent guy here :)
If memory serves, the problem with attoparsec is that it demands input a little bit at a time, building up a lazy bytestring which is finally concatenated. My "solution" was to roll the input function myself. That is, I get the input stream for attoparsec from a network socket, and I know how many bytes to expect in a message. Basically, I split it into two cases:
The message is small: read up to 4k from the socket and eat that bytestring a little bit at a time (slices of bytestrings are fast, and we throw away the 4k after it has been exhausted).
The message is "large" (large here means around 16 kilobytes in bittorrent speak): we calculate how much the 4k chunk we have can fulfill, and then we simply ask the underlying network socket to fill in the rest. We now have two bytestrings, the remaining part of the 4k chunk and the large chunk. Together they hold all the data, so we concatenate them and parse them.
You may be able to optimize the concatenation step away.
The TL;DR version: I handle it outside attoparsec and handroll the loop to avoid the problem.
The relevant combinatorrent commit is fc131fe24, see
https://github.com/jlouis/combinatorrent/commit/fc131fe24207909dd980c674aae6aaba27b966d4
for the details.
