how to find a particular redis key memory size in lua script - lua

redis.call('select','14')
local allKeys = redis.call('keys','orgId#1:logs:email:uid#*')
for i = 1 , #allKeys ,1
do
local object11 = redis.call('DEBUG OBJECT',allKeys[i])
print("kk",object11[1])
end
Here "DEBUG OBJECT" is run successfully on redis-cli, but if we want to run through lua script on multiple key. That send error like this.
(error) ERR Error running script (call to f_b003d960240545d9540ebc2319d863221045
3815): Wrong number of args calling Redis command From Lua script

DEBUG OBJECT is not a good bet. It shows the serialized length of the value, so it is just the size of the object once stored on an RDB file.
To have some hint about the size of an object in Redis, you need to resort to more complex techniques, but you can only get an approximation. You need to run:
TYPE
OBJECT ENCODING
The object-type specific command to get its length.
Sample a few elements to understand the average string length of the object.
Based on this four informations, you need to check the Redis source code to check the different memory footprints of the internal structures used, and do the math. Not easy...
A much more viable approximation is to just to use:
APPROX_USED_MEM = num_elements * avg_size * overhead_factor
You may want to pick an overhead factor which makes sense for a variety of data types. The error is big, but is an approximation good enough for some use cases. Maybe overhead_factor may be something like 2.
TLDR: What you are trying to do is complex and leads to errors. In the future the idea is to provide a MEMORY command which is able to do this.

Related

Abaqus: efficiently export large xy data set from Abaqus

I am trying to export XY data objects from sets of the size of 20-40k elements, but Abaqus is slowing down considerably, and even crashing. In fact, when I create the xy data, Abaqus gives me a warning saying that "the number of xyDataObjects is very large, and might cause performance issues". And so it does.
My usual procedure to is to create the xy data and then export in rpt format. Can someone suggest another method less prone to crashing? Would it be more efficient to divide the output element set into two or more subsets, and concatenate them after exporting?
The method recommended by #hgazibara in the comments is certainly sufficient, but it is laborious.
An easier method, I found, is a package called Abaqus2Matlab, which scrapes any variable you want from the odb. See here: http://www.abaqus2matlab.com/

Lua script for statistics from Diameter 3GPP

I'm trying to create a lua script to go through a Diameter pcap, gather information interesting for me and generate a statistic.
This is partially successful, working script can be found in GitHub but I'm still having some doubts
Field.new() and multiple occurrences of an AVP
I'm using Field.new() to retrieve AVPs, for example:
local rrField = Field.new("diameter.3GPP-Reporting-Reason")
local toField = Field.new("diameter.CC-Total-Octets")
But in a single packet there might be multiple occurrences of an AVP. Of course, I can access them as an array from
local rrFields = {rrField()}
local toFields = {toField()}
But I'm missing a reference where from the AVP was retrieved. A a good example is Result-Code AVP:
It this single Diameter message it occurs three times, but in result I'm getting just an array of three 2001's without a good understanding on which level this appeared.
Situation is becoming even more messy when a single package contains multiple Diameter messages. Then I even cannot figure from which message the AVP is.
Function tap.packet(pinfo, tvb, tapdata) does not populate tapdata
Another idea was to dig into tapdata. If I understood correctly 11.4.1.5. listener.packet, the tapdata (aka tapinfo) shall be populated with dissected data, right? Hence I should be able to parse the message.
However, regardless how hard I try, tapdata always is unset (i.e. nil). In GitHub code
tap = Listener.new("diameter", filter)
but I also experimented with the 3rd parameter, setting it to true (hoping for generating all fields, even in cost of performance penalty). No luck.
[Update 2020/03/20]
Self-answering to Function tap.packet(pinfo, tvb, tapdata) does not populate tapdata
After examining source code of Wireshark (tshark) it turns out that Diameter does not populate this variable as tapdata does not have reference to Diameter. I've tried to add it to taps definition and the variable (table) has been populated, even names of the hashes are OK. But variables in the hashes are not... Anyway, here is the change:
MBP:wireshark jhartman$ git diff epan/wslua/taps
diff --git a/epan/wslua/taps b/epan/wslua/taps
index 11b1132171..ea28865109 100644
--- a/epan/wslua/taps
+++ b/epan/wslua/taps
## -62,4 +62,5 ## tcp ../dissectors/packet-tcp.h tcp_info_t
#tls ../dissectors/packet-tls.h ssl_info_t
#tr ../dissectors/packet-tr.h tr_info_t
wlan ../dissectors/packet-ieee80211.h wlan_hdr_t
+diameter ../dissectors/packet-diameter.h diam_sub_dis_t
#wsp ../dissectors/packet-wsp.h wsp_info_t
Question
Is this approach right? Or should I use other ways - such as chained dissectors or post dissector? But it was not clear to me if I can access dissected data to the level I need?
Any help will be very much appreciated.
Thank you in advance and best regards,
Jarek

Why I always fail on loading big files in lua? [duplicate]

The overview is I am prototyping code to understand my problem space, and I am running into 'PANIC: unprotected error in call to Lua API (not enough memory)' errors. I am looking for ways to get around this limit.
The environment bottom line is Torch, a scientific computing framework that runs on LuaJIT, and LuaJIT runs on Lua. I need Torch because I eventually want to hammer on my problem with neural nets on a GPU, but to get there I need a good representation of the problem to feed to the nets. I am (stuck) on Centos Linux, and I suspect that trying to rebuild all the pieces from source in 32bit mode (this is reported to extend the LuaJIT memory limit to 4gb) will be a nightmare if it works at all for all of the libraries.
The problem space itself is probably not particularly relevant, but in overview I have datafiles of points that I calculate distances between and then bin (i.e. make histograms of) these distances to try and work out the most useful ranges. Conveniently I can create complicated Lua tables with various sets of bins and torch.save() the mess of counts out, then pick it up later and inspect with different normalisations etc. -- so after one month of playing I am finding this to be really easy and powerful.
I can make it work looking at up to 3 distances with 15 bins each (15x15x15 plus overhead), but this only by adding explicit garbagecollection() calls and using fork()/wait() for each datafile so that the outer loop will keep running if one datafile (of several thousand) still blows the memory limit and crashes the child. This gets extra painful as each successful child process now has to read, modify and write the current set of bin counts -- and my largest files for this are currently 36mb. I would like to go larger (more bins), and would really prefer to just hold the counts in the 15 gigs of RAM I can't seem to access.
So, here are some paths I have thought of; please do comment if you can confirm/deny that any of them will/won't get me outside of the 1gb boundary, or will just improve my efficiency within it. Please do comment if you can suggest another approach that I have not thought of.
am I missing a way to fire off a Lua process that I can read an arbitrary table back in from? No doubt I can break my problem into smaller pieces, but parsing a return table from stdio (as from a system call to another Lua script) seems error prone, and writing/reading small intermediate files will be a lot of disk i/o.
am I missing a stash-and-access-table-in-high-memory module ? This seems like what I really want, but not found it yet
can FFI C data structures be put outside the 1gb? Doesn't seem like that would be the case but certainly I lack a full understanding of what is causing the limit in the first place. I suspect that this will just get me an efficiency improvement over generic Lua tables for the few pieces that have moved beyond prototyping? (unless I do a bunch of coding for each change)
Surely I can get out by writing an extension in C (Torch appears to support nets that should go outside of the limit), but my brief investigation there turns up references to 'lightuserdata' pointers -- does this mean that a more normal extension won't get outside 1gb either? This also seems like it has the heavy development cost for what should be a prototyping exercise.
I know C well so going the FFI or extension route doesn't bother me - but I know from experience that encapsulating algorithms in this way can be both really elegant and really painful with two places to hide bugs. Working through data structures containing tables within tables on the stack doesn't seem great either. Before I make this effort I would like to be certain that the end result really will solve my problem.
Thanks for reading the long post.
Only object allocated by LuaJIT itself are limited to the first 2GB of memory. This means that tables, strings, full userdata (i.e. not lightuserdata), and FFI objects allocated with ffi.new will count towards the limit, but objects allocated with malloc, mmap, etc. are not subjected to this limit (regardless if called by a C module or the FFI).
An example for allocating a structure with malloc:
ffi.cdef[[
typedef struct { int bar; } foo;
void* malloc(size_t);
void free(void*);
]]
local foo_t = ffi.typeof("foo")
local foo_p = ffi.typeof("foo*")
function alloc_foo()
local obj = ffi.C.malloc(ffi.sizeof(foo_t))
return ffi.cast(foo_p, obj)
end
function free_foo(obj)
ffi.C.free(obj)
end
The new GC to be implemented in LuaJIT 3.0 IIRC will not have this limit, but I haven't heard any news on it's development recently.
Source: http://lua-users.org/lists/lua-l/2012-04/msg00729.html
Here is some follow-up information for those who find this question later:
The key information is as posted by Colonel Thirty Two, that C module extensions and FFI code can easily get outside of the limit. (and the referenced lua list post reminds that plain Lua tables that go outside the limit will be very slow to garbage collect)
It took me some time to pull the pieces together to both access and save/load my objects, so here it is in one place:
I used lds at https://github.com/neomantra/lds as a starting point, in particular the 1-D Array code.
This broke using torch.save(), as it doesn't know how to write the new objects. For each object I added the code below (using Array as the example):
function Array:load(inp)
for i=1,#inp do
self._data[i-1] = tonumber(inp[i])
end
return self
end
function Array:serialize ()
local siz = tonumber(self._size)
io.write(' lds.ArrayT( ffi.typeof("double"), lds.MallocAllocator )( ', siz , "):load({")
for i=0,siz-1 do
io.write(string.format("%a,", self._data[i]))
end
io.write("})")
end
Note that my application specifically uses doubles and malloc(), so a better implementation would store and use these in self rather than hard coding above.
Then as discussed in PiL and elsewhere, I needed a serializer that would handle the object:
function serialize (o)
if type(o) == "number" then
io.write(o)
elseif type(o) == "string" then
io.write(string.format("%q", o))
elseif type(o) == "table" then
io.write("{\n")
for k,v in pairs(o) do
io.write(" ["); serialize(k); io.write("] = ")
serialize(v)
io.write(",\n")
end
io.write("}\n")
elseif o.serialize then
o:serialize()
else
error("cannot serialize a " .. type(o))
end
end
and this needs to be wrapped with:
io.write('do local _ = ')
serialize( myWeirdTable )
io.write('; return _; end')
and then the output from that can be loaded back in with
local myWeirdTableReloaded = dofile('myWeirdTableSaveFile')
See PiL (Programming in Lua book) for dofile()
Hope that helps someone!
You can use the torch tds module. From the README:
Data structures which do not rely on Lua memory allocator, nor being limited by Lua garbage collector.
Only C types can be stored: supported types are currently number, strings, the data structures themselves (see nesting: e.g. it is possible to have a Hash containing a Hash or a Vec), and torch tensors and storages. All data structures can store heterogeneous objects, and support torch serialization.

HHVM staticly typing lookup tables and keeping them fully cached in RAM

I'm doing scientific research, processing through millions of combinations of multi-megabyte arrays.
For you to be capable of answering this question you will need to have knowledge/experience of all of the following
how HHVM is able to cache data structures in RAM between requests
how to tell HHVM data structures will be constant
how to declare array index and value types
I need to process the entire arrays, so it's a lot of data to be loaded and processed. (millions of requests within minutes on a LAN). The faster I can complete requests the quicker I can complete my work. If HHVM has to do work loading this data on each request, it accounts for a significant fraction of the time to complete the request (sometimes more than half, it depends on the complexity of the analysis I'm doing at the time).
I have found a method that has allowed me to keep these data structures cached in RAM (no loading from files, interpreting code, pushing to the array hundreds of thousands of times for no reason, no pointless repetitive unserialize etc), and thus I have eliminated this massive measurable delay.
I have 3 questions regarding how I can make this even faster:
Is the way I'm doing it now creating a global scope penalty?
How can I declare my arrays as constant and tell HHVM what data types to expect?
If I declare my arrays as constant is it even necessary to declare the types for HHVM?
Instead of using nested arrays, if I use 3 separate data structures ImmVector, PackedArray, or define a class would it be faster?
Keep in mind that anything that prevents HHVM from caching the data structure in RAM between requests should be regarded as unacceptable.
Lookuptable35543.php
<?php
$data = [
["uuid (20 chars)", 5336, 7373],
["uuid (20 chars)", 5336, 7373],
#more lines as above
];
?>
Some of these files are many MB in size and there are a lot of them
Main.php
<?php
function main() {
require /path/to/Lookuptable35543.php;
#(Do stuff with $data)
}
?>
This is working quite well, as Main.php gets thousands of requests, in a short period of time, HHVM keeps Lookuptable.php's data structure in memory. Avoiding pointless processing and IO, as it just sits in RAM, ready for use. (I have more than enough RAM)
Unfortunately, the only way I know how to make HHVM hold the lookup table in RAM is, I set $data in the global scope inside my lookup####.php file (then require the lookup file into a function in the data processing file: Main.php)? This way HHVM doesnt bother loading the file or re executing the code to create $data, because it can see that $data can be determined at compile time, and it will not ever change during runtime. This works but I dont know if there is a penalty from having the $data exist in the lookup###.php file's global scope. (Or maybe its not global at all because it is required into main.php's function?)
What if I return $data from a function inside Lookup.php and call that function from Main.php like this
Main.php
Would the HHVM JIT the result of getData() in RAM?
Somehow I associate functions with unpredictability... but maybe HHVM is clever enough to know that the functions result can be determined at compile time, and never changes?
I can't put the lookup table inside Main.php because I require different lookup tables based on the type of request.
Is there a way I can tell HHVM that my outer array will always have an integer index that never changes, and the values of the outer array will always be an array?
Perhaps I need to use ImmVector?
Then is there a way to tell HHVM that my inner array will always be a fixed length string followed by 2 integers, always, no extra elements, contents never changes?
I'd prefer not to use OO or create a class. How can I declare types, procedural style?
If a class is absolutely necessary can you please give example code suitable for my requirements above?
Will it be faster if I dont nest arrays?
I just realized I could have one array with integer index and values of fixed length string. Then a 2nd array with integer index and integer values, and a 3rd one with integer index and integer values.
If you're not familiar with this HHVM caching technique please do not waste mutual time suggesting a database, redis, APC, unserialize, etc. The fastest is for HHVM to just keep my various $data variables in RAM. Even unserializing $data from a ramdisk file is slow, because then the entire data structure must be parsed as a string and converted into a data structure in memory for every request. APC has the same problem as far as i know. I dont want to even have to copy $data. The lookup tables are immutable, read only. They must just stay fully structured in RAM. My current caching solution (at the top of this question) has already given me huge gains, but as per my 3 questions I think there may be more gains to be had?
Incase you're wondering, I have measured the latency of various data loading or caching methods.
Now I basically want to keep the caching situation I have, but give the HHVM JIT maximum confidence about how to type my data, so it can save time not running type or even bound (array size) checks.
Edit
Ok so nobody has been able to give me any code examples yet, so I'm just trying stuff out.
Here's what I've found out so far.
const arrays don't work yet in HHVM. const foo = ['uuid1',43,43];
throws an error about HHVM only supporting constants with scalar values.
Vector with Array values: I don't know how it will perform yet... I expect it will be better than a normal array. This is valid HH code.
This is progress, because HHVM should be able to cache this in the same way, HHVM knows this whole structure is constant, and HHVM knows the indexes are all integers.
What I'm still not entirely happy about with this structure is this:
Consider this code
for ($n=0;$n<count($iv);++$n) if ($x > $iv[$n][1]) dosomething();
Will HHVM perform a type check on $if[$n][2] on every loop iteration?
In my definition of $iv above, there is nothing that says the 2nd element of the inner array will be an integer.
How can I improve on this?
Can disabling the type checker be of any use? Does this only hide errors from the external type checker, or does it prevent HHVM from constantly doing type checks? (I'm thinking it's the first thing)
Perhaps if I could make my own user-defined type that would solve the problem?
<?hh
#I don't know what mechanisms for UDT's exist, so this code is made-up
CreateUDT foo = <string,int,int>;
$iv = ImmVector<foo> {
['uuid1',425,244],
['uuid2',658,836]
};
print_r($iv);
I found a reference to this at Hack Collections Literal Syntax Vector<Foo> unfortunately it might not be available to use yet.
I'm a software engineer at Facebook working on HHVM.
This entire question reeks of premature optimization to me. Have you done profiling and determined that loading this array is actually a bottleneck for your app? (Not just microbenchmarks, but how it actually affects the performance, latency, RPS, etc of realistic pageloads.) And also isolated from other effects, e.g., if this array is a cache or some sort of precomputed data, you need to isolate the win of precomputing the data from the actual time to load it by caching it in various different ways.
In general, HHVM is very good at dealing with arrays, since they are so hot in nearly every codepath -- and in particular at constant arrays like this one. To your questions about how to inform it of the shape and types of things in the arrays, HHVM can figure that all out for itself, and is very good at doing so on constant arrays composed entirely of constants. (And the ways it thinks about arrays aren't quite the ways you think about arrays, so it can probably do a better job anyway!) Basically, unless profiling says this is actually a hotspot -- which I'm pretty skeptical of -- I wouldn't worry too much about it. A couple general notes to be aware of:
Measure every performance diff. Don't prematurely optimize -- use profiling to guide. The developer productivity lost by premature optimizations getting in the way can be lethal.
Get things out of toplevel ("pseudomains") as much as possible. A function which returns a static or constant array should be just fine, and will in general help HHVM optimize code even better.
Avoid references as much as possible, especially in this array if you care about performance so much.
You probably should look into repo authoritiative mode which can help HHVM optimize lots of things even more -- but in particular for this case, the more aggressive inlining that repo auth mode can do might be a win.
Edit, aside:
because then the entire data structure must be parsed as a string and converted into a data structure in memory for every request. APC has the same problem as far as i know
This is exactly what I mean by premature optimization: you're rejecting APC without even trying it, even if it might be a cleaner way of doing what you want. It turns out that, in most cases, HHVM actually can optimize away the serialization/deserialization of storing arrays in APC, particularly if they are constant arrays that are never modified. As above, HHVM is very good at optimizing lots of common patterns. Just write code that's clean, profile it, and fix the hotspots.
Okay I've solved my first question.
I don't have any global scope issues. My require is being done from inside function main(), so it's as if the code from lookuptable####.php is being inserted into function main().
HHVM docs: "If the include occurs inside a function..."
Basically if you were to open lookuptable####.php it looks like the code is in global scope, but that's not the file that is being requested from hhvm. main.php is the one being requested, thus there is no code in global scope.
I think I've answered my 2nd question, it's currently at the bottom of my question. I'm not 100% convinced, but I'm pretty happy to move ahead and test it.

retrieve sequence alignment score produced by emboss in biopython

I'm trying to retrieve the alignment score of two sequences compared using emboss in biopython. The only way that I know is to retrieve it from an output text file produced by emboss. The problem is that there will be hundreds of these files to iterate over. Is there an easier/cleaner method to retrieve the alignment score, without resorting to that? This is the main part of the code that I'm using.
From Bio.Emboss.Applications import StretcherCommandline
needle_cline = StretcherCommandline(asequence=,bsequence=,gapopen=,gapextend=,outfile=)
stdout, stderr = needle_cline()
I had the same problem and after some time spent on searching for a neat solution I popped up a white flag.
However, to speed up significantly the processing of output files I did the following things:
1) I used re python module for handling regular expressions to extract all data needed.
2) I created a ramdisk space for the output files. The use of a ramdisk here allowed for processing and exchanging all the data in RAM memory (much faster than writing and reading the output files from a hard drive, not to mention it saves your hdd in case of processing massive number of alignments).
I don't know if there is one specifically for your command.
For Primer3CommandLine, there is Primer3. Make your life much easier with something like:
from Bio.Emboss import Primer3
inputFile = "./wherever/your/outputfileis.out"
with open(inputFile) as fileHandle:
record = Primer3.parse(fileHandle)
# XXX check is len>0
primers = record.next().primers
numPrimers = len(primers)
# you should have access to each primer, using a for loop
# to check how to access the data you care about. For example:
I would also check http://biopython.org/wiki/SeqIO#Sequence_Input

Resources