My script works on a line-by-line basis. The task per line calls for a grammar, but is not too complicated. I notice that when the input has a lot of lines, say 150_000, the memory usage is gigantic (over 6 GB) and my computer hangs. I used --profile on shorter input, but that gives me no clue where to look for a solution: most names in the profile do not refer to my code.
Any suggestions on where I can prevent spurious memory use, or how to find out what is causing it?
Here is the core loop:
method process() {
    my $promise = Promise.start({
        &!start-handler();
        my $intro = 1; # to skip the directory part of the file.
        for $!source-name.IO.lines {
            ...
        }
    }).then({ $!DB.completed });
    await $promise;
}
Thanks,
Theo
I found one thing that caused havoc. By changing the line
m:g/ << $<funname>=<[A..Z]>+ '(' <{ %fun{$<funname>}++ }> /;
to something like
if $formula ~~ m:g/ << $<funname>=<[A..Z]>+ '(' / {
    for $/.list -> $funname {
        %fun{$funname<funname>}++;
    }
}
the script gets a zillion times faster and runs without problems.
However, see Jonathan Worthington's answer below. My conclusion: I should have used a plain code block { } rather than <{ }> (which interpolates the result of the code back into the regex as a subpattern):
m:g/ << $<funname>=<[A..Z]>+ '(' { %fun{$<funname>}++ } /;
Again, this used to work in previous versions of Rakudo.
This is Rakudo version 2020.01 built on MoarVM version 2020.01.1
implementing Perl 6.d.
I am running NixOS. As a user, I install xmonad-with-packages and ghc-with-packages. I have a configuration file and overlays to achieve this, but I regularly hit conflicts:
error: packages '/nix/store/jzjxvm4kll3b2pw7plhksgl1jxvwnkw8-xmonad-with-packages-8.6.4/share/man/man1/ghc.1.gz' and '/nix/store/dnk4znsf8avwjalj350wj59vznpzcihl-ghc-8.6.4-with-packages/share/man/man1/ghc.1.gz' have the same priority 5; use 'nix-env --set-flag priority NUMBER INSTALLED_PKGNAME' to change the priority of one of the conflicting packages (0 being the highest priority)
I can use nix-env --set-flag priority to resolve this, but it seems to be erratic: sometimes it works, sometimes it doesn't. I'd like to say I'm being inconsistent about usage, but I don't believe I am. For example, with the error above, I run
nix-env --set-flag priority 4 ghc-8.6.4-with-packages
setting flag on 'ghc-8.6.4-with-packages'
...and then, re-running nix-env -i, I see the same error :-(
Whenever I change host (far too often), I have to run it again, which is annoying, but moreover leads to exactly the sort of potential inconsistency that I am trying to avoid by using Nix.
I'd really like to configure the priorities in my overlays (or config.nix), but how do I do that? My overlays/myHaskellEnv.nix looks like this:
self: super: {
  myHaskellEnv = self.haskellPackages.ghcWithHoogle
    (haskellPackages: with haskellPackages;
      [ aeson ... ]);
}
my overlays/myXmonad.nix like this:
self: super: {
myXmonad = super.xmonad-with-packages.override {
packages = hPkgs: with hPkgs; [ xmonad-contrib ];
};
}
and my config.nix like this:
{ nixpkgs ? import <nixpkgs> {} }:
with nixpkgs;
{
inherit myXmonad myHaskellEnv;
}
Any help or pointers gratefully received!
I have a problem when using pattern comprehensions with neo4j-embedded (version 3.5.3).
For example, this kind of query works perfectly fine with Neo4j Enterprise 3.5.3, but does not work with neo4j-embedded:
MATCH (myNode:MyNode {myId:'myid'})
MATCH path = ( (myNode) -[*0..]- (otherNode:MyNode) )
WHERE
ALL(n in nodes(path) where [ (n)<--(state:MyState) | state.isConnected ][0] = true)
RETURN myNode, otherNode
The error I get when using neo4j-embedded is difficult to understand and looks like an internal error:
org.neo4j.driver.v1.exceptions.DatabaseException: This expression should not be added to a logical plan:
VarExpand(myNode, BOTH, OUTGOING, List(), otherNode, UNNAMED62, VarPatternLength(0,None), ExpandInto, UNNAMED62_NODES, UNNAMED62_RELS, Equals(ContainerIndex(PatternComprehension(None,RelationshipsPattern(RelationshipChain(NodePattern(Some(Variable( UNNAMED62_NODES)),List(),None,None),RelationshipPattern(Some(Variable( REL136)),List(),None,None,INCOMING,false,None),NodePattern(Some(Variable(state)),List(LabelName(MyState)),None,None))),None,Property(Variable(state),PropertyKeyName(isConnected))),Parameter( AUTOINT1,Integer)),True()), True(), List((Variable(n),Equals(ContainerIndex(PatternComprehension(None,RelationshipsPattern(RelationshipChain(NodePattern(Some(Variable(n)),List(),None,None),RelationshipPattern(Some(Variable( REL136)),List(),None,None,INCOMING,false,None),NodePattern(Some(Variable(state)),List(LabelName(MyState)),None,None))),None,Property(Variable(state),PropertyKeyName(isConnected))),Parameter( AUTOINT1,Integer)),True())))) {
LHS -> CartesianProduct() {
LHS -> Selection(Ands(Set(In(Property(Variable(myNode),PropertyKeyName(myId)),ListLiteral(List(Parameter( AUTOSTRING0,String))))))) {
LHS -> NodeByLabelScan(myNode, LabelName(MyNode), Set()) {}
}
RHS -> NodeByLabelScan(otherNode, LabelName(MyNode), Set()) {}
}
}
Any idea?
It was quite a complicated issue, but here is the full explanation.
First, I found that it was not specific to neo4j-embedded. The internal-error exception was raised by an assert inside Neo4j, which only triggers if the JVM flag -ea (enable assertions) is set, and that flag is typically set only when running tests from Maven or an IDE.
Drilling down into Neo4j's code on GitHub, I also found that this assert was added because of concerns about recursive pattern comprehensions. (The commit is here: https://github.com/neo4j/neo4j/commit/dfbe8ce397f7b72cf7d9b9ff1500f24a5c4b70b0)
In my case, I do use pattern comprehensions, but not recursively, so I think everything should be fine, except when unit testing :)
I submitted the problem to Neo4j's support, and they will provide a fix in a future release.
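In the meantime, a possible stopgap (assuming the -ea flag comes from a Maven Surefire run; the idea is the same for an IDE run configuration) is to keep assertions enabled for your own code but disable them for the Neo4j packages:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- -da:org.neo4j... turns assertions off for org.neo4j and its subpackages -->
    <argLine>-ea -da:org.neo4j...</argLine>
  </configuration>
</plugin>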
I've been stuck on this for a while now and I cannot seem to find good resources for my problem. I am coming from an "only C" background, so most of the web-dev stuff is completely new to me.
I wrote a C function float editDistance(char *str1, char *str2) that returns the edit distance of two char arrays. Right now the goal is to successfully call this function from a JS environment.
After ensuring that the code works with the recommended Emscripten ccall method, I decided to move on. Now I use Emscripten to compile the C code to Wasm with the flags -O3, -s WASM=1, -s EXPORTED_FUNCTIONS="['_editDistance']", and -s SIDE_MODULE=1. The JS code I'm trying to wrap around my WebAssembly is:
// Allocate memory for the wasm module to run in (256 pages of 64 KiB = 16 MiB).
let wasmMemory = new WebAssembly.Memory({
initial: 256
});
let info = {
env: {
abort: function() {},
memoryBase: 0,
tableBase: 0,
memory: wasmMemory,
table: new WebAssembly.Table({initial: 2, element: 'anyfunc'}),
}
}
// Define the strings
let str1 = "abcd";
let str2 = "abcd";
// Allocate memory on the wasm partition for the HEAPU8
let HEAPU8 = new Uint8Array(wasmMemory.buffer);
// Create the char arrays on the heap from the strings
let stackPtr = 0;
let str1Ptr = stackPtr;
stackPtr = stringToASCIIArray(str1, HEAPU8, stackPtr);
let str2Ptr = stackPtr;
stackPtr = stringToASCIIArray(str2, HEAPU8, stackPtr);
// Read the wasm file and instantiate it with the above environment setup. Then
// call the exported function with the string pointers.
let wasmBinaryFile = 'bin/edit_distanceW.wasm';
fetch(wasmBinaryFile, {credentials:"same-origin"})
.then((response) => response.arrayBuffer())
.then((binary) => WebAssembly.instantiate(binary,info))
.then((wa) => alert(wa.instance.exports._editDistance(str1Ptr, str2Ptr)));
// Converts a string to a NUL-terminated ASCII byte array in the given memory
function stringToASCIIArray(str, outU8Array, idx){
    let i;
    for(i = 0; i < str.length; i++){
        outU8Array[idx + i] = str.charCodeAt(i);
    }
    outU8Array[idx + i] = 0; // NUL terminator
    return idx + str.length + 1;
}
The generated wasm file when converted to wat demands these imports:
(import "env" "abort" (func (;0;) (type 0)))
(import "env" "memoryBase" (global (;0;) i32))
(import "env" "tableBase" (global (;1;) i32))
(import "env" "memory" (memory (;0;) 256))
(import "env" "table" (table (;0;) 2 anyfunc))
...and exports these:
(export "__post_instantiate" (func 7))
(export "_editDistance" (func 9))
(export "runPostSets" (func 6))
(elem (;0;) (get_global 1) 8 1))
Now, when I test the code, the strings are passed to the C module without a problem. A few function calls are even made on them (strlen) before things go south. In the C function there is a nasty nested loop that does the main computation, iterating through a 2D array while reading characters from the strings (the C code was just ported from a paper with ugly pseudocode, so pardon the variable names):
do{
for(p=0; p<editDistance; p++){
// Do stuff
}
// Do more stuff
editDistance++;
} while(fkp[len2*2-len1][editDistance] != len1);
Before the function enters the for() loop, the module still has the strings in memory at str1Ptr=0x00 and str2Ptr=0x05 with the correct length and content. Immediately after entering the for() loop, however, the memory gets overwritten with garbage (mostly 0s), corrupting the end result. I suspect some stack save/restore problem on the scope change, as the exact same code compiled natively with gcc works like a charm.
Any idea what setup I'm missing that hinders the correct completion of the C function?
If you are starting out, you probably want to use the Emscripten-generated JS glue. That is, don't use SIDE_MODULE=1, and instead make the output file name end in .js. The Emscripten compiler will then generate both a .js and a .wasm file. You can then include the .js file in your project and it will handle all the loading and setup for you.
If you try to load the wasm file yourself, you will need to do a lot of work to replicate the Emscripten environment, which requires knowing a lot of Emscripten's internal details. Those internals are also subject to change between Emscripten versions, so you would be creating more work for yourself.
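For example (a sketch; I'm assuming your source file is edit_distance.c, and note that the runtime-methods flag is spelled EXTRA_EXPORTED_RUNTIME_METHODS in older Emscripten releases and EXPORTED_RUNTIME_METHODS in newer ones), compile with something like
emcc edit_distance.c -O3 -s WASM=1 -s "EXPORTED_FUNCTIONS=['_editDistance']" -s "EXTRA_EXPORTED_RUNTIME_METHODS=['ccall']" -o edit_distance.js
and then let the glue do the string marshalling for you:
<script src="edit_distance.js"></script>
<script>
// 'string' argument types make ccall copy the JS strings into wasm
// memory and pass the resulting char* pointers to the C function.
Module.onRuntimeInitialized = function() {
  var d = Module.ccall('editDistance',       // C name, without the leading '_'
                       'number',             // return type
                       ['string', 'string'], // argument types
                       ['abcd', 'abcd']);    // arguments
  console.log(d);
};
</script>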
I am using the Grails CSV Plugin in my application with Grails 2.5.3.
I need to implement concurrency, for example with GPars, but I don't know how to do it.
Right now the processing is sequential. Here is an example of my code fragment:
Thanks.
Implementing concurrency in this case may not give you much of a benefit. It really depends on where the bottleneck is. For example, if the bottleneck is in reading the CSV file, then there would be little advantage because the file can only be read in sequential order. With that out of the way, here's the simplest example I could come up with:
import groovyx.gpars.GParsPool
def tokens = csvFileLoad.inputStream.toCsvReader(['separatorChar': ';', 'charset': 'UTF-8', 'skipLines': 1]).readAll()
def failedSaves = GParsPool.withPool {
tokens.parallel
.map { it[0].trim() }
.filter { !Department.findByName(it) }
.map { new Department(name: it) }
.map { customImportService.saveRecordCSVDepartment(it) }
.map { it ? 0 : 1 }
.sum()
}
if(failedSaves > 0) transactionStatus.setRollbackOnly()
As you can see, the entire file is read first; hence the main bottleneck. The majority of the processing is done concurrently with the map(), filter(), and sum() methods. At the very end, the transaction is rolled back if any of the Departments failed to save.
Note: I chose a map()-sum() pair instead of anyParallel() to avoid having to convert the parallel array produced by map() into a regular Groovy collection, perform the anyParallel() (which creates another parallel array), and then convert it back to a Groovy collection again.
Improvements
As I already mentioned, in my example the CSV file is read completely before the concurrent execution begins. The example also attempts to save all of the Department instances, even if one fails to save. You may want that (which is what you demonstrated) or not; if not, see the sketch below.
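If you'd rather stop at the first failed save, here's an untested sketch along the same lines, this time using anyParallel() directly:
import groovyx.gpars.GParsPool

def anyFailed = GParsPool.withPool {
    // anyParallel() returns true as soon as one closure yields true,
    // so remaining rows can be skipped after the first failure
    tokens.anyParallel { row ->
        def name = row[0].trim()
        if (Department.findByName(name)) return false // duplicate, not a failure
        !customImportService.saveRecordCSVDepartment(new Department(name: name))
    }
}
if (anyFailed) transactionStatus.setRollbackOnly()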
Let's say I want a Lua table that will be provided by a third party, not totally reliable, from a file or other IO source.
I get the table as a string, like "{['valid'] = 10}", and I can load it with
externalTable = loadstring("return " .. txtTable)()
But this opens a breach for code injection, e.g.: txtTable = "os.execute('rm -rf /')"
So I did this sanitizing function:
function safeLoadTable(txtTable)
txtTable = tostring(txtTable)
if (string.find(txtTable, "(", 1, true))
then return nil end
local _start = string.find(txtTable, "{", 1, true)
local _end = string.find(string.reverse(txtTable), "}", 1, true)
if (_start == nil or _end == nil)
then return nil end
txtTable = string.sub(txtTable, _start, #txtTable - _end + 1)
print("cropped to ", txtTable)
local pFunc = loadstring("return " .. txtTable)
if (pFunc) then
local _, aTable = pcall(pFunc)
return aTable
end
end
In the worst case it should return nil.
Can this be considered safe against a "regular bad-intentioned person"? :)
You could run the unsafe code in a sandbox.
Here is how a simple sandbox could look in Lua 5.1 (error handling omitted for brevity):
local script = [[os.execute("rm -rf /")]]
local env = { print=print, table=table, string=string }
local f, err = loadstring(script)
if err then
-- handle syntax error
end
setfenv(f, env)
local status, err = pcall(f)
if not status then
-- handle runtime error
end
In Lua 5.2 you can load the script into its own environment using the load function.
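For instance (Lua 5.2+, where load takes the environment as its fourth argument):
-- mode "t" additionally rejects precompiled (bytecode) chunks
local f, err = load(script, "user script", "t", env)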
The result would be a runtime error returned from pcall:
attempt to index global 'os' (a nil value)
EDIT
As Lorenzo Donati pointed out in the comments, this is not a complete solution for stopping rogue scripts. It essentially allows you to whitelist the functions and tables that are approved for user scripts.
For more info about handling rogue scripts I would suggest this SO question:
Embedded Lua - timing out rogue scripts (e.g. infinite loop) - an example anyone?
I don't think it is safe. Try this:
print(safeLoadTable [[{ foo = (function() print"yahoo" end)() } ]])
EDIT
or this, for more fun:
print(safeLoadTable [[{ foo = (function() print(os.getenv "PATH") end)() } ]])
I won't suggest the alternative of replacing that os.getenv with os.execute, though. :-)
The problem is not easy to solve. Avoiding code injection is not at all simple in this case, because you are executing a piece of Lua code whenever you do that loadstring. No simple string-matching technique is really safe. The only secure way would be to implement a parser for a subset of the Lua table syntax and run that parser over the string.
BTW, even the Lua team stripped the bytecode verifier from Lua 5.2, since they discovered it was amenable to attacks, and bytecode is a far simpler language than Lua source code.
I created sandbox.lua for exactly this purpose. It handles both insecure code and DoS-type attacks, assuming that your environment has access to the debug facility.
https://github.com/kikito/sandbox.lua
Note that for now it is Lua 5.1-compatible only.
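Usage is roughly like this (see the README for the exact, current API):
local sandbox = require 'sandbox'
-- the chunk runs with a whitelisted environment and a quota on VM instructions;
-- code that exceeds the quota or touches anything outside the whitelist errors out
local result = sandbox.run("return 'this is allowed'")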
Running in a sandbox isn't safe, and inspecting source code is not very simple. An idea: inspect the bytecode! Hmm, actually that's not very simple either, but here is a lazy implementation: http://codepad.org/mGqQ0Y8q