How do I prevent message flooding with open_port? - erlang

I'm trying to read a 100gb file through stdin one line at a time using
Port = open_port({fd, 0, 1}, [in, binary, {line, 4096}]),
but this floods my system with messages until I run out of ram. Is there a away to make it like {active, once} with ports? There is also io:get_line() but I was wondering if this could work.

No, there is not flow control over ports so if you can't process fast enough you should use another method of processing. You can set binary mode on STDIN using
ok = io:setopts(standard_io, [binary]),
and then you can read it using file:read_line(standard_io) if you are using version 17 or newer (there was performance impacting bug).

Related

Do I need to start multiple server-side workers for just a handful of ZeroMQ clients?

I am using Chumak in erlang, opening a ROUTER socket.
I have a handful (4 or so) clients that use the Python zmq library to send REQ requests to this server.
Things work fine most of the time, but sometimes a client will have disconnect issues (reconnecting automatically is in the client code, and it works). I've found that when an error occurs in one client connection, it seems to move on to others as well, and I get a lot of
** {{noproc,{gen_server,call,[<0.31596.16>,incomming_queue_out]}},
on the server.
On the server side, I'm just opening one chumak socket and looping:
{ok, Sock} = chumak:socket( router ),
{ok, _} = chumak:bind( Sock, tcp, "0.0.0.0", ?PORT ),
spawn_link( fun() -> loop( Sock ) end ),
...
loop( CmdSock ) ->
{ok, [Identity, <<>>, Data]} = chumak:recv_multipart( Sock ),
...
The ZeroMQ docs seem to imply that one listening socket is enough unless I have many clients.
Do I misunderstand them?
No, there is no need to increase number of Socket instances
Abstractions are great to reduce a need to understand all the details under the hood for a typical user. That ease of life stops whenever such user has to go into performance tuning or debugging incidents.
Let's step in this way:
- unless some mastodon beast sized data payloads are to get moved through, there is quite enough to have a single ROUTER-AccessPoint into a Socket-instance, for say tens, hundreds, thousands of REQ-AccessPoints on the client side(s).
- yet, such numbers will increase the performance envelope requirements for the ROUTER-side Context-instance, so as to remain capable of handling all the Scalable Formal Communication Archetype ( pre-scribed ) handling, so as to all happen in due time and fair fashion.
This means, one can soon realise benefits from spawning Context-instances with more than its initial default solo-thread + in all my high-performance setups I advocate for using zmq.AFFINITY mappings, so as to squeeze indeed a max performance on highest-priority Socket-instances, whereas leaving non-critical resources sharing a common sub-set of the Context-instance's IO-thread-pool.
Next comes RAM
Yes, the toys occupy memory.
Check all the .{RCV|SND}BUF, .MAXMSGSIZE, .{SND|RCV}HWM, .BACKLOG, .CONFLATE
Next comes LINK-MANAGEMENT
Do not hesitate to optimise .IMMEDIATE, .{RCV|SND}BUF, .RECONNECT_IVL, .RECONNECT_IVL_MAX, .TCP_KEEPALIVE, .TCP_KEEPALIVE_CNT, .TCP_KEEPALIVE_INTVL, .TCP_KEEPALIVE_IDLE
Always set .LINGER right upon instantiations, as drop-outs cease to be lethal.
Next may come a few defensive and performance helper tools:
.PROBE_ROUTER, .TCP_ACCEPT_FILTER, .TOS, .HANDSHAKE_IVL
Next step?
If no memory-related troubles remain in the game and once mentioning reconnections, my suspect would be to rather go and setup .IMMEDIATE + possibly let ROUTER benefit from explicit PROBE_ROUTER signalling.

How To Use Erlang ssl:close/2

I have an SSL server, and I want to downgrade this after receiving the first ssl:recv to a raw gen_tcp. Assuming this can be used to do that I can not find an example on how to use this. And I am not so good at using the Erlang/OTP documentation yet http://erlang.org/doc/man/ssl.html#close-2
I am a bit confused with NewController::pid() from the documentation:
How = timeout() | {NewController::pid(), timeout()}
NewController::pid() here refers to the process you want to set as the "controlling process" for the downgraded TCP socket. gen_tcp functions on the socket will only work if called from that process. You'll want to send self() here unless you want to use the downgraded TCP socket from another process.
The only example I could find of ssl:close/2 being used with a tuple as the second argument is this test. Here's a simplified version of that code to get you started:
% Assuming `SSLSocket` is the SSL socket.
{ok, TCPSocket} = ssl:close(SSLSocket, {self(), 10000}),
% You can use `TCPSocket` with `gen_tcp` now.
gen_tcp:send(TCPSocket, "foo"),

zlib inflate giving data error in erlang

I have a java client which is sending some message to an erlang server process listening on TCP.The java client sends the data using outputstream.On the server side i am using following call to uncompress the data after initialising zlib
zlib:inflate(ZStream, Data),
where Data is binary.I am getting data_error on this call.
Under what conditions do I get data_error with zlib.
Try setting a 0 or -15 WindowBits, would help if you paste more code like the zlib:inflateInit call, the binary dump of Data variable, and the Java side zlib init.
If you are streaming the data in relatively small chunks, you can use my ezlib on Github.
Performance wise it's around 69 % faster than erlang driver and also works better when you have concurrent sessions.
To integrate, use rebar as you would do for any other erlang app. To run a small example:
StringBin = <<"this is a string compressed with zlib nif library">>,
{ok, DeflateRef} = ezlib:new(?Z_DEFLATE),
{ok, InflateRef} = ezlib:new(?Z_INFLATE),
CompressedBin = ezlib:process(DeflateRef, StringBin),
DecompressedBin = ezlib:process(InflateRef, CompressedBin).
Do not use it to compress large blocks, because you can block the erlang scheduler. I will change this in the subsequent versions.

Read a filestream (named pipe) with a timeout in Smalltalk

I posted this to the Squeak Beginners list too - I'll be sure to make sure any answers from there get here :)
I'm using Squeak 4.2 and working on the smalltalk end of a named pipe connection, which sends a message to the named pipe server with:
msg := 'Here''s Johnny!!!!'.
pipe nextPutAll: msg; flush.
It should then receive an acknowledgement, which will be a 32-byte md5 hash of the received message (which the smalltalk app can then verify). It's possible the named pipe server may have gone away or otherwise been unable to deal with the request, and so I'd like to set a timeout on reading the acknowledgement. I've tried using this:
ack := [ pipe next: 32 ] valueWithin: (Duration seconds: 3) onTimeout: [ 'timeout'. ].
and then made the pipe server pause artificially to test the code. But the smalltalk thread blocks on the read and doesn't carry on (even after the timeout), although if I then get the pipe server to send the correct response (after a 5 second delay, for example), the value of 'ack' is 'timeout'. Obviously the timeout did what it's supposed to do, but couldn't 'unblock' the blocking read on the pipe.
Is there a way to accomplish this even with a blocking FileStream read? I'd rather avoid a busy wait on there being 32 characters available if at all possible.
This one may come in handy but not on Windows I am afraid.
http://www.samadhiweb.com/blog/2013.07.27.unixdomainsockets.html

On external port how to only close the output and wait for exit_status

Im using a port to run a pipeline with uncompresses and dd's some data:
Port = open_port({spawn, "bzcat | sudo dd of=/dev/foo},
[stream, use_stdio, exit_status]),
What I would like to do is produce a end-of-file situation on the output which causes the pipeline to complete and eventually exit.
I would like to wait for this completion and also capture the exit_status.
When I just call port_close it looks to me as if the pipeline is just terminated and there is no wait for completion. Also I don't get any exit_status ....
How can I accomplish waiting for exit before my next step (which requires the dd to have completed).
Did some experiments and it looks like at least port_close doesn't kill the process, you just don't find out when its done. Is this correct?
If you just need to wait for spawned by open_port command to complete you need to wait for exit_status message:
1> Port = open_port({spawn, "sleep 7"}, [exit_status]).
#Port<0.497>
2> receive {Port, {exit_status, Code}} -> Code after 10000 -> timeout end.
0
Update (about to say a port just close the output pipe): I think you can't just close the output pipe with the default spawn driver. Default driver doesn't have any control commands and port_close although don't kill spawned command but completely erase all port's state.
Possible solutions:
Write input stream to a file first and then run bzip/dd sequence on that file;
Write your own driver or NIF (Maybe some open source implementations already exist?)
Use some external script and control protocol, for example full (or chunk) length can be transferred before the actual content so the script will know when to close the connection
Several rather ugly workarounds to this problem can be found here: limitations of erlang:open_port() and os:cmd()
Some even use netcat to map the problem to a tcp connection.

Resources