Should/can I do nested receives for TCP data? - erlang

Can I nest receive {tcp, Socket, Bin} -> calls? For example, I have a top-level loop called Loop which, upon receipt of TCP data, calls a function, parse_header, to parse header data (an integer which indicates the kind of data to follow and thus its size). After that I need to receive the entire payload before moving on. I might only receive 4 bytes when I need a full 20 bytes, and I would like to call receive in a separate function called parse_payload. So the call chain would look like loop->parse_header->parse_payload, and I would like parse_payload to call receive {tcp, Socket, Bin} ->. I don't know if this is OK, or if I'm completely going to mess things up and can only do it in the Loop function. Can someone enlighten me? If I am allowed to do this, am I violating some sort of best practice?

Maybe you can check the sample code for "Erlang Programming".
The download page is Erlang Programming Source Code
In the file socket_examples.erl, please check the "receive_data" function.
For parsing messages, I think you should first determine how to separate messages one by one (fixed length or with a termination byte), then parse each message's header and payload.
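receive_data(Socket, SoFar) ->
    receive
        {tcp, Socket, Bin} ->        % another chunk arrived: keep it and wait for more
            receive_data(Socket, [Bin | SoFar]);
        {tcp_closed, Socket} ->      % the peer closed the socket: the data is complete
            list_to_binary(lists:reverse(SoFar))  % chunks were prepended, so reverse them
    end.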
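To address the original question directly: a receive can live in any function called from the loop. Because receive is selective, a receive inside parse_payload that only matches {tcp, Socket, _} and {tcp_closed, Socket} leaves every other message in the mailbox for the top-level loop to handle later. A minimal sketch, assuming an {active, true} socket in binary mode and a hypothetical protocol in which a 1-byte type code determines the payload size (payload_size/1 and its sizes are made up for illustration):

```erlang
-module(nested_recv).
-export([read_message/2, payload_size/1]).

%% Hypothetical protocol: a 1-byte type code, followed by a payload whose
%% size depends on the type. These sizes are illustrative only.
payload_size(1) -> 20;
payload_size(2) -> 4.

%% Reads one message from an {active, true}, binary-mode socket.
%% Buffer holds bytes already received but not yet consumed.
%% Returns {Type, Payload, Rest}; Rest belongs to the next message.
read_message(Socket, Buffer) ->
    parse_header(Socket, Buffer).

parse_header(Socket, <<Type:8, Rest/binary>>) ->
    parse_payload(Socket, Type, payload_size(Type), Rest);
parse_header(Socket, <<>>) ->
    %% Not even the header byte yet: wait for more data.
    parse_header(Socket, recv_more(Socket, <<>>)).

parse_payload(_Socket, Type, Size, Buffer) when byte_size(Buffer) >= Size ->
    <<Payload:Size/binary, Rest/binary>> = Buffer,
    {Type, Payload, Rest};
parse_payload(Socket, Type, Size, Buffer) ->
    %% Only part of the payload so far (e.g. 4 of 20 bytes): receive again.
    parse_payload(Socket, Type, Size, recv_more(Socket, Buffer)).

%% The nested receive. It matches only messages for this Socket, so any
%% other message stays in the mailbox for the top-level loop.
recv_more(Socket, Buffer) ->
    receive
        {tcp, Socket, Bin} -> <<Buffer/binary, Bin/binary>>;
        {tcp_closed, Socket} -> exit(closed)
    after 5000 ->
        exit(timeout)
    end.
```

The caller's loop keeps Rest and passes it back in as Buffer for the next message, since one TCP chunk can contain the end of one message and the start of the next.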

You can also put a gen_tcp socket in passive mode ({active, false}). This way, the owning process won't receive the input as messages but has to fetch it using gen_tcp:recv(Socket, ByteCount), which returns either {ok, Input} or {error, Reason}. As this method waits indefinitely for the bytes, you might want to add a timeout using gen_tcp:recv/3. (Erlang documentation of gen_tcp:recv)
While at first glance it might seem that the process is now completely unable to react to messages sent to it, the following workaround improves the situation a bit:
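f1(Socket, ByteCount, Timeout) ->
    receive
        message1 ->
            %% ... do something ...
            f1(Socket, ByteCount, Timeout);
        message2 ->
            %% ... do something ...
            f1(Socket, ByteCount, Timeout)
    after 0 -> % timeout in ms: the mailbox is empty, so poll the socket
        case gen_tcp:recv(Socket, ByteCount, Timeout) of
            {ok, _Input} ->
                %% ... do something ... (maybe call gen_tcp:recv again a few times)
                f1(Socket, ByteCount, Timeout);
            {error, timeout} ->
                %% no data arrived in time; go back and check the mailbox
                f1(Socket, ByteCount, Timeout);
            {error, _Reason} = Error ->
                Error
        end
    end.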
If you don't add a timeout to gen_tcp:recv here, other processes could wait ages for f1 to handle their messages.
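Passive mode also gives a direct answer to the header/payload question: with {packet, 0}, gen_tcp:recv(Socket, N, Timeout) with N > 0 returns exactly N bytes (or an error), so no manual buffering is needed. A minimal sketch, assuming a binary-mode passive socket and a hypothetical framing of a 1-byte type code plus a 4-byte big-endian payload length:

```erlang
-module(passive_recv).
-export([read_message/2]).

%% Reads one message from a passive ({active, false}), {packet, 0},
%% binary-mode socket. Hypothetical framing: <<Type:8, Size:32>> header,
%% then Size bytes of payload.
read_message(Socket, Timeout) ->
    case gen_tcp:recv(Socket, 5, Timeout) of
        {ok, <<Type:8, Size:32>>} ->
            read_payload(Socket, Type, Size, Timeout);
        {error, _} = Error ->
            Error
    end.

%% recv(Socket, 0) would mean "whatever is available", so an empty
%% payload gets its own clause instead of a recv call.
read_payload(_Socket, Type, 0, _Timeout) ->
    {ok, Type, <<>>};
read_payload(Socket, Type, Size, Timeout) ->
    %% Blocks until exactly Size bytes have arrived, however they were chunked.
    case gen_tcp:recv(Socket, Size, Timeout) of
        {ok, Payload} -> {ok, Type, Payload};
        {error, _} = Error -> Error
    end.
```

Combined with the after 0 workaround above, the process can still check its mailbox between messages.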

Related

Erlang client-server example using gen_tcp is not receiving anything

I am trying to receive data at client side, but nothing is received.
Server code that sends message
client(Socket, Server) ->
    gen_tcp:send(Socket, "Please enter your name"),
    io:format("Sent confirmation"),
    {ok, N} = gen_tcp:recv(Socket, 0),
    case string:tokens(N, "\r\n") of
        [Name] ->
            Client = #client{socket=Socket, name=Name, pid=self()},
            Server ! {'new client', Client},
            client_loop(Client, Server)
    end.
Client that should receive and print out
client(Port) ->
    {ok, Sock} = gen_tcp:connect("localhost", Port, [{active,false},{packet,2}]),
    A = gen_tcp:recv(Sock, 0),
    A.
I think your client is faulty because it specifies:
{packet, 2}
yet the server specifies (in code not shown):
{packet, 0}
In Programming Erlang (2nd) on p. 269 it says:
Note that the arguments to packet used by the client and the server
must agree. If the server was opened with {packet,2} and the client with {packet,4}, then nothing would work.
The following client can successfully receive text from the server:
%%=== Server: {active,false}, {packet,0} ====
client(Port) ->
    {ok, Socket} = gen_tcp:connect(
        localhost,
        Port,
        [{active,false},{packet,0}]
    ),
    {ok, Chunk} = gen_tcp:recv(Socket, 0),
    io:format("Client received: ~s", [Chunk]),
    timer:sleep(1000),
    Name = "Marko",
    io:format("Client sending: ~s~n", [Name]),
    gen_tcp:send(Socket, Name),
    loop(Socket).

loop(Socket) ->
    {ok, Chunk} = gen_tcp:recv(Socket, 0),
    io:format("Client received: ~s~n", [Chunk]),
    loop(Socket).
However, I think that both the chatserver and my client have serious issues. When you send a message through a TCP connection, you have to assume that the message will get split into an indeterminate number of chunks, each with an arbitrary length. When {packet,0} is specified, I think recv(Socket, 0) will only read one chunk from the socket, then return. That chunk may be the entire message, or it might be only a piece of the message. To guarantee that you've read the entire message from the socket, I think you have to loop over the recv():
get_msg(Socket, Chunks) ->
    {ok, Chunk} = gen_tcp:recv(Socket, 0),
    get_msg(Socket, [Chunk|Chunks]).
Then the question becomes: how do you know when you've read the entire message so that you can end the loop? {packet,0} tells Erlang not to prepend a length header to a message, so how do you know where the end of the message is? Are more chunks coming, or did the recv() already read the last chunk? I think the marker for the end of the message is when the other side closes the socket:
get_msg(Socket, Chunks) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Chunk} ->
            get_msg(Socket, [Chunk|Chunks]);
        {error, closed} ->
            lists:reverse(Chunks);
        {error, Other} ->
            Other
    end.
But that raises another issue: if the chatserver is looping on a recv() waiting for a message from the client, and after the client sends a message to the server the client loops on a recv() waiting for a message from the server, and both sides need the other side to close the socket to break out of their recv() loops, then you will get a deadlock, because neither side is closing its socket. As a result, the client will have to close the socket in order for the chatserver to break out of its recv() loop and process the message. But then the server can't send() anything back to the client, because the client closed the socket. As a result, I don't know if you can do two-way communication when {packet,0} is specified.
Here are my conclusions about {packet, N} and {active, true|false} from reading the docs and searching around:
send():
When you call send(), the data is queued in the OS socket buffers and send() normally returns without waiting for the destination to call recv(). Only when those buffers are full (for example, because the destination isn't reading) does send() block, until a recv() on the other side drains some of the data.
In "Programming Erlang (2nd)", on p. 176 it says that a small amount of data will be pushed to the destination when you call send() due to the way an OS buffers data, and thereafter send() will block until a recv() pulls data to the destination.
Default options:
You can get the defaults for a socket by specifying an empty list for its options, then doing:
Defaults = inet:getopts(Socket, [mode, active, packet]),
io:format("Default options: ~w~n", [Defaults]).
--output:--
Default options: {ok,[{mode,list},{active,true},{packet,0}]}
You can use inet:getopts() to show that gen_tcp:accept(Socket) returns a socket with the same options as Socket.
                  {active, true}   {active, false}
                 +----------------+----------------+
{packet, 1|2|4}: | receive        | recv()         |
                 | no loop        | no loop        |
                 +----------------+----------------+
{packet, 0|raw}: | receive        | recv()         |
(equivalent)     | loop           | loop           |
                 +----------------+----------------+
{active, false}
Messages do not land in the mailbox. This option is used to prevent clients from flooding a server's mailbox with messages. Do not try to use a receive block to extract 'tcp' messages from the mailbox--there won't be any. When a process wants to read a message, the process needs to read the message directly from the socket by calling recv().
{packet, 1|2|4}:
The packet tuple specifies the protocol that each side expects messages to conform to. {packet, 2} specifies that each message will be preceded by two bytes, which will contain the length of the message. That way, a receiver of a message will know how long to keep reading from the stream of bytes to reach the end of the message. When you send a message over a TCP connection, you have no idea how many chunks the message will get split into. If the receiver stops reading after one chunk, it might not have read the whole message. Therefore, the receiver needs an indicator to tell it when the whole message has been read.
With {packet, 2}, a receiver will read two bytes to get the length of the message, say 100, then the receiver will wait until it has read 100 bytes from the randomly sized chunks of bytes that are streaming to the receiver.
Note that when you call send(), Erlang automatically calculates the number of bytes in the message, writes that length into an N-byte header (as specified by {packet, N}), and sends the header followed by the message. Likewise, when you call recv(), Erlang automatically reads the N-byte header from the stream to get the length of the message, then blocks until it has read that many bytes from the socket, and returns the whole message with the header stripped off.
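To make the header-then-body reading concrete, here is a hypothetical pure-Erlang sketch of what {packet, 2} framing amounts to: frame/1 builds the bytes put on the wire, and split/1 cuts an arbitrarily chunked byte stream back into messages, carrying an incomplete tail over to the next chunk. It only illustrates the wire format; when {packet, 2} is set on the socket, Erlang does this for you.

```erlang
-module(packet2).
-export([frame/1, split/1]).

%% The bytes sent for Msg under {packet, 2}: a 2-byte big-endian
%% length header followed by the message itself.
frame(Msg) when byte_size(Msg) < 65536 ->
    <<(byte_size(Msg)):16, Msg/binary>>.

%% Splits a buffer of raw stream bytes into complete messages.
%% Returns {Messages, Rest}; Rest is an incomplete trailing message
%% (possibly just part of a header) to prepend to the next chunk.
split(Buffer) ->
    split(Buffer, []).

split(<<Len:16, Msg:Len/binary, Rest/binary>>, Acc) ->
    split(Rest, [Msg | Acc]);
split(Incomplete, Acc) ->
    {lists:reverse(Acc), Incomplete}.
```

Feeding split/1 one chunk at a time and prepending the returned Rest to the next chunk reproduces the "wait until Length bytes have arrived" behaviour described above.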
{packet, 0 | raw} (equivalent):
When {packet, 0} is specified, recv() will read the number of bytes specified by its Length argument. If Length is 0, the gen_tcp:recv/2 documentation says that all available bytes are returned, which can be an arbitrary number of bytes. As a result, the combination of {packet, 0} and recv(Socket, 0) requires that you create a loop to read all the chunks of a message, and the indicator for recv() to stop reading because it has reached the end of the message will be when the other side closes the socket:
get_msg(Socket, Chunks) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Chunk} ->
            get_msg(Socket, [Chunk|Chunks]);
        {error, closed} ->
            lists:reverse(Chunks);
        {error, Other} ->
            Other
    end.
Note that a sender cannot simply call gen_tcp:close(Socket) to signal that it is done sending data (see the description of gen_tcp:close/1 in the docs). Instead, a sender has to signal that it is done sending data by calling gen_tcp:shutdown/2.
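gen_tcp:shutdown/2 also suggests a way around the deadlock described earlier: the client can close only its sending side and keep reading the reply. A minimal sketch, assuming binary passive sockets with {packet, 0}; the module and function names are hypothetical. One detail from the inet:setopts/2 docs matters here: the server's socket needs {exit_on_close, false}, otherwise it is closed as soon as recv reports {error, closed} and the reply can't be sent.

```erlang
-module(half_close).
-export([recv_all/2, request/3]).

%% Reads from a passive socket until the peer closes (or half-closes)
%% its sending side, then returns everything received as one binary.
recv_all(Socket, Acc) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Chunk} -> recv_all(Socket, [Acc, Chunk]);
        {error, closed} -> {ok, iolist_to_binary(Acc)};
        {error, Reason} -> {error, Reason}
    end.

%% Client side: send the whole request, then shutdown(write) so the
%% server's recv_all/2 sees {error, closed}. The socket stays readable,
%% so the client can still collect the server's reply.
request(Host, Port, Request) ->
    {ok, Socket} = gen_tcp:connect(Host, Port, [binary, {active, false}, {packet, 0}]),
    ok = gen_tcp:send(Socket, Request),
    ok = gen_tcp:shutdown(Socket, write),
    Reply = recv_all(Socket, []),
    ok = gen_tcp:close(Socket),
    Reply.
```

On the server, after gen_tcp:accept/1, call inet:setopts(Socket, [{exit_on_close, false}]), read the request with recv_all/2, send the reply, and then close the socket, which in turn ends the client's recv_all/2.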
I think the chatserver is faulty because it specifies {packet, 0} in combination with recv(Socket, 0), yet it does not use a loop for the recv():
client_handler(Sock, Server) ->
    gen_tcp:send(Sock, "Please respond with a sensible name.\r\n"),
    {ok,N} = gen_tcp:recv(Sock,0), %% <**** HERE ****
    case string:tokens(N,"\r\n") of
{active, true}
Messages sent through a TCP (or UDP) connection are automatically read from the socket for you and placed in the controlling process's mailbox. The controlling process is the process that called accept() or the process that called connect(). Instead of calling recv() to read messages directly from the socket, you extract messages from the mailbox with a receive block:
get_msg(Socket) ->
    receive
        {tcp, Socket, Chunk} -> % Socket is already bound, so only data from this socket matches
            Chunk
    end.
{packet, 1|2|4}:
Erlang automatically reads all the chunks of a message from the socket for you and places a complete message (with the length header stripped off) in the mailbox:
get_msg(Socket) ->
    receive
        {tcp, Socket, CompleteMsg} ->
            CompleteMsg;
        {tcp_closed, Socket} ->
            io:format("Server closed socket.~n")
    end.
{packet, 0 | raw} (equivalent):
Messages will not have a length header, so when Erlang reads from the socket, Erlang has no way of knowing when the end of the message has arrived. As a result, Erlang places each chunk it reads from the socket into the mailbox. You need a loop to extract all the chunks from the mailbox, and the other side has to close the socket to signal that no more chunks are coming:
get_msg(ClientSocket, Chunks) ->
    receive
        {tcp, ClientSocket, Chunk} ->
            get_msg(ClientSocket, [Chunk|Chunks]);
        {tcp_closed, ClientSocket} ->
            lists:reverse(Chunks)
    end.
The recv() docs mention something about recv()'s Length argument only being applicable to sockets in raw mode. But because I don't know when a Socket is in raw mode, I don't trust the Length argument. But see here: Erlang gen_tcp:recv(Socket, Length) semantics. Okay, now I'm getting somewhere: from the erlang inet docs:
{packet, PacketType}(TCP/IP sockets)
Defines the type of packets to use for a socket. Possible values:
raw | 0
No packaging is done.
1 | 2 | 4
Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. The header length can be one, two, or four bytes, and containing an unsigned integer in big-endian byte order. Each send operation generates the header, and the header is stripped off on each receive operation.
The 4-byte header is limited to 2Gb [message length].
As the examples at Erlang gen_tcp:recv(Socket, Length) semantics confirm, when {packet,0} is specified, a recv() can specify the Length to read from the TCP stream.

Erlang : gen_server - reply to two clients

As a newbie, I am writing a toy matching (trading) engine using gen_server.
Once a trade/match occurs, there is a need to notify both clients.
The documentation says:
reply(Client, Reply) -> Result
Types:
Client - see below
Reply = term()
Result = term()
This function can be used by a gen_server to explicitly send a reply
to a client that called call/2,3 or multi_call/2,3,4, when the reply
cannot be defined in the return value of Module:handle_call/3.
Client must be the From argument provided to the callback function. Reply is an arbitrary term, which will be given back to
the client as the return value of call/2,3 or multi_call/2,3,4.
The return value Result is not further defined, and should always be
ignored.
Given the above, how is it possible to send a notification to the other client?
SAMPLE SEQUENCE OF ACTIONS
C1 -> Place order IBM,BUY,100,10.55
Server -> Ack C1 for order
C2 -> Place order IBM,SELL,100,10.55
Server -> Ack C2 for order
-> Trade notification to C2
-> Trade notification to C1 %% Can I use gen_server:reply()
%% If yes - How ?
Well, you can't. Your ACK is already a reply, and only a single reply is acceptable under the gen_server:call contract: gen_server:call will only wait for one reply.
Generally, gen_server:reply can be implemented like this:
reply({Pid, Ref}, Result) ->
    Pid ! {Ref, Result}.
That means that if you try sending multiple replies, you just get some weird messages in the mailbox of the calling process.
Proposal
Instead, I believe you should associate every trade with some reference, and send that reference (CX_Ref) to the caller as part of the ACK. Then, when you have to send a notification, you just emit the message {C1_Ref, Payload} to C1 and {C2_Ref, Payload} to C2.
Also you may want to introduce some monitoring to handle broker crashes.
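A minimal sketch of that proposal, not the asker's engine: the single call reply is the ACK carrying a fresh reference, and each party later gets a plain {trade, Ref, Qty} message. To keep it short, the module name, the API and the matching rule (opposite sides, equal quantity, no symbol or price) are all hypothetical, and monitoring is left out.

```erlang
-module(matcher).
-behaviour(gen_server).
-export([start_link/0, place_order/3]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% The ACK is the one reply to gen_server:call: {ok, Ref}. The caller
%% later receives {trade, Ref, Qty} as an ordinary message.
place_order(Server, Side, Qty) when Side =:= buy; Side =:= sell ->
    gen_server:call(Server, {order, Side, Qty, self()}).

%% State: resting orders as {Side, Qty, Pid, Ref}, oldest first.
init([]) ->
    {ok, []}.

handle_call({order, Side, Qty, Pid}, _From, Book) ->
    Ref = make_ref(),
    case take_match(opposite(Side), Qty, Book) of
        {{_, _, OtherPid, OtherRef}, Book1} ->
            %% Notify both parties, each with its own reference.
            OtherPid ! {trade, OtherRef, Qty},
            Pid ! {trade, Ref, Qty},
            {reply, {ok, Ref}, Book1};
        none ->
            %% Appended, so older orders are matched first.
            {reply, {ok, Ref}, Book ++ [{Side, Qty, Pid, Ref}]}
    end.

handle_cast(_Msg, Book) ->
    {noreply, Book}.

opposite(buy) -> sell;
opposite(sell) -> buy.

%% Removes and returns the first resting order with this side and quantity.
take_match(_Side, _Qty, []) ->
    none;
take_match(Side, Qty, [{Side, Qty, _, _} = Order | Rest]) ->
    {Order, Rest};
take_match(Side, Qty, [Order | Rest]) ->
    case take_match(Side, Qty, Rest) of
        {Match, Rest1} -> {Match, [Order | Rest1]};
        none -> none
    end.
```

Sending the trade message before the reply is harmless: gen_server:call waits selectively for its own reply, so the trade message just waits in the caller's mailbox.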

erlang supervisor best way to handle ibrowse:send_req conn_failed

New to Erlang and just having a bit of trouble getting my head around the new paradigm!
OK, so I have this internal function within an OTP gen_server:
my_func() ->
    Result = ibrowse:send_req(?ROOTPAGE, [{"User-Agent", ?USERAGENT}], get),
    case Result of
        {ok, "200", _, Xml} ->
            %% <<do some stuff that won't interest you>>
            ok;
        {error,{conn_failed,{error,nxdomain}}} -> <<what the heck do I do here?>>
    end.
If I leave out the case for handling the connection failed then I get an exit signal propagated to the supervisor and it gets shut down along with the server.
What I want to happen (at least I think this is what I want to happen) is that on a connection failure I'd like to pause and then retry send_req say 10 times and at that point the supervisor can fail.
If I do something ugly like this...
{error,{conn_failed,{error,nxdomain}}} -> stop()
it shuts down the server process and yes, I get to use my (try 10 times within 10 seconds) restart strategy until it fails, which is also the desired result however the return value from the server to the supervisor is 'ok' when I would really like to return {error,error_but_please_dont_fall_over_mr_supervisor}.
I strongly suspect in this scenario that I'm supposed to handle all the business stuff like retrying failed connections within 'my_func' rather than trying to get the process to stop and then having the supervisor restart it in order to try it again.
Question: what is the 'Erlang way' in this scenario ?
I'm new to Erlang too, but how about something like this?
The code is long just because of the comments. My solution (I hope I've understood your question correctly) takes the maximum number of attempts and then makes tail-recursive calls, which stop once the attempt counter pattern-matches the maximum. It uses timer:sleep() to pause, to keep things simple.
%% @doc Instead of having my_func/0, you have
%% my_func/1, so we can "inject" the max number of
%% attempts. This one will call your tail-recursive
%% one.
my_func(MaxAttempts) ->
    my_func(MaxAttempts, 0).
%% @doc This one will match when the maximum number
%% of attempts has been reached; it terminates the
%% tail recursion.
my_func(MaxAttempts, MaxAttempts) ->
    {error, too_many_retries};
%% @doc Here's where we do the work, by having
%% an accumulator that is incremented with each
%% failed attempt.
my_func(MaxAttempts, Counter) ->
    io:format("Attempt #~B~n", [Counter]),
    % Simulating the error here.
    Result = {error,{conn_failed,{error,nxdomain}}},
    case Result of
        {ok, "200", _, Xml} -> ok;
        {error,{conn_failed,{error,nxdomain}}} ->
            % Wait, then tail-recursive call.
            timer:sleep(1000),
            my_func(MaxAttempts, Counter + 1)
    end.
EDIT: If this code is in a process which is supervised, I think it's better to have a simple_one_for_one supervisor, where you can dynamically add whatever workers you need. This is to avoid delaying initialization due to timeouts (in a one_for_one the workers are started in order, and having sleeps at that point will stop the other processes from initializing).
EDIT2: Added an example shell execution:
1> c(my_func).
my_func.erl:26: Warning: variable 'Xml' is unused
{ok,my_func}
2> my_func:my_func(5).
Attempt #0
Attempt #1
Attempt #2
Attempt #3
Attempt #4
{error,too_many_retries}
With 1s delays between each printed message.
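The same idea can be made independent of ibrowse, and therefore testable, by passing the request in as a fun. This is a hypothetical variant of my_func/2 above: Fun() is assumed to return {ok, Value} on success or {error, Reason} on failure, and the delay is a parameter instead of a fixed second.

```erlang
-module(retry).
-export([with_retries/3]).

%% Calls Fun() up to MaxAttempts times, sleeping DelayMs between
%% failed attempts. Returns the first {ok, _} result, or
%% {error, too_many_retries} once all attempts have failed.
with_retries(Fun, MaxAttempts, DelayMs) ->
    with_retries(Fun, MaxAttempts, DelayMs, 0).

%% Counter has reached the maximum: stop, as in my_func/2.
with_retries(_Fun, MaxAttempts, _DelayMs, MaxAttempts) ->
    {error, too_many_retries};
with_retries(Fun, MaxAttempts, DelayMs, Attempt) ->
    case Fun() of
        {ok, _} = Ok ->
            Ok;
        {error, _Reason} ->
            timer:sleep(DelayMs),
            with_retries(Fun, MaxAttempts, DelayMs, Attempt + 1)
    end.
```

In the gen_server, Fun could wrap the ibrowse:send_req/3 call and map its {ok, "200", _, Xml} result to {ok, Xml}; the same remark about sleeping during supervisor start-up applies.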

How to save state in an Erlang process?

I am learning Erlang and trying to figure out how I can, and should, save state inside a process.
For example, I am trying to write a program that, given a list of numbers in a file, tells me whether a number appears in that file. My approach is to use two processes:
cache, which reads the content of the file into a set, then waits for numbers to check, and then replies whether they appear in the set.
is_member_loop(Data_file) ->
    Numbers = read_numbers(Data_file),
    receive
        {From, Number} ->
            From ! {self(), lists:member(Number, Numbers)},
            is_member_loop(Data_file)
    end.
client, which sends numbers to cache and waits for the true or false response.
check_number(NumbersPid, Number) ->
    NumbersPid ! {self(), Number},
    receive
        {NumbersPid, Is_member} ->
            Is_member
    end.
This approach is obviously naive since the file is read for every request. However, I am quite new at Erlang and it is unclear to me what would be the preferred way of keeping state between different requests.
Should I be using the process dictionary? Is there a different mechanism I am not aware of for that sort of process state?
Update
The most obvious solution, as suggested by user601836, is to pass the set of numbers as a param to is_member_loop instead of the filename. It seems to be a common idiom in Erlang and there is a good example in the fantastic online book Learn you some Erlang.
I think, however, that the question still holds for more complex state that I'd want to preserve in my process.
Simple solution: you can pass your function is_member_loop(Data_file) the list of numbers rather than the file name.
The best solution when you deal with a state consists in using a gen_server. To learn more you should take a look at records and gen_server behaviour (this may also be useful).
In practice:
1) start with a module (yourmodule.erl) based on gen_server behaviour
2) read your file in the init function of the gen_server and pass it as state field:
init(Data_file) -> % Data_file is the Args term given to gen_server:start_link
    Numbers = read_numbers(Data_file),
    {ok, #state{numbers=Numbers}}.
3) write a function which will be used to trigger a call to the gen_server
check_number(Number) ->
    gen_server:call(?MODULE, {check_number, Number}).
4) write the code in order to handle messages triggered from your function
handle_call({check_number, Number}, _From, #state{numbers=Numbers} = State) ->
    Reply = lists:member(Number, Numbers),
    {reply, Reply, State};
handle_call(_Request, _From, State) ->
    Reply = ok,
    {reply, Reply, State}.
5) export from yourmodule.erl function check_number
-export([check_number/1]).
Two things to be explained about point 4:
a) we extract values inside the record State using pattern matching
b) As you can see, I left the generic handle_call clause; otherwise your gen_server would crash with a failed pattern match whenever a message different from {check_number, Number} is received
Note: if you are new to erlang, don't use process dictionary
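Putting steps 1 to 5 together, here is a minimal self-contained module. To keep the sketch runnable, it assumes the numbers are passed to start_link/1 directly instead of being read by read_numbers(Data_file), which isn't shown, and it stores them in a set (as the question intended) rather than a list; number_cache is a hypothetical name.

```erlang
-module(number_cache).
-behaviour(gen_server).
-export([start_link/1, check_number/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-record(state, {numbers}).

%% Numbers are passed in here; in the answer they come from a file.
start_link(Numbers) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Numbers, []).

check_number(Number) ->
    gen_server:call(?MODULE, {check_number, Number}).

%% The set is built once and then kept as the server's state.
init(Numbers) ->
    {ok, #state{numbers = sets:from_list(Numbers)}}.

handle_call({check_number, Number}, _From, #state{numbers = Numbers} = State) ->
    {reply, sets:is_element(Number, Numbers), State};
handle_call(_Request, _From, State) ->
    %% Catch-all clause, as explained in point b) above.
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

The state survives between requests because gen_server passes the returned State into the next callback, which is the same idea as passing Numbers as a loop parameter in the simple solution.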
Not sure how idiomatic this is, since I'm not exactly an Erlang pro yet, but I'd handle this by using ETS. Basically,
read_numbers_to_ets(DataFile) ->
    Table = ets:new(numbers, [ordered_set]),
    insert_numbers(Table, DataFile),
    Table.
insert_numbers(Table, DataFile) ->
    case read_next_number(DataFile) of
        eof -> ok;
        Num ->
            ets:insert(Table, {Num}),        % the table is unnamed, so use its id
            insert_numbers(Table, DataFile)  % keep reading until eof
    end.
you could then define your is_member as
is_member(TableId, Number) ->
    case ets:match(TableId, {Number}) of
        [] -> false;  %% no match from ets
        [[]] -> true  %% ets found the number you're looking for in that table
    end.
Instead of taking a Data_file, your is_member_loop would take the id of the table to do a lookup on.
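A runnable version of the ETS idea, assuming the numbers are already in a list (read_next_number/1 isn't shown) and using ets:member/2, which answers the yes/no question directly instead of building a match result; number_table is a hypothetical name.

```erlang
-module(number_table).
-export([from_list/1, is_member/2]).

%% Creates an unnamed ordered_set table holding one {N} tuple per number.
%% The table is owned by (and dies with) the calling process; with the
%% default protected access, other processes can still read it.
from_list(Numbers) ->
    Table = ets:new(numbers, [ordered_set]),
    true = ets:insert(Table, [{N} || N <- Numbers]),
    Table.

%% true if a tuple with key Number is in the table.
is_member(Table, Number) ->
    ets:member(Table, Number).
```

Passing the table id to is_member_loop/1 instead of the file name gives the loop read access to data that was loaded only once.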

Rate-limited event handler in erlang/OTP

I have a data source that produces points at a potentially high rate, and I'd like to perform a possibly time-consuming operation on each point; but I would also like the system to degrade gracefully when it becomes overloaded, by dropping excess data points.
As far as I can tell, using a gen_event will never skip events. Conceptually, what I would like the gen_event to do is to drop all but the latest pending events before running the handlers again.
Is there a way to do this with standard OTP? Or is there a good reason why I should not handle things that way?
So far the best I have is using a gen_server and relying on the timeout to trigger the expensive events:
-behaviour(gen_server).
init(_Args) ->
    {ok, Pid} = gen_event:start_link(),
    {ok, {Pid, none}}.
handle_call({add, H, A}, _From, {Pid, Data}) ->
    {reply, gen_event:add_handler(Pid, H, A), {Pid, Data}}.
handle_cast(Data, {Pid, _OldData}) ->
    {noreply, {Pid, Data}, 0}. % set timeout to 0
handle_info(timeout, {Pid, Data}) ->
    gen_event:sync_notify(Pid, Data),
    {noreply, {Pid, Data}}.
Is this approach correct? (Especially with respect to supervision?)
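The "drop all but the latest pending event" behaviour can also be sketched with a plain receive loop, as an alternative to the gen_server timeout trick: after taking one event, the process empties any further events already queued and keeps only the newest. This is a minimal sketch with hypothetical names ({event, Data} messages, a Handler fun standing in for the expensive operation), not an OTP facility.

```erlang
-module(latest_only).
-export([loop/1, drain/1]).

%% Discards any {event, _} messages already waiting in the mailbox and
%% returns the newest Data. `after 0` returns as soon as none are left.
drain(Latest) ->
    receive
        {event, Newer} -> drain(Newer)
    after 0 ->
        Latest
    end.

%% Worker loop: waits for an event, skips to the newest one queued,
%% then runs the (possibly slow) Handler on it.
loop(Handler) ->
    receive
        {event, Data} ->
            Handler(drain(Data)),
            loop(Handler)
    end.
```

Events that arrive while Handler is running are not lost outright: they are collapsed into one on the next iteration, which is the graceful degradation the question asks for.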
I can't comment on supervision, but I would implement this as a queue with expiring items.
I've implemented something that you can use below.
I made it a gen_server; when you create it you give it a maximum age for old items.
Its interface is that you can send it items to be processed, and you can request items that have not been dequeued. It records the time at which it receives every item. Every time it receives an item to be processed, it checks all the items in the queue, dequeueing and discarding those that are older than the maximum age. (If you want the maximum age to be always respected, you can filter the queue before you return queued items.)
Your data source will cast data ({process_this, Anything}) to the work queue, and your (potentially slow) consumer process will call gimme to get data.
-module(work_queue).
-behavior(gen_server).
-export([init/1, handle_cast/2, handle_call/3]).
init(DiscardAfter) ->
    {ok, {DiscardAfter, queue:new()}}.
handle_cast({process_this, Data}, {DiscardAfter, Queue0}) ->
    %% Monotonic time in microseconds (the same unit as DiscardAfter);
    %% unlike the deprecated now(), it never jumps backwards.
    Instant = erlang:monotonic_time(microsecond),
    Queue1 = queue:filter(fun({Stamp, _}) -> not too_old(Stamp, Instant, DiscardAfter) end, Queue0),
    Queue2 = queue:in({Instant, Data}, Queue1),
    {noreply, {DiscardAfter, Queue2}}.
handle_call(gimme, _From, State = {DiscardAfter, Queue0}) ->
    case queue:is_empty(Queue0) of
        true ->
            {reply, no_data, State};
        false ->
            {{value, {_Stamp, Data}}, Queue1} = queue:out(Queue0),
            {reply, {data, Data}, {DiscardAfter, Queue1}}
    end.
too_old(Stamp, Instant, DiscardAfter) ->
    Instant - Stamp > DiscardAfter.
Little demo at the REPL:
c(work_queue).
{ok, PidSrv} = gen_server:start(work_queue, 10 * 1000000, []).
gen_server:cast(PidSrv, {process_this, <<"going_to_go_stale">>}),
timer:sleep(11 * 1000),
gen_server:cast(PidSrv, {process_this, <<"going to push out previous">>}),
{gen_server:call(PidSrv, gimme), gen_server:call(PidSrv, gimme)}.
Is there a way to do this with standard OTP?
No.
Is there a good reason why I should not handle things that way?
No, timing out early can increase the performance of the entire system. Read about how here.
Is this approach correct? (Especially with respect to supervision?)
No idea, you haven't provided the supervision code.
As a bit of extra information to your first question:
If you can use 3rd party libraries outside of OTP, there are a few out there that can add preemptive timeouts, which is what you are describing.
There are two that I am familiar with: the first is dispcount, and the second is chick (I'm the author of chick; I'll try not to advertise the project here).
Dispcount works really well for single resources that only have a limited number of jobs that can be run at the same time, and it does no queuing. You can read about it here (warning: lots of really interesting information!).
Dispcount didn't work for me because I would have had to spawn 4000+ pools of processes to handle the number of different queues inside my app. I wrote chick because I needed a way to dynamically increase and decrease my queue length, as well as being able to queue up requests and deny others, without having to spawn 4000+ pools of processes.
If I were you, I would try out dispcount first (as most solutions do not need chick), and then, if you need something a bit more dynamic than a pool that can respond to a certain number of requests, try out chick.
