My scenario is as follows -
I have a client C with function foo() which performs some computation.
I'd like a server S, which doesn't know about foo(), to execute this function instead, and send the result back to the client.
I am trying to determine the best way to perform this in Erlang. I am considering:
Hot code swapping - i.e. "upgrade" code in S such that it has the function foo(). Execute and send back to the client.
In a distributed manner where nodes are all appropriately registered, do something along the lines of S ! C:foo() - for the purpose of "sending" the function to process/node S
Are there other methods (or features of the language) that I am not thinking of?
Thanks for the help!
If the computation function is self contained i.e. does not depend on any other modules or functions on the client C, then what you need to do is a fun (Functional Objects). A fun can be sent across the network and applied by a remote machine and in side the fun, the sender has embedded their address and a way of getting the answer back. So the executor may only see a fun to which they may or may not give an argument, yet inside the fun, the sender has forced a method where by the answer will automatically be sent back. The fun is an abstraction of very many tasks within one thing, and it can be moved around as arguments.
At the client, you can have code like this:
%% somewhere in the client
%% client runs on node() == 'client#domain.com'
-module(client).
-compile(export_all).
-define(SERVER,{server,'server#domain.com'}).
give_a_server_a_job(Number)-> ?SERVER ! {build_fun(),Number}.
build_fun()->
FunObject = fun(Param)->
Answer = Param * 20/1000, %% computation here
rpc:call('client#domain.com',client,answer_ready,[Answer])
end,
FunObject.
answer_ready(Answer)->
%%% use Answer for all sorts of funny things....
io:format("\n\tAnswer is here: ~p~n",[Answer]).
The server then has code like this:
%%% somewhere on the server
%%% server runs on node() == 'server#domain.com'
-module(server).
-compile(export_all).
start()-> register(server,spawn(?MODULE,loop,[])).
loop()->
receive
{Fun,Arg} ->
Fun(Arg), %% server executes job
%% job automatically sends answer back
%% to client
loop();
stop -> exit(normal);
_ -> loop()
end.
In this way, the job executor needs not know about how to send back the reply, The job itself comes knowing how it will send back the answer to however sent the job!. I have used this method of sending functional objects across the network in several project, its so cool !!!
#### EDIT #####
If you have a recursive problem, You manipulate recursion using funs. However, you will need at least one library function at the client and/or the server to assist in recursive manipulations. Create a function which should be in the code path of the client as well as the server.
Another option is to dynamically send code from the server to the client and then using the library: Dynamic Compile erlang to load and execute erlang code at the server from the client. Using dynamic compile, here is an example:
1> String = "-module(add).\n -export([add/2]). \n add(A,B) -> A + B. \n".
"-module(add).\n -export([add/2]). \n add(A,B) -> A + B. \n"
2> dynamic_compile:load_from_string(String).
{module,add}
3> add:add(2,5).
7
4>
What we see above is a piece of module code that is compiled and loaded dynamically from a string. If the library enabling this is available at the server and client , then each entity can send code as a string and its loaded and executed dynamically at the other. This code can be unloaded after use. Lets look at the Fibonacci function and how it can be sent and executed at the server:
%% This is the normal Fibonacci code which we are to convert into a string:
-module(fib).
-export([fib/1]).
fib(N) when N == 0 -> 0;
fib(N) when (N < 3) and (N > 0) -> 1;
fib(N) when N > 0 -> fib(N-1) + fib(N-2).
%% In String format, this would now become this piece of code
StringCode = " -module(fib).\n -export([fib/1]). \nfib(N) when N == 0 -> 0;\n fib(N) when (N < 3) and (N > 0) -> 1;\n fib(N) when N > 0 -> fib(N-1) + fib(N-2). \n".
%% Then the client would send this string above to the server and the server would %% dynamically load the code and execute it
send_fib_code(Arg)->
{ServerRegName,ServerNode} ! {string,StringCode,fib,Arg},
ok.
get_answer({fib,of,This,is,That}) ->
io:format("Fibonacci (from server) of ~p is: ~p~n",[This,That]).
%%% At Server
loop(ServerState)->
receive
{string,StringCode,Fib,Arg} when Fib == fib ->
try dynamic_compile:load_from_string(StringCode) of
{module,AnyMod} ->
Answer = AnyMod:fib(Arg),
%%% send answer back to client
%%% should be asynchronously
%%% as the channels are different & not make
%% client wait
rpc:call('client#domain.com',client,get_answer,[{fib,of,Arg,is,Answer}])
catch
_:_ -> error_logger:error_report(["Failed to Dynamic Compile & Load Module from client"])
end,
loop(ServerState);
_ -> loop(ServerState)
end.
That piece of rough code can show you what am trying to say. However, you remember to unload all un-usable dynamic modules. Also you can a have a way in which the server tries to check wether such a module was loaded already before loading it again. I advise that you donot copy and paste the above code. Look at it and understand it and then write your own version that can do the job. success !!!
If you do S ! C:foo() it will compute on client side function foo/1 from module C and send its result to process S. It doesn't seem like what you want to do. You should do something like:
% In client
call(S, M, F, A) ->
S ! {do, {M, F, A}, self()},
receive
{ok, V} -> V
end.
% In server
loop() ->
receive
{do, {M, F, A}, C} ->
C ! {ok, apply(M, F, A)},
loop()
end.
But in real scenario you would have to do a lot more work e.g. mark your client message to perform selective receive (make_ref/0), catch error in server and send it back to client, monitor server from client to catch server down, add some timeout and so. Look how are gen_server:call/2 and rpc:call/4,5 implemented and it is reason why there is OTP to save you from most of gotcha.
Related
I am learning Erlang from a Ruby background and having some difficulty grasping the thought process. The problem I am trying to solve is the following:
I need to make the same request to an api, each time I receive a unique ID in the response which I need to pass into the next request until there is not ID returned. From each response I need to extract certain data and use it for other things as well.
First get the iterator:
ShardIteratorResponse = kinetic:get_shard_iterator(GetShardIteratorPayload).
{ok,[{<<"ShardIterator">>,
<<"AAAAAAAAAAGU+v0fDvpmu/02z5Q5OJZhPo/tU7fjftFF/H9M7J9niRJB8MIZiB9E1ntZGL90dIj3TW6MUWMUX67NEj4GO89D"...>>}]}
Parse out the shard_iterator..
{_, [{_, ShardIterator}]} = ShardIteratorResponse.
Make the request to kinesis for the streams records...
GetRecordsPayload = [{<<"ShardIterator">>, <<ShardIterator/binary>>}].
[{<<"ShardIterator">>,
<<"AAAAAAAAAAGU+v0fDvpmu/02z5Q5OJZhPo/tU7fjftFF/H9M7J9niRJB8MIZiB9E1ntZGL90dIj3TW6MUWMUX67NEj4GO89DETABlwVV"...>>}]
14> RecordsResponse = kinetic:get_records(GetRecordsPayload).
{ok,[{<<"NextShardIterator">>,
<<"AAAAAAAAAAFy3dnTJYkWr3gq0CGo3hkj1t47ccUS10f5nADQXWkBZaJvVgTMcY+nZ9p4AZCdUYVmr3dmygWjcMdugHLQEg6x"...>>},
{<<"Records">>,
[{[{<<"Data">>,<<"Zmlyc3QgcmVjb3JkISEh">>},
{<<"PartitionKey">>,<<"BlanePartitionKey">>},
{<<"SequenceNumber">>,
<<"49545722516689138064543799042897648239478878787235479554">>}]}]}]}
What I am struggling with is how do I write a loop that keeps hitting the kinesis endpoint for that stream until there are no more shard iterators, aka I want all records. Since I can't re-assign the variables as I would in Ruby.
WARNING: My code might be bugged but it's "close". I've never ran it and don't see how last iterator can look like.
I see you are trying to do your job entirely in shell. It's possible but hard. You can use named function and recursion (since release 17.0 it's easier), for example:
F = fun (ShardIteratorPayload) ->
{_, [{_, ShardIterator}]} = kinetic:get_shard_iterator(ShardIteratorPayload),
FunLoop =
fun Loop(<<>>, Accumulator) -> % no clue how last iterator can look like
lists:reverse(Accumulator);
Loop(ShardIterator, Accumulator) ->
{ok, [{_, NextShardIterator}, {<<"Records">>, Records}]} =
kinetic:get_records([{<<"ShardIterator">>, <<ShardIterator/binary>>}]),
Loop(NextShardIterator, [Records | Accumulator])
end,
FunLoop(ShardIterator, [])
end.
AllRecords = F(GetShardIteratorPayload).
But it's too complicated to type in shell...
It's much easier to code it in modules.
A common pattern in erlang is to spawn another process or processes to fetch your data. To keep it simple you can spawn another process by calling spawn or spawn_link but don't bother with links now and use just spawn/3.
Let's compile simple consumer module:
-module(kinetic_simple_consumer).
-export([start/1]).
start(GetShardIteratorPayload) ->
Pid = spawn(kinetic_simple_fetcher, start, [self(), GetShardIteratorPayload]),
consumer_loop(Pid).
consumer_loop(FetcherPid) ->
receive
{FetcherPid, finished} ->
ok;
{FetcherPid, {records, Records}} ->
consume(Records),
consumer_loop(FetcherPid);
UnexpectedMsg ->
io:format("DROPPING:~n~p~n", [UnexpectedMsg]),
consumer_loop(FetcherPid)
end.
consume(Records) ->
io:format("RECEIVED:~n~p~n",[Records]).
And fetcher:
-module(kinetic_simple_fetcher).
-export([start/2]).
start(ConsumerPid, GetShardIteratorPayload) ->
{ok, [ShardIterator]} = kinetic:get_shard_iterator(GetShardIteratorPayload),
fetcher_loop(ConsumerPid, ShardIterator).
fetcher_loop(ConsumerPid, {_, <<>>}) -> % no clue how last iterator can look like
ConsumerPid ! {self(), finished};
fetcher_loop(ConsumerPid, ShardIterator) ->
{ok, [NextShardIterator, {<<"Records">>, Records}]} =
kinetic:get_records(shard_iterator(ShardIterator)),
ConsumerPid ! {self(), {records, Records}},
fetcher_loop(ConsumerPid, NextShardIterator).
shard_iterator({_, ShardIterator}) ->
[{<<"ShardIterator">>, <<ShardIterator/binary>>}].
As you can see both processes can do their job concurrently.
Try from your shell:
kinetic_simple_consumer:start(GetShardIteratorPayload).
Now your see that your shell process turns to consumer and you will have your shell back after fetcher will send {ItsPid, finished}.
Next time instead of
kinetic_simple_consumer:start(GetShardIteratorPayload).
run:
spawn(kinetic_simple_consumer, start, [GetShardIteratorPayload]).
You should play with spawning processes - it's erlang main strength.
In Erlang, you can write loop using tail recursive functions. I don't know the kinetic API, so for simplicity, I just assume, that kinetic:next_iterator/1 return {ok, NextIterator} or {error, Reason} when there are no more shards.
loop({error, Reason}) ->
ok;
loop({ok, Iterator}) ->
do_something_with(Iterator),
Result = kinetic:next_iterator(Iterator),
loop(Result).
You are replacing loop with iteration. First clause deals with case, where there are no more shards left (always start recursion with the end condition). Second clause deals with case, where we got some iterator, we do something with it and call next.
The recursive call is last instruction in the function body, which is called tail recursion. Erlang optimizes such calls - they don't use call stack, so they can run infinitely in constant memory (you will not get anything like "Stack level too deep")
Hi I'm a newbie in Erlang and I just started learning about processes. Here I have a typical process loop:
loop(X,Y,Z) ->
receive
{do} ->
NewX = X+1,
NewY = Y+1,
NewZ = Z+1,
Product = NewX * NewY * NewZ,
% do something
loop(NewX,NewY,NewZ)
end.
How do I get the latest value of Product from a function let's say get_product()? I know that message passing will be the logical option but is there a more optimal way of extracting the value?
Here are methods to communicate between Erlang processes I am aware of, and my (possibly wrong) assessment of theirs relative performance.
Message passing. This method will suit most of your needs. I don't know how it is actually implemented, but from my point of view it should be as fast as putting a pointer into a queue and retrieving it back.
Exterior methods, e.g. sockets, files, pipes. These methods might be faster for communicating between different nodes, depending on a problem you solve, your solution and environment your program will be executed in. Inter-node communication in Erlang is done via TCP connections, so if you want to use self written code to communicate via TCP sockets, you should try really hard to outperform Erlang's implementation.
ETS, Dets. These methods won't be faster than message passing (ETS) or file (Dets) assuming best possible implementation.
NIF. You can write one method to save value in your NIF library and one to retrieve it. This one has a potential to outperform message passing since you can just save a value into a variable and return it back when needed and it has no overhead on pattern matching in receive.
Process dictionary. You can get another process dictionary using erlang:process_info(Pid, dictionary) call, in the Pid process you can put value in that dictionary using put(Key, Value) call.
Also, if you want to speed up your Erlang application take a look at HiPE, it might help.
Before switching from message passing to anything from this list to gain in speed you should measure it first!
I assumed this is what you want:
-module(lab).
-compile(export_all).
start() ->
InitialState = {1,1,1},
Pid = spawn(?MODULE, loop, [InitialState]),
register(server, Pid).
loop(State) ->
{X, Y, Z} = State,
receive
tick ->
NewX = X+1,
NewY = Y+1,
NewZ = Z+1,
NewState = {NewX, NewY, NewZ},
loop(NewState);
{get_product, From} ->
Product = X * Y * Z,
From ! Product,
loop(State);
_ ->
io:format("Unknown message received.~n"),
loop(State)
end.
get_product() ->
server ! {get_product, self()},
receive
Product ->
Product
end.
tick() ->
server ! tick.
From within the Erlang shell:
1> c(lab).
{ok,lab}
2> lab:start().
true
3> lab:get_product().
1
4> lab:tick().
tick
5> lab:get_product().
8
6> lab:tick().
tick
7> lab:tick().
tick
8> lab:get_product().
64
In most applications, its hard to avoid the need to query large amounts of information which a user wants to browse through. This is what led me to cursors. With mnesia, cursors are implemented using qlc:cursor/1 or qlc:cursor/2. After working with them for a while and facing this problem many times,
11> qlc:next_answers(QC,3).
** exception error: {qlc_cursor_pid_no_longer_exists,<0.59.0>}
in function qlc:next_loop/3 (qlc.erl, line 1359)
12>
It has occured to me that the whole cursor thing has to be within one mnesia transaction: executes as a whole once. like this below
E:\>erl
Eshell V5.9 (abort with ^G)
1> mnesia:start().
ok
2> rd(obj,{key,value}).
obj
3> mnesia:create_table(obj,[{attributes,record_info(fields,obj)}]).
{atomic,ok}
4> Write = fun(Obj) -> mnesia:transaction(fun() -> mnesia:write(Obj) end) end.
#Fun<erl_eval.6.111823515>
5> [Write(#obj{key = N,value = N * 2}) || N <- lists:seq(1,100)],ok.
ok
6> mnesia:transaction(fun() ->
QC = cursor_server:cursor(qlc:q([XX || XX <- mnesia:table(obj)])),
Ans = qlc:next_answers(QC,3),
io:format("\n\tAns: ~p~n",[Ans])
end).
Ans: [{obj,20,40},{obj,21,42},{obj,86,172}]
{atomic,ok}
7>
When you attempt to call say: qlc:next_answers/2 outside a mnesia transaction, you will get an exception. Not only just out of the transaction, but even if that method is executed by a DIFFERENT process than the one which created the cursor, problems are bound to happen.
Another intresting finding is that, as soon as you get out of a mnesia transaction, one of the processes which are involved in a mnesia cursor (apparently mnesia spawns a process in the background), exits, causing the cursor to be invalid. Look at this below:
-module(cursor_server).
-compile(export_all).
cursor(Q)->
case mnesia:is_transaction() of
false ->
F = fun(QH)-> qlc:cursor(QH,[]) end,
mnesia:activity(transaction,F,[Q],mnesia_frag);
true -> qlc:cursor(Q,[])
end.
%% --- End of module -------------------------------------------
Then in shell, i use that method:
7> QC = cursor_server:cursor(qlc:q([XX || XX <- mnesia:table(obj)])).
{qlc_cursor,{<0.59.0>,<0.30.0>}}
8> erlang:is_process_alive(list_to_pid("<0.59.0>")).
false
9> erlang:is_process_alive(list_to_pid("<0.30.0>")).
true
10> self().
<0.30.0>
11> qlc:next_answers(QC,3).
** exception error: {qlc_cursor_pid_no_longer_exists,<0.59.0>}
in function qlc:next_loop/3 (qlc.erl, line 1359)
12>
So, this makes it very Extremely hard to build a web application in which a user needs to browse a particular set of results, group by group say: give him/her the first 20, then next 20 e.t.c. This involves, getting the first results, send them to the web page, then wait for the user to click NEXT then ask qlc:cursor/2 for the next 20 and so on. These operations cannot be done, while hanging inside a mnesia transaction !!! The only possible way, is by spawning a process which will hang there, receiving and sending back next answers as messages and receiving the next_answers requests as messages like this:
-define(CURSOR_TIMEOUT,timer:hours(1)).
%% initial request is made here below
request(PageSize)->
Me = self(),
CursorPid = spawn(?MODULE,cursor_pid,[Me,PageSize]),
receive
{initial_answers,Ans} ->
%% find a way of hidding the Cursor Pid
%% in the page so that the subsequent requests
%% come along with it
{Ans,pid_to_list(CursorPid)}
after ?CURSOR_TIMEOUT -> timedout
end.
cursor_pid(ParentPid,PageSize)->
F = fun(Pid,N)->
QC = cursor_server:cursor(qlc:q([XX || XX <- mnesia:table(obj)])),
Ans = qlc:next_answers(QC,N),
Pid ! {initial_answers,Ans},
receive
{From,{next_answers,Num}} ->
From ! {next_answers,qlc:next_answers(QC,Num)},
%% Problem here ! how to loop back
%% check: Erlang Y-Combinator
delete ->
%% it could have died already, so we be careful here !
try qlc:delete_cursor(QC) of
_ -> ok
catch
_:_ -> ok
end,
exit(normal)
after ?CURSOR_TIMEOUT -> exit(normal)
end
end,
mnesia:activity(transaction,F,[ParentPid,PageSize],mnesia_frag).
next_answers(CursorPid,PageSize)->
list_to_pid(CursorPid) ! {self(),{next_answers,PageSize}},
receive
{next_answers,Ans} ->
{Ans,pid_to_list(CursorPid)}
after ?CURSOR_TIMEOUT -> timedout
end.
That would create a more complex problem of managing process exits, tracking / monitoring e.t.c. I wonder why the mnesia implementers didnot see this !
Now, that brings me to my questions. I have been walking around the web for solutions and you could please check out these links from which the questions arise: mnemosyne, Ulf Wiger's Solution to Cursor Problems, AMNESIA - an RDBMS implementation of mnesia.
1. Does anyone have an idea on how to handle mnesia query cursors in a different way from what is documented, and is worth sharing ?
2. What are the reasons why mnesia implemeters decided to force the cursors within a single transaction: even the calls for the next_answers ?
3. Is there anything, from what i have presented, that i do not understand clearly (other than my bad buggy illustration code - please ignore those) ?
4. AMNESIA (on section 4.7 of the link i gave above), has a good implementation of cursors, because the subsequent calls for the next_answers, do not need to be in the same transaction, NOR by the same process. Would you advise anyone to switch from mnesia to amnesia due to this and also, is this library still supported ?
5. Ulf Wiger , (the author of many erlang libraries esp. GPROC), suggests the use of mnesia:select/4. How would i use it to solve cursor problems in a web application ? NOTE: Please do not advise me to leave mnesia and use something else, because i want to use mnesia for this specific problem. I appreciate your time to read through all this question.
The motivation is that transaction grabs (in your case) read locks.
Locks can not be kept outside of transactions.
If you want, you can run it in a dirty_context, but you loose the
transactional properties, i.e. the table may change between invocations.
make_cursor() ->
QD = qlc:sort(mnesia:table(person, [{traverse, select}])),
mnesia:activity(async_dirty, fun() -> qlc:cursor(QD) end, mnesia_frag).
get_next(Cursor) ->
Get = fun() -> qlc:next_answers(Cursor,5) end,
mnesia:activity(async_dirty, Get, mnesia_frag).
del_cursor(Cursor) ->
qlc:delete_cursor(Cursor).
I think this may help you :
use async_dirty instead of transaction
{Record,Cont}=mnesia:activity(async_dirty, fun mnesia:select/4,[md,[{Match_head,[Guard],[Result]}],Limit,read])
then read next Limit number of records:
mnesia:activity(async_dirty, fun mnesia:select/1,[Cont])
full code:
-record(md,{id,name}).
batch_delete(Id,Limit) ->
Match_head = #md{id='$1',name='$2'},
Guard = {'<','$1',Id},
Result = '$_',
{Record,Cont} = mnesia:activity(async_dirty, fun mnesia:select/4,[md,[{Match_head,[Guard],[Result]}],Limit,read]),
delete_next({Record,Cont}).
delete_next('$end_of_table') ->
over;
delete_next({Record,Cont}) ->
delete(Record),
delete_next(mnesia:activity(async_dirty, fun mnesia:select/1,[Cont])).
delete(Records) ->
io:format("delete(~p)~n",[Records]),
F = fun() ->
[ mnesia:delete_object(O) || O <- Records]
end,
mnesia:transaction(F).
remember you can not use cursor out of one transaction
I am learning Erlang and trying to figure out how I can, and should, save state inside a process.
For example, I am trying to write a program that given a list of numbers in a file, tells me whether a number appears in that file. My approach is to uses two processes
cache which reads the content of the file into a set, then waits for numbers to check, and then replies whether they appear in the set.
is_member_loop(Data_file) ->
Numbers = read_numbers(Data_file),
receive
{From, Number} ->
From ! {self(), lists:member(Number, Numbers)},
is_member_loop(Data_file)
end.
client which sends numbers to cache and waits for the true or false response.
check_number(Number) ->
NumbersPid ! {self(), Number},
receive
{NumbersPid, Is_member} ->
Is_member
end.
This approach is obviously naive since the file is read for every request. However, I am quite new at Erlang and it is unclear to me what would be the preferred way of keeping state between different requests.
Should I be using the process dictionary? Is there a different mechanism I am not aware of for that sort of process state?
Update
The most obvious solution, as suggested by user601836, is to pass the set of numbers as a param to is_member_loop instead of the filename. It seems to be a common idiom in Erlang and there is a good example in the fantastic online book Learn you some Erlang.
I think, however, that the question still holds for more complex state that I'd want to preserve in my process.
Simple solution, you can pass to your function is_member_loop(Data_file) the list of numbers rather then the file name.
The best solution when you deal with a state consists in using a gen_server. To learn more you should take a look at records and gen_server behaviour (this may also be useful).
In practice:
1) start with a module (yourmodule.erl) based on gen_server behaviour
2) read your file in the init function of the gen_server and pass it as state field:
init([]) ->
Numbers = read_numbers(Data_file),
{ok, #state{numbers=Numbers}}.
3) write a function which will be used to trigger a call to the gen_server
check_number(Number) ->
gen_server:call(?MODULE, {check_number, Number}).
4) write the code in order to handle messages triggered from your function
handle_call({check_number, Number}, _From, #state{numbers=Numbers} = State) ->
Reply = lists:member(Number, Numbers)},
{reply, Reply, State};
handle_call(_Request, _From, State) ->
Reply = ok,
{reply, Reply, State}.
5) export from yourmodule.erl function check_number
-export([check_number/1]).
Two things to be explained about point 4:
a) we extract values inside the record State using pattern matching
b) As you may see I left the generic handle call, otherwise your gen_server will fail due to wrong pattern matching whenever a message different from {check_number, Number} is received
Note: if you are new to erlang, don't use process dictionary
Not sure how idiomatic this is, since I'm not exactly an Erlang pro yet, but I'd handle this by using ETS. Basically,
read_numbers_to_ets(DataFile) ->
Table = ets:new(numbers, [ordered_set]),
insert_numbers(Table, DataFile),
Table.
insert_numbers(Table, DataFile) ->
case read_next_number(DataFile) of
eof -> ok;
Num -> ets:insert(numbers, {Num})
end.
you could then define your is_member as
is_member(TableId, Number) ->
case ets:match(TableId, {Number}) of
[] -> false; %% no match from ets
[[]] -> true %% ets found the number you're looking for in that table
end.
Instead of taking a Data_file, your is_member_loop would take the id of the table to do a lookup on.
Can I Nest receive {tcp, Socket, Bin} -> calls? For example I have a top level loop called Loop, which upon receipt of tcp data calls a function, parse_header, to parse header data (an integer which indicates the kind of data to follow and thus its size), after that I need to receive the entire payload before moving on. I might only receive 4 bytes when I need a full 20 bytes and would like to call receive in a separate function called parse_payload. So the call chain would look like loop->parse_header->parse_payload and I would like parse_payload to call receive {tcp, Socket, Bin} ->. I don't know if this ok or if I'm completely going to mess things up and can only do it in the Loop function. Can someone enlighten me? If I am allowed to do this is am I violating some sort of best practice?
Maybe you can check the sample code for "erlang programming".
The download page is Erlang Programming Source Code
In file socket_examples.erl, please check "receive_data" function.
For perse message, I think you should determine how to seperate messages one by one (fixed length or with termination byte), then parse message's header, and payload.
receive_data(Socket, SoFar) ->
receive
{tcp,Socket,Bin} -> %% (3)
receive_data(Socket, [Bin|SoFar]);
{tcp_closed,Socket} -> %% (4)
list_to_binary(reverse(SoFar)) %% (5)
end.
You can also set a gen_tcp socket in passive mode. This way, the owning process won't receive the input by messages but has to fetch it using gen_tcp:recv(Socket, ByteCount) which returns either {ok, Input} or {error, Reason}. As this methods waits infinitely for the bytes you might want to add a timeout using gen_tcp:recv/3. (Erlang documentation of gen_tcp:recv)
While at first glance it might seem the process is now completely unable to react to messages sent to it, there is the following workaround improving the situation a bit:
f1(X) ->
receive
message1 ->
... do something ...,
f1(X);
message2 ->
... do something ...,
f1(X)
after 0 %timeout in ms
{ok, Input} = gen_tcp:recv(Socket, ByteCount, Timeout),
... do something ... % maybe call some times gen_tcp:recv again
f1(X)
end.
If you don't add a timeout to gen_tcp:recv here, other processes could wait ages for f1 to handle their messages.