Erlang SSL TCP Server And Garbage Collection - erlang

Edit: The issue seems to be with SSL acccpeting and a memory leak.
I have noticed if you have long lived Processes(its a server), and clients send data to the server(recv), the Erlang garbage collection never gets called (or rarely)
Servers need data (to preform actions), and the data can be variable length (due to a message like "Hello" or "How are you doing"). Because of this, it seems like the Erlang process will accumulate garbage.
How can you properly handle this, the Erlang process has to touch the recv data, so is it unavoidable? Or do you have to come up with designs that touches the variable length data the less amount of times (like immediately passing it to a port driver).
Spawning a worker to process the data is a bad solution(millions of connections ...), and using workers would basically be the same thing, right? So that leaves me with very few options.
Thanks ...

If the server holds on to the received message longer than it needs to, it's a bug in the server implementation. Normally, the server should forget all or most references to the data in a request when that request has finished processing, and the data will then become garbage and will eventually get collected. But if you stick the data from each request in a list in the process state, or in an ets table or similar, you will get a memory leak.
There is a bit of an exception with binaries larger than 64 bytes, because they are handled by reference counting, and to the memory allocator it can look like there's no need to perform a collection yet, although the number of bytes used off-heap by such binaries can be quite large.

Just incase anyone finds themselves in the same boat. This is known/happening when hammering the server with many connections at once. I think.
Without ssl, my sessions are at ~8KB roughly, and resources are being triggered for GC as expected. With SSL its an increase to ~150KB, and the memory keeps growing and growing and growing.
http://erlang.org/pipermail/erlang-questions/2017-August/093037.html

You might be having a problem with large SSL/TLS session tables. We (OTP team) have historically had some issues with those tables as they may grow very large if you do not have some limiting mechanisms. Alas in the latest ssl version one of the limiting mechanism was broken, however it is easily fixed by this patch. As a workaround you can also sacrifice some efficiency and disable session reuse.
diff --git a/lib/ssl/src/ssl_manager.erl b/lib/ssl/src/ssl_manager.erl
index ca9aaf4..ef7c3de 100644
--- a/lib/ssl/src/ssl_manager.erl
+++ b/lib/ssl/src/ssl_manager.erl
## -563,7 +563,7 ## server_register_session(Port, Session, #state{session_cache_server_max = Max,
do_register_session(Key, Session, Max, Pid, Cache, CacheCb) ->
try CacheCb:size(Cache) of
- Max ->
+ Size when Size >= Max ->
invalidate_session_cache(Pid, CacheCb, Cache);
_ ->
CacheCb:update(Cache, Key, Session),

Related

Why do you use `stream` for GRPC/protobuf file transfers?

I've seen a couple examples like this:
service Service{
rpc updload(stream Data) returns (google.protobuf.Empty) {};
rpc download(google.protobuf.Empty) returns (stream Data) {};
}
message Data { bytes bytes = 1; }
What is the purpose of using stream, does it make the transfer more efficient?
In theory - yes - I obviously wan't to stream my file transfers but that's what happens over a connection... So, what is the actual benefit to this keyword, does it enforce some form of special buffering to reduce some overhead? Either way, the data is being transmitted, in full!
It's more efficient because, within a single call, multiple messages may be sent.
This avoids, not only re-establishing another (hopefully TLS i.e. even more work) connection with the server but also avoids spinning up client and server "stubs"; both the client and server are ready for more messages.
It's somewhat similar to being connected on a telephone call with your friend who, before hanging up, says "Oh, another thing...". Instead of hanging up the call and then, 5 minutes later, calling you back, interrupting dinner and causing you to pause a movie.
The answer is very similar to the gRPC + Image Upload question, although from a different perspective.
Doing a large download (10+ MB) as a single response message puts strong limits on the size of that download, as the entire response message is sent and processed at once. For most use cases, it is much better to chunk a 100 MB file into 1-10 MB chunks than require all 100 MB to be in memory at once. That also allows the downloader to begin processing the file before the entire file is acquired which reduces processing latency.
Without streaming, chunking would require multiple RPCs, which are annoying to coordinate and have performance complications. Because there is latency to complete RPCs, for reasonable performance you either have to do many RPCs in parallel (but how many?) or have a large batch size (but how big?). Multiple RPCs can also hit colder application caches, as each RPC goes to a different backend.
Using streaming provides the same throughput as the non-chunking approach without as many headaches of normal chunking approaches. Since streaming is pipelined (server can start sending next chunk as soon as previous chunk is sent) there's no added per-chunk latency between the client and server. This makes it much easier to choose a chunk size, as there is a wide range of "reasonable" sizes that will behave similarly and the system will naturally react as network performance varies.
While sending a message on an existing stream has less overhead than creating a new RPC, for many users the difference is negligible and it is generally better to structure your RPCs in a way that is architecturally beneficial to your application and not just to eek out small optimizations in gRPC. The reason to use the stream in this case is to make your application perform better at lower complexity.

Data Retrieval Throughput - ETS lookup vs inter-process Messaging

suppose we have an erlang application which involves thousands of processes. Suppose there is a single resource X which may be a tuple, a list, or any erlang term, which all these processes may need to read / pick out something from it, at any moment in time.
An example of such an occurrence, is say, an API system, in which client processes may need to read and write on a remote machine. Ant it happens that you do not want, for each read/write request, a new connection to be created. So, what you do, you create a pool of connections, consider them as a pool of open pipes/sockets/channels.
Now, this pool of resources is to be shared by thousands of processes such that for each read or write demand, you want that process to retrieve any available open channel/resource.
Question is, what if i have a process (a single process) hold this information, whether in its process dictionary or in its receive loop. It would mean that all the processes would have to send a message to this process whenever they need a free resource. This single process would have a huge mailbox at any time because of the high demand for this single resource. OR I could use an ETS Table, and have only one row, say, #resources{key=pool,value= List_of_openSockets_or_channels}. But this would mean that, all our processes would attempt to make a read from the ETS Table for the same row at (high probability) same instantaneous times.
How would the ETS Table handle, if 10,000 process atttempt a read, for the same row/record from it, at the same time/at almost same time ? and yet, if i use a process, its mailbox, if 10,000 processes send a message to it, at same time, for the same resource (and it would need to reply each requestor). And remember this action may occur so frequently. What option (dis-regarding availability issues of process going down blah blah), would provide higher throughput, in a way that, processes would get what they need faster ? Is there any other better way, of handling high demand data structures in the Erlang VM in a way that will provide very fast access to millions of processes, even if they all needed that resource at the same time ?
Short answer: profile. Try different approaches and verify how your system behaves.
Firstly, I would look at ETS' {read_concurrency, true} option. From the documentation:
{read_concurrency,boolean()} Performance tuning. Default is false.
When set to true, the table is optimized for concurrent read
operations. When this option is enabled on a runtime system with SMP
support, read operations become much cheaper; especially on systems
with multiple physical processors. However, switching between read and
write operations becomes more expensive. You typically want to enable
this option when concurrent read operations are much more frequent
than write operations, or when concurrent reads and writes comes in
large read and write bursts (i.e., lots of reads not interrupted by
writes, and lots of writes not interrupted by reads). You typically do
not want to enable this option when the common access pattern is a few
read operations interleaved with a few write operations repeatedly. In
this case you will get a performance degradation by enabling this
option. The read_concurrency option can be combined with the
write_concurrency option. You typically want to combine these when
large concurrent read bursts and large concurrent write bursts are
common.
Secondly, I would look at caching possibilities. Are the processes reading that information only once or multiple times? If they're accessing it multiple times, you could read it once and store it in your process state.
Thirdly, you could try to replicate and distribute that piece of information across your system. Divide et impera.
If you use the process approach, in order to avoid having all the read requests serialized on the message queue of the 'server' process you must replicate.
Using an ETS table with read_concurrency feels more natural and it is something that I used when developing the parallel version of Dialyzer. However, ETS access was never a bottleneck in that case.

Cannot allocate 298930300 bytes of memory (of type "old_heap")

While load testing my erlang server with increasing number (100, 200, 3000,....) of processes using +P which is the maximum number of concurrent processes, as well as making 10 process sending 1 message to the rest of the created processes, I got a message on the erlang console:
"Crash dump was written to: erl_crash.dump. eheap_alloc: Cannot allocate 298930300 bytes of memory (of type "old_heap"). Abnormal termination".
I'm using Windows XP. The is no problem when I create the process (it's working). The crash happens after the process starts communicating (sending hi & receiving hello) and this is the only problem I have (by the way, +hms which sets the default heap size of processes).
How can I resolve this?
If somebody will find it useful as one of possible reasons for such problem(since I haven't found any specific answer anywhere)
we've experienced similar problem with rabbitmq server (linux, 64bit, persistent queue, watermarks with default config)
eheap_alloc: Cannot allocate yyy bytes of memory (of type "heap")
eheap_alloc: Cannot allocate xxx bytes of memory (of type "old_heap")
The problem was in re-queueing too much messages at once. Our "monitoring" code used "get" message with re-queue option without limiting number of messages to get & re-queue(in our case -all messages in the queue which was 4K)
So at a time it tried to add all this messages back to queue the server failed with above message.
hope it will save few hours to someone.
Have a look at that erl_crash.dump file using the Crashdump Viewer:
/usr/local/lib/erlang/lib/observer-1.0/priv/bin/cdv erl_crash.dump
(Apologies for the Unix path; you should be able to find a cdv.bat in your installation on Windows.)
Look at the process list; in my experience there's fairly often a process with a really long message queue where you didn't expect it.
You ran out of memory. Try decreasing the default heap size or limit the number of processes you start.
More advanced solutions include profiling your application to see if you can save some memory there, for example better sharing of binaries or less use of lists and large messages (which will copy the data to every process it's sent to).
One of your processes tries allocate almost 300MB memory. You have probably memory leak in your server. In proper design you should not have so much big heap in one process if it is not intended.

OutOfMemoryException Processing Large File

We are loading a large flat file into BizTalk Server 2006 (Original release, not R2) - about 125 MB. We run a map against it and then take each row and make a call out to a stored procedure.
We receive the OutOfMemoryException during orchestration processing, the Windows Service restarts, uses full 2 GB memory, and crashes again.
The server is 32-bit and set to use the /3GB switch.
Also I've separated the flow into 3 hosts - one for receive, the other for orchestration, and the third for sends.
Anyone have any suggestions for getting this file to process wihout error?
Thanks,
Krip
If this is a flat file being sent through a map you are converting it to XML right? The increase in size could be huge. XML can easily add a factor of 5-10 times over a flat file. Especially if you use descriptive or long xml tag names (which normally you would).
Something simple you could try is to rename the xml nodes to shorter names, depending on the number of records (sounds like a lot) it might actually have a pretty significant impact on your memory footprint.
Perhaps a more enterprise approach, would be to subdivide this in a custom pipeline into separate message packets that can be fed through the system in more manageable chunks (similar to what Chris suggests). Then the system throttling and memory metrics could take over. Without knowing more about your data it would be hard to say how to best do this, but with a 125 MB file I am guessing that you probably have a ton of repeating rows that do not need to be processed sequentially.
Where does it crash? Does it make it past the Transform shape? Another suggestion to try is to run the transform in the Receive Port. For more efficient processing, you could even debatch the message and have multiple simultaneous orchestration instances be calling the stored procs. This would definately reduce the memory profile and increase performance.

Which requests cause a w3wp process to grow considerably?

On a production environment, how can one discover which Asp.Net http requests, whether aspx or asmx or custom, are causing the most memory pressure within a w3wp.exe process?
I don't mean memory leaks here. It's a good healthy application that disposes all it's objects nicely. Microsoft's generational GC does it's work fine.
Some requests however, cause the w3wp process to grow its memory footprint considerably, but only for the duration of the request.
It is simply a question of the cost-efficiency and scalability of a production environment for a SAAS app, in order to regularly report back to the development department on their most memory hogging "pages", to return that (memory) pressure where it belongs, so to speak.
There doesn't seem to be anything like:
HttpContext.Request.PeakPrivateBytes or .CurrentPrivateBytes
or
Session.PeakPrivateBytes
You might want to use a tool like Performance Monitor to monitor the "Process\Working Set" for the W3WP.exe process and record it to a database. You then could could correlate it to the HTTP logs for the IIS Server.
It helps to have both the Perfmon data and HTTP logs both writing to an SQL database. Then you can use T-SQL to bring up requested pages by Date/Time around the time of the observed memory pressure. Use the DatePart function to build a Date/Time rounded to the desired accuracy of Second or Minute as needed.
Hope this helps.
Thanks,
-Glenn
If you are using InProc session state, all your session data is stored in w3wp's memory, and may be the cause of it growing.
I wouldn't worry about it.
It could be that the GC is happening during the request, and the CLR is allocating memory to move things around. Or it could be some other periodic servicing thing that comes along with ASPNET.
Unless you are prepared to go spelunking with perf counter analysis of generation 0,1,2 GC events , and etc, then I wouldn't worry about solving this "problem".
And it doesn't sound like it's a problem anyway - just a curiosity thing.

Resources