How to process a large response using Spring WebClient without buffering it first

We want to process a large response (tens of MB) using Spring WebClient. We are hitting the buffer size limit, as discussed here, but we do not want to increase the buffer size. We only use a small subset of the response, so if there were a way to pipe the response stream directly to Jackson for deserialization, we could avoid a lot of unnecessary memory allocation. Is it possible to skip the buffer?

It seems that it's not possible. In the end we switched to a plain OkHttp client, and it handles the same load without any issue.


Why do you use `stream` for GRPC/protobuf file transfers?

I've seen a couple examples like this:
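syntax = "proto3";
import "google/protobuf/empty.proto";
service Service {
  rpc upload(stream Data) returns (google.protobuf.Empty) {}    // client streams chunks
  rpc download(google.protobuf.Empty) returns (stream Data) {}  // server streams chunks
}
message Data { bytes bytes = 1; }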
What is the purpose of using stream? Does it make the transfer more efficient?
In theory, yes: I obviously want to stream my file transfers, but that is what happens over a connection anyway. So what is the actual benefit of this keyword? Does it enforce some form of special buffering to reduce overhead? Either way, the data is transmitted in full!
It's more efficient because, within a single call, multiple messages may be sent.
This avoids not only establishing another connection with the server (hopefully over TLS, which is even more work), but also spinning up the client and server "stubs" again; both the client and the server are ready for more messages.
It's somewhat similar to being on a telephone call with a friend who, before hanging up, says "Oh, another thing...", instead of hanging up and then calling you back 5 minutes later, interrupting dinner and making you pause a movie.
The answer is very similar to the gRPC + Image Upload question, although from a different perspective.
Doing a large download (10+ MB) as a single response message puts strong limits on the size of that download, as the entire response message is sent and processed at once. For most use cases, it is much better to chunk a 100 MB file into 1-10 MB chunks than require all 100 MB to be in memory at once. That also allows the downloader to begin processing the file before the entire file is acquired which reduces processing latency.
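As a rough illustration of the chunking idea above, here is a minimal sketch in Python (not gRPC code). The names `iter_chunks`, `digest_stream` and `CHUNK_SIZE` are made up for this example, and the SHA-256 digest simply stands in for whatever incremental processing the downloader does. The point is that only one chunk is held in memory at a time and processing starts before the whole payload has arrived.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary pick from the 1-10 MB range


def iter_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the source as successive chunks of at most chunk_size bytes."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


def digest_stream(chunks: Iterator[bytes]) -> str:
    """Consume chunks as they arrive; only one chunk is held at a time."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    # An in-memory stand-in for a file or a stream of Data messages.
    payload = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 123))
    print(digest_stream(iter_chunks(payload)))
```

In a real streaming RPC, each yielded chunk would roughly correspond to one Data message on the stream.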
Without streaming, chunking would require multiple RPCs, which are annoying to coordinate and have performance complications. Because there is latency to complete RPCs, for reasonable performance you either have to do many RPCs in parallel (but how many?) or have a large batch size (but how big?). Multiple RPCs can also hit colder application caches, as each RPC goes to a different backend.
Using streaming provides the same throughput as the non-chunking approach without as many headaches of normal chunking approaches. Since streaming is pipelined (server can start sending next chunk as soon as previous chunk is sent) there's no added per-chunk latency between the client and server. This makes it much easier to choose a chunk size, as there is a wide range of "reasonable" sizes that will behave similarly and the system will naturally react as network performance varies.
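To make the latency argument concrete, here is a deliberately simplified back-of-the-envelope model in Python. It assumes one full round trip per unary RPC, a fixed bandwidth, and perfect pipelining on the stream; it ignores flow control, TLS setup and server processing time. The function names and the numbers are illustrative assumptions, not measurements.

```python
def sequential_rpc_seconds(total_mb: int, chunk_mb: int,
                           bandwidth_mb_s: float, rtt_s: float) -> float:
    """One unary RPC per chunk, each waiting for the previous one to finish."""
    chunks = -(-total_mb // chunk_mb)  # ceiling division
    return chunks * rtt_s + total_mb / bandwidth_mb_s


def streaming_seconds(total_mb: int, bandwidth_mb_s: float, rtt_s: float) -> float:
    """One streaming call: the round trip is paid once, chunks are pipelined."""
    return rtt_s + total_mb / bandwidth_mb_s


if __name__ == "__main__":
    # Assumed link: 100 MB file, 100 MB/s bandwidth, 50 ms round trip.
    for chunk_mb in (1, 4, 10):
        unary = sequential_rpc_seconds(100, chunk_mb, 100.0, 0.05)
        stream = streaming_seconds(100, 100.0, 0.05)
        print(f"{chunk_mb:>2} MB chunks: unary {unary:.2f} s, stream {stream:.2f} s")
```

Under these assumptions the streaming time does not depend on the chunk size at all, while the sequential time grows with the number of chunks, which is why the chunk size is so much easier to choose with streaming.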
While sending a message on an existing stream has less overhead than creating a new RPC, for many users the difference is negligible, and it is generally better to structure your RPCs in a way that is architecturally beneficial to your application rather than just to eke out small optimizations in gRPC. The reason to use the stream in this case is to make your application perform better at lower complexity.

Why are strings not disposed, and how can I optimize the memory usage?

I am writing an Excel-parsing web API that reads a structured file into objects (using a recursive function).
But I've noticed a significant memory spike while parsing some files, which can end in an OutOfMemory exception. The Excel parsing engine needs the whole file to be loaded before it can read its structure. I found that it's not the loading that consumes most of the memory; it's the parsing (turning Excel into structured objects) and the JSON HTTP response (serializing the objects to JSON) that finally exhaust the memory. For example, a 1 MB file can parse into 70 MB of JSON.
So I googled around, found the .NET Memory Profiler, and tried to analyze what led to this huge memory usage. Here is the snapshot that I captured while parsing the same file twice. I've noticed that there are huge string / Object[] allocations that are not being GCed.
Now I'm at a loss. What are the best practices when dealing with lots of List and string instances? To reduce memory usage, where should I start looking? And what are the best practices for handling a long-running process (adding a queue? using SignalR to notify the client of the result?)?
Some guidance would be really appreciated!
I'd say you need to avoid loading the whole file at once.
Here is a good place to start:
Load Large Excel file in C#
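The same "don't hold everything at once" idea can also be applied to the JSON response, which the question identifies as a major memory consumer. The sketch below is in Python rather than C#, purely to illustrate the technique: objects are serialized one at a time into the output stream instead of building one 70 MB string. `parse_rows` is a hypothetical stand-in for the Excel parser, and `write_json_array` is a name invented for this example.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def parse_rows(row_count: int) -> Iterator[dict]:
    """Hypothetical stand-in for the parser: yields one object per row."""
    for i in range(row_count):
        yield {"row": i, "value": f"cell-{i}"}


def write_json_array(items: Iterable[dict], out: TextIO) -> None:
    """Write items as one JSON array, serializing one item at a time."""
    out.write("[")
    for index, item in enumerate(items):
        if index:
            out.write(",")
        out.write(json.dumps(item))
    out.write("]")


if __name__ == "__main__":
    response_body = io.StringIO()  # stand-in for the HTTP response stream
    write_json_array(parse_rows(3), response_body)
    print(response_body.getvalue())
```

When the output is a real response stream rather than a StringIO, peak memory is bounded by the largest single item instead of the whole document.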

What is the best size for a buffer in BlackBerry?

In my application I need to read data from an input stream. I have currently set the read buffer size to 1024 bytes. But I have seen that in some Android applications the buffer size is 8192 bytes (8 KB). Is there any specific advantage to increasing the buffer size in my application to 8 KB?
Any expert opinion will be much appreciated.
Edit: (I am using BB OS 6 and 7 and I am dealing with network inputstream.)
I can't say that I've found the universally best buffer size, but it seems to me that something in the range of 1KB to 8KB should be fine in most situations (for BlackBerry Java apps).
Keep in mind that if the amount of data is small (so you'd probably only need one or two buffers at 1KB-8KB), it's probably best just to use the IOUtilities method:
byte[] result = IOUtilities.streamToBytes(inputStream);
with which you don't need to actually pick a buffer size. But, if you know that result would be a large block of data, you're probably right in wanting to read one buffer at a time.
However, I would argue that the answer should almost always be obtained simply by building the app, and measuring performance with a few different values for byte buffer size. It's easy enough to change one constant, build, run and measure again, and then you're not guessing, or taking the advice of someone who doesn't know all the details of your app.
See here for information about BlackBerry Eclipse plugin memory analysis, and here for BlackBerry Eclipse plugin profiling.
These tools are found in Eclipse by selecting the Window menu, then Show View -> Other... -> BlackBerry -> BlackBerry Memory Statistics View, or BlackBerry Profiler View, while debugging.
This way, you can see how much memory, or processor, the network code is using during the call to retrieve data and populate your buffer.
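The "change one constant and measure" approach can be sketched as follows. This is Python reading from an in-memory stream, not BlackBerry Java reading from a network socket, so the absolute timings mean nothing for a device; it only shows the shape of such an experiment. `drain` and `measure` are names invented for this example.

```python
import io
import time
from typing import BinaryIO, Tuple


def drain(stream: BinaryIO, buffer_size: int) -> Tuple[int, int]:
    """Read the stream to the end; return (bytes_read, read_calls)."""
    total = 0
    calls = 0
    while True:
        block = stream.read(buffer_size)
        if not block:
            return total, calls
        total += len(block)
        calls += 1


def measure(data: bytes, buffer_size: int) -> Tuple[int, int, float]:
    """Time one full pass over data using the given buffer size."""
    start = time.perf_counter()
    total, calls = drain(io.BytesIO(data), buffer_size)
    return total, calls, time.perf_counter() - start


if __name__ == "__main__":
    data = bytes(4 * 1024 * 1024)  # 4 MiB of zeros as test input
    for size in (1024, 4096, 8192):
        total, calls, seconds = measure(data, size)
        print(f"{size:>5} B buffer: {calls} reads, {seconds * 1000:.2f} ms")
```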
BlackBerry InputStream to String conversion
This question was also asked in the official BlackBerry forum here.
The OP gave this clarification:
"I am reading from network. Once I establish socket connection with the server, the server will send me notifications one after the other. I need to read the notifications/data from the inputstream available in the socket connection. For this I have a background thread which checks anything is available in the inputstream and if something is available, it will read with the help of a buffer and then passes the read data to a StringBuffer."
Given this information, I have a different take, in that I think the BlackBerry network handling abstracts the Java application from the network buffer processing to the extent that the application buffer size will have little if any impact on the performance.
But be aware, this is only my opinion.
My response on that thread was as follows:
The first thing to note is that the method available(), in my experience, does not work correctly on OS 5.0 and earlier. It is fixed in OS 6 (at least in my testing).
Because available() was broken (and for other application reasons), what I have implemented for a socket connection is that each message is preceded by its length. So on the socket connection, I read the length of the next message, and then the actual data. This is done without splitting the data into blocks; in other words, I read the entire message, regardless of size. I recommend you do the same. The message must exist in full somewhere, so it makes no difference whether it is in memory managed by the socket connection or in memory managed by you.
Note also that until OS 6.0, a read would wait until it could fill the buffer you passed in. In OS 6.0 and later, the read can complete without giving you a full buffer.
In your case, you might be working with OS 6.0 and later only, so you could use available(), create a buffer of that size, and read everything. I can't see that it makes any difference whether the bytes are in memory managed by the socket or in memory managed by you.
But in fact, I would argue that the best approach is the one that makes your processing simplest. So for example, if you know that the next message is 200 bytes, then read 200 bytes, and then process that message. Then read the next message.
You could spend a lot of time attempting to manage the buffers to match the underlying socket buffers. I don't know exactly how the underlying BlackBerry socket processing code works, but it doesn't put data directly into your buffers. So let it manage its buffer size to optimize the network, you manage your buffer size to optimize your processing. That will work best for everyone.
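A minimal sketch of the length-prefixed reading described above, written in Python for illustration rather than BlackBerry Java. The 4-byte big-endian header is an assumption (the post does not say how the length is encoded), and `read_exact`, `read_message` and `frame` are names invented here. The loop in `read_exact` covers the case mentioned above where a read completes without filling the buffer.

```python
import io
import struct
from typing import BinaryIO, Optional

HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian length


def read_exact(stream: BinaryIO, count: int) -> bytes:
    """Read exactly count bytes, looping because a read may return fewer."""
    parts = []
    remaining = count
    while remaining:
        part = stream.read(remaining)
        if not part:
            raise EOFError("stream closed in the middle of a message")
        parts.append(part)
        remaining -= len(part)
    return b"".join(parts)


def read_message(stream: BinaryIO) -> Optional[bytes]:
    """Return the next message body, or None on a clean end of stream."""
    first = stream.read(1)
    if not first:
        return None
    (length,) = HEADER.unpack(first + read_exact(stream, HEADER.size - 1))
    return read_exact(stream, length)


def frame(body: bytes) -> bytes:
    """Prefix a message body with its length (the sender's side)."""
    return HEADER.pack(len(body)) + body


if __name__ == "__main__":
    wire = io.BytesIO(frame(b"first notification") + frame(b"second"))
    while (message := read_message(wire)) is not None:
        print(message)
```

With this framing the application buffer is simply as large as the next message, so there is no separate buffer size to tune.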

Multiple hits to an API bringing server to its knees

I am using an API (let's pretend it's Facebook) to gather data between two given dates. Because of API restrictions (like most), I can only grab so many results at a time, and therefore have to page my way through them.
Here is my question, though. Is it better to:
get fewer results back and make more calls to the API, or
get more results back and make fewer calls to the API?
I am running a 4 GB instance of a cloud server.
The data I'm looking at is in XML format and contains about 20k entries. Each entry contains probably another 20 tags. Once completely pulled down, the data ends up being about 10 MB. My problem is that when my server is hitting the API and gathering this information, CPU and memory spike to nearly 100%. I've tried retrieving 500 at a time, 1000 at a time, and 5000 at a time. Is this something where I need to gather 20 at a time, or is there something else I should look at?
I'm not sure what else to provide; if there is something I can provide, just let me know.
Updates based on answers
I host with Storm on Demand, which runs perfectly for us and seems to be great hardware.
I use Hpricot to parse the XML (which could probably be optimized, I'm no expert here).
I do need all of the data, this service doesn't offer an export, only API.
EDIT [to help people stumbling on this later]
I switched from Hpricot to Nokogiri, which is MUCH faster.
Also, I was building an XML file in memory; apparently that is extremely intensive, and it was a very time-consuming task. I've cut this operation down from about 10 minutes to just over 1 minute by fixing these two things.
Here's a list of things to look at:
Optimize your code. Try profiling it and see if you can improve it. Most likely you can use a better parser (a streaming SAX parser instead of a DOM parser, for example).
Get better hardware/hosting. 4 GB is just memory. Most likely you are on shared hosting or a VM and are CPU limited.
Offload CPU- or memory-heavy operations such as XML processing, data analysis, and file I/O to a faster service or application; these can be done in C/C++.
In a proper cloud environment you should be able to spawn more VMs and adjust your jobs/load accordingly. That will cost more, though, and requires some kind of job manager.
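To illustrate the parser point from the list above, here is a minimal sketch of streaming (SAX-style) XML processing. It uses Python's standard `xml.etree.ElementTree.iterparse` rather than a Ruby parser, and the `entry` tag name and `iter_entries` function are assumptions made up for this example. Each entry is handed to the caller as soon as its closing tag is seen, instead of first building a tree for all 20k entries.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator, Optional


def iter_entries(source: BinaryIO, tag: str = "entry") -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per <entry> element as soon as it has been parsed."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield {child.tag: child.text for child in element}
            # Drop this entry's children and text; an empty shell of the
            # element stays attached to the root, which is comparatively small.
            element.clear()


if __name__ == "__main__":
    page = io.BytesIO(
        b"<feed>"
        b"<entry><id>1</id><name>first</name></entry>"
        b"<entry><id>2</id><name>second</name></entry>"
        b"</feed>"
    )
    for entry in iter_entries(page):
        print(entry)
```

Writing each entry to the output file as it is yielded, rather than accumulating a second document in memory, follows the same pattern.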
The question you need to ask is: why are your CPU and memory spiking? 4 GB is plenty for handling this data, so is your code optimized for the task? If not, what can you do?
Is your code already optimized enough? Fair enough. You can now rewrite the hot spots as C extensions.
After optimizing your code, I'd suggest processing this data 'later', as a delayed job. That way you aren't blocking on the entire dataset, which may strain your server.
You also mentioned that you are running a cloud server, so I assume you have access to more virtual machines. You can process this data in parallel to reduce the stress per machine.

OutOfMemoryException Processing Large File

We are loading a large flat file (about 125 MB) into BizTalk Server 2006 (original release, not R2). We run a map against it and then take each row and make a call out to a stored procedure.
We receive the OutOfMemoryException during orchestration processing, the Windows Service restarts, uses full 2 GB memory, and crashes again.
The server is 32-bit and set to use the /3GB switch.
Also I've separated the flow into 3 hosts - one for receive, the other for orchestration, and the third for sends.
Does anyone have any suggestions for getting this file to process without error?
If this is a flat file being sent through a map, you are converting it to XML, right? The increase in size could be huge: XML can easily make the data 5-10 times larger than the flat file, especially if you use descriptive or long XML tag names (which normally you would).
Something simple you could try is to rename the XML nodes to shorter names. Depending on the number of records (it sounds like a lot), it might actually have a pretty significant impact on your memory footprint.
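A small worked example of that size effect, in Python. The record layout, the values and all tag names below are invented for illustration and have nothing to do with the actual BizTalk schema; the sketch only compares the character count of one row as a flat line, as XML with descriptive tags, and as XML with one-letter tags.

```python
from typing import Sequence


def flat_row(values: Sequence[str]) -> str:
    """One record as a comma-separated line."""
    return ",".join(values) + "\n"


def xml_row(values: Sequence[str], record_tag: str, field_tags: Sequence[str]) -> str:
    """The same record as an XML element with one child per field."""
    fields = "".join(
        f"<{tag}>{value}</{tag}>" for tag, value in zip(field_tags, values)
    )
    return f"<{record_tag}>{fields}</{record_tag}>"


# Hypothetical record and tag names, chosen only to show the effect.
VALUES = ["10482", "Smith", "2006-03-14", "129.95"]
LONG_TAGS = ["CustomerIdentifier", "CustomerLastName", "OrderCreatedDate", "OrderTotalAmount"]
SHORT_TAGS = ["c", "n", "d", "t"]

if __name__ == "__main__":
    flat = len(flat_row(VALUES))
    long_xml = len(xml_row(VALUES, "CustomerOrderRecord", LONG_TAGS))
    short_xml = len(xml_row(VALUES, "r", SHORT_TAGS))
    print(f"flat: {flat}, long tags: {long_xml} ({long_xml / flat:.1f}x), "
          f"short tags: {short_xml} ({short_xml / flat:.1f}x)")
```

For this made-up record the descriptive tags land inside the 5-10x range mentioned above, while the short tags stay much closer to the flat size.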
Perhaps a more enterprise approach would be to subdivide this in a custom pipeline into separate message packets that can be fed through the system in more manageable chunks (similar to what Chris suggests). Then the system throttling and memory metrics could take over. Without knowing more about your data it is hard to say how best to do this, but with a 125 MB file I am guessing that you have a ton of repeating rows that do not need to be processed sequentially.
Where does it crash? Does it make it past the Transform shape? Another suggestion is to run the transform in the Receive Port. For more efficient processing, you could even debatch the message and have multiple simultaneous orchestration instances call the stored procs. This would definitely reduce the memory profile and increase performance.
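The debatching idea can be sketched generically as follows. This is plain Python, not a BizTalk pipeline component, and `debatch` and the batch size are assumptions for illustration. The flat file is read lazily and handed on as small batches of rows, so no stage ever sees the full 125 MB at once.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List


def debatch(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size rows from an iterable of lines."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # An in-memory stand-in for the flat file; a real file object is also
    # iterated line by line without being loaded in full.
    flat_file = io.StringIO("".join(f"row{i}\n" for i in range(10)))
    for number, batch in enumerate(debatch(flat_file, 4)):
        print(number, len(batch))
```

Each batch could then become its own message, letting the host's throttling decide how many are in flight at a time.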
