I generate a very large .csv file from a database using the method outlined in
https://stackoverflow.com/a/13456219/141172
It works fine, up to a point. When the exported file is too large, I get an OutOfMemoryException.
If I turn off output buffering by modifying that code like this:
protected override void WriteFile(System.Web.HttpResponseBase response)
{
response.BufferOutput = false; // <--- Added this
this.Content(response.OutputStream);
}
the file download completes. However, it is several orders of magnitude slower than when output buffering was enabled (measured for the same file with buffering true/false, on localhost).
I understand that it would be slower, but why does it slow to a relative crawl? Is there anything I can do to improve processing speed?
UPDATE
It would also be an option to use File(Stream stream, String contentType) as suggested in the comments. However, I'm not sure how to create the stream. The data is dynamically assembled based on a DB query, and a MemoryStream would run out of contiguous physical memory. Suggestions are welcome.
UPDATE 2
It was suggested in the comments that alternately reading from the database and writing to the stream is causing a degradation. I modified the code to perform the stream writing in a separate thread (using the producer/consumer pattern). There is no appreciable difference in performance.
I don't know exactly what ASP.NET and IIS are doing with output streaming, but perhaps the chunks being used are too small. Hook in a BufferedStream with a very big buffer, like 4 MB.
According to your comments, that worked. Now tune the buffer size down to save memory and get a smaller working set, which is good for the cache.
As a subjective comment, I'm disappointed that this is even necessary. IIS should use the right buffers automatically, which is extremely easy with TCP connections.
EDIT FROM OP
Here is the code derived from this answer:
public ActionResult Export()
{
    // Domain specific stuff here
    return new FileGeneratingResult("MyFile.txt", "text/text",
        stream => this.StreamExport(stream), false);
}

private void StreamExport(Stream stream)
{
    using (BufferedStream bs = new BufferedStream(stream, 256 * 1024))
    using (StreamWriter sw = new StreamWriter(bs))
    {
        foreach (var stuff in MyData())
        {
            sw.Write(stuff);
        }
    }
}
In Eric's latest update, he mentioned using another thread. I had this problem too when implementing database exports. Here is some example code for the solution I used:
Handling it with a temporary file stream
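Since the original sample is only linked above, here is a rough sketch of the same temporary-file idea, written in Java with illustrative names: the export is written to a temp file first, then streamed to the client in fixed-size chunks, so memory use stays bounded regardless of export size.

import java.io.*;
import java.nio.file.*;

public class TempFileExport {

    // Sketch: write the export to a temporary file, then stream it out in
    // fixed-size chunks so memory use stays bounded for any export size.
    public static void export(OutputStream response, Iterable<String> rows) throws IOException {
        Path tmp = Files.createTempFile("export", ".csv");
        try {
            try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
                for (String row : rows) {
                    w.write(row);   // database rows are assembled here
                    w.newLine();
                }
            }
            try (InputStream in = Files.newInputStream(tmp)) {
                byte[] chunk = new byte[256 * 1024];   // 256 KB, as in the code above
                int n;
                while ((n = in.read(chunk)) != -1) {
                    response.write(chunk, 0, n);       // stream to the client
                }
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}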
Does the IPFS.add() method automatically update my local DHT and propagate it to other peers?
In order to test whether the IPFS.add() method alone allows other peers to download content from my PC, I ran this script on my Windows PC:
import * as IPFS from "ipfs";
const node = await IPFS.create();
var file = await node.add("Shiiiiiiiiiiitttt");
and ran this code on my MacBook to fetch the file:
import * as IPFS from "ipfs";
const node = await IPFS.create();
//fetching Shiiiiiiiiiiitttt
const stream = node.cat("QmetK5x9nLUG5jDwp7Un25n47exuNjDZ3cKvnKfC6Hebmi")
const decoder = new TextDecoder()
let data = ''
for await (const chunk of stream) {
    // chunks of data are returned as a Uint8Array, convert it back to a string
    data += decoder.decode(chunk, { stream: true })
    console.log("decoding")
}
// In the end, as long as IPFS kept running on the owner node, there was no need to call ipfs.dht.provide
console.log(data)
What I found out through this test was that as long as I keep running the jsipfs daemon, or the script itself, on the Windows PC that adds the file or text, I can retrieve it from other devices using IPFS.cat(). This confuses me deeply, since IPFS also has a separate method, IPFS.dht.provide(), and my current understanding of IPFS is that an updated DHT must be propagated to other peers before they can fetch files. From my test, however, I can only conclude that something within IPFS.add() propagates an updated distributed hash table (or at least some similar alternative) to other peers so that they know I have the file. I'm having a very difficult time finding the source methods for this automatic DHT propagation upon adding a file, and I would appreciate any help finding said methods, or an under-the-hood explanation of what happens during IPFS.add().
Check out the IPFS docs (https://docs.ipfs.tech/concepts/). They are far more important than you might first think.
I started using a TFileStream and TStreamWriter to write simple text log files (instead of the old Writeln(T, ....)), and I have multiple applications writing to the same log file.
Each application has its own TFileStream, of course, and each opens the file like this:
FFileStream:=TFileStream.Create(LogName, fmOpenReadWrite+fmShareDenyNone);
FExporter:=TStreamWriter.Create(FFileStream, TEncoding.UTF8);
FExporter.NewLine:=#$0A;
FExporter.AutoFlush:=TRUE;
and writes to the file with:
FExporter.BaseStream.Seek(0, soFromEnd);
FExporter.Write('['+DateToStr(Now, FDateTimeFormat)+'] ['+TimeToStr(Now, FDateTimeFormat)+'] [#'+Lead0(GetCurrentThreadId, 5)+']: '+EntryText);
FExporter.WriteLine;
The result is somewhat "unsatisfactory": lines are displaced, there are empty lines in between, and it does not seem to work as intended.
How would I do this correctly?
Writing multiple lines at the same time from multiple processes may result in unexpected interleaving, because of the parallel execution.
You should ensure that each entry is written as one contiguous block, so the line break should be sent inside the same Write call, using sLineBreak at the end.
So the write should look like this:
FExporter.BaseStream.Seek(0, soFromEnd);
FExporter.Write('['+DateToStr(Now, FDateTimeFormat)+'] ['+TimeToStr(Now, FDateTimeFormat)+'] [#'+Lead0(GetCurrentThreadId, 5)+']: '+EntryText + System.sLineBreak);
//FExporter.WriteLine;
Update 1:
As the link Oliver posted explains, this can still fail when the message being written is bigger than the OS file sector size and, at that very moment, another process also tries to write a message; in that case the resulting content may be mixed.
So doing what I first proposed increases the probability of getting the desired result, but it may not be the solution in 100% of cases.
To be 100% sure of writing a continuous log to a single file from multiple processes, you should create a dedicated log process that receives messages from the others and is the only one responsible for writing the log, synchronized across threads. A sketch of that single-writer idea follows below.
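The pattern itself is language-agnostic; here is a minimal sketch of the single-writer idea in Java (an in-process queue stands in for the inter-process channel, which in practice would be a pipe, socket, or message queue; all names are illustrative):

import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.*;

// Single-writer logger: producers enqueue complete lines, and exactly one
// thread appends them to the file, so entries can never interleave mid-line.
public class SingleWriterLog {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public SingleWriterLog(Path logFile) {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String entry = queue.take();   // blocks until a message arrives
                    Files.write(logFile, (entry + System.lineSeparator()).getBytes(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // allow orderly shutdown
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    // Safe to call from any thread; the queue serializes the writes.
    public void log(String message) {
        queue.add(message);
    }
}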
I have read the official documentation, and I'm confused because it seems to contradict itself.
Here is one snippet picked from the official documentation:
However, this code is well-formed:
ws.async_read(b, [](error_code, std::size_t){});
ws.async_write(b.data(), [](error_code, std::size_t){});
ws.async_ping({}, {});
ws.async_close({}, {});
and here is another snippet:
This operation is implemented in terms of one or more calls to the next layer's async_write_some functions, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as websocket::stream::async_write, websocket::stream::async_write_some, or websocket::stream::async_close).
So which one should I trust?
This part is correct:
https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/using_websocket/notes.html#beast.using_websocket.notes.thread_safety
The other text needs to be updated.
I am getting this exception when transforming an XML file with XSLT:
Caused by: java.lang.OutOfMemoryError: Java heap space
at net.sf.saxon.tree.tiny.TinyTree.condense(TinyTree.java:430)
at net.sf.saxon.tree.tiny.TinyBuilder.close(TinyBuilder.java:206)
at net.sf.saxon.event.ReceivingContentHandler.endDocument(ReceivingContentHandler.java:244)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:449)
at net.sf.saxon.event.Sender.send(Sender.java:177)
at net.sf.saxon.Controller.makeSourceTree(Controller.java:1910)
at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:573)
at net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:185)
at com.lomnido.service.XsltTransformService.$tt__transform(XsltTransformService.groovy:27)
I am using Saxon-HE, version 9.7.0-5
My code:
TransformerFactory factory = TransformerFactory.newInstance();
StreamSource xsltStream = new StreamSource(xslt)
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
Transformer transformer = factory.newTransformer(xsltStream);
StreamSource ins = new StreamSource(input);
File tmp = File.createTempFile("test", "xslttransform")
StreamResult out = new StreamResult(tmp);
transformer.transform(ins, out);
The size of the XML file is about 100 MB. Is there any way I could avoid this problem? Is there something like streaming the input file? Is there an alternative to Saxon? I need XSLT 2.0 for my transformations.
Best regards,
Peter
Processing a 100 MB source document should be perfectly feasible without resorting to XSLT 3.0 streaming. Just make sure you have allocated enough memory to the Java VM: the tree built for the source document generally takes about 5 times the raw XML size, but of course it depends on the detail. If you run with -Xmx2g, I certainly wouldn't expect this to fail unless something unusual is going on.
Once the size reaches 500 MB you probably do want to start thinking about XSLT 3.0 streaming. But you haven't said anything about what the transformation is doing, so it could be very easy, fairly difficult, or impossible, depending on the actual transformation to be performed. A sketch of how a streamed transformation is invoked follows below.
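For reference, here is a minimal sketch of invoking a streamed transformation through Saxon's s9api (an assumption-laden illustration, not taken from the question: streaming requires Saxon-EE rather than Saxon-HE, the stylesheet must declare <xsl:mode streamable="yes"/>, and the file names are placeholders):

import java.io.File;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.*;

public class StreamedTransform {
    public static void main(String[] args) throws SaxonApiException {
        // "true" requests the licensed (EE) edition, which streaming needs.
        Processor processor = new Processor(true);
        XsltCompiler compiler = processor.newXsltCompiler();
        // The stylesheet itself must opt in with <xsl:mode streamable="yes"/>.
        XsltExecutable executable = compiler.compile(new StreamSource(new File("transform.xsl")));
        Xslt30Transformer transformer = executable.load30();
        Serializer out = processor.newSerializer(new File("out.xml"));
        // applyTemplates pushes the source through the stylesheet without
        // first building the whole document tree in memory.
        transformer.applyTemplates(new StreamSource(new File("big.xml")), out);
    }
}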
This is probably not of major importance, but I have noticed during testing that the performance of the print statement, and also of stdout, is much faster in the Dart Editor than from the command line. From the command line, print takes around 36% longer than stdout. Running the program from within the editor, however, stdout takes around 900% longer than print, yet both are considerably faster than from the command line; print from a program running in the editor takes around 2.65% of the time it takes from the command line.
Some relative timings based on average performance from my tests:
Running the program from the command line (5000 iterations):
print: 1700 milliseconds
stdout: 1245 milliseconds
Running the program within the Dart Editor (5000 iterations):
print: 45 milliseconds
stdout: 447 milliseconds
Can someone explain to me the reason for these differences, in particular why performance in the Dart Editor is so much faster? Also, is it acceptable practice to use stdout, and what are the pros and cons versus using print?
Why is the Dart Editor faster?
Because output handling by the command line is just really slow; it blocks the output stream, and subsequently the call to print/stdout.
You can test this for yourself with the following Java program (with your own paths, of course):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class DartOutputTest {
    public static void main(String[] args) {
        try {
            // the dart file does print and stdout in a loop
            Process p = Runtime.getRuntime().exec("C:\\eclipse\\dart-sdk\\bin\\dart.exe D:\\DEVELOP\\Dart\\Console_Playground\\bin\\console_playground.dart");
            BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
            StringBuffer buf = new StringBuffer();
            String line;
            while ((line = in.readLine()) != null) {
                buf.append(line + "\r\n");
            }
            System.out.print(buf.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
On my machine, this is even slightly faster than the Dart Editor (which probably does something like buffering the input and rendering it periodically, but I don't really know).
You will also see that adding a Thread.sleep(1); in the loop will severely impact the performance of the Dart program, because the stream is blocked.
Should stdout be used?
I think that's highly subjective. I, for one, do whatever lets me write code more quickly. When I just want to dump a variable, I use print(myvar);. But with stdout, you can do neat stuff like this: stdout.addStream(new File(r"D:\test.csv").openRead());. Of course, if performance is an issue, it depends on how your application will be used; for example, called by another program (where print is faster) versus run from the command line (where stdout is faster, for some reason).
Why is stdout faster from the command line?
I have no idea, sorry. It's the only environment I tested where print() is slower, so I'd guess it has something to do with how the console handles incoming data.