How do I write a futures::Stream to disk without storing it entirely in memory first?

There's an example of downloading a file with Rusoto S3 here:
How to save a file downloaded from S3 with Rusoto to my hard drive?
The problem is that it looks like it's downloading the whole file into memory and then writing it to disk, because it uses the write_all method, which takes an array of bytes, not a stream. How can I use the StreamingBody, which implements futures::Stream, to stream the file to disk?

Since StreamingBody implements Stream<Item = Vec<u8>, Error = Error>, we can construct an MCVE that represents that:
extern crate futures; // 0.1.25

use futures::{prelude::*, stream};

type Error = Box<std::error::Error>;

fn streaming_body() -> impl Stream<Item = Vec<u8>, Error = Error> {
    const DUMMY_DATA: &[&[u8]] = &[b"0123", b"4567", b"89AB", b"CDEF"];
    let iter_of_owned_bytes = DUMMY_DATA.iter().map(|&b| b.to_owned());
    stream::iter_ok(iter_of_owned_bytes)
}
We can then get a "streaming body" somehow and use Stream::for_each to process each element in the Stream. Here, we just call write_all with some provided output location:
use std::{fs::File, io::Write};

fn save_to_disk(mut file: impl Write) -> impl Future<Item = (), Error = Error> {
    streaming_body().for_each(move |chunk| file.write_all(&chunk).map_err(Into::into))
}
We can then write a little testing main:
fn main() {
    let mut file = Vec::new();

    {
        let fut = save_to_disk(&mut file);
        fut.wait().expect("Could not drive future");
    }

    assert_eq!(file, b"0123456789ABCDEF");
}
Important notes about the quality of this naïve implementation:
The call to write_all may block, which you should not do in an asynchronous program. It would be better to hand off that blocking work to a thread pool (see the sketch after these notes).
The usage of Future::wait forces the thread to block until the future is done, which is great for tests but may not be correct for your real use case.
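For the first note, here is a minimal sketch of handing each blocking write off to a thread pool. It builds on the definitions above but assumes the futures-cpupool crate (0.1), which the example does not use; the file parameter also gains Send + 'static bounds so it can be moved to the pool:
extern crate futures_cpupool; // 0.1, an assumed extra dependency

use std::sync::{Arc, Mutex};

use futures_cpupool::CpuPool;

fn save_to_disk_pooled(
    file: impl Write + Send + 'static,
) -> impl Future<Item = (), Error = Error> {
    let pool = CpuPool::new(1);
    // Share one writer between the closures spawned for each chunk.
    let file = Arc::new(Mutex::new(file));

    streaming_body().for_each(move |chunk| {
        let file = file.clone();
        // The blocking write runs on a pool thread; only the io::Result comes
        // back and is converted into the stream's boxed Error type.
        pool.spawn_fn(move || file.lock().unwrap().write_all(&chunk))
            .map_err(Error::from)
    })
}
The chunks still arrive on the calling thread; only the write itself is moved off it.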
See also:
What is the best approach to encapsulate blocking I/O in future-rs?
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?

Related

How can I receive data by POST in Hyper?

What I want to do is really what the title says. I would like to know how I can receive data sent via POST in Hyper. For example, suppose I execute the following command (with a Hyper server running on port 8000):
curl -X POST -F "field=@/path/to/file.txt" -F "tool=curl" -F "other-file=@/path/to/other.jpg" http://localhost:8000
Now, I'm going to take part of the code on the main page of Hyper as an example:
use std::{convert::Infallible, net::SocketAddr};
use hyper::{Body, Request, Response, Server};
use hyper::service::{make_service_fn, service_fn};

async fn handle(_: Request<Body>) -> Result<Response<Body>, Infallible> {
    Ok(Response::new("Hello, World!".into()))
}

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([127, 0, 0, 1], 8000));

    let make_svc = make_service_fn(|_conn| async {
        Ok::<_, Infallible>(service_fn(handle))
    });

    let server = Server::bind(&addr).serve(make_svc);

    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}
So, with this basic code, how can I receive the POST data that my curl command above would send? How do I adapt my code to read the data? I've searched the internet, and what I found was that Hyper doesn't actually split the request body depending on the HTTP method; it's all part of the same body. But I haven't been able to find a way to process data like the above with code like mine. Thanks in advance.
Edit
I tried the exact code from the answer, that is, this code:
async fn handle(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    let mut files = multipart::server::Multipart::from(req);
    .....
}
But I get this error:
expected struct multipart::server::Multipart, found struct hyper::Request
How can I solve that?
It is a single body, but the data is encoded in a way that contains the multiple files.
This is called multipart, and in order to parse the body correctly you need a multipart library such as https://crates.io/crates/multipart
For hyper integration you need to enable the hyper feature flag in Cargo.toml:
multipart = { version = "*", features = ["hyper"] }
Then
async fn handle(mut files: multipart::server::Multipart) -> Result<Response<Body>, Infallible> {
    files.foreach_entry(|mut field| {
        // contains name, filename, type ..
        println!("Info: {:?}", field.headers);
        // contains data (reading it requires std::io::Read in scope)
        let mut bytes: Vec<u8> = Vec::new();
        field.data.read_to_end(&mut bytes);
    });
    Ok(Response::new("Received the files!".into()))
}
You can also use it like this
async fn handle(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    let mut files = multipart::server::Multipart::from(req);
    .....
}
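If Multipart::from(req) is not available for your hyper version, another way to bridge the two is to buffer the body and hand it to the parser yourself. This is only a hedged sketch: it assumes hyper 0.14's body::to_bytes and the multipart crate's Multipart::with_body constructor, and it pulls the boundary out of the Content-Type header naively, so verify it against the crate versions you actually use.
use std::convert::Infallible;
use std::io::{Cursor, Read};

use hyper::{Body, Request, Response};

async fn handle(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    // The multipart boundary is announced in the Content-Type header.
    let boundary = req
        .headers()
        .get(hyper::header::CONTENT_TYPE)
        .and_then(|value| value.to_str().ok())
        .and_then(|value| value.split("boundary=").nth(1))
        .map(|boundary| boundary.to_owned());

    let boundary = match boundary {
        Some(boundary) => boundary,
        None => return Ok(Response::new("expected a multipart/form-data request".into())),
    };

    // Buffer the whole body; `multipart` is a blocking, Read-based parser.
    let bytes = hyper::body::to_bytes(req.into_body()).await.unwrap();
    let mut files = multipart::server::Multipart::with_body(Cursor::new(bytes), boundary);

    files
        .foreach_entry(|mut field| {
            println!("Info: {:?}", field.headers);
            let mut data: Vec<u8> = Vec::new();
            field.data.read_to_end(&mut data).unwrap();
        })
        .unwrap();

    Ok(Response::new("Received the files!".into()))
}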

Dart - Detect non-completing futures

I have a Dart console app that is calling into a third-party library.
When my console app calls the third-party library, the call to the method returns, but my CLI app then 'hangs' for 10 seconds or so before finally shutting down.
I suspect that the library has some type of resource that it has created but has not closed/completed.
My best guess is that it is a non-completed future.
So I'm looking for ways to detect resources that haven't been freed.
My first port of call would be looking for a technique to detect futures that haven't been completed but solutions for other resource types would be useful.
I'm currently using runZonedGuarded, passing in a ZoneSpecification to hook calls.
Edit: with some experimentation, I've found I can detect timers and cancel them. In a simple experiment, I've found that a non-cancelled timer will cause the app to hang. If I cancel the timers (during my checkLeaks method) the app will shut down; however, this isn't enough in my real-world app, so I'm still looking for ways to detect other resources.
Here is the experimental code I have:
#! /usr/bin/env dcli

import 'dart:async';

import 'package:dcli/dcli.dart';
import 'package:onepub/src/pub/global_packages.dart';
import 'package:onepub/src/pub/system_cache.dart';
import 'package:onepub/src/version/version.g.dart';
import 'package:pub_semver/pub_semver.dart';

void main(List<String> arguments) async {
  print(orange('OnePub version: $packageVersion '));
  print('');

  print(globals);
  // await globals.repairActivatedPackages();

  await runZonedGuarded(() async {
    Timer(Duration(seconds: 20), () => print('timer done'));
    unawaited(Future.delayed(Duration(seconds: 20)));
    var completer = Completer();
    unawaited(
        Future.delayed(Duration(seconds: 20), () => completer.complete()));
    // await globals.activateHosted(
    //   'dcli_unit_tester',
    //   VersionConstraint.any,
    //   null, // all executables
    //   overwriteBinStubs: true,
    //   url: null, // hostedUrl,
    // );
    print('end activate');
  }, (error, stackTrace) {
    print('Uncaught error: $error');
  }, zoneSpecification: buildZoneSpec());

  print('end');
  checkLeaks();
  // await entrypoint(arguments, CommandSet.ONEPUB, 'onepub');
}

late final SystemCache cache = SystemCache(isOffline: false);

GlobalPackages? _globals;
GlobalPackages get globals => _globals ??= GlobalPackages(cache);

List<void Function()> actions = [];
List<Source<Timer>> timers = [];

int testCounter = 0;
int timerCount = 0;
int periodicCallbacksCount = 0;
int microtasksCount = 0;

ZoneSpecification buildZoneSpec() {
  return ZoneSpecification(
    createTimer: (source, parent, zone, duration, f) {
      timerCount += 1;
      final result = parent.createTimer(zone, duration, f);
      timers.add(Source(result));
      return result;
    },
    createPeriodicTimer: (source, parent, zone, period, f) {
      periodicCallbacksCount += 1;
      final result = parent.createPeriodicTimer(zone, period, f);
      timers.add(Source(result));
      return result;
    },
    scheduleMicrotask: (source, parent, zone, f) {
      microtasksCount += 1;
      actions.add(f);
      final result = parent.scheduleMicrotask(zone, f);
      return result;
    },
  );
}

void checkLeaks() {
  print(actions.length);
  print(timers.length);
  print('testCounter $testCounter');
  print('timerCount $timerCount');
  print('periodicCallbacksCount $periodicCallbacksCount');
  print('microtasksCount $microtasksCount');

  for (var timer in timers) {
    if (timer.source.isActive) {
      print('Active Timer: ${timer.st}');
      timer.source.cancel();
    }
  }
}

class Source<T> {
  Source(this.source) {
    st = StackTrace.current;
  }

  T source;
  late StackTrace st;
}
In my real-world testing I can see that I do have hanging timers caused by HTTP connections. As I originally guessed, this does seem to point to some other problem with the HTTP connections not being closed down correctly.
Active Timer: #0 new Source (file:///home/bsutton/git/onepub/onepub/bin/onepub.dart:105:21)
#1 buildZoneSpec.<anonymous closure> (file:///home/bsutton/git/onepub/onepub/bin/onepub.dart:68:18)
#2 _CustomZone.createTimer (dart:async/zone.dart:1388:19)
#3 new Timer (dart:async/timer.dart:54:10)
#4 _HttpClientConnection.startTimer (dart:_http/http_impl.dart:2320:18)
#5 _ConnectionTarget.returnConnection (dart:_http/http_impl.dart:2381:16)
#6 _HttpClient._returnConnection (dart:_http/http_impl.dart:2800:41)
#7 _HttpClientConnection.send.<anonymous closure>.<anonymous closure>.<anonymous closure> (dart:_http/http_impl.dart:2171:25)
#8 _rootRunUnary (dart:async/zone.dart:1434:47)
In general, it's impossible to find things that don't happen.
There is no way to find all futures in the program.
With a zone, you might be able to intercept all the callbacks being "registered" in the zone, but you can't know which of them must be called. A future can have both value handlers and error handlers, and at most one of them will ever be called. So, just because a callback on a future isn't called doesn't mean the future didn't complete.
A future most likely won't keep the isolate alive, though.
A future that never completes will just be garbage collected if nothing important is hanging on to it.
The most likely culprits for keeping an isolate alive are timers and receive ports.
(The VM internal implementation of timers, and I/O, and sockets, all use receive ports, so it's really just the ports.)
Again, there is no way to find all open ports programmatically.
You need a debugger with memory inspection tools for that.
I'd recommend using the developer tools to look for instances of ReceivePort or RawReceivePort that are not being garbage collected, and see whether they are still alive.
Also be careful with runZonedGuarded.
Since runZonedGuarded introduces a new error zone (because it introduces an uncaught error handler in the new zone), an error future created inside the zone will not be seen to complete outside the zone.
That means that the code:
await runZonedGuarded(() async {
will not work if the body throws. The error of the future is handled by the zone instead of the await, so the await just sees a future which never completes.
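A minimal sketch of that pitfall (a hypothetical program, not the code above):
import 'dart:async';

Future<void> main() async {
  // The body throws; the error is delivered to the zone's handler instead of
  // the awaited future, so this await never completes and main() hangs here.
  await runZonedGuarded(() async {
    throw StateError('boom');
  }, (error, stackTrace) {
    print('Uncaught error: $error');
  });

  print('never reached');
}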

rootBundle.loadString hanging for large-ish (50k+) files due to isolate?

I'm trying to load a large-ish (1000 lines, 68k) text file using
final String enString = await rootBundle.loadString('res/string/string_en.json');
The AssetBundle.loadString method that loads the string is:
Future<String> loadString(String key, { bool cache = true }) async {
  final ByteData data = await load(key);
  if (data == null)
    throw FlutterError('Unable to load asset: $key');
  // 50 KB of data should take 2-3 ms to parse on a Moto G4, and about 400 μs
  // on a Pixel 4.
  if (data.lengthInBytes < 50 * 1024) {
    return utf8.decode(data.buffer.asUint8List());
  }
  // For strings larger than 50 KB, run the computation in an isolate to
  // avoid causing main thread jank.
  return compute(_utf8decode, data, debugLabel: 'UTF8 decode for "$key"');
}
Looking at the code above, if the file is bigger than 50k, as mine is, an isolate is used.
As a test, I cut my file in half (so 32k) and it loaded in a second (not using the isolate). But, unedited, the function hangs when the isolate is used.
My file is just a simple JSON file of key-value pairs. Here are the first few lines:
{
"ctaButtonConfirm": "Confirm",
"ctaButtonContinue": "Continue",
"ctaButtonReview": "Review",
"balance": "Balance",
"totalBalance": "Total Balance",
"transactions": "Transactions",
:
Seems like it hangs when the isolate is used?
EDIT
Based on the loadString code above, I wrote an extension function that doesn't use an isolate, and it works fine, so it looks like the isolate doesn't like my file?
extension AssetBundleX on AssetBundle {
  Future<String> loadStringWithoutIsolate(String key) async {
    final ByteData data = await load(key);
    return utf8.decode(data.buffer.asUint8List());
  }
}
You can't access rootBundle from a spawned isolate, so use the main isolate instead.
Or, quoting the docs ("This is useful for operations that take longer than a few milliseconds, and which would therefore risk skipping frames. For tasks that will only take a few milliseconds, consider SchedulerBinding.scheduleTask instead."), you can try SchedulerBinding.scheduleTask instead.
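A minimal sketch of that suggestion, assuming a Flutter version where SchedulerBinding.instance is non-nullable; loadStringScheduled is a hypothetical helper, not part of the framework:
import 'dart:convert';
import 'dart:typed_data';

import 'package:flutter/scheduler.dart';
import 'package:flutter/services.dart';

// Decode on the main isolate, but as a scheduled task so the scheduler can
// fit it in between frames rather than in the middle of a build.
Future<String> loadStringScheduled(String key) async {
  final ByteData data = await rootBundle.load(key);
  return SchedulerBinding.instance.scheduleTask(
    () => utf8.decode(data.buffer.asUint8List()),
    Priority.animation,
  );
}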

Quarkus RestEasy reactive InputStream response using wrong writer

I tried to optimize a reactive endpoint that streams an audio file, based on the Quarkus REST Score Console. I replaced the generic Response with the reactive RestResponse. That increased the score to 100, but it is now using ServerStringMessageBodyHandler instead of ServerInputStreamMessageBodyHandler. Is there a way to tell Quarkus which MessageBodyHandler to use? It is now calling the .toString() method on the InputStream object. I tried returning a ByteArray directly, but the issue is the same. Any idea what is going wrong here?
@GET
@Path("/{lectureId}/stream")
@Produces(MediaType.APPLICATION_OCTET_STREAM)
fun getLectureStreamById(
    @RestHeader("Range") rangeParam: String?,
    @RestPath lectureId: LectureId
): Uni<RestResponse<InputStream>> {
    return lectureAudioService.getAudioFile(lectureId).map { lectureStream ->
        downloadResponse(ByteArrayInputStream(lectureStream.data), filename = "$lectureId.mp3").build()
    }
}

fun downloadResponse(
    data: InputStream,
    filename: String,
): ResponseBuilder<InputStream> {
    return ResponseBuilder.ok(data)
        .header("Content-Disposition", "attachment;filename=$filename")
}
Based on the answer in a GitHub issue, it should be fixed in upcoming releases, but the original approach was not good anyway, because it blocked the event loop. A better approach is:
#Path("/{filename}/async-file")
#GET
#Produces(MediaType.APPLICATION_OCTET_STREAM)
fun getAsyncFile(filename: String): Uni<RestResponse<AsyncFile>> {
return Uni.createFrom().emitter { emitter: UniEmitter<in RestResponse<AsyncFile>> ->
vertx.fileSystem().open(
"$filename.mp3", OpenOptions()
) { result: AsyncResult<AsyncFile> ->
if (result.succeeded()) emitter.complete(
ResponseBuilder.ok(result.result()).header("Content-Disposition", "attachment;filename=$filename.mp3").build()
) else emitter.fail(result.cause())
}
}
}
Thanks to @geoand

Gulp divide stream, use two destinations (streams)

I have a stream in gulp, but I want to split the stream into two, and put half in one destination, and the other half in another.
My thinking is that I need to fork the stream into two, filter each of the new streams, use gulp.dest on each stream, then merge them back and return the result to gulp.
I currently have this code:
function dumpCpuProfiles(profileDirectory) {
  const _ = require('highland');

  const stream = _();

  const cpuProfiles = stream
    .fork()
    .pipe(filter('*.cpuprofile'))
    .pipe(gulp.dest(profileDirectory));

  const noCpuProfiles = _(cpuProfiles).filter(() => false);

  const otherFiles = stream
    .fork()
    .pipe(filter(['**', '!**/*.cpuprofile']));

  return _([noCpuProfiles, otherFiles]).merge();
}
However, I get this error:
TypeError: src.pull is not a function
at /home/user/project/node_modules/highland/lib/index.js:3493:17
at Array.forEach (native)
at pullFromAllSources (/home/user/project/node_modules/highland/lib/index.js:3492:15)
at Stream._generator (/home/user/project/node_modules/highland/lib/index.js:3449:13)
at Stream._runGenerator (/home/user/project/node_modules/highland/lib/index.js:949:10)
at Stream.resume (/home/user/project/node_modules/highland/lib/index.js:811:22)
at Stream._checkBackPressure (/home/user/project/node_modules/highland/lib/index.js:713:17)
at Stream.resume (/home/user/project/node_modules/highland/lib/index.js:806:29)
at Stream.pipe (/home/user/project/node_modules/highland/lib/index.js:895:7)
at Gulp.<anonymous> (/home/user/project/gulpfile.js:189:6)
The output is a stream, so I'm not too sure what the error is about. Any help would be massively useful! Thanks!
I use this small monkey-patching trick to achieve it:
Object.prototype.fork = function(_fn) {
  _fn(this);
  return this;
};
A stream is just an event emitter; the pipe method doesn't return the old stream but a new one, so you can build fork functionality very easily.
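For example, here is a hedged sketch of how that fork() patch could be used inside a gulp task; gulp-filter and the source glob are illustrative assumptions, not taken from the question:
const gulp = require('gulp');
const filter = require('gulp-filter');

function dumpCpuProfiles(profileDirectory) {
  return gulp
    .src('src/**/*')
    // Side branch: receives every file, keeps only the .cpuprofile files and
    // writes them to the profile directory.
    .fork((stream) =>
      stream.pipe(filter('**/*.cpuprofile')).pipe(gulp.dest(profileDirectory))
    )
    // Main branch: everything except the .cpuprofile files flows on.
    .pipe(filter(['**', '!**/*.cpuprofile']));
}
Because pipe() returns the new downstream stream, the side branch built inside fork() does not affect the chain that continues after it.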
