rootBundle.loadString hanging for large-ish (50k+) files due to isolate? - dart

I'm trying to load a large-ish (1000 lines, 68k) text file using
final String enString = await rootBundle.loadString('res/string/string_en.json');
The AssetBundle.loadString method that loads the string is:
Future<String> loadString(String key, { bool cache = true }) async {
  final ByteData data = await load(key);
  if (data == null)
    throw FlutterError('Unable to load asset: $key');
  // 50 KB of data should take 2-3 ms to parse on a Moto G4, and about 400 μs
  // on a Pixel 4.
  if (data.lengthInBytes < 50 * 1024) {
    return utf8.decode(data.buffer.asUint8List());
  }
  // For strings larger than 50 KB, run the computation in an isolate to
  // avoid causing main thread jank.
  return compute(_utf8decode, data, debugLabel: 'UTF8 decode for "$key"');
}
Looking at the code above, if the file is bigger than 50k, as mine is, an isolate is used.
As a test, I cut my file in half (so 32k) and it loaded in a second (not using the isolate). But, unedited, the function hangs when the isolate is used.
My file is just a simple JSON file of key-value pairs. Here are the first few lines:
{
"ctaButtonConfirm": "Confirm",
"ctaButtonContinue": "Continue",
"ctaButtonReview": "Review",
"balance": "Balance",
"totalBalance": "Total Balance",
"transactions": "Transactions",
:
It seems like it hangs when the isolate is used?
EDIT
Based on the loadString code above I wrote an extension function that doesn't use an isolate and it works fine, so it's looking like the isolate doesn't like my file?
extension AssetBundleX on AssetBundle {
  Future<String> loadStringWithoutIsolate(String key) async {
    final ByteData data = await load(key);
    return utf8.decode(data.buffer.asUint8List());
  }
}

You can't access rootBundle from a spawned isolate,
so use the main isolate instead.
Or, as the docs put it: "This is useful for operations that take longer than a few milliseconds, and which would therefore risk skipping frames. For tasks that will only take a few milliseconds, consider SchedulerBinding.scheduleTask instead."
So you can try SchedulerBinding.scheduleTask instead.
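For illustration, here is a rough sketch (not from the original answer) of an extension that keeps the decode on the main isolate but defers it through SchedulerBinding.scheduleTask; the name loadStringScheduled and the choice of Priority.animation are assumptions, and it assumes a Flutter version where SchedulerBinding.instance is non-nullable:

import 'dart:convert';
import 'dart:typed_data';
import 'package:flutter/scheduler.dart';
import 'package:flutter/services.dart';

extension AssetBundleScheduledX on AssetBundle {
  // Loads the asset bytes, then schedules the UTF-8 decode as a task so it
  // runs between frames on the main isolate instead of in a spawned isolate.
  Future<String> loadStringScheduled(String key) async {
    final ByteData data = await load(key);
    return SchedulerBinding.instance.scheduleTask<String>(
      () => utf8.decode(data.buffer.asUint8List()),
      Priority.animation,
    );
  }
}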

Related

Dart - Detect non-completing futures

I have a Dart console app that is calling into a third-party library.
When my console app calls the third-party library, the call to the method returns; however, my CLI app then 'hangs' for 10 seconds or so before finally shutting down.
I suspect that the library has some type of resource that it has created but has not closed/completed.
My best guess is that it is a non-completed future.
So I'm looking for ways to detect resources that haven't been freed.
My first port of call would be looking for a technique to detect futures that haven't been completed but solutions for other resource types would be useful.
I'm currently using a runZoneGuarded, passing in a ZoneSpecification to hook calls.
Edit: with some experimentation, I've found I can detect timers and cancel them. In a simple experiment, I've found that a non-cancelled timer will cause the app to hang. If I cancel the timers (during my checkLeaks method) the app will shut down; however, this isn't enough in my real-world app, so I'm still looking for ways to detect other resources.
Here is the experimental code I have:
#! /usr/bin/env dcli

import 'dart:async';

import 'package:dcli/dcli.dart';
import 'package:onepub/src/pub/global_packages.dart';
import 'package:onepub/src/pub/system_cache.dart';
import 'package:onepub/src/version/version.g.dart';
import 'package:pub_semver/pub_semver.dart';

void main(List<String> arguments) async {
  print(orange('OnePub version: $packageVersion '));
  print('');
  print(globals);
  // await globals.repairActivatedPackages();

  await runZonedGuarded(() async {
    Timer(Duration(seconds: 20), () => print('timer done'));
    unawaited(Future.delayed(Duration(seconds: 20)));
    var completer = Completer();
    unawaited(
        Future.delayed(Duration(seconds: 20), () => completer.complete()));
    // await globals.activateHosted(
    //   'dcli_unit_tester',
    //   VersionConstraint.any,
    //   null, // all executables
    //   overwriteBinStubs: true,
    //   url: null, // hostedUrl,
    // );
    print('end activate');
  }, (error, stackTrace) {
    print('Uncaught error: $error');
  }, zoneSpecification: buildZoneSpec());
  print('end');
  checkLeaks();
  // await entrypoint(arguments, CommandSet.ONEPUB, 'onepub');
}

late final SystemCache cache = SystemCache(isOffline: false);

GlobalPackages? _globals;
GlobalPackages get globals => _globals ??= GlobalPackages(cache);

List<void Function()> actions = [];
List<Source<Timer>> timers = [];

int testCounter = 0;
int timerCount = 0;
int periodicCallbacksCount = 0;
int microtasksCount = 0;

ZoneSpecification buildZoneSpec() {
  return ZoneSpecification(
    createTimer: (source, parent, zone, duration, f) {
      timerCount += 1;
      final result = parent.createTimer(zone, duration, f);
      timers.add(Source(result));
      return result;
    },
    createPeriodicTimer: (source, parent, zone, period, f) {
      periodicCallbacksCount += 1;
      final result = parent.createPeriodicTimer(zone, period, f);
      timers.add(Source(result));
      return result;
    },
    scheduleMicrotask: (source, parent, zone, f) {
      microtasksCount += 1;
      actions.add(f);
      final result = parent.scheduleMicrotask(zone, f);
      return result;
    },
  );
}

void checkLeaks() {
  print(actions.length);
  print(timers.length);
  print('testCounter $testCounter');
  print('timerCount $timerCount');
  print('periodicCallbacksCount $periodicCallbacksCount');
  print('microtasksCount $microtasksCount');
  for (var timer in timers) {
    if (timer.source.isActive) {
      print('Active Timer: ${timer.st}');
      timer.source.cancel();
    }
  }
}

class Source<T> {
  Source(this.source) {
    st = StackTrace.current;
  }
  T source;
  late StackTrace st;
}
In my real-world testing I can see that I do have hanging timers caused by HTTP connections. As I originally guessed, this does seem to point to some other problem with the HTTP connections not being closed down correctly.
Active Timer: #0 new Source (file:///home/bsutton/git/onepub/onepub/bin/onepub.dart:105:21)
#1 buildZoneSpec.<anonymous closure> (file:///home/bsutton/git/onepub/onepub/bin/onepub.dart:68:18)
#2 _CustomZone.createTimer (dart:async/zone.dart:1388:19)
#3 new Timer (dart:async/timer.dart:54:10)
#4 _HttpClientConnection.startTimer (dart:_http/http_impl.dart:2320:18)
#5 _ConnectionTarget.returnConnection (dart:_http/http_impl.dart:2381:16)
#6 _HttpClient._returnConnection (dart:_http/http_impl.dart:2800:41)
#7 _HttpClientConnection.send.<anonymous closure>.<anonymous closure>.<anonymous closure> (dart:_http/http_impl.dart:2171:25)
#8 _rootRunUnary (dart:async/zone.dart:1434:47)
In general, it's impossible to find things that don't happen.
There is no way to find all futures in the program.
With a zone, you might be able to intercept all the callbacks being "registered" in the zone, but you can't know which of them must be called. A future can have both value handlers and error handlers, and at most one of them will ever be called. So, just because a callback on a future isn't called, it doesn't mean the future didn't complete.
A future most likely won't keep the isolate alive, though.
An uncompleted future will just be garbage collected if nothing important is hanging on to it.
The most likely culprits for keeping an isolate alive are timers and receive ports.
(The VM internal implementation of timers, and I/O, and sockets, all use receive ports, so it's really just the ports.)
Again, there is no way to find all open ports programmatically.
You need a debugger with memory inspection tools for that.
I'd recommend using the developer tools to look for instances of ReceivePort or RawReceivePort that are not being garbage collected, and see whether they are still alive.
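As a minimal illustration of that point (a standalone sketch, not part of the original answer), an open RawReceivePort is enough to keep a Dart program from exiting until it is closed:

import 'dart:isolate';

void main() {
  // An open ReceivePort/RawReceivePort keeps the isolate alive.
  final port = RawReceivePort((message) => print('got: $message'));

  // The program only exits once the port is closed; without this line,
  // main() returns but the process hangs, much like a leaked timer.
  Future.delayed(Duration(seconds: 2), port.close);
}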
Also be careful with runZonedGuarded.
Since runZonedGuarded introduces a new error zone (because it introduces an uncaught error handler in the new zone), an error future created inside the zone will not be seen to complete outside the zone.
That means that the code:
await runZonedGuarded(() async {
will not work if the body throws. The error of the future is handled by the zone instead of the await, so the await just sees a future which never completes.
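A minimal sketch of that pitfall (standalone, not taken from the answer): the zone's handler receives the error, the awaited future never completes, and 'after' is never printed:

import 'dart:async';

Future<void> main() async {
  await runZonedGuarded(() async {
    throw StateError('boom'); // delivered to the zone's error handler
  }, (error, stackTrace) {
    print('Caught by zone: $error');
  });
  print('after'); // never reached: the awaited future never completes
}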

Firestore batch.commit adding more than 500 documents at a time

I am new to Firestore, trying to figure out a fast way to add some documents in Firestore using Dart.
I used the code below. I had about 3000 strings in a list, and it ended up adding all the 3000 documents, but it took a long time, about 10 minutes, and I also got an error message after batch.commit, that the 500 limit was exceeded, even though it added all 3000 documents.
I know I should break it down into 500 at a time, but the fact that it added all the documents in the list does not make sense to me. I checked in Firestore Console, and all the 3000 documents were there.
I need to create a document ID every time I add a document. What am I doing wrong? Is it OK to use add to get a document reference, and then batch.setData?
Future<void> addBulk(List<String> stringList) async {
  WriteBatch batch = Firestore.instance.batch();
  for (String str in stringList) {
    // Check if str already exists; if it does not exist, add it to Firestore
    if (str.isNotEmpty) {
      QuerySnapshot querySnapshot = await myCollection
          .where('name', isEqualTo: str)
          .getDocuments(source: Source.cache);
      if (querySnapshot.documents.length == 0) {
        UserObject obj = UserObject(name: str);
        DocumentReference ref = await myCollection.add(obj.toMap());
        batch.setData(ref, obj.toMap(), merge: true);
      }
    }
  }
  batch.commit();
}
Your code is actually just adding each document separately, regardless of what you're doing with the batch. This line of code does it:
DocumentReference ref = await myCollection.add(obj.toMap());
If you remove that line (which, again, is not touching your batch), I'm sure you will just see a failure due to the batch size.
If you are just trying to generate a unique ID for the document to be added in the batch, use document() with no parameters instead:
DocumentReference ref = myCollection.document();
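For reference, a rough sketch of the chunked version (using the same pre-1.0 cloud_firestore API and the myCollection/UserObject names from the question; the 500-writes-per-batch limit is Firestore's documented limit, the rest of the structure is illustrative, and the duplicate check from the question is omitted):

Future<void> addBulk(List<String> stringList) async {
  const batchLimit = 500; // Firestore allows at most 500 writes per batch
  WriteBatch batch = Firestore.instance.batch();
  int writesInBatch = 0;

  for (String str in stringList) {
    if (str.isEmpty) continue;

    // document() with no arguments only generates a reference with a fresh
    // ID; nothing is written until the batch is committed.
    DocumentReference ref = myCollection.document();
    batch.setData(ref, UserObject(name: str).toMap(), merge: true);
    writesInBatch++;

    if (writesInBatch == batchLimit) {
      await batch.commit();
      batch = Firestore.instance.batch();
      writesInBatch = 0;
    }
  }

  if (writesInBatch > 0) {
    await batch.commit();
  }
}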

How do I write a futures::Stream to disk without storing it entirely in memory first?

There's an example of downloading a file with Rusoto S3 here:
How to save a file downloaded from S3 with Rusoto to my hard drive?
The problem is that it looks like it's downloading the whole file into memory and then writing it to disk, because it uses the write_all method which takes an array of bytes, not a stream. How can I use the StreamingBody, which implements futures::Stream to stream the file to disk?
Since StreamingBody implements Stream<Item = Vec<u8>, Error = Error>, we can construct an MCVE that represents that:
extern crate futures; // 0.1.25

use futures::{prelude::*, stream};

type Error = Box<std::error::Error>;

fn streaming_body() -> impl Stream<Item = Vec<u8>, Error = Error> {
    const DUMMY_DATA: &[&[u8]] = &[b"0123", b"4567", b"89AB", b"CDEF"];
    let iter_of_owned_bytes = DUMMY_DATA.iter().map(|&b| b.to_owned());
    stream::iter_ok(iter_of_owned_bytes)
}
We can then get a "streaming body" somehow and use Stream::for_each to process each element in the Stream. Here, we just call write_all with some provided output location:
use std::{fs::File, io::Write};

fn save_to_disk(mut file: impl Write) -> impl Future<Item = (), Error = Error> {
    streaming_body().for_each(move |chunk| file.write_all(&chunk).map_err(Into::into))
}
We can then write a little testing main:
fn main() {
    let mut file = Vec::new();

    {
        let fut = save_to_disk(&mut file);
        fut.wait().expect("Could not drive future");
    }

    assert_eq!(file, b"0123456789ABCDEF");
}
Important notes about the quality of this naïve implementation:
The call to write_all may potentially block, which you should not do in an asynchronous program. It would be better to hand off that blocking work to a threadpool.
The usage of Future::wait forces the thread to block until the future is done, which is great for tests but may not be correct for your real use case.
See also:
What is the best approach to encapsulate blocking I/O in future-rs?
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?

How do I get a continuation token for a bulk INSERT on Azure Cosmos DB?

I want to upload a CSV file that represents 10k documents to be added to my Cosmos DB collection in a manner that's fast and atomic. I have a stored procedure like the following pseudo-code:
function createDocsFromCSV(csv_text) {
  function parse(txt) { /* ... parsing code here ... */ }

  var collection = getContext().getCollection();
  var response = getContext().getResponse();
  var docs_to_create = parse(csv_text);

  for (var ii = 0; ii < docs_to_create.length; ii++) {
    var accepted = collection.createDocument(collection.getSelfLink(),
      docs_to_create[ii],
      function(err, doc_created) {
        if (err) throw new Error('Error' + err.message);
      });

    if (!accepted) {
      throw new Error('Timed out creating document ' + ii);
    }
  }
}
When I run it, the stored procedure creates about 1200 documents before timing out (and therefore rolling back and not creating any documents).
Previously I had success updating (instead of creating) thousands of documents in a stored procedure using continuation tokens and this answer as guidance: https://stackoverflow.com/a/34761098/277504. But after searching documentation (e.g. https://azure.github.io/azure-documentdb-js-server/Collection.html) I don't see a way to get continuation tokens from creating documents like I do for querying documents.
Is there a way to take advantage of stored procedures for bulk document creation?
It's important to note that stored procedures have bounded execution, in which all operations must complete within the server-specified request timeout duration. If an operation does not complete within that time limit, the transaction is automatically rolled back.
In order to simplify development around time limits, all CRUD (Create, Read, Update, and Delete) operations return a Boolean value that indicates whether the operation will complete. This Boolean value can be used as a signal to wrap up execution and to implement a continuation-based model to resume execution (this is illustrated in the code sample below). For more details, please refer to the documentation.
The bulk-insert stored procedure below implements the continuation model by returning the number of documents successfully created.
pseudo-code:
function createDocsFromCSV(csv_text, count) {
  function parse(txt) { /* ... parsing code here ... */ }

  var collection = getContext().getCollection();
  var response = getContext().getResponse();
  var docs_to_create = parse(csv_text);

  for (var ii = count; ii < docs_to_create.length; ii++) {
    var accepted = collection.createDocument(collection.getSelfLink(),
      docs_to_create[ii],
      function(err, doc_created) {
        if (err) throw new Error('Error' + err.message);
      });

    if (!accepted) {
      // Out of time: report how many documents have been created so the
      // client can resume from this index on the next run.
      response.setBody(ii);
      return;
    }
  }

  // Everything was created in this run.
  response.setBody(docs_to_create.length);
}
Then you could check the returned count on the client side and re-run the stored procedure with that count as the parameter to create the remaining documents, until the count reaches the number of documents parsed from csv_text.
Hope it helps you.

ehcache is not honoring maxElementsInMemory

I have a fairly simple cache configuration:
<cache name="MyCache"
       maxElementsInMemory="200000"
       eternal="false"
       timeToIdleSeconds="43200"
       timeToLiveSeconds="43200"
       overflowToDisk="false"
       diskPersistent="false"
       memoryStoreEvictionPolicy="LRU"
/>
I create my cache in the following way:
private Ehcache myCache =
CacheManager.getInstance().getEhcache("MyCache");
I use my cache like this:
public MyResponse processRequest(MyRequest request) {
    Element element = myCache.get(request);
    if (element != null) {
        return (MyResponse) element.getValue();
    } else {
        MyResponse response = remoteService.process(request);
        myCache.put(new Element(request, response));
        return response;
    }
}
Every 10,000 calls to the processRequest() method, I log stats about my cache like this:
logger.debug("Cache name: " + myCache.getName());
logger.debug("Max elements in memory: " + myCache.getMaxElementsInMemory());
logger.debug("Memory store size: " + myCache.getMemoryStoreSize());
logger.debug("Hit count: " + myCache.getHitCount());
logger.debug("Miss count: " + myCache.getMissCountNotFound());
logger.debug("Miss count (because expired): " + myCache.getMissCountExpired());
..I see a good amount of hits, which tells me that it's working.
..However, what I'm seeing is that after a couple hours, the getMemoryStoreSize() is starting to exceed getMaxElementsInMemory(). Eventually, it gets bigger and bigger, and renders the jvm unstable because GC is starting to do Full GCs nonstop to reclaim memory (and I have a pretty large cap set). When I profiled the heap, it pointed to the LRU's SpoolingLinkedHashMap taking most of the space.
I do have a lot of requests hitting this cache, and my theory is that ehcache's LRU algorithm is perhaps not keeping up with evicting the elements when it's full. I tried LFU policy and it also caused the memory store to go over maxElements.
I then started looking at the ehcache code to see if I could prove my theory (inside LruMemoryStore$SpoolingLinkedHashMap):
private boolean removeLeastRecentlyUsedElement(Element element) throws CacheException {
    // check for expiry and remove before going to the trouble of spooling it
    if (element.isExpired()) {
        notifyExpiry(element);
        return true;
    }
    if (isFull()) {
        evict(element);
        return true;
    } else {
        return false;
    }
}
..from here it looks OK, so then I looked at the evict() method:
protected final void evict(Element element) throws CacheException {
    boolean spooled = false;
    if (cache.isOverflowToDisk()) {
        if (!element.isSerializable()) {
            if (LOG.isDebugEnabled()) {
                LOG.debug(new StringBuffer("Object with key ").append(element.getObjectKey())
                    .append(" is not Serializable and cannot be overflowed to disk"));
            }
        } else {
            spoolToDisk(element);
            spooled = true;
        }
    }
    if (!spooled) {
        cache.getCacheEventNotificationService().notifyElementEvicted(element, false);
    }
}
..this looks like it doesn't actually evict (despite the name) but rather relies on the caller to evict. So I looked at the implementation of the put() method and I don't see it calling it. I'm clearly missing something here and would appreciate some help on this.
Thanks!
Your configuration looks fine to me. You just need to use the right key for caching.
Do not put the complete request object as your cache key. Put some unique value from your request object instead. For example:
MyResponse response = remoteService.process(request);
myCache.put(new Element(request.getCustomerID(), response));
return response;
This should work for you. The reason your caching is not working is that each incoming request object is a new object; the cache never finds a matching response, so it keeps adding new entries.
The maxElementsInMemory attribute is deprecated; use maxEntriesLocalHeap instead.
