Converting a Stream<int> to Stream<List<String>> in Dart

tl;dr:
Looking to convert a Stream<int> to a Stream<List<String>>.
Long version:
I'm new to dart/flutter and this style of programming in general, so pardon me for the noob question.
I'm sending a stream of char/uint8_t containing ASCII strings over bluetooth. This is received in the form of a Stream<int> in a flutter app. I'm looking to split this stream of bytes into lines of strings.
I'm thinking that the approach to this is to convert each int to a character (represented by a string in dart), followed by doing some sort of a split operation on the stream. I could not find good examples of doing a split on a stream, could someone help here?
Thanks.

You can transform the Stream using transform and map. Here is an example where I use two transformers to get a line of text from my Stream:
import 'dart:async';
import 'dart:convert';

Future<void> main() async {
  final myStream = Stream.fromIterable(
      [072, 101, 108, 108, 111, 032, 087, 111, 114, 108, 100]);
  print(await myStream
      .map((asciiValue) => String.fromCharCode(asciiValue))
      .transform(const LineSplitter())
      .first);
  // Hello World
}
It should be noted that this code is not very efficient. Normally, you receive multiple List<int> events, each containing a buffered amount of data, and convert a bigger chunk at a time. But since you have a Stream<int>, this is probably the easiest way to do it, converting each event to a single character.
Alternatively, we could create our own buffered solution, but it really depends on how you receive the data.
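For comparison, here is what the buffered idea looks like, sketched in Python for illustration (a Dart version would keep the same buffer inside a StreamTransformer; the function name is made up):

```python
# Sketch only: accumulate incoming byte chunks and emit complete lines.
# lines_from_chunks is a hypothetical name; assumes ASCII/UTF-8 input.

def lines_from_chunks(chunks):
    """Yield decoded lines from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")
    if buffer:  # flush trailing bytes that had no newline
        yield buffer.decode("utf-8")

print(list(lines_from_chunks([b"Hel", b"lo\nWor", b"ld\n"])))
# ['Hello', 'World']
```

The key point is that the buffer survives across events, so a line split over two chunks is still emitted whole.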

Related

Questions when using Streams with input from Linux pipes

These are questions from a beginner to Dart and RxDart. The Dart and RxDart versions are the latest as of yesterday.
In the following example Dart program, saved in the file 't.dart', only one of the two options, A or B, is uncommented at a time. Before executing it, a fifo is created by running 'mkfifo fifo'. The results of the execution are below.
Questions:
Why does a Stream opened using File show only one byte received, whereas the stdin Stream, with input from the same fifo, sees all the input?
Why does the RxDart operator take emit only one value?
Option-A: Executed as 'dart t.dart' in one window, and '(for i in A B C D; do echo -n $i; sleep 1; done) > fifo' in another window in the same directory. The output is:
byte count: 1, bytes: A
File is now closed.
Option-B: Executed as 'cat fifo | dart t.dart' in one window, and '(for i in A B C D; do echo -n $i; sleep 1; done) > fifo' in another window. The output is:
byte count: 1, bytes: A
byte count: 1, bytes: B
byte count: 1, bytes: C
byte count: 1, bytes: D
File is now closed.
import 'dart:io';
import 'dart:convert';

main(List<String> args) {
  // Option-A
  // Stream<List<int>> inputStream = File("fifo").openRead();
  // Option-B
  // Stream<List<int>> inputStream = stdin;
  inputStream
      .transform(utf8.decoder)
      .take(16)
      .listen((bytes) => print('byte count: ${bytes.length}, bytes: ${bytes}'),
          onDone: () { print('File is now closed.'); },
          onError: (e) { print(e.toString()); });
}
(I'm not knowledgeable enough in the internals of how Dart I/O works to give a firm answer, so this is my best guess as to what is happening.)
What seems to be going on is that in Option A, you are creating a stream to a yet-to-exist file. Dart sees that the file doesn't exist yet, so publishing to the stream is delayed. Then when you run the echo script, it creates the file and appends the first value, "A", after which you tell it to sleep for 1 second.
During that second, Dart sees that the file now exists and begins streaming data from it. It reads "A" and then reaches the end of the file. As far as Dart is concerned, that's the end of the story, so it closes the stream. By the time the script adds the "B", "C", and "D" to the file, Dart has already finished executing the program and exited the process.
In Option B, rather than telling Dart to stream from a file, you are tapping into the process's input stream, which (as far as I am aware) is going to remain open for as long as there is stuff being written to it. I have a feeling that understanding exactly what is happening requires better knowledge of cat and how piping works in the terminal than I possess, but I believe the long story short of it is that the cat program knows that the file is being written to, which prevents it from terminating early. As such, whenever cat gets new data, it pipes that data to the Dart process's input stream.
Back to the Dart code: you are listening to the input stream, which is still expecting data since cat is still executing, and as such hasn't closed. Only when the file-writing process is complete does cat recognize that it has reached the true end of the file and shut down, at which point Dart recognizes that it isn't going to get more data and closes the input stream.
(As I said, this is merely my best guess, but I suspect an easy way to tell would be to look at the times your Dart script and the other script finish. If in Option A the Dart program finishes long before the script does, and in Option B they finish at roughly the same time, that would be sufficient evidence to me that the above is indeed what is happening.)

processing a variable size map in pig

I have a data set that is incoming as
(str,[[40,74],[50,75],[60,73],[70,43]])
and I need to be able to get this in the output variable using pig:
str, 40, 74
str , 50, 75
str, 60, 73
str, 70, 43
and this could be variable set of elements.
I tried tokenizing and then flattening, but that doesn't help, as it creates tokens using the comma and ends up looking like this:
str , {([[40), (74]), ... }
Would anyone have suggestions on whether I could use built-in functions, or should I write a UDF for this?
many thanks,
Ana
You will need to write a custom UDF to parse this. Assuming your data does not get more complicated than this, you can probably get away with a quick, shallow parse using String.split with the delimiter "],[".
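To illustrate, here is a sketch of that shallow parse in Python (Pig can register Python UDFs via Jython; the function name and the exact record shape are assumptions based on the sample above):

```python
# Hypothetical shallow parser for records shaped like "(str,[[40,74],[50,75]])".
# parse_record is a made-up name; in Pig this logic could live in a Jython UDF.

def parse_record(record):
    """Split one input record into (key, a, b) rows."""
    body = record.strip()[1:-1]          # drop the outer parentheses
    key, pairs = body.split(",", 1)      # "str", "[[40,74],[50,75]]"
    inner = pairs.strip()[2:-2]          # drop the outer "[[" and "]]"
    rows = []
    for pair in inner.split("],["):      # shallow split, as suggested above
        a, b = pair.split(",")
        rows.append((key, a, b))
    return rows

print(parse_record("(str,[[40,74],[50,75],[60,73],[70,43]])"))
```

This breaks as soon as the nesting gets deeper or values contain brackets, which is why a proper UDF is the safer route for anything more complicated.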

Can h5py load a file from a byte array in memory?

My python code is receiving a byte array which represents the bytes of the hdf5 file.
I'd like to read this byte array to an in-memory h5py file object without first writing the byte array to disk. This page says that I can open a memory mapped file, but it would be a new, empty file. I want to go from byte array to in-memory hdf5 file, use it, discard it and not to write to disk at any point.
Is it possible to do this with h5py? (or with hdf5 using C if that is the only way)
You could try using binary I/O to create a file object and read it via h5py:
import io
import h5py

f = io.BytesIO(YOUR_H5PY_STREAM)  # the byte array you already have in memory
h = h5py.File(f, 'r')
You can use io.BytesIO or tempfile to create h5 objects, as shown in the official docs: http://docs.h5py.org/en/stable/high/file.html#python-file-like-objects.
The first argument to File may be a Python file-like object, such as an io.BytesIO or tempfile.TemporaryFile instance. This is a convenient way to create temporary HDF5 files, e.g. for testing or to send over the network.
tempfile.TemporaryFile:
>>> tf = tempfile.TemporaryFile()
>>> f = h5py.File(tf, 'w')
or io.BytesIO:

"""Create an HDF5 file in memory and retrieve the raw bytes.

This could be used, for instance, in a server producing small HDF5
files on demand.
"""
import io
import h5py

bio = io.BytesIO()
with h5py.File(bio, 'w') as f:
    f['dataset'] = range(10)

data = bio.getvalue()  # data is a regular Python bytes object.
print("Total size:", len(data))
print("First bytes:", data[:10])
The following example uses PyTables (tables), which can also read and manipulate the H5 format, in lieu of h5py.

import urllib.request
import tables

url = 'https://s3.amazonaws.com/<your bucket>/data.hdf5'
response = urllib.request.urlopen(url)
h5file = tables.open_file("data-sample.h5", driver="H5FD_CORE",
                          driver_core_image=response.read(),
                          driver_core_backing_store=0)

Parsing a file with BodyParser in Scala Play20 with new lines

Excuse the n00bness of this question, but I have a web application where I want to send a potentially large file to the server and have it parse the format. I'm using the Play20 framework and I'm new to Scala.
For example, if I have a csv, I'd like to split each row by "," and ultimately create a List[List[String]] with each field.
Currently, I'm thinking the best way to do this is with a BodyParser (but I could be wrong). My code looks something like:
Iteratee.fold[String, List[List[String]]]() {
  (result, chunk) =>
    result = chunk.splitByNewLine.splitByDelimiter // Pseudocode
}
My first question is, how do I deal with a situation like the one below where a chunk has been split in the middle of a line:
Chunk 1:
1,2,3,4\n
5,6
Chunk 2:
7,8\n
9,10,11,12\n
My second question is, is writing my own BodyParser the right way to go about this? Are there better ways of parsing this file? My main concern is that I want to allow the files to be very large so I can flush a buffer at some point and not keep the entire file in memory.
If your CSV doesn't contain escaped newlines, then it is pretty easy to do progressive parsing without putting the whole file into memory. The iteratee library comes with a method search inside play.api.libs.iteratee.Parsing:
def search (needle: Array[Byte]): Enumeratee[Array[Byte], MatchInfo[Array[Byte]]]
which will partition your stream into Matched[Array[Byte]] and Unmatched[Array[Byte]]
Then you can combine a first iteratee that takes the header with another that folds in the unmatched results. This should look like the following code:
// break at each match, concat unmatched chunks, and drop the last received element (the match)
val concatLine: Iteratee[Parsing.MatchInfo[Array[Byte]], String] =
  ( Enumeratee.breakE[Parsing.MatchInfo[Array[Byte]]](_.isMatch) ><>
    Enumeratee.collect{ case Parsing.Unmatched(bytes) => new String(bytes) } &>>
    Iteratee.consume() ).flatMap(r => Iteratee.head.map(_ => r))

// group chunks using the above iteratee and do simple csv parsing
val csvParser: Iteratee[Array[Byte], List[List[String]]] =
  Parsing.search("\n".getBytes) ><>
  Enumeratee.grouped( concatLine ) ><>
  Enumeratee.map(_.split(',').toList) &>>
  Iteratee.head.flatMap( header => Iteratee.getChunks.map(header.toList ++ _) )
// an example of a chunked simple csv file
val chunkedCsv: Enumerator[Array[Byte]] = Enumerator("""a,b,c
""","1,2,3","""
4,5,6
7,8,""","""9
""") &> Enumeratee.map(_.getBytes)
// get the result
val csvPromise: Promise[List[List[String]]] = chunkedCsv |>>> csvParser
// eventually returns List(List(a, b, c),List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
Of course you can improve the parsing. If you do, I would appreciate if you share it with the community.
So your Play2 controller would be something like:
val requestCsvBodyParser = BodyParser(rh => csvParser.map(Right(_)))

// progressively parse the big uploaded csv-like file
def postCsv = Action(requestCsvBodyParser) { rq: Request[List[List[String]]] =>
  // do something with the data
}
If you don't mind holding twice the size of List[List[String]] in memory then you could use a body parser like play.api.mvc.BodyParsers.parse.tolerantText:
def toCsv = Action(parse.tolerantText) { request =>
  val data = request.body
  val reader = new java.io.StringReader(data)
  // use a Java CSV parsing library like http://opencsv.sourceforge.net/
  // to transform the text into CSV data
  Ok("Done")
}
Note that if you want to reduce memory consumption, I recommend using Array[Array[String]] or Vector[Vector[String]], depending on whether you want to deal with mutable or immutable data.
If you are dealing with a truly large amount of data (or lots of requests of medium-size data) and your processing can be done incrementally, then you can look at rolling your own body parser. That body parser would not generate a List[List[String]] but instead parse the lines as they come and fold each line into the incremental result. But this is quite a bit more complex to do, in particular if your CSV uses double quotes to support fields containing commas, newlines, or double quotes.
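The chunk-boundary problem from the question (a row split across two chunks) comes down to carrying the trailing partial line in a buffer. Here is a language-agnostic sketch of that incremental fold in Python, with made-up names; it does not handle quoted fields:

```python
# Sketch of incremental CSV parsing across arbitrary chunk boundaries.
# parse_chunks is a hypothetical name; a real body parser would fold this
# logic into the framework's streaming API.

def parse_chunks(chunks):
    """Yield one list of fields per complete line, across chunk boundaries."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        *complete, buffer = buffer.split("\n")  # keep the partial tail
        for line in complete:
            if line:
                yield line.split(",")
    if buffer:  # flush a final line that had no trailing newline
        yield buffer.split(",")

rows = list(parse_chunks(["a,b\nc,", "d\ne,f\n"]))
print(rows)  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

Because the partial tail is re-prepended before splitting the next chunk, the row "c,d" is assembled correctly even though it arrives in two pieces.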

How do I convert a UTF-8 String into an array of bytes in Dart?

I'm creating a Redis client and would like to create a byte array for sending to the Redis server. To issue commands to the server, I need to convert Dart's UTF-8 strings into a bytes which can be written to a socket.
How can I do this?
For Dart 1.0 and later, this is done with the convert library:

import 'dart:convert';

List<int> bytes = utf8.encode("Some data");
print(bytes); // [83, 111, 109, 101, 32, 100, 97, 116, 97]
For older Dart versions, you need to import dart:utf and use its encodeUtf8 function. There is an existing Redis client for Dart here which makes use of these functions.
For images, they might be base64 encoded; refer to this answer:
https://stackoverflow.com/a/65146858/4412553
Image.memory(base64.decode('base64EncodedImageString')),
