How to serve Array[Byte] from spray.io

I am using the following path in my spray-can server (using spray 1.2):
path("my"/"path"){
get{
complete{
val buf:Array[Byte] = functionReturningArrayofByte()
println(buf.length)
buf
}
}
}
The length of the buffer (and what the code above prints) is 2,263,503 bytes. However, when I access my/path from a web browser, it downloads a file that is 10,528,063 bytes long.
I thought spray set the content type to application/octet-stream, and the content length, automatically when completing with an Array[Byte]. I can't figure out what I'm doing wrong.
EDIT
I've run a small test and seen that the array of bytes is rendered as a String. For example, if I had the two bytes 0xFF and 0x01, the output, instead of just those two bytes, would be the string [255, 1]. I just don't know how to make it output the raw content instead of a string representation of it.

Wrapping buf in HttpData solves the problem:
path("my" / "path") {
  get {
    complete {
      val buf: Array[Byte] = functionReturningArrayofByte()
      HttpData(buf)
    }
  }
}
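If you also want to pin down the content type explicitly rather than rely on the default marshalling, you can build the entity by hand. A sketch, assuming spray 1.2's spray.http API (HttpEntity, ContentType, MediaTypes); verify the exact constructors against your version:

import spray.http._

path("my" / "path") {
  get {
    complete {
      val buf: Array[Byte] = functionReturningArrayofByte()
      // An explicit entity: octet-stream content type plus the raw bytes,
      // so no string marshalling can occur.
      HttpEntity(ContentType(MediaTypes.`application/octet-stream`), HttpData(buf))
    }
  }
}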

Related

How to read large portions of a file without exhausting memory in Rust?

I'm trying to re-write a portion of the GNU coreutils 'split' tool, to split a file into multiple parts of approximately the same size.
A part of my program reads large portions of a file just to write them into another file. On the memory side, I don't want to hold these portions in memory, because they can be anywhere from zero bytes up to several gigabytes long.
Here's an extract of the code I wrote using a BufReader:
let file = File::open("myfile.txt")?;
let mut buffer = Vec::new();
let reader = BufReader::new(file);
let mut handle = reader.take(length); // here length can be 10 bytes or 1 GB!
let read = handle.read_to_end(&mut buffer)?;
I feel like I'm mapping the whole chunk of the file into memory because of the read_to_end(&mut buffer) call. Am I? If not, does that mean BufReader is doing its job, and can I just accept that it's doing some kind of magic (abstraction) that lets me "read" an entire portion of a file without really mapping it into memory? Or am I misusing these concepts in my code?
Yes, you're reading the whole chunk into memory. You can inspect buffer to confirm. If it has length bytes then there you go; there are length bytes in memory. There's no way BufReader could fake that.
Yes, if we look into the source of the read_to_end function we can see that the buffer you give it will be extended to hold the new data as it comes in if the available space in the vector is exhausted.
And even just in the docs, Rust tells us that it reads everything until EOF into the buffer:
Read all bytes until EOF in this source, placing them into buf
You can also take a look at the code presented in this question as a starting point using a BufReader:
use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            // do stuff with buffer here
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    Ok(())
}
A better approach might be to set up an unbuffered Reader and read bytes directly into a fixed-size buffer, checking that you do not exceed whatever byte or line bounds the user specified, and writing the buffer contents to the output file.
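A minimal sketch of that idea: copy at most limit bytes with a constant-size buffer, so memory use stays flat no matter how large the chunk is. The function name copy_limited and the sizes are illustrative, not from the original program:

use std::fs::File;
use std::io::{self, Read, Write};

// Copy at most `limit` bytes from `reader` to `writer` using a fixed
// 128 KiB buffer, so memory use stays constant regardless of chunk size.
fn copy_limited<R: Read, W: Write>(reader: &mut R, writer: &mut W, mut limit: u64) -> io::Result<u64> {
    let mut buf = [0u8; 128 * 1024];
    let mut copied = 0u64;
    while limit > 0 {
        // Never request more than the remaining byte budget.
        let want = buf.len().min(limit as usize);
        let n = reader.read(&mut buf[..want])?;
        if n == 0 {
            break; // EOF reached before the limit
        }
        writer.write_all(&buf[..n])?;
        copied += n as u64;
        limit -= n as u64;
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    let mut input = File::open("myfile.txt")?;
    let mut part = File::create("part0")?;
    // Write the first 10 MiB (or less, if the file is smaller) into `part0`.
    copy_limited(&mut input, &mut part, 10 * 1024 * 1024)?;
    Ok(())
}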

Writing Uint16List via IOSink.add, what's the result?

Trying to write audio samples to a file.
I have a list of 16-bit ints:
Uint16List _samples = new Uint16List(0);
I add elements to this list as samples come in.
Then I can write to an IOSink like so:
IOSink _ios = ...
List<int> _toWrite = [];
_toWrite.addAll(_samples);
_ios.add(_toWrite);
or
_ios.add(_samples);
just works, with no type issues, despite the signature of add taking a List<int> and not a Uint16List.
As I read, Dart's 'int' type is 64-bit.
Are both writes above identical? Do they produce packed 16-bit ints in this file?
A Uint16List is-a List<int>. It's a list of integers which truncates writes to 16 bits and always reads out 16-bit integers, but it is a list of integers.
If you copy those integers to a plain growable List<int>, it will contain the same integer values.
So doing ios.add(_samples) will do the same as ios.add(_toWrite), and most likely neither does what you want.
The IOSink's add method expects a list of bytes. So, it will take a list of integers and assume that they are bytes. That means that it will only use the low 8 bits of each integer, which will likely sound awful if you try to play that back as a 16-bit audio sample.
If you want to store all 16 bits, you need to figure out how to store each 16-bit value in two bytes. The easy choice is to just assume that the platform byte order is fine, and do ios.add(_samples.buffer.asUint8List(_samples.offsetInBytes, _samples.lengthInBytes)). This will make a view of the 16-bit data as twice as many bytes, then write those bytes.
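To see the truncation concretely, here is a small self-contained check (the sample values are made up for illustration):

import 'dart:typed_data';

void main() {
  final samples = Uint16List.fromList([0x0102, 0xABCD]);
  // IOSink.add treats each element as a byte, keeping only the low 8 bits:
  print(samples.map((s) => s & 0xFF).toList()); // [2, 205] -- high bytes lost
  // A byte view of the same buffer preserves all 16 bits (platform byte order):
  print(samples.buffer.asUint8List()); // [2, 1, 205, 171] on a little-endian host
}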
The endianness of those bytes (is the high byte first or last) depends on the platform, so if you want to be safe, you can convert the bytes to a fixed byte order first:
if (Endian.host == Endian.little) {
  ios.add(
      _samples.buffer.asUint8List(_samples.offsetInBytes, _samples.lengthInBytes));
} else {
  var byteData = ByteData(_samples.length * 2);
  for (int i = 0; i < _samples.length; i++) {
    byteData.setUint16(i * 2, _samples[i], Endian.little);
  }
  var littleEndianData = byteData.buffer.asUint8List(0, _samples.length * 2);
  ios.add(littleEndianData);
}

Lua: How to gzip a string (gzip, not zlib) in memory?

Given a string, how can I compress it in memory with gzip? I'm using Lua.
It sounds like an easy problem, but there is a huge list of libraries. So far, every library I tried was either dead or produced only zlib-compressed strings. In my use case, I need gzip compression, as that is what the receiver expects.
As a test, if you dump the compressed string to a file, zcat should be able to decompress it.
I'm using OpenResty, so any Lua library should be fine.
(The only solution I have gotten working so far is to dump the string to a file, call os.execute("gzip /tmp/example.txt"), and read the result back. Unfortunately, that is not a practical solution.)
It turns out that zlib is not far away from gzip. The difference is that gzip wraps the compressed data in its own header and trailer.
To get this header, you can use lua-zlib like this:
local zlib = require "zlib"

-- input: string
-- output: string compressed with gzip
function compress(str)
  local level = 5
  local windowSize = 15 + 16
  return zlib.deflate(level, windowSize)(str, "finish")
end
Explanation:
- The second parameter of deflate is the window size. Passing 15+16 (adding 16 to the usual window size of 15) makes sure a gzip header is written; if you omit the parameter, you get a zlib-compressed string.
- level is the gzip compression level (1 = worst to 9 = best)
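As a quick sanity check, you can round-trip the result in memory. This assumes lua-zlib's inflate accepts the same window-size convention (15+16 selecting the gzip wrapper):

local zlib = require "zlib"

local original = "hello hello hello"
local compressed = compress(original)
-- inflate returns the decompressed string as its first result
local decompressed = zlib.inflate(15 + 16)(compressed)
assert(decompressed == original)

Dumping compressed to a file and running zcat on it should print the original string.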
Here is the documentation of deflate (source: lua-zlib documentation):
function stream = zlib.deflate([ int compression_level ], [ int window_size ])
If no compression_level is provided uses Z_DEFAULT_COMPRESSION (6),
compression level is a number from 1-9 where zlib.BEST_SPEED is 1
and zlib.BEST_COMPRESSION is 9.
Returns a "stream" function that compresses (or deflates) all
strings passed in. Specifically, use it as such:
string deflated, bool eof, int bytes_in, int bytes_out =
stream(string input [, 'sync' | 'full' | 'finish'])
Takes input and deflates and returns a portion of it,
optionally forcing a flush.
A 'sync' flush will force all pending output to be flushed to
the return value and the output is aligned on a byte boundary,
so that the decompressor can get all input data available so
far. Flushing may degrade compression for some compression
algorithms and so it should be used only when necessary.
A 'full' flush will flush all output as with 'sync', and the
compression state is reset so that decompression can restart
from this point if previous compressed data has been damaged
or if random access is desired. Using Z_FULL_FLUSH too often
can seriously degrade the compression.
A 'finish' flush will force all pending output to be processed
and results in the stream becoming unusable. Any future
attempts to print anything other than the empty string will
result in an error that begins with IllegalState.
The eof result is true if 'finish' was specified, otherwise
it is false.
The bytes_in is how many bytes of input have been passed to
stream, and bytes_out is the number of bytes returned in
deflated string chunks.

Why is "no code allowed to be all ones" in libjpeg's Huffman decoding?

I'm trying to satisfy myself that METEOSAT images I'm getting from their FTP server are actually valid images. My doubt arises because all the tools I've used so far complain about "Bogus Huffman table definition" - yet when I simply comment out that error message, the image appears quite plausible (a greyscale segment of the Earth's disc).
From https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-8d/jdhuff.c#L379:
while (huffsize[p]) {
  while (((int) huffsize[p]) == si) {
    huffcode[p++] = code;
    code++;
  }
  /* code is now 1 more than the last code used for codelength si; but
   * it must still fit in si bits, since no code is allowed to be all ones.
   */
  if (((INT32) code) >= (((INT32) 1) << si))
    ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
  code <<= 1;
  si++;
}
If I simply comment out the check, or add a check for huffsize[p] to be nonzero (as in the containing loop's controlling expression), then djpeg manages to convert the image to a BMP which I can view with few problems.
Why does the comment claim that all-ones codes are not allowed?
It claims that because they are not allowed. That doesn't mean that there can't be images out there that don't comply with the standard.
The reason they are not allowed is this (from the standard):
Making entropy-coded segments an integer number of bytes is performed
as follows: for Huffman coding, 1-bits are used, if necessary, to pad
the end of the compressed data to complete the final byte of a
segment.
If the all 1's code was allowed, then you could end up with an ambiguity in the last byte of compressed data where the padded 1's could be another coded symbol.
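To make the ambiguity concrete (a hypothetical table, not taken from the standard): suppose 111 were assigned as a valid 3-bit code, and the final entropy-coded byte of a segment held only five payload bits, 10110. Padding it to a full byte yields 10110111, and a decoder cannot tell whether the trailing 111 is padding or one more coded symbol. Reserving the all-ones pattern at every code length guarantees that padding can never decode as data.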

serial data flow: How to ensure completion

I have a device that sends serial data over a USB-to-COM port to my program, at various speeds and lengths.
Within the data there is a chunk of several thousand bytes that starts and ends with distinct markers ('FDDD' for start, 'FEEE' for end).
Due to the stream's length, the data occasionally does not arrive in one piece.
What is the recommended way to combine all the bytes into one message BEFORE parsing it?
(I have taken care of the buffer size, but I have no control over the serial line quality, and cannot use hardware flow control with USB.)
Thanks
One possible way to accomplish this is to have something along these lines:
# variables
#   buffer: byte buffer
#   buffer_length: maximum number of bytes in the buffer
#   new_char: char last read from the UART
#   prev_char: second last char read from the UART
#   n: index to the buffer

new_char := 0

loop forever:
    prev_char := new_char
    new_char := receive_from_uart()

    # start marker
    if prev_char = 0xfd and new_char = 0xdd
        # set the index to the beginning of the buffer
        n := 0
    # end marker
    else if prev_char = 0xfe and new_char = 0xee
        # the frame is ready, do whatever you need to do with a complete message
        # the payload length is n-1 bytes: the 0xfe of the end marker was
        # stored as the last byte, so it is excluded here
        handle_complete_message(buffer, n-1)
    # otherwise: an ordinary payload byte
    else
        if n < buffer_length - 1
            n := n + 1
            buffer[n] := new_char
A few tips/comments:
- you do not necessarily need separate start and end markers (you can use the same code for both purposes)
- if you want two-byte markers, it is easier to have them share the same first byte
- you need to make sure the marker combinations do not occur in your data stream
- if you use escape codes to avoid the markers in your payload, it is convenient to handle them in the same code
- see HDLC asynchronous framing (simple to encode, simple to decode, and it takes care of the escaping)
- handle_complete_message usually either copies the contents of buffer elsewhere or, if in a hurry, swaps another buffer in place of buffer
- if your data frames do not have integrity checking, you should check whether the payload length equals buffer_length - 1, because then you may have had an overflow
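For reference, here is one way the state machine above could look in C#; the class name, buffer size, and callback are an illustrative sketch, not a drop-in implementation:

using System;

// Minimal sketch of the two-byte-marker framing described above.
class FrameAssembler
{
    private readonly byte[] _frame = new byte[8192]; // pick your maximum frame size
    private int _n = -1;            // -1 means "no start marker seen yet"
    private byte _prev;
    private readonly Action<byte[], int> _onFrame;

    public FrameAssembler(Action<byte[], int> onFrame)
    {
        _onFrame = onFrame;
    }

    // Feed every byte read from the port into this method.
    public void Push(byte b)
    {
        if (_prev == 0xFD && b == 0xDD)          // start marker: reset the frame
        {
            _n = 0;
        }
        else if (_prev == 0xFE && b == 0xEE)     // end marker: frame complete
        {
            if (_n > 0)
                _onFrame(_frame, _n - 1);        // drop the stored 0xFE of the marker
            _n = -1;
        }
        else if (_n >= 0 && _n < _frame.Length)  // payload byte (only after a start)
        {
            _frame[_n++] = b;
        }
        _prev = b;
    }
}

Bytes would come from SerialPort.Read inside the DataReceived handler; unlike the string approach below, this never depends on the port's text encoding.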
After several tests, I came up with the following simple solution to my own question (for C#).
Shown is a minimal, simplified solution; you can add length checking, etc.
'start' and 'end' are string markers of any length.
public void comPort_DataReceived(object sender, SerialDataReceivedEventArgs e)
{
    SerialPort port = (SerialPort)sender;
    inData = port.ReadExisting();

    if (inData.Contains("start"))
    {
        // Loop to collect all message parts
        while (!inData.Contains("end"))
            inData += port.ReadExisting();

        // Complete by adding the last data chunk
        inData += port.ReadExisting();
    }

    // Use your collected message
    displayData(inData);
}
