How to break a bigger chunk into smaller ones to fit the max chunk size? - spray

My server is sending me chunked-encoded data and I am able to get the chunked response in my client. The problem I am facing is that some chunks are too big, around 10 MB, and I get an exception because I have set the max chunk size to 2 MB.
Is there any way to break this bigger chunk into smaller ones?

Hmm. I'm not sure which version of Spray you're using, but starting from version 1.1 there is an incoming-auto-chunking-threshold-size setting (details here). Here is an example of a service that supports chunked upload.
There are still a number of limitations at the moment; you can read the relevant discussion in the open ticket.
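For reference, a minimal sketch of what that could look like in application.conf; the exact config section (spray.can.parsing here) and the chosen limit are my assumptions and may differ between Spray versions, so check the linked documentation for the version you are on:
spray.can.parsing {
  # deliver large incoming entities as chunks instead of one huge message
  incoming-auto-chunking-threshold-size = 1m
}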

You can divide a big chunk into smaller chunks by using RandomAccessFile.
I had the same problem and resolved it using RandomAccessFile.
You can do it the following way, reading a fixed-size piece at a time instead of the whole file:
String filePath = "path";
byte[] buffer = new byte[2 * 1024 * 1024]; // 2 MB per piece
try (RandomAccessFile file = new RandomAccessFile(filePath, "r")) {
    int read;
    while ((read = file.read(buffer)) != -1) {
        // Careful: an arbitrary byte boundary can split a multi-byte UTF-8 character.
        String piece = new String(buffer, 0, read, "UTF-8");
        // process piece here
    }
}

Related

Reading file as stream of strings in Dart: how many events will be emitted?

Standard way to open a file in Dart as a stream is to use file.openRead() which returns a Stream<List<int>>.
The next standard step is to transform this stream with the utf8.decoder StreamTransformer, which returns a Stream<String>.
I noticed that with the files I've tried, the resulting stream only emits a single event with the whole file content represented as one string. But I feel this should not be the general case, since otherwise the API wouldn't need to return a stream of strings; a Future<String> would suffice.
Could you explain how I can observe the behaviour where this stream emits more than one event? Does it depend on the file size, the disk I/O rate, or some buffer size?
It depends on the file size and the buffer size, and on how the file operations are implemented.
If you read a large file, you will very likely get multiple events of a limited size. The UTF-8 decoder decodes chunks eagerly, so you should get roughly the same number of chunks after decoding. It might carry a few bytes across chunk boundaries, but the rest of the bytes are decoded as soon as possible.
Checking on my local machine, the buffer size seems to be 65536 bytes. Reading a file larger than that gives me multiple chunks.
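Not Dart, but the effect is easy to reproduce with a plain chunked read loop; this rough Python sketch (the file path is a placeholder) counts how many 64 KiB pieces a large file breaks into, which mirrors the multiple events you would see from openRead() on a big file:
# Count 64 KiB pieces of a file, roughly mirroring the chunked events a
# Dart file stream emits for large files. The path is a placeholder.
CHUNK = 64 * 1024
count = 0
with open("big_file.bin", "rb") as f:
    for piece in iter(lambda: f.read(CHUNK), b""):
        count += 1
print("pieces:", count)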

Compression of 2-D array on the fly with iOS

I am currently using Swift to store some data on iOS. The values come as a 2-D integer array, defined as an [[Int]]. I need to save these integer arrays to disk. Currently, I am using the following function to do so:
func writeDataToFile(data: [[Int]], filename: String) {
    let fullfile = NSString(string: self.folderpath).stringByAppendingPathComponent(filename + ".txt")
    var fh = NSFileHandle(forWritingAtPath: fullfile)
    if fh == nil {
        NSFileManager.defaultManager().createFileAtPath(fullfile, contents: nil, attributes: nil)
        fh = NSFileHandle(forWritingAtPath: fullfile)
    }
    fh?.writeData("Time: \(filename)\n".dataUsingEncoding(NSUTF16StringEncoding)!)
    fh?.writeData("\(data)".dataUsingEncoding(NSUTF16StringEncoding)!)
    fh?.closeFile()
}
Currently this function works just fine, but it produces files that are relatively large (1.1 MB each, which, when you are writing them at 1 Hz, gets huge fast). The arrays written have a fixed size and the values lie in the range 20000 < x < 35000. Is there a way to compress this data on the fly so that I can later read it into, say, Python or some other language? Would it just be easier to use a library like Zip to compress the files after writing? Is there some way to transform the data (without loss of data/fidelity) into an image (for compression purposes, not viewing purposes)? There is also some metadata that I would like to store along with the 2-D array, such as a timestamp.
Since you are currently saving those as string values, the simplest and fastest size reduction would be to save them as binary values (or base64-encoded strings). You could convert each of your int values into 2 bytes (since an unsigned 2-byte value can store up to 65535) and save the values that way. That would go from roughly 5 bytes per int value down to 2 bytes per int value, an immediate saving of about 60%.
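Since the question mentions reading the data back in Python later, here is a rough sketch of that side, assuming the writer stores plain big-endian unsigned 16-bit values row after row and the dimensions are known from the metadata (read_grid, rows and cols are placeholder names):
import struct

# Read a grid stored as big-endian unsigned 16-bit values, row after row.
# Assumes the dimensions are known out of band (e.g. from the metadata).
def read_grid(path, rows, cols):
    with open(path, 'rb') as f:
        flat = struct.unpack('>%dH' % (rows * cols), f.read(rows * cols * 2))
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]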
For the Base64 encoding I use something I found on the internet called NSData+Base64. But in looking that up I just read:
In the iOS 7 and Mac OS 10.9 SDKs, Apple introduced new base64 methods on NSData that make it unnecessary to use a 3rd-party base64 decoding library. What's more, they exposed access to private base64 methods that are retrospectively available back as far as iOS 4 and Mac OS 10.6.
Link.
You could go much further with the compression by realizing that the data from one element to the next will likely not change across the entire range, since heat maps are always gradients. You could then save the arrays as differences from the previous element and likely get each entry down to a single byte, though that may lose precision if you are viewing something with a very fast heat gradient (or using a low-resolution camera).
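As a rough illustration of the delta idea in Python (delta_encode is a hypothetical helper; the clipping is exactly the precision loss mentioned above):
# Store the first value in full, then clip each neighbour-to-neighbour
# difference to a signed byte; very fast gradients get clipped (precision loss).
def delta_encode(row):
    out = [row[0]]
    for prev, cur in zip(row, row[1:]):
        out.append(max(-128, min(127, cur - prev)))
    return out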
If you eventually need to get into real compression, I use GTMNSData+zlib and decompress it in a C# web service, so with a little bit of work it is cross-platform.
A proper answer for this would require more information about the problem domain. Most likely, 2D arrays are the wrong data structure for this but it's hard to tell without more info.
What's the data stored in these arrays?
Apple has had a compression library since last year:
https://developer.apple.com/library/ios/documentation/Performance/Reference/Compression/index.html

Jfif/jpeg parsing, bytes between streams

I'm parsing a JPEG/JFIF file and I noticed that after the SOI (0xFFD8) I parse the different "streams" starting with 0xFFXX (where XX is a hexadecimal number) until I find the EOI (0xFFD9). The structure of the different chunks is:
APP0 marker: 2 bytes
Length: 2 bytes
When I parse a chunk, I read until I reach the length written in the 2-byte length field. After that I expected to immediately find another marker, followed by a length for the next chunk. According to my parser that is not always true; there might be data between the chunks. I couldn't find out what that data is and whether it is relevant to the image. Do you have any hints as to what this could be and how to interpret those bytes?
I'm lost and would be happy if somebody could point me in the right direction. Thanks in advance.
I've recently noticed this too. In my case it's an APP2 chunk, the ICC profile, which doesn't contain the length of the chunk.
In fact, as far as I can see, the length of the chunk needn't be in the first 2 bytes (though it usually is).
In JFIF all 0xFF bytes are replaced with 0xFF 0x00 in the data section, so it should just be a matter of calculating the length from that. I just read until I hit another header; however, I've noticed that sometimes (again in the ICC profile) there are byte sequences which don't make sense, such as 0xFF 0x6D, so I may still be missing something.
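To make that concrete, here is a small Python sketch of the scan-data rule (not a full JPEG parser; next_marker is just an illustrative helper). It skips stuffed 0xFF 0x00 pairs, 0xFF fill bytes and the restart markers 0xD0-0xD7 that legitimately appear inside entropy-coded data, and stops at the next real marker:
def next_marker(data, pos):
    """Return the offset of the next real 0xFF marker at or after pos, or -1."""
    while pos < len(data) - 1:
        nxt = data[pos + 1]
        if data[pos] == 0xFF and nxt not in (0x00, 0xFF) and not (0xD0 <= nxt <= 0xD7):
            return pos  # data[pos + 1] is the marker code
        pos += 1
    return -1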

Is buffer_size = file size useful for Rails send_file?

If the file sent with send_file is similar in size to the default buffer size (4096 bytes), does it always make sense to use stream: false? For example, are there proxies or browsers that break badly if the buffer size is non-standard?
Related:
Does the buffer size refer to the file or the HTTP packet?
Can you recommend a Firefox extension to inspect HTTP packets at this level?
If you look at the source, Rails basically does:
File.open(path, 'rb') do |file|
  while buf = file.read(len)
    output.write(buf)
  end
end
And the buffer_size option controls the value of len. Very small values of len are inefficient in terms of IO activity; very large values are wasteful of memory. How this then gets broken up into TCP packets is not under your control. If you were to change the value, increasing it to the size of the file would just be wasteful - I don't think you'd need anything more than a few hundred KB; 128 KB or 256 KB would be ample. Optimal buffer sizes will be operating-system and hardware dependent.

Most memory efficient way to save binary file from the web with Python 2.6?

I'm trying to download (and save) a binary file from the web using Python 2.6 and urllib.
As I understand it, read(), readline() and readlines() are the 3 ways to read a file-like object.
Since binary files aren't really broken into newlines, read() and readlines() read the whole file into memory.
Is choosing a random read() buffer size the most efficient way to limit memory usage during this process?
i.e.
import urllib
import os

title = 'MyFile'
downloadurl = 'http://somedomain.com/myfile.avi'
webFile = urllib.urlopen(downloadurl)
mydirpath = os.path.join('c:', os.sep, 'mydirectory',
                         downloadurl.split('/')[-1])

if not os.path.exists(mydirpath):
    print "Downloading...%s" % title
    localFile = open(mydirpath, 'wb')
    data = webFile.read(1000000)  # 1 MB at a time
    while data:
        localFile.write(data)
        data = webFile.read(1000000)  # 1 MB at a time
    webFile.close()
    localFile.close()
    print "Finished downloading: %s" % title
else:
    print "%s already exists." % mydirpath
I chose read(1000000) arbitrarily because it worked and kept RAM usage down. I assume that if I were working with a raw network buffer, choosing an arbitrary amount would be bad, since the buffer might run dry if the transfer rate was too low. But it seems urllib is already handling the lower-level buffering for me.
With that in mind, is choosing an arbitrary number fine? Is there a better way?
Thanks.
You should use urllib.urlretrieve for this. It will handle everything for you.
Instead of writing your own read-write loop, you should probably check out the shutil module. The copyfileobj function lets you define the buffer size. The most efficient method varies from situation to situation; even copying the same source file to the same destination may vary due to network issues.
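For example, a minimal sketch of the shutil approach for Python 2 (URL and file name are placeholders; 64 KB is an arbitrary choice of copy buffer):
import shutil
import urllib

webFile = urllib.urlopen('http://somedomain.com/myfile.avi')
localFile = open('myfile.avi', 'wb')
shutil.copyfileobj(webFile, localFile, 65536)  # copy in 64 KB pieces
localFile.close()
webFile.close()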
