I have some files that were written on an Android device, with the bytes in big-endian order.
Now I am trying to read these files on iOS, where I need the values in little-endian order.
I can write a for loop like this:
int temp;
for(...) {
[readFile getBytes:&temp range:NSMakeRange(offset, sizeof(int))];
target_array[i] = CFSwapInt32BigToHost(temp);
// read more like that
}
However, it feels silly to read every single value and swap it before I can store it. Can I tell the NSData that I want the values read with a certain byte order, so that I can store them directly where they belong?
(That would also save some time, as the data can be quite large.)
I also worry about errors when some data type changes and I forget to use the 16-bit swap instead of the 32-bit one.
No, you need to swap every value. NSData is just a series of bytes with no value or meaning. It is your app that understands the meaning so it is your code logic that must swap each set of bytes as needed.
The data could be filled with all kinds of values of different sizes. 8-bit values, 16-bit values, 32-bit values, etc. as well as string data or just a stream of bytes that don't need any ordering at all. And the NSData can contain any combination of these values.
Given all of this, there is no simple way to tell NSData that the bytes need to be treated in a specific endianness.
If your data is, for example, nothing but 32-bit integer values stored in a specific endianness and you want to extract an array of integers, create a helper class that does the conversion.
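A minimal sketch of such a helper, written in plain C so it can be called from Objective-C. The names read_be32 and decode_be32_array are made up for illustration, and the sketch assumes the buffer holds nothing but big-endian 32-bit integers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit value from any address (alignment does not matter). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode `count` big-endian 32-bit integers from `src` into `dst`.
 * The byte order is stated in exactly one place, so callers cannot
 * pick the wrong swap function by accident. */
static void decode_be32_array(const uint8_t *src, size_t count, int32_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++) {
        /* The unsigned-to-signed conversion wraps on mainstream compilers. */
        dst[i] = (int32_t)read_be32(src + 4 * i);
    }
}
```

On iOS the source pointer would come from the NSData's bytes property, so the whole array is converted in one pass without a getBytes:range: call per value.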
Edit (abstract)
I tried to interpret Char/String data as Byte, 4 bytes at a time. This was because I could only get TComport/TDatapacket to interpret streamed data as String, not as any other data type. I still don't know how to get the Read method and OnRxBuf event handler to work with TComport.
Problem Summary
I'm trying to get data from a mass spectrometer (MS) using some Delphi code. The instrument is connected with a serial cable and follows the RS232 protocol. I am able to send commands and process the text-based outputs from the MS without problems, but I am having trouble with interpreting the data buffer.
Background
From the user manual of this instrument:
"With the exception of the ion current values, the output of the RGA are ASCII character strings terminated by a linefeed + carriage return terminator. Ion signals are represented as integers in units of 10^-16 Amps, and transmitted directly in hex format (four byte integers, 2's complement format, Least Significant Byte first) for maximum data throughput."
I'm not sure whether (1) hex data can be stored properly in a string variable. I'm also not sure how to (2) implement 2's complement in Delphi and (3) the Least Significant Byte first.
Following @David Heffernan's advice, I went and revised my data types. Attempting to harvest binary data from characters doesn't work, because not all values from 0-255 can be properly represented. You lose data along the way, basically, especially if your data is represented 4 bytes at a time.
The solution for me was to use the Async Professional component instead of Denjan's Comport lib. It handles data streams better and has a built-in log that I could use to figure out how to interpret streamed responses from the instrument. It's also better documented. So, if you're new to serial communications (like I am), rather give that a go.
I am currently using Swift to store some data on iOS. The values come as a 2-D integer array, defined as an [[Int]]. I need to save these integer arrays to disk. Currently, I am using the following function to do so:
func writeDataToFile(data: [[Int]], filename: String){
let fullfile = NSString(string: self.folderpath).stringByAppendingPathComponent(filename+".txt")
var fh = NSFileHandle(forWritingAtPath: fullfile)
if fh == nil{
NSFileManager.defaultManager().createFileAtPath(fullfile, contents: nil, attributes: nil)
fh = NSFileHandle(forWritingAtPath: fullfile)
}
fh?.writeData("Time: \(filename)\n".dataUsingEncoding(NSUTF16StringEncoding)!)
fh?.writeData("\(data)".dataUsingEncoding(NSUTF16StringEncoding)!)
fh?.closeFile()
}
Currently this function works just fine, but it produces files that are relatively large (1.1 MB each, which gets huge fast when you are writing them at 1 Hz). The arrays written have a fixed size and the values will be in the range 20000 < x < 35000. Is there a way to compress this data on the fly such that I can later read the data into, say, Python or some other language? Would it just be easier to use some library like Zip to compress the files into zips after writing? Is there some way to transform the data (without loss of data/fidelity) into an image (for compression purposes, not viewing purposes)? There is some metadata that I would like to store along with the 2-D array, such as a timestamp.
Since you are currently saving those as string values, the simplest and fastest size reduction would be to save them as binary values (or base64 encoded strings). Then you could convert all of your int values into 2-byte sets (since an unsigned 2-byte value can store up to 65535) and save the values that way. That would go from 5 bytes per int value down to 2 bytes per int value, an immediate saving of 60%.
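To illustrate the 2-byte idea, here is a rough sketch in C (the same logic applies in Swift). The function name pack_u16_le and the choice of little-endian output are assumptions for the example; any fixed byte order works as long as the reader uses the same one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack each value into 2 bytes, least significant byte first.
 * `out` must have room for 2 * count bytes.
 * Returns false if a value does not fit in an unsigned 16-bit integer. */
static bool pack_u16_le(const int *values, size_t count, uint8_t *out)
{
    assert(values != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0 || values[i] > UINT16_MAX) {
            return false;
        }
        out[2 * i]     = (uint8_t)(values[i] & 0xFF);
        out[2 * i + 1] = (uint8_t)(values[i] >> 8);
    }
    return true;
}
```

A file written this way can be read back in Python with the struct module and the "<H" format, for example.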
For the Base64 encoding I use something I found on the internet called NSData+Base64. But in looking that up I just read:
In the iOS 7 and Mac OS 10.9 SDKs, Apple introduced new base64 methods on NSData that make it unnecessary to use a 3rd party base 64 decoding library. What's more, they exposed access to private base64 methods that are retrospectively available back as far as iOS 4 and Mac OS X 10.6.
You could go much further into the compression by realizing that data from one element to the next will likely not change by the entire range, since heat maps will always be gradients. Then you could save the arrays as difference since the last element and likely get that down to a single byte (255 value) change set. But that may lose precision if you are viewing something with a very fast heat gradient (or using a low resolution camera).
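One way to keep the difference scheme lossless is to check every difference and fall back to the plain 2-byte format when one does not fit in a byte. The sketch below assumes the values have already been narrowed to uint16_t; delta_encode and delta_decode are made-up names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store the difference to the previous element in one signed byte.
 * `deltas` needs room for count - 1 entries; the first value is kept separately.
 * Returns false if any difference is outside [-128, 127], in which case the
 * caller should store the array uncompressed instead of losing precision. */
static bool delta_encode(const uint16_t *values, size_t count, int8_t *deltas)
{
    assert(values != NULL && deltas != NULL);
    for (size_t i = 1; i < count; i++) {
        int diff = (int)values[i] - (int)values[i - 1];
        if (diff < INT8_MIN || diff > INT8_MAX) {
            return false;
        }
        deltas[i - 1] = (int8_t)diff;
    }
    return true;
}

/* Rebuild the original values from the first value and the stored differences. */
static void delta_decode(uint16_t first, const int8_t *deltas, size_t count,
                         uint16_t *values)
{
    assert(deltas != NULL && values != NULL);
    if (count == 0) {
        return;
    }
    values[0] = first;
    for (size_t i = 1; i < count; i++) {
        values[i] = (uint16_t)(values[i - 1] + deltas[i - 1]);
    }
}
```

With the fallback in place, a steep gradient costs space for that one array rather than precision.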
If you eventually need to get into compression, I use GTMNSData+zlib and decompress it in a c# webservice. So with a little bit of work it is cross platform.
A proper answer for this would require more information about the problem domain. Most likely, 2D arrays are the wrong data structure for this but it's hard to tell without more info.
What's the data stored in these arrays?
Apple has had a compression library since last year:
https://developer.apple.com/library/ios/documentation/Performance/Reference/Compression/index.html
Core Data's NSAttributeDescription has integer types for 16-bit, 32-bit, and 64-bit numbers, but not for 8-bit numbers. Why is that? Is the recommended way to store 8-bit numbers in an Integer 16 type?
It seems wasteful from a storage perspective to double the data size (by using 16 bits to store the 8-bit number). Also, what happens if, due to programmer error, a number out of the range of an 8-bit number is stored in that Integer 16? Then any function/method that takes int8_t could be passed the wrong number. For example:
NSManagedObject *object = // fetch from store
int16_t value = object.value.intValue;
[otherObject methodThatTakesInt8:(int8_t)value]; // bad things happen if value isn't within an 8-bit range
I don't think the answer as to why NSAttributeDescription doesn't offer an 8-bit number is any more complicated than that Core Data doesn't have an 8-bit storage type. Which is probably a circular argument. Probably Apple just didn't see the worth.
As to your other concerns: what if the programmer wanted to store a 12-bit number? What if they wanted to store a 24-bit number? It seems odd to pull out 8 bits as a special case in the modern world. But the problem is easily solved: you can implement -willSave on any NSManagedObject subclass to validate data before it is committed to the store. Or you could implement your own custom setter (ultimately calling -setPrimitiveValue:forKey:) similarly to validate immediately upon set. In either case you can implement whatever strategy you want for an out-of-bounds number: raise an exception, saturate, whatever.
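The range check itself is small, wherever you decide to call it from. A sketch in plain C with hypothetical helper names, where the assert stands in for raising an exception:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the 16-bit value can be represented in 8 signed bits. */
static bool fits_in_int8(int16_t value)
{
    return value >= INT8_MIN && value <= INT8_MAX;
}

/* Strategy 1: saturate, clamping out-of-range values to the nearest limit. */
static int8_t saturate_to_int8(int16_t value)
{
    if (value > INT8_MAX) return INT8_MAX;
    if (value < INT8_MIN) return INT8_MIN;
    return (int8_t)value;
}

/* Strategy 2: fail loudly, treating an out-of-range value as a programmer error. */
static int8_t checked_to_int8(int16_t value)
{
    assert(fits_in_int8(value));
    return (int8_t)value;
}
```

Either function could replace the bare (int8_t) cast in the question's example, so the narrowing is never silent.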
In addition to @Tommy's answer, if you're using a SQLite persistent store (which nearly everyone does when using Core Data), it's not actually wasteful from a storage perspective. SQLite uses dynamic typing, meaning that any column can contain a value of any type. Size requirements are determined based on the value being saved. If you tell Core Data that you want a 64-bit integer attribute but all values of that attribute would fit in 8 bits, you haven't actually wasted 7/8 of the space used for that attribute.
I was reading this example on how to use NSData for network messages.
When creating the NSData, the example uses:
unsigned int state = htonl(_state);
[data appendBytes:&state length:sizeof(state)];
When converting the NSData back, the example uses:
[data getBytes:buffer range:NSMakeRange(offset, sizeof(unsigned int))];
_state = ntohl((unsigned int)buffer);
Isn't it unnecessary to use htonl and ntohl in this example?
- Since the data is being packed / unpacked on iOS devices, won't the byte ordering be the same, making it unnecessary to use htonl and ntohl?
- Isn't the manner in which it is used incorrect? The example uses htonl for packing, and ntohl for unpacking. But in reality, shouldn't one only do this if one knows that the sender or receiver is using a particular format?
"The example uses htonl for packing, and ntohl for unpacking."
This is correct.
When a sender transfers data (integers, floats) over the network, it should reorder them to "network byte order". The receiver performs the decoding by reordering from network byte order to "host byte order".
"But in reality, shouldn't one only do this if one knows that the sender or receiver is using a particular format?"
Usually, a sender doesn't know the byte order of the receiver, and vice versa. Thus, in order to avoid ambiguity, one needs to define the byte order of the "network". This works well, provided sender and receiver actually do correctly encode/decode for the network.
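As a small illustration of that contract, here is a sketch in C using the POSIX htonl and ntohl functions. The helper names pack_u32 and unpack_u32 are made up for the example:

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sender side: convert from host order to network order, then copy the bytes out. */
static void pack_u32(uint8_t *out, uint32_t host_value)
{
    assert(out != NULL);
    uint32_t wire = htonl(host_value);
    memcpy(out, &wire, sizeof wire);
}

/* Receiver side: copy the bytes in, then convert from network order to host order. */
static uint32_t unpack_u32(const uint8_t *in)
{
    assert(in != NULL);
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

The bytes in the buffer come out most significant first regardless of the host's own byte order, so neither side needs to know anything about the other.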
Edit:
If you are concerned about performance of the encoding:
On modern CPUs the required machine code for byte swapping is quite fast.
On the language level, functions to encode and decode a range of bytes can be made quite fast as well. The Objective-C example in your post doesn't belong to those "fast" routines, though.
For example, since the host byte order is known at compile time, ntohl becomes an "empty" function (aka "NoOp") if the host byte order equals the network byte order.
Other byte swap utility functions, which extend on the ntoh family of macros for 64-bit, double and float values, may utilize C++ template tricks which may also become "NoOp" functions.
These "empty" functions can then be further optimized away completely, which effectively results in machine code which just performs a move from the source buffer to the destination buffer.
However, since the additional overhead for byte swapping is pretty small, these optimizations in cases where swapping is not needed are only perceptible in high-performance code. But your first statement:
[data getBytes:buffer range:NSMakeRange(offset, sizeof(unsigned int))];
is MUCH more expensive than the following statement
_state = ntohl((unsigned int)buffer);
even when byte swapping is required.
I need to read and interpret a binary file containing a TIFF image. I know there exist readers for doing this but I want to go the hard way. I found the TIFF format description and need to parse the binary file in small chunks. Assume I was able to read in memory the complete binary file. This means that I have a variable containing one long list of bytes.
I know via the format definition what the meaning is of the different groups of n bytes.
How can one define character variables with different lengths (sometimes 2, sometimes 3, sometimes 4 etc.) so that the variable address points to the right position in the image variable array?
In other words, assume my image is loaded into an array Image containing all bytes of the file.
The first 2 bytes I want to load in a string with length 2 bytes so that I can just link the address pointer to the first position in the Image array and automatically the first 2 bytes are associated with the first character string. A second string of 4 bytes would have another meaning and so I make the address for the second string of 4 bytes point to the 3rd position of the Image array.
Is this feasible in C++? I remember that this was a normal way of handling dynamic memory allocation in Fortran 77, in a simulation code I analysed a long time ago.
Thanks in advance for the hints!
Regards,
Stefan
The C++ language is easily capable of processing TIFF files from a byte array. The idea you have in mind is basically correct, but there are a few problems with it. C strings are zero-terminated, whereas the strings that appear in TIFF files are not necessarily zero-terminated, since their length is specified explicitly. It really is simpler to create a dedicated data structure to hold the TIFF-specific data fields and then parse the binary data into the structure. Your method will also immediately run into trouble with the Motorola/Intel byte-order issue if your machine has the opposite endianness from the file.
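As a rough sketch of that approach for the 8-byte TIFF header (byte-order mark "II" or "MM", the number 42, then the offset of the first IFD), in C-style code. The struct and function names are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     big_endian;        /* true for "MM" (Motorola), false for "II" (Intel) */
    uint16_t magic;             /* 42 in a valid TIFF file */
    uint32_t first_ifd_offset;  /* offset of the first image file directory */
} TiffHeader;

/* Assemble multi-byte values from single bytes, so the host's byte order
 * never matters. */
static uint16_t read_u16(const uint8_t *p, bool big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

static uint32_t read_u32(const uint8_t *p, bool big_endian)
{
    return big_endian
        ? ((uint32_t)read_u16(p, true) << 16) | read_u16(p + 2, true)
        : ((uint32_t)read_u16(p + 2, false) << 16) | read_u16(p, false);
}

/* Parse the header at the start of `image`. Returns false if it is not a TIFF header. */
static bool parse_tiff_header(const uint8_t *image, size_t length, TiffHeader *out)
{
    assert(image != NULL && out != NULL);
    if (length < 8) return false;
    if (image[0] == 'I' && image[1] == 'I')      out->big_endian = false;
    else if (image[0] == 'M' && image[1] == 'M') out->big_endian = true;
    else return false;
    out->magic = read_u16(image + 2, out->big_endian);
    out->first_ifd_offset = read_u32(image + 4, out->big_endian);
    return out->magic == 42;
}
```

The later fields (IFD entries, strings with explicit lengths) can be read with the same two helpers, passing the offset into the Image array instead of aliasing string variables onto it.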