Is there a Design Pattern for parsing binary data like this?

I'm working on parsing some input from a UDP stream. The protocol is sort of like a binary query string. It'll send a code byte that tells you how to read the following bytes. For example a code value of 1 might mean that the next 4 bytes are an int intended to be an ID, a value of 2 might mean the next 4 bytes are an int meant to be a Velocity, a value of 3 might mean a float for latitude, a value of 4 might mean the next bytes are a string with a length prepended as an int.
Is there a design pattern for parsing things with these kinds of rules? I'm sure there has to be some approach that's better than a large switch on the code value. I'm using a BinaryReader in C#, but I imagine there's a language agnostic solution.

You probably want the Strategy pattern. Each strategy instance knows how to parse its type of data and how many bytes to consume, and is given some kind of callback or builder object that handles the data that is read.
interface IReadStrategy
{
    void Read(BinaryReader reader, MyObject obj);
}

class VelocityReader : IReadStrategy
{
    public void Read(BinaryReader reader, MyObject obj)
    {
        // Read 4 bytes as an int (BinaryReader, unlike Stream, has ReadInt32).
        int value = reader.ReadInt32();
        obj.setVelocity(value);
    }
}
You would also need a factory class that reads the first byte of each record to decide which strategy to use (this could be implemented as a switch). Or, if you want to use even more patterns, add a method to each strategy that reports whether it recognizes a given code value, and use Chain of Responsibility to poll each strategy until one can handle the code.
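Since the question asks for a language-agnostic approach, here is a minimal sketch of the same idea in Python, with the factory reduced to a lookup table. The wire layout is an assumption taken from the question's examples (little-endian, as BinaryReader reads, with a 4-byte length before the string); the field names and the `READERS` table are hypothetical.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes or fail; a short read means a truncated record."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_int32(stream):
    return struct.unpack("<i", read_exact(stream, 4))[0]


def read_float32(stream):
    return struct.unpack("<f", read_exact(stream, 4))[0]


def read_string(stream):
    # Assumed layout: int32 length, then that many UTF-8 bytes.
    length = read_int32(stream)
    return read_exact(stream, length).decode("utf-8")


# One strategy per code byte: (field name, function that consumes its bytes).
READERS = {
    1: ("id", read_int32),
    2: ("velocity", read_int32),
    3: ("latitude", read_float32),
    4: ("name", read_string),
}


def parse(payload):
    """Parse a whole datagram into a dict of the fields it contained."""
    stream = io.BytesIO(payload)
    record = {}
    while True:
        code = stream.read(1)
        if not code:
            break  # clean end of input
        entry = READERS.get(code[0])
        if entry is None:
            raise ValueError(f"unknown code byte {code[0]:#04x}")
        field, reader = entry
        record[field] = reader(stream)
    return record
```

Each reader only knows about its own bytes, so supporting a new code value means adding one function and one table entry rather than touching the loop.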

Related

Handling TCP Data Without a large switch statement

I am working on an app that will send and receive data with a TCP socket in iOS using Swift.
I have the communication working fine but what I am trying to do is think of a way to handle the incoming data without a large switch statement.
The app could be sending out various requests at any time, but I can't guarantee in what order I will get responses. The first part of each response contains a hex address that tells me what information I am receiving.
I need to take the incoming data and perform a different calculation on it depending on what it is. What I do right now is read the hex address as it comes in and then send it to a giant switch statement, which then calls the proper function to convert the data.
I am trying to come up with something better than the giant switch statement. Although I cannot count on exactly what data I will receive in any given message I do know all the possible items that could be received.
Any suggestions would be appreciated. I am not used to handling data like this.
A giant switch statement is very traditional here. Just make sure to separate your work from your switch. For example, avoid this:
switch byte {
case 0x01:
    doing()
    various()
    things()
case 0x02:
    doing()
    other()
    things()
...
}
That code can get pretty messy, though I admit I make this mistake all the time.... The better approach is to pull out the operations:
switch byte {
case 0x01: handleOperationA()
case 0x02: handleOperationB()
...
}
func handleOperationA() { ... }
func handleOperationB() { ... }
You can of course define constants here for 0x01, 0x02, etc., but if this is the only place these values are used, then creating the constants can become duplicative. The name of the function provides just as much documentation as the name of the constant. There are trade-offs here.
Another possibility is to replace your switch with a Dictionary, mapping the value to a function (or if it's exactly one byte, and most of the values are used, an Array can even work here, but that's kind of rare).
Dictionaries are nice if things are variable, or if there are a very large number of possible values, but it's not always obvious which is more efficient (the optimizer can do a lot with a switch statement of integers; don't assume dictionary lookups are always faster).
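As a rough illustration of the dictionary alternative, here is a sketch in Python rather than Swift. The addresses, handler names, and payload conversions are all made up for the example; only the lookup-then-call shape is the point.

```python
def handle_temperature(payload):
    # Hypothetical conversion: big-endian unsigned tenths of a degree.
    return int.from_bytes(payload, "big") / 10


def handle_status(payload):
    # Hypothetical conversion: first byte is a boolean flag.
    return payload[0] != 0


# Address byte -> function that converts the rest of the message.
HANDLERS = {
    0x01: handle_temperature,
    0x02: handle_status,
}


def dispatch(address, payload):
    """Look up the handler for this address and run it on the payload."""
    handler = HANDLERS.get(address)
    if handler is None:
        raise ValueError(f"no handler for address {address:#04x}")
    return handler(payload)
```

The table plays the role of the switch, so the unknown-address case still has to be handled explicitly, just as a switch needs a default branch.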
But if you're writing a networking stack, or any kind of parser, embrace a large switch statement. They're completely normal. Just keep it simple.

How to convert hexadecimal data (stored in a string variable) to an integer value

Edit (abstract)
I tried to interpret Char/String data as Byte, 4 bytes at a time. This was because I could only get TComport/TDatapacket to interpret streamed data as String, not as any other data type. I still don't know how to get the Read method and OnRxBuf event handler to work with TComport.
Problem Summary
I'm trying to get data from a mass spectrometer (MS) using some Delphi code. The instrument is connected with a serial cable and follows the RS232 protocol. I am able to send commands and process the text-based outputs from the MS without problems, but I am having trouble with interpreting the data buffer.
Background
From the user manual of this instrument:
"With the exception of the ion current values, the output of the RGA are ASCII character strings terminated by a linefeed + carriage return terminator. Ion signals are represented as integers in units of 10^-16 Amps, and transmitted directly in hex format (four byte integers, 2's complement format, Least Significant Byte first) for maximum data throughput."
I'm not sure whether (1) hex data can be stored properly in a string variable. I'm also not sure how to (2) implement 2's complement in Delphi and (3) handle the Least Significant Byte first ordering.
Following @David Heffernan's advice, I went and revised my data types. Attempting to harvest binary data from characters doesn't work, because not all values from 0-255 can be properly represented. You lose data along the way, basically, especially if your data is represented 4 bytes at a time.
The solution for me was to use the Async Professional component instead of Dejan's ComPort library. It handles data streams better and has a built-in log that I could use to figure out how to interpret streamed responses from the instrument. It's also better documented. So, if you're new to serial communications (like I am), give that a go instead.
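Once the readings arrive as raw bytes rather than characters, points (2) and (3) come down to one decode step. The sketch below shows it in Python rather than Delphi, assuming each reading is exactly four bytes as the manual describes; the function names are hypothetical.

```python
import struct


def decode_ion_counts(raw):
    """Decode one reading: 4 bytes, two's complement, least significant byte first."""
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    # "<i" = little-endian signed 32-bit integer, which handles both the
    # byte order and the two's complement sign.
    (counts,) = struct.unpack("<i", raw)
    return counts


def counts_to_amps(counts):
    # The manual states the unit is 10^-16 A per count.
    return counts * 1e-16
```

For instance, the bytes `10 27 00 00` decode to 10000 counts, and `FF FF FF FF` decodes to -1.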

What is the fastest way to transfer a record in DCOM?

I want to transfer some records with the following structure between two Windows PCs using COM/DCOM. I would prefer to transfer an array of, say, 100 TARec members at a time rather than each record individually. Currently I am doing this using IStrings. I am looking to improve on it by using the raw records, to save the time spent encoding/decoding the strings at both ends. Please share your experience.
type
  TARec = record
    A : TDateTime;
    B : WORD;
    C : Boolean;
    D : Double;
  end;
All of the record's field types are OLE compatible. Many thanks in advance.
As Rudy suggests in the comments, if your data contains simple value types then a variant byte array can be a very efficient approach and quite simple to implement.
Since you have stated that your data already resides in an array, the basic approach would be:
1. Create a byte array of the required size to hold all your record data (use VarArrayCreate with type varByte).
2. Lock the array to obtain a pointer that is safe to use to reference the array contents in memory (VarArrayLock will lock and return a pointer to the array data).
3. Use CopyMemory to directly copy the data from your array of records to the byte array memory.
4. Unlock the variant array (VarArrayUnlock) and pass it through your COM/DCOM interface.
On the other ('receiving') side you simply reverse the process:
1. Declare an array of records of the required size.
2. Lock the variant byte array to obtain a pointer to the memory holding the bytes.
3. Copy the byte array data into your record array.
4. Unlock the byte array.
This exact approach is something I have used very successfully in a very demanding COM/DCOM scenario (w.r.t efficiency/performance) in the past.
Things to be careful of:
If your data ever changes to include more complex types such as strings or dynamic arrays then additional work will be required to correctly transport these through a byte array.
If your data structure ever changes then the code on both sides of the interface will need to be updated accordingly. One way to protect against this is to incorporate some mechanism that lets the receiver identify whether the data is valid. This could include a "version number", for example, and/or a size value (in a 'header' as part of the byte array, in addition to the array data, or passed as a separate parameter entirely; the precise details don't really matter). If the receiver finds a version number or size that it is not expecting then it can report this gracefully rather than naively processing the data incorrectly and (most likely) crashing or throwing exceptions as a result.
Alignment/packing issues. Even with the same declaration for the record type, if code is compiled with different alignment settings then the size required for each record in memory could change (which is why a "version number" for the data structure format might not be reliable on its own). One way to avoid this would be to declare the record as packed, though this comes at the cost of a slight reduction in efficiency (and still relies on both sides of the interface agreeing that the data structure is packed).
These are just things to bear in mind, however, not prescriptions. Just how complex/robust your implementation needs to be will be determined by your specific case.
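To make the byte-array layout and the header check concrete, here is a sketch in Python rather than Delphi. It stands in for the CopyMemory step only and does not touch COM at all. The format string mirrors a packed TARec (TDateTime as an 8-byte double, WORD as 2 bytes, Boolean as 1 byte, Double as 8 bytes); the header layout and all names are assumptions for illustration.

```python
import struct

# Packed TARec: A (double), B (uint16), C (bool), D (double) = 19 bytes.
RECORD = struct.Struct("<dH?d")
# Hypothetical header: format version, record size, record count.
HEADER = struct.Struct("<HHI")
FORMAT_VERSION = 1


def pack_records(records):
    """Sender side: header followed by the records laid end to end."""
    parts = [HEADER.pack(FORMAT_VERSION, RECORD.size, len(records))]
    parts.extend(RECORD.pack(*rec) for rec in records)
    return b"".join(parts)


def unpack_records(blob):
    """Receiver side: validate the header before trusting the payload."""
    if len(blob) < HEADER.size:
        raise ValueError("buffer too short for header")
    version, record_size, count = HEADER.unpack_from(blob, 0)
    if version != FORMAT_VERSION or record_size != RECORD.size:
        raise ValueError("sender and receiver disagree on the record format")
    if len(blob) != HEADER.size + count * RECORD.size:
        raise ValueError("buffer length does not match record count")
    return [
        RECORD.unpack_from(blob, HEADER.size + i * RECORD.size)
        for i in range(count)
    ]
```

Sending the record size alongside the version is what catches the alignment case: an unpacked record on one side would report a different size even though the version number still matches.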

Can I define in what endianness I read from NSData?

I have some files written on an Android device, which wrote the bytes in big endian.
Now I am trying to read these files on iOS, where I need them in little endian.
I can write a for loop like this:
int temp;
for(...) {
[readFile getBytes:&temp range:NSMakeRange(offset, sizeof(int))];
target_array[i] = CFSwapInt32BigToHost(temp);
// read more like that
}
However, it feels silly to read every single value and swap it before I can store it. Can I tell the NSData that I want the values read with a certain byte order, so that I can store them directly where they should be?
(and save some time, as the data can be quite large)
I also worry about errors when some data type changes and I forget to use the 16-bit swap instead of the 32-bit one.
No, you need to swap every value. NSData is just a series of bytes with no value or meaning. It is your app that understands the meaning so it is your code logic that must swap each set of bytes as needed.
The data could be filled with all kinds of values of different sizes. 8-bit values, 16-bit values, 32-bit values, etc. as well as string data or just a stream of bytes that don't need any ordering at all. And the NSData can contain any combination of these values.
Given all of this, there is no simple way to tell NSData that the bytes need to be treated in a specific endianness.
If your data is, for example, nothing but 32-bit integer values stored in a specific endianness and you want to extract them as an array, create a helper class that does the conversion.
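A helper of that kind might look like the following sketch, written in Python rather than Objective-C and assuming the buffer holds nothing but big-endian signed 32-bit integers. The function name is hypothetical.

```python
import struct


def read_int32_array_be(data):
    """Convert a buffer of big-endian 32-bit integers to a list of ints."""
    if len(data) % 4 != 0:
        raise ValueError("buffer length is not a multiple of 4")
    count = len(data) // 4
    # ">" selects big-endian; the count prefix decodes every value in one call.
    return list(struct.unpack(f">{count}i", data))
```

Keeping the width and byte order in one place also addresses the worry about mixing up the 16-bit and 32-bit swaps: a change of data type means changing one format character.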

Is byte ordering the same across iOS devices, and does this make using htonl and ntohl unnecessary between iOS devices?

I was reading this example on how to use NSData for network messages.
When creating the NSData, the example uses:
unsigned int state = htonl(_state);
[data appendBytes:&state length:sizeof(state)];
When converting the NSData back, the example uses:
[data getBytes:buffer range:NSMakeRange(offset, sizeof(unsigned int))];
_state = ntohl((unsigned int)buffer);
Isn't it unnecessary to use htonl and ntohl in this example?
- Since the data is being packed / unpacked on iOS devices, won't the byte ordering be the same, making it unnecessary to use htonl and ntohl?
- Isn't the manner in which it is used incorrect? The example uses htonl for packing, and ntohl for unpacking. But in reality, shouldn't one only do this if one knows that the sender or receiver is using a particular format?
"The example uses htonl for packing, and ntohl for unpacking."
This is correct.
When a sender transfers data (integers, floats) over the network, it should reorder them to "network byte order". The receiver performs the decoding by reordering from network byte order to "host byte order".
"But in reality, shouldn't one only do this if one knows that the sender or receiver is using a particular format?"
Usually, a sender doesn't know the byte order of the receiver, and vice versa. Thus, in order to avoid ambiguity, one needs to define the byte order of the "network". This works well, provided sender and receiver actually do correctly encode/decode for the network.
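The round trip can be sketched as follows, in Python rather than Objective-C. The `encode_state` and `decode_state` names are hypothetical; `struct`'s `"!"` prefix and `socket.htonl` are the standard library's network byte order helpers.

```python
import socket
import struct
import sys


def encode_state(state):
    """Sender: always emit network byte order (big-endian), whatever the host is."""
    return struct.pack("!I", state)


def decode_state(data, offset=0):
    """Receiver: convert from network byte order back to a host integer."""
    return struct.unpack_from("!I", data, offset)[0]


def htonl_is_noop():
    """True on big-endian hosts, where htonl has nothing to swap."""
    return socket.htonl(0x01020304) == 0x01020304
```

Because both sides agree on the wire order, neither needs to know the other's host order: `encode_state(0x01020304)` produces the bytes `01 02 03 04` on any machine.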
Edit:
If you are concerned about performance of the encoding:
On modern CPUs the required machine code for byte swapping is quite fast.
On the language level, functions to encode and decode a range of bytes can be made quite fast as well. The Objective-C example in your post is not one of those "fast" routines, though.
For example, since the host byte order is known at compile time, ntohl becomes an "empty" function (aka "NoOp") if the host byte order equals the network byte order.
Other byte swap utility functions, which extend on the ntoh family of macros for 64-bit, double and float values, may utilize C++ template tricks which may also become "NoOp" functions.
These "empty" functions can then be further optimized away completely, which effectively results in machine code which just performs a move from the source buffer to the destination buffer.
However, since the additional overhead for byte swapping is pretty small, these optimizations for cases where swapping is not needed are only perceptible in high-performance code. But your first statement:
[data getBytes:buffer range:NSMakeRange(offset, sizeof(unsigned int))];
is MUCH more expensive than the following statement
_state = ntohl((unsigned int)buffer);
even when byte swapping is required.
