How to preserve data integrity while minimizing the transmission size - data-integrity

We have sensors in the wild that send their data to a server every day via TCP/IP, using either 3G or satellite for the physical layer. The sensors can automatically switch from one to the other depending on their location and the quality of the signal with the local 3G operator.
Given that 3G and satellite communications are very expensive, we want to minimize the amount of data we send, but we also want to protect ourselves against lost data.
What would be the best strategy to ensure, with reasonable certainty, that the integrity of our data is preserved, while minimizing the amount of redundancy, i.e. the amount of data transmitted?
I've read about the zfec codec, but I'm not sure whether we need to transmit all the chunks, send a hash code along with each chunk, or simply send the minimum number of chunks and a hash code for the whole file.
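One common pattern that addresses both goals is to erasure-code the payload with a k-of-m code and attach a single hash of the whole file: the receiver then needs any k of the m blocks plus one integrity check, rather than a hash per chunk. The sketch below is a minimal illustration in Python; the K/M values and helper names are made up, and the exact shape of zfec's Encoder/Decoder calls (k equal-length primary blocks in, m blocks out, any k of which reconstruct the originals) is an assumption to verify against the zfec documentation.

```python
import hashlib
import zfec  # pip install zfec

K, M = 4, 6  # assumption: any K of the M transmitted blocks rebuild the payload

def encode_for_transmission(payload: bytes):
    """Split the payload into K equal-length primary blocks, erasure-code them
    into M blocks, and compute one hash for the whole payload."""
    block_len = -(-len(payload) // K)                      # ceiling division
    padded = payload.ljust(block_len * K, b"\0")
    primary = [padded[i * block_len:(i + 1) * block_len] for i in range(K)]
    # Assumption: encode() returns all M blocks, indexed 0..M-1.
    blocks = zfec.Encoder(K, M).encode(primary)
    return blocks, hashlib.sha256(payload).digest(), len(payload)

def decode_received(blocks, blocknums, digest, length):
    """Rebuild the payload from any K received blocks and check the single hash."""
    # Assumption: decode() takes K blocks plus their indices and returns the
    # K primary blocks; check the zfec docs for ordering requirements.
    primary = zfec.Decoder(K, M).decode(blocks, blocknums)
    payload = b"".join(primary)[:length]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("payload corrupted or reassembled incorrectly")
    return payload
```

Per-chunk hashes only pay for themselves if you need to know which chunk was damaged (for example, to request a targeted resend); if a failed check simply means "recover from the remaining blocks or retransmit", one whole-file hash is enough.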

Related

ESP32: Store and Send data via BLE frequently

I'm developing a sensor based on the ESP32-DevKit board that senses vibration with an accelerometer. The application/sensor goal is to store the accelerometer data for 20 s and then send all of it through BLE.
I'm currently using the ESP32 ADC (12-bit) at a fast sampling rate (10-100 kHz) to get an accurate signal. The next step is to store this signal, but it will take almost 2 MB (see the rough arithmetic below), so I don't know whether I can store it on the ESP32 and send it later via BLE (packet by packet) without all of these tasks degrading processing time and energy.
The main points are:
Fast sampling rate / accurate signal.
Sending data to the phone with the lowest energy possible.
Using the ESP32-S2 to store 2 MB of data and resend it to the phone app.
Is there any possibility of doing what I want to?
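For scale, the 2 MB figure in the question follows directly from the sampling parameters. A rough back-of-the-envelope calculation (assuming each 12-bit ADC sample is stored in 2 bytes, at the middle of the stated rate range):

```python
sample_rate_hz = 50_000      # mid-range of the 10-100 kHz target
capture_seconds = 20
bytes_per_sample = 2         # a 12-bit ADC reading padded to 16 bits

total_bytes = sample_rate_hz * capture_seconds * bytes_per_sample
print(total_bytes)           # 2_000_000 bytes, i.e. roughly 2 MB per capture
```

At the full 100 kHz that doubles to about 4 MB, which is why reducing or compressing the data before storage matters.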
When storing the signal, have you considered compressing the data? If each accelerometer reading is very similar to the previous one, then just storing the difference might save a lot of space, especially if you use a variable-length format.
I have a project where I save GPS data, but because it comes from comparatively slow-moving boats, the difference between two coordinates (taken every second or so) is very small, so there is no point in storing the full coordinates.
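As a rough illustration of the delta idea (the zigzag/varint scheme below is just one common choice, not taken from either project):

```python
def encode_varint(u: int) -> bytes:
    """Encode an unsigned int in 7-bit groups; small values take one byte."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def delta_encode(samples):
    """Store each sample as the difference from the previous one,
    zigzag-mapped so small positive and negative deltas both stay small."""
    out = bytearray()
    prev = 0
    for s in samples:
        d = s - prev
        zz = 2 * d if d >= 0 else -2 * d - 1
        out += encode_varint(zz)
        prev = s
    return bytes(out)

readings = [2048, 2050, 2049, 2047, 2052]       # slowly varying 12-bit values
packed = delta_encode(readings)
print(len(packed), "bytes instead of", 2 * len(readings))   # 6 instead of 10
```

The gain grows with smoother signals; the decoder just reverses the two steps, and the first value costs a couple of bytes because its "delta" is the value itself.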

Bluetooth Low Energy data transmission on iOS

I'm currently working on a project that uses Bluetooth Low Energy. I've implemented most of the communication protocol, but I've started having concerns that I don't actually know how the data transmission works, and whether the solution I implemented will behave the same way with all devices.
So my main concern is: what data chunk is received when I get a notification from peripheral(_:didUpdateValueFor:error:)? Is it only as big as the negotiated MTU size? Or does iOS receive information about the chunk size and wait to receive it all before triggering peripheral(_:didUpdateValueFor:error:)?
When a peripheral sends chunks of, let's say, 100 bytes each, can I assume that I will always get 100 bytes in a single notification? Or could it be the last 50 bytes of the previous chunk and the first 50 bytes of the next one? That would be quite tricky, and hard to detect where the beginning of my frame is.
I tried to find more information in Apple documentation but there is nothing about it.
My guess is that I always receive a single state of the characteristic. That would mean the chunking depends on the implementation on the peripheral side. But what if the characteristic is bigger than the MTU size?
First, keep in mind that sending streaming data over a characteristic is not what characteristics are designed for. The point of a characteristic is to represent some small (~20 bytes) piece of information like the current battery level, device name, or current heartbeat. The idea is that a characteristic will change only when the underlying value changes. It was never designed to be a serial protocol. So your default assumption should be that it's up to you to manage everything about that.
You should not write more data to a characteristic than the value you get from maximumWriteValueLength(for:). Chunking is your job.
Each individual value you write will appear to the receiver atomically. Remember, these are intended to be individual values, not chunks out of a larger data stream, so it would make no sense to overlap values from the same characteristic. "Atomically" means it all arrives or none of it. So if your MTU can handle 100 bytes, and you write 100 bytes, the other side will receive 100 bytes or nothing.
That said, there is very little error detection in BLE, and you absolutely can drop packets. It's up to you to verify that the data arrived correctly.
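A common way to handle both framing and verification is to put a small sequence header on every chunk and a checksum on the whole message, independently of the transport. A minimal sketch in Python (the 4-byte header layout, 96-byte payload size, and CRC32 are illustrative assumptions, not part of CoreBluetooth):

```python
import struct
import zlib

CHUNK_PAYLOAD = 96   # keep header + payload within the negotiated MTU

def make_chunks(message: bytes):
    """Prefix the message with its length and CRC32, then split it into
    numbered chunks of the form [seq:u16][total:u16][payload]."""
    framed = struct.pack("<I I", len(message), zlib.crc32(message)) + message
    pieces = [framed[i:i + CHUNK_PAYLOAD]
              for i in range(0, len(framed), CHUNK_PAYLOAD)]
    return [struct.pack("<H H", seq, len(pieces)) + p
            for seq, p in enumerate(pieces)]

def reassemble(chunks):
    """Rebuild the message from received chunks; raise if anything is
    missing or the checksum does not match (e.g. a dropped packet)."""
    ordered = sorted(chunks, key=lambda c: struct.unpack("<H", c[:2])[0])
    total = struct.unpack("<H", ordered[0][2:4])[0]
    if len(ordered) != total:
        raise ValueError("missing chunks")
    framed = b"".join(c[4:] for c in ordered)
    length, crc = struct.unpack("<I I", framed[:8])
    message = framed[8:8 + length]
    if zlib.crc32(message) != crc:
        raise ValueError("corrupted transfer")
    return message
```

On iOS, the receiving half of this would live in your peripheral(_:didUpdateValueFor:error:) handler, accumulating chunks until the declared total has arrived.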
If you're able to target iOS 11+, do look at L2CAP, which is designed for serial protocols rather than using GATT.
If you can't do that, I recommend watching WWDC 2013 Session 703, which covers this use case in detail. (I am having trouble finding a link to it anymore, however.)

Could you use the internet to store data in the transmission space between countries?

Is it possible to bounce data back and forth between, let's say, a computer in the USA and a computer in Australia over the internet, just sending these packets back and forth and using this bounced data as data storage?
As I understand it, it would take some time for the data to go from A to B, let's say 100 milliseconds, so the data in transit could be considered data in storage. If both nodes had good, free bandwidth, could data be stored in this transmission space by bouncing the data back and forth in a loop?
Would there be any reason why this would not work?
The idea comes from a different idea I had some time ago, where I thought you could store data in empty space by shooting laser pulses between two satellites a few light-minutes apart. In the light-minutes of space between them, you could store data as the transmission itself.
Would there be any reason why this would not work?
Lost packets. Although some protocols (like TCP) have means of recovering from packet loss, they involve the sender re-sending lost packets as needed. That means each node must still keep a copy of the data available to send again (or the protocol might fail), so you would still be using local storage until the communication completes.
If you have taken any networking classes, you will know the end-to-end principle, which states:
The end-to-end principle states that application-specific functions ought to reside in the end hosts of a network rather than in intermediary nodes
Hence, you cannot expect routers between your two hosts to keep the data for you. They are free to discard it at any time (or they may themselves crash at any time with your data in their buffers).
For more, you can read this wiki link:
End-to-End principle
I think this should actually work, as in reality you would be storing the information in the various I/O buffers of the numerous routers, switches, and network cards. However, the amount of storable information would probably be too small to be of practical use, and network administrators of all levels are unlikely to enjoy or support such a creative approach.
Storing information in a delay line is a known approach and has been used to build memory devices in the past. However, those methods relied on the delay of signal propagation over a physical medium. As the Internet mostly uses wires and electromagnetic waves that travel at the speed of light, not much information can be stored this way; past memory devices mostly used much slower sound waves.
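To put numbers on it, the amount of data "stored" on a link at any instant is just the bandwidth-delay product, and it is tiny by storage standards (the link speed and round-trip time below are illustrative assumptions):

```python
bandwidth_bps = 1_000_000_000      # a generous 1 Gbit/s end-to-end path
round_trip_s = 0.2                 # roughly USA <-> Australia and back

in_flight_bytes = bandwidth_bps / 8 * round_trip_s
print(in_flight_bytes / 1e6, "MB")  # 25.0 MB at most "stored" on the wire
```

And unlike real storage, every one of those bytes has to be retransmitted by an endpoint several times per second just to keep it in flight.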

Is there a UDP-based protocol that offers more robust sending of large data elements without datagram reliability?

On one end, you have TCP, which guarantees that packets arrive and that they arrive in order. It's also designed for the commodity Internet, with congestion control algorithms that "play nice" in traffic. On the other end of the spectrum, you have UDP, which guarantees neither the arrival nor the ordering of packets, but lets you send data to a receiver with minimal overhead. Somewhere in the middle, you have reliable UDP-based protocols, such as UDT, that offer customized congestion control algorithms and reliability, but with greater speed and flexibility.
However, what I'm looking for is the capability to send large chunks of data over UDP (greater than the 64k datagram size of UDP), but without a concern for reliability of each individual datagram. The idea is that the large data is broken down into datagrams of a specified size (<= 64,000 bytes), probably with some header data stuck on the front and sent over the network. On the receiving side, these datagrams are read in and stored. If a datagram doesn't arrive, all of the datagrams associated with that transfer are simply thrown out by the client.
Most of the "reliable UDP" implementations try to maintain reliability of each datagram, but I'm only interested in the whole: if I don't get the whole, it doesn't matter - throw it all away and wait for the next. I'd have to dig deeper, but it might be possible with custom congestion control algorithms in UDT. However, are there any protocols that take this approach?
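To make the desired behaviour concrete, here is a minimal sketch of the receive side: each datagram carries a transfer id, a sequence number, and a total count, and a transfer is only delivered once every piece has arrived (the header layout and sizes are assumptions for illustration, not taken from any existing protocol):

```python
import socket
import struct

HEADER = struct.Struct("<I H H")   # transfer_id, seq, total: an 8-byte header

def receive_transfers(port=9999, max_datagram=1500):
    """Collect datagrams per transfer id and yield only complete payloads."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    pending = {}                                   # transfer_id -> {seq: payload}
    while True:
        packet, _ = sock.recvfrom(max_datagram)
        transfer_id, seq, total = HEADER.unpack(packet[:HEADER.size])
        parts = pending.setdefault(transfer_id, {})
        parts[seq] = packet[HEADER.size:]
        if len(parts) == total:                    # everything arrived, any order
            del pending[transfer_id]
            yield b"".join(parts[i] for i in range(total))
        # Incomplete transfers are never retried; in practice you would also
        # drop them after a timeout so they don't accumulate in memory.
```

There is no retransmission and no per-datagram acknowledgement; a single lost datagram silently costs you the whole transfer, which is exactly the trade-off described above.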
You could try ENet. Whilst it's not specifically aimed at what you're trying to do, it does have the concept of 'fragmented data blocks', whereby data larger than its MTU is sent as a sequence of MTU-sized datagrams with header details that relate each part of the sequence to the rest. The version I'm using only supports 'reliable' fragments (that is, the ENet reliability layer will kick in to resend missing fragments), but I seem to remember seeing discussion on the mailing list about unreliable fragments, which would likely do exactly what you want; i.e. deliver the whole payload if it all arrives and throw away the bits if it doesn't.
See http://enet.bespin.org/
Alternatively take a look at the answers to this question: What do you use when you need reliable UDP?

What is a buffer? What are buffered reads and writes?

I heard the word buffer today after a long time, and I'm wondering if somebody can give a good overview of buffers and some examples of how they matter in today's world.
A buffer is generally a portion of memory that contains data that has not yet been fully committed to its intended device. In the case of buffered I/O, there is generally a fast device and a slow device. The devices themselves need not have disparate speeds; perhaps the interfaces between them differ, or perhaps producing the data is more time-consuming than consuming it (or vice versa).
The idea is that you temporarily store the generated data in a buffer so that it is not lost when the slower device isn't ready to handle it. Once the device is ready, another buffer may take the current buffer's place, and the consuming device will process the data in the first buffer.
In this manner, the slower device receives the data at a moderated pace rather than the fire-hose that the original data source can be.
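In code, the difference is simply whether each small piece of data goes straight to the slow device or is accumulated in memory first. A small Python illustration (the file name and buffer size are arbitrary):

```python
# Unbuffered: every 2-byte write becomes its own call down to the (slow) disk.
with open("samples.bin", "wb", buffering=0) as f:
    for value in range(10_000):
        f.write(value.to_bytes(2, "little"))

# Buffered: writes accumulate in a 64 KiB in-memory buffer, so the OS sees a
# handful of large writes, which the slower device handles at its own pace.
with open("samples.bin", "wb", buffering=64 * 1024) as f:
    for value in range(10_000):
        f.write(value.to_bytes(2, "little"))
```

The data written is identical in both cases; only the number and size of the underlying write operations change.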

Resources