How much impact does the network delay have on IoT Edge throughput? - azure-iot-edge

We have a customer who has deployed a number of IoT Edge transparent gateways that continuously route data from a large number of leaf devices to the cloud.
Recently they noticed that on some of the edge devices the output (edge to IoT Hub) cannot keep up with the input, which is causing severe latency for their messages.
Here are the built-in edgeHub metrics for one of them, named 8B:
edgehub_queue_length
8B: 981061
edgehub_message_send_duration_seconds
8B: ~110ms
{quantile="0.1"} 0.0632608
{quantile="0.5"} 0.1136008
{quantile="0.9"} 0.127605
{quantile="0.99"} 0.2449048
edgehub_message_process_duration_seconds
8B: 0.5-2.0 ms
We would like to clarify two questions:
What is the recommended network latency for an IoT Edge gateway?
Are there any other ways to improve the output throughput of edgeHub?
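One rough way to reason about the first question: if each upstream send had to finish before the next one started, the send duration alone would cap throughput. The sketch below runs that arithmetic on the median send duration and queue length reported above. The strictly sequential, one-message-per-send model is an assumption made for illustration, not confirmed edgeHub behaviour; batching or parallel sends would raise the ceiling, which is why both are exposed as (hypothetical) parameters.

```python
def max_sends_per_second(send_duration_s, parallel_sends=1):
    """Upper bound on send operations per second when each send
    occupies one slot for send_duration_s seconds (assumed model)."""
    return parallel_sends / send_duration_s


def drain_time_hours(queue_length, sends_per_s, messages_per_send=1,
                     ingress_per_s=0.0):
    """Hours needed to empty the queue at the given send rate while
    new messages keep arriving at ingress_per_s."""
    net_rate = sends_per_s * messages_per_send - ingress_per_s
    if net_rate <= 0:
        return float("inf")  # the queue never drains
    return queue_length / net_rate / 3600


# Figures taken from the metrics in the question.
median_send_s = 0.1136008
queue_length = 981061

rate = max_sends_per_second(median_send_s)
print(f"ceiling: {rate:.1f} sends/s")
print(f"drain time with no new input: {drain_time_hours(queue_length, rate):.1f} h")
```

Under these assumptions the round-trip time to IoT Hub, rather than the sub-2 ms processing time, dominates the ceiling.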

Related

Difference between using Thingsboard Edge and Thingsboard CE on premise

After checking the documentation for Thingsboard and all its flavours, one question I have is: does it make sense to use Thingsboard Edge when the Thingsboard CE/PE server is on-premise?
What I understood is that the point of the TB Edge is to do local aggregation/data processing, lower latency in data handling and visualisation, react to local alerts, and reduce data traffic.
However, some of these concerns are not critical when the main TB server is in the same local network and traffic and storage are not charged by a 3rd party.
What could be an example of such a case if any?
Thanks.

What factors determine "catch up speed" on Edge after a network outage?

I have a customer with IoT Edge deployed to manufacturing plants in remote areas with spotty internet. They have leaf devices sending messages to IoT Edge and then on to IoT Hub. They frequently have short outages (5, 10, 15 minutes). They often need to make timely decisions based on the data that reaches IoT Hub from the plants. They've noticed that after a 15-minute outage, it can take anywhere from 15 to 30 minutes for IoT Edge to catch up.
Besides network speed itself, what factors would influence that? For example:
- If we were hitting throttling based on their number of IoT Hub units, would that be surfaced in the edgeHub logs?
- If disk, network, etc. can keep up, does edgeHub pretty much upload data as fast as possible (given throttling), or are there any other limits imposed by default?
- What is the default connection retry policy in edgeHub? Is it the same exponential backoff policy as in the C# SDK? If so, could it be that after a 15-minute outage, edgeHub takes a while after network recovery to 'try again'? If so, is that policy configurable in edgeHub (via an environment variable or something)?
Any other things to check?
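To put numbers on the catch-up behaviour described above, here is a minimal sketch of the backlog arithmetic. It assumes constant rates: leaf devices keep producing at `ingress_rate` during and after the outage, and edgeHub uploads at `upload_rate` once the link is back. Both names are hypothetical, and the model ignores retry delays and throttling, which would only lengthen the result.

```python
def catch_up_minutes(outage_min, ingress_rate, upload_rate):
    """Minutes needed after recovery to clear the backlog built up
    during the outage, given constant ingress and upload rates."""
    if upload_rate <= ingress_rate:
        return float("inf")  # the backlog never shrinks
    backlog = outage_min * ingress_rate
    return backlog / (upload_rate - ingress_rate)


def implied_upload_ratio(outage_min, observed_catch_up_min):
    """upload_rate / ingress_rate implied by an observed catch-up time."""
    return 1 + outage_min / observed_catch_up_min


# A 15-minute outage followed by 15 to 30 minutes of catch-up:
for observed in (15, 30):
    ratio = implied_upload_ratio(15, observed)
    print(f"{observed} min catch-up -> upload is {ratio:.1f}x the ingress rate")
```

Read this way, the reported 15 to 30 minutes is consistent with an upload path that runs at only about 1.5 to 2 times the normal ingress rate, so limited headroom alone could explain the delay before any retry policy is considered.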

How to deal with the LoRa interference

I'm trying to test my LoRaWAN network and I'm having some very disappointing results.
I have an Ideetron lorank8v1 gateway which can work over distances of 15 km.
In addition I used the Ideetron Nexus Board as a microcontroller and a Nexus Demoboard where the sensors are mounted.
I'm sending 1 packet per minute with my humidity and temperature measurements.
After 200m my packets aren’t captured by the gateway anymore.
I've run the packet logger software on my gateway, and all the packets that are captured fail CRC verification (bad CRC_CODE), which I think is due to interference.
I thought I might have some interference from LTE/3G/4G networks, but these networks do not use the 868 MHz band in Greece.
My gateway has 8 simultaneous channels and my node uses SF9 with a 125 kHz bandwidth. When I changed these parameters nothing changed. I'm using it in an urban area.
What should I do?
Maybe I have to configure the data rate, the spreading factor and the frequencies at which my node transmits?
Are there any better ideas out there?
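For context on what changing the spreading factor trades off, the sketch below evaluates the standard LoRa bit-rate relation, Rb = SF × (BW / 2^SF) × 4 / (4 + CR), for the 125 kHz bandwidth mentioned above. The coding rate of 4/5 (CR = 1) is an assumption, since the post does not state it. This only shows how data rate falls as SF rises (in exchange for better sensitivity); it does not diagnose the CRC errors.

```python
def lora_bit_rate(spreading_factor, bandwidth_hz, coding_rate_index=1):
    """Nominal LoRa bit rate in bits per second.

    coding_rate_index 1..4 corresponds to coding rates 4/5..4/8.
    """
    symbol_rate = bandwidth_hz / (2 ** spreading_factor)
    code_rate = 4 / (4 + coding_rate_index)
    return spreading_factor * symbol_rate * code_rate


# 125 kHz bandwidth from the post; coding rate 4/5 is assumed.
for sf in range(7, 13):
    print(f"SF{sf}: {lora_bit_rate(sf, 125_000):.0f} bit/s")
```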

How do IoT Edge "internal" messages count against my message quota?

IoT Hub is billed based on the number of messages per day (including updating and retrieval of twins, etc.). We know that IoT Edge uses some internal messages to operate, such as the reported health/status updates that appear in the portal for its modules, retrieval of its own device twin, module twins, etc.
How does this traffic count against my daily quota? That is, what "counts"? My expectation would be that explicit twin updates/retrievals from custom modules would count, but does the edgeAgent/edgeHub traffic count? If it does, how often does that happen?
Doesn't seem to be a lot of traffic, but it affects pricing and sizing IoT solutions, so needs to be factored in.
--Steve
IoT Edge is "free" with IoT Hub (i.e. the features are available on all IoT hubs; you don't have to bring in/pay for a separate resource), but you do pay for all traffic. Mostly that will just be your traffic (messages your devices/modules are sending/receiving), but Edge Agent and Edge Hub do twin operations when the edge device is starting up, and when things change. So if you deploy a new module to your edge device you'll see some Edge Agent twin traffic related to that. If you change some routes, you'll see the corresponding Edge Hub twin traffic.
As the product nears general availability, you can expect to see documentation that outlines how the Agent and Hub are using their twins, so you know what to expect.

Mirrored queue performance factors

We operate two dual-node brokers, each broker having quite different queues and workloads. Each box has 24 cores (H/T) worth of Xeon E5645 @ 2.4GHz with 48GB RAM, connected by Gigabit LAN with ~150μs latency, running RHEL 5.6, RabbitMQ 3.1, Erlang R16B with HiPE off. We've tried with HiPE on but it made no noticeable performance impact, and was very crashy.
We appear to have hit a ceiling for our message rates of between 1,000/s and 1,400/s both in and out. This is broker-wide, not per-queue. Adding more consumers doesn't improve throughput overall, just gives that particular queue a bigger slice of this apparent "pool" of resource.
Every queue is mirrored across the two nodes that make up the broker. Our publishers and consumers connect equally to both nodes in a persistent way. We notice an ADSL-like asymmetry in the rates too; if we manage to publish a high rate of messages, the deliver rate drops to high double digits. Testing with an un-mirrored queue has much higher throughput, as expected. Queues and Exchanges are durable, messages are not persistent.
We'd like to know what we can do to improve the situation. The CPU on the box is fine, beam takes a core and a half for 1 process, then another 80% each of two cores for another couple of processes. The rest of the box is essentially idle. We are using ~20GB of RAM in userland with system cache filling the rest. IO rates are fine. Network is fine.
Is there any Erlang/OTP tuning we can do? delegate_count is the default 16, could someone explain what this does in a bit more detail please?
This is difficult to answer without knowing more about how your producers and consumers are configured, which client library you're using and so on. As discussed on IRC (http://dev.rabbitmq.com/irclog/index.php?date=2013-05-22) a minute ago, I'd suggest you attempt to reproduce the topology using the MulticastMain Java load test tool that ships with the RabbitMQ Java client. You can configure multiple producers/consumers, message sizes and so on. I can certainly get 5kHz out of a two-node cluster with HA on my desktop, so this may be a client (or application code) related issue.
