MANIFEST file in Azure IoT Edge edgehub folder is increasing in size - azure-iot-edge

I can see one MANIFEST-000004 file in the edgehub folder, and it is increasing in size daily. What is this MANIFEST file, and is there any way to reduce its size? I am using edgeAgent 1.0.10.4 and edgeHub 1.0.10.4.

Looks like you have deployed Azure Blob Storage on IoT Edge.
Azure Blob Storage on IoT Edge provides a block blob and append blob storage solution at the edge. A blob storage module on your IoT Edge device behaves like an Azure blob service, except the blobs are stored locally on your IoT Edge device.
Ref: Store data at the edge with Azure Blob Storage on IoT Edge
In order to reduce the size of your blobs, turn on the deviceAutoDelete feature and specify the time in minutes (deleteAfterMinutes) after which blobs are automatically deleted.
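A minimal sketch of the corresponding module twin desired properties for the blob storage module (the five-minute value is an illustrative assumption):

"deviceAutoDeleteProperties": {
  "deleteOn": true,
  "deleteAfterMinutes": 5
}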

Related

IoT Edge Hub: testing for offline storage

We use Azure IoT with edgeHub and edgeAgent as modules of the Edge runtime. We want to verify that the offline storage capability is configured correctly in our environment.
We have a custom simulator module connected to a custom publisher module that publishes to an API in the cloud. The simulator continuously produces a message of around 10 KB every 2 seconds. The publisher module cannot talk to the outside because of a blocking firewall rule. I want to verify that all the memory/RAM allocated to edgeHub is used, then overflows to disk and uses a reasonable amount of the available disk space.
This exercise is taking long to complete, even when I run multiple instances of the simulator.
Queries:
How can I control the amount of memory allocated to edgeHub? What is the correct createOptions to control/reduce the allocated memory? It is currently allocated around 1.8 GB.
I see that RAM generally keeps increasing during the exercise, but at some point it drops a little and then keeps increasing again. Is there some kind of GC or optimization happening inside edgeHub? docker stats output:
b0fb729d25c8 edgeHub 1.36% 547.8MiB / 1.885GiB 28.38% 451MB / 40.1MB 410MB / 2.92GB 221
How can I ensure that none of the messages produced by the simulator are lost? Is there a way to count the number of messages in edgeHub?
I have done the proper configuration to mount a directory from the VM into the container for persistent storage. Is there a specific folder inside the edgeHub folder under which messages are stored when they overflow to disk?
I will document the answers after the input I have received from Azure IoTHub/IoTEdge maintainers.
To control the memory limit of containers/modules on Edge, specify createOptions as below, with an appropriate maximum memory limit in bytes (512 MB in this example):
"createOptions": "{\"HostConfig\": { \"Memory\": 536870912 }}"
Messages are persistent because the underlying implementation is RocksDB, which stores to disk by default.
To keep messages persistent even after the edgeHub container is recreated, mount a directory from the host into the container. In the example below, edgeHub is told to use /data as its storage folder inside the container; the host directory is then bound to that path (see the sketch after the snippet):
"env": {
"storageFolder": {
"value": "/data"
}
}
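The matching bind mount goes into the module's createOptions; a minimal sketch, assuming the host directory /srv/edgehub (any existing host path works):

"createOptions": "{\"HostConfig\": {\"Binds\": [\"/srv/edgehub:/data\"]}}"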
To monitor memory usage, use the metrics feature, which will be generally available in 1.0.10:
Monitor IoT Edge Devices At-scale

How to persist sensor telemetry data into cold storage such as big data storage or Cosmos DB

Azure Digital Twins gives an example of using Time Series Insights. I am looking for the steps to persist sensor telemetry data into cold storage, such as big data storage or Cosmos DB, for later retrieval by the business application.
We are currently implementing such a system and I made a few tests, among which: creating an endpoint of type "Event Hub" (through the API) and then configuring the "Capture" feature to put the collected data into AVRO files in a Data Lake. This works but may not be the ideal solution for what we need, so I'll explore streaming data from the IoT Hub to a SQL DB... Now I need to access that IoT Hub that was created through the API and is not available in the Azure Portal... Will keep you posted.
For future reference linking the Azure Digital Twins User Voice entry
Alina Stanciu commented (December 20, 2018):
Azure Digital Twins events and messages could be routed to Azure Event Hubs, Azure Service Bus topics, or Azure Event Grid for further processing.
For warm analytics, you could send the data to an Event Hub and from there to Azure Time Series Insights (TSI). TSI recently added cold storage as well (currently in public preview).
If you need more advanced analytics on cold storage, you can forward the data from Event Hub to Azure Data Lake (ADL). Today our customers store warm and cold data from sensors and spaces in TSI and ADL. We are looking into integrating our graph modeling with TSI and ADL, so that modeling is defined in one place (Digital Twins) and discovered and recognized in downstream services.
I used a custom Edge module routing messages to IoT Hub, then set up a Stream Analytics job with the IoT Hub as input and an Azure SQL instance as the output. It was pretty painless.
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-manage-job
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-stream-analytics-query-patterns#query-example-send-data-to-multiple-outputs

AWS IoT file transfer

I am trying to use AWS IoT to communicate with my BeagleBone board, and I have MQTT messages transferring from the board to the server. I was wondering if there is a way to transfer files (text or binary) to the server and from the server to the BeagleBone using AWS IoT.
The payload of an MQTT message is just a byte stream, so it can carry just about anything (up to the maximum size of 268,435,456 bytes according to the spec; AWS may have other limits in its implementation).
You will have to implement your own code to publish files and to subscribe and save them. You will also have to implement a payload format that includes any metadata you might need (e.g., file names), as sketched below.
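A minimal sketch of such publish/subscribe code using the paho-mqtt package (1.x API). The broker address, topic name, and JSON-envelope format are illustrative assumptions, not anything AWS IoT prescribes; a real AWS IoT connection would also use TLS on port 8883 with device certificates.

import base64
import json
import os
import paho.mqtt.client as mqtt

def publish_file(client, path, topic="files/upload"):
    # Wrap the raw file bytes in a JSON envelope that carries the
    # metadata (here just the file name) the receiver needs.
    with open(path, "rb") as f:
        data = f.read()
    envelope = {
        "filename": os.path.basename(path),
        "content": base64.b64encode(data).decode("ascii"),
    }
    client.publish(topic, json.dumps(envelope), qos=1)

def on_message(client, userdata, msg):
    # Receiver side: unwrap the envelope and save the file to disk.
    envelope = json.loads(msg.payload)
    with open(envelope["filename"], "wb") as f:
        f.write(base64.b64decode(envelope["content"]))

client = mqtt.Client()                        # paho-mqtt 1.x style client
client.on_message = on_message
client.connect("broker.example.com", 1883)    # placeholder endpoint
client.subscribe("files/upload", qos=1)
publish_file(client, "sensor-log.txt")
client.loop_forever()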
You can transfer a file using MQTT, but you should first divide it into smaller pieces and send them one by one, because AWS IoT limits the payload to 128 kB (see the chunking sketch below). More information about AWS IoT and its limits here.
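A sketch of the chunking side, reusing the JSON-envelope idea from the previous answer. The 64 KB raw chunk size is a deliberate assumption: base64 inflates data by about a third, and the encoded payload must stay under the 128 kB limit.

import base64
import json
import math
import os

CHUNK = 64 * 1024  # raw bytes; ~85 kB after base64, safely under 128 kB

def iter_chunks(path):
    total = math.ceil(os.path.getsize(path) / CHUNK)
    with open(path, "rb") as f:
        for index in range(total):
            yield json.dumps({
                "filename": os.path.basename(path),
                "index": index,   # lets the receiver reassemble in order
                "total": total,   # lets the receiver know when it is done
                "data": base64.b64encode(f.read(CHUNK)).decode("ascii"),
            })

# Each chunk is then published as a normal MQTT message, e.g.:
# for payload in iter_chunks("firmware.bin"):
#     client.publish("files/firmware", payload, qos=1)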
But I would suggest not using MQTT to transfer files, because messaging also costs money; if the file is big and you send it periodically, it may cost you a lot. You can find AWS IoT Core pricing here.
You can upload your file(s) to S3 bucket and then access the file(s) from there.

Is there a way to have a shared (temp) folder between apps or multiple instances of apps on Bluemix?

I am running a Rails app on Bluemix and want to use carrierwave for file uploads. So far no problem, as I am using external storage to persist the files (FTP, S3, WebDAV, etc.). However, in order to keep performance good I need to enable caching with carrierwave_backgrounder, and here it starts to get tricky. The thing is that I need to specify a temp folder for backgrounding the upload process (the temp folder where the file remains before it is persisted to the actual storage), and it must be shared between all possible workers and app instances. If this is possible at all, how can it be achieved?
Check out Object Storage: you can store files and delete them when you no longer need them. Redis is another option, as are any of the NoSQL databases available on Bluemix.
Typically, in any cloud you never store files on the file system of your VM or PaaS environment. The reason is that when you scale out you have multiple VMs, and a file written on one VM will not be available when hundreds of VMs come up. The recommended practice is to look for storage services that the cloud platform provides. In Bluemix you have storage options such as Cloud Object Storage, File Storage and Block Storage.
As suggested before, you can take a look at Cloud Object Storage and utilize that service. Here is the documentation for Cloud Object Storage: https://ibm-public-cos.github.io/crs-docs/. It contains a quick start guide plus storing, retrieving and API usage. Hope this helps.
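Since Cloud Object Storage exposes an S3-compatible API, a minimal sketch of pushing a temp artifact there instead of to the local file system might look like this (the endpoint URL, credentials, bucket name and paths are placeholders, and boto3 is just one client that works against the S3 API):

import boto3

cos = boto3.client(
    "s3",
    endpoint_url="https://s3.example.cloud-object-storage.appdomain.cloud",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload the carrierwave temp file, then download it from whichever
# worker or app instance picks up the background job.
cos.upload_file("tmp/upload-cache.bin", "my-bucket", "cache/upload-cache.bin")
cos.download_file("my-bucket", "cache/upload-cache.bin", "tmp/upload-cache.bin")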

How NuoDB manages the storage size increase

Say my data store is going to increase in size. As the data grows, how would the storage manager manage it? Does the storage manager split the data across different machines in the domain (presumably that is not the case)?
How exactly would the process work? What is the recommendation in this area, e.g., a key-value store?
If you have a storage manager that is soon to run out of disk space, you can start up a new storage manager with a larger disk subsystem, or one that points to extensible cloud storage such as Amazon S3. Once the new storage manager is up to date, the old one can be taken offline. This entire operation can be done while the database is running. We also generally recommend that you always run with at least two storage managers for redundancy.
If you have more questions, feel free to direct them to the NuoDB forum:
http://www.nuodb.com/community
NuoDB supports multiple back-end storage devices, including the Hadoop Distributed File System (HDFS). If you start a broker configured for HDFS, you can use HDFS tools to expand distributed storage on-the-fly and there's no need for any NuoDB admin operations. As Duke described, you can transition from a file-based Storage Manager to an HDFS one without interrupting services.
NuoDB interfaces with the storage layer using filesystem semantics for discrete storage units called "atoms". These map easily into the HDFS directory structure, simplifying administration on that end.
