In AKS we dynamically create and use persistent volumes backed by Azure Disk.
We set up an initial size of 20 GB for a service and we ran out of space.
Is there any way to set up a disk alert for dynamically provisioned disks so that we are informed?
Starting with agent version ciprod10052020, the Container insights integrated agent supports monitoring PV (persistent volume) usage: Reference
Create metric alerts:
To alert on system resource issues when they are experiencing peak demand and running near capacity, Container insights lets you create a log alert based on performance data stored in Azure Monitor Logs.
For reference, see alerts-metric-overview.
Container insights includes, among many others, the Average disk usage % and Average persistent volume usage % alerts.
See Prerequisites and Steps to enable the metric alerts in Azure Monitor from the Azure portal.
Log-based alerts:
Log alerts run queries on Log Analytics data: see Log alerts and prerequisites.
To create an alert rule for container resource utilization, refer to create an alert rule.
Note: Creating an alert rule for container resource utilization requires you to switch to the new log alerts API, as described in switch API preference for log alerts.
To query and alert when used disk space on cluster nodes exceeds a threshold, see the last section of resource-utilization-log-search-queries.
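For illustration, such an alert can be deployed as an Azure Monitor scheduled query rule over the InsightsMetrics table that Container insights populates. The following ARM resource is a minimal sketch, not a definitive template: the name, location, 80% threshold, and the workspace scope placeholder are illustrative, and the query assumes the documented container.azm.ms/disk namespace with the used_percent counter.

{
  "type": "Microsoft.Insights/scheduledQueryRules",
  "apiVersion": "2021-08-01",
  "name": "node-disk-used-percent-alert",
  "location": "eastus",
  "properties": {
    "displayName": "Node disk usage above 80%",
    "severity": 2,
    "enabled": true,
    "evaluationFrequency": "PT5M",
    "windowSize": "PT15M",
    "scopes": [ "<log-analytics-workspace-resource-id>" ],
    "criteria": {
      "allOf": [
        {
          "query": "InsightsMetrics | where Namespace == 'container.azm.ms/disk' and Name == 'used_percent' | summarize AggValue = max(Val) by bin(TimeGenerated, 1m), Computer",
          "timeAggregation": "Maximum",
          "metricMeasureColumn": "AggValue",
          "operator": "GreaterThan",
          "threshold": 80,
          "failingPeriods": {
            "numberOfEvaluationPeriods": 1,
            "minFailingPeriodsToAlert": 1
          }
        }
      ]
    }
  }
}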
Related
Is it possible to specify disk type for OS disk when creating a node pool in AKS cluster? By default, I see the disk type used is Premium LRS. I know it is possible to control OS Disk Size via --node-osdisk-size when creating a new node pool, but I don't see an ability to specify something like --node-osdisk-type.
There is (now) a --node-osdisk-type parameter; see
https://learn.microsoft.com/en-us/azure/aks/cluster-configuration#use-ephemeral-os-on-new-clusters
There is no way to select the disk type when creating a node pool in AKS. If the VM size supports premium disks, it will use premium disks. This is one of the most upvoted feature requests for AKS.
I have a cluster in Google Cloud, but I need information about resource usage.
In the interface of each node there are three graphs, for CPU, memory, and disk usage, but all of these graphs on every node show the warning "No data for this time interval" for any time interval.
I upgraded all clusters and nodes to the latest version, 1.15.4-gke.22, and changed "Legacy Stackdriver Logging" to "Stackdriver Kubernetes Engine Monitoring", but it didn't help.
In the Stackdriver Workspace, only "disk_read_bytes" has a graph; any other request in Metrics Explorer shows only the message "No data for this time interval".
If I run "kubectl top nodes" on the command line, I see current data for CPU and memory, but I need to see it on the node detail page to understand the peak load. How can I configure this?
In my case, I was missing permissions on the IAM service account associated with the cluster. Make sure it has the following roles:
Monitoring Metrics Writer (roles/monitoring.metricWriter)
Logs Writer (roles/logging.logWriter)
Stackdriver Resource Metadata Writer (roles/stackdriver.resourceMetadata.writer)
This is documented here.
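For illustration, after granting these roles, the project IAM policy should contain bindings roughly like the following (the service account address is a placeholder for the one your nodes actually use):

{
  "bindings": [
    {
      "role": "roles/monitoring.metricWriter",
      "members": [ "serviceAccount:<node-sa>@<project-id>.iam.gserviceaccount.com" ]
    },
    {
      "role": "roles/logging.logWriter",
      "members": [ "serviceAccount:<node-sa>@<project-id>.iam.gserviceaccount.com" ]
    },
    {
      "role": "roles/stackdriver.resourceMetadata.writer",
      "members": [ "serviceAccount:<node-sa>@<project-id>.iam.gserviceaccount.com" ]
    }
  ]
}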
Actually, it sounds strange: if you can get the metrics on the command line but the Stackdriver interface doesn't show them, it may be a bug.
I recommend the following: if you are able to, create a cluster with the minimum resources and check the same Stackdriver metrics. If the metrics are there, it may be a bug, and you can report it in the appropriate GCP channel.
Check the documentation about how to get support within GCP:
Best Practices for Working with Cloud Support
Getting support for Google Cloud
Is there any built-in way to monitor memory usage of an application running in managed Google Cloud Run instances?
In the "Metrics" page of a managed Cloud Run service, there is an item called "Container Memory Allocation". However, as far as I understand it, this graph refers to the instance's maximum allocated memory (chosen in the settings), and not to the memory actually used inside the container. (Please correct me if I'm wrong.)
In the Stackdriver Monitoring list of available metrics for managed Cloud Run ( https://cloud.google.com/monitoring/api/metrics_gcp#gcp-run ), there also doesn't seem to be any metric related to the memory usage, only to allocated memory.
Thank you in advance.
Cloud Run now exposes a new metric named "Memory Utilization" in Cloud Monitoring; see more details here.
This metric captures the distribution of container memory utilization across all container instances of the revision. It is recommended to look at the percentiles of this metric (50th, 95th, and 99th) to understand how utilized your instances are.
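For example, to keep an eye on the high end of that distribution, you could create a Cloud Monitoring alert policy that aligns the metric with a 99th-percentile aligner. A minimal sketch of the AlertPolicy resource for the Monitoring v3 API follows; the display names and the 0.8 threshold are illustrative, and the threshold assumes the metric is reported as a 0-1 ratio (adjust it if your charts show percent):

{
  "displayName": "Cloud Run memory utilization p99 > 80%",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "p99 memory utilization",
      "conditionThreshold": {
        "filter": "metric.type=\"run.googleapis.com/container/memory/utilizations\" resource.type=\"cloud_run_revision\"",
        "aggregations": [
          {
            "alignmentPeriod": "300s",
            "perSeriesAligner": "ALIGN_PERCENTILE_99"
          }
        ],
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "0s"
      }
    }
  ]
}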
Currently, there seems to be no way to monitor the memory usage of a Google Cloud Run instance through Stackdriver or on the "Cloud Run" page in the Google Cloud Console.
I have filed a feature request on your behalf to add memory usage metrics to Cloud Run. You can see and track this feature request at the following link.
There is not currently a metric on memory utilization. However, if your service reaches a memory limit, the following log will appear in Stackdriver Logging with ERROR-level severity:
"Memory limit of 256M exceeded with 325M used. Consider increasing the memory limit, see https://cloud.google.com/run/docs/configuring/memory-limits"
(Replace specific numbers accordingly.)
Based on this log message, you could create a Log-based Metric for memory exceeded.
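For instance, a counter log-based metric matching that message could be defined with a LogMetric resource like the one below. The metric name is arbitrary, and the filter assumes the message arrives as an ERROR-severity textPayload on the cloud_run_revision resource:

{
  "name": "cloud_run_memory_limit_exceeded",
  "description": "Counts Cloud Run log entries reporting an exceeded container memory limit",
  "filter": "resource.type=\"cloud_run_revision\" severity=ERROR textPayload:\"Memory limit of\""
}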
We use Azure IoT Edge, with edgeHub and edgeAgent as the modules of the Edge runtime. We want to verify that the offline storage capability is configured correctly in our environment.
We have a custom simulator module connected to a custom publisher module that publishes to an API in the cloud. The simulator continuously produces a message of around 10 KB every 2 seconds. The publisher module cannot talk to the outside because of a blocking firewall rule. I want to verify that all the memory/RAM allocated to edgeHub is used and that it later overflows to disk, using a reasonable amount of the available disk space.
This exercise is taking long to complete, even when I run multiple modules/instances of the simulator.
Queries:
How can I control the amount of memory allocated to edgeHub? What are the correct createOptions to control/reduce the allocated memory? It is currently around 1.8 GB.
I see that during the exercise RAM generally keeps increasing, but at some point it drops a little and then keeps increasing again. Is there some kind of GC or optimization happening inside edgeHub?
b0fb729d25c8 edgeHub 1.36% 547.8MiB / 1.885GiB 28.38% 451MB / 40.1MB 410MB / 2.92GB 221
How can I ensure that none of the messages produced by the simulator are lost? Is there a way to count the number of messages in edgeHub?
I have done the proper configuration to mount a directory from the VM into the container for persistent storage. Is there a specific folder inside the edgeHub folder under which messages are stored when they overflow?
Below I document the answers, based on the input I have received from the Azure IoT Hub/IoT Edge maintainers.
To control the memory limit of containers/modules on IoT Edge, specify the maximum memory in bytes in the module's createOptions, as below (536870912 bytes = 512 MB):
"createOptions": "{\"HostConfig\": { \"Memory\": 536870912 }}"
Messages are persisted because the implementation uses RocksDB, which stores to disk by default.
To keep the messages even after the edgeHub container is recreated, mount a directory from the host into the container and point edgeHub at it via the storageFolder environment variable. In the example below, edgeHub writes its store to /data inside the container:
"env": {
"storageFolder": {
"value": "/data"
}
}
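Note that storageFolder alone only tells edgeHub where to write inside its own filesystem; the host directory must also be bound into the container. A minimal sketch of the matching edgeHub settings, assuming /srv/edgehub is a host directory you have created and made writable by the container user:

"settings": {
  "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
  "createOptions": "{\"HostConfig\":{\"Binds\":[\"/srv/edgehub:/data\"]}}"
}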
To monitor the memory usage, the metrics feature will be generally available in 1.0.10:
Monitor IoT Edge Devices At-scale
Say my data store is going to increase in size. If the data grows, how would the storage manager manage it? Does the storage manager split the data across different machines in the domain (surely that is not the case)?
How exactly would the process work? What is the recommendation in this area for a key-value store?
If you have a storage manager that is soon to run out of disk space, you can start up a new storage manager with a larger disk subsystem, or one that points to extensible cloud storage such as Amazon S3. Once the new storage manager is up to date, the old one can be taken offline. This entire operation can be done while the database is running. Generally, we also recommend that you always run with at least 2 storage managers for redundancy.
If you have more questions, feel free to direct them to the NuoDB forum:
http://www.nuodb.com/community
NuoDB supports multiple back-end storage devices, including the Hadoop Distributed File System (HDFS). If you start a broker configured for HDFS, you can use HDFS tools to expand distributed storage on-the-fly and there's no need for any NuoDB admin operations. As Duke described, you can transition from a file-based Storage Manager to an HDFS one without interrupting services.
NuoDB interfaces with the storage layer using filesystem semantics for discrete storage units called "atoms". These map easily into the HDFS directory structure, simplifying administration on that end.