Monitor Google Cloud Run memory usage - google-cloud-run

Is there any built-in way to monitor memory usage of an application running in managed Google Cloud Run instances?
In the "Metrics" page of a managed Cloud Run service, there is an item called "Container Memory Allocation". However, as far as I understand it, this graph refers to the instance's maximum allocated memory (chosen in the settings), and not to the memory actually used inside the container. (Please correct me if I'm wrong.)
In the Stackdriver Monitoring list of available metrics for managed Cloud Run ( https://cloud.google.com/monitoring/api/metrics_gcp#gcp-run ), there also doesn't seem to be any metric related to the memory usage, only to allocated memory.
Thank you in advance.

Cloud Run now exposes a new metric named "Memory Utilization" in Cloud Monitoring; see more details here.
This metric captures the distribution of container memory utilization across all container instances of the revision. It is recommended to look at the percentiles of this metric (50th, 95th and 99th) to understand how utilized your instances are.
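For programmatic access, you can read this distribution metric through the Cloud Monitoring API. Below is a minimal sketch using the google-cloud-monitoring Python client; the project ID and service name are placeholders, and it assumes the metric type is run.googleapis.com/container/memory/utilizations (check the metric list linked in the question for the exact name).

```python
import time

from google.cloud import monitoring_v3

# Placeholders: substitute your own project and Cloud Run service.
PROJECT = "projects/my-project"
SERVICE = "my-service"

client = monitoring_v3.MetricServiceClient()

# Look at the last hour of data.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": PROJECT,
        "filter": (
            'metric.type="run.googleapis.com/container/memory/utilizations" '
            f'AND resource.labels.service_name="{SERVICE}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        # The metric is DISTRIBUTION-valued: each point summarizes utilization
        # across container instances (mean, count, and histogram buckets).
        dist = point.value.distribution_value
        print(point.interval.end_time, "mean:", dist.mean, "count:", dist.count)
```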

Currently, there seems to be no way to monitor the memory usage of a Google Cloud Run instance through Stackdriver or on the "Cloud Run" page in the Google Cloud Console.
I have filed a feature request on your behalf to add memory usage metrics to Cloud Run. You can see and track this feature request at the following link.

There is not currently a metric on memory utilization. However, if your service reaches a memory limit, the following log will appear in Stackdriver Logging with ERROR-level severity:
"Memory limit of 256M exceeded with 325M used. Consider increasing the memory limit, see https://cloud.google.com/run/docs/configuring/memory-limits"
(Replace specific numbers accordingly.)
Based on this log message, you could create a Log-based Metric for memory exceeded.
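As a sketch of that last step, here is one way to create such a log-based metric with the google-cloud-logging Python client. The project ID, metric name, and exact log filter are assumptions - adjust them to match what your logs actually contain.

```python
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project ID

# Match the ERROR-severity "Memory limit ... exceeded" messages from Cloud Run.
# The filter below is an assumption -- adapt it to the exact payload you see.
log_filter = (
    'resource.type="cloud_run_revision" '
    'AND severity=ERROR '
    'AND textPayload:"Memory limit of"'
)

metric = client.metric(
    "cloud_run_memory_limit_exceeded",  # any metric name you like
    filter_=log_filter,
    description="Cloud Run container instances that exceeded their memory limit",
)

if not metric.exists():
    metric.create()
```

Once the metric exists, you can chart it in Monitoring or attach an alerting policy to it.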

Related

Problem: empty graphs in GKE cluster node detail ("No data for this time interval"). How can I fix it?

I have a cluster in Google Cloud, but I need information about resource usage.
On the detail page of each node there are three graphs for CPU, memory and disk usage. But all of these graphs, on every node, show the warning "No data for this time interval" for any time interval.
I upgraded all clusters and nodes to the latest version 1.15.4-gke.22 and changed "Legacy Stackdriver Logging" to "Stackdriver Kubernetes Engine Monitoring".
But it didn't help.
In the Stackdriver Workspace only "disk_read_bytes" shows a graph; any other query in Metrics Explorer only shows the message "No data for this time interval".
If I run "kubectl top nodes" on the command line, I see current data for CPU and memory. But I need to see it on the node detail page to understand peak load. How can I configure this?
In my case, I was missing permissions on the IAM service account associated with the cluster - make sure it has the roles:
Monitoring Metrics Writer (roles/monitoring.metricWriter)
Logs Writer (roles/logging.logWriter)
Stackdriver Resource Metadata Writer (roles/stackdriver.resourceMetadata.writer)
This is documented here
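If you want to check this programmatically rather than in the console, a small sketch like the following (using the google-cloud-resource-manager Python client) can verify which of those roles the node service account already has. The project ID and service account address are placeholders.

```python
from google.cloud import resourcemanager_v3

# Placeholders: substitute your project and the cluster's node service account.
PROJECT = "projects/my-project"
NODE_SA = "serviceAccount:1234567890-compute@developer.gserviceaccount.com"

REQUIRED_ROLES = {
    "roles/monitoring.metricWriter",
    "roles/logging.logWriter",
    "roles/stackdriver.resourceMetadata.writer",
}

client = resourcemanager_v3.ProjectsClient()
policy = client.get_iam_policy(request={"resource": PROJECT})

# Collect the roles in which the node service account appears as a member.
granted = {b.role for b in policy.bindings if NODE_SA in b.members}
missing = sorted(REQUIRED_ROLES - granted)
print("Missing roles for node service account:", missing or "none")
```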
Actually, it sounds strange: if you can get the metrics on the command line but the Stackdriver interface doesn't show them, it may be a bug.
I recommend this: if you are able to, create a cluster with minimal resources and check the same Stackdriver metrics. If the metrics are there, it may well be a bug, and you can report it through the appropriate GCP channel.
Check the documentation about how to get support within GCP:
Best Practices for Working with Cloud Support
Getting support for Google Cloud

Is there a way to limit the performance data being recorded by AKS clusters?

I am using Azure Log Analytics to store monitoring data from AKS clusters. 72% of the stored data is performance data. Is there a way to limit how often AKS reports performance data?
At this point we do not provide a mechanism to change performance metric collection frequency. It is set to 1 minute and cannot be changed.
We were actually thinking about adding an option to make more frequent collection as was requested by some customers.
Given the number of objects (pods, containers, etc.) running in the cluster, collecting even a few perf metrics may generate a noticeable amount of data... and you need that data in order to figure out what is going on in case of a problem.
Curious: you say your perf data is 72% of the total - how much is that in terms of GB/day, do you know? Do you have any active applications running on the cluster generating tracing? What we see is that once you stand up a new cluster, perf data is "the king" of volume, but once you start adding active apps that trace, logs become more and more of a factor in telemetry data volume...
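If you want to put a number on that, one option is to query the workspace's Usage table. The sketch below uses the azure-monitor-query Python client; the workspace ID is a placeholder, and the query simply breaks billable ingestion down by data type so you can see how much the Perf table contributes.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"  # placeholder workspace ID

# Billable ingestion per data type over the queried timespan, in GB,
# so the Perf table can be compared with everything else.
QUERY = """
Usage
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1024.0 by DataType
| order by IngestedGB desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(days=30))

for table in response.tables:
    for row in table.rows:
        print(list(row))
```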

Monitor HPC/PBS/torque usage

I am trying to figure out how best to monitor usage of our HPC resources. Specifically, I am trying to identify cpu usage, disk space consumed, and number of jobs run by group.
PBS allows the "-W group_list" flag to identify the group a job script belongs to. I want to use this to monitor cluster usage, but I can't find documentation on how to track this over time.
gmond and gmetric offer some functionality - I can see the parameters I'm interested in, but I can't figure out how to group these by the -W group_list flag or by user or some other metric.
Any advice?

EC2 CloudWatch memory metrics don't match what Top shows

I have a t2.micro EC2 instance, running at about 2% CPU. I know from other posts that the CPU usage shown in top is different from the CPU reported in CloudWatch, and that the CloudWatch value should be trusted.
However, I'm seeing very different values for memory usage between top, CloudWatch, and New Relic.
There's 1 GB of RAM on the instance, and top shows ~300 MB of Apache processes, plus ~100 MB of other processes. The overall memory usage reported by top is 800 MB. I guess there's 400 MB of OS/system overhead?
However, CloudWatch reports 700 MB of usage, and New Relic reports 200 MB of usage (even though New Relic reports 300 MB of Apache processes elsewhere, so I'm ignoring it).
The CloudWatch memory metric often goes over 80%, and I'd like to know what the actual value is, so I know when to scale if necessary, or how to reduce memory usage.
Here's the recent memory profile; it seems something is using more memory over time (the big dips are either Apache restarts, or perhaps GC?)
Screenshot of memory usage over last 12 days
AWS doesn't provide memory metrics for EC2 instances out of the box. Because Amazon does all of its monitoring from outside the EC2 instance (the server), it is unable to capture memory metrics inside the instance. However, for complete monitoring of an instance you really want memory utilization statistics alongside CPU utilization and network I/O.
To get them, you can use CloudWatch's custom metrics feature to send any instance- or app-level data to CloudWatch and monitor it with Amazon's tools.
You can follow this blog for more details: http://upaang-saxena.strikingly.com/blog/adding-ec2-memory-metrics-to-aws-cloudwatch
You can set up a cron job at a 5-minute interval on that instance, and all the data points can be seen in CloudWatch.
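As a sketch of what such a cron job could run, here is a small Python script using boto3 that reads memory usage from /proc/meminfo and pushes it to CloudWatch as a custom metric. The namespace, metric name, and instance ID are placeholders of my own choosing, not anything the blog post prescribes.

```python
import boto3

def memory_used_percent() -> float:
    """Used-memory percentage derived from /proc/meminfo (MemTotal vs MemAvailable)."""
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            meminfo[key] = int(rest.strip().split()[0])  # values are in kB
    return 100.0 * (1 - meminfo["MemAvailable"] / meminfo["MemTotal"])

def push_metric(instance_id: str) -> None:
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(
        Namespace="Custom/EC2",  # placeholder namespace
        MetricData=[{
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": memory_used_percent(),
            "Unit": "Percent",
        }],
    )

if __name__ == "__main__":
    push_metric("i-0123456789abcdef0")  # placeholder instance ID
```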
CloudWatch doesn't actually provide memory usage metrics for EC2 instances; you can confirm this here.
As a result, the MemoryUtilization metric that you are referring to must be a custom metric pushed by something you have configured or by some application running on your instance.
You therefore need to determine what is actually pushing the data for this metric; the data source is evidently pushing the wrong thing, or is unreliable.
The behavior you are seeing is not a CloudWatch problem.

Memory usage metric identifier Google Compute Engine

I have installed the monitoring agent on the disk image used by my instance group. I need to autoscale instances based on memory usage; however, when I go to configure the target metric in the GCE web console, I can't find the memory usage metric identifier. What is the missing identifier, or how can I autoscale my group based on memory usage?
