Can Cloud Run respond in less than 200ms? - google-cloud-run

Cloud Run seems to respond in over 400 ms, even on repeated calls (which I assume hit 'warmed up' containers).
I deployed a simple node.js service (see code below) that responds with a static JSON.
The service endpoint does not require user authentication. I am calling us-central1 (from Ventura County, CA. USA)
I call the service and get latency above 400 ms. Sometimes I see latency in the 5000-6000 ms range!
I understand Cloud Run is in beta, but I'd appreciate feedback from Google on what I should expect for performance and approx. when. Also, if I am configuring something wrong, let me know. If you have better results than me - please let me know what you are doing differently.
Service I deployed: https://github.com/dorongrinstein/cloudrun-dummy-service
Test code: https://github.com/dorongrinstein/cloudrun-test
I expect the output to be in the double-digit millisecond range.
Instead, I get output in the mid three-digit range.
FYI - I am in Ventura County, CA, and my internet connection is fine: according to gcpping.com, my median latency to us-central1 is 70 ms.
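For what it's worth, a minimal client-side measurement can be done with curl's timing variables (the service URL below is a placeholder, not the actual endpoint from the question):

```shell
# Rough client-side latency check (the service URL is hypothetical).
URL="https://my-service-abc123-uc.a.run.app/"

# Take 10 samples of total request time and report the median.
for i in $(seq 1 10); do
  curl -s -o /dev/null -w '%{time_total}\n' "$URL"
done | sort -n | awk '{ t[NR] = $1 } END { print "median:", t[int((NR + 1) / 2)] }'
```

Taking the median rather than a single sample smooths out one-off spikes such as cold starts.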

Yes Cloud Run can respond in less than 200ms.
You seem to have been hitting a bug in our networking infrastructure, which should now be fixed.

According to this Cloud Run engineer, you should expect similar cold start times in Cloud Run as in Cloud Functions.
One thing you could try is raising your container's memory limit. At least in Cloud Functions, startup latency has been found to depend heavily on the memory allocation.
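As a sketch, the memory limit can be raised at deploy time with the --memory flag (service name, image, and region below are placeholders, not taken from the question):

```shell
# Redeploy the service with a larger memory limit (names are hypothetical).
gcloud run deploy my-service \
  --image gcr.io/my-project/my-service \
  --memory 512Mi \
  --region us-central1
```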

Related

AWS CloudWatch CPUUtilization vs NETSNMP discrepancies

For a few weeks now, we've noticed our ScienceLogic monitoring platform that uses SNMP is unable to detect CPU spikes that CloudWatch alarms are picking up. Both platforms are configured to poll every 5min, but SL1 is not seeing any CPU spikes more than ~20%. Our CloudWatch CPU alarm is set to fire off at 90%, which has gone off twice in the past 12 hours for this EC2 instance. I'm struggling to understand why?
I know that CloudWatch pulls the CPUUtilization metric direct from the hypervisor, but I can't imagine it would differ so much from the CPU percentage captured by SNMP. Anyone have any thoughts? I wonder if I'm just seeing a scaling issue in SNMP?
SL1: [screenshot]
CloudWatch: [screenshot]
I tried contacting ScienceLogic, and they asked me for the "formula" that AWS uses to capture this metric; I'm not really sure I understand the question.
Monitors running inside and outside of a compute unit (here, a virtual machine) can observe different results, which is rather normal.
The SNMP agent runs inside the VM, so its code execution is heavily affected by high-CPU events (its threads get blocked). Recall similar high-CPU events on a personal computer: when one application consumes all CPU resources, other applications naturally become slow and unresponsive.
CloudWatch's sensors, by contrast, sit outside the VM and are almost never affected by events inside it.

InfluxDB high CPU usage jumping to 80%?

I am relatively new to the time series database world. I am running InfluxDB 1.8.x as a Docker container, with influxdb.conf left at its default configuration. Currently I am facing an issue of high CPU usage by InfluxDB: the CPU jumps to 80-90% and creates problems for other processes running on the same machine.
I tried the solution given here ->> Influx high CPU issue but unfortunately it did not work. I am unable to understand the reason behind the issue, and I am also struggling to find documentation or community help.
What I have tried so far:
updated the monitor section of the influxdb.conf file like this ->> monitor DB
checked the series cardinality with SHOW SERIES CARDINALITY; it looks well within limits at ~9400 (though I am not sure at what number cardinality becomes a red flag)
I am looking for an approach that will help me find the root cause of this problem.
Please let me know if you need any further information.
After reading about InfluxDB debugging and CPU profiling via the HTTP API influxdb, I was able to pin down the issue: the problem was in the way I was writing my queries, which involved complex functions and a GROUP BY tag. I also analyzed queries with the EXPLAIN ANALYZE (query) command to check how long each query takes to execute. After fixing the queries I saw a huge improvement in CPU load.
Basically I can suggest the following:
Run a CPU profile using the InfluxDB HTTP API with the command curl -o <file name> "http://localhost:8086/debug/pprof/all?cpu=true" and collect the result.
Visualize the result with a tool like pprof to find the problem.
You can also run basic commands like SHOW SERIES CARDINALITY and EXPLAIN ANALYZE <query> to understand how a query executes.
Before designing any schema or Influx client, check the hardware recommendations ->> Hardware sizing guidelines
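The profiling steps above can be sketched as follows, assuming InfluxDB is reachable on its default host and port:

```shell
# 1. Capture a CPU profile while the expensive queries are running.
#    The endpoint returns a gzipped tar archive of pprof-format profiles.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"

# 2. List the archive contents to locate the CPU profile.
tar -tzf profiles.tar.gz

# 3. Extract and open the CPU profile with Go's pprof tool
#    (requires a Go toolchain; profile file name may differ by version).
tar -xzf profiles.tar.gz
# go tool pprof profiles/cpu.pb.gz
```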

Will Google Cloud Run support GPU/TPU some day?

So far Google Cloud Run supports only CPU. Is there any plan to support GPU? It would be super cool if GPUs were available; then I could demo a DL project without running a super expensive GPU instance.
I seriously doubt it. GPU/TPUs are specialized hardware. Cloud Run is a managed container service that:
Enables you to run stateless containers that are invokable via HTTP requests. This means that CPU-intensive applications are not a good fit: in between an HTTP request and its response, the CPU is throttled to near zero, so your expensive GPU/TPUs would sit idle.
Autoscales based upon the number of requests per second. Launching 10,000 instances in seconds is easy to achieve. Imagine the billing support nightmare for Google if customers could launch that many GPU/TPUs and the size of the bills.
Is billed in 100 ms intervals. Most requests fit into a few hundred milliseconds of execution. This is not a good execution or business model for GPU/TPU integration.
Provides a billing model which reduces the cost of web services to near zero when they are not in use. You pay only for storing your container images. When an HTTP request is received at the service URL, the container image is loaded into an execution unit and request processing resumes. Once requests stop, billing and resource usage also stop.
GPU/TPU types of data processing are best delivered by backend instances that protect and manage the processing power and costs that these processor devices provide.
You can use a GPU with Cloud Run for Anthos:
https://cloud.google.com/anthos/run/docs/configuring/compute-power-gpu

Docker performances for getting information: polling vs events

I have Docker swarm full of containers. I need to monitor when something is up or down. I can do this in 2 ways:
attaching to the swarm and listen to events.
polling service list
The issue with events is that there might be huge traffic; also, if some event is not processed, we will simply lose information on what's going on.
For me it is not super important to get immediate results, but to have correct information on what's going on.
Any pros/cons from real-life project?
Listening to events - it's immediate, but risky: if your event-listening program crashes for any reason, you will miss important information and end up with a wrong result. The Registrator program is based on events.
Polling - eventually consistent results, but if it solves your problem it is a less painful way to grab the data, no matter whether your program crashes or restarts. We are using this approach for service discovery in our project and so far it has served the purpose.
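A rough sketch of the two approaches (the filters and format strings here are illustrative, not from the question):

```shell
# 1. Event stream: immediate, but anything emitted while the listener
#    is down is lost.
docker events --filter 'type=container' --filter 'event=die' \
  --format '{{.Time}} {{.Actor.Attributes.name}} {{.Status}}'

# 2. Polling: eventually consistent; survives listener crashes/restarts.
while true; do
  docker service ls --format '{{.Name}} {{.Replicas}}'
  sleep 30
done
```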
From my experience, checking if something is up or down should be done using a health check, and should be agnostic to the underlying architecture running your service (otherwise you will have to write a new health check every time you change platform). Of course - you might have services with specific needs that cannot be monitored that way - if this is the case you're welcome to comment on that.
If you are using Swarm for stateless services only, I suggest creating a health check route that can verify the service is healthy and even disconnect faulty containers from the service.
If you are running stateful stuff this might be trickier, but there are solutions for that too, usually using some kind of monitoring agent over your stateful container (we are using CloudWatch since we run on AWS, but there are many alternatives).
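As a sketch, such a platform-agnostic check can be as small as an HTTP probe (the URL and the /healthz path are hypothetical):

```shell
# Probe a service's health endpoint; a non-success status or a timeout
# means the instance is unhealthy.
check_health() {
  curl -sf --max-time 3 "$1" > /dev/null && echo healthy || echo unhealthy
}

check_health "http://localhost:8080/healthz"   # hypothetical endpoint
```

On Swarm, the same curl command can be wired into the image's HEALTHCHECK so that failing containers are taken out of rotation automatically.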
Hope this helps.

Why is membase server so slow in response time?

I have a problem: membase is being very slow in my environment.
I am running several production servers (Passenger) on Rails 2.3.10 / Ruby 1.8.7.
Those servers communicate with 2 membase machines in a cluster.
The membase machines each have 64G of memory and a 100G EBS volume attached, plus 1G of swap.
My problem is that membase is being VERY slow in response time and is actually the slowest part right now in all of the application lifecycle.
my question is: Why?
the rails gem I am using is memcache-northscale.
the membase server is 1.7.1 (latest).
The server is doing between 2K-7K ops per second (for the cluster)
The response time from membase (based on New Relic) is 250 ms on average, which is HUGE and unreasonable.
Does anybody know why is this happening?
What can I do in order to improve this time?
It's hard to immediately say with the data at hand, but I think I have a few things you may wish to dig into to narrow down where the issue may be.
First of all, do your stats with membase show a significant number of background fetches? This is in the Web UI statistics for "disk reads per second". If so, that's the likely culprit for the higher latencies.
You can read more about the statistics and sizing in the manual, particularly the sections on statistics and cluster design considerations.
Second, you're reporting 250ms on average. Is this a sliding average, or overall? Do you have something like 90th or 99th percentile latencies? A few outlying disk fetches can inflate the average, when most requests (for example, those served from RAM that don't need disk fetches) are actually quite speedy.
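If your client logs one latency per request, a quick pass like this gives the average and the 99th percentile (latencies.txt is a placeholder for your timing log):

```shell
# Average and 99th-percentile latency from a file with one latency
# value (in ms) per line.
sort -n latencies.txt | awk '
  { t[NR] = $1; sum += $1 }
  END {
    i = int(NR * 0.99); if (i < 1) i = 1
    printf "avg: %.1f ms\n", sum / NR
    printf "p99: %.1f ms\n", t[i]
  }'
```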
Are your systems spread throughout availability zones? What kind of instances are you using? Are the clients and servers in the same Amazon AWS region? I suspect the answer may be "yes" to the first, which means about 1.5ms overhead when using xlarge instances from recent measurements. This can matter if you're doing a lot of fetches synchronously and in serial in a given method.
I expect it's all in one region, but it's worth double checking since those latencies sound like WAN latencies.
Finally, there is an updated Ruby gem, backwards compatible with Fauna. Couchbase, Inc. has been working to contribute its changes back to Fauna upstream. If possible, you may want to try the gem referenced here:
http://www.couchbase.org/code/couchbase/ruby/2.0.0
You will also want to look at running Moxi on the client side. To access Membase, you need to go through a proxy (called Moxi). By default, it's installed on the server, which means you might make a request to one of the servers that doesn't actually have the key. Moxi will go get it... but then you're doubling the network traffic.
Installing Moxi on the client-side will eliminate this extra network traffic: http://www.couchbase.org/wiki/display/membase/Moxi
Perry
