Why is a service worker even slower than a normal network request in Chrome? - service-worker

I used a service worker to precache a 15 KB page, so why can serving it be even slower than a normal network request? The cached response served by the service worker sometimes takes ~200 ms and sometimes ~600 ms, which can be slower than the network.
The service worker logic is pretty simple: it matches the URL, then uses fetchEvent.respondWith to return the response.
The problem does not seem to be related to the Cache API. I also tried holding the page as in-memory state inside the service worker; even in the case where I guaranteed the state was not destroyed, the same-size response served from the service worker still took 150-600 ms.
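For reference, the setup described above can be sketched roughly as follows. The cache name, page URL, and origin used for matching are illustrative assumptions, not the asker's actual code; the matching check is pulled out into a plain function so it works outside a worker context:

```javascript
// Sketch of a precache-then-respondWith service worker (names are assumptions).
const CACHE_NAME = 'precache-v1';
const PAGE_URL = '/page.html';

// Pure helper: does this request URL match the precached page?
function isPrecachedRequest(url) {
  return new URL(url, 'https://example.com').pathname === PAGE_URL;
}

// Worker-only wiring; guarded so the file also loads outside a worker.
if (typeof self !== 'undefined' && typeof self.addEventListener === 'function') {
  self.addEventListener('install', (event) => {
    // Precache the page at install time.
    event.waitUntil(caches.open(CACHE_NAME).then((c) => c.add(PAGE_URL)));
  });
  self.addEventListener('fetch', (event) => {
    if (isPrecachedRequest(event.request.url)) {
      // Serve from the cache, falling back to the network on a miss.
      event.respondWith(
        caches.match(event.request).then((hit) => hit || fetch(event.request))
      );
    }
  });
}
```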
I tested this in Chrome. Safari seems to have comparatively better performance.

Related

Delay on requests from Google API Gateway to Cloud Run

I'm currently seeing delays of 2-3 seconds on the first requests coming into our APIs.
We've set the min instances to 1 to prevent cold starts, but this delay is still occurring.
If I check the metrics, I don't see any startup latencies in the specified timeframe, so I have no insight into what is causing these delays. Tracing gives the following:
The only thing I can change is switching to "CPU is always allocated", but this isn't helping in any way.
Can somebody give more information on this?
As mentioned in the answer, per the documentation:
Idle instances
As traffic fluctuates, Cloud Run attempts to reduce the chance of cold
starts by keeping some idle instances around to handle spikes in
traffic. For example, when a container instance has finished handling
requests, it might remain idle for a period of time in case another
request needs to be handled.
But, Cloud Run will terminate unused containers after some time if no
requests need to be handled. This means a cold start can still occur.
Container instances are scaled as needed, and it will initialize the
execution environment completely. While you can keep idle instances
permanently available using the min-instance setting, this incurs cost
even when the service is not actively serving requests.
So, let’s say you want to minimize both cost and response time latency
during a possible cold start. You don’t want to set a minimum number
of idle instances, but you also know any additional computation needed
upon container startup before it can start listening to requests means
longer load times and latency.
Cloud Run container startup
There are a few tricks you can do to optimize your service for
container startup times. The goal here is to minimize the latency that
delays a container instance from serving requests. But first, let's
review the Cloud Run container startup routine.
When starting the service:
1a. Starting the container
1b. Running the entrypoint command to start your server
1c. Checking for the open service port
You want to tune your service to minimize the time needed for step 1a.
Let’s walk through 3 ways to optimize your service for Cloud Run
response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables
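Point 3 can be sketched as lazy initialization of an expensive object in global scope, so it is created once per container instance (on the first request that needs it, rather than at startup) and reused across subsequent requests. Here, createExpensiveClient is a hypothetical stand-in for something like a database client:

```javascript
// Global: survives across requests handled by the same container instance.
let client;
let initCount = 0; // only here to make the reuse observable

// Hypothetical stand-in for an expensive setup step (e.g. a DB connection).
function createExpensiveClient() {
  initCount += 1;
  return { ready: true };
}

// Lazy init: the first request pays the cost, later requests reuse it,
// and container startup does no work at all.
function getClient() {
  if (!client) {
    client = createExpensiveClient();
  }
  return client;
}
```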
As mentioned in the documentation:
Avoid background activities if CPU is allocated only during request processing
Background activity is anything that happens after your HTTP response
has been delivered. To determine whether there is background activity
in your service that is not readily apparent, check your logs for
anything that is logged after the entry for the HTTP request.
If you need to set your service to allocate CPU only during request
processing, when the Cloud Run service finishes handling a
request, the container instance's access to CPU will be disabled or
severely limited. You should not start background threads or routines
that run outside the scope of the request handlers if you use this
type of CPU allocation. Review your code to make sure all asynchronous
operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create
unpredictable behavior because any subsequent request to the same
container instance resumes any suspended background activity.
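The guidance above can be sketched as follows: await every asynchronous side effect before returning the response, instead of firing it off in the background. Both saveAuditLog and the sink parameter are hypothetical stand-ins for whatever async work a handler might do:

```javascript
// Hypothetical async side effect (stand-in for logging, a DB write, etc.).
async function saveAuditLog(entry, sink) {
  sink.push(entry);
}

async function handleRequest(req, sink) {
  const result = { status: 200, body: 'done' };
  // Anti-pattern would be calling saveAuditLog(...) without await
  // ("fire and forget"): with request-only CPU allocation, that work may
  // be throttled or frozen as soon as the response is delivered.
  await saveAuditLog({ path: req.path }, sink); // finish before responding
  return result;
}
```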
As mentioned in the thread, the reason could be that all the operations you performed happened after the response was sent.
According to the docs, the CPU is allocated only during request processing by default, so the only thing you have to change is to enable CPU allocation for background activities.
You can refer to the documentation for more information on the steps to optimize Cloud Run response times.
You can also have a look at the blog on using Google API Gateway with Cloud Run.

Large percentage of requests in CLRThreadPoolQueue

We have an ASP.NET MVC application hosted in an Azure App Service. After running the profiler to help diagnose possible slow requests, we were surprised to see this:
An unusually high percentage of slow requests in the CLRThreadPoolQueue. We've now run multiple profiling sessions, each coming back with between 40-80% in the CLRThreadPoolQueue (something we'd never seen before in previous profiles). CPU each time was below 40%, and after checking our metrics we aren't seeing sudden spikes in requests.
The majority of the requests listed as slow are super simple API calls. We've added response caching and made them async. The only thing they do is hit a database looking for a single-record result. We've checked the metrics on the database, and the average query run time is around 50 ms or less. Looking at Application Insights for these requests confirms this, and shows that the database query doesn't take place until the very end of the request timeline (I assume this is the request sitting in the queue).
Recently we started including SignalR in a portion of our application. It's not fully in use, but it is in the code base. We have since switched to using Azure SignalR Service and saw no changes. The addition of SignalR is the only "major" change/addition we've made since encountering this issue.
I understand we can scale up and/or increase minWorkerThreads. However, this feels like treating the symptom, not the cause.
Things we've tried:
Finding the most frequent requests and making them async (they weren't before)
Response caching to frequent requests
Using Azure SignalR Service rather than hosting it on the same web app
Running memory dumps and contacting Azure support (they found nothing)
Scaling up to an S3
Profiling with and without thread report
-- None of these steps have resolved our issue --
How can we determine what requests and/or code is causing requests to pile up in the CLRThreadPoolQueue?
We encountered a similar problem; I guess that internally SignalR must be using up a lot of threads or some other contended resource.
We did three things that helped a lot:
Call ThreadPool.SetMinThreads(400, 1) on app startup to make sure the thread pool has enough threads to handle all incoming requests from the start
Create a second App Service with the same code deployed to it. In the javascript, set the SignalR URL to point to that second instance. That way, all the SignalR requests go to one app service, and all the app's HTTP requests go to the other. Obviously this requires a SignalR backplane to be set up, but assuming your app service has more than 1 instance you'll have had to do this anyway
Review the code for any synchronous code paths (eg. making a non-async call to the database or to an API) and convert them to async code paths

How to improve prerender speed for Twitterbot request?

For the project I am working on, I've set up a prerender service on the same server as the project and use Nginx to pass social media requests to the prerender service.
I have observed that if an authorized user shares a page to Twitter, it usually works, i.e. the meta tag image and text are rendered as a Twitter card. However, if the user has shared other pages of the same project, the images are usually not rendered when the user visits their Twitter posts.
From the Nginx access log, it seems the Twitterbots made requests at the same time and the prerender service was too slow to render the pages. 499 statuses were shown for the Twitterbot requests and 504s in the prerender log.
The server is hosted on UpCloud on a 1 CPU / 2 GB memory plan. The prerender service runs in a Docker container limited to 300 MB, and it caches rendered pages for 60 seconds. Due to the memory quota, I hesitate to increase the cache duration.
I have been studying the server logs and possible solutions, but haven't been able to come up with any solution other than refactoring the UI. Has anyone else struggled with this issue, and how did you overcome it?
That seems like a pretty underpowered server for running a Prerender service. You might want to at least give it more RAM, and possibly another CPU, to get better performance. 504s shouldn't be happening often at all.
Depending on how long your pages take to prerender, caching for much longer than 60 seconds is highly recommended. You probably won't see many cache hits in 60 seconds (from users sharing URLs on twitter) for a single URL unless you have a very high traffic site.

Why is the network-first strategy slower than no service worker?

While benchmarking the performance of a service worker with Workbox, we found an interesting phenomenon.
When the service worker is applied, Workbox's network-first strategy is about 30 ms slower than networking with no service worker. We then tried skipping Workbox and implementing the network-first strategy manually; it was about 20 ms slower.
My guess is that when a service worker kicks in, every request has to be handled by JavaScript code, and it is the execution of that JavaScript that makes the networking slower.
Then I checked the cache-first strategy, and it turns out that fetching content from Cache Storage is slower than fetching content from the disk cache (HTTP cache) without a service worker.
So, in my understanding, even though a service worker offers us more control over caching, it is not guaranteed to be faster at caching, right?
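For context, a manual network-first strategy like the one benchmarked above can be sketched as below. Here fetchFn and a plain Map stand in for the real fetch and Cache API so the logic is testable in isolation; in a real worker you would use caches.open, cache.put with response.clone(), and cache.match instead:

```javascript
// Network-first: try the network, cache the result, fall back to the
// cache only when the network fails.
async function networkFirst(request, fetchFn, cache) {
  try {
    const response = await fetchFn(request);
    cache.set(request, response); // refresh the cached copy
    return response;
  } catch (err) {
    const cached = cache.get(request);
    if (cached) return cached; // offline fallback
    throw err; // no network and no cached copy
  }
}
```

Note that even this small amount of JavaScript sits on the request path, which is consistent with the ~20-30 ms overhead measured above.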
There is a cost associated with starting up a service worker that was not previously running. This could be on the order of tens of milliseconds, depending on the device. Once that service worker starts up, if it doesn't handle your navigation requests (which are almost certainly the first request that a service worker would receive) by going against the cache, then it's likely you'll end up with worse performance than if there were no service worker present at all.
If you are going against the cache, then having a service worker present should offer roughly the same performance as looking things up in the HTTP browser cache once it's actually running, but the same startup cost needs to be taken into account first.
The real performance benefits of using a service worker come from handling navigation requests for HTML in a cache-first manner, which is not something you could traditionally do with HTTP caching.
You can read more about these tradeoffs and best practices in
"High-performance service worker loading".

Service workers slow speed when serving from cache

I have some resources that I want to be cached and served at top speed to my app.
When I used AppCache I got great serving speeds, but I was stuck with AppCache.
So I've replaced it with a service worker.
Then I tried the simplest strategy, just cache the static assets on install and serve them from the cache whenever fetched.
It worked; when I checked Chrome's network panel I was happy to see my service worker in action, BUT the load times were horrible: each resource's load time doubled.
So I started thinking about other strategies. Here you can find plenty of them; the cache-and-network race sounded interesting, but I was deterred by the data usage.
So I tried something different: I tried to aggressively cache the resources in the service worker's memory. Whenever my service worker is up and running, it pulls the relevant resources from the cache and saves the response objects in memory for later use. When it gets a matching fetch, it just responds with a clone of the in-memory response.
This strategy proved to be fastest, here's a comparison I made:
So my question is pretty vague, as my understanding of service workers is still pretty vague...
Does this all make sense? Can I keep a cache of static resources in memory?
What about the bloated memory usage: are there any negative implications? For instance, maybe the browser more frequently shuts down service workers with high memory consumption.
You can't rely on keeping Response objects in memory inside of a service worker and then responding directly with them, for (at least) two reasons:
Service workers have a short lifetime, and everything in the global scope of the service worker is cleared each time the service worker starts up again.
You can only read the body of a Response object once. Responding to a fetch request with a Response object will cause its body to be read. So if you have two requests for the same URL that are both made before the service worker's global scope is cleared, using the Response for the second time will fail. (You can work around this by calling clone() on the Response and using the clone to respond to the fetch event, but then you're introducing additional overhead.)
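A sketch of the clone() workaround from point 2, using a plain Map as a hypothetical in-memory store (Response here is the standard Fetch API class, available globally in modern runtimes):

```javascript
// In-memory store that hands out clones, so the stored Response body is
// never consumed and each fetch it answers gets a readable body.
const memoryCache = new Map();

function storeInMemory(url, response) {
  memoryCache.set(url, response);
}

function respondFromMemory(url) {
  const cached = memoryCache.get(url);
  // clone() before responding: the original stays unread and reusable,
  // at the cost of the extra overhead the answer mentions.
  return cached ? cached.clone() : undefined;
}
```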
If you're seeing a significant slowdown in getting your responses from the service worker back to your page, I'd take some time to dig into what your service worker code actually looks like, and also what the code on your client pages looks like. If your client pages have their main JavaScript thread locked up (due to heavyweight JavaScript operations that take a while to complete and never yield, for instance), that could introduce a delay in getting the response from the service worker to the client page.
Sharing some more details about how you've implemented your cache-based service worker would be a good first step.
