Service workers slow speed when serving from cache - service-worker

I have some resources that I want to be cached and served at top speed to my app.
When I used AppCache I got great serving speeds, but I was stuck with AppCache.
So I've replaced it with a service worker.
Then I tried the simplest strategy: just cache the static assets on install and serve them from the cache whenever they're fetched.
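Roughly, that first version looked like this (simplified; the cache name and asset list are just placeholders):

```js
// sw.js - simplified sketch of the "precache on install, serve from cache" strategy
const STATIC_CACHE = 'static-v1';
const ASSETS = ['/app.js', '/app.css', '/logo.png'];

self.addEventListener('install', (event) => {
  // Cache the static assets on install
  event.waitUntil(
    caches.open(STATIC_CACHE).then((cache) => cache.addAll(ASSETS))
  );
});

self.addEventListener('fetch', (event) => {
  // Serve from the cache whenever there's a match, otherwise fall through to the network
  event.respondWith(
    caches.match(event.request).then((cached) => cached || fetch(event.request))
  );
});
```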
It worked: when I checked Chrome's network panel I was happy to see my service worker in action, BUT the load times were horrible; each resource's load time doubled.
So I started thinking about other strategies (there are plenty documented); the cache-and-network race sounded interesting, but I was deterred by the extra data usage.
So I tried something different: aggressively caching the resources in the service worker's memory. Whenever my service worker is up and running, it pulls the relevant resources from the cache and saves the Response objects in memory for later use. When it gets a matching fetch, it just responds with a clone of the in-memory Response.
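In rough outline, the in-memory version looks like this (a simplified sketch; the cache name is a placeholder):

```js
// sw.js - simplified sketch of the in-memory approach
const memory = new Map(); // url -> Response, kept in the worker's global scope

// Whenever the worker starts up, pull the cached responses into memory
const warmUp = caches.open('static-v1').then(async (cache) => {
  for (const request of await cache.keys()) {
    memory.set(request.url, await cache.match(request));
  }
});

self.addEventListener('fetch', (event) => {
  event.respondWith(
    warmUp.then(() => {
      const stored = memory.get(event.request.url);
      // Respond with a clone so the stored Response's body is never consumed
      return stored ? stored.clone() : fetch(event.request);
    })
  );
});
```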
This strategy proved to be the fastest in a comparison I made.
So my question is pretty vague, as my understanding of service workers is still pretty vague...
Does this all make sense? Can I keep a cache of static resources in the service worker's memory?
What about the bloated memory usage, are there any negative implications to it? For instance, maybe the browser shuts down service workers with high memory consumption more frequently.

You can't rely on keeping Response objects in memory inside of a service worker and then responding directly with them, for (at least) two reasons:
Service workers have a short lifetime, and everything in the global scope of the service worker is cleared each time the service worker starts up again.
You can only read the body of a Response object once. Responding to a fetch request with a Response object will cause its body to be read. So if you have two requests for the same URL that are both made before the service worker's global scope is cleared, using the Response for the second time will fail. (You can work around this by calling clone() on the Response and using the clone to respond to the fetch event, but then you're introducing additional overhead.)
If you're seeing a significant slowdown in getting your responses from the service worker back to your page, I'd take some time to dig into what your service worker code actually looks like, and also what the code on your client pages looks like. If your client pages have their main JavaScript thread locked up (due to heavyweight JavaScript operations that take a while to complete and never yield, for instance), that could introduce a delay in getting the response from the service worker to the client page.
Sharing some more details about how you've implemented your cache-based service worker would be a good first step.

Related

Why is a service worker even slower than a normal network request on Chrome?

I used a service worker to precache a 15KB page, but why can serving it be even slower than a normal network request? The page served from the service worker cache sometimes takes ~200ms and sometimes ~600ms, which can be even slower than going to the network.
The service worker logic is pretty simple: it does a URL match, then uses fetchEvent.respondWith to return the response.
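Stripped down, the handler is essentially this (the page path here is just a placeholder):

```js
self.addEventListener('fetch', (event) => {
  // Only handle the precached page; everything else falls through to the network
  if (new URL(event.request.url).pathname === '/page.html') {
    event.respondWith(
      caches.match(event.request).then((cached) => cached || fetch(event.request))
    );
  }
});
```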
It seems the problem is not related to the Cache API. I tried to cache the page as in-memory state in the service worker; even in a case where I can guarantee the state is not destroyed, the same-size response served from the service worker can still take 150ms~600ms.
I tested this in Chrome. It seems Safari has comparatively better performance.

Large percentage of requests in CLRThreadPoolQueue

We have an ASP.NET MVC application hosted in an Azure App Service. After running the profiler to help diagnose possible slow requests, we were surprised to see an unusually high % of slow requests in the CLRThreadPoolQueue. We've now run multiple profile sessions, each coming back with between 40-80% in the CLRThreadPoolQueue (something we'd never seen before in previous profiles). CPU each time was below 40%, and after checking our metrics we aren't seeing sudden spikes in requests.
The majority of the requests listed as slow are super simple API calls. We've added response caching and made them async. The only thing they do is hit a database looking for a single-record result. We've checked the metrics on the database and the average query run time is around 50ms or less. Looking at Application Insights for these requests confirms this, and shows that the database query doesn't take place until the very end of the request timeline (I assume this is the request sitting in the queue).
Recently we started including SignalR in a portion of our application. It's not fully in use, but it is in the code base. We have since switched to using Azure SignalR Service and saw no change. The addition of SignalR is the only "major" change/addition we've made since encountering this issue.
I understand we can scale up and/or increase minWorkerThreads. However, this feels like I'm just treating the symptom, not the cause.
Things we've tried:
Finding the most frequent requests and making them async (they weren't before)
Response caching to frequent requests
Using Azure SignalR Service rather than hosting it on the same web app
Running memory dumps and contacting Azure support (they found nothing)
Scaling up to an S3
Profiling with and without thread report
-- None of these steps have resolved our issue --
How can we determine what requests and/or code is causing requests to pile up in the CLRThreadPoolQueue?
We encountered a similar problem; I guess internally SignalR must be using up a lot of threads or some other contended resource.
We did three things that helped a lot:
Call ThreadPool.SetMinThreads(400, 1) on app startup to make sure that the threadpool has enough threads to handle all the incoming requests from the start
Create a second App Service with the same code deployed to it. In the JavaScript, set the SignalR URL to point to that second instance (see the sketch after this list). That way, all the SignalR requests go to one App Service and all the app's HTTP requests go to the other. Obviously this requires a SignalR backplane to be set up, but assuming your App Service has more than one instance you'll have had to do this anyway.
Review the code for any synchronous code paths (e.g. making a non-async call to the database or to an API) and convert them to async code paths
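For the second point, the client-side change is just pointing the connection at the dedicated SignalR App Service. A minimal sketch with the @microsoft/signalr client (the URL and hub path are placeholders; the older jQuery-based ASP.NET SignalR client would set $.connection.hub.url instead):

```js
import * as signalR from "@microsoft/signalr";

// Point SignalR at the second App Service so hub traffic never competes
// with the main app's HTTP requests (URL and hub path are placeholders).
const connection = new signalR.HubConnectionBuilder()
  .withUrl("https://my-signalr-host.azurewebsites.net/notificationHub")
  .withAutomaticReconnect()
  .build();

connection.start().catch((err) => console.error("SignalR connection failed", err));
```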

Why is a network-first strategy slower than no service worker?

While benchmarking the performance of a service worker with Workbox, we found an interesting phenomenon.
When the service worker is applied, Workbox's network-first strategy takes about 30ms longer than networking with no service worker. We then tried to skip Workbox and implement a network-first strategy manually; it is about 20ms slower.
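Our hand-rolled network-first handler is essentially this (simplified; the runtime cache name is arbitrary):

```js
self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') return;
  event.respondWith(
    // Try the network first and keep a copy in the cache; fall back to the cache on failure
    fetch(event.request)
      .then((response) => {
        const copy = response.clone();
        caches.open('runtime').then((cache) => cache.put(event.request, copy));
        return response;
      })
      .catch(() => caches.match(event.request))
  );
});
```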
My guess is that, once a service worker kicks in, every request has to be handled by JavaScript code, and it is the execution of that JavaScript that makes the networking slower.
Then I checked the cache-first strategy; it turns out that fetching content from Cache Storage is slower than fetching content from the disk cache (HTTP cache) without a service worker.
So, in my understanding, even though a service worker offers us more control over caching, it is not guaranteed to be faster at caching, right?
There is a cost associated with starting up a service worker that was not previously running. This could be on the order of tens of milliseconds, depending on the device. Once that service worker starts up, if it doesn't handle your navigation requests (which are almost certainly the first request that a service worker would receive) by going against the cache, then it's likely you'll end up with worse performance than if there were no service worker present at all.
If you are going against the cache, then having a service worker present should offer roughly the same performance vs. looking things up in the HTTP browser cache once it's actually running, but there is the same startup cost that needs to be taken into account first.
The real performance benefits of using a service worker come from handling navigation requests for HTML in a cache-first manner, which is not something you could traditionally do with HTTP caching.
You can read more about these tradeoffs and best practices in
"High-performance service worker loading".

Scaling Dynos with Heroku

I've currently got a Ruby on Rails app hosted on Heroku that I'm monitoring with New Relic. My app is somewhat laggy when using it, and my New Relic monitoring shows that the majority of the time is spent in Request Queuing.
Given that, does this mean my app would scale better if I used extra worker dynos? Or is this something that I can fix by optimizing my code? Sorry if this is a silly question, but I'm a complete newbie, and I appreciate all the help. Thanks!
== EDIT ==
Just wanted to make sure I was crystal clear on this before having to shell out additional moolah. New Relic also gave me statistics on the browser side, and that graph shows that the majority of the time spent by the user is waiting for the web application. Can I attribute this to the fact that my app is spending the majority of its time in a request queue? In other words, is the 1.3 second response time that the end user is experiencing something that code optimization alone will do little to cut down? (Basically I'm asking if I have to spend money or not.) Thanks!
Request Queueing basically means 'waiting for a web instance to be available to process a request'.
So the easiest and fastest way to gain some speed in response time would be to increase the number of web instances to allow your app to process more requests faster.
It might be possible to optimize your code to speed up each individual request to the point where your application can process more requests per minute -- which would pull requests off the queue faster and reduce the overall request queueing problem.
In time, it would still be a good idea to do everything you can to optimize the code anyway. But to begin with, add more workers and your request queueing issue will more than likely be reduced or disappear.
edit
With your additional information, in general I believe the story is still the same -- though nice work in getting to a deep understanding prior to spending the money.
When you have request queuing it's because requests are waiting for web instances to become available to service their request. Adding more web instances directly impacts this by making more instances available.
It's possible that you could optimize the app so well that you significantly reduce the time to process each request. If this happened, then it would reduce request queueing as well by making requests wait a shorter period of time to be serviced.
I'd recommend giving users more web instances for now to immediately address the queueing problem, then working on optimizing the code as much as you can (assuming it's your biggest priority). And regardless of how fast you get your app to respond, if your users grow you'll need to implement more web instances to keep up -- which by the way is a good problem since your users are growing too.
Best of luck!
I just want to throw this in, even though this particular question seems answered. I found this blog post from New Relic and the guys over at Engine Yard: Blog Post.
The tl;dr here is that Request Queuing in New Relic is not necessarily requests actually lining up in the queue and not being able to get processed. Due to how New Relic calculates this metric, it essentially reads a time stamp set in a header by nginx and subtracts it from Time.now when the New Relic method gets a hold of it. However, New Relic gets run after any of your code's before_filter hooks get called. So, if you have a bunch of computationally intensive or database intensive code being run in these before_filters, it's possible that what you're seeing is actually request latency, not queuing.
You can actually examine the queue to see what's in there. If you're using Passenger, this is really easy -- just type passenger status on the command line. This will show you a ton of information about each of your Passenger workers, including how many requests are sitting in the queue. If you precede it with watch, the command will execute every 2 seconds so you can see how the queue changes over time (so just execute watch passenger status).
For Unicorn servers, it's a little bit more difficult, but there's a Ruby script you can run, available here. This script examines how many requests are sitting in the Unicorn socket, waiting to be picked up by workers. Because it's examining the socket itself, you shouldn't run it any more frequently than every ~3 seconds or so. The example on GitHub uses 10.
If you see a high number of queued requests, then adding horizontal scaling (via more web workers on Heroku) is probably an appropriate measure. If, however, the queue is low, yet New Relic reports high request queuing, what you're actually seeing is request latency, and you should examine your before_filters, and either scope them to only those methods that absolutely need them, or work on optimizing the code those filters are executing.
I hope this helps anyone coming to this thread in the future!

Deferring blocking Rails requests

I found a question that explains how Play Framework's await() mechanism works in 1.2. Essentially if you need to do something that will block for a measurable amount of time (e.g. make a slow external http request), you can suspend your request and free up that worker to work on a different request while it blocks. I am guessing once your blocking operation is finished, your request gets rescheduled for continued processing. This is different than scheduling the work on a background processor and then having the browser poll for completion, I want to block the browser but not the worker process.
Regardless of whether or not my assumptions about Play are true to the letter, is there a technique for doing this in a Rails application? I guess one could consider this a form of long polling, but I didn't find much advice on that subject other than "use node".
I had a similar question about long requests that block workers from taking other requests. It's a problem with all web applications. Even Node.js may not be able to solve the problem of a worker consuming too much time, or it could simply run out of memory.
A web application I worked on has a web interface that sends requests to a Rails REST API; the Rails controller then has to call a Node REST API that runs a heavy, time-consuming task to get some data back. A request from Rails to Node.js could take 2-3 minutes.
We are still trying out different approaches, but maybe the following could work for you, or you can adapt some of the ideas; I would love to get some feedback too:
The frontend makes a request to the Rails API with a generated identifier [A] within the same session (this identifier helps identify previous requests from the same user session).
The Rails API proxies the frontend request and the identifier [A] to the Node.js service.
The Node.js service adds this job to a queue system (e.g. RabbitMQ or Redis); the message contains the identifier [A]. (Here you should think through your own scenario; this also assumes a worker will consume the queued job and save the results.)
If the same request is sent again, depending on the requirement, you can either kill the current job with the same identifier [A] and schedule/queue the latest request, ignore the latest request while waiting for the first one to complete, or make whatever other decision fits your business requirement.
The frontend can send REST requests at an interval to check whether the data processing for identifier [A] has completed or not; these requests are lightweight and fast (see the sketch after these steps).
Once Node.js completes the job, you can either use a message subscription system or wait for the next status-check request and return the result to the frontend.
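For the status-check step, the frontend side can be a small polling loop, roughly like this (the endpoint path, response shape, and interval are just for illustration):

```js
// Poll the Rails API until the job identified by [A] has completed.
async function waitForResult(identifier, intervalMs = 5000) {
  for (;;) {
    const res = await fetch(`/api/jobs/${identifier}/status`);
    const { completed, result } = await res.json();
    if (completed) return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: kick off the long-running request first, then
// waitForResult(identifierA).then(showData);
```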
You can also use a load balancer, e.g. an Amazon load balancer or HAProxy. 37signals has a blog post and video about using HAProxy to offload some long-running requests so that they do not block shorter ones.
GitHub uses a similar strategy to handle long requests for generating the commits/contribution visualisation. They also set a limit on polling time; if it takes too long, GitHub displays a message saying it's too long and has been cancelled.
YouTube has a nice message for longer queued tasks: "This is taking longer than expected. Your video has been queued and will be processed as soon as possible."
I think this is just one solution. You can also take a look at the EventMachine gem, which helps to improve performance and handle parallel or async requests.
Since this kind of problem may involve one or more services, think about the possibility of improving performance between those services (e.g. database, network, message protocol, etc.). If caching may help, try caching frequent requests or pre-calculating results.
