We noticed an interesting issue with the server-side rendering of our Next.js application.
The issue appeared when we added fetching of our translation key values (which are fetched via an API) to our server-side calls.
We made this decision to reduce CLS and give the user a better overall experience on first load. We didn't think much of it, since it's just another API call that returns JSON before the page is rendered.
Below you see a graph of our CPU usage on AWS. Between 10:00 (10AM) and 14:00 (2PM), the fetching of the translation keys was live on production, and we had to manually restart the servers every 2 hours for them to survive (that is the peak you are seeing). After 16:00 (4PM) we removed this API call from that specific server-side call, and you can see that the server is stable. From 20:00 (8PM) we enabled an automatic restart to be sure that the servers would survive the night, but as you can see this was not necessary.
The API call in question just returns a JSON object. It can contain up to 700 lines of values, which should be fine, as our product listing page can have larger responses (up to 10k lines). Caching is enabled for everything, both next/static and the API responses. We also thought it might have something to do with outgoing connections not closing in time. I am making this post because no one here really knows why this is an issue.
We are running the following setup:
Dockerized Application
Running on AWS Beanstalk
Cloudfront + Akamai
NextJs v12.2
Node v14.16
If anyone has the smallest idea in which direction to look, please let me know. We appreciate it.
Related
We have an ASP.NET MVC application hosted in an azure app-service. After running the profiler to help diagnose possible slow requests, we were surprised to see this:
An unusually high percentage of slow requests is in the CLRThreadPoolQueue. We've now run multiple profile sessions, and each comes back with between 40% and 80% in the CLRThreadPoolQueue (something we'd never seen in previous profiles). CPU was below 40% each time, and after checking our metrics we aren't getting sudden spikes in requests.
The majority of the requests listed as slow are super simple API calls. We've added response caching and made them async. The only thing they do is hit a database looking for a single record. We've checked the metrics on the database, and the average query run time is around 50ms or less. Looking at Application Insights for these requests confirms this, and shows that the database query doesn't take place until the very end of the request timeline (I assume the rest is the request sitting in the queue).
Recently we started including SignalR in a portion of our application. It's not fully in use, but it is in the code base. We have since switched to using Azure SignalR Service and saw no changes. The addition of SignalR is the only "major" change/addition we made before encountering this issue.
I understand we can scale up and/or increase the minWorkerThreads. However, this feels like I'm just treating the symptom not the cause.
Things we've tried:
Finding the most frequent requests and making them async (they weren't before)
Response caching to frequent requests
Using Azure SignalR Service rather than hosting it on the same web app
Running memory dumps and contacting Azure support (they found nothing).
Scaling up to an S3
Profiling with and without thread report
-- None of these steps have resolved our issue --
How can we determine what requests and/or code is causing requests to pile up in the CLRThreadPoolQueue?
We encountered a similar problem. I guess that internally SignalR must be using up a lot of threads or some other contended resource.
We did three things that helped a lot:
Call ThreadPool.SetMinThreads(400, 1) on app startup to make sure that the threadpool has enough threads to handle all the incoming requests from the start
Create a second App Service with the same code deployed to it. In the JavaScript, set the SignalR URL to point to that second instance. That way, all the SignalR requests go to one app service, and all the app's HTTP requests go to the other. Obviously this requires a SignalR backplane to be set up, but assuming your app service has more than 1 instance you'll have had to do this anyway.
Review the code for any synchronous code paths (e.g. making a non-async call to the database or to an API) and convert them to async code paths.
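To illustrate why the last point matters, here is a minimal sketch in Python (not the ASP.NET code from the question). It assumes a fixed pool of worker threads and simulated I/O that just sleeps; the function names, pool size and timings are made up for the example. Blocking calls hold a thread for the whole wait, so requests queue up behind the pool, while awaited calls hand the thread back.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_request(io_seconds: float) -> None:
    # A synchronous call holds its worker thread for the whole wait.
    time.sleep(io_seconds)


def run_blocking(n_requests: int, pool_size: int, io_seconds: float) -> float:
    """Serve n_requests on a fixed pool of threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # list() waits until every simulated request has finished.
        list(pool.map(blocking_request, [io_seconds] * n_requests))
    return time.perf_counter() - start


async def async_request(io_seconds: float) -> None:
    # An awaited call gives the thread back while the I/O is pending.
    await asyncio.sleep(io_seconds)


def run_async(n_requests: int, io_seconds: float) -> float:
    """Serve n_requests concurrently on one thread; return elapsed seconds."""

    async def serve_all() -> None:
        await asyncio.gather(*(async_request(io_seconds) for _ in range(n_requests)))

    start = time.perf_counter()
    asyncio.run(serve_all())
    return time.perf_counter() - start


if __name__ == "__main__":
    # 20 requests, 4 threads, 50 ms of simulated I/O each.
    print(f"blocking: {run_blocking(20, 4, 0.05):.2f}s")
    print(f"async:    {run_async(20, 0.05):.2f}s")
```

With 20 requests and 4 threads the blocking version needs at least five rounds of waiting, whereas the async version finishes in roughly one. Raising the minimum thread count only widens the pool; removing the blocking calls removes the queueing itself.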
I am working on a Rails app that pulls up to 100 Instagram posts at once with the media/search endpoint and displays them on a page. The AJAX call that loads the photos takes a very long time on localhost, but once deployed to Heroku, takes much less time (10s versus 1s). Can anyone explain why Heroku is faster? I might not need to worry as much about caching my results.
Thanks!!
One major reason will be Heroku's physical hosting location -- I believe Instagram hosts with Amazon's AWS service (this may have changed after the Facebook acquisition):
Here at Instagram, we run our infrastructure on Amazon Web Services,
running instances on their Elastic Compute Cloud (EC2)
Heroku basically hosts through Amazon's cloud too, meaning they are ostensibly running on the same network. This will obviously cut latency down to a minimum. On top of that, Heroku's services are optimized for efficiency -- high-speed Internet etc.
Cache
Your question is really "should I be creating a cache for Instagram data in my system?"
The answer is "yes" - in my experience you should never rely entirely on a third party, as apart from the obvious latency issues, you'll also have to contend with a multitude of other problems (API outages, client bandwidth etc).
I'd personally look at storing as much data as possible in my own system. This doesn't mean keeping it all in your main DB - you could use a Redis instance to store the third-party data you need.
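As a rough sketch of that idea, here is a small time-based cache in Python. An in-memory dict stands in for Redis, and the class name `TTLCache`, the key and the 60-second lifetime are all assumptions made for the example, not anything from the Instagram API or the original app.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep fetched values for ttl_seconds before asking the third party again."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no third-party call
        value = fetch()  # missing or expired: call the third party once
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def fetch_posts() -> list:
        # Hypothetical stand-in for the slow media/search request.
        time.sleep(0.1)
        return ["post-1", "post-2"]

    cache = TTLCache(ttl_seconds=60)
    cache.get_or_fetch("media/search", fetch_posts)  # slow, hits the API
    cache.get_or_fetch("media/search", fetch_posts)  # fast, served from cache
```

With a real Redis instance the same shape applies: store the value with an expiry and only call the third party on a miss.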
We had an interesting event the other day in our system, where a burst of month-old HTTP requests arrived at our ELB and from there at one of our coupled servers. We could tell that the requests were old by a timestamp we are sending from our client app (and from the fact that they had no relevant data :) ).
Our system is hosted on AWS, using a group of EC2 instances behind an ELB which communicates in HTTP with the EC2's. Also, our client app runs on iOS.
One thing to note: the old requests were dated to a day on which we had a server crash, which led to a heavy load on our remaining servers (resulting in a lot of hung HTTP requests, i.e. they were not processed).
Also, although the group of old messages originally spanned several minutes (which we know from the timestamps), they all came in a single bulk the other day (this is from the ELB metrics).
We are trying to figure out how or where these requests could have stacked up, and maybe understand why it happened when it did.
Any insights, similar experiences or suggestions will be appreciated as we've failed to find similar events on the web, thanks!
I am currently using an AWS micro instance as a web server for a website that allows users to upload photos. Two questions:
1) When looking at my CloudWatch metrics, I have recently noticed CPU spikes. The website receives very little traffic at the moment, but becomes utterly unusable during these spikes. These spikes can last several hours, and resetting the server does not eliminate them.
2) Although seemingly unrelated, whenever I post a link to my website on Twitter, the server crashes (i.e., Error Establishing a Database Connection). After restarting Apache and MySQL, the website returns to normal functionality.
My only guess would be that the issue is somehow the result of deficiencies of the micro instance. Unfortunately, when I upgraded to the small instance, the site was actually slower, due to the fact that micro instances can have two EC2 compute units.
Any suggestions?
If you want to stay in the free tier of AWS (micro instance), you should offload as much as possible from your EC2 instance.
I would suggest that you upload the images directly to S3 instead of going through your web server (see an example of this here: http://aws.amazon.com/articles/1434).
S3 can also be used to serve most of your static files (images, js, css...), instead of your weak web server. You can also add these files in S3 as the origin of an Amazon CloudFront (CDN) distribution to improve your application's performance.
Another service that can help you offload the work is SQS (Simple Queue Service). Instead of handling every user request online, you can send some requests (upload done, for example) as a message to SQS and have your reader process these messages at its own pace. This is a good way to handle momentary load caused by several users working with your service simultaneously.
Another service is DynamoDB (managed NoSQL DB service). You can move most of your current MySQL data and queries to DynamoDB. Amazon DynamoDB also has a free tier that you can enjoy.
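The queueing pattern described above can be sketched in a few lines of Python. This is only an illustration of the shape: the standard library's in-process `queue.Queue` stands in for SQS, and the message fields and function names are invented for the example.

```python
import queue
import threading
from typing import Callable

_STOP = object()  # sentinel that tells the worker to shut down


def start_worker(jobs: queue.Queue,
                 handle: Callable[[dict], None]) -> threading.Thread:
    """Start a background reader that processes messages at its own pace."""

    def loop() -> None:
        while True:
            message = jobs.get()
            try:
                if message is _STOP:
                    return
                handle(message)
            finally:
                jobs.task_done()  # lets jobs.join() know this one is finished

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def stop_worker(jobs: queue.Queue, worker: threading.Thread) -> None:
    jobs.put(_STOP)
    worker.join()


if __name__ == "__main__":
    jobs: queue.Queue = queue.Queue()
    processed: list = []
    worker = start_worker(jobs, processed.append)

    # The web request only enqueues a message and returns immediately.
    for photo_id in range(3):
        jobs.put({"event": "upload_done", "photo_id": photo_id})

    jobs.join()  # wait until the reader has caught up
    stop_worker(jobs, worker)
    print(processed)
```

The point is that a burst of uploads only makes the queue longer; the web server itself does no extra work per request.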
With the combination of the above, you can have your micro instance handling the few remaining dynamic pages until you need to scale your service with your growing success.
Wait… I'm sorry. Did you say you were running both Apache and MySQL Server on a micro instance?
First of all, that's never a good idea. Secondly, as documented, micros have low I/O and can only burst to 2 ECUs.
If you want to continue using a resource-constrained micro instance, you need to (a) put MySQL somewhere else, and (b) use something like Nginx instead of Apache as it requires far fewer resources to run. Otherwise, you should seriously consider sizing up to something larger.
I had the same issue: As far as I understand the problem is that AWS will slow you down when you reach a predefined usage. This means that they allow for a small burst but after that things will become horribly slow.
You can test that by logging in and doing something. If you use the CPU for a couple of seconds then the whole box will become extremely slow. After that you'll have to wait without doing anything at all to get things back to "normal".
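The behaviour I am describing can be pictured with a toy model in Python. To be clear, this is not how AWS actually schedules micro instances; the credit counts, refill rate and throttled speed below are invented purely to show the "short burst, then slow, then recover after idling" pattern.

```python
from typing import List


def simulate(busy: List[bool], burst_credits: float,
             refill_per_tick: float, throttled_speed: float) -> List[float]:
    """Return the CPU speed (1.0 = full) the box gets in each tick."""
    credits = burst_credits
    speeds: List[float] = []
    for is_busy in busy:
        if not is_busy:
            speeds.append(0.0)  # idle: nothing runs, credits just refill
        elif credits >= 1.0:
            credits -= 1.0  # spend one credit to run at full speed
            speeds.append(1.0)
        else:
            speeds.append(throttled_speed)  # burst used up: throttled
        # Credits trickle back every tick, capped at the burst size.
        credits = min(burst_credits, credits + refill_per_tick)
    return speeds


if __name__ == "__main__":
    # Busy for three ticks, idle for four, then busy again.
    pattern = [True, True, True, False, False, False, False, True]
    print(simulate(pattern, burst_credits=2.0,
                   refill_per_tick=0.25, throttled_speed=0.1))
    # prints [1.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 1.0]
```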
That was the main reason I went for VPS instead of AWS.
Hi
I've been working on a medium-sized MVC project. It works fine on localhost at a good speed. Each page retrieves a lot of server-side data. I use a lot of jQuery to minimize the traffic to the server, but even then, the webpage loads very slowly. There are many events on which I retrieve JSON results, to get a specific number from the database and make calculations. This data takes a long time to arrive on the webpage, although on localhost it is shown immediately. Also, when I submit pages, it takes an awfully long time to submit. I've published my project to GoDaddy's server, and my database is there too. What could be the problem that is making the project that slow? How can I minimize it? And why does it only happen when the website is online and not on localhost too?
As such, the issue can be anywhere, and the only certain way to know is to instrument the code. I suggest that you add simple logging traces with date-time stamps to your server code (note that the logging should be configurable; any logging framework, including System.Diagnostics.Trace, should support that) and check where the time is spent. For example, database trips can be expensive.
If you don't find the culprit in the server-side code, i.e. the server is serving the request in a reasonable time, then you have to look at the performance over the network. Tools such as Fiddler (or Firefox) should help you here. Sometimes issuing too many requests from the browser is also problematic, because the browser may make only n concurrent requests, or the server may have been configured to accept only n requests from a particular client. Either can serialize the requests and increase the total response time. These scenarios are difficult to catch on localhost because network latency is almost zero there.
You may also use a tool such as YSlow for related performance improvement suggestions. But please do your investigation first, find the bottlenecks, and then ask for solutions to specific problems.
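As a minimal sketch of the kind of timestamped tracing I mean, here it is in Python rather than .NET. The `timed` helper, the labels and the sleeps standing in for real work are all hypothetical; the same idea maps onto whatever logging framework you use.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# %(asctime)s puts a date-time stamp on every trace line.
logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)
log = logging.getLogger("timing")

timings: Dict[str, float] = {}  # last measured duration per label, in seconds


@contextmanager
def timed(label: str) -> Iterator[None]:
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = elapsed
        log.info("%s took %.1f ms", label, elapsed * 1000)


def handle_request() -> None:
    with timed("database query"):
        time.sleep(0.02)  # stand-in for the real database trip
    with timed("render"):
        time.sleep(0.005)  # stand-in for building the response


if __name__ == "__main__":
    handle_request()
```

If every step on the server turns out to be fast, the time is being lost between the browser and the server, which is where the network tools come in.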
Run it in Chrome. Turn on the developer tools and expand the Console. Watch for errors. From there you can also monitor the network calls to see which one is slow.
If your MVC project uses Entity Framework (which is based on LINQ), it will surely be slow, because LINQ is slow compared to the old ADO.NET.