I want to be able to say that if requests from the same user (to an API) start arriving quickly enough that their requests per minute reach a certain level, I want to start denying the requests until the rate slows down (just like the guys at Zendesk did).
The question is twofold: what's an efficient way of calculating the request rate (minimal DB reads/writes), and where in the MVC hierarchy (an action filter? a controller method override?) would this code best reside?
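To make the second half of the question concrete, the sketch below is the action-filter option I have in mind: an in-memory cache keyed by user and minute, so the database is never touched on the hot path (the attribute name and the limit are made up for illustration).

// Hedged sketch: an ASP.NET MVC action filter using MemoryCache as a per-user, per-minute counter.
using System;
using System.Runtime.Caching;
using System.Web.Mvc;

public class ThrottleAttribute : ActionFilterAttribute
{
    public int RequestsPerMinute { get; set; }

    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        // Identify the caller however the API identifies users (API key, user id, IP...).
        var user = filterContext.HttpContext.Request.UserHostAddress;
        var key = "throttle:" + user + ":" + DateTime.UtcNow.ToString("yyyyMMddHHmm");

        var count = (int?)MemoryCache.Default.Get(key) ?? 0;
        if (count >= RequestsPerMinute)
        {
            // 429 Too Many Requests; the counter entry expires on its own after a minute.
            filterContext.Result = new HttpStatusCodeResult(429, "Rate limit exceeded");
            return;
        }

        MemoryCache.Default.Set(key, count + 1, DateTimeOffset.UtcNow.AddMinutes(1));
    }
}

Usage would then be [Throttle(RequestsPerMinute = 60)] on an action or controller. The read-then-set is not atomic, so under heavy concurrency the limit is approximate, which is usually fine for throttling.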
Two words: Reactive Framework.
It has all sorts of syntactic sugar that makes throttling and managing events much less of a headache, and I would bet it will also trickle back through and kill some complexity downstream.
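As a rough illustration (this assumes the System.Reactive package; the class below and its names are made up), a per-user request stream can be windowed and counted with no database involvement at all:

// Sketch: push each request's user id into a stream, group by user,
// and count each user's requests over a one-minute buffer.
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public class RequestRateMonitor
{
    private readonly Subject<string> _requests = new Subject<string>();

    public void Record(string userId)
    {
        _requests.OnNext(userId);
    }

    // Emits (userId, count) once per minute for each user that has sent requests.
    public IObservable<Tuple<string, int>> RatesPerMinute()
    {
        return _requests
            .GroupBy(userId => userId)
            .SelectMany(g => g
                .Buffer(TimeSpan.FromMinutes(1))
                .Select(window => Tuple.Create(g.Key, window.Count)));
    }
}

Subscribing to RatesPerMinute() and flagging users whose count crosses your threshold is then a one-liner, which is the kind of downstream complexity reduction I mean.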
I am building a web crawler. It grabs a main page, which has about 50 links. Then I need to do 2 requests for each link. I don't want to make 101 requests within a few seconds. What is the best way to randomly space these out? I want to somewhat mimic human activity.
urls.each do |url|
  # Do something with the url (e.g. the two requests for this link).
  # Then pause for a randomly chosen 1-5 seconds before moving on to the next link.
  sleep([1, 2, 3, 4, 5].sample)
end
This will work, but is this the best way to do it? Let's say this takes about 1 hour to run; will the server become slow to respond while it's sleeping?
Mimicking human behavior is very hard. It's impossible if you're requesting literally every single link on a website anyway. I would probably just request one page per second. The server will not be impacted at all by a thread that's currently sleeping.
I'm having some difficulty thinking of a good way to assess the scalability of my school project. The assignment is, simply put, a Twitter-like service. Very bare bones. The main goal is to make it as scalable as possible. However, it's equally important to know how and why the assignment would scale. So the point is not really to create a very scalable project, but mainly to create a few different architectures for the server and see which one outperforms the others and why.
So far I have the following basic server architecture:
* 1 main process which holds the data
* 1 unique process server-side per user
A user sends messages to his own server-side process, which then simply delegates those messages to the central process.
To test this I would spawn 1 or more processes which would act as clients. I would spam the server with tweets and then assess how well it can withstand a certain load.
Now, to assess the scalability I came up with the following metrics:
First off, a process is a bottleneck if its message queue is piling up. So I would store the queue length every time a tweet is processed (or every N tweets) and calculate the average at the end. If I run it on more cores and the average queue length goes down, it scales better.
Second, if I create N users to spam the server (on N processes or fewer), I simply time how long it takes for the server to process all these tweets.
Is there a better way to do this? I can't stop thinking that there should be better metrics.
Edit:
So far I have tried fprof and eprof. These tools, however, show me how much time is spent in certain functions. While this is a good indicator of where I can improve my code, it's not really a good indicator of scalability. It would be better if it showed, for example, the time spent per process.
Look at percept and percept2 if you are really interested in this.
I was thinking about setting up a project with Web API: basically, build the API first and program the web site against this API.
Although it sounds promising, I was wondering:
If I separate the logic in a nice way, I might end up retrieving data for a web page through multiple API calls, which in turn means multiple connections to the server, with all the overhead that entails.
For example, if I use, let's say, 8 different API calls on one page, I can't imagine it won't have an impact on the page's performance.
So, have I misunderstood something? Is this kind of overhead negligible, or does the need for multiple calls indicate that the design is wrong?
Thanks in advance.
Well, we did it: a Web API server providing REST access to all the data, with independent UI clients consuming it as the only access point to the underlying persistence.
The first request takes some time; it is significantly longer. It must initialize all the UI client stuff and fetch the minimum data needed from the server (menu, user, access rights, metadata, list-view data).
The point, the real advantage, is hidden in the second, the third... request. A lot of the data is already there on the UI client, and even when something is requested again, caching (server, client, or both) can be introduced.
So this means more requests (at least during UI client start-up)... but it does not imply a slower application.
The maintenance benefit is hidden (maybe it is not hidden; it should be obvious) in the separation of concerns. On the server, we are no longer debating where to place the user-data handling: the base controller or a child controller, the master page or the layout controller...
Solved. We take care of single, specific pieces of functionality, published via REST. One method, one business operation. And that's the dream if we want to keep the application alive and be the ones repairing and extending it.
One aspect is that you can display the page to the end user very fast. Once the page is loaded, use jQuery async calls and any JavaScript templating tool (like AngularJS or Mustache.js) to call the Web API simultaneously and build the client-side page views.
I have used this approach in multiple projects and the user experience is tremendous.
Most modern browsers support 6-8 parallel connections to the same site, so you do have to be careful about that. Unless you are connecting to that many separate systems, I would try to reduce the number of connections, or ensure the calls are made asynchronously by different events to reduce the chance of parallel connections.
Making a series of HTTP calls to obtain data for your page will have an overhead. Only testing will tell you how much that matters in your scenario.
There is little point in using Web API just because you can. You should have a legitimate reason for building a RESTful API. Even then, if it is primarily for your own consumption, design it to deliver a ViewModel for each page in one call.
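On that last point, here is a minimal sketch of the one-ViewModel-per-page shape (assuming ASP.NET Web API; every name below is hypothetical, and the data is hard-coded where a real application would call its services):

// One endpoint returns everything the page needs, so the client makes a single call
// instead of one request per widget.
using System.Collections.Generic;
using System.Web.Http;

public class HomePageViewModel
{
    public string UserName { get; set; }
    public IList<string> MenuItems { get; set; }
    public IList<string> RecentArticleTitles { get; set; }
}

public class HomePageController : ApiController
{
    // GET api/homepage
    public HomePageViewModel Get()
    {
        return new HomePageViewModel
        {
            UserName = "example",
            MenuItems = new List<string> { "Home", "Articles", "About" },
            RecentArticleTitles = new List<string> { "First post", "Second post" }
        };
    }
}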
I need to implement a draft application for a fantasy sports website. Each user will have 1 minute 30 seconds to choose a player for his team, and if that time elapses a player will be selected automatically. Our planned implementation uses Juggernaut to push the turn changes to each user participating in the draft, but I'm still not sure how to handle latency.
The main issue is that if a user has higher latency than the others, he will receive the turn changes a little later and his timer won't be synchronized. Say someone receives a turn change after having chosen a player himself, while on his side he thinks he still has 2 seconds left; how can we handle that case? Is it better to try to measure each user's latency and adjust the client-side timer to minimize the issue? If so, how could we implement that?
This is a tricky issue, but there are some good solutions out there. Look into what time.gov does, and how it does it; essentially, as I understand it, they use Java to perform multiple repeated requests to the server, to attempt to get an idea of the latency involved in the communication, then they generate a measure of latency that they use to skew the returned time data. You could use the same process for your application, with even more accuracy; keeping track of what the latency is and how it varies over time lets you make some statistical inferences about how reliable your latency numbers are, etc. It can be a bit complex, but it can definitely allow you to smooth out your performance. My understanding is that this is what most MMOs do as well, to manage lag.
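As a rough sketch of that idea (the class and the getServerUtcNow callback are hypothetical; this is the classic half-round-trip estimate rather than literally what time.gov does):

// Sample the server clock a few times, estimate one-way latency as half the round trip,
// and derive an offset that lets the client translate server deadlines into its own clock.
using System;
using System.Collections.Generic;
using System.Linq;

public class ClockSync
{
    private readonly List<TimeSpan> _offsets = new List<TimeSpan>();

    public void AddSample(Func<DateTime> getServerUtcNow)
    {
        var before = DateTime.UtcNow;
        var serverTime = getServerUtcNow();   // e.g. a call that returns the server's current time
        var after = DateTime.UtcNow;

        var oneWay = TimeSpan.FromTicks((after - before).Ticks / 2);
        // Estimated difference between the server clock and this client's clock.
        _offsets.Add(serverTime + oneWay - after);
    }

    // The median is more robust than the mean against the occasional slow request.
    public TimeSpan EstimatedOffset()
    {
        var sorted = _offsets.OrderBy(o => o).ToList();
        return sorted[sorted.Count / 2];
    }

    // A deadline expressed in server time, translated to this client's local clock.
    public DateTime ToLocalDeadline(DateTime serverDeadlineUtc)
    {
        return serverDeadlineUtc - EstimatedOffset();
    }
}

Calling AddSample a handful of times when the client connects (and occasionally afterwards) lets each client start its 1:30 countdown from a server-supplied deadline translated into its own clock, regardless of individual latency.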
I'm going to create a view counter for articles. I have some questions:
1. Should I ignore the article's author when he opens the article?
2. I don't want to update the database each time. I can store in a Dictionary<int, int> (articleId, viewCount) how many times each article was viewed. After 100 hits I can update the database.
3. I should only count the hit once per hour for each user and article (if the user opens one article many times during one hour, the view count should be incremented only once).
For each question I want to know your suggestions on how to do it right.
I'm especially interested in how to do #3. Should I store the time when the user opened the article in a cookie? Would that mean creating a new cookie for each page?
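For illustration, here is one way #3 could look using a single cookie that holds per-article timestamps, so there is no need for a separate cookie per page (an ASP.NET MVC sketch; all names are made up):

// Returns true if this user has not viewed the given article in the last hour,
// and records the view time in one "articleViews" cookie ("id:ticks" pairs).
using System;
using System.Linq;
using System.Web;

public static class ViewThrottle
{
    private const string CookieName = "articleViews";

    public static bool ShouldCount(HttpRequestBase request, HttpResponseBase response, int articleId)
    {
        var cookie = request.Cookies[CookieName] ?? new HttpCookie(CookieName);
        var entries = (cookie.Value ?? "")
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(pair => pair.Split(':'))
            .ToDictionary(p => int.Parse(p[0]), p => new DateTime(long.Parse(p[1]), DateTimeKind.Utc));

        DateTime lastView;
        bool count = !entries.TryGetValue(articleId, out lastView)
                     || DateTime.UtcNow - lastView >= TimeSpan.FromHours(1);

        if (count)
        {
            entries[articleId] = DateTime.UtcNow;
            cookie.Value = string.Join(",", entries.Select(e => e.Key + ":" + e.Value.Ticks));
            cookie.Expires = DateTime.UtcNow.AddDays(30);
            response.Cookies.Add(cookie);
        }
        return count;
    }
}

The trade-off is that cookies can be cleared or rejected by the user, so counts based on them are approximate; for exact numbers you would need server-side storage keyed by user.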
I think I know the answer - they are analyzing the IIS log as Ope suggested.
The hidden image's src is set to
http://stackoverflow.com/posts/3590653/ivc/[Random code]
[Random code] is needed because many people may share the same IP (in a network, for example) and the code is used to distinguish users.
1. Sure - I think that is a good idea.
2 and 3 are related: the issue is where you would actually store this dictionary and logic.
Application or session scope in ASP.NET is of course the easiest choice, but there you really need to understand the logic of application pools. ASP.NET applications are recycled from time to time: when there has been no activity on the site for a certain period, or in special situations - e.g. if the process starts to take too much memory, the application is shut down and a new one is started on the next request. There are events for session and application shutdown, but at least some years ago they were not really reliable: in many special cases they did not always fire. Perhaps they are better now, but it is painful to test. And 1 hour is really a long time: usually sessions are kept alive only about 20 minutes after the last request.
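If you accept those caveats, a minimal sketch of the application-scoped buffering from question 2 could look like this (all names are made up; anything still in memory when the pool recycles is simply lost):

// Buffer view counts in memory and only write to the database once 100 hits have accumulated.
using System.Collections.Generic;
using System.Linq;

public static class ViewCountBuffer
{
    private const int FlushThreshold = 100;
    private static readonly Dictionary<int, int> Counts = new Dictionary<int, int>();
    private static readonly object Gate = new object();

    public static void Hit(int articleId)
    {
        Dictionary<int, int> toFlush = null;
        lock (Gate)
        {
            int current;
            Counts.TryGetValue(articleId, out current);
            Counts[articleId] = current + 1;

            if (Counts.Values.Sum() >= FlushThreshold)
            {
                toFlush = new Dictionary<int, int>(Counts);
                Counts.Clear();
            }
        }

        if (toFlush != null)
        {
            // One database round trip per 100 hits, done outside the lock.
            ArticleRepository.IncrementViewCounts(toFlush);
        }
    }
}

// Hypothetical stand-in for your real data access.
public static class ArticleRepository
{
    public static void IncrementViewCounts(Dictionary<int, int> counts)
    {
        /* UPDATE Articles SET ViewCount = ViewCount + @n WHERE Id = @id, per entry */
    }
}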
A reliable way would be to have a separate Windows service (a lot of work to program) or to always store every view to the database and analyse double views there (quite a lot of overhead for such a small feature).
Do you have access to IIS logs? How about analyzing the IIS logs, e.g. every 30 minutes, with some kind of timer process and taking the count from there? Or just store all the hits in the database with user information and calculate the unique hits with a similar timed process.
One final question: are you really sure that none of the thousands of counter applications/services on the Internet would do the job close enough to your requirements?
Good luck!
This is a screenshot of this page in Firebug: you can see that there is a request which returns a 204 status code (No Content).
This is Stack Overflow's view counter. They are using a hidden image which points to a controller's action.
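For illustration, the action behind such a hidden image could be as small as the sketch below (made-up names): the page embeds something like <img src="/posts/123/ivc/abc123" />, and the action records the hit and returns the 204 No Content seen in Firebug.

using System.Web.Mvc;

public class PostsController : Controller
{
    // Routed as e.g. posts/{id}/ivc/{code}; the random code plus the IP helps distinguish users.
    public ActionResult Ivc(int id, string code)
    {
        RecordHit(id, code, Request.UserHostAddress);
        return new HttpStatusCodeResult(204); // nothing to render for a tracking pixel
    }

    private static void RecordHit(int articleId, string code, string ip)
    {
        // Hypothetical: store or buffer the hit (database, in-memory dictionary, log...).
    }
}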
I have many articles. How do I track which articles the user has already visited?
P.S. BTW, why is this request made two times?