Should I POST high-rate user actions to my server on a per-action basis or send the batch of events once the session is closed? - ruby-on-rails

I'm building a site where users can watch a video and click as many times as they want to "like" it. It's a bit like Periscope's "Hearts" function for those who know it.
The viewers are watching the video in a web browser for now. Every "like" is written to a Heroku-hosted Redis instance, so the writes and reads are fairly cheap. However, there could potentially be a high rate of simultaneous input as many users watch a video at the same time.
In this scenario, I'm facing two options:
1. Send an event to the Redis instance every time the user "likes". Advantage: the "like" is stored right away with all relevant information. Disadvantage: lots of concurrent likes hitting the server.
2. Cache the "likes" locally and only send them to Redis once the session is over. Problem: at any time the user can close the browser (and potentially never return), so the "like" information could be lost permanently.
Any advice on which option is preferable?

Don't cache.
First, it's a really big complication as you won't know when the session is really over.
Second, a Redis increment is probably as fast as or faster than your cache. I bet your real concern is the Rails overhead, not Redis.
You may eventually want to make another endpoint - maybe a simple Sinatra app - that only handles likes. I noticed autosuggest gems sometimes do this (for example), and it saves all the overhead of a Rails request.
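As a rough illustration of such a lightweight endpoint, here is a minimal sketch of a Rack-compatible app that does nothing but count likes. LikesApp, LikeStore and the /videos/:id/like route are hypothetical names, and the in-memory store is only a stand-in for a Redis increment so the example runs without any gems:

```ruby
require "json"

# In-memory stand-in for a Redis INCR (assumption: a real app would call Redis here).
class LikeStore
  def initialize
    @counts = Hash.new(0)
    @lock = Mutex.new
  end

  # Atomically increment the counter and return the new value.
  def incr(key)
    @lock.synchronize { @counts[key] += 1 }
  end
end

# Any object that responds to call(env) and returns [status, headers, body]
# can be mounted by a Rack server, with no Rails stack involved.
class LikesApp
  def initialize(store)
    @store = store
  end

  def call(env)
    match = %r{\A/videos/(\d+)/like\z}.match(env["PATH_INFO"].to_s)
    unless env["REQUEST_METHOD"] == "POST" && match
      return [404, { "content-type" => "text/plain" }, ["not found"]]
    end

    total = @store.incr("video:#{match[1]}:likes")
    [200, { "content-type" => "application/json" }, [JSON.generate({ "likes" => total })]]
  end
end
```

Each request does one increment and returns the running total, which is roughly the amount of work the dedicated endpoint is meant to be limited to.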
If the app is successful, the concern could be someone writing a script to "like" continually. You may need to add a throttle that allows only a limited number of requests per user over a given time.
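A throttle like that could be sketched as a fixed-window counter. This is only one possible approach; the Throttle class, its parameters and the injectable clock are assumptions made for illustration, and the counts are kept in process memory rather than in Redis:

```ruby
# Fixed-window throttle: at most `limit` actions per `window` seconds per key.
class Throttle
  def initialize(limit:, window:, clock: -> { Time.now.to_f })
    @limit = limit
    @window = window
    @clock = clock          # injectable so the behaviour can be tested
    @buckets = {}           # key => [window_index, count]
    @lock = Mutex.new
  end

  # Returns true and records the action if the key is under its limit.
  def allow?(key)
    window_index = (@clock.call / @window).floor
    @lock.synchronize do
      index, count = @buckets[key]
      count = 0 unless index == window_index   # a new window resets the count
      return false if count >= @limit

      @buckets[key] = [window_index, count + 1]
      true
    end
  end
end
```

For example, Throttle.new(limit: 20, window: 10) would let each user id through at most 20 times in any 10-second window. With several app processes the counters would have to live in a shared store instead.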

Related

Call API but not for every user

I would like to do something similar to this: Rails way to call external API from view?
But I don't want to call the API for every request from users because that would put a lot of unnecessary load on the API server and deplete my quota too fast.
Is there any way to cache the response from every 100th user and display the cached version to every other user or something of the sort? There's probably something already out there to do this, but I'm very new to Ruby and would appreciate some help.
There are numerous ways to achieve what you are looking for. I would advise against caching the response per xxx users, since traffic varies a lot with the day and time of day. Instead, ask yourself what the method actually does. Does it pull some complex data, or is it just a simple count? If real-time information is not important, what is an acceptable timeframe for the information to be cached?
If those questions can be answered with a time interval rather than a number of visiting users, then you may want to use the built-in Rails.cache, by defining the data-fetching method in a helper and then calling it from a view:
def method_to_call
  Rails.cache.fetch("some_method", expires_in: 1.hour) do
    SomeThing.to_cache
  end
end
From here you can forecast your access to the API and be certain of your usage over a defined time period, without worrying about what times of day your website may be busier or about unexpected spikes in application usage.
If you want to cache per xxx user visits, I would highly recommend Redis. It's a fantastic piece of software that is incredibly fast and scalable. It's a key-value store that can hold the data about unique users and page views.
Another question to ask is whether you are caching per individual user or per individual page view. Based on the answer, you can store a user id or a page view count and use conditional logic to refresh the cache every xxx visits. Performance should not be much of an issue if you take care to clear the store every week or so, depending on the data stored.
When you get to large scales of caching, you might have to think about the infrastructure for hosting a Redis instance. Will you need a dedicated server? Is Docker a viable option for a production Redis? Can you host the Redis instance on the same machine as the application? All of these possible overheads favour the initial approach, but again it is dependent on your needs.
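For the per-visit variant, the conditional refresh logic could look like the following sketch. EveryNthCache is a hypothetical class, and the counter is kept in process memory purely for illustration; with more than one app process it would need to live in a shared store such as Redis:

```ruby
# Reloads the cached value on every N-th visit and serves the cached copy otherwise.
class EveryNthCache
  def initialize(every:, &loader)
    @every = every
    @loader = loader   # block that performs the expensive API call
    @visits = 0
    @value = nil
    @lock = Mutex.new
  end

  # Reloads on visits 1, every + 1, 2 * every + 1, and so on.
  def fetch
    @lock.synchronize do
      @value = @loader.call if (@visits % @every).zero?
      @visits += 1
      @value
    end
  end
end
```

With EveryNthCache.new(every: 100) { call_the_api }, where call_the_api is a placeholder for your own method, the block runs for the first visitor and then once per 100 visits.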

opening and closing streaming clients for specific durations

I'd like to infrequently open a Twitter streaming connection with TweetStream and listen for new statuses for about an hour.
How should I go about opening the connection, keeping it open for an hour, and then closing it gracefully?
Normally for background processes I would use Resque or Sidekiq, but from my understanding those are for completing tasks as quickly as possible, not chilling and keeping a connection open.
I thought about using a global variable like $twitter_client but that wouldn't horizontally scale.
I also thought about building a second application that runs on one box to handle this functionality, but that seems excessive if it can be integrated into the main app somehow.
To clarify, I have no trouble starting a process, capturing tweets, and using them appropriately. I'm just not sure what I should be starting. A new app? A daemon of some sort?
I've never encountered a problem like this, and am completely lost. Any direction would be much appreciated!
Although not a direct fix, this is what I would look at:
Time
You're working with time, so I'd look at which time-centric processes could be used to keep the connection open for an hour.
Specifically, I'd look at running some sort of job on the server, which you could fire at specific times (programmatically if required), to open and close the connection. I only have experience with Resque, but as you say, it's probably not up to the job. If I find any better solutions, I'll certainly update the answer.
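To make the "open, keep alive for a fixed time, then close" idea concrete, here is a minimal sketch using a plain Ruby thread with a deadline. It does not use TweetStream itself; run_for, its parameters and the on_close hook are hypothetical names, and the block stands in for whatever reads from the stream:

```ruby
# Calls the block repeatedly in a background thread until `duration` seconds
# have passed, then calls `on_close` once so the connection can be shut down.
def run_for(duration, poll: 0.01, on_close: -> {}, &work)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration
  Thread.new do
    begin
      while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
        work.call
        sleep poll
      end
    ensure
      on_close.call   # runs even if the work raises
    end
  end
end
```

A scheduled job could call run_for(3600, on_close: -> { client.close }) { ... } and join the returned thread, where client and its close method are placeholders for your streaming client.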
Storage
Once you've connected to TweetStream, you'll want to look at how you can capture the tweets for that time period. It seems a waste to create a database table just for the job, so I'd be inclined to use something like Redis to store the tweets that you need.
This can then be used to output the tweets you need, allowing you to store them temporarily and then delete them after the hour window has passed.
Delivery
I don't know what context you're using this feature in, so I'll just give you as generic a process as possible.
To display the tweets, I'd personally create some sort of record in the DB to store the time you're connecting to TweetStream that day (if it changes; if it's constant, just set a constant in an initializer), and then include some logic to try to get the tweets from Redis. If you're able to collect them, show them as you wish; otherwise don't print anything.
Hope that gives you a broader spectrum of ideas?

View counter in ASP.NET MVC

I'm going to create a view counter for articles. I have some questions:
1. Should I ignore the article's author when he opens the article?
2. I don't want to update the database each time. I can store in a Dictionary<int, int> (articleId, viewCount) how many times each article was viewed. After 100 hits I can update the database.
3. I should only count the hit once per hour for each user and article (if the user opens one article many times during one hour, the view count should be incremented only once).
For each question I want to know your suggestions how to do it right.
I'm especially interested how to do #3. Should I store the time when the user opened the article in a cookie? Does it mean that I should create a new cookie for each page?
I think I know the answer - they are analyzing the IIS log as Ope suggested.
Hidden image src is set to
http://stackoverflow.com/posts/3590653/ivc/[Random code]
[Random code] is needed because many people may share the same IP (in a network, for example) and the code is used to distinguish users.
1. Sure - I think that is a good idea.
2. and 3. are related: the issue is where you would actually store this dictionary and logic.
ASP.NET application or session scope is of course the easiest choice, but there you really need to understand how application pools work. ASP.NET applications are recycled from time to time: when there is no activity on the site for a certain period, or in special situations - e.g. if the process starts to take too much memory, the application is shut down and a new one is started on the next request. There are events for session and application shutdown, but at least some years ago they were not really reliable: in many special cases they did not fire. Perhaps they are better now, but it is painful to test. And 1 hour is really a long time: usually sessions are kept alive for only about 20 minutes after the last request.
A reliable way would be to have a separate Windows service (a lot of work to program) or to always store to the database and analyze duplicate views there (quite a lot of overhead for such a small feature).
Do you have access to IIS logs? How about analyzing the IIS logs, e.g. every 30 minutes, with some kind of timer process and taking the count from there? Or just store all the hits in the database with user information and calculate the unique hits with a similar timed process.
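To show how the three requirements from the question fit together (skip the author, count once per hour per user and article, write to the database in batches), here is a minimal in-memory sketch. It is written in Ruby for illustration only; ViewCounter and its parameters are hypothetical, it is not thread-safe, and as noted above anything held in memory is lost when the application is recycled:

```ruby
class ViewCounter
  HOUR = 3600

  def initialize(flush_every: 100, clock: -> { Time.now.to_i }, &flush)
    @flush_every = flush_every
    @clock = clock
    @flush = flush          # called with { article_id => pending_views }
    @last_counted = {}      # [user_id, article_id] => time of last counted view
    @pending = Hash.new(0)
    @pending_total = 0
  end

  # Returns true if the view was counted, false if it was ignored.
  def record(user_id:, article_id:, author_id: nil)
    return false if user_id == author_id            # ignore the author

    now = @clock.call
    key = [user_id, article_id]
    last = @last_counted[key]
    return false if last && now - last < HOUR       # already counted this hour

    @last_counted[key] = now
    @pending[article_id] += 1
    @pending_total += 1
    flush! if @pending_total >= @flush_every
    true
  end

  # Hands the pending counts to the flush block (e.g. a database update).
  def flush!
    return if @pending.empty?

    @flush.call(@pending.dup)
    @pending.clear
    @pending_total = 0
  end
end
```

The flush block is where the single batched database update would go; calling flush! from a shutdown hook or a timer reduces, but does not eliminate, the risk of losing pending counts.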
One final question: are you really sure that none of the thousands of counter applications/services on the Internet would do the job closely enough for your requirements?
Good luck!
This is the screenshot of this page in Firebug. You can see that there is a request which returns 204 status code (No Content).
This is Stack Overflow's view counter. They are using a hidden image which points to a controller's action.
I have many articles. How to track which articles the user visited already?
P.S. BTW, why is this request made two times?

Twitter app development best practices?

Let's imagine an app which is not just another way to post tweets, but something like an aggregator that needs to store, and have access to, the tweets posted through it.
Since Twitter added a limit for API calls, the app should use some kind of cache, and then it should periodically check whether a tweet has been deleted, etc.
How do you manage the limits? How do you think high-traffic apps survive without being whitelisted?
To name a few.
Aggressive caching. Don't call out to the API unless you have to.
I generally pull down as much data as I can upfront and store it somewhere. Then I operate off the local store until it runs out and needs to be refreshed.
Avoid doing things in real time. Queue up requests and make them on a timer.
If you're on Linux, cronjobs are the easiest way to do this.
Combine requests as much as possible.
Well, you have 100 requests per hour, so the question is how you balance that between the various types of requests. I think the best option is the way TweetDeck does it: it lets you set the percentage for each type of request and saves the rest for posting (because that is important too).
For the caching, a database would be good, and I would ignore deleted tweets - once you have downloaded the tweet, it doesn't matter if it was deleted. If you wanted to, you could in theory just try to open the page with the tweet, and if you get a 404 then it's been deleted. That means no cost against the API.
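The percentage-based split of the hourly limit described above could be sketched like this. split_budget and the request type names are hypothetical; only the figure of 100 requests per hour comes from the answer:

```ruby
# Splits an hourly request budget by percentage; whatever is left over
# (including rounding remainders) is reserved for posting.
def split_budget(total, shares)
  raise ArgumentError, "shares exceed 100%" if shares.values.sum > 100

  allocated = shares.transform_values { |percent| total * percent / 100 }
  allocated.merge(posting: total - allocated.values.sum)
end

budget = split_budget(100, { timeline: 50, mentions: 20 })
# budget == { timeline: 50, mentions: 20, posting: 30 }
```

Each request type then draws from its own allowance, so a burst of timeline refreshes cannot use up the requests reserved for posting.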

Storing Data In Memory: Session vs Cache vs Static

A bit of backstory: I am working on a web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request takes ~15-20 secs to complete and a couple of secs to process. Once there, the user can manipulate values on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call 1 time (the 15 sec hit) and then wanting to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So, the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 secs or so (I hope).
In order to make this efficient, I am moving the initial data into memory and using Ajax calls back to the server so that I can reduce the processing time needed to handle the recalculation that occurs w/ this user's updates.
Here is my question: with performance in mind, what would be the best way to store this data, assuming that only 1 user will be working w/ this data at any given moment?
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event which will meet my failsafe requirements. Cons: Slowest perf of my current options. It is sometimes tricky to ensure that the Session End event fires properly.
Caching - Pros: Good perf. Has access to dependencies, which could be a bonus later down the line but is not really useful in the current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best perf. Easiest to maintain as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on which option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method for storing the data across page loads. You can name the cache entry you store the data in to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save the changes back to the DB. If you name the files based on the user/account or some other session-unique indicator, then there's no issue with conflicts, and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the writes more: save changes to Session, then write them to file at intervals, then sweep the file at larger intervals. You can tune this for performance and choose how much user-change loss you are willing to accept.
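The append-then-sweep idea could be sketched as follows. It is written in Ruby for illustration rather than in the asker's VB.NET; ChangeLog, the one-file-per-user layout and the JSON-lines format are assumptions, and the sketch ignores the case where a change is appended while a sweep of the same file is in progress:

```ruby
require "json"
require "fileutils"

class ChangeLog
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(dir)
  end

  # Append one change as a JSON line to the user's own file.
  def append(user_id, change)
    File.open(path_for(user_id), "a") { |f| f.puts(JSON.generate(change)) }
  end

  # Pass every logged change to the block (e.g. to write it to the DB),
  # then delete the file. Returns the number of changes processed.
  def sweep
    count = 0
    Dir.glob(File.join(@dir, "*.log")).each do |path|
      user_id = File.basename(path, ".log")
      File.foreach(path) do |line|
        yield user_id, JSON.parse(line)
        count += 1
      end
      File.delete(path)
    end
    count
  end

  private

  # Assumes user_id is safe to use in a file name (e.g. a numeric id).
  def path_for(user_id)
    File.join(@dir, "#{user_id}.log")
  end
end
```

Because the files outlive the session, a separate timed process can call sweep and still recover the changes of a user who closed the browser.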
Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You cannot rely on the session simply because it is (typically) tied to the user's browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), then they lose all of their work, which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your own question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
Perhaps a bit off topic: I'd recommend putting those long processing calls in asynchronous pages (not to be confused with AJAX's asynchronous requests).
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
I suggest creating a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
One possible alternative to what the others mentioned is to store the data on the client.
Assuming the dataset is not too large and the code that manipulates it can run client side, you could store the data as an XML data island or a JSON object. This data could then be manipulated and processed entirely client side with no round trips to the server. If you need to persist this data back to the server, the resulting data could be posted via AJAX or a standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.

Resources