I recently ran into a CookieOverflow exception in my Rails application. I googled a bit and found this answer to be the most helpful:
https://stackoverflow.com/a/9474262/169277
After switching to storing sessions in the database, I'm trying to figure out the drawbacks of this approach. So far I see around 1200 entries in the sessions table, which was populated in only a few hours.
When does the actual interaction with the database occur? Only when writing data to the session, or at other times too?
The table grows rather fast, so is there a way to purge old, unused sessions from the database other than running a daily cron job or something similar?
I'm just looking for some additional information about this approach. Right now I'm wondering whether I should keep it or change the logic of my app.
More than 4KB in a cookie is a lot, so changing your app is probably not a bad idea to consider.
That said, 1200 in a few hours doesn't seem outlandish. If you're worried about the table growing unbounded, you can use memcached or Redis as the session store instead of your database. That would free you from worrying about growth in your database. The downside is that evictions probably mean that you're logging people out.
All that said, we have a number of daily cron-like jobs that clean out our database tables, not for sessions, but it's similar. They run at night when utilization is low anyway.
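For what it's worth, the nightly cleanup boils down to "delete every row not touched since some cutoff". Here is a minimal sketch of that logic in plain Ruby. The `SessionRow` struct, the `purge_stale_sessions` helper and the 30-day window are all made up for illustration, and an in-memory array stands in for the sessions table (I'm assuming the table has an `updated_at` column).

```ruby
# Hypothetical in-memory stand-in for the sessions table.
SessionRow = Struct.new(:session_id, :updated_at)

# Remove rows that have not been touched within max_age seconds.
# Returns the number of rows deleted.
def purge_stale_sessions(rows, max_age:, now: Time.now)
  cutoff = now - max_age
  before = rows.size
  rows.reject! { |row| row.updated_at < cutoff }
  before - rows.size
end

now = Time.now
table = [
  SessionRow.new("a", now - 60),             # active a minute ago
  SessionRow.new("b", now - 3 * 24 * 3600),  # idle for three days
  SessionRow.new("c", now - 40 * 24 * 3600)  # idle for forty days
]

deleted = purge_stale_sessions(table, max_age: 30 * 24 * 3600, now: now)
puts "deleted #{deleted}, kept #{table.map(&:session_id).join(', ')}"
```

Against a real table this would be a single DELETE with a WHERE clause on the timestamp column, run from whatever scheduler you already use.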
I'm developing a polling application that will deal with an average of 1000-2000 votes per second coming from different users. In other words, it'll receive 1k to 2k requests per second with each request making a DB insert into the table that stores the voting data.
I'm using RoR 4 with MySQL and planning to push it to Heroku or AWS.
What performance issues related to database and the application itself should I be aware of?
How can I address this amount of inserts per second into the database?
EDIT
I was thinking of not inserting into the DB on each request, but instead writing the insert data to an in-memory stream. A scheduled job running every second would then read from this stream and generate a bulk insert, so that each vote is not inserted individually. But I cannot think of a nice way to implement this.
While you can certainly do what you need to do in AWS, that high level of I/O will probably cost you. RDS can support up to 30,000 IOPS; you can also use multiple EBS volumes in different configurations to support high IO if you want to run the database yourself.
Depending on your planned usage patterns, I would probably look at pushing into an in-memory data store, something like memcached or redis, and then processing the requests from there. You could also look at DynamoDB, which might work depending on how your data is structured.
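To make the "buffer in memory, then process from there" idea concrete, here is a rough sketch in plain Ruby. `VoteBuffer`, `bulk_insert_sql` and the `votes (user_id, option_id)` table layout are hypothetical names I picked for the example. A core `Queue` stands in for the in-memory store, and the SQL string is only built, not executed.

```ruby
# Hypothetical buffer: request threads push votes, a periodic job drains them.
class VoteBuffer
  def initialize
    @queue = Queue.new # thread-safe FIFO from Ruby's core library
  end

  # Called from the request path; no database work happens here.
  def push(user_id, option_id)
    @queue << [Integer(user_id), Integer(option_id)]
  end

  # Drain whatever is queued right now, up to max rows.
  def drain(max = 5_000)
    rows = []
    while rows.size < max
      begin
        rows << @queue.pop(true) # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break
      end
    end
    rows
  end
end

# Build one multi-row INSERT. Values are integers only (enforced in push),
# so plain interpolation is safe in this sketch.
def bulk_insert_sql(rows)
  return nil if rows.empty?

  values = rows.map { |user_id, option_id| "(#{user_id}, #{option_id})" }
  "INSERT INTO votes (user_id, option_id) VALUES #{values.join(', ')}"
end

buffer = VoteBuffer.new
buffer.push(1, 10)
buffer.push(2, 11)
buffer.push(3, 10)

sql = bulk_insert_sql(buffer.drain)
puts sql
```

Keep in mind that a buffer like this lives inside one process: votes that have not been flushed are lost if that process dies, and each app process gets its own buffer. A shared store such as a Redis list avoids the second problem, but you still have to decide how much loss is acceptable.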
Are you going to have that level of sustained throughput consistently, or will it be in bursts? Do you absolutely have to preserve every single vote, or do you just need summary data? How much will you need to scale - i.e. will you ever get to 20,000 votes per second? 200,000?
These types of questions will help determine the proper architecture.
Until now, our site has had a modest amount of traffic. None of our developers are big ops guys, but we've stayed ahead of it and keep the site up and running pretty quick. That said, our dev team is stretched, we've accumulated some technical debt, and there's plenty of opportunity to optimize.
Without getting into specifics, we just found out that we'll be expecting a massive amount of traffic in the near future in a very short period of time, on the order of several million hits in a few hours. Scaling is one thing, but this is several orders of magnitude greater than what we're seeing now.
We're a Rails app hosted on S3 using ELB, and Postgresql.
I wanted to field some recommendations for broad starting points for scaling and load testing given this situation.
Update: Sorry, EC2, late night :)
#LastZactionHero
Pretty interesting question, so let me answer in detail. I hope you are talking about an e-commerce application; enterprise or B2B apps don't see spikes like this. Since you mentioned that your Rails app is hosted on S3, let me make a couple of things clear.
1) You can't host a Rails app on S3. S3 is the Simple Storage Service, where you can only store files.
2) I guess you have hosted your Rails app on AWS EC2 with an Elastic Load Balancer in front of the EC2 instances, which is pretty good.
3) You have a self-managed PostgreSQL deployed on an EC2 instance.
If you are running on AWS you are halfway safe, and you can easily scale up and scale down.
I can see one problem in your present model, and that is your DB. AWS offers a database as a service called Relational Database Service (RDS), which supports MySQL, Oracle and MS SQL Server.
RDS comes with a lot of features, like automatic backups of your database, high IOPS, etc.
But at the time of writing it doesn't support PostgreSQL. You need to manage your own EC2 instance running PostgreSQL, so make sure it is fail-safe and that you have a proper backup and restore system in place.
AWS provides an Auto Scaling API and command-line tools, which are pretty easy to use.
You don't have to worry about bandwidth issues etc., but I agree with Angelo's answer too.
You can use ElastiCache for caching in your app. Use a CDN if you need to speed up your app. RDS can manage up to 30,000 IOPS; it's a monster, so it will do a lot of work for you.
Feel free to ask me if you need any kind of help.
(Disclaimer: I am a senior DevOps engineer working for an e-commerce company that uses Ruby on Rails.)
Congratulations and I hope your expectation pans out!!
This is such a difficult question to answer comprehensively given the available information. For example, is your site heavy on DB reads, writes or both (and is your sharding/replication strategy in line with your DB strain)? Is bandwidth an issue? Obvious points would focus on making sure you have access to the appropriate hardware and that your recipes for whatever you use to provision/deploy your hardware are up to date and good to go. You can often throw hardware at a sudden spike in traffic until you can get to the root of whatever bottlenecks you discover (and yes, you will discover them at inconvenient times!)
Regarding scaling your app, you should at least:
1) Cache whatever you can. Pay attention to cache expiration, etc.
2) Be sure your DB has appropriate indexes set up (essentially, you should have an index on any field you're searching on.)
3) Watch your logs closely to identify potential long queries, N+1 queries, long view renders, etc.
4) Do things like what Shopify outlines in this post: http://www.shopify.com/technology/7535298-what-does-your-webserver-do-when-a-user-hits-refresh#axzz2O0gJDotV
5) Set up a good monitoring system (Monit, God, etc) for each layer of your stack - sudden spikes in traffic can quickly bottleneck your application in unexpected places and lead to more issues. The cascade can happen quickly.
6) Set up cron to automate all those little tasks you currently do manually...that you will probably forget about doing once you're dealing with traffic spikes.
7) Google scaling rails and you'll see tons of good info.
8) etc, etc, etc...
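Since N+1 queries come up in point 3, here is a small plain-Ruby illustration of what that pattern costs. `FakeDb` is a made-up stand-in that only counts the lookups it receives; it is not ActiveRecord, and the posts/authors data is invented for the example.

```ruby
# Hypothetical stand-in for a database that counts the queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(authors)
    @authors = authors # { id => name }
    @query_count = 0
  end

  # One query per call, like SELECT ... WHERE id = ?
  def find_author(id)
    @query_count += 1
    @authors[id]
  end

  # One query for many ids, like SELECT ... WHERE id IN (...)
  def find_authors(ids)
    @query_count += 1
    @authors.select { |id, _name| ids.include?(id) }
  end
end

posts = [
  { title: "First",  author_id: 1 },
  { title: "Second", author_id: 2 },
  { title: "Third",  author_id: 1 }
]
authors = { 1 => "Ann", 2 => "Bob" }

# N+1 pattern: one author lookup per post.
slow_db = FakeDb.new(authors)
slow = posts.map { |post| [post[:title], slow_db.find_author(post[:author_id])] }

# Batched pattern: collect the ids, then fetch all authors in one query.
fast_db = FakeDb.new(authors)
by_id = fast_db.find_authors(posts.map { |post| post[:author_id] }.uniq)
fast = posts.map { |post| [post[:title], by_id[post[:author_id]]] }

puts "per-record lookups: #{slow_db.query_count} queries"
puts "batched lookup:     #{fast_db.query_count} query"
puts "same result: #{slow == fast}"
```

The query that loads the posts themselves is not counted here (that is the "+1"). In a Rails app the batched form is usually what eager loading with `includes` gives you, and the per-record form is what shows up in the logs as a long run of near-identical SELECTs.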
You can use some profiling tools (rubyperf, or something like NewRelic, etc.), but whatever numbers you get from them are best treated as a rough baseline. The simple reason is that your profile depends on your hardware stack, which will certainly change depending on actual traffic patterns. Profiling is pretty easy if you have a site with one page of static content, and incredibly difficult if you have a CMS site with a growing DB and growing traffic.
Good luck!!!
I've often heard that deploying a traditional monolithic Rails app (i.e. no internal Web API, no message queue, no Redis/memcached server) to multiple servers can produce a bunch of bugs that are very hard to debug, but I'm having a hard time coming up with concrete examples despite a few hours of googling.
Some obvious issues that I can think of are:
Observers - likely will not work properly as the observation is only propagated on one server and not all of them (assuming there is no Message Queue)
Sessions - would probably need to store these in the database, which would need its own host
Caches - any sweepers would have issues propagating invalidations between servers.
Anyone else care to contribute? I'd really appreciate any articles others may have come across or just general wisdom :)
Observers are just code callbacks.
They run on each process, on each server.
Sessions have defaulted to the cookie store for the last few years.
So multiple servers are no problem.
If you don't have enough space in your cookie then I suggest you may be doing something wrong.
Cache invalidation is indeed a problem.
But it always is.
One solution is to break your cache out into a standalone service.
Sites like Facebook have giant farms of memcached servers.
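To show why a standalone cache helps with invalidation, here is a toy comparison in plain Ruby. `AppServer` is a hypothetical class and a plain Hash stands in for both the in-process cache and the standalone cache service, so this is a sketch of the idea rather than of any real cache client.

```ruby
# Hypothetical app server holding a cache; the cache may be private or shared.
class AppServer
  def initialize(cache)
    @cache = cache
  end

  def read(key)
    @cache[key]
  end

  def write(key, value)
    @cache[key] = value
  end

  def invalidate(key)
    @cache.delete(key)
  end
end

# Case 1: each server has its own in-process cache.
a = AppServer.new({})
b = AppServer.new({})
a.write("price", 10)
b.write("price", 10)
a.invalidate("price")         # only server A forgets the value
stale_on_b = b.read("price")  # server B still serves the old entry

# Case 2: both servers talk to one standalone cache (a Hash stands in for it).
shared = {}
c = AppServer.new(shared)
d = AppServer.new(shared)
c.write("price", 10)
c.invalidate("price")         # one delete is visible to every server
fresh_on_d = d.read("price")

puts "local caches: B reads #{stale_on_b.inspect}"
puts "shared cache: D reads #{fresh_on_d.inspect}"
```

With per-server caches every invalidation has to reach every server somehow. With one shared cache a single delete is enough, at the cost of a network hop per lookup.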
I think scaling and clustering is always a hard problem.
But this seems to be an old argument against Rails.
If anything, the last few years have seen Rails shine in this respect,
with EC2, NoSQL, and server automation becoming quite the norm in the community.
My team is currently building a new SaaS application for our company (Amilia.com). We are in "alpha" release and the application was built to be deployed on a web farm.
For our session provider, we are using SqlServer mode (in DEV and TEST) and it does not seem to be "scalable", hence we are looking for the best solution for handling sessions in ASP.NET (MVC3 in our case). We are currently using SQL Server but we would like to switch to another system due to license cost.
We target 20 000 [EDITED, was 100k before] concurrent users. In session, we store a GUID, a string and a Cart object (we try to keep it as little as possible, this object allows us to save 3 queries at each request).
Here are the different solutions I've found :
ASP.NET built-in solutions:
No session : impossible in our case (eliminated)
In-Proc Mode : can't be used in a webfarm. (eliminated)
StateServer Mode : can be used in a webfarm but if the server goes down, I lose all my sessions. (eliminated)
StateServer Mode with a PartitionResolver using multiple servers (http://msdn.microsoft.com/en-ca/magazine/cc163730.aspx#S8) : If I understand correctly, if one of these servers goes down, only a part of my users will lose their session.
SqlServer Mode : can be used in a webfarm, if the server goes down, I can recover my sessions but the process is quite slow. Moreover, that database becomes a bottleneck in case of heavy load.
SqlServer Mode with a PartitionResolver using multiple servers (http://www.bulletproofideas.net/2011/01/true-scale-out-model-for-aspnet-session.html) : If one of these servers goes down, only a part of my users will lose their session. If the user was doing nothing during the downtime, he will recover his previous session; otherwise he will be redirected to the sign-in screen.
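The partitioning idea behind both PartitionResolver options can be sketched in a few lines. This is a generic hash-based illustration in Ruby, not the ASP.NET PartitionResolver API itself; the server names and the `server_for` helper are hypothetical.

```ruby
require "zlib"

# Hypothetical list of session-state servers.
SERVERS = ["state-1", "state-2", "state-3"].freeze

# Map a session id to one server. CRC32 is stable across processes,
# unlike Ruby's String#hash, which is seeded per process.
def server_for(session_id, servers = SERVERS)
  servers[Zlib.crc32(session_id) % servers.size]
end

session_ids = (1..9).map { |n| "session-#{n}" }
placement = session_ids.group_by { |id| server_for(id) }

placement.each do |server, ids|
  puts "#{server}: #{ids.size} sessions"
end

# If one server goes down, only the sessions mapped to it are affected.
down = "state-2"
lost = placement.fetch(down, [])
puts "#{down} down: #{lost.size} of #{session_ids.size} sessions lost"
```

Because the mapping depends only on the session id and the server list, every web server in the farm resolves a given session to the same state server without any coordination.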
Custom solutions :
Use MongoDB as Session storage (http://www.adathedev.co.uk/2011/05/mongodb-aspnet-session-state-store.html) It seems to be a good tradeoff but my knowledge in nosql is quite rudimentary so I cannot see the cons.
Use Memcached : the problem will be the same as StateServer mode: if the memcached server goes down, all my sessions are lost. Furthermore, I think Memcached is not really meant for storing session state?
Use distributed memcached like ScaleOut (http://highscalability.com/product-scaleout-stateserver-memcached-steroids) : seems to be the best solution but it costs money.
Use repcached and memcached (http://repcached.lab.klab.org/), I've never seen an implementation of that solution.
We could easily go to Ms Azure and use tools provided by it but we have only one application, so if Microsoft doubles the price, we immediately double our infrastructure cost (but that's another subject).
So, what's the best way or at least what's your opinion about this ?
SQL Server session is pretty good. Since you already have a SQL Server database to store your primary data, you can just create another database and store the ASP.NET Session there.
About scalability, I would say that if you have 100,000 concurrent users, then your user base must be 10 million or more. You should do a practical estimate of how long it will really take to reach such a concurrent user load. In my previous startup, we had millions of users all around the world, 24x7, but we hardly ever reached 10K concurrent users even though people used our site continuously for hours every day.
If you really have 100,000 concurrent users, license cost would be the least of your worry. With right business model, having 100K concurrent users means you have at least $10M revenue/year.
I have built myoffice.bt.com, which uses SQL Server session state and keeps all primary data on a single SQL Server instance, but in two databases. Between 8 AM and 10 AM, millions of users hit our site. We hardly have any performance issues. With a dual-core server and 8 GB RAM, you can happily run a SQL Server instance and support such a load as long as you code it right. It all depends on how you have coded. If you have followed performance best practices, you can easily scale to millions of users on a single database server.
Take a look at my performance suggestions from:
http://omaralzabir.com/tag/performance/
I have used memcached clusters only to cache frequently used data. I never used them for sessions, for good reasons. There have been several occasions where a memcached server had to be rebooted. If we had used memcached for sessions, we would have lost all the sessions stored in that instance. So, I would not recommend storing sessions in memcached. But then again, how important is it for your app to maintain data in session? If you have a shopping cart, then as users add products to the cart, it must be persisted in the database, not in session. Session is usually for short-term storage. You should never keep transactional data in session; store it in relational tables directly.
I am always in support of not using Session. Developers abuse session all the time. Whenever they want to pass data from one page to another, they just put it in the Session. It results in bad design. If you truly want to scale to a 100K concurrent user base, design your app to not use session at all. Any transactional data must be stored in the database. A cart is a transactional object and thus it's not suitable for holding in Session. At some point you will need to know how many carts get started but never get placed. So, you will need to store them in the database permanently.
Remember, database-based session is nothing but database-based serialization. Think very carefully about what you are serializing into the database. You will have to clean it up as well, since Session_End won't fire for database-based sessions or, in fact, for most of the out-of-proc session modes. So, essentially you are giving devs the ability to just serialize data into the database and bypass the relational model. It always results in bad coding.
With permanent relational storage, fronted by a high performance cache like memcached, you have much better design to support large user base.
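A minimal sketch of that arrangement, in Ruby for brevity: the relational store is the source of truth and the cache is only a disposable copy in front of it. `CartRepository` is a hypothetical class, and plain Hashes stand in for the database and for memcached.

```ruby
# Hypothetical stand-ins: a Hash for the relational store, a Hash for the cache.
class CartRepository
  attr_reader :db_reads

  def initialize(database, cache)
    @database = database
    @cache = cache
    @db_reads = 0
  end

  # Read path: try the cache first, fall back to the database on a miss.
  def find(cart_id)
    return @cache[cart_id] if @cache.key?(cart_id)

    @db_reads += 1
    cart = @database[cart_id]
    @cache[cart_id] = cart unless cart.nil?
    cart
  end

  # Write path: the database is the source of truth; drop the cached copy
  # so the next read reloads it.
  def save(cart_id, items)
    @database[cart_id] = items.dup
    @cache.delete(cart_id)
  end
end

cache = {}
repo = CartRepository.new({}, cache)
repo.save(42, ["book"])
first = repo.find(42)   # miss: loaded from the database
second = repo.find(42)  # hit: served from the cache

cache.clear             # simulate a cache server restart
third = repo.find(42)   # data survives because the database has it

puts "carts: #{[first, second, third].inspect}, database reads: #{repo.db_reads}"
```

Losing the cache here costs one extra database read, not the user's cart, which is the difference from keeping the cart only in a memcached-backed session.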
Hope this helps your concerns.
Assuming a MySQL datastore, when would you NOT want to use memcached in a Ruby on Rails app?
Don't use memcached if your application is able to handle all requests quickly. Adding memcached is extra mental overhead when it comes to coding your app, so don't do it unless you need it.
Scaling's "one swell problem to have".
Memcached is a strong distributed cache, but it isn't any faster than local caching for some content. Caching should allow you to avoid bottlenecks, which are usually database requests and network requests. If you can cache your full page locally as HTML because it doesn't change very often (isn't very dynamic), then your web server can serve this up much faster than querying memcached. This is especially true if your memcached server, like most memcached servers, is on a separate machine.
Flip side of this is that I will sometimes use memcache locally instead of other caching options because I know someday I will need to move it off to its own server.
The main benefit of memcached is that it is a distributed cache. That means you can generate once, and serve from cache across many servers (this is why memcached was created). All the previous answers seem to ignore this - it makes me wonder if they have ever had to build a highly scalable app (which is exactly what memcached is for)
"Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss."
(My bolding)
So the answer is: if your application is only ever likely to be deployed on a single server.
If you are ever likely to use more than one server (for scalability and redundancy) memcached is (nearly) always a good idea.
When you want to have fine-grained control over when things expire. From my tests, memcached only seems to have a timing resolution of about a second.
EG: if you tell something to expire in 1 second, it could stay around for between 1 and just over 2 seconds.
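One way to picture this is a cache whose clock only advances in whole seconds. The toy model below is an assumption made for illustration, not memcached's actual implementation; `CoarseClockCache` and the timestamps are made up, and the exact upper bound depends on how a real server rounds.

```ruby
# Toy cache whose clock only advances in whole seconds. This is an
# assumption for illustration, not memcached's actual implementation.
class CoarseClockCache
  def initialize
    @entries = {}
  end

  # now is a Float timestamp; the cache only sees its whole-second part.
  def set(key, value, ttl, now)
    @entries[key] = [value, now.floor + ttl]
  end

  # An entry is dropped once the whole-second clock has passed its deadline.
  def get(key, now)
    value, deadline = @entries[key]
    return nil if deadline.nil?
    return value if now.floor <= deadline

    @entries.delete(key)
    nil
  end
end

cache = CoarseClockCache.new

# Stored right at the start of a second with a 1 second TTL.
cache.set("early", "x", 1, 10.0)
puts "early at age 1.9s: #{cache.get('early', 11.9).inspect}"
puts "early at age 2.0s: #{cache.get('early', 12.0).inspect}"

# Stored near the end of a second with the same TTL.
cache.set("late", "y", 1, 10.9)
puts "late at age 1.0s: #{cache.get('late', 11.9).inspect}"
puts "late at age 1.1s: #{cache.get('late', 12.0).inspect}"
```

In this model the same 1 second TTL gives a real lifetime anywhere between roughly one and two seconds, depending on where within the second the item was stored.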