share session among different type of web servers - ruby-on-rails

Some web services in my company are built with different web apps.(Rails, Django, PHP)
What's the better practice to share the session status
So that user won't have to login again and again among different servers.
I build my Rails apps in AWS auto scaling group.
That is, even I browse the same website, but next time I may browser on another server, so that I have to login again. because the server doesn't have my session status.
I need some better idea or keywords for me to search about that kind of issue.
Thanks in advance

I can think of two ways in which you can achieve this objective
Implement a custom session handling mechanism that makes use of database session management, i.e. all sessions will be stored in a special table in the database and will be accessible to all the servers.
Have a Central Authentication Service (CAS) which will act as a proxy to all the other servers. This will then mean that this step has to happen before the requests reach the load balancer.
If you look around, option 1 might be recommended by many, but it may also be an overkill since you'll need custom session management in each of the servers. However, your choice would probably depend on the specific objectives you want to achieve, the overall flexibility of the system architecture and the amount of time you have on your hands. The CAS might be a more straightforward way of solving the problem.

Storing user sessions in your applications database wouldn't be recommended option for AWS.
The biggest problem with using a database, is that you need to write some clean up script that runs every so often to clear the table of all the expired user sessions. This is messy, creates more overhead, and puts more pressure on your DB.
However, if you do want to use an actual database for this, you should use a NoSQL database like Dynamo. This will give you much better performance than a relational database. It's probably more cost effective too in terms of data transfer. However, the biggest problem with this is that you still need that annoying clean up script. Note There is built in support in the SDK for using PHP with Dynamo for storing the user's session:
http://docs.aws.amazon.com/aws-sdk-php/v2/guide/feature-dynamodb-session-handler.html
The best but most costly solution is to use an ElastiCache cluster. You can set a TTL of your choice which means you won't have to worry about clean up scripts. Also ElastiCache will get you much better performance than Dynamo or any relational DB as the data is stored in the RAM of the ElastiCache nodes. The main drawback of ElastiCache is that currently, it can't scale dynamically. So if too many users logged in at once, if you didn't have a big enough cluster already provisioned, things could get ugly.
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/WhatIs.html
But you can bet that all the biggest, baddest and best applications being hosted on AWS are either using Dynamo, ElastiCache or a custom NoSQL or Cache cluster solution for storing user sessions.
Note that both of these services are supported in all of the AWS SDKs:
https://aws.amazon.com/tools/

Related

Multiple Projects, Multiple languages, Same authentication

So I have multiple projects that I want to use a common core set of tables in a Postgres database to map an authentication scheme between them with. Things like a 'user' 'account' 'group' or other related user information is stored in these tables. The projects I have currently are a nodejs (multiple devices) and a Ruby web app (planning on multiple devices later on) and we could have a Django or another node project in the future as well. Is there an efficient, cost effective way to do this that would be scalable and reliable? I was thinking about using an s3 instance with a Postgres database hosted on it and pointing all my authentication from my multiple apps to that database but I wanted to see if others had thought about this problem as well.
First, please don't attempt to host postgres in S3.. I believe you may have meant EC2 with an EBS volume (which is really on s3). From an ease of use standpoint (particularly when considering ongoing maintenance), hosting any postgres instance on Amazon's RDS product is truly a pleasure. Without going into all the details of that product, I'll simply state that you can set up high availability (failover), backups, upgrades, and monitoring with just a few clicks.
That being said, RDS is not the cheapest solution, but the cost is not exorbitant either, depending on your load and number of simultaneous connections.
If all this database is going to do is authenticate people and then disconnect-- that very well will be overkill and will be a waste of resources. However, if you're housing a fairly complex set of permissions and other user data, it'll likely be a fairly straightforward solution.
Depending on your budget and requirements, you may benefit from running pgPool on your app server somewhere to pool the connections-- but I wouldn't start out using pgPool unless you need it.

How can one rails app on heroku access many databases

I want to be able to have one app access multiple databases on the HEROKU "system".
Can the connection to the database be changed dynamically?
Why I ask...
I have an app that has a lot of very processor heavy background jobs. If a given user uploads a product feed of say 50,000 product that have to be compared to existing products and update only the deltas it can take a "few" minutes.
Now to mitigate the delay I spin up multiple workers, each taking small bites out of the lot until there's none. I can get to about 20 workers before the GUI starts to feel sluggish because the DB is being hammered.
I've tuned some of the code and indexed the DB to some extent, and I'm sure there's more I could do, but it will eventually suffer the law of diminished returns.
For one user, I don't much care... if you upload 50k products you need to wait a bit..
But user one's choice to upload impacts user two. (different company so no cross over of data)..
Currently I handle different users by separating their data with schemas in postgresql.
The different users however share the same db connection and even on the best plan I can see a time when 20 users try to upload 50,000 products at the same time.(first of month/quarter for example).
User 21 would see a huge slow down on their system because of this..
So the question: Can I assign different users to different databases? User logs in, validates their info against a central DB, and then a different DB takes over?
My current solution is different instances of heroku. It's easy to maintain the code because it's one base and I just script the git push(es). The only issue is the different login URL's; which I suppose I could confront if I can't find an easy DB switch solution.
It sounds like you're able to shard your data by user, or set of users without much concern since you already separate them by schema. If that's the case, and you're using Ruby and ActiveRecord, look at https://github.com/tchandy/octopus. I imagine you're not looking to spin up databases on the fly, rather, you'll have them already built and ready to be used, and can add more as you go.
Granted, it sounds like what you're doing could be done a lot more effectively by using the right tool for that type of intensive processing like one of the Heroku Hadoop add-ons; nonetheless, if that's not an option for whatever reason, check out the gem above. There are a couple other gems like it, and of course you could technically manage your own ActiveRecord connections without this gem, but I think you'll find that will be painful really fast.
Of course, if you aren't using Ruby or ActiveRecord, still shard the data, and look for something like the gem above in your app's language :).
the postgres databases on heroku are configured with environment variables. when you run heroku config you should see:
DATABASE_URL: postgres://xxx.compute.amazonaws.com:5432/xxx
you can use these variables to connect to databases on other heroku instances or share a single database on different heroku apps.
if you try to run this kind of stuff on free heroku instances, i think it is against their terms of services.
if it's about scalability, i think you will just have to pay for a more expensive database instance...

Should I choose cloud?

I'm about to start development on a project with very uncertain load/traffic specifics. When it will be released there will certainly be very low load that can easily be handled by a single desktop quad code machine.
The problem is that there will be (after some invite-only period) a strong publicity for the product so I expect considerable traffic/load peaks.
I haven't read enough about cloud providers and I'm mostly leaning toward Amazon or Azure for the credibility these two companies have without checking them out as I should with others (ie. Rackspace that I suppose is also a cloud service provider).
What I want
I would like to create a normal Asp.net MVC web application that can be run on in-house single machine low-cost server. It would run web server along with database (relational and maybe also document) and fulltext search (not SQL FTS but rather high speed separate product like Lucene or Sphinx). But after initial invite-only period I'd like to move this app to the cloud to make it more traffic/load demand-friendly.
As much as I know Amazon offers a sort of virtual machine hosting which I understand you setup as a normal server but has possible flexible resources in terms of load power. I'm not sure if that can be accomplished on Azure as well.
Questions
What is your experience with application transition to cloud and which one did you choose and why?
What would you recommend I should think about when designing/developing the solution to make the transition as painless as possible.
Based on your experience is it better to move to the cloud (financial wise) or is it better to buy your own servers and load balance application yourself and maybe save money on the long run?
"Cloud" is such a vague term. Still, I think this is a very good question.
Basically, IaaS cloud hosting does not magically make your application scale. It's really a virtual private server with very short contract / cancellation periods.
For scalability, the magic lies not so much in the hosting, but in the horizontal scalability of the application code itself. This is related to all the distributed computing challenges. For example, adding more application servers is not always easy: you must be sure that you don't persist any user state in the server application (but rather in a database, static can be evil), caching can be problematic because local caches can make the situation worse if you're using a round-robin strategy, etc.
What is your experience with application transition to cloud and which one did you choose and why?
What would you recommend I should think about when designing/developing the solution to make the transition as painless as possible.
You don't really have to do anything different just to host on EC2 or Azure -- basically. But of course, it's not that easy when things grow.
For instance, EC2 instance storage is rather limited. Additional storage on EBS, however, does not provide comparable performance characteristics and can be a bit more laggy than a disk. The point here is that EBS does magically scale, and it's probably more PaaS than IaaS; but it's not a simple hard disk and it does, consequently, not behave like a hard drive. I don't know about Azure block storage. In general, expect additional abstraction layers to introduce problems of their own, no matter what they do.
Based on your experience is it better to move to the cloud (financial wise) or is it better to buy your own servers and load balance application yourself and maybe save money on the long run?
Typical cloud providers are more expensive than the usual 'round-the-corner VPS providers, but they are, to my experience, also much more reliable and professional. EC2 has a free tier (but it's quite small), Azure gives you a small instance for free for 3 months.
Doing the calculation right is rather tricky; for example, if you have to shut down your service for whatever reason, it's nice to be able to cancel now rather than pay another year - you might want to put this risk into your calculation. On the other hand, both EC2 and Azure will be considerably cheaper if you sign up for 6 or 12 months, rather than paying by the hour.
You might want to check out the free Azure plan, because it's nice to start fiddling around without any cost. A big advantage of cloud providers is that you can scale vertically very easily: buying a 16 core, 64GB RAM server machine is really expensive, but if there's so much traffic on your site, upgrading your plan won't be such a big issue.
As someone hasn't mention it yet...
AppHarbor has been amazing. You can push stuff in a matter of minutes. Deployment is a breeze. And setting up your project for it is easier as well. And it doesn't even require any major changes in your solution to fit in.
For the full-text search, you might consider something like Websolr.
A lot of this depends on what your app is doing (e.g., are there separable components that might benefit from running on different instances, vs. a simple CRUD application with a front end). One thing to consider is that in a cloud application you normally don't have a traditional relational database. As such, you have to choose either cloud or traditional hosting, or plan on coding your access layer twice. Azure does have relational databases (SQL Azure), although they're not identical to SQL Server 2008R2. You're going to have to research the pros/cons of a cloud setup for your specific situation.
As far as financial concerns, it's usually a lot cheaper to just get an account with a hosting company instead of a cloud service, since you pay by the month, instead of the hour (last time I checked an account with Azure running 24/7 for a month would cost about $40-$50, while you can get hosting for $15 a month). The savings with the cloud come in when you have to run several servers, and the cost of maintaining them surpasses the cost of the instance on the cloud platform.
So, sorry, there's no silver bullet answer for you. Read up on the different services available. Consider what your application needs, what prices will be, and go from there.
I have just migrated an MVC-based application from a dedicated server to Azure. When migrating the MSSQl-database, I first tried importing .bacpak files but some of the tables failed because of their size. I then used the SQL Database migratio wizard which worked fine for small tables but failed for tables with BLOB-fields. For these tables I had to use temporary intermediate tables. Then after a while after all the data was transferred setting up the Webapp was a breeze and we went in production. At first, everything seemed to work just fine, but after a couple of hours when the load got heavier, all kind of errors occurred. I went into the Azure portal and it was really easy to see the

ASP.NET MVC Session state using state partitioning, MongoDB or Memcached or...?

My team is currently building a new SaaS application for our company (Amilia.com). We are in "alpha" release and the application was built to be deployed on a web farm.
For our session provider, we are using Sql Server mode (in DEV and TEST) and it seems to be not "scalable", hence we are looking for the best solution for handling sessions in asp.net (mvc3 in our case). We are currently using Sql Server but we would like to switch to an other system due to license cost.
We target 20 000 [EDITED, was 100k before] concurrent users. In session, we store a GUID, a string and a Cart object (we try to keep it as little as possible, this object allows us to save 3 queries at each request).
Here are the different solutions I've found :
ASP.NET built-in solutions:
No session : impossible in our case (eliminated)
In-Proc Mode : can't be used in a webfarm. (eliminated)
StateServer Mode : can be used in a webfarm but if the server goes down, I lose all my sessions. (eliminated)
StateServer Mode with a PartitionResolver using multiple servers (http://msdn.microsoft.com/en-ca/magazine/cc163730.aspx#S8) If I undestand well, if one of these servers goes down, only a part of my users will lose their session.
SqlServer Mode : can be used in a webfarm, if the server goes down, I can recover my sessions but the process is quite slow. Moreover, that database becomes a bottleneck in case of heavy load.
SqlServer Mode with a PartitionResolver using multiple servers (http://www.bulletproofideas.net/2011/01/true-scale-out-model-for-aspnet-session.html) : If one of these servers goes down, only a part of my users will lose their session. If the user was doing nothing between the downtime, he will recover his previous session otherwise he will be redirected to the signin screen.
Custom solutions :
Use MongoDB as Session storage (http://www.adathedev.co.uk/2011/05/mongodb-aspnet-session-state-store.html) It seems to be a good tradeoff but my knowledge in nosql is quite rudimentary so I cannot see the cons.
Use Memcached : the problem will be the same as StateServer mode and if the memcached server goes down, all my sessions are lost. Furthermore, I think Memcached is not dedicated to store session state ?
Use distributed memcached like ScaleOut (http://highscalability.com/product-scaleout-stateserver-memcached-steroids) : seems to be the best solution but it costs money.
Use repcached and memcached (http://repcached.lab.klab.org/), I've never seen an implementation of that solution.
We could easily go to Ms Azure and use tools provided by it but we have only one application, so if Microsoft doubles the price, we immediately double our infrastructure cost (but that's another subject).
So, what's the best way or at least what's your opinion about this ?
SQL Server session is pretty good. Since you already have a SQL Server database to store your primary data, you can just create another database and store the ASP.NET Session there.
About the scalability, I would say if you have 100,000 concurrent users, then your userbase must be more than 10 millions or more. You should do some practical estimate to see really how long it will take to reach such a concurrent user load. In my previous startup, we had millions of users all around the world, 24x7, but we hardly ever reached 10K concurrent users even though people used our site continuously for hours every day.
If you really have 100,000 concurrent users, license cost would be the least of your worry. With right business model, having 100K concurrent users means you have at least $10M revenue/year.
I have built myoffice.bt.com that uses SQL Server session and all primary data on a single SQL Server instance, but in two databases. Between 8 AM to 10 AM, millions of users hit our site. We hardly have any performance issue. With a dual Core server, 8 GB RAM, you can happily run a SQL Server instance and support such a load as long as you code it right. It all depends on how you have coded. If you have followed performance best practices, you can easily scale to millions of users on a single database server.
Take a look at my performance suggestions from:
http://omaralzabir.com/tag/performance/
I have used memcached clusters only to cache frequently used data. Never used for session for good reasons. There's been several occasions where a memcached server had to be rebooted. If we had used memcached for session, we would have lost all the sessions stored in that instance. So, I would not recommend storing sessions in memcached. But then again, how important is it for your app to maintain data in session? If you have a shopping cart, then as users add products on the cart, it must get persisted in database, not in session. Session is usually for short term storage. For any transactional data, you should never keep it on session, instead store it on relational tables directly.
I am always in support of not using Session. Developers abuse session all the time. Whenever they want to pass data from one page to another, they just put it on the Session. It results in bad design. If you truly want to scale to 100K concurrent user base, design your app to not use session at all. Any transactional data must be stored in database. Cart is a transactional object and thus it's not suitable for holding on Session. At some point you would need to know how many carts get started but never gets placed. So, you will need to store them in database permanently.
Remember, database based session is nothing but databased based serialization. Think very carefully on what you are serializing into database. You will have to clean it up as well since Session_End won't fire for database based session or in fact most of the out of proc sessions. So, essentially you are giving devs ability to just serialize data into database and bypass relational model. It always results in bad coding.
With permanent relational storage, fronted by a high performance cache like memcached, you have much better design to support large user base.
Hope this helps your concerns.

Are there any stable and production quality nosql datastores?

Are there are production quality nosql stores that I can use on a production system. I have looked at cassandra, tokyodb, couchdb etc but none of them seem to be ready for deployments on production like environments. I am talking thousands of requests per minute and lots of reads/writes/updates. My only concern is speed and service times. Does anybody know of production systems that use nosql stores effectively ? Does anybody know of a nosql store that is backed by a big enterprise like Google/Yahoo/ IBM ?
Cassandra handles thousands of requests (including write-mostly workloads) per second, per machine, and its scaling-by-adding-machines has been there since day 1.
Here is a thread about Cassandra use in production and in-production-soon at dozens of companies: http://n2.nabble.com/Cassandra-users-survey-td4040068.html#a4040068
We're also adding more docs all the time, like http://wiki.apache.org/cassandra/Operations.
I think the NoSQL systems are an excellent choice if I you 'only' care about speed and service time (and not or less about stuff like consistency and transactions). Facebook uses Cassandra.
"Cassandra is used in Facebook as an email search system containing 25TB and over 100m mailboxes." http://highscalability.com/product-facebooks-cassandra-massive-distributed-store
I think CouchDb isn't really speedy, maybe you can use MongoDB: http://www.mongodb.org/display/DOCS/Production+Deployments
Also worth consideration is using a traditional RDBMS like MySQL to store schema-less. This method gives you the stability of a proven database server like MySQL with the flexibility a NoSQL solution.
Check out this blog posting on how FriendFeed does this.
BerkeleyDB is backed by Oracle
Using the native C interface one can reach close to 1 million read requests per second.
By the way, when you say thousands requests per minute, any 'normal' DB should be able to handle that easily too.
Redis is worth giving a try as Github uses redis to manage a heavy queue of background jobs.
My first instinct would be BerkeleyDB, with each application node on a SAMBA network to facilitate ACID conformance & network use. It also sports a SQLite interface. Other poster cites MemcacheDB also having BDB inside.
Another unique option would be OrientDB, also has a SQL interface, lots of network & cluster features.

Resources