How does Rails handle multiple incoming requests? [closed] - ruby-on-rails

How does Rails handle multiple requests from different users without colliding? What is the logic?
E.g., user1 logs in and browses the site. At the same time, user2, user3, ... log in and browse. How does Rails manage this situation without any data conflict between users?

One thing to bear in mind here is that even though users are using the site simultaneously in their browsers, the server may still only be handling a single request at a time. Request processing may take less than a second, so requests can be queued up and processed without causing significant delays for the users. Each response starts from a blank slate, taking only information from the request and using it to look up data from the database. It does not carry anything over from one request to the next. This is called the "stateless" paradigm.
If the load increases, more Rails servers can be added. Because each response starts from scratch anyway, adding more servers doesn't create any problems with "sharing of information", since all information is either sent in the request or loaded from the database. It just means that more requests can be handled per second.
When there is a feeling of "continuity" for the user, for example staying logged into a website, this is done via cookies, which are stored on their machine and sent through as part of the request. The server can read this cookie information from the request and, for example, NOT redirect someone to the login page because the cookie tells it they have already logged in as user 123 or whatever.
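As a concrete (hedged) sketch of that cookie check in Rails terms; the helper names here (require_login, current_user, login_path) are just examples, not anything prescribed above:

class ApplicationController < ActionController::Base
  before_action :require_login

  private

  # session[] is backed by the cookie sent with every request, so each
  # request independently re-establishes who, if anyone, is logged in.
  def current_user
    @current_user ||= User.find_by(id: session[:user_id])
  end

  def require_login
    redirect_to login_path unless current_user
  end
end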

In case your question is about how Rails differentiates between users, the answer is that it uses cookies to store the session. You can read more about it here.
Also, data does not conflict, since you get a fresh instance of the controller for each request.
RailsGuides:
When your application receives a request, the routing will determine which controller and action to run, then Rails creates an instance of that controller and runs the method with the same name as the action.
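As a small sketch of what that quote describes (route and model names invented for the example):

# config/routes.rb
get "/products/:id", to: "products#show"

# app/controllers/products_controller.rb
class ProductsController < ApplicationController
  # Rails builds a brand-new ProductsController instance for every request,
  # so instance variables such as @product never leak between users.
  def show
    @product = Product.find(params[:id])
  end
end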

That is guaranteed not by Rails but by the database that the webservice uses. The property you mentioned is called isolation. This is among several properties that a practical database has to satisfy, known as ACID.
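A hedged illustration of leaning on the database for this, from the Rails side (the models and amounts are invented for the example):

# Everything inside the block runs in one database transaction.
# The database's isolation guarantees (the "I" in ACID) keep other
# concurrent requests from observing a half-finished transfer.
ActiveRecord::Base.transaction do
  source.update!(balance: source.balance - 100)
  target.update!(balance: target.balance + 100)
end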

This is achieved using a "session": a bunch of data specific to the given client, available server-side.
There are plenty of ways for a server to store a session; typically Rails uses a cookie: a small (typically around 4 kB) dataset that is stored in the user's browser and sent with every request. For that reason you don't want to store too much in there. However, you usually don't need much: just enough to identify the user while still making it hard to impersonate them.
Because of that, Rails stores the session itself in the cookie (as this guide says). It's simple and requires no setup. Some consider the cookie store unreliable and use persistence mechanisms instead: databases, key-value stores and the like.
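For reference, which store is used is a one-line configuration; a minimal sketch (the cookie key name is just an example):

# config/initializers/session_store.rb
# Default: the whole session lives in the (signed/encrypted) cookie itself.
Rails.application.config.session_store :cookie_store, key: "_myapp_session"

# Alternative: keep only the session id in the cookie and the data server-side.
# Rails.application.config.session_store :cache_store, key: "_myapp_session"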
Typically the workflow is as follows:
A session id is stored in a cookie when the server decides to initialize a session
The server receives a request from the user and fetches the session by its id
If the session says that it represents user X, Rails acts as if it's actually that user
Since different users send different session ids, Rails treats them as different users and returns only the data relevant to whichever one it detects, on a per-request basis.
Before you ask: yes, it is possible to steal the other person's session id and act in that person's name. It's called session hijacking and it's only one of all the possible security issues you might run into unless you're careful. That same page offers some more insight on how to prevent your users from suffering.

As an additional option, you could use something like Puma, a multithreaded server, so that a single process can handle several requests concurrently.
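A minimal sketch of what that looks like in Puma's configuration file (the numbers are placeholders, not recommendations):

# config/puma.rb
workers 2      # separate worker processes
threads 1, 5   # min/max threads per worker; each thread serves one request at a time
port 3000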

Related

Is there a canonical pattern for caching something related to a session on the server?

In my Rails app, once per user session, I need to have my server send a request to one of our other services to get some data about the user. I only want to make this request once per session because pinging another service every time the user makes a request will significantly slow down our response time. However, I can't store this information in a cookie client-side. This information has some security implications - if the user has the ability to lie to our server about what this piece of information is, they can gain access to data they're not authorized to see.
So what is the best way to cache or store a piece of data associated with a session on the Rails server?
I'm considering using Rails low-level caching, and I think it might even be correct:
Rails.cache.fetch(session.id, expires_in: 12.hours) do
  OtherServiceAPI.get_sensitive_data(user.id)
end
I know that Rails often has one canonical way of doing things, though, so I want to be sure there's not a built-in, officially preferred way to associate a piece of data with a session. This question makes it look like there are potential pitfalls using the approach I'm considering as well, although it looks like those concerns may have been made obsolete in newer versions of Rails.
Is there a canonical pattern for what I'm trying to do? Or is the approach I'm considering idiomatic enough?

MVC 5 how to achieve POST that behaves like a redirect to GET with content

My client redirects to a https://domain.com/Controller/GetInfo?Querystring method. Now my query string is getting dangerously close to the 2K limit, so I need to reproduce this behavior but pack my query string into the content of the messages. Since it would be heresy (etc.) to try a GET with content, I'll use a POST. However, I can't redirect to a POST since a Redirect has no content.
So, what I am looking for is the best MVC 5 pattern to resolve this: I need to provide lots of content, but I want the resulting page hosted on my remote server (i.e. as if I had redirected)
Also, since I use load balanced servers in azure, I'd prefer maintaining my clean stateless server if at all possible (else I'll have to introduce session caching).
@AntP is absolutely right in the comments above. If your query string is approaching 2K, then you're abusing it.
If there's a particular object you're referencing, then you can simply include the id or some other identifying piece of it and use that to look it up again from your data store.
If there's no persistent record of the object, then you can use something like Session or TempData to store it between one request and the next.
Regardless, it's not possible to redirect with a request body, which also means it's not possible to redirect using POST. The reason for this is that a redirect is not something the server does, but rather something the client does. The server merely suggests that the client go to a different URL. It's then up to the client (web browser) to issue a new request for that URL. Since the client is the one issuing the request, it makes the decision about what data is or isn't included in that request, not the server.

How to properly handle asynchronous database replication?

I'm considering using Amazon RDS with read replicas to scale our database.
Some of our controllers in our web application are read/write, some of them are read-only. We already have an automated way for identifying which controllers are read-only, so my first approach would have been to open a connection to the master when requesting a read/write controller, else open a connection to a read replica when requesting a read-only controller.
In theory, that sounds good. But then I stumbled upon the replication lag concept, which basically means that a replica can be several seconds behind the master.
Let's imagine the following use case then:
The browser posts to /create-account, which is read/write, thus connecting to the master
The account is created, transaction committed, and the browser gets redirected to /member-area
The browser opens /member-area, which is read-only, thus connecting to a replica. If the replica is even slightly behind the master, the user account might not exist yet on the replica, thus resulting in an error.
How do you realistically use read replicas in your application, to avoid these potential issues?
I worked with an application which used pseudo-vertical partitioning. Since only a handful of the data was time-sensitive, the application usually fetched from the slaves, and from the master only in selected cases.
As an example: when the user updated their password, the application would always ask the master when authenticating. When changing non-time-sensitive data (like user preferences) it would display a success dialog along with a note that it might take a while until everything is updated.
Some other ideas which might or might not work depending on the environment:
After an update, compute an entity checksum, store it in the application cache, and when fetching the data always check it against that checksum
Use browser storage or a cookie to store the delta, ensuring the user always sees the latest version
Add an "up-to-date" flag and invalidate it synchronously on every slave node before/after an update
Whatever solution you choose, keep in mind that it's subject to the CAP theorem.
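For completeness, newer Rails versions (6+) have built-in multiple-database support that maps well onto the read-only/read-write controller split described in the question; a rough sketch, assuming a replica configured under the reading role and an invented Member model:

class MembersController < ApplicationController
  def index
    # Run this read-only action against the replica; writes elsewhere
    # keep using the default writing role.
    ActiveRecord::Base.connected_to(role: :reading) do
      @members = Member.order(:name).to_a   # to_a forces the query inside the block
    end
  end
end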
This is a hard problem, and there are lots of potential solutions. One potential solution is to look at what Facebook did:
TLDR - read requests get routed to the read only copy, but if you do a write, then for the next 20 seconds, all your reads go to the writeable master.
The other main problem we had to address was that only our master databases in California could accept write operations. This fact meant we needed to avoid serving pages that did database writes from Virginia because each one would have to cross the country to our master databases in California. Fortunately, our most frequently accessed pages (home page, profiles, photo pages) don't do any writes under normal operation. The problem thus boiled down to, when a user makes a request for a page, how do we decide if it is "safe" to send to Virginia or if it must be routed to California?
This question turned out to have a relatively straightforward answer. One of the first servers a user request to Facebook hits is called a load balancer; this machine's primary responsibility is picking a web server to handle the request but it also serves a number of other purposes: protecting against denial of service attacks and multiplexing user connections to name a few. This load balancer has the capability to run in Layer 7 mode where it can examine the URI a user is requesting and make routing decisions based on that information. This feature meant it was easy to tell the load balancer about our "safe" pages and it could decide whether to send the request to Virginia or California based on the page name and the user's location.
There is another wrinkle to this problem, however. Let's say you go to editprofile.php to change your hometown. This page isn't marked as safe so it gets routed to California and you make the change. Then you go to view your profile and, since it is a safe page, we send you to Virginia. Because of the replication lag we mentioned earlier, however, you might not see the change you just made! This experience is very confusing for a user and also leads to double posting. We got around this concern by setting a cookie in your browser with the current time whenever you write something to our databases. The load balancer also looks for that cookie and, if it notices that you wrote something within 20 seconds, will unconditionally send you to California. Then when 20 seconds have passed and we're certain the data has replicated to Virginia, we'll allow you to go back for safe pages.
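For what it's worth, Rails 6+ ships middleware that implements essentially this same "stick to the writer for N seconds after a write" trick; a minimal configuration sketch:

# config/application.rb
config.active_record.database_selector = { delay: 20.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
# GET/HEAD requests go to the replica, unless the same session performed a
# write within the last 20 seconds, in which case they go to the primary.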

How to avoid Race Conditions when a restful architecture doesn't make sense

I'm making a website that I don't think makes sense to implement with a restful architecture (at least not the portion relevant to this problem), but it's causing some problems with race conditions across multiple servers which share a database.
My website has info about users of another product, so it has a Users table (not users of my site though). Users have many files.
Users and files are populated by an automated service, not manually on the site. The service posts the files to the server; the server parses them and gets the username from the file. If the username is new, it creates a new user row in the table. It then returns information about the file to the service that made the request.
The problems I'm seeing are race conditions when multiple requests come in at the same time for related objects, which causes things like violations of unique indexes in the db.
For example, there is a unique key on username. This code can be a problem if 2 requests from the automated service for files from the same user come in at the same time.
var myuser = db.users.FirstOrDefault(u => u.username == username);
if (myuser == null)
{
    myuser = new user(username);
    db.AddObject(myuser);
}
db.SaveChanges();
Request 1 will see that there is no user with username foo, so the if condition returns true. Request 2 sees the same thing, not knowing that request 1 already began creating the user, and when request 2 tries to save, it violates the unique key.
Is there a common pattern or solution to this problem? I know this wouldn't be a problem if the server was RESTful, but I don't think it's really feasible for the service to change the way it makes requests, so I'd like that to stay the same if possible. Right now, it just posts the file to the server, not knowing whether the user of that file existed already, or whether that file was posted to the server yet (it may post it more than once). Those objects are created if they don't exist yet, and if they do, the list of items is updated. But as far as the service is concerned, it just wants to know certain info about the file, and isn't concerned with whether or not it already exists in my db.
I think it'd be too slow for it to try to create a user via a request, then try to create the file via a request, and then request info about the file in another request. Also, the service runs multiple requests at a time via Parallel.ForEach, and it'd be too slow for it to run it in a single thread.
The first thing is separation of concerns. If you have an automated service populating the data, then that service (or another piece of middleware) should be responsible for creating the database records. This shouldn't happen at run time in response to a request to your website.
Second, if you must do it this way, that is what locks are for. Each request to your website runs in its own thread(s). So, if multiple threads need to access the same volatile resource (your DB), then you need to institute locking, so that the first thread in wins and any further threads will only be able to interact with that table or row (depending on the type of lock) once the first has completed its work.
Third, this is pretty much exactly what RESTful architecture attempts to solve. You can use ETags to version your resources so any attempt to POST to an outdated resource will return an HTTP error (409 Conflict) directing the client to refetch the original resource.
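For comparison, the equivalent pattern in Rails/ActiveRecord (since the rest of this page is Rails-centric) is to let the unique index be the arbiter and retry on a uniqueness violation; a hedged sketch, assuming a unique index on users.username:

begin
  myuser = User.find_or_create_by(username: username)
rescue ActiveRecord::RecordNotUnique
  # Another request inserted the same username between our SELECT and INSERT;
  # the unique index caught it, so just load the row that won the race.
  myuser = User.find_by!(username: username)
end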

Client Server API pattern in REST (unreliable network use case)

Let's assume we have a client/server interaction happening over an unreliable network (packet drop). A client is calling the server's RESTful API (over HTTP over TCP):
issuing a POST to http://server.com/products
server is creating an object of "product" resource (persists it to a database, etc)
server is returning 201 Created with a Location header of "http://server.com/products/12345"
! TCP packet containing an http response gets dropped and eventually this leads to a tcp connection reset
I see the following problem: the client will never get the ID of the newly created resource, yet the server will have created the resource.
Questions: Is this application level behavior or should framework take care of that? How should a web framework (and Rails in particular) handle a situation like that? Are there any articles/whitepapers on REST for this topic?
The client will receive an error when the server does not respond to the POST. The client would then normally re-issue the request as they assume that it has failed. Off the top of my head I can think of two approaches to this problem.
One is that the client can generate some kind of request identifier, such as a guid, which it includes in the request. If the server receives a POST request with a duplicate GUID then it can refuse it.
The other approach is to PUT instead of POST to create. If you cannot get the client to generate the URI then you can ask the server to provide a new URI with a GET and then do a PUT to that URI.
If you search for something like "make POST idempotent" you will probably find a bunch of other suggestions on how to do this.
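A hedged Ruby sketch of the first suggestion (the header name, column, and model are invented for the example; it also assumes a unique index on client_request_id):

class ProductsController < ApplicationController
  def create
    request_id = request.headers["X-Request-Id"]   # GUID generated by the client

    if (existing = Product.find_by(client_request_id: request_id))
      # Retried POST after a lost response: point back at the original resource.
      render json: existing, status: :ok, location: product_url(existing)
    else
      product = Product.create!(
        params.permit(:name, :maker, :desc).merge(client_request_id: request_id)
      )
      render json: product, status: :created, location: product_url(product)
    end
  end
end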
If it isn't reasonable for duplicate resources to be created (e.g. products with identical titles, descriptions, etc.), then unique identifiers can be generated on the server which can be tracked against created resources to prevent duplicate requests from being processed. Unlike Darrel's suggestion of generating unique IDs on the client, this would also prevent separate users from creating duplicate resources (which you may or may not find desirable). Clients will be able to distinguish between "created" responses and "duplicate" responses by their response codes (201 and 303 respectively, in my example below).
Pseudocode for generating such an identifier — in this case, a hash of a canonical representation of the request:
func product_POST
    // the canonical representation need not contain every field in
    // the request, just those which contribute to its "identity"
    tags = join sorted request.tags
    canonical = join [request.name, request.maker, tags, request.desc]
    id = hash canonical
    if id in products
        http303 products[id]
    else
        products[id] = create_product_from request
        http201 products[id]
    end
end
This ID may or may not be part of the created resources' URIs. Personally, I'd be inclined to track them separately — at the cost of an extra lookup table — if the URIs were going to be exposed to users, as hashes tend to be ugly and difficult for humans to remember.
In many cases, it also makes sense to "expire" these unique hashes after some time. For example, if you were to make a money transfer API, a user transferring the same amount of money to the same person a few minutes apart probably indicates that the client never received the "success" response. If a user transfers the same amount of money to the same person once a month, on the other hand, they're probably paying their rent. ;-)
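A rough Ruby rendering of that pseudocode, assuming an invented Product model with a uniquely indexed request_digest column (all names here are for illustration only):

require "digest"

def create
  # the canonical representation need not contain every field in
  # the request, just those which contribute to its "identity"
  tags = Array(params[:tags]).sort.join(",")
  canonical = [params[:name], params[:maker], tags, params[:desc]].join("|")
  digest = Digest::SHA256.hexdigest(canonical)

  if (existing = Product.find_by(request_digest: digest))
    redirect_to product_url(existing), status: :see_other   # 303 duplicate
  else
    product = Product.create!(name: params[:name], maker: params[:maker],
                              desc: params[:desc], request_digest: digest)
    render json: product, status: :created                  # 201 created
  end
end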
The problem as you describe it boils down to avoiding what are called double-adds. As mentioned by others, you need to make your POSTs idempotent.
This can be easily implemented at the framework level. The framework can keep a cache of completed responses. Each request has to carry a unique request identifier so that any retries are recognized as such, and not treated as new requests.
If the successful response gets lost on its way to the client, the client will retry with the same request identifier, and the server will then respond with its cached response.
You are left with questions about the durability of the cache, how long to keep responses, and so on. One approach is to remove responses from the server cache after a given period of time; this will depend on your app domain and traffic, and can be left as a configurable setting in the framework piece. Another approach is to force the client to send acknowledgements. The acks can be sent either as separate requests (note that these could be lost too), or as extra data piggybacked on real requests.
Although what I suggest is similar to what others suggest, I strongly encourage you to keep this layer of network resiliency to do only that: deal with dropped requests/responses, and not let it deal with duplicate resources from separate requests, which is an application-level task. Merging both pieces will mush all the functionality together and will not leave you with a clear separation of responsibilities.
Not an easy problem, but if you keep it clean you can make your app much more resilient to bad networks without introducing too much complexity.
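As a hedged illustration of that framework-level layer, here is a toy Rack middleware (the header name and in-memory cache are placeholders; a real version would need expiry, persistence, and careful handling of response bodies):

class IdempotentReplay
  def initialize(app, cache = {})
    @app   = app
    @cache = cache   # request id => [status, headers, body]
  end

  def call(env)
    request_id = env["HTTP_X_REQUEST_ID"]
    return @app.call(env) unless request_id   # no id supplied: behave normally

    # Replay the cached response for a retried request id,
    # otherwise process the request and remember its response.
    @cache[request_id] ||= @app.call(env)
  end
end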
And for some related experiences by others go here.
Good luck.
As the other responders have pointed out, the basic problem here is that the standard HTTP POST method is not idempotent like the other methods. There is an effort underway to establish a standard for an idempotent POST method known as Post-Once-Exactly, or POE.
Now I'm not saying that this is a perfect solution for everybody in the situation you describe, but if it is the case that you are writing both the server and the client, you may be able to leverage some of the ideas from POE. The draft is here: https://datatracker.ietf.org/doc/html/draft-nottingham-http-poe-00
It isn't a perfect solution, which is probably why it hasn't really taken off in the six years since the draft was submitted. Some of the problems, and some clever alternate options are discussed here:
http://tech.groups.yahoo.com/group/rest-discuss/message/7646
HTTP is a client-driven request/response protocol, meaning the server can't open an HTTP connection to the client on its own; all connections get initiated by the client. So you can't solve such an error purely on the server side.
The only solution I can think of: if you know which client created the product, you can supply it with the products it created, provided it pulls that information. If the client never contacts you again, you won't be able to transmit information about the new product.
