Is it possible to ensure that requests come from a specific domain? - ruby-on-rails

I'm making a Rails polling site, which should have results that are very accurate. Users vote using POST links. I've taken pains to make sure users only vote once, and know exactly what they're voting for.
But it occurred to me that third parties with an interest in the results could put up POST links on their own websites, that point to my voting paths. They could skew my results this way, for example by adding a misleading description.
Is there any way of making sure that the requests can only come from my domain? So a link coming from a different domain wouldn't run any of the code in my controller.

There are various things that you'll need to check. First is request.referer, which will tell you the page that referred the link to your site. If it's not your site, you should reject it.
if URI(request.referer).host != my_host
raise ArgumentError.new, "Invalid request from external domain"
end
However, this only protects you from web clients (browsers) that accurately populate the HTTP referer header. And that's assuming that it came from a web page at all. For instance, someone could send a link by email, and an email client is unlikely to provide a referer at all.
In the case of no referer, you can check for that, as well:
if request.referer.blank?
raise ArgumentError.new, "Invalid request from unknown domain"
elsif URI(request.referer).host != my_host
raise ArgumentError.new, "Invalid request from external domain"
end
It's also very easy with simple scripting to spoof the HTTP 'referer', so even if you do get a valid domain, you'll need other checks to ensure that it's a legitimate POST. Script kiddies do this sort of thing all the time, and with a dozen or so lines of Ruby, python, perl, curl, or even VBA, you can simulate interaction by a "real user".
You may want to use something like a request/response key mechanism. In this approach, the link served from your site includes a unique key (that you track) for each visit to the page, and that only someone with that key can vote.
How you identify voters is important, as well. Passive identification techniques are good for non-critical activities, such as serving advertisements or making recommendations. However, this approach regularly fails a measurable percentage of the time when used across the general population. When you also consider the fact that people actually want to corrupt voting activities, it's very easy to suddenly become a target for everyone with a good concept to "beat the system" and some spare time on their hands.
Build in as much security as possible early on, because you'll need far more than you expect. During the 2012 Presidential Election, I was asked to pre-test 41 online voting sites, and was able to break 39 of them within the first 24 hours (6 of them within 1 hour). Be overly cautious. Know how attackers can get in, not just using "normal" mechanisms. Don't publish information about which technologies you're using, even in the code. Seeing "Rails-isms" anywhere in the HTML or Javascript code (or even the URL pathnames) will immediately give the attacker an enormous edge in defeating your safety mechanisms. Use obscurity to your advantage, and use security everywhere that you can.
NOTE: Checking the request.referer is like putting a padlock on a bank vault: it'll keep out those that are easily dissuaded, but won't even slow down the determined individual.

What you are trying to prevent here is basically cross-site request forgery. As Michael correctly pointed out, checking the Referer header will buy you nothing.
A popular counter-measure is to give each user an individual one-time token that is sent with each form and stored in the user's session. If, on submit, the submitted value and the stored value do not match, the request is disgarded. Luckily for you, RoR seems to ship such a feature. Looks like a one-liner indeed.

Related

Verifying Googlebot in Rails

I am looking to implement First Click Free in my rails application. Google has this information on how to verify a if a googlebot is viewing your site here.
I have been searching to see if there is anything existing for Rails to do this but I have been unable to find anything. So firstly, does anyone know of anything? If not, could anyone point me in the right direction of how to go about implementing what they have suggested in that page about how to verify?
Also, in that solution, it has to do a lookup every time to try and detect google, that seems like its going to be a big performance hit if I have to do it every page load? I could cache the IP if it has been verified in the past but Google have stated that their IP's change so at some point it may no longer belong to them. Although it probably doesn't happen regularly so it may not be that big of an issue.
Many thanks!!
Check out the browser gem: https://github.com/fnando/browser
What I'd do is use the
browser.bot?
method to check if your site is being accessed by a bot or not. If you care about the Googlebot specifically, you could check if
browser.name
includes googlebot. Keep in mind that this gem just checks the user agent sent by the client's browser, which could of course be spoofed. Sounds like that isn't a huge concern for your purposes.
I've built a Ruby gem for that recently, it's called "legitbot".
You may learn if a Web request comes from a supported bot using
bot = Legitbot.bot(userAgent, ip)
"legitbot" does this looking into User-agent and searching for a bot signature, i.e. how bots identify themselves. This doesn't guarantee that the Web request IP really comes from e.g. Googlebot. To make sure it is, call
bot.detected_as # => "Google"
bot.valid? # => true
bot.fake? # => false
Supported bots are Googlebot, Yandex bots, Bing, Baidu, DuckDuckGo.

How to block requests to server with user name / password?

We have realized that this URL http://Keyword:redacted#example.com/ redirects to http://example.com/ when copied and pasted into the browser's address bar.
As far as I understand this might be used in some ftp connections but we have no such use on our website. We are suspecting that we are targeted by an attack and have been warned by Google that we are passing PII (mostly email addresses) in our URL requests to their Google Adsense network. We have not been able to find the source, but we have been warned that the violation is in the form of http://Keyword:redacted#example.com/
How can we stop this from happening?
What URL redirect method we can use to not accept this and return an error message?
FYI I experienced a similar issue for a client website and followed up with Adsense support. The matter was escalated to a specialist team who investigated and determined that flagged violations with the format http://Keyword:redacted#example.com/ will be considered false positives. I'm not sure if this applies to all publishers or was specific to our case, but it might be worth following up with Adsense support.
There is nothing you can do. This is handled entirely by your browser long before it even thinks about "talking" to your server.
That's a strange URL for people to copy/paste into the browser's address bar unless they have been told/trained to do so. Your best bet is to tell them to STOP IT! :-)
I suppose you could look at the HTTP Authorization Headers and report an error if they come in populated... (This would $_SERVER['PHP_AUTH_USER'] in PHP.) I've never looked at these values when the header doesn't request them, so I'm not sure if it would work or not...
The syntax http://abc:def#something.com means you're sending userid='abc', password='def' as basic authentication parameters. Your browser will pull out the userid & password and send them along as authentication information, leaving the url without them.
As Peter Bowers mentioned, you could check the authorization headers and see if they're coming in that way, but you can't stop others from doing it if they want. If it happens a lot then I'd suspect that somewhere there's a web form asking users to enter their user/password and it's getting encoded that way. One way to sleuth it out would be to see if you can identify someone by the userid specified.
Having Keyword:redacted sounds odd. It's possible Google Adsense changed the values to avoid including confidential info.

HTTPS POST Security level

I've searched for this a bit on Stack, but I cannot find a definitive answer for https, only for solutions that somehow include http or unencrypted parameters which are not present in my situation.
I have developed an iOS application that communicates with MySQL via Apache HTTPS POSTS and php.
Now, the server runs with a valid certificate, is only open for traffic on port 443 and all posts are done to https://thedomain.net/obscurefolder/obscurefile.php
If someone knew the correct parameters to post, anyone from anywhere in the world could mess up the database completely, so the question is: Is this method secure? Let it be known nobody has access to the source code and none of the iPads that run this software are jailbreaked or otherwise compromised.
Edit in response to answers:
There are several php files which alone only support one specific operation and depend on very strict input formatting and correct license key (retreived by SQL on every query). They do not respond to input at all unless it's 100% correct and has a proper license (e.g. password) included. There is no actual website, only php files that respond to POSTs, given the correct input, as mentioned above. The webserver has been scanned by a third party security company and contains no known vulnerabilities.
Encryption is necessary but not sufficient for security. There are many other considerations beyond encrypting the connection. With server-side certificates, you can confirm the identity of the server, but you can't (as you are discovering) confirm the identity of the clients (at least not without client-side certficates which are very difficult to protect by virtue of them being on the client).
It sounds like you need to take additional measures to prevent abuse such as:
Only supporting a sane, limited, well-defined set of operations on the database (not passing arbitrary SQL input to your database but instead having a clear, small list of URL handlers that perform specific, reasonable operations on the database).
Validating that the inputs to your handler are reasonable and within allowable parameters.
Authenticating client applications to the best you are able (e.g. with client IDs or other tokens) to restrict the capabilities on a per-client basis and detect anomalous usage patterns for a given client.
Authenticating users to ensure that only authorized users can make the appropriate modifications.
You should also probably get a security expert to review your code and/or hire someone to perform penetration testing on your website to see what vulnerabilities they can uncover.
Sending POST requests is not a secure way of communicating with a server. Inspite of no access to code or valid devices, it still leaves an open way to easily access database and manipulating with it once the link is discovered.
I would not suggest using POST. You can try / use other communication ways if you want to send / fetch data from the server. Encrypting the parameters can also be helpful here though it would increase the code a bit due to encryption-decryption logic.
Its good that your app goes through HTTPS. Make sure the app checks for the certificates during its communication phase.
You can also make use of tokens(Not device tokens) during transactions. This might be a bit complex, but offers more safety.
The solutions and ways here for this are broad. Every possible solution cannot be covered. You might want to try out a few yourself to get an idea. Though I Suggest going for some encryption-decryption on a basic level.
Hope this helps.

Client Server API pattern in REST (unreliable network use case)

Let's assume we have a client/server interaction happening over unreliable network (packet drop). A client is calling server's RESTful api (over http over tcp):
issuing a POST to http://server.com/products
server is creating an object of "product" resource (persists it to a database, etc)
server is returning 201 Created with a Location header of "http://server.com/products/12345"
! TCP packet containing an http response gets dropped and eventually this leads to a tcp connection reset
I see the following problem: the client will never get an ID of a newly created resource yet the server will have a resource created.
Questions: Is this application level behavior or should framework take care of that? How should a web framework (and Rails in particular) handle a situation like that? Are there any articles/whitepapers on REST for this topic?
The client will receive an error when the server does not respond to the POST. The client would then normally re-issue the request as they assume that it has failed. Off the top of my head I can think of two approaches to this problem.
One is that the client can generate some kind of request identifier, such as a guid, which it includes in the request. If the server receives a POST request with a duplicate GUID then it can refuse it.
The other approach is to PUT instead of POST to create. If you cannot get the client to generate the URI then you can ask the server to provide a new URI with a GET and then do a PUT to that URI.
If you search for something like "make POST idempotent" you will probably find a bunch of other suggestions on how to do this.
If it isn't reasonable for duplicate resources to be created (e.g. products with identical titles, descriptions, etc.), then unique identifiers can be generated on the server which can be tracked against created resources to prevent duplicate requests from being processed. Unlike Darrel's suggestion of generating unique IDs on the client, this would also prevent separate users from creating duplicate resources (which you may or may not find desirable). Clients will be able to distinguish between "created" responses and "duplicate" responses by their response codes (201 and 303 respectively, in my example below).
Pseudocode for generating such an identifier — in this case, a hash of a canonical representation of the request:
func product_POST
// the canonical representation need not contain every field in
// the request, just those which contribute to its "identity"
tags = join sorted request.tags
canonical = join [request.name, request.maker, tags, request.desc]
id = hash canonical
if id in products
http303 products[id]
else
products[id] = create_product_from request
http201 products[id]
end
end
This ID may or may not be part of the created resources' URIs. Personally, I'd be inclined to track them separately — at the cost of an extra lookup table — if the URIs were going to be exposed to users, as hashes tend to be ugly and difficult for humans to remember.
In many cases, it also makes sense to "expire" these unique hashes after some time. For example, if you were to make a money transfer API, a user transferring the same amount of money to the same person a few minutes apart probably indicates that the client never received the "success" response. If a user transfers the same amount of money to the same person once a month, on the other hand, they're probably paying their rent. ;-)
The problem as you describe it boils down to avoiding what are called double-adds. As mentioned by others, you need to make your posts idempotent.
This can be easily implemented at the framework level. The framework can keep a cache of completed responses. The requests have to have a request unique so that any retries are treated as such, and not as new requests.
If the successful response gets lost on its way to the client, the client will retry with the same request unique, the server will then respond with its cached response.
You are left with durability of the cache, how long to keep responses, etc. One approach is to remove responses from the server cache after a given period of time, this will depend on your app domain and traffic and can be left as a configurable step on the framework piece. Another approach is to force the client to sent acknowledgements. The acks can be sent either as separate requests (note that these could be lost too), or as extra data piggy backed on real requests.
Although what I suggest is similar to what others suggest, I strongly encourage you to keep this layer of network resiliency to do only that, deal with drop requests/responses and not allow it to deal with duplicate resources from separate requests which is an application level task. Merging both pieces will mush all functionality and will not leave you with a clear separation of responsibilities.
Not an easy problem, but if you keep it clean you can make your app much more resilient to bad networks without introducing too much complexity.
And for some related experiences by others go here.
Good luck.
As the other responders have pointed out, the basic problem here is that the standard HTTP POST method is not idempotent like the other methods. There is an effort underway to establish a standard for an idempotent POST method known as Post-Once-Exactly, or POE.
Now I'm not saying that this is a perfect solution for everybody in the situation you describe, but if it is the case that you are writing both the server and the client, you may be able to leverage some of the ideas from POE. The draft is here: https://datatracker.ietf.org/doc/html/draft-nottingham-http-poe-00
It isn't a perfect solution, which is probably why it hasn't really taken off in the six years since the draft was submitted. Some of the problems, and some clever alternate options are discussed here:
http://tech.groups.yahoo.com/group/rest-discuss/message/7646
HTTP is a stateless protocol, meaning the server can't open an HTTP connection. All connections get initialized by the client. So you can't solve such an error on the server side.
The only solution I can think of: If you know, which client created the product, you can supply it the products it created, if it pulls that information. If the client never contacts you again, you won't be able to transmit information about the new product.

Implementing a 2 Legged OAuth Provider

I'm trying to find my way around the OAuth spec, its requirements and any implementations I can find and, so far, it really seems like more trouble than its worth because I'm having trouble finding a single resource that pulls it all together. Or maybe it's just that I'm looking for something more specialized than most tutorials.
I have a set of existing APIs--some in Java, some in PHP--that I now need to secure and, for a number of reasons, OAuth seems like the right way to go. Unfortunately, my inability to track down the right resources to help me get a provider up and running is challenging that theory. Since most of this will be system-to-system API usage, I'll need to implement a 2-legged provider. With that in mind...
Does anyone know of any good tutorials for implementing a 2-legged OAuth provider with PHP?
Given that I have securable APIs in 2 languages, do I need to implement a provider in both or is there a way to create the provider as a "front controller" that I can funnel all requests through?
When securing PHP services, for example, do I have to secure each API individually by including the requisite provider resources on each?
Thanks for your help.
Rob, not sure where you landed on this but wanted to add my 2 cents in case anyone else ran across this question.
I more or less had the same question a few months ago and hearing about "OAuth" for the better part of a year. I was developing a REST API I needed to secure so I started reading about OAuth... and then my eyes started to roll backwards in my head.
I probably gave it a good solid day or 2 of skimming and reading until I decided, much like you, that OAuth was confusing garbage and just gave up on it.
So then I started researching ways to secure APIs in general and started to get a better grasp on ways to do that. The most popular way seemed to be sending requests to the API along with a checksum of the entire message (encoded with a secret that only you and the server know) that the server can use to decide if the message had been tampered with on it's way from the client, like so:
Client sends /user.json/123?showFriends=true&showStats=true&checksum=kjDSiuas98SD987ad
Server gets all that, looks up user "123" in database, loads his secret key and then (using the same method the client used) re-calculates it's OWN checksum given the request arguments.
If the server's generated checksum and the client's sent checksum match up, the request is OK and executed, if not, it is considered tampered with and rejected.
The checksum is called an HMAC and if you want a good example of this, it is what Amazon Web Services uses (they call the argument 'signature' not 'checksum' though).
So given that one of the key components of this to work is that the client and server have to generate the HMAC in the same fashion (otherwise they won't match), there have to be rules on HOW to combine all the arguments... then I suddenly understood all that "natural byte-ordering of parameters" crap from OAuth... it was just defining the rules for how to generate the signature because it needed to.
Another point is that every param you include in the HMAC generation is a value that then can't be tampered with when you send the request.
So if you just encode the URI stem as the signature, for example:
/user.json == askJdla9/kjdas+Askj2l8add
then the only thing in your message that cannot be tampered with is the URI, all of the arguments can be tampered with because they aren't part of the "checksum" value that the server will re-calculate.
Alternatively, even if you include EVERY param in the calculation, you still run the risk of "replay attacks" where a malicious middle man or evesdropped can intercept an API call and just keep resending it to the server over and over again.
You can fix that by adding a timestamp (always use UTC) in the HMAC calculation as well.
REMINDER: Since the server needs to calculate the same HMAC, you have to send along any value you use in the calculation EXCEPT YOUR SECRET KEY (OAuth calls it a consumer_secret I think). So if you add timestamp, make sure you send a timestamp param along with your request.
If you want to make the API secure from replay attacks, you can use a nonce value (it's a 1-time use value the server generates, gives to the client, the client uses it in the HMAC, sends back the request, the server confirms and then marks that nonce value as "used" in the DB and never lets another request use it again).
NOTE: 'nonce' are a really exact way to solve the "replay attack" problem -- timestamps are great, but because computers don't always have in-sync timestamp values, you have to allow an acceptable window on the server side of how "old" a request might be (say 10 mins, 30 mins, 1hr.... Amazon uses 15mins) before we accept or reject it. In this scenario your API is technically vulnerable during the entire window of time.
I think nonce values are great, but should only need to be used in APIs that are critical they keep their integrity. In my API, I didn't need it, but it would be trivial to add later if users demanded it... I would literally just need to add a "nonce" table in my DB, expose a new API to clients like:
/nonce.json
and then when they send that back to me in the HMAC calculation, I would need to check the DB to make sure it had never been used before and once used, mark it as such in the DB so if a request EVER came in again with that same nonce I would reject it.
Summary
Anyway, to make a long story short, everything I just described is basically what is known as "2-legged OAuth". There isn't that added step of flowing to the authority (Twitter, Facebook, Google, whatever) to authorize the client, that step is removed and instead the server implicitly trusts the client IF the HMAC's they are sending match up. That means the client has the right secret_key and is signing it's messages with it, so the server trusts it.
If you start looking around online, this seems to be the preferred method for securing API methods now-adays, or something like it. Amazon almost exactly uses this method except they use a slightly different combination method for their parameters before signing the whole thing to generate the HMAC.
If you are interested I wrote up this entire journey and thought-process as I was learning it. That might help provide a guided thinking tour of this process.
I would take a step back and think about what a properly authenticated client is going to be sending you.
Can you store the keys and credentials in a common database which is accessible from both sets of services, and just implement the OAuth provider in one language? When the user sends in a request to a service (PHP or Java) you then check against the common store. When the user is setting up the OAuth client then you do all of that through either a PHP or Java app (your preference), and store the credentials in the common DB.
There are some Oauth providers written in other languages that you might want to take a look at:
PHP - http://term.ie/oauth/example/ (see bottom of page)
Ruby - http://github.com/mojodna/sample-oauth-provider
.NET http://blog.bittercoder.com/PermaLink,guid,0d080a15-b412-48cf-b0d4-e842b25e3813.aspx

Resources