I have added a custom domain to my Heroku application and it works fine, but the application still responds to {mysubdomain}.herokuapp.com.
To prevent duplicate content I would like to stop having my application respond to the subdomain. Is there some setting in Heroku which does this for me, or do I need to code a 301 redirect?
Another option is to use the rel="canonical" link tag. This tells search engines which URL to use for content that may appear on multiple URLs:
<link rel="canonical" href="http://www.example.com/correct_url">
Here's what Google has to say: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
(Your use case is explicitly mentioned at the bottom.)
You would need a 301 redirect; Heroku will always respond on the .herokuapp.com domain of your app.
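As a minimal sketch of such a redirect in Rails (www.example.com is a placeholder for your canonical custom domain):
# app/controllers/application_controller.rb -- a sketch, not the one true way
class ApplicationController < ActionController::Base
  before_filter :redirect_herokuapp_domain

  private

  def redirect_herokuapp_domain
    if request.host.end_with?('.herokuapp.com')
      # use request.request_uri instead of request.fullpath on Rails 2
      redirect_to "http://www.example.com#{request.fullpath}", :status => :moved_permanently
    end
  end
end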
I created the hide_heroku gem to handle this: it uses X-Robots-Tag HTTP headers to prevent search engines from indexing anything under *.herokuapp.com.
I don't believe it's possible to remove the Heroku-provided domain name, either via their web interface or the command-line client. If you're concerned about it, redirect, or serve a robots.txt that blocks crawlers when the site is accessed via .herokuapp.com (I don't know how to do that offhand, sorry).
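One hedged sketch of that robots.txt idea in Rails (the RobotsController and route are names I'm making up; you'd also delete the static public/robots.txt so the route is actually reached):
# config/routes.rb
match '/robots.txt' => 'robots#show'

# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  def show
    if request.host.end_with?('.herokuapp.com')
      # disallow everything on the Heroku-provided domain
      render :text => "User-agent: *\nDisallow: /"
    else
      render :text => "User-agent: *\nDisallow:"
    end
  end
end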
I suspect Google is reasonably smart about indexing Heroku sites and handles the dual-domain issue itself, but that's just a guess.
Snip.ly nicely checks whether an entered web address can be used in an iframe.
I'd like to replicate that in Ruby. Looking through their code, they send an AJAX request to their server, and that's where they do the validation.
Even after extensive googling, I couldn't find anything that would help me accomplish this.
My use case: we let users add news listings to their page, which are shown in iframes, and I'd like to show a listing only if the entered URL can actually be used in an iframe.
You can figure out some cases by checking the X-Frame-Options header. But as you mentioned in the comments, it does not work all the time.
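For the cases it does catch, here's a minimal Ruby sketch of the header check (standard library only; note it ignores the newer Content-Security-Policy frame-ancestors directive, which can also forbid framing):
require 'net/http'
require 'uri'

# returns true when X-Frame-Options does not forbid framing;
# SAMEORIGIN is treated as not frameable, since you'd be embedding
# the page from a different origin
def frameable?(url)
  response = Net::HTTP.get_response(URI.parse(url))
  header = response['X-Frame-Options'].to_s.upcase
  header.empty? || !%w[DENY SAMEORIGIN].include?(header)
end

frameable?('https://user.com/list') # => true or false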
In my experience, it's best to side-step the problem altogether.
If you reverse-proxy your request through your rails server, then you can display pretty much anything all the time in your iframe.
Following is an example of the process. I'm assuming that your server is your-server.com and the user wants to list a page on user.com/list. The way it works would be:
1. Set an iframe's src to https://your-server.com/proxy?url=https://user.com/list
2. Intercept the request and extract the url: https://user.com/list
3. Perform an HTTP request on https://user.com/list to fetch the content
4. Return it to the browser as if it came from your own server
This approach works pretty much all the time, but it has limitations of its own:
- you should reverse-proxy any asset on that page that has a relative URL; otherwise the CSS/images may be broken
- you must handle AJAX requests made by that page
You can fix these as well by transforming the HTML before step 4.
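For instance, a hedged sketch of one such transformation: injecting a <base> tag so relative asset URLs resolve against the original host (this covers many cases, though not AJAX calls):
# insert a <base> tag right after the opening <head> so relative
# URLs in the proxied page resolve against the original site
def rewrite_html(html, original_url)
  base_tag = %(<base href="#{original_url}">)
  html.sub(/<head[^>]*>/i) { |head| "#{head}#{base_tag}" }
end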
You could use https://github.com/waterlink/rack-reverse-proxy for steps 2 and 3, instead of re-implementing your own reverse proxy.
You could set it up using the following code in config/application.rb:
config.middleware.insert(0, Rack::ReverseProxy) do
  reverse_proxy_options timeout: 10 # avoid waiting on pages that take forever to load
  reverse_proxy(/proxy\?url=(.*)/, '$1') # reverse-proxy to whatever the url parameter points at
end
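One caveat: as written, anyone can make your server fetch arbitrary URLs, i.e. you're running an open proxy. A hedged sketch of a whitelist guard (ProxyWhitelist and its host list are made-up names) you could insert in front of the proxy:
require 'uri'

# hypothetical guard middleware: reject /proxy requests whose url
# parameter points at a host outside the whitelist
class ProxyWhitelist
  ALLOWED_HOSTS = %w[user.com].freeze # placeholder list

  def initialize(app)
    @app = app
  end

  def call(env)
    req = Rack::Request.new(env)
    if req.path == '/proxy'
      host = URI.parse(req.params['url'].to_s).host rescue nil
      unless ALLOWED_HOSTS.include?(host)
        return [403, { 'Content-Type' => 'text/plain' }, ['Forbidden']]
      end
    end
    @app.call(env)
  end
end

# insert after the Rack::ReverseProxy registration so it runs first
config.middleware.insert(0, ProxyWhitelist)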
I'm looking for a way to block a specific proxy, for example this one:
http://demosites.conversionsupport.com/reverseproxydemo?domainpath=http://stackoverflow.com
I don't want it accessing/displaying my Rails 2.3.15 app on Heroku. I've played with the gems rack-rewrite and rack-block, but had no luck going that route, because I need to block by domain name, not IP address (the thing is hosted on ghs.google.com, which I'd rather not block).
In a perfect world I could redirect to my canonical URL, but I'd also settle for a 503 or a 404.
(The reverse proxy in question is used to show off the proxy owner's chat widget app, but on any website, instead of restricting use to sites owned by the proxy owner's potential clients. It also causes some nasty crawl-bot errors to be logged in Google's Webmaster Tools. That in and of itself isn't a big deal, but when coupled with the broken site functionality, and the fact that my site has a Creative Commons license that requests the site not be reused for commercial purposes, it makes me want to put a stop to it.)
Rack::Attack!!! seems quite thorough. It has support for blacklisting, which sounds like what you need. It doesn't mention Rails 2.3 support, but you can configure it directly in your config.ru regardless.
The example given in the README shows this:
Rack::Attack.blacklist('block bad UA logins') do |req|
  req.path == '/login' && req.post? && req.user_agent == 'BadUA'
end
It looks like the Rack request (req) object is passed to the block, so you should be able to use any of the methods available on that object.
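Adapted to your case, a hedged sketch; which header or IP range reliably identifies the proxy is an assumption, so inspect its actual requests first:
Rack::Attack.blacklist('block reverse proxy') do |req|
  # placeholders: match whatever identifies the proxy's requests,
  # e.g. a User-Agent fragment or a source IP prefix
  req.user_agent.to_s.include?('conversionsupport') ||
    req.ip.to_s.start_with?('1.2.3.')
end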
Since none of the gems mentioned above can be used with Rails 2.3.15 (which I've since updated to 2.3.18), and I don't have the time for the arduous upgrade from Rails 2 to Rails 3, I ended up going with the solution below. (Note: this solution only applies to pages served from /app. It does not apply to pages served from /public; I'm still looking for a way to protect those.)
I dropped the bits of code below into /app/controllers/application_controller.rb.
1.2.3 is a placeholder for the first three octets of the /24 range I'm actually blocking; replace it with the range you want to block (the pattern is a regular expression, so extend it to a full address to block a single IP). You'll also want to replace http://YourCanonicalDomain.tld/503.html with the address of your 503 page, or whatever page you want to send users to so they can view your content at the URL it was meant to be displayed at, instead of leaving them at the reverse proxy's URL:
before_filter :block_ip
then later in the file:
def block_ip
  # the argument is a regular expression, so the dots are escaped
  if request.remote_ip =~ /^1\.2\.3\./
    redirect_to "http://YourCanonicalDomain.tld/503.html"
    return
  end
end
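If you'd rather answer with a real 503 status (as the question mentions) instead of redirecting, a hedged sketch that should work on Rails 2.3:
def block_ip
  if request.remote_ip =~ /^1\.2\.3\./
    # serve the static page with an actual 503 status; rendering in a
    # before_filter halts the request
    render :file => "#{RAILS_ROOT}/public/503.html", :status => 503, :layout => false
  end
end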
My 503.html, which resides in /public, displays a note telling users that they're trying to view the content in a way that is not allowed, and that they'll shortly be redirected to the site's homepage. The 503.html contains this within the <head>:
<meta http-equiv="REFRESH" content="15;url=http://YourCanonicalDomain.tld">
Replace http://YourCanonicalDomain.tld with the page you want to redirect users to. Raise or lower the 15 to change how long the page is displayed before the user is redirected.
My ASP.NET MVC3 site, www.mysite.com, pulls images from images.mysite.com. When I'm not logged into my site (and thus not using SSL), it works flawlessly. However, when logged in, I get the
Only secure content is displayed.
message in IE9. I understand why. What's the best way to deal with switching URLs for my images? Should I check whether I'm currently using SSL and point my images to https://images.mysite.com, otherwise http://images.mysite.com?
EDIT: This is an e-commerce site, so most of the time it is browsed unsecured. But after login I still need to pull some of those same images, and of course if the user browses back to a regular catalog page, it needs to access images as well. Perhaps I will just have to always use https://images.mysite.com; it just seemed like overkill.
I believe the problem only happens when you're on a secure page accessing content over http. So, for pages that can be seen over both http and https, it might be as easy as always using https to get the images, regardless of which one you're on.
You will always get that message if you are pulling content from a non-SSL site when viewing over SSL. If your site is mostly SSL-protected, just always pull images from https://images.mysite.com, as you do not get the error when pulling SSL content into a non-SSL page.
Otherwise, you will need to know which pages are viewable only over SSL and which are not, and link appropriately.
Lastly, if your site is available over both, you will probably need to look at the HTTPS server variable to determine whether you are on SSL, and use that to build your link (http or https).
Did you try prefixing with ~ instead of ../ or /?
This worked for me.
I'm looking for a way to isolate a widget that a partial includes in the main site. The issue appears when users access the site over https: IE 6 and 7 show a security confirmation dialog, because part of the website's resources are not in the secure zone.
First of all, I downloaded the Twitter widget to our side, along with all of its CSS and images. Then I patched the widget JS to point to the downloaded resources. But I still have no luck with the security warning :( I guess the cause of the issue is the AJAX request to Twitter, but I have no idea how to solve it (short of creating some kind of proxy on our side).
Thank you for your attention.
You just need to host the .js file on your server, and link to that. That is all.
The script auto-detects SSL and will make requests to https://twitter-widgets.s3.amazonaws.com/ instead of http://widgets.twimg.com/ dynamically, depending on your scenario.
Hope that helps!
geedubb
I got the Twitter widget to work over HTTPS (SSL) by doing the following:
- Saved every image, CSS, and JavaScript file on my local webserver
- Changed every "http" to "https" in the JavaScript AND in the CSS
The last piece was tricky. https://twitter.com/statuses/user_timeline.json brings back data that already includes "http"; namely, the avatars and the profile image. So I found about four places in widget.js that used the user_timeline.json data and hardcoded an image URL wherever that "http" data was used. Searching for "src" will locate all of those places.
It's an ugly fix, but it worked.
You can use a sniffer like HttpWatch to debug this: watch the requests going by and see which ones start with http instead of https. It may be possible to just change the URLs you use to point to https://twitter.com; I'm not sure, as it depends on how your widget works.
Thanks Keshar, this worked for me. I came to the same conclusion: all http requests had to be https to prevent the IE security warning and still display the Twitter feed. I used the Live HTTP Headers Firefox plugin, which helps in spotting any non-secure http requests, such as the JSON requests.
Jon
If you look through the script, there are calls to an http domain. If you simply replace the protocol/domain with
https://twitter-widgets.s3.amazonaws.com/
instead of
http://widgets.twimg.com/
it works, and you don't have to do anything else.
Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
however, "cgi.script_name" automatically includes the default .cfm file for that folder; e.g., that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm".
So, whether the user visits sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This is mostly for SEO purposes: it's often preferable to 301-redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content. Since this is mostly for the benefit of spiders, JavaScript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite; the question is how to achieve this within ColdFusion, specifically.
I suppose this won't be possible.
If the client requests "GET /", the web server will translate it to "GET /{whatever-default-file-exists-first}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
Not sure that it's possible using CF only, but you can do the trick with the webserver's URL rewriting, if you're using it, of course.
For Apache it can look this way. Say we're using the following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$ index.cfm?page=$1&noindex=yes [L]
Now when we try to access the URL http://website.com/page/10/, the CGI scope shows:
QUERY_STRING page=10&noindex=yes
See the idea? I think the same thing is possible with IIS.
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc.) determines what default page to show and requests it from CF. Therefore, CF does not know what page was actually called.
Sergii is right that you could use URL rewriting to do this. If that is not available to you, you could exploit the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do the 301 redirect. If not, you can do a meta refresh, or a JS redirect, or somesuch in the HTML file.
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note: I don't have ColdFusion to hand to verify this.
Edit: having done some more research, it appears that GetHttpRequestData doesn't include the GET header, so this method probably won't work.
I am sure there is a way, however; try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (Assuming you're on IIS) - http://www.helicontech.com/isapi_rewrite/
It will insert a variable, x-rewrite-url, into the GetHttpRequestData() result structure, which will contain either / or /index.cfm depending on which URL was visited.
Martin