check if url can be loaded in an iframe - ruby-on-rails

Snip.ly nicely checks whether the entered web address can be used in an iframe.
I'd like to replicate that in Ruby. Looking through their code, they send an ajax request to their server, and that's where they do the validation.
Even after extensive googling, I couldn't find anything that could help me accomplish that.
My use case is that we let users add news listings to their page, which are shown in iframes, and we'd like to show the listing only if the entered URL can actually be used in an iframe.

You can figure out some cases by checking the X-Frame-Options header. But as you mentioned in the comments, it does not work all the time.
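As a rough sketch of that header check in Ruby (hedged: it uses a HEAD request, ignores redirects, and does not cover the newer Content-Security-Policy frame-ancestors directive, which can also block framing; the helper name is illustrative):

require 'net/http'
require 'uri'

# Illustrative helper: returns false when X-Frame-Options forbids framing.
def probably_frameable?(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  xfo = response['X-Frame-Options'].to_s.upcase
  !(xfo.include?('DENY') || xfo.include?('SAMEORIGIN'))
rescue StandardError
  false # a host we can't reach can't be framed either
end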
In my experience, it's best to side-step the problem altogether.
If you reverse-proxy your request through your rails server, then you can display pretty much anything all the time in your iframe.
Here is an example of the process. I'm assuming that your server is your-server.com and the user wants to list a page on user.com/list. The way it works would be (a sketch of steps 2-4 follows the list):
1. Set an iframe's src to https://your-server.com/proxy?url=https://user.com/list
2. Intercept the request, extract the url: https://user.com/list
3. Perform an HTTP request on https://user.com/list to fetch the content
4. Return it to the browser as if it came from your own server
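In Rails terms, steps 2-4 could look roughly like this (a minimal sketch; the controller and parameter names are assumptions, and a real version must whitelist target hosts before fetching):

# app/controllers/proxy_controller.rb (illustrative name)
require 'net/http'

class ProxyController < ApplicationController
  def show
    uri = URI.parse(params.require(:url))   # step 2: extract the url
    # WARNING: validate/whitelist uri.host here before fetching,
    # otherwise this action is an open proxy (SSRF risk).
    response = Net::HTTP.get_response(uri)  # step 3: fetch the content
    render html: response.body.html_safe,   # step 4: serve it as your own
           layout: false,
           content_type: response['Content-Type']
  end
end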
This approach works pretty much all the time, but it has limitations of its own:
- you should reverse proxy any asset on that page that has a relative url; otherwise the css/images may be broken
- you must handle ajax requests on that page
You can fix these as well, by transforming the html before step 4.
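As an illustration of that transformation step (hedged: this uses Nokogiri, the helper name is hypothetical, and it only covers the simple attribute cases, not CSS url() references or ajax):

require 'cgi'
require 'nokogiri'
require 'uri'

# Hypothetical helper: rewrite asset URLs so they also resolve through the proxy.
def rewrite_asset_urls(html, base_url)
  doc = Nokogiri::HTML(html)
  doc.css('img[src], script[src], link[href]').each do |node|
    attr = node['src'] ? 'src' : 'href'
    absolute = URI.join(base_url, node[attr]).to_s  # resolve relative urls
    node[attr] = '/proxy?url=' + CGI.escape(absolute)
  end
  doc.to_html
end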
You could use https://github.com/waterlink/rack-reverse-proxy for step 2 and 3, instead of re-implementing your own reverse proxy.
You could set it up using the following code in config/application.rb (after adding the rack-reverse-proxy gem to your Gemfile):
config.middleware.insert(0, Rack::ReverseProxy) do
  reverse_proxy_options timeout: 10 # avoids waiting for pages that take forever to load
  reverse_proxy(/proxy\?url=(.*)/, '$1') # reverse proxy on the url parameter
end

Related

PathLocationStrategy vs HashLocationStrategy in web apps

What are the pros and cons of using:
PathLocationStrategy - the default "HTML 5 pushState" style.
HashLocationStrategy - the "hash URL" style.
For instance, using HashLocationStrategy prevents scrolling to an element by its #ID, but some 3rd-party plugins require HashLocationStrategy or the hashbang #! in order to work on ajax websites.
I would like to know which one offers more for a webapp.
For me the main difference is that PathLocationStrategy requires server-side configuration: all the paths configured in @RouteConfig must be redirected to the main HTML page of your Angular 2 application. Otherwise you will get 404 errors when reloading the application in the browser or accessing it via a particular URL.
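For example, with Apache and mod_rewrite, such a fallback might look roughly like this (a sketch only; file paths depend on your deployment):

RewriteEngine On
# Serve real files and directories as-is...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# ...and hand every other path to the Angular entry page.
RewriteRule ^ /index.html [L]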
Here is a question that could give you some hints about this:
When I refresh my website I get a 404. This is with Angular2 and firebase.
Hope it helps you,
Thierry
The # fragment can only be processed on the client; servers just ignore it. This can cause problems with search engines (SEO), and redirects can cause redundant page reloads.
This page https://github.com/browserstate/history.js/wiki/Intelligent-State-Handling has some detailed explanation, though some of the arguments don't apply to Angular applications (for example, "doesn't work with JS disabled").
The "disadvantage" of HTML5 pushState is that it requires server support, as explained by Thierry.
According to official docs:
When the router navigates to a new component view, it updates the browser's location and history with a URL for that view. This is a strictly local URL. The browser shouldn't send this URL to the server and should not reload the page.
PathLocationStrategy
Modern HTML5 browsers support history.pushState, a technique that changes a browser's location and history without triggering a server page request. The router can compose a "natural" URL that is indistinguishable from one that would otherwise require a page load.
Here's the HTML5 pushState style URL that routes to the xyz component: localhost:4200/xyz/
HashLocationStrategy
Older browsers send page requests to the server when the location URL changes unless the change occurs after a # (called the hash). Routers can take advantage of this exception by composing in-application route URLs with hashes.
Here's a hash style URL that routes to the xyz component: localhost:4200/src/#/xyz/
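For reference, here is a minimal sketch of how each strategy is selected with the current RouterModule API (the route and component names are placeholders):

import { NgModule } from '@angular/core';
import { RouterModule, Routes } from '@angular/router';
import { XyzComponent } from './xyz.component'; // placeholder component

const routes: Routes = [
  { path: 'xyz', component: XyzComponent },
];

@NgModule({
  // PathLocationStrategy is the default; pass { useHash: true }
  // to opt into HashLocationStrategy instead.
  imports: [RouterModule.forRoot(routes, { useHash: true })],
  exports: [RouterModule],
})
export class AppRoutingModule {}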
I would like to know which one offers more for a webapp.
Almost all Angular projects should use the default HTML5 style as:
It produces URLs that are easier for users to understand.
It preserves the option to do server-side rendering later.
Rendering critical pages on the server is a technique that can greatly improve perceived responsiveness when the app first loads. An app that would otherwise take ten or more seconds to start could be rendered on the server and delivered to the user's device in less than a second.
This option is only available if application URLs look like normal web URLs without hashes (#) in the middle.
Stick with the default unless you have a compelling reason to resort to hash routes.

Detecting main URL with IdHTTPProxyServer

I want to make an application that redirects websites.
It has a table of "domains" and "redirect domains".
When a request's domain matches, it redirects to the corresponding redirect domain.
If nothing matches, it redirects to a default page.
So I created a Delphi application with IdHTTPProxyServer.
I have configured it to even work with https using "ssleay32.dll" and "libeay32.dll".
Everything works great.
It use "IdHTTPProxyServerHTTPBeforeCommand" event to redirect like this:
with AContext.Connection.IOHandler do
begin
  WriteLn('HTTP/1.0 302 Moved Temporarily');
  WriteLn('Location: ' + RedirectURL);
  WriteLn('Connection: close');
  WriteLn;
end;
But how do I distinguish an event call for the main URL (what the user typed in the address bar) from calls for other URLs?
The "IdHTTPProxyServerHTTPBeforeCommand" event is called many times while a page loads: for stat counters, facebook like buttons, etc. I don't want to redirect all of those to the default page.
If this is not possible with IdHTTPProxyServer, is there any other options in Delphi or any other language (which can generate native executable. C++ preferred)?
Thank You
From the perspective of a proxy (or the target HTTP server, for that matter), there is no difference whatsoever between a user-typed URL and other URLs. Every HTTP request is self-contained and independent of every other HTTP request. They have to be processed as-is on a per-request basis.
If you want to ignore dependent URLs (images, scripts, etc), you will have to know ahead of time what the initial URL is, parse the data that is retrieved from that URL, keep track of any URLs the data refers to, and then ignore those URLs if you see them being requested later. However, there is nothing in the HTTP protocol to tell you what the initial URL is. There is a Referer request header that may help at times, as it is filled in when a browser is requesting dependent resource files, but it is also filled in when the user navigates around from one page to another, so you can't rely on the Referer by itself. You will have to implement your own discovery logic to figure out the initial URL based on more analysis of the URLs being requested by a given client over time.
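As a rough illustration of that Referer heuristic inside the OnHTTPBeforeCommand handler (hedged: this assumes Indy 10's TIdHTTPProxyServerContext exposes the parsed request headers as Headers, and as noted above the heuristic is unreliable):

procedure TMyForm.ProxyHTTPBeforeCommand(AContext: TIdHTTPProxyServerContext);
begin
  // Heuristic only: top-level navigations often arrive without a Referer,
  // while dependent resources (images, scripts, like buttons) usually carry one.
  if AContext.Headers.Values['Referer'] = '' then
  begin
    // likely a user-typed / top-level request: apply the redirect table here
  end
  else
  begin
    // likely a dependent resource: let it pass through untouched
  end;
end;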
Only the client really knows what it is requesting and why, a proxy is just a gateway to reach it. So there is only so much smart filtering you can do in a proxy without knowing what the client is actually doing.

What are the steps involved from entering a web site address to the page being displayed on the browser?

And how can the process be speeded up from a developer point of view?
There are a lot of things going on.
When you first type in an address, the browser will look up the hostname in DNS, if it is not already in the browser's cache.
Then the browser sends an HTTP GET request to the remote server.
What happens on the server is really up to the server; but it should respond back with a HTTP response, that includes headers, which perhaps describe the content to the browser and how long it is allowed to be cached. The response might be a redirect, in which case the browser will send another request to the redirected page.
Obviously, server response time will be one of the critical points for perceived performance, but there are many other things to it.
When a response is returned from the server, the browser will do a few things. First it will parse the HTML returned and create its DOM (Document Object Model) from that. Then it will run any startup JavaScript on the page, before the page is ready to be displayed in the browser. Remember that if the page contains any resources such as external stylesheets, scripts, images and so on, the browser will have to download those before it can display the page. Each resource is a separate HTTP GET, and there is some latency involved for each. Therefore, one thing that can in some cases greatly reduce load times is to use as few external resources as possible, and to make sure they are cached on the client (so the browser doesn't have to fetch them on each page view).
To summarize, to optimize performance for a web page, you want to look at, as a minimum:
Server response time
Bandwidth / content transfer time.
Make sure you have a small and simple DOM (especially if you need to support IE6).
Make sure you understand client side caching and the settings you need to set on the server.
Make sure you make the client download as little data as possible. Consider gzipping resources and perhaps dynamic content as well, depending on your situation (see the example after this list).
Make sure you don't have any CPU-intensive JavaScript running on page load.
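To make the caching and compression points concrete, a response that opts into both might carry headers like these (values are purely illustrative):

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Cache-Control: public, max-age=86400
ETag: "abc123"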
You might want to read up on the HTTP protocol, as well as some of the best practices. A couple of tools you can use are YSlow and Google Page Speed.
What are the steps involved from entering a web site address to the page being displayed on the browser?
The steps are something like:
Get the IP address of the URL
Open a TCP connection to the IP address, and send an HTTP request for the specified page
Receive/download the page via TCP/HTTP; the page may consist of several files/downloads: e.g. the HTML document, CSS files, JavaScript files, image files ...
Render the page
And how can the process be speeded up from a developer point of view?
Measure to discover which of these steps is slow:
It's only worth optimizing whichever step is the slow one (no point in optimizing steps which are already fast)
The answer to your question varies depending on which step it is.

Web site aggregation with twitter widget SSL issue

I'm looking for a way to isolate a widget that a partial includes in the main site. The issue appears when a user accesses the site over https: IE 6 and 7 show a security confirmation dialog (part of the website's resources are not in the secure zone).
First of all I downloaded the twitter widget to our side, and I also downloaded all the CSS and pictures. Then I patched the widget JS to point to the downloaded resources. But I still have no luck with the security warning :( I guess the reason for this issue is the AJAX request to twitter, but I have no idea how to solve it (other than creating some kind of proxy on our side).
Thank you for your attention.
You just need to host the .js file on your server, and link to that. That is all.
The script auto-detects SSL and will make requests to https://twitter-widgets.s3.amazonaws.com/ instead of http://widgets.twimg.com/ dynamically, depending on your scenario.
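In other words, something like this (the path on your server is an assumption):

<!-- serve the widget script from your own (https-capable) host -->
<script src="https://your-server.com/js/twitter-widget.js"></script>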
Hope that helps!
geedubb
I got the Twitter Widget to work over HTTPS (SSL) by doing the following:
Saved every image, css, and javascript file on my local webserver
Changed every "http" to "https" in the javascript AND in the css
The last piece was tricky. https://twitter.com/statuses/user_timeline.json brings back data that already includes "http"; namely avatars and the profile image. So, I found about four places in widget.js that used the user_timeline.json data. I hardcoded an image url wherever that "http" data was used. Searching for "src" will locate all of those places.
It's an ugly fix, but it worked.
You can use a sniffer like HttpWatch to debug this--watch the requests going by and see which ones start with http instead of https. It may be possible to just change the urls you use to point to https://twitter.com, not sure about how your widget works.
thanks Keshar, worked for me. I came to the same conclusion that all http requests had to be https to prevent the IE security warning and also display the twitter feed. I used the live HTTP headers firefox plugin which helps for showing any non-secure http requests, such as the JSON requests.
Jon
If you look through the script, there are calls to a non-https host. If you simply replace the protocol/domain with
https://twitter-widgets.s3.amazonaws.com/
instead of
http://widgets.twimg.com/
it works and you don't have to do anything else.

How do you see the client-side URL in ColdFusion?

Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at, looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
however, "cgi.script_name" automatically includes the default cfm file for that folder- eg, that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm"
So, whether the user is visiting sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This question is mostly for SEO purposes: it's often preferable to 301 redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content. Since this is mostly for the benefit of spiders, javascript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite; the question is how to achieve this within ColdFusion, specifically.
I suppose this won't be possible.
If the client requests "GET /", it will be translated by the web server to "GET /{whatever-default-file-exists-first}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
Not sure that it is possible using CF only, but you can do the trick using the webserver's URL rewriting, if you're using it, of course.
For Apache it can look this way. Say we're using the following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$ index.cfm?page=$1&noindex=yes [L]
Now when we try to access the URL http://website.com/page/10/, CGI shows:
QUERY_STRING page=10&noindex=yes
See the idea? I think the same thing is possible with IIS.
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc) determines what default page to show, and requests it from CF. Therefore, CF does not know what the actual called page is.
Sergii is right that you could use URL rewrting to do this. If that is not available to you, you could use the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do a 301 redirect. If not, you can do a meta-refresh, or JS redirect, or somesuch in an HTML file.
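A minimal sketch of such a default.htm (assuming index.cfm is the real entry page; a server-issued 301 is still preferable when you can get it):

<!-- default.htm: forwards "/" requests to the ColdFusion index page -->
<html>
  <head>
    <meta http-equiv="refresh" content="0; url=index.cfm">
  </head>
  <body>
    <a href="index.cfm">Continue to index.cfm</a>
  </body>
</html>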
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note - I don't have ColdFusion to hand to verify this.
Edit: Having done some more research it appears that GetHttpRequestData doesn't include the GET header. So this method probably won't work.
I am sure there is a way however - try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (assuming you're on IIS): http://www.helicontech.com/isapi_rewrite/
It will insert a variable, x-rewrite-url, into the GetHttpRequestData() result structure, which will contain either / or /index.cfm depending on which URL was visited.
Martin
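Building on Martin's suggestion, a hedged CFML sketch of the 301 logic (this assumes ISAPI_rewrite is installed and populating the x-rewrite-url header as described):

<cfset reqData = GetHttpRequestData()>
<cfif StructKeyExists(reqData.headers, "x-rewrite-url")>
  <!--- the URL exactly as the browser requested it, e.g. "/sub1/" or "/sub1/index.cfm" --->
  <cfset originalUrl = reqData.headers["x-rewrite-url"]>
  <cfif Right(originalUrl, 10) EQ "/index.cfm">
    <!--- canonicalize: strip the trailing "index.cfm" and 301 to the bare folder --->
    <cfheader statuscode="301" statustext="Moved Permanently">
    <cfheader name="Location" value="#Left(originalUrl, Len(originalUrl) - 9)#">
    <cfabort>
  </cfif>
</cfif>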
