Query after '#' in https://www.google.co.in/#q=better+flight+search

A URL follows this scheme:
scheme://domain:port/path?query_string#fragment_id
but a search for the string
better flight search
results in the following URL:
https://www.google.co.in/#q=better+flight+search
According to the URL scheme, # is followed by the fragment. Correct me if I am wrong, but fragments are not sent to the server, so how does Google show search results?

As you realized, the fragment portion of the URL is not sent to the server in an HTTP request. Instead, it is used locally by the browser to mark places in the document. Some client-side frameworks take advantage of this and use the fragment as a secondary query string.
So, for instance, in your example with Google, doing a search on a Google page causes the page to navigate to a fragment like #q=better+flight+search. The browser sees the change and notifies the page's JavaScript that the URL has changed. Since the URL minus the fragment is the same, the browser doesn't send a request to the server. Instead, the page's JavaScript sees the fragment change and uses it to perform an Ajax query for the search results. This lets Google show search results without reloading the page, which is a big win for both server and client (for the server, because it doesn't have to deal with the overhead of serving another page; for the client, because load times decrease dramatically).
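As a rough illustration, here is how a URL splits into the part that goes on the HTTP request line and the fragment that stays in the browser. This is a Python sketch with a hypothetical helper `request_target`; it uses `urllib.parse.urlsplit`, not any browser code, so treat it as a model of the behaviour described above rather than how a browser is implemented.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> tuple:
    """Model how a browser splits a URL before sending an HTTP request.

    Returns (host, target, fragment). `target` is what goes on the request
    line (path plus query); `fragment` is kept by the browser.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target, parts.fragment


host, target, fragment = request_target("https://www.google.co.in/#q=better+flight+search")
print(f"GET {target} HTTP/1.1")         # GET / HTTP/1.1
print(f"Host: {host}")                  # Host: www.google.co.in
print(f"kept in browser: #{fragment}")  # kept in browser: #q=better+flight+search
```

The server only ever sees `GET /`, which is why the search has to be triggered by script on the page.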
For the related #! syntax, see this question.

Related

Use # instead of ? for URL parameters

I have to support old URLs in my Play application and those used to pass parameters using # instead of ?, like mysite.com/?p#XXX instead of mysite.com/?p=XXX.
The problem is that Play ignores everything that comes after the #. No parameter is passed, and when I check the URI of the request, I get only a substring that ends exactly before the #:
request().uri()
gives only:
mysite.com/?p
Is there a way in Play to get the rest of the URL on the server side?
This isn't a Play problem. You'll encounter it with any server that receives requests from a browser, because fragments are exclusively client-side. The W3C states:
In one of his XHTML pages, Dirk creates a hypertext link to an image that Nadia has published on the Web. He creates a hypertext link with "http://www.example.com/images/nadia#hat". Emma views Dirk's XHTML page in her Web browser and follows the link. The HTML implementation in her browser removes the fragment from the URI and requests the image "http://www.example.com/images/nadia". Nadia serves an SVG representation of the image (with Internet media type "image/svg+xml"). Emma's Web browser starts up an SVG implementation to view the image. It passes it the original URI including the fragment, "http://www.example.com/images/nadia#hat" to this implementation, causing a view of the hat to be displayed rather than the complete image.
Note that the HTML implementation in Emma's browser did not need to understand the syntax or semantics of the SVG fragment (nor does the SVG implementation have to understand HTML, WebCGM, RDF ... fragment syntax or semantics; it merely had to recognize the # delimiter from the URI syntax [URI] and remove the fragment when accessing the resource). This orthogonality (§5.1) is an important feature of Web architecture; it is what enabled Emma's browser to provide a useful service without requiring an upgrade.
(Emphasis mine)
I'm not sure what your old application was, but it sounds like it was a JavaScript client-side application. You'll have to have your front end send real query parameters to Play if your backend needs to do processing based on them.
Also, I think your question might be a duplicate of this one in a broad sense.
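Here is a minimal sketch of the rewrite the front end would need to perform before talking to Play. It is shown in Python for brevity; in the browser it would be JavaScript reading `location.hash`. It assumes, hypothetically, that every legacy URL has the exact shape `?name#value`, as in `mysite.com/?p#XXX`. The name `legacy_to_query` is illustrative.

```python
from urllib.parse import unquote, urlencode, urlsplit, urlunsplit


def legacy_to_query(url: str) -> str:
    """Turn a legacy URL like http://mysite.com/?p#XXX into http://mysite.com/?p=XXX.

    Assumption: the query string is a single bare parameter name and the
    fragment holds its value. Anything else is returned unchanged.
    """
    parts = urlsplit(url)
    name = parts.query
    if not name or not parts.fragment or "=" in name or "&" in name:
        return url  # not in the legacy ?name#value shape

    value = unquote(parts.fragment)   # undo any percent-encoding in the fragment
    query = urlencode({name: value})  # re-encode it as a real query parameter
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(legacy_to_query("http://mysite.com/?p#XXX"))  # http://mysite.com/?p=XXX
```

Once the browser navigates to the rewritten URL, the value reaches Play as an ordinary query parameter instead of being dropped with the fragment.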

Check if a URL can be loaded in an iframe

Snip.ly nicely checks if the entered web address can be used in an iframe.
I'd like to replicate it in Ruby. Looking through their code, they send an Ajax request to their server, and that's where they do the validation.
Even after extensive googling, I couldn't find anything that could help me accomplish that.
My use case is that we let users add news listings to their page, which are shown in iframes, and we would like to show whether the entered URL can be used in an iframe.
You can figure out some cases by checking the X-Frame-Options header. But as you mentioned in the comments, it does not work all the time.
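Here is a sketch of that header check, in Python rather than Ruby. It assumes you have already fetched the page's response headers into a dictionary (the fetch itself is left out). `blocked_by_x_frame_options` is a hypothetical helper name.

```python
def blocked_by_x_frame_options(headers: dict) -> bool:
    """Return True if the X-Frame-Options header would stop your site framing the page.

    Assumes the page is embedded from a different origin (your site), so both
    DENY and SAMEORIGIN block it. Other mechanisms, such as a
    Content-Security-Policy frame-ancestors directive, are not checked, which
    is one reason this test can miss cases.
    """
    for name, value in headers.items():
        if name.lower() == "x-frame-options":  # header names are case-insensitive
            return value.strip().upper() in ("DENY", "SAMEORIGIN")
    return False


print(blocked_by_x_frame_options({"X-Frame-Options": "SAMEORIGIN"}))  # True
print(blocked_by_x_frame_options({"Content-Type": "text/html"}))      # False
```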
In my experience, it's best to side-step the problem altogether.
If you reverse-proxy your request through your Rails server, then you can display pretty much anything in your iframe, all the time.
Following is an example of the process. I'm assuming that your server is your-server.com and the user wants to list a page on user.com/list. The way it works would be:
1. Set an iframe's src to https://your-server.com/proxy?url=https://user.com/list
2. Intercept the request and extract the url parameter: https://user.com/list
3. Perform an HTTP request on https://user.com/list to fetch the content
4. Return it to the browser as if it came from your own server
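Steps 2 and 3 depend on getting the target URL out of the proxy request reliably. Here is a Python sketch of step 2. The function name `extract_target` is hypothetical; `your-server.com` and `user.com` are the placeholders used in the steps.

```python
from urllib.parse import parse_qs, urlsplit


def extract_target(proxy_url: str) -> str:
    """Step 2: pull the page to fetch out of a /proxy?url=... request."""
    parts = urlsplit(proxy_url)
    if parts.path != "/proxy":
        raise ValueError("not a proxy request")

    values = parse_qs(parts.query).get("url")  # parse_qs also percent-decodes
    if not values:
        raise ValueError("missing url parameter")

    target = values[0]
    # Only forward plain web URLs; refuse schemes such as file: or ftp:.
    if urlsplit(target).scheme not in ("http", "https"):
        raise ValueError("only http and https targets are proxied")
    return target


print(extract_target("https://your-server.com/proxy?url=https://user.com/list"))
# https://user.com/list
```

If the target URL carries its own query string, it should be percent-encoded in the iframe's src. Otherwise its & characters are split off as separate proxy parameters.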
This approach works pretty much all the time, but it then has other limitations:
- you should reverse proxy any asset on that page that has a relative URL; otherwise the CSS and images may be broken
- you must handle ajax requests on that page
You can fix these as well, by transforming the html before step 4.
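For the relative-URL problem, the transformation resolves each reference against the proxied page and points it back at the proxy. Here is a Python sketch of that per-attribute rewrite. Finding the attributes in the HTML is left out, and `proxify` and `PROXY_PREFIX` are hypothetical names.

```python
from urllib.parse import quote, urljoin

# Hypothetical proxy endpoint, matching the your-server.com example.
PROXY_PREFIX = "https://your-server.com/proxy?url="


def proxify(page_url: str, reference: str) -> str:
    """Rewrite one src/href value found on a proxied page.

    The reference is resolved against the original page URL (so relative
    paths point at the real site) and then routed back through the proxy.
    """
    absolute = urljoin(page_url, reference)
    # Encode everything so ?, & and / stay inside the url parameter.
    return PROXY_PREFIX + quote(absolute, safe="")


print(proxify("https://user.com/list", "css/site.css"))
# https://your-server.com/proxy?url=https%3A%2F%2Fuser.com%2Fcss%2Fsite.css
```

Because the target is percent-encoded, the proxy must decode the url parameter before fetching. A regex rule that passes the raw match straight through does not do that.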
You could use https://github.com/waterlink/rack-reverse-proxy for steps 2 and 3 instead of re-implementing your own reverse proxy.
You could set it up using the following code in config/application.rb:
config.middleware.insert(0, Rack::ReverseProxy) do
  reverse_proxy_options timeout: 10 # avoids waiting for pages that take forever to load
  reverse_proxy(/proxy\?url=(.*)/, '$1') # reverse proxy on the url parameter
end

How a Google URL with # works

How does a URL like https://www.google.co.in/#q=harry+potter work?
As I understand it, anything after the # is not sent to the server.
Yet if we paste the above URL into a browser, it gets the search page for Harry Potter.
As I understand it, when one requests the above URL, a request is sent to the server, and since the search term "Harry Potter" is after the '#', it isn't sent to the server. So the server has no way to determine what to search for. So how does it work? Does the browser do anything special?
Your understanding is correct, the server does not see your search term.
It's client-side JavaScript that is executed on page load and inspects the URL. It then executes an XHR request with the search term appended in a way that is visible to the server (https://www.google.co.in/search?q=harry+potter&...).
Reload the page with JavaScript disabled and you will see that you are getting the regular page without pre-filled search box and results.
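The page script's job can be modelled in a few lines. This Python sketch is not Google's code. It reads the fragment as if it were a query string and builds a `/search?q=...` URL like the one in the answer, leaving out the extra parameters elided as `&...`. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def search_url_from_fragment(page_url: str) -> Optional[str]:
    """Turn the fragment of a #q=... URL into a query the server can see."""
    parts = urlsplit(page_url)
    params = parse_qs(parts.fragment)  # "q=harry+potter" -> {"q": ["harry potter"]}
    if "q" not in params:
        return None  # nothing to search for
    query = urlencode({"q": params["q"][0]})
    return urlunsplit((parts.scheme, parts.netloc, "/search", query, ""))


print(search_url_from_fragment("https://www.google.co.in/#q=harry+potter"))
# https://www.google.co.in/search?q=harry+potter
```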

Detecting main URL with IdHTTPProxyServer

I want to make an application to redirect websites.
It has a table with "domains" and "redirect domains".
Once a domain matches, it redirects to the corresponding redirect domain.
If nothing matches, it redirects to a default page.
So I created a Delphi application with IdHTTPProxyServer.
I have configured it to work even with HTTPS, using "ssleay32.dll" and "libeay32.dll".
Everything works great.
It uses the "IdHTTPProxyServerHTTPBeforeCommand" event to redirect like this:
with AContext.Connection.IOHandler do
begin
  WriteLn('HTTP/1.0 302 Moved Temporarily');
  WriteLn('Location: ' + RedirectURL);
  WriteLn('Connection: close');
  WriteLn;
end;
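For readers less familiar with Indy, here is the same lookup-and-redirect logic as a Python sketch. It maps the requested host through the redirect table, falls back to a default, and emits the same status line and headers as the WriteLn calls, terminated with CRLF. The table entries and names are hypothetical, and the sketch does not model IdHTTPProxyServer itself.

```python
# Hypothetical redirect table and default page.
REDIRECTS = {"example.com": "https://example.org/"}
DEFAULT_PAGE = "https://default.example/"


def redirect_response(host: str, redirects: dict = REDIRECTS, default: str = DEFAULT_PAGE) -> bytes:
    """Build a raw 302 response for the given Host value.

    The host is lower-cased and any :port suffix is dropped before the
    lookup; unknown hosts go to the default page.
    """
    domain = host.lower().split(":")[0]
    target = redirects.get(domain, default)
    head = "\r\n".join([
        "HTTP/1.0 302 Moved Temporarily",
        f"Location: {target}",
        "Connection: close",
    ])
    return (head + "\r\n\r\n").encode("ascii")  # blank line terminates the headers
```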
But how do I distinguish an event call for the main URL (the one the user typed in the address bar) from calls for other URLs?
The "IdHTTPProxyServerHTTPBeforeCommand" event is called many times while a page is loading, for stat counters, Facebook Like buttons, etc. I don't want to redirect all of them to the default page.
If this is not possible with IdHTTPProxyServer, are there any other options in Delphi or in any other language that can generate a native executable (C++ preferred)?
Thank You
From the perspective of a proxy (or the target HTTP server, for that matter), there is no difference whatsoever between a user-typed URL and other URLs. Every HTTP request is self-contained and independent of every other HTTP request. They have to be processed as-is, on a per-request basis.
If you want to ignore dependent URLs (images, scripts, etc), you will have to know ahead of time what the initial URL is, parse the data that is retrieved from that URL, keep track of any URLs the data refers to, and then ignore those URLs if you see them being requested later. However, there is nothing in the HTTP protocol to tell you what the initial URL is. There is a Referer request header that may help at times, as it is filled in when a browser is requesting dependent resource files, but it is also filled in when the user navigates around from one page to another, so you can't rely on the Referer by itself. You will have to implement your own discovery logic to figure out the initial URL based on more analysis of the URLs being requested by a given client over time.
Only the client really knows what it is requesting and why; a proxy is just a gateway to reach it. So there is only so much smart filtering you can do in a proxy without knowing what the client is actually doing.
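Here is a Python sketch of the discovery logic for dependent URLs. It assumes you have already decided which response is the top-level page, which is the hard part the answer warns about, and that you have its HTML. The class and method names are hypothetical, and per-client bookkeeping is omitted. It only sees URLs written directly into the HTML, so anything a script loads later (many counters and Like buttons work that way) will still slip through.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _RefCollector(HTMLParser):
    """Collect URLs a page loads automatically: src attributes and <link href>."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "src" or (tag == "link" and name == "href"):
                # Resolve relative references against the page URL.
                self.refs.add(urljoin(self.base_url, value))


class DependentUrlTracker:
    """Remember resources referenced by pages already identified as main pages."""

    def __init__(self):
        self.dependent = set()

    def record_main_page(self, url: str, html: str) -> None:
        collector = _RefCollector(url)
        collector.feed(html)
        collector.close()
        self.dependent |= collector.refs

    def is_dependent(self, url: str) -> bool:
        return url in self.dependent
```

In the proxy, `record_main_page` would run when a response is judged to be the top-level page. Later requests from the same client for which `is_dependent` returns True would be passed through without redirecting.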

Using non Google Analytics tag in URL alongside regular Google Analytics tags

I'm having some issues with Google Analytics URL parameters. Previously I've built URLs with the Google Analytics URL Builder. These have enabled me to track where visitors to my site have been coming from, how successful various marketing campaigns have been, etc.
Recently, I've started using another tag in the URL, one which has nothing to do with Google Analytics but changes the telephone number shown on my site when the visitor arrives. For example, I'll add &ctcc=adwords to the end of my tracking URL, and a specific phone number will appear on my site when the user comes through, so I can track how many calls my AdWords spend has generated.
However, since I started using this ctcc code, Google Analytics no longer seems to be tracking the traffic to my site :(
Any idea how I can incorporate both parameters into the URL and ensure that they both work as expected?
Thanks in advance
It looks like this is a problem with how your server is redirecting traffic with a ctcc query parameter. Look at the following request and its response headers:
So the ctcc parameter is used in some server side tracking (as best as I can tell), and the server is set up to redirect & strip ctcc whenever it gets a request with ctcc. Not being familiar with the system in use, I can't provide details, but you need to reconfigure the redirects to stop changing & into ;. It's the replacement of ampersands with semicolons that is messing up your GA data.
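If you control the redirect, the fix is to remove only the ctcc pair and re-join the remaining parameters with &. Here is a Python sketch of building that redirect target. The function name and the example URL, including its utm_ values, are hypothetical. How to plug this into your actual server configuration depends on the system in use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url: str, name: str = "ctcc") -> str:
    """Drop one query parameter and keep the others joined with '&'."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_param("http://example.com/?utm_source=google&utm_medium=cpc&ctcc=adwords"))
# http://example.com/?utm_source=google&utm_medium=cpc
```

`urlencode` re-encodes the values it writes back, so a value such as "spring sale" comes out as "spring+sale". That is equivalent for anything parsing the query string.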

Resources