Use # instead of ? for URL parameters - url

I have to support old URLs in my Play application and those used to pass parameters using # instead of ?, like mysite.com/?p#XXX instead of mysite.com/?p=XXX.
The problem is that Play is ignoring everything that comes after the hashtag. No parameter is passed and when I check the URI of the request, I get only a substring that ends exactly before the hashtag:
request().uri()
gives only:
mysite.com/?p
Is there a way in Play to get the rest of the URL on the server side?

This isn't a Play problem. You'll encounter this with anything that's running in the browser, because fragments are exclusively client side: the browser strips them before sending the request. The W3C states:
In one of his XHTML pages, Dirk creates a hypertext link to an image that Nadia has published on the Web. He creates a hypertext link with "http://www.example.com/images/nadia#hat". Emma views Dirk's XHTML page in her Web browser and follows the link. The HTML implementation in her browser removes the fragment from the URI and requests the image "http://www.example.com/images/nadia". Nadia serves an SVG representation of the image (with Internet media type "image/svg+xml"). Emma's Web browser starts up an SVG implementation to view the image. It passes it the original URI including the fragment, "http://www.example.com/images/nadia#hat" to this implementation, causing a view of the hat to be displayed rather than the complete image.
Note that the HTML implementation in Emma's browser did not need to understand the syntax or semantics of the SVG fragment (nor does the SVG implementation have to understand HTML, WebCGM, RDF ... fragment syntax or semantics; it merely had to recognize the # delimiter from the URI syntax [URI] and remove the fragment when accessing the resource). This orthogonality (§5.1) is an important feature of Web architecture; it is what enabled Emma's browser to provide a useful service without requiring an upgrade.
(Emphasis mine)
I'm not sure what your old application was, but it sounds like it was a client-side JavaScript application. You'll have to have your front end send real query parameters to Play if your backend needs to do processing based on them.
Also, I think your question might be a duplicate of this one in a broad sense.
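To make the split concrete, here is a minimal Python sketch using the standard library's urllib.parse. It shows which part of the legacy URL from the question would actually reach a server, plus a hypothetical helper, fragment_to_query, that rewrites ?p#XXX into ?p=XXX. The helper name and the rewrite rule are assumptions for illustration only; in a real migration this rewrite would have to run in the browser (in JavaScript), since that is the only place where the fragment is visible.

```python
from urllib.parse import urlsplit, urlunsplit


def request_target(url: str) -> str:
    """Return the URL without its fragment, i.e. what a browser requests."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def fragment_to_query(url: str) -> str:
    """Hypothetical rewrite: turn a legacy '?p#XXX' URL into '?p=XXX'.

    Assumes the query holds the parameter name and the fragment its value.
    """
    parts = urlsplit(url)
    if not parts.fragment or not parts.query:
        return url  # nothing to migrate
    query = f"{parts.query}={parts.fragment}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


legacy = "http://mysite.com/?p#XXX"
print(request_target(legacy))     # the fragment never leaves the client
print(fragment_to_query(legacy))  # a form the server can actually read
```

Once the client navigates to the rewritten URL, the value arrives on the server as an ordinary query parameter.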

Related

Is a protocol (eg. http or https) required for a URL to be valid?

Recently I came across a lot of code from analytics plugins where they specify the URL as //fonts.googleapis.com or //www.google.com.
Basically it starts with two forward slashes and then the domain or subdomain. These links work fine in browsers. I have read the following documents, but I am still not sure whether the above can be called valid URLs (that is, whether they should be reported as broken URLs or not).
https://developer.mozilla.org/en-US/docs/Web/API/URL and
https://url.spec.whatwg.org/
Is there a standard specification that I can refer to?
They're both valid scheme-relative-URL strings, although they need to be in the context of a Base URL to be meaningful. When used within a web page, the web page will provide the Base URL context.
Although there are other, earlier standards for URLs, the whatwg document represents the most up-to-date, web compatible definition.
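As a rough illustration of how a base URL gives a scheme-relative URL its meaning, the sketch below resolves //fonts.googleapis.com against two pages. The example.com page URLs and the /css path are made up for the example. Note that Python's urljoin follows RFC 3986 resolution rather than the WHATWG algorithm, though the two agree for this simple case.

```python
from urllib.parse import urljoin, urlsplit

relative = "//fonts.googleapis.com/css"  # scheme-relative: host but no scheme

# On its own the string has a host but an empty scheme.
parts = urlsplit(relative)
print(repr(parts.scheme), parts.netloc)

# Resolved against a base URL, it inherits the base's scheme.
https_page = "https://example.com/index.html"  # hypothetical base URLs
http_page = "http://example.com/index.html"

from_https = urljoin(https_page, relative)
from_http = urljoin(http_page, relative)
print(from_https)
print(from_http)
```

A link checker could therefore resolve such references against the page they were found on before deciding whether they are broken.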

PathLocationStrategy vs HashLocationStrategy in web apps

What are the pros and cons of using:
PathLocationStrategy - the default "HTML 5 pushState" style.
HashLocationStrategy - the "hash URL" style.
for instance, using HashLocationStrategy will prevent the feature of scrolling to an element by its #ID, but some 3rd party plugins require the HashLocationStrategy or the Hashbang #! in order to work in ajax websites.
I would like to know which one offers more for a webapp.
For me the main difference is that PathLocationStrategy requires server-side configuration so that all the paths configured in @RouteConfig are redirected to the main HTML page of your Angular2 application. Otherwise you will get 404 errors when you reload your application in the browser or try to access it using a particular URL.
Here is a question that could give you some hints about this:
When I refresh my website I get a 404. This is with Angular2 and firebase.
Hope it helps you,
Thierry
The part after # can only be processed on the client; browsers do not send it to the server. This can cause problems with search engines (SEO), and redirects can cause redundant page reloads.
This page https://github.com/browserstate/history.js/wiki/Intelligent-State-Handling has a detailed explanation, although some of the arguments don't apply to Angular applications (for example, that it doesn't work with JS disabled).
The "disadvantage" of HTML5 pushState is that it requires server support, as explained by Thierry.
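The server support in question usually amounts to a fallback rule: serve real asset files as they are, and send every other path to the application's main HTML page so the client-side router can take over. The sketch below shows that decision as a plain Python function. The function name, the list of asset suffixes, and the /index.html entry point are all assumptions for illustration, not part of Angular or of any particular server.

```python
from pathlib import PurePosixPath

# Assumed set of file types that exist as real files on the server.
STATIC_SUFFIXES = {".js", ".css", ".png", ".ico", ".map"}
ENTRY_POINT = "/index.html"  # assumed main page of the single-page app


def resolve(request_path: str) -> str:
    """Decide which file to send for a request path."""
    if PurePosixPath(request_path).suffix in STATIC_SUFFIXES:
        return request_path  # a real asset: serve it unchanged
    # Anything else is treated as a client-side route.
    return ENTRY_POINT


for path in ["/", "/xyz/", "/main.js"]:
    print(path, "->", resolve(path))
```

With hash-style URLs this rule is unnecessary, because the server only ever sees the part before the #.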
According to official docs:
When the router navigates to a new component view, it updates the browser's location and history with a URL for that view. This is a strictly local URL. The browser shouldn't send this URL to the server and should not reload the page.
PathLocationStrategy
Modern HTML5 browsers support history.pushState, a technique that changes a browser's location and history without triggering a server page request. The router can compose a "natural" URL that is indistinguishable from one that would otherwise require a page load.
Here's the HTML5 pushState style URL that routes to the xyz component: localhost:4200/xyz/
HashLocationStrategy
Older browsers send page requests to the server when the location URL changes unless the change occurs after a # (called the hash). Routers can take advantage of this exception by composing in-application route URLs with hashes.
Here's a hash style URL that routes to the xyz component: localhost:4200/src/#/xyz/
I would like to know which one offers more for a webapp.
Almost all Angular projects should use the default HTML5 style as:
It produces URLs that are easier for users to understand.
It preserves the option to do server-side rendering later.
Rendering critical pages on the server is a technique that can greatly improve perceived responsiveness when the app first loads. An app that would otherwise take ten or more seconds to start could be rendered on the server and delivered to the user's device in less than a second.
This option is only available if application URLs look like normal web URLs without hashes (#) in the middle.
Stick with the default unless you have a compelling reason to resort to hash routes.

Query after '#' in https://www.google.co.in/#q=better+flight+search

The URL follows the following scheme
scheme://domain:port/path?query_string#fragment_id
but a search for the string
better flight search
results in the following URL
https://www.google.co.in/#q=better+flight+search
According to the URL scheme, # is followed by the fragment. Correct me if I am wrong, but fragments are not sent to the server, so how does Google show search results?
As you realized, the fragment portion of the URL is not sent to the server in an HTTP request. Instead, it is used locally by the browser to mark places in the document. Some client side frameworks take advantage of this fact and use the fragment as a secondary query string.
So, for instance, in your example with Google, doing a search on a Google page causes the page to navigate to a fragment like #q=better+flight+search. The browser sees the change and notifies the page's javascript that the URL was changed. Since the URL minus the fragment is the same, the browser doesn't perform a request to the server. In this case, the page's javascript sees the fragment change and uses that to perform an Ajax query to get search results. Doing this allows Google to give you search results without loading the page, which is a huge win for both server and client (server, because it doesn't have to deal with the overhead of serving another page; client because load times are decreased dramatically).
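The "secondary query string" idea can be sketched in Python with urllib.parse. The snippet splits the URL from the question into the part a server would receive and the part that only client-side code can read. The variable names are illustrative, and in the browser the equivalent parsing would of course be done in JavaScript.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.google.co.in/#q=better+flight+search"
parts = urlsplit(url)

# What the HTTP request carries: path plus query, never the fragment.
sent_to_server = parts.path + ("?" + parts.query if parts.query else "")

# What client-side code can read and treat like a query string.
client_params = parse_qs(parts.fragment)

print(sent_to_server)
print(client_params)
```

The server sees the same bare path for every search, which is why the page itself has to issue a separate Ajax request carrying the search terms.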
For the related #!, see this question.

Xhtml namespace on https site?

I'm migrating a site from HTTP to redirecting all requests to HTTPS, and therefore I'm making sure external scripts, images etc. are referenced with just // at the beginning of the URL instead of http://
My question is this: do I also need to change things like the XHTML namespace on the html tag or the doctype declaration URL? And if I do need to change these, will they resolve URLs starting with //?
Namespaces are identifying strings that happen to use URL syntax. They should not be changed.
The DTD is a tricky one.
In theory, if it was altered with a man-in-the-middle attack, then it could be used to change named entities and insert new content into the document.
In practice, however, browsers don't generally parse the DTD, so this isn't really a worry. Additionally, W3C DTDs are not served over HTTPS, so you can't reference them without copying the files to your own server (and possibly updating internal references). If you want to be really safe, you should do this.
Personally, I'd scrap DTDs and just use (X)HTML 5.

Resource for Recognizing Framework/CMS From URL or Other Clues?

I'm curious to know which web framework or content management system a website is using based upon clues from the URL, headers, content. Does anyone know of a resource on the web that would provide this? For example:
.html -> maybe a flat-file
.php -> something built using PHP, perhaps.
.jsp -> something using Java Server Pages
.asp -> Active Server Pages
0,2097,1-1-1928,00 -> Vignette CMS
.do -> ??
Thanks.
Finding CMS by url
http://2ip.ru/cms/
enter URL in center input field and click big blue button below
Results in black - not found,
in red - found
NOTE:
You may want to play around with the URL: with http://, with or without the www. part. Results may differ.
If you're not restricted to just the query string then there are a few other options. For example to identify a rails app:
Script, stylesheet and image tags tend to have a 10-digit number appended (this allows you to cache, and still change the file):
<script src="/javascripts/all.js?1236037318" type="text/javascript"></script>
You can also sometimes tell from the cookies what the framework is. For example rails apps tend to have a session cookie called _appName_session, and often you can find a flash contained.
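Both clues can be checked mechanically. The sketch below is a rough heuristic only: the regular expressions and function names are assumptions made for this example, and a match suggests Rails rather than proving it, since other frameworks can produce similar markup.

```python
import re

# Assumed heuristic: a src/href whose query string is exactly ten digits.
ASSET_TIMESTAMP = re.compile(r'(?:src|href)="[^"?]+\?(\d{10})"')

# Assumed heuristic: a cookie named like _appName_session.
SESSION_COOKIE = re.compile(r"_\w+_session")


def asset_timestamps(html: str) -> list:
    """Return the ten-digit suffixes found on asset URLs in the markup."""
    return ASSET_TIMESTAMP.findall(html)


def looks_like_rails_session(cookie_name: str) -> bool:
    """True if the cookie name follows the _appName_session pattern."""
    return SESSION_COOKIE.fullmatch(cookie_name) is not None


html = '<script src="/javascripts/all.js?1236037318" type="text/javascript"></script>'
print(asset_timestamps(html))
print(looks_like_rails_session("_myapp_session"))  # hypothetical cookie name
```

Combining several weak signals like these is more reliable than trusting any single one.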
You're on the right track with your list there. If all you want to know is the stack (LAMP, IIS, Java) then that's all you really need.
If querying the URL in question is an option, then you can usually pull the webserver make/version out of the HTTP response header as well.
There is a nifty Chrome extension called Wappalyzer:
Wappalyzer is a browser extension that uncovers the technologies used
on websites. It detects content management systems, eCommerce
platforms, web servers, JavaScript frameworks, analytics tools and
many more.

Resources