hash tags in urls and hyperlinks - url

I created a hyperlink to a file. The file name uses hash characters (#) to separate pieces of information.
<div style="height:100%;width:100%">.</div>
translated to...
http://localhost/dir/upload/1427853638#0#file#A101.pdf
Is this a "legal" name in a URL? I'm getting a "file not found" error:
The requested URL /dir/upload/1427853638 was not found on this server.
So clearly the # has another meaning in the URL (I understand now that it's the location hash property). Is there a way to get this to work, or do I need to use a character other than # in the file names?

Since # is a special character in URL syntax (it introduces the fragment, which is used as an internal anchor in an HTML page), it should be URL-encoded as %23.
Your URL should be: http://localhost/dir/upload/1427853638%230%23file%23A101.pdf.
NB: you can find an online URL encoder here: http://meyerweb.com/eric/tools/dencoder/
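The encoding described above can also be done in code rather than with an online tool. The following is a minimal sketch in Python using the standard library's urllib.parse.quote. The choice of Python and the variable names are assumptions for illustration, since the question does not say what generates the link.

```python
from urllib.parse import quote

# File name from the question; an unencoded '#' would start the URL fragment.
filename = "1427853638#0#file#A101.pdf"

# quote() percent-encodes reserved characters, so each '#' becomes %23.
encoded = quote(filename)

# Base URL taken from the question.
url = "http://localhost/dir/upload/" + encoded
print(url)
```

Only the file name is encoded here, not the whole URL, so the scheme and path separators stay intact.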

Related

substitute space character in lua

I am creating a template in Wikipedia to link articles to an external website which is a book archive called fadedpage.com. I am generating a url to link to a specific author page. Part of the url is the author's name which contains one or more spaces. For example, the url for the author "Ian Fleming" is: http://fadedpage.com/csearch.php?author=Fleming, Ian. My template call structure is {{FadedPage|id=Fleming, Ian|name=Ian Fleming|author=yes}}.
For my template I am replicating an existing template which uses a script written in Lua to parse the template arguments. I have been able to generate all of the URL except for the space character between the last and first name.
I could code the template call as: {{FadedPage|id=Fleming,%20Ian|name=Ian Fleming|author=yes}} which works OK, but I would rather have the call format match what appears on the fadedpage website, i.e. with the embedded space. So I need a way in Lua to find the space character within the string and replace it with the string "%20". So far I haven't figured out how to do it. Any help would be appreciated.
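The substitution being asked for is a plain find-and-replace of a space with "%20". The sketch below shows the idea in Python rather than Lua, so it illustrates the logic and is not a drop-in answer for the template module. In Lua the same replacement would typically go through string.gsub, where % is special in the replacement string, so a literal %20 has to be written as "%%20". The variable names are hypothetical.

```python
from urllib.parse import quote

# The id argument as it would appear in the template call.
author_id = "Fleming, Ian"

# Direct substitution: replace every space with the literal text %20.
simple = author_id.replace(" ", "%20")

# More general alternative: percent-encode the value, keeping ',' as-is.
general = quote(author_id, safe=",")

# Base URL taken from the question.
url = "http://fadedpage.com/csearch.php?author=" + simple
print(url)
```

Both approaches give the same result for this name. The second also handles other characters that are not allowed in a URL.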

Is there a way to escape all the special characters in a url string parameter?

I need users to be able to pass a file path as a parameter of a GET URL (the file is not uploaded; only the local file path is used, for security reasons). It is tedious for them to change all the backslashes to "%5C" by hand. I was wondering if there is a way to force encoding of a part of the URL, for example something as simple as putting it in double quotes, which doesn't work:
http://example.com/"c:\user\somone\somefile.txt"/dosomething
I ended up using pattern matching of REST routes at the server level. Something like this:
/example.com/*path/dosomething
So it matches any path, even one containing slashes or backslashes. Finally, I decode the URL to undo the escaping the browser applies to characters such as spaces:
java.net.URLDecoder.decode(path, "UTF-8")
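To make the round trip concrete, here is a minimal Python sketch of encoding a Windows path into a single URL segment and decoding it again. It is an illustration only: the original solution is Java on the server side, and the way the segment is extracted below is a simplifying assumption. Note also that Java's URLDecoder additionally turns "+" into a space, which Python's unquote does not.

```python
from urllib.parse import quote, unquote

# Path from the question (raw string so the backslashes are kept literally).
local_path = r"c:\user\somone\somefile.txt"

# safe="" encodes every reserved character, including ':' and '\'.
encoded = quote(local_path, safe="")
url = "http://example.com/" + encoded + "/dosomething"

# Server side (simplified assumption): take the path segment and decode it.
segment = url.split("/")[3]
decoded = unquote(segment)
print(encoded)
print(decoded)
```

Because the backslash becomes %5C, the whole path stays within one URL segment and survives the trip unchanged.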

How to transform encoded URL to readable texts?

This is about Bangla Unicode text, but it can be a problem for any language written in a non-Latin script.
I host a Bangla blog with all its texts and categories in Bangla (I prefer not to say Bengali, because the name of the language is Bangla rather than Bengali).
So the category "বাংলা" has a URL like:
http://www.example.com/category/বাংলা
But whenever I copy the URL from the address bar and paste it into a chat panel or somewhere else, it turns into strange characters, for example:
http://www.example.com/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0*
(* It's just an example, not the exact encoding of the word "বাংলা".)
So in many cases I end up with encoded URLs like the one above, with no way to tell which Unicode text they represent. Recently one of my plugins has been logging some 404 errors. Among them I found a URI like:
/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0%A6%BE%E0%A7%9F%E0%A7%81%E0%A6%AC%E0%A6%BF%E0%A6%A6%E0%A7%8D%E0%A6%AF%E0
I used Jetpack's Omnisearch to look for a match, but the result was empty. I can't even trace which category is producing this 404.
So here comes the question:
How can I transform the encoded URL to readable glyphs?
http://www.example.com/category/বাংলা
isn't a URL; URLs can only contain ASCII characters. This is an IRI.
http://www.example.com/category/%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE
is the URI representation of that IRI. They are otherwise equivalent. A browser may display the ‘pretty’ IRI version in the user interface, but put the URI version on the clipboard so that you can paste it into other tools that don't support IRI.
The 404 address you pasted translates to:
/category/স্নায়ুবিদ্য�
where the last character is a � because it is an invalid, truncated UTF-8 sequence. (This is probably why the request failed.) Someone may have mis-pasted a partial URI here.
If you're using JavaScript you can do:
decodeURIComponent(url);
This returns the text in its original script. Note that it throws a URIError if the input contains a malformed sequence, such as the truncated one above.
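The decoding discussed in the answers above can be sketched in Python with urllib.parse.unquote. This is an illustration under the assumption that the percent-encoded bytes are UTF-8, which is the case for the URLs in the question. By default unquote substitutes U+FFFD for invalid byte sequences instead of raising an error, which makes the truncated sequence easy to spot.

```python
from urllib.parse import unquote

# URI form of the IRI from the answer.
good = ("http://www.example.com/category/"
        "%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE")

# The 404 address from the question; it ends in a lone %E0.
truncated = ("/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0%A6%BE%E0%A7%9F"
             "%E0%A7%81%E0%A6%AC%E0%A6%BF%E0%A6%A6%E0%A7%8D%E0%A6%AF%E0")

decoded_good = unquote(good)
decoded_truncated = unquote(truncated)  # errors="replace" is the default

# A replacement character signals bytes that were not valid UTF-8.
has_bad_bytes = "\ufffd" in decoded_truncated
```

Passing errors="strict" to unquote would instead raise UnicodeDecodeError for the truncated address, which may be preferable when invalid input should be rejected.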

How can you get the canonical URL for a web page (Rails)?

I need to store a distinct URL for an external webpage.
I need to put the URL into the database. I don't want to store the same page twice, so I need to strip all the fluff off the URL.
# if I have
url_1 = "http://scientificamerican.com/royal-baby/?utm_campaign=promo"
# and
url_2 = "http://scientificamerican.com/royal-baby/?utm_source=email"
# then they should map to:
url_canonical = "http://scientificamerican.com/royal-baby/"
...it's not as simple as just stripping query parameters though
In order to get a single canonical URL regardless of what was on it I tried stripping the query string. The problem is that there are still CMSs which use the query string.
e.g.
url_1 = "https://www.scientificamerican.com/article.cfm?id=obama-budget"
# strip the query string and it becomes
url_1 = "https://www.scientificamerican.com/article.cfm"
# which is obviously the same for all articles :(
Is there any Rails tool for getting a page's canonical URL?
This is obviously a problem that a number of people have had to solve, not least the search engines. How do you reduce the URL down such that all that remains is the data for the page?
You can't. There is no way to know which query parameters are necessary to identify the page. There are obviously many parameters you can safely remove (e.g. utm_campaign), but not all.
Your best bet would be to load the HTML for the page and look for the canonical link element. If that exists, then you've got your canonical URL.
http://en.wikipedia.org/wiki/Canonical_link_element
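The two ideas in the answer above, dropping known tracking parameters and reading the canonical link element, can be sketched as follows. This is a minimal Python illustration using only the standard library, not a Rails tool. The helper names (find_canonical, strip_tracking) and the "utm_" prefix list are hypothetical, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


class CanonicalLinkFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalLinkFinder()
    finder.feed(html)
    return finder.canonical


def strip_tracking(url, prefixes=("utm_",)):
    """Fallback: drop only parameters whose names start with a known prefix."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith(prefixes)]
    return urlunsplit(parts._replace(query=urlencode(kept)))


page = ('<html><head><link rel="canonical" '
        'href="http://scientificamerican.com/royal-baby/"></head></html>')
print(find_canonical(page))
print(strip_tracking("http://scientificamerican.com/royal-baby/?utm_source=email"))
```

A reasonable design is to prefer the declared canonical URL when the page has one and fall back to the prefix-based stripping otherwise, since the latter leaves parameters such as id untouched.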

trouble constructing url encoded link

If I do this:
<a target="_blank" href="<%=Url.Encode(sitelink)%>"> LINK TO SITE</a>
I get the link encoded, but prepended with the current local domain: "http://localhost/http://...."
What's the proper way to do this?
The Url.Encode method escapes special characters for use in the query part of a URL. It is not meant to be applied to an entire URL, because it also escapes things like the :// at the beginning. The result is no longer an absolute URL, so the browser interprets it as a relative URL and resolves it against the current domain, which is why the local domain gets prepended.
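The effect described above can be reproduced outside ASP.NET. The following Python sketch is an illustration under assumptions: the sitelink value and the /go?target= redirect URL are hypothetical, and urljoin stands in for the way a browser resolves a relative href against the current page.

```python
from urllib.parse import quote, urlencode, urljoin

# Hypothetical target; the question does not show the real value.
sitelink = "http://example.com/page"

# Encoding the whole URL also escapes "://", so no scheme remains.
whole = quote(sitelink, safe="")

# A browser then treats the value as relative to the current page.
resolved = urljoin("http://localhost/", whole)

# Encoding belongs on individual query values, for example when the
# target is passed as a parameter to a (hypothetical) redirect page.
redirect = "http://localhost/go?" + urlencode({"target": sitelink})

print(resolved)
print(redirect)
```

When the link should simply point at the external site, the URL can go into href as it is, with only HTML attribute escaping applied.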
