I want to use Georgian characters in my URL (e.g. "https://www.example.ge/post/ქართული-სიმბოლოები"). The result in a browser is correct, but in an SEO audit it shows up as "https://www.example.ge/post/ááá¢ááááááªáá".
Does anyone know how I can fix this?
Try using URL encoding. There are quite a few tools on the web. E.g., on one of these tools your URL is encoded like this:
https%3A%2F%2Fwww.example.ge%2Fpost%2F%E1%83%A5%E1%83%90%E1%83%A0%E1%83%97%E1%83%A3%E1%83%9A%E1%83%98-%E1%83%A1%E1%83%98%E1%83%9B%E1%83%91%E1%83%9D%E1%83%9A%E1%83%9D%E1%83%94%E1%83%91%E1%83%98
In the address bar this will be shown as your Georgian characters.
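If you'd rather produce the encoded form in code than with an online tool, here is a minimal sketch (Java, purely as an illustration; the class name is a placeholder) that percent-encodes just the non-ASCII path segment as UTF-8 and leaves the scheme and host alone:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeSlug {
    public static void main(String[] args) {
        String slug = "ქართული-სიმბოლოები";
        // Percent-encode only the non-ASCII path segment as UTF-8.
        // URLEncoder targets form data, so turn the "+" it would emit for spaces back into "%20".
        String encoded = URLEncoder.encode(slug, StandardCharsets.UTF_8).replace("+", "%20");
        System.out.println("https://www.example.ge/post/" + encoded);
        // -> https://www.example.ge/post/%E1%83%A5%E1%83%90%E1%83%A0... (the %-escaped form the audit tool expects)
    }
}

Crawlers and audit tools then see the %-escaped form, while browsers keep displaying the Georgian glyphs in the address bar.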
I was wondering how I can turn a URL mentioned in a comment (such as stackoverflow.com) into a clickable link in the post, so that when it's clicked it goes straight to the website.
Thanks for your help in advance.
You can pre-process your text and look for things that might be a URL. Look at this answer here: Regex to match URL.
You'd want to take your text, split it by white-space, then for each white-space-separated word, check if it's a URL. If it's a URL, then output a proper <a> tag surrounding it. If not, just output the word.
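A minimal sketch of that approach (the looksLikeUrl heuristic and the class name are placeholders of mine; for real input use the regex from the linked answer):

// Sketch: split on whitespace and wrap anything that looks like a URL in an <a> tag.
public class Linkifier {
    // Deliberately simple heuristic; replace with the regex from the linked answer for real input.
    static boolean looksLikeUrl(String word) {
        return word.matches("(?i)(https?://)?[\\w-]+(\\.[\\w-]+)+(/\\S*)?");
    }

    static String linkify(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            if (looksLikeUrl(word)) {
                String href = word.matches("(?i)https?://.*") ? word : "http://" + word;
                out.append("<a href=\"").append(href).append("\">").append(word).append("</a>");
            } else {
                out.append(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: found it on <a href="http://stackoverflow.com">stackoverflow.com</a> yesterday
        System.out.println(linkify("found it on stackoverflow.com yesterday"));
    }
}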
I like the friendly_id gem, but one problem I'm seeing is that when I type in a URL with a capital letter in it, such as /users/Joe-Blogs, it can't find the page. It's a little trivial, but most sites can handle something like this and will serve the page whether it has a capital letter or not. Does anyone know a fix for this?
Edit: to clarify, this is for when users enter a URL manually and put capitals in it just because it's a name, like author/Joe-Blogs. I've seen other sites handle this, but Rails seems to just give a 404.
friendly_id uses parameterize to create the slugs.
I think the best way to solve your problem is to parameterize the params before using them for the find:
# controller
User.find(params[:id].parameterize)
Or parameterize the URL where the link originates.
As an addition to Vic's answer, you'll want to look at URL normalization:
The following normalizations are described in RFC 3986 to result in equivalent URLs:
Converting the scheme and host to lower case.
The scheme and host components of the URL are case-insensitive. Most normalizers will convert them to lowercase.
Example: HTTP://www.Example.com/ → http://www.example.com/
In short, it's against convention to use capitalization in your URLs.
You may also wish to look at URI#normalize; more importantly, you should work to remove the capitalization from your URLs:
URI.parse(params[:id]).normalize
It's about Bangla Unicode text, but it can be a problem for any language that doesn't use Latin glyphs.
I host a Bangla blog with all of its text and categories in Bangla (I prefer not to say Bengali, because the name of the language is Bangla rather than Bengali).
So the category "বাংলা" produces a URL like:
http://www.example.com/category/বাংলা
But whenever I copy the URL from the address bar and paste it into a chat panel or somewhere else, it changes into some strange characters, for example:
http://www.example.com/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0*
(* it's just an example, not the exact gibberish for the word "বাংলা")
So in many cases I end up with encoded URLs like the one above, with no trace of which Unicode text they refer to. Recently I've been getting some 404 errors logged by one of my plugins. There I found a URI like:
/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0%A6%BE%E0%A7%9F%E0%A7%81%E0%A6%AC%E0%A6%BF%E0%A6%A6%E0%A7%8D%E0%A6%AF%E0
I used Jetpack's Omnisearch to look for any match, but the result is empty. I can't even trace which category is producing such a 404.
So here comes the question:
How can I transform the encoded URL to readable glyphs?
http://www.example.com/category/বাংলা
isn't a URL; URLs can only contain ASCII characters. This is an IRI.
http://www.example.com/category/%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE
is the URI representation of that IRI. They are otherwise equivalent. A browser may display the ‘pretty’ IRI version in the user interface, but put the URI version on the clipboard so that you can paste it into other tools that don't support IRI.
The 404 address you pasted translates to:
/category/স্নায়ুবিদ্য�
where the last character is a � because it is an invalid, truncated UTF-8 sequence. (This is probably why the request failed.) Someone may have mis-pasted a partial URI here.
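If you want to turn such a logged URI back into readable glyphs yourself, here is a small sketch (Java, purely as an illustration, since the thread doesn't prescribe a language) that percent-decodes the path as UTF-8:

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodePath {
    public static void main(String[] args) {
        String path = "/category/%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE";
        // Percent-decode the bytes as UTF-8 to get the Bangla glyphs back.
        // (URLDecoder also maps "+" to a space, which is harmless for paths like this.)
        System.out.println(URLDecoder.decode(path, StandardCharsets.UTF_8));
        // -> /category/বাংলা
    }
}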
If you're using JavaScript you can do:
decodeURIComponent(url);
This decodes the percent-encoding so the original characters are preserved.
I want to extract links from HTML using jsoup.
Expected output: absolute links.
I use "abs:href" for that.
This works:
Jsoup.parse("<a \n\r\t href=\"http://www.ibm.com/123/?id=abc\">\nhaha</a>", "http://www.ibm.com");
delivers: http://www.ibm.com/123/?id=abc
This doesn't work:
Jsoup.parse("<a \n\r\t href=\"www.ibm.com/123/?id=abc\">\nhaha</a>", "http://www.ibm.com");
delivers: http://www.ibm.com/www.ibm.com/123/?id=abc
I know it's kind of difficult to tell whether "www.ibm.com" is an absolute or a relative link. It might be a hostname, but it could also be a folder name. Any proven solutions? Only this hack comes to mind:
// "base" is the page URL, "link" the wrongly resolved result
String domain = base.replace("http://", "");        // e.g. "www.ibm.com"
link = link.replace(domain + "/" + domain, domain);  // collapse the duplicated host
Your second example is unambiguously a relative URL. An absolute URL, by definition, starts with a protocol (e.g. http or https). All browsers will give the same output for your example.
Can you provide an example URL that you're working with? Why does it have these pseudo-absolute URLs?
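If you really do need to tolerate such scheme-less, host-like hrefs, one option is to patch them up before asking jsoup for abs:href. A hedged sketch (the fixHref heuristic is my assumption, not anything jsoup provides):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkExtractor {
    // Heuristic (an assumption): treat "www.something/..." as a link that is merely
    // missing its scheme, and prepend "http://" before letting jsoup resolve it.
    static String fixHref(String href) {
        return href.matches("(?i)www\\.[^/\\s]+(/.*)?") ? "http://" + href : href;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<a href=\"www.ibm.com/123/?id=abc\">haha</a>", "http://www.ibm.com");
        for (Element a : doc.select("a[href]")) {
            a.attr("href", fixHref(a.attr("href"))); // rewrite the suspicious href in place
            System.out.println(a.attr("abs:href"));  // -> http://www.ibm.com/123/?id=abc
        }
    }
}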
I have a nagging problem. For a website I made, search-engine-friendly URLs are generated. The only problem is that there are ß characters in the URLs too. Characters like ö, ï, ä, ü etc. are placed correctly, but the ß character comes out as a diamond icon with a question mark in it.
I thought it had to do with the charset being used, but I've tried both UTF-8 and ISO-8859-1, both without luck.
I need to have the correct character in the URL for the readability of deep links.
Solved the problem with the iconv function.
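The answer doesn't show the actual call, and the original code is presumably PHP rather than Java; purely as an assumption about what went wrong, the sketch below illustrates the kind of charset conversion iconv performs, re-reading Latin-1 bytes that were being misinterpreted as UTF-8:

import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // Assumption: the slug text was produced as ISO-8859-1 bytes but later read as UTF-8,
        // which is the classic way to end up with the � replacement sign; re-interpreting the
        // bytes with the right charset (the kind of conversion iconv does) restores the "ß".
        byte[] latin1 = "Straße".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.UTF_8));      // Stra�e
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // Straße
    }
}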