URLConnection with Arabic parameters

I'm trying to develop an Android application that contains Arabic data, and I've run into a problem:
URL twitter = new URL("http://10.0.2.2/WS/identi_el.php?id1="+nomm+"&id2="+pren+"&id3="+pa);
These parameters (nomm, pren and pa) are in Arabic, and with them the request doesn't return any result. When I put them in French, it returns results. Can anyone tell me how to make URLConnection support Arabic letters, please?

Non-alphanumeric characters other than -, _ and . are known to cause issues in URLs. I bet you'll run into the same problem if you use a French word with an accent.
So stay on the safe side and encode all parameters before using them as query string parameters.

I modified the URL from
URL twitter = new URL("http://10.0.2.2/WS/identi_el.php?id1="+nomm+"&id2="+pren+"&id3="+pa);
to
URL twitter = new URL("http://10.0.2.2/WS/identi_el.php?id1="+java.net.URLEncoder.encode(nomm,"UTF-8")+"&id2="+java.net.URLEncoder.encode(pren,"UTF-8")+"&id3="+java.net.URLEncoder.encode(pa,"UTF-8"));
=> I just wrapped each parameter in java.net.URLEncoder.encode(...,"UTF-8") and it's working :)
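As a language-neutral illustration of what that encoding step does, here is a minimal Python sketch (the thread itself is Java). The endpoint is the one from the question; the sample values and the build_url helper are assumptions made up for the example. It percent-encodes each value as UTF-8, then parses the query string back to show that the original text survives the round trip.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://10.0.2.2/WS/identi_el.php"  # endpoint taken from the question


def build_url(nomm, pren, pa):
    """Percent-encode each value as UTF-8 and append it to the query string."""
    query = urlencode({"id1": nomm, "id2": pren, "id3": pa}, encoding="utf-8")
    return BASE + "?" + query


# Sample values are assumptions, not data from the question.
url = build_url("پاکستان", "dog cat", "a&b")
print(url)

# Parsing the query string recovers the original values unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["id1"][0], decoded["id2"][0], decoded["id3"][0])
```

Note that the encoded URL contains only ASCII characters, which is why it no longer depends on how the client or server treats non-Latin letters.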

Related

Weird URL Encoding/Decoding for non-English Characters

How and why is a non-English word converted to weird characters, e.g. پاکستان to Ù¾Ø§Ú©Ø³ØªØ§Ù†, and is there any way to get پاکستان back from Ù¾Ø§Ú©Ø³ØªØ§Ù†? It happens both in code shown in the browser and in requests received at the server.
Background:
I get a lot of requests on my non-English (Urdu) website with URLs like
Ù¾Ø§Ú©Ø³ØªØ§Ù†
I tried to find out what that means, but search engines didn't help. I tried queries like
Decode this 'mystring'
What encoding is this 'mystring'
I thought it might be a corrupted/spam URL, based on this link:
Weird characters in URL
Problem explanation/example
But when I viewed one of my JS files in the browser (while looking at a working JS file), it showed me the same weird characters, even on localhost:
'pakistan': {'eng': 'Pakistan', 'ur': 'Ù¾Ø§Ú©Ø³ØªØ§Ù†'},
//But the actual source code for the above line is the following
'pakistan': {'eng': 'Pakistan', 'ur': 'پاکستان'},
My knowledge
I only know about URL encoding/decoding, which, to the best of my knowledge, is unrelated here:
encodeURI and decodeURI in JS, or quote and unquote in Python, and the same in other languages. But all they do for me is convert
`پاکستان` to `%D9%BE%D8%A7%DA%A9%D8%B3%D8%AA%D8%A7%D9%86` and vice versa
Why needed?
I don't want to lose the requests that arrive with these malformed URLs. There must be some way to undo this: Chrome, Firefox and Edge all show the same characters, so if their conversion method and result are the same, there should be a technique to reverse it as well.
Thanks to Giacomo Catenazzi, and I am grateful for the following answer:
How to decode cp1252 string?
A very custom and still imperfect solution to my problem.
This algorithm needs to be improved. I only found out by experiment that it works, and it did not work for me when the string was long or included - (hyphens).
So I made changes according to my requirements and it works well enough that I can guess what the actual string was.
import itertools
import re
def specific_my_required_processing(received_string):
    # A segment that still starts with one of these was not decoded successfully.
    starting_characters_in_encoded_string_in_my_case = ['Ø', 'Ã', 'Ù', 'Ù', 'Ú']
    arr = received_string.split('-')
    res = []
    missed = []
    for string_item in arr:
        decoded_string = guess_decode_string_without_hyphens(string_item)
        if decoded_string and decoded_string[:1] not in starting_characters_in_encoded_string_in_my_case:
            res.append(decoded_string)
        else:
            missed.append({string_item: decoded_string})
    resulting_urdu_string = '-'.join(res)
    print('\n\nResult', resulting_urdu_string)
    print('\nCould not be decoded', missed)
def guess_decode_string_without_hyphens(s):
    # Brute force: try every encode/decode chain of length 2, 4, 6 and 8.
    encodings = ['cp1251', 'cp1252', 'utf8']
    for steps in range(2, 10, 2):
        for encs in itertools.product(encodings, repeat=steps):
            r = s
            try:
                for enc in encs:
                    # Alternate between text -> bytes (encode) and bytes -> text (decode).
                    r = r.encode(enc) if isinstance(r, str) else r.decode(enc)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue
            if re.match(r'^[\w\sа-яА-Я]+$', r):
                res = str(r)
                print('Encoding => ', encs, ' Conversion = ' + s + ' => ' + res)
                return res
    # No chain produced readable text.
    return None
sample_encoded_string = 'اسلام-آباد-Ûائیکورٹ-ای-ÙˆÛŒ-ایم-قانون-سازی-کالعدم-قرار-دینے-Ú©ÛŒ-درخواست-نامکمل-قرار'
specific_my_required_processing(sample_encoded_string)
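For the simplest case, where UTF-8 bytes were misread once as cp1252, the brute-force search above is not needed. A minimal sketch, assuming exactly that single misreading (the garble and repair helper names are made up for the example):

```python
def garble(text):
    """Simulate the damage: UTF-8 bytes misread as cp1252 text."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Undo it: turn the text back into bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "پاکستان"
garbled = garble(original)
print(garbled)
print(repair(garbled) == original)
```

This only works when every byte maps to a cp1252 character. Python's cp1252 codec leaves a few byte values (such as 0x81) undefined, and a letter like ہ encodes to the bytes DB 81 in UTF-8, so text containing it cannot make this round trip cleanly. That is one plausible reason some segments in the sample above could not be decoded.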

Umbraco - Trim last five characters in current page url

I'm getting the URL of the current page using @CurrentPage.Url
It returns http://hostname/abcdefgh/
I want to trim off the last 5 characters of the URL.
Required URL: http://hostname/abcd, i.e. with the last five characters 'efgh/' removed.
I tried using @umbraco.library.TruncateString(testString,-5,"") to trim it, but was unsuccessful.
I'm new to Umbraco. Any help would be highly appreciated.
I'd be interested to know the reason for doing this. Using the Remove method worked for me:
@{
    string url = CurrentPage.Url;
    // Drop the last five characters of the URL.
    url = url.Remove(url.Length - 5);
}

Change YouTube shortened URL value in iframe

I am using the Advanced Custom Fields WordPress plugin to create a meta field called YouTube URL.
When someone enters the video URL in the short format, like https://youtu.be/H-30B0cqh88,
the iframe I use to show the video doesn't work, because an iframe doesn't work with the short version of the URL;
it needs the real URL as its source. My iframe code is below:
">
How can I achieve this? Let me show you what I want:
I am not an expert in PHP, so please give me the full working code.
<?php
the_field("listing_video_1") == $got_url_from_user_input
if $got_url_from_user_input == https://youtu.be/H-30B0cqh88 in this format
$actual_URL= replace above url to https://youtube.com/embed/H-30B0cqh88
?>
How can I achieve this please.
Thanks in advance
Here's a simple way to use preg_replace to accomplish what you are trying to do:
<?php
// SET OUR DEFAULT URL
$got_url_from_user_input = 'https://youtu.be/H-30B0cqh88';
print "\nSTARTING URL: ".$got_url_from_user_input;
// DO THE REPLACE AND PRINT OUT THE FINAL RESULT
$actual_URL = preg_replace('~https://youtu\.be/([-_A-Z0-9]+)~i', 'https://youtube.com/embed/$1', $got_url_from_user_input);
print "\nFINAL URL: ".$actual_URL;
There's not much magic here, so let me run through it quickly:
https://youtu\.be/ - Look for this pattern exactly. We escape the dot with a backslash so it finds a literal dot and not any character.
([-_A-Z0-9]+) - This is a basic character class matching any dash, underscore, letter or number, occurring at least once (YouTube video IDs can contain underscores as well as dashes). We put it in parentheses so that it is saved in $1 and we can plug it into our final URL.
Here is the above code in a working demo you can take a look at:
http://ideone.com/zECA2e
You can just use PHP's str_replace() function, as in the following code:
<?php
$actual_url = str_replace('https://youtu.be/', 'https://www.youtube.com/embed/', $got_url_from_user_input);
echo $actual_url;
?>
str_replace() will replace YouTube's short URL with the embed URL.

How do I scan url for a specific string with spaces and special characters?

I'm using stringscanner on my request URL in order to get the name of the user's currently selected category, but I've been having difficulty dealing with spaces and special characters.
request.url.scan(/\?category=\w+/).to_s.gsub('?category=', '')
URL examples followed by result
http://localhost:3000/search?category=dog&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog.com&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93 => ["dog"]
I'm trying to get ["dog"] ["dog.com"] and ["dog cat"], but am currently stuck. Any ideas?
Note: Considering removing spaces from categories and replacing them with dashes as multiple spaces could be problematic, but if it's possible to create one function to rule them all, that would be awesome.
This is Rails, is there a reason you're not just using params[:category]?
If you are trying to extract params then you could use parse_query :
uri = "http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93"
result = Rack::Utils.parse_query(URI(uri).query) #=> {"category"=>"dog cat", "search"=>"", "utf8"=>"\xE2\x9C\x93"}
result["category"] #=> dog cat

How do you include hashtags within Twitter share link text?

I'm writing a site with a custom tweet button that uses the www.twitter.com/share function, however the problem I am having is including hash '#' characters within the tweet text.
For example:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+#branstonpickel+right+now
The tweet text comes out as 'I am eating' and omits the hash and everything after.
I had a quick look on the Twitter forums and learnt the hash '#' character cannot be part of the share url. On https://dev.twitter.com/discussions/512#comment-877 it was said that:
Hashes are special characters in the URL (they identify document fragments) so they, and anything following, does not get sent the server.
and
you need to URLEncode it, so use %23
When I tried the 2nd point in my test link:
www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branstonpickel+right+now
The tweet text came out as 'I am eating %23branstonpickel right now' literally including %23 instead of converting it to a hash.
Sorry for the waffely question, but does anyone know what it is I'm doing wrong?
Any feedback would be greatly appreciated :)
It looks like this is the basic setup:
https://twitter.com/intent/tweet?
url=<url to tweet>
text=<text to tweet>
hashtags=<comma separated list of hashtags, with no # on them>
This would pre-build a tweet of: <text> <url> <hashtags>
The above example would be:
https://twitter.com/intent/tweet?url=http://www.example.com&text=I+am+eating+branston+pickel+right+now&hashtags=bransonpickel,pickles
There used to be a bug with the hashtags parameter: it only showed the first n-1 hashtags. This is now fixed.
You can use %23 instead of the hash (#) in the URL, e.g.
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branston+%23pickel+right+now
I may be wrong but i think the hashtag has to be passed as a separate variable that will appear at the end of your tweet ie:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+branston+pickel+right+now&hashtag=bransonpickel
will result in "I am eating branston pickel right now #branstonpickle"
On a separate note, I think pickel should be pickle!
Cheers
Toby
Use encodeURIComponent to encode the URL.
If you're using PHP, you can use the following:
<?php echo 'http://www.twitter.com/share?' . http_build_query(array(
'url' => 'http://www.example.com',
'text' => 'I am eating #branstonpickel right now'
)); ?>
This will do all the URL encoding for you, and it's easy to read.
For more information on the http_build_query, see the PHP manual:
http://us2.php.net/http_build_query
For a URL with line breaks, @, # and special Unicode characters in it, the following works:
var lineJump = encodeURI(String.fromCharCode(10)),
hash = "%23", arobase="%40",
tweetText = 'https://twitter.com/intent/tweet?text=Le signe chinois '+hans+' '+item.pinyin+': '+item.definition.replace(";",",")+'.'
+lineJump+'Merci '+arobase+'Inalco_Officiel '+arobase+'CRIparis ❤️🇨🇳 '
+lineJump+hash+'Chinois '+hash+'MOOC'
+lineJump+'https://hanzi.cri-paris.org/',
tweetTxtUrlEncoded = tweetText+ "" +encodeURIComponent('#'+lesson+encodeURIComponent(hans));
Use urlencode:
https://twitter.com/intent/tweet?text=<?= urlencode("I am eating #branstonpickel right now"); ?>
You can just use this code and modify it
%20 means space
%23 means hashtag (#)
In JS you can easily encode the special characters using encodeURIComponent.
(Warning: don't use encodeURI, as "@" and "#" are not escaped.)
Here's an example with a mention and a hashtag:
const text = "Hello #world ! Go follow @StackOverflow";
const tweetUrl = `https://twitter.com/intent/tweet?text=${ encodeURIComponent(text) }`;
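The same idea in Python, as a minimal sketch: it uses the url, text and hashtags parameters of the intent link described earlier in this thread, and the tweet_intent_url helper name is made up for the example. Passing quote_via=quote to urlencode makes spaces come out as %20 rather than +, and # and @ are escaped as %23 and %40.

```python
from urllib.parse import urlencode, quote


def tweet_intent_url(text, url=None, hashtags=()):
    """Build a tweet intent link with every parameter percent-encoded."""
    params = {"text": text}
    if url:
        params["url"] = url
    if hashtags:
        # Comma-separated list, without the leading # on each tag.
        params["hashtags"] = ",".join(hashtags)
    return "https://twitter.com/intent/tweet?" + urlencode(params, quote_via=quote)


print(tweet_intent_url("Hello #world ! Go follow @StackOverflow"))
```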
