Which function encodes or decodes URLs in phpFox, and where is it? - url

I would be grateful if you could tell me how phpFox handles URLs.
In particular, I want to know which function generates URLs in phpFox.
I have a problem with URL encoding or decoding in phpFox: it turns URLs that contain Persian text into ??????.
For example, it resolves the link 'http://www.mydomain.com/photos/اخبار/' to 'http://www.mydomain.com/photos/???????/'.

This is the main library class for URLs: /include/library/phpfox/url/url.class.php
Is this what you are looking for?

Related

Regex failing to match the punycode url

I have a URL whose punycode form starts with xn----, which none of the regexes in the Ruby libraries I tried will match.
I am currently using the validates_url_format_of Ruby library.
Example Url: "https://www.θεραπευτικη-κανναβη.com.gr"
Punycode url: "https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr"
Can you tell me whether the problem is in the library's regex or in the conversion to punycode?
As per the punycode conversion rules the prefix is always xn--. Can anyone explain what the two extra hyphens mean here?
"https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr".match(/https?:\/\/w*\.xn----.*/)
=> #<MatchData "https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr">
Note the url matcher is not perfect
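When the label contains a - (or any other ASCII character), Punycode copies it to the front of the encoded label and adds one more - as a delimiter before the encoded non-ASCII part, so extra hyphens show up right after the xn-- prefix.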
For example:
áéíóú.com -> xn--1caqmy9a.com
á-é-í-ó-ú.com -> xn-------4na3c3a3cwd.com
I guess it has to do with the xn-- encoding restrictions.
This one should work for you:
(xn--)(--)*[a-z0-9]+\.com\.gr
The beginning of the code: (xn--)
Zero or more pairs of extra hyphens: (--)*
The domain chars/numbers: [a-z0-9]+
The TLD of the domain (dots escaped so they match literally): \.com\.gr
You can add http/https if you wish
Update:
After adding numbers to the URL I found that the regex needs a fix:
(xn--)(-[-0-9])*[a-z0-9]+\.com\.gr
á-1é-2í-3ó-4ú.gr.com -> xn---1-2-3-4-7ya6f1b6dve.gr.com
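Since the number of hyphens after xn-- depends on the original label, it may be simpler to validate each label generically instead of hard-coding the hyphen pattern. The following is only a sketch in Ruby, not the regex used by validates_url_format_of: LABEL and valid_ascii_host? are names made up for this example, and it assumes the host has already been converted to its ASCII (punycode) form.

```ruby
# One DNS label: letters, digits and hyphens, 1 to 63 characters,
# not starting or ending with a hyphen. Runs of hyphens inside are allowed,
# which is what labels such as "xn----ylbb..." need.
LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/i

# Hypothetical helper: true when every dot-separated label of host is valid.
def valid_ascii_host?(host)
  labels = host.split('.', -1) # -1 keeps empty labels so "a..b" is rejected
  labels.size > 1 && labels.all? { |label| LABEL.match?(label) }
end

puts valid_ascii_host?('www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr')
puts valid_ascii_host?('xn-------4na3c3a3cwd.com')
puts valid_ascii_host?('-bad.com')
```

This accepts any number of hyphens after the prefix, so the same check covers the single-hyphen and multi-hyphen examples above.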

regex to extract URLs from text - Ruby

I am trying to detect the URLs in a text and wrap them in quotes, like below:
original text: Hey, it is a url here www.example.com
required text: Hey, it is a url here "www.example.com"
The original text is my input and the required text is the output I need. I searched the web but could not find a solution. I have already tried URI.extract, but it does not seem to detect URLs without http or https. Below are examples of the URLs I want to handle. Kindly let me know if you know a solution.
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Find words that look like URLs:
str = "ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.\n\nhttps://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/\n\nwww.jstor.org/stable/24084454\n\nwww.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/\n\ninsu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so\n\nwww.cerege.fr/spip.php?page=pageperso&id_user=94"
str.split.select{|w| w[/(\b+\.\w+)/]}
This gives you an array of the whitespace-separated words that contain one or more . characters followed by word characters, which MIGHT work for your use case.
puts str.split.select{|w| w[/(\b+\.\w+)/]}
www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Updated
Complete solution to modify your string:
str_with_quote = str.clone # make a clone for the `gsub!`
str.split.select{|w| w[/(\b+\.\w+)/]}
.each{|url| str_with_quote.gsub!(url, '"' + url + '"')}
Now your cloned object wraps urls inside double quotes
puts str_with_quote
Will give you this output
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, "www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les" Belles lettres, 2001.
"https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/"
"www.jstor.org/stable/24084454"
"www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/"
"insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so"
"www.cerege.fr/spip.php?page=pageperso&id_user=94"

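As a rough alternative to the split/select approach above, a single gsub with a URL-like pattern avoids swallowing the text that follows a comma. This is only a sketch in Ruby: URL_LIKE and quote_urls are names made up here, and the pattern assumes a host made of dot-separated labels ending in an alphabetic TLD, with a path that stops at whitespace, a comma, or a double quote. It will also match things like file names (notes.txt), and it cuts off URLs that legitimately contain commas.

```ruby
# Optional scheme, then a dotted host ending in an alphabetic TLD,
# then an optional path that stops at whitespace, a comma or a double quote.
URL_LIKE = %r{(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}\b(?:/[^\s,"]*)?}i

# Hypothetical helper: returns a copy of text with every URL-like match quoted.
def quote_urls(text)
  text.gsub(URL_LIKE) { |url| %("#{url}") }
end

puts quote_urls('Hey, it is a url here www.example.com')
```

Because gsub returns a new string, the original text is left untouched and no clone is needed.
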
How to decode windows-874 imap subject?

I'm having a serious problem with IMAP decoding. I received an email that might be encoded in windows-874, and this makes the whole message unreadable. I tried iconv('tis-620','utf-8',$txt) but had no luck.
I've searched everywhere for an answer, but it seems nobody has had this problem before (or maybe I'm not searching for the right words).
The subject is :
Charset : ASCII
=?windows-874?Q?=CB=E9=CD=A7=BE=D1=A1=C3=D2=A4=D2=BE=D4=E0=C8=C9=CA=D3=CB=C3=D1=BA=A7=D2=B9=E4=B7=C2=E0=B7=D5=E8=C2=C7=E4=B7=C2=A4=C3=D1=E9=A7=B7=D5=E8
30
=E2=C3=A7=E1=C3=C1=CA=C7=D1=CA=B4=D5=CA=D8=A2=D8=C1=C7=D4=B7=AB=CD=C2 8?=
So please tell me what the encoding is, if it's not tis-620. How can I decode this into human-readable text?
Finally I found a way. First I created functions that detect the encoding named in a given text.
function win874($str) {
    // true when the header names the windows-874 charset
    return strpos($str, "windows-874") !== false;
}
function utf8($str) {
    // true when the header names the UTF-8 charset
    return strpos($str, "UTF-8") !== false;
}
Then I convert with PHP functions:
$subject = $headers->subject;
if (!win874($subject) && !utf8($subject)) {
    // no charset marker: print the subject as-is
    echo $subject;
}
if (win874($subject)) {
    // "=?windows-874?Q?<payload>?=" splits into ["=", charset, "Q", payload, "="]
    $subj0 = explode("?", $subject);
    echo $subj0[3];
}
if (utf8($subject)) {
    echo imap_utf8($subject);
}
Text in windows-874 always begins with "=?windows-874?Q?", so I used a simple function like explode() to pull the subject text out of the surrounding markers. As I said, the subject text always comes after the 3rd question mark, so that gives me the subject.
But the problem remains: I still have to change the browser encoding to Thai to make the text readable (settings > tools > encoding > Thai in Chrome). Any suggestions?
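One possible direction, sketched in Ruby rather than PHP: the part after the 3rd question mark is still Q-encoded (=XX hex escapes), and the resulting bytes are in windows-874, so they need to be converted to UTF-8 before a browser that expects UTF-8 can show them. decode_encoded_word is a name made up for this example, and it handles only a single, well-formed encoded-word. The same two steps should map to quoted_printable_decode and iconv in PHP, though that is untested here.

```ruby
# Decode one RFC 2047 encoded-word, e.g. "=?windows-874?Q?=CB=E9?=", to UTF-8.
def decode_encoded_word(word)
  m = word.match(/\A=\?([^?]+)\?([QqBb])\?([^?]*)\?=\z/)
  return word unless m # not an encoded-word: return it unchanged

  charset = m[1]
  kind = m[2].upcase
  payload = m[3]
  raw = if kind == 'Q'
          # Q encoding: "_" stands for a space, "=XX" is a hex-escaped byte.
          payload.tr('_', ' ').unpack('M').first
        else
          # B encoding is plain Base64.
          payload.unpack('m').first
        end
  # Label the raw bytes with the declared charset, then transcode to UTF-8.
  raw.force_encoding(charset).encode('UTF-8')
end

puts decode_encoded_word('=?windows-874?Q?=CB=E9=CD=A7?=')
```

The example input is just the first four bytes of the subject above. If the page that prints the result is served as UTF-8, the browser encoding should no longer need to be switched to Thai.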

How to get details from QR code?

I got a string from a generated QR code image, but how can I get the URL out of it? The string is the following:
aHR0cDovL2R1Yml6emxlLWludGVydmlldy5zMy5hbWF6b25hd3MuY29tLzg3M2FhMTA5LnR4dA==
Can anybody help me get all the information out of it?
Thanks
That string is encoded in Base64. The decoded version of your string is:
http://dubizzle-interview.s3.amazonaws.com/873aa109.txt
If you need to integrate this into your software, find a library that has a Base64 decoder to decode such strings.
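For instance, a minimal sketch in Ruby (any language with a Base64 decoder works the same way). It uses the core String#unpack with the 'm' directive, so no extra library is needed; the variable names are just for illustration.

```ruby
# The string read from the QR code.
encoded = 'aHR0cDovL2R1Yml6emxlLWludGVydmlldy5zMy5hbWF6b25hd3MuY29tLzg3M2FhMTA5LnR4dA=='

# 'm' is the Base64 directive of String#unpack; the result is a one-element array.
decoded = encoded.unpack('m').first

puts decoded
```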

Convert sxw to rml error

When I try to convert an sxw file to an rml file using OpenOffice, this error occurs:
Exception: 'ascii' codec can't encode character u'\xe9'
What does this error mean, and how can I fix it?
Please check the linked question "UnicodeEncodeError when trying to convert Django models to XML"; it is the same issue as the one here.
You can use yourfield.encode("utf-8"), or use format() in OpenERP: [[format(obj.your_str_field or '')]]
A similar issue has been reported on Launchpad: https://bugs.launchpad.net/openobject-server/+bug/956798
It has been fixed on the linked branch; you can apply the patch from there, which makes your report tolerate Unicode characters.
Thank you
