How to decode windows-874 imap subject? - imap

I'm having a serious problem with IMAP decoding. I received an email whose subject appears to be encoded in windows-874, and this makes the whole message unreadable. I tried iconv('tis-620','utf-8',$txt), but I've had no luck.
I've searched everywhere for an answer, but it seems I'm the first person in the universe to hit this problem (or am I searching for the wrong words?).
The subject is :
Charset : ASCII
=?windows-874?Q?=CB=E9=CD=A7=BE=D1=A1=C3=D2=A4=D2=BE=D4=E0=C8=C9=CA=D3=CB=C3=D1=BA=A7=D2=B9=E4=B7=C2=E0=B7=D5=E8=C2=C7=E4=B7=C2=A4=C3=D1=E9=A7=B7=D5=E8
30
=E2=C3=A7=E1=C3=C1=CA=C7=D1=CA=B4=D5=CA=D8=A2=D8=C1=C7=D4=B7=AB=CD=C2 8?=
So, please tell me what the encoding is, if it's not TIS-620. How can I decode this into human-readable text?

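Finally I found my way home. First I created functions to detect which charset is named in a given text.
// Returns true if the text names the windows-874 charset.
function win874($str){
    // strpos()/stripos() return false when nothing is found and 0 for a match
    // at offset 0, so compare strictly against false.
    return stripos($str, "windows-874") !== false;
}
// Returns true if the text names the UTF-8 charset.
function utf8($str){
    return stripos($str, "UTF-8") !== false;
}
Then I convert with PHP functions:
if(!win874($headers->subject) && !utf8($headers->subject)){
    // No known charset marker: print the subject as it is.
    echo $headers->subject;
}
if(win874($headers->subject)){
    // "=?windows-874?Q?text?=" splits into "=", "windows-874", "Q", "text", "=".
    $subj0=explode("?",$headers->subject);
    echo $subj0[3];
}
if(utf8($headers->subject)){
    echo imap_utf8($headers->subject);
}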
Because a windows-874 subject always begins with "=?windows-874?Q?", I used a simple function, explode(), to separate the payload from the surrounding markers. As I said, the payload always comes after the third question mark, so that gives me the subject.
But the problem remains: I still have to change the browser encoding to Thai to make the text readable (settings > tools > encoding > Thai in Chrome). Any suggestions?
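One likely reason the browser encoding still has to be switched by hand is that the extracted bytes are never converted: they stay in windows-874 while the page is served as something else. The following is only a sketch in Python (not the PHP used above) of decoding such a subject fully and re-encoding it as UTF-8. The helper name decode_subject and the alias table are assumptions made for illustration; mapping both labels to Python's cp874 codec assumes the mail really uses the Windows Thai code page.

```python
from email.header import decode_header

# Assumption: map charset labels Python may not recognise to a codec it does.
# cp874 is Python's name for the Windows Thai code page.
CHARSET_ALIASES = {"windows-874": "cp874", "tis-620": "cp874"}

def decode_subject(raw_subject):
    """Decode RFC 2047 encoded-words in a header value into a Unicode string."""
    parts = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, bytes):
            label = (charset or "ascii").lower()
            codec = CHARSET_ALIASES.get(label, label)
            chunk = chunk.decode(codec, errors="replace")
        parts.append(chunk)
    return "".join(parts)

# Short sample built from the first four bytes of the subject in the question.
subject = decode_subject("=?windows-874?Q?=CB=E9=CD=A7?=")
print(subject.encode("utf-8"))  # the bytes a page declared as UTF-8 would carry
```

Once the text is a Unicode string, it can be written out as UTF-8 on a page that declares UTF-8, so the reader no longer needs to pick "Thai" manually.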

Related

How to prevent automatic hyperlink detection in the console of Firefox/Chrome developer tools?

Something that drives me nuts in the developer tools of Chrome (106) and Firefox (105) is that whenever text logged via console.log(text) contains a hyperlink, the link is not only made clickable (I can live with that, even though I usually prefer plain text) but is also abbreviated if it is long. So when I want to check exactly which link is in some variable, I cannot just write e.g. console.log(img.src), because some of the interesting information in the link is hidden.
You can try yourself with
var href = 'https://stackoverflow.com/search?q=%5Bgoogle-chrome-devtools%5D+%5Bconsole.log%5D+%5Bfirefox-developer-tools%5D+%5Bhyperlink%5D+automatic+detection&someMoreStuffTomakeTheLinkLonger';
console.log(href);
In both Firefox and Chrome, the output for me contains an ellipsis ('...'); e.g. in Firefox I obtain as output:
https://stackoverflow.com/search?q=%5Bgoogle-chrome-devtools…link%5D+automatic+detection&someMoreStuffTomakeTheLinkLonger
thus hiding the part after "-devtools" (Chrome hides a slightly different part). The console is mostly a debugging tool: I log things because I want to see them, not hide them. As it is, I always need to either hover with the mouse and wait for the tooltip (which doesn't let me copy parts of the link), or right-click, copy the link and paste it somewhere I can see it completely, or take a substring to remove the "https://" at the front. Note also that the variable isn't necessarily a single hyperlink; it can be any text containing several hyperlinks. I didn't find a way to force console.log to print all content as plain text. Has anybody else met this problem and found a workaround?
I made this a community wiki answer, because the main insight is not from myself but from the comments. Feel free to improve.
The console.log() function accepts several arguments, which also allows formatted output similar to printf in some languages. The available format specifiers are listed in the documentation of console.log() on MDN. This formatted output provides a solution at least for Chrome, as @wOxxOm pointed out in the comments:
console.log('%O', href) // works in Chrome
This is rather surprising, because %O is described at MDN as
"Outputs a JavaScript object. Clicking the object name opens more information about it in the inspector".
It seems there is no 'clicking' in Chrome when the object is a string.
There is also %s for string output, but this just gives the standard link-replacing behavior in both browsers. And in Firefox neither of the two format specifiers works. There one really has to replace the protocol "https://" with something that is not recognized as a link. A space after the ':' seems to be enough, so "https: //". It turns out that one can also insert a format specifier, "https:%c//", whose style argument can even be empty, which yields output that shows the complete link and can be copied as well:
console.log(href.replace(/(https?:)/, "$1%c"), ""); // works in Firefox
In particular the FF solution is cumbersome, and there might also be several links within one console-output. So it is useful to define one's own log-function (or if one prefers, redefine console.log, but note the remark at the end)
function isChrome() {...} // use your favorite Chrome detection here
function isFirefox() {...} // use your favorite Firefox detection here
function plainLog() {
    const msg = arguments[0];
    if (isChrome() && arguments.length == 1 && typeof msg == "string") {
        return console.log("%O", msg);
    }
    if (isFirefox() && arguments.length == 1 && typeof msg == "string") {
        const emptyStyle = ""; // serves only as a separator, such that FF doesn't recognize the link anymore
        const reg = /(https?:)\/\//g;
        const emptyStyles = []; // we need to insert one empty style for every found link
        const matches = msg.matchAll(reg);
        for (let match of matches) {
            emptyStyles.push(emptyStyle);
        }
        return console.log(msg.replace(reg, '$1%c//'), ...emptyStyles);
    }
    return console.log(...arguments);
}
For browser detection isChrome() and isFirefox() see e.g. here on SO.
One can of course extend the redefinition also to the other console functions (console.info, console.warn, etc.)
The downside of redefining console.log is that every console output normally also shows the last entry of the call stack as a handy link to the source of the logging. With the redefinition, this link always points to the same place, namely the file and line number where plainLog() is defined and calls console.log(), instead of the place where plainLog() was called. This new problem is described on SO here, but the solution (see comment) is again a bit involved and not fully satisfying as a replacement for the built-in console.log. So if links appear only rarely in the logging, it's probably better to switch to plainLog() only for those links.

How to replace these extended ascii codes?

I am opening up .txt files but when they are loaded on Xojo weird characters like these (’ , â€ک) show up.
I've tried DefineEncoding and ConvertEncoding but it still doesn't seem to work.
output.text = output.text.DefineEncoding(Encodings.WindowsANSI)
output.text = output.text.ConvertEncoding(Encodings.UTF8)
You may have to define the encoding at load time, not afterwards; otherwise the text is already treated as UTF-8 when it is loaded, and your posted code then corrupts those characters. So pass the encoding to the Read function, or load the data as a binary file rather than as a text file.
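To illustrate why the encoding has to be right at load time, here is a small sketch in Python rather than Xojo. It assumes the file is actually UTF-8, which the "’" symptom suggests, since that is what a UTF-8 curly apostrophe looks like when decoded as Windows-1252. The file name and sample text are hypothetical.

```python
import os
import tempfile

# Hypothetical sample: a UTF-8 text file containing a curly apostrophe (U+2019).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("It\u2019s".encode("utf-8"))

# Load as binary first: no encoding has been guessed yet.
with open(path, "rb") as f:
    raw = f.read()

wrong = raw.decode("cp1252")  # bytes misread as Windows ANSI: mojibake
right = raw.decode("utf-8")   # encoding declared correctly at load time

print(ascii(wrong))
print(ascii(right))
```

The same three bytes produce either one apostrophe or three junk characters, depending only on which encoding is declared when the bytes are first turned into text.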

Ruby How to convert back binary string from smsc

My app works with an SMSC, and I need to intercept each SMS before it is sent.
I sent this string from a mobile phone:
"hello this is test"
When I check the SMSC, I get this binary string for my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
The encoding of this string is:
<Encoding:ASCII-8BIT>
I suspect this userData is GSM-encoded text packed into a binary string.
So how can I get the clear-text string back from userData?
This question is about English text, because for Hebrew I can get the string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
But for English I get an error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
I tried detecting the encoding of the binary string with ICU, and got "ISO-8859-1" with detected language 'PT', which is very strange because my languages are English and Hebrew.
I got lost in the encoding details, so I tried encoding to every name in Encoding.list, but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone else who has this issue: I got the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
For ASCII characters packed into the binary string, you need this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
Remember, I sent these words in the SMS:
"hello this is test"
and they became this binary string:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage with every encoding is that the ASCII characters are sent in the 7-bit GSM alphabet: each character occupies only 7 bits and the characters are packed back to back, whereas every other encoding uses at least 8 bits per character. Unpacking those 7-bit groups is what the code actually does.
But this only applies to the ASCII character set.
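For readers who want the one-liner spelled out: it reverses the bytes, renders them as one long bit string, drops the two padding bits at the front, cuts the rest into 7-bit groups, turns each group into a character, and reverses the result. The same unpacking can be sketched in Python as below. The function name unpack_gsm7 is hypothetical, and using chr() assumes the text contains only characters whose GSM 7-bit code equals their ASCII code (letters, digits, space); a complete decoder would go through the GSM 03.38 alphabet table.

```python
def unpack_gsm7(hex_payload, septet_count=None):
    """Unpack GSM 7-bit packed user data (given as hex) into text.

    Assumption: every septet is looked up as ASCII via chr(), which only
    matches the GSM default alphabet for letters, digits and space.
    """
    data = bytes.fromhex(hex_payload)
    # Read the payload as one little-endian integer: septet i then occupies
    # bits 7*i .. 7*i+6, which is how the characters are packed.
    bits = int.from_bytes(data, "little")
    if septet_count is None:
        # Without the real length from the SMS header, take every whole septet.
        septet_count = len(data) * 8 // 7
    septets = [(bits >> (7 * i)) & 0x7F for i in range(septet_count)]
    return "".join(chr(s) for s in septets)

text = unpack_gsm7("c8329bfd06d1d1e939283d07d1cb733a")
print(text)  # -> Hello this is test
```

Note that the payload itself carries a capital "H". When the number of characters is a multiple of 8, the last septet may be padding, so passing the real character count from the SMS header as septet_count is safer than relying on the default.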
For other languages, such as the Hebrew I use, the SMS is sent as UCS-2,
so this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the binary string in an array, because pack is an Array method.
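The UCS-2 case is a plain hex-to-bytes conversion followed by a big-endian 16-bit decode, which is also why it fails on the 7-bit packed English payload: those bytes are not valid UTF-16. A minimal Python sketch of the same step follows; the function name and the sample payload are hypothetical.

```python
def decode_ucs2_hex(hex_payload):
    """Decode a hex-encoded UCS-2 / UTF-16BE SMS payload into a str."""
    return bytes.fromhex(hex_payload).decode("utf-16-be")

# Hypothetical sample: the Hebrew word "shalom" as big-endian 16-bit code units.
text = decode_ucs2_hex("05e905dc05d505dd")
print(ascii(text))
```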
That's all for now.
If anybody wants to explain exactly what happens in the code for the ASCII character set, be my guest.
Shmulik

got wrong characters encoding using pdfbox to extract text from pdf

Recently I had to index PDFs into Elasticsearch, using PDFBox to extract the text from them; however, the extracted text comes out with the wrong characters, like this:
Ýëĭ2ĈjŬj§ė¥
1 ŋ?nij"2$ 2016£ 2Ú 5Õ,”Òªj§?ně#ij"2ě
^ë2ļŘœ A$j§?n 2016£ě#ëÖĭ2Ĉļê
2 èÅŋ?n$ 2016£ 2Ú 6ÕöĿS¿ ĿS¿ ĿS
Õ¿ ĿSÖ¿ eöĿS&غĨĘ
http://www.sse.com.cnLćĈ
A$j§Ýëĭ2ĈŘĐ
My code is exactly the same as shown on this page here. I tried PDFBox versions from 0.8.x to 2.0.x, but it still does not work.
Any help or advice would be appreciated!
I got the answer from @Tilman's comment.
See pdfbox.apache.org/1.8/faq.html#notext and the answer below too.

SQLITE UTF-16 Encoding Issues

OK, I've been pulling my hair out for a couple of days on this issue. There are a couple of technologies at use here, first I'm using Unreal Engine 4 to develop an iOS game and I'm linking to a static lib of sqlite3, that I create the Database for on Windows.
On windows everything works fine, I create the database, and if you do Pragma encoding; it shows UTF-16LE.
However, when on IOS everything falls apart. First of all, if I even try to create a empty database in iOS using sqlite3_open16 function, it will create a database with a bunch of junk at the end of the name, and if I open it, and do pragma encoding it will say UTF-8 (empty database with no tables).
If I try to connect to my existing one, I will have success 'randomly' sometimes, I think this has to do again with the weird characters that are appearing at the end of my string which I suspect is encoding issues.
The function being used to open the database is this:
bool Open(const TCHAR* ConnectionString)
{
int32 Result = sqlite3_open16(ConnectionString, &DbHandle);
return Result == SQLITE_OK;
}
Which works fine in windows but has the issues above in ios.
According to their documentation they use UCS-2. From what I can tell in the sqlite source, it will use UTF-16LE. Do I need to do something to convert between these two? Or is there something else I might be missing here? Does anyone have any ideas? I'm hoping someone who might not be familiar with UE4 might still have some guesses.
Edit: a list of things I've tried:
1. Using the UTF-8 functions of SQLite; these appear to work fine. UE4 has a function TCHAR_TO_UTF8 and that worked.
2. Using Objective-C to ensure the string is encoded as UTF-16LE. This gave me the 'random' success I describe above. Besides only working randomly, with the weird text sometimes appearing at the end of the string, any data I pull out of the database now comes back as mostly question marks '????' with the occasional Chinese character. The function I used for this is:
const TCHAR* UChimeraSqlDatabase::UTF16_To_PlatformEncoding(FString UTF16EncodedString)
{
#if PLATFORM_IOS
    const TCHAR* EncodedString = (const TCHAR *)([[[NSString stringWithFString : UTF16EncodedString] dataUsingEncoding:NSUTF16LittleEndianStringEncoding] bytes]);
#else
    const TCHAR* EncodedString = *UTF16EncodedString;
#endif
    return EncodedString;
}
3. Using Unreal's .AppendChar to add L'\0' to the end of the string, without method 2; no success.
If you're seeing weird characters at the end of the file name when calling sqlite3_open16, it sounds like your UTF-16 file name was not null-terminated.
To specify the encoding of the database, you can actually create it with any of the sqlite3_open functions, but the key is that as soon as the database is created, you must immediately set the encoding:
PRAGMA encoding = "UTF-16le";
Once the encoding has been set, you can't change it, so make sure to do this first thing after creating the database.
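The ordering constraint can be tried out with a small sketch using Python's built-in sqlite3 module instead of the C API. The in-memory database and the table name t are assumptions chosen to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # brand-new, empty database

# The pragma must run before any table is created.
conn.execute('PRAGMA encoding = "UTF-16le"')
conn.execute("CREATE TABLE t (name TEXT)")
encoding_after_create = conn.execute("PRAGMA encoding").fetchone()[0]

# Trying to switch later has no effect once the database has content.
conn.execute('PRAGMA encoding = "UTF-8"')
encoding_after_retry = conn.execute("PRAGMA encoding").fetchone()[0]

print(encoding_after_create, encoding_after_retry)
conn.close()
```

Both queries should report UTF-16le, which matches the advice to set the encoding as the very first statement after creating the database.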

Resources