My app works with an SMSC, and I need to intercept an SMS before it is sent.
I sent this string from a mobile phone:
"hello this is test"
When I check the SMSC, I get this as the binary string of my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
the encoding of this string is:
<Encoding:ASCII-8BIT>
I know this userData is probably GSM-encoded text in a binary string,
so how can I get the clear-text string back from userData?
This question is about English text, because for Hebrew I can get the
string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
but for English I get an error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
I tried detecting the encoding of the binary string with ICU, and I got
"ISO-8859-1" with the detected language 'PT', which is very strange because my languages are English and Hebrew.
Anyway, I got lost in the encoding details, so I tried encoding to every name in Encoding.list,
but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone who also has this issue: I found the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
The solution, for ASCII characters packed into the binary string, is this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
Remember, I sent these words in the SMS:
"hello this is test"
and they became this hex string:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage in every encoding is that the ASCII characters are sent in the GSM 7-bit alphabet: each character occupies only 7 bits and the characters are packed back to back, while every other encoding uses at least 8 bits per character. The code above undoes that packing.
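For anyone who wants the one-liner unpacked, here is a minimal sketch of the same idea written step by step. It assumes the payload contains only characters whose GSM 7-bit code equals their ASCII code (basic letters, digits, space). `unpack_gsm7` and its `septet_count` parameter are names made up for this illustration, not part of any library.

```ruby
# Unpack GSM 7-bit packed user data (given as a hex string) into text.
# Assumption: every septet is treated as an ASCII code point, which is only
# correct for the part of the GSM alphabet that coincides with ASCII.
def unpack_gsm7(hex, septet_count = nil)
  bytes = [hex].pack('H*').bytes

  # View the octets as one bit stream, least significant bit of each octet first.
  bits = bytes.map { |b| b.to_s(2).rjust(8, '0').reverse }.join

  # Without an explicit count, take as many whole septets as fit.
  septet_count ||= bits.length / 7

  # Cut the stream into 7-bit groups, flip each back to MSB-first, convert to a character.
  bits.scan(/.{7}/).first(septet_count).map { |s| s.reverse.to_i(2).chr }.join
end

puts unpack_gsm7("c8329bfd06d1d1e939283d07d1cb733a")
```

For the sample payload this prints "Hello this is test" (the first septet is 0x48, a capital H). When a message length leaves exactly seven padding bits in the last octet, the automatic count picks up one extra zero septet, so in that case the real septet count should be passed in explicitly.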
But this only applies to the ASCII character set.
For other languages, such as Hebrew in my case, the SMS is sent as UCS-2,
so this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the hex string in an array, because pack is an Array method.
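A short sketch of the UCS-2 case, using a made-up hex string for illustration (it is not taken from a real SMS):

```ruby
# Hypothetical UCS-2 payload: four 16-bit code units spelling a Hebrew word.
hex = "05e905dc05d505dd"

# Array#pack('H*') turns the hex digits into raw bytes (ASCII-8BIT).
raw = [hex].pack('H*')

# Label the bytes as UTF-16BE, then transcode to UTF-8.
# UCS-2 and UTF-16BE agree for characters in the Basic Multilingual Plane.
text = raw.force_encoding('UTF-16BE').encode('UTF-8')

puts text
```

Applying the same call to 7-bit packed data fails, as in the question, because the packed bytes do not form valid UTF-16 sequences.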
That's all for now.
If anybody wants to explain exactly what happens in the code for the ASCII character set, be my guest.
Shmulik
What I am doing:
I am using the gmail gem in a Rails 4 app to get email attachments from a specific account at regular intervals. Here is an extract from the core part (here for simplicity only considering the first email and its first attachment):
require 'gmail'
Gmail.connect(@user_email, @user_password) do |gmail|
  if gmail.logged_in?
    emails = gmail.inbox.emails(:from => @sender_email)
    email = emails[0]
    attachment = email.message.attachments[0]
    File.open("~/temp.csv", 'w') do |file|
      file.write(
        StringIO.new(attachment.decoded.to_s[2..-2].force_encoding("ISO-8859-15").encode!('UTF-8')).read
      )
    end
  end
end
The encoding of the attached file can vary. The particular one that I am currently having issues with is in Finnish. It contains Finnish characters and a superscripted 3 character.
This is what I expect to get when I run the above code. (This is what I get when I download the attachment manually through gmail user interface):
What the problem is:
However, I am getting the following odd results.
From cat temp.csv (Looks good to me):
With nano temp.csv (Here I have no idea what I am looking at):
This is what temp.csv looks like opened in Sublime Text (directly via winscp). First line and small parts look ok but then Chinese/Japanese characters:
This is what temp.csv looks like in Notepad (after download via winscp). It looks OK except that a blank space has been inserted between each character and the newlines seem to be missing:
What I have tried:
I have without success tried:
.force_encoding(...) with all the different "ISO-8859-x" character sets
putting the force_encoding("ISO-8859-15").encode!('UTF-8') outside the .read (works but doesn't solve the problem)
encode to UTF-8 without first forcing another encoding but this leads to Encoding::UndefinedConversionError: "\xC4" from ASCII-8BIT to UTF-8
writing as binary with 'wb' and 'w+b' in the File.open() (which oddly doesn't seem to make a difference to the outcome).
searching stackoverflow and the web for other ideas.
Any ideas would be much appreciated!
Not beautiful, but it will work for me now.
After re-encoding, I convert the string to a char array, then remove the chars I do not want and then join the remaining array elements to form a string.
decoded_att = attachment.decoded
# Re-encode to UTF-8, replacing anything that cannot be converted, and normalise line endings.
data = decoded_att.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace).gsub("\r\n", "\n")
# Drop the NUL characters and the "ÿ"/"þ" characters.
data = data.chars.reject { |c| c == "\u0000" || c == "ÿ" || c == "þ" }.join
# File.write does not expand "~", so expand it explicitly.
File.write(File.expand_path("~/temp.csv"), data)
This will work for me now. However, I have no idea how these characters have ended up in the attachment ("ÿ" and "þ" in the start of the document and "\u0000" between all remaining characters).
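The symptoms described here ("ÿ" and "þ" at the start, a "\u0000" after every character, Notepad showing gaps, an editor guessing CJK text) are what UTF-16LE text with a byte order mark looks like when its bytes are read as ISO-8859-1: the BOM bytes FF FE are "ÿ" and "þ" in that charset. If that is what the attachment is, it can be transcoded directly instead of filtering characters afterwards. A minimal sketch under that assumption; `utf16_attachment_to_utf8` is a hypothetical helper, and the ISO-8859-1 fallback is a guess rather than something the source confirms:

```ruby
# Convert raw attachment bytes to UTF-8, honouring a UTF-16 byte order mark.
def utf16_attachment_to_utf8(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)

  text =
    if bytes.start_with?("\xFF\xFE".b)      # UTF-16 little endian BOM
      bytes.byteslice(2..-1).force_encoding(Encoding::UTF_16LE)
    elsif bytes.start_with?("\xFE\xFF".b)   # UTF-16 big endian BOM
      bytes.byteslice(2..-1).force_encoding(Encoding::UTF_16BE)
    else
      # Assumption: without a BOM, treat the bytes as ISO-8859-1.
      bytes.force_encoding(Encoding::ISO_8859_1)
    end

  # Transcode first, then normalise Windows line endings.
  text.encode(Encoding::UTF_8).gsub("\r\n", "\n")
end
```

With this, the call in the answer would become something like `utf16_attachment_to_utf8(attachment.decoded)`, and no characters need to be deleted by hand.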
It seems like you need to do attachment.body.decoded instead of attachment.decoded
I wrote a bot for Telegram that sends users images in response to their requests, but I ran into one problem I could not solve.
An example of the parsing in Ruby:
json_object = JSON.parse(open("https://api.site.com/search/photos?query=" + message.text + "&per_page=10&client_id=42324d2lkedi234fs342dfse2c038fdfsdfs").read)
message.text is the field containing the user's request.
Everything works fine with Latin letters, but when I send Cyrillic characters (the API also supports the Cyrillic alphabet) I get the error below:
/Users/me/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/uri/rfc3986_parser.rb:21:in
`split': URI must be ascii only
"https://api.site.com/search/photos?query=\u0432\u0430\u0432\u0430&per_page=10&client_id=42324d2lkedi234fs342dfse2c038fdfsdfs"
(URI::InvalidURIError)
I tried encoding with UTF-8 and Windows-1252, but nothing helped. How should this be fixed?
You should percent-encode your Cyrillic string:
URI.encode('http://google.com?1=АБВ') # => "http://google.com?1=%D0%90%D0%91%D0%92"
So use it like this (or encode the whole URL):
URI.encode(message.text)
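Note that URI.encode was removed in Ruby 3.0, so on newer Rubies a different call is needed. A minimal sketch using the standard library's URI.encode_www_form, which percent-encodes each query value; `search_url` is a hypothetical helper and the client id passed to it is a placeholder:

```ruby
require 'uri'

# Build the search URL with every query parameter percent-encoded.
def search_url(query, client_id)
  uri = URI('https://api.site.com/search/photos')
  uri.query = URI.encode_www_form(query: query, per_page: 10, client_id: client_id)
  uri.to_s
end

puts search_url("вава", "YOUR_CLIENT_ID")
```

Building the query from a hash also avoids concatenating user input into the URL by hand, so characters such as "&" or "#" in message.text cannot break the request.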
Try with
"anything".parameterize.underscore.humanize.downcase
I'm having a serious problem with IMAP decoding. I received an email that might be encoded in windows-874, and this makes the whole letter unreadable. I tried iconv('tis-620','utf-8',$txt) but had no luck.
I've searched everywhere for an answer, but it seems I'm the first person in the universe with this problem (or maybe I'm not searching for the right words?).
The subject is :
Charset : ASCII
=?windows-874?Q?=CB=E9=CD=A7=BE=D1=A1=C3=D2=A4=D2=BE=D4=E0=C8=C9=CA=D3=CB=C3=D1=BA=A7=D2=B9=E4=B7=C2=E0=B7=D5=E8=C2=C7=E4=B7=C2=A4=C3=D1=E9=A7=B7=D5=E8
30
=E2=C3=A7=E1=C3=C1=CA=C7=D1=CA=B4=D5=CA=D8=A2=D8=C1=C7=D4=B7=AB=CD=C2 8?=
So please tell me what the encoding is, if it's not tis-620. How can I decode this into human-readable text?
Finally I found my way home. First I created functions to detect which encoding name appears in a given text.
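function win874($str){
    // strpos() returns false when the needle is absent, so compare strictly.
    return strpos($str, "windows-874") !== false;
}
function utf8($str){
    return strpos($str, "UTF-8") !== false;
}
Then I convert with PHP functions:
// Neither charset is named: print the subject as it is.
if (!win874($headers->subject) && !utf8($headers->subject)) {
    echo $headers->subject;
}
// windows-874: take the part after the third question mark.
if (win874($headers->subject)) {
    $subj0 = explode("?", $headers->subject);
    echo $subj0[3];
}
// UTF-8: let imap_utf8() decode the header.
if (utf8($headers->subject)) {
    echo imap_utf8($headers->subject);
}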
Because text in windows-874 always begins with "=?windows-874?Q?", I used a simple function like explode() to extract the payload from the surrounding markup. As I said, the payload always comes after the third question mark. Then I have the subject.
But the problem remains: I still have to change the browser encoding to Thai to make the text readable (settings > tools > encoding > Thai in Chrome). Any suggestions?
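One likely reason the browser still has to be switched to Thai is that the text is never converted to UTF-8; it is only cut out of the header. The thread uses PHP, and the sketch below is in Ruby purely to illustrate the steps: parse the encoded word, undo the Q-encoding, then transcode from the declared charset to UTF-8. `decode_q_encoded_word` is a hypothetical helper that handles a single well-formed Q-encoded word only, and the sample input is just the first four bytes of the subject above.

```ruby
# Decode one RFC 2047 "Q" encoded word, e.g. "=?windows-874?Q?=CB=E9?=".
def decode_q_encoded_word(word)
  m = word.match(/\A=\?([^?]+)\?[Qq]\?([^?]*)\?=\z/)
  return word unless m # not an encoded word: return it unchanged

  charset = m[1]
  payload = m[2]

  # In Q-encoding "_" stands for a space; the rest is quoted-printable.
  raw = payload.tr('_', ' ').unpack1('M')

  # Label the bytes with the declared charset, then transcode to UTF-8.
  raw.force_encoding(charset).encode('UTF-8')
end

puts decode_q_encoded_word("=?windows-874?Q?=CB=E9=CD=A7?=")
```

Once the subject is UTF-8, the page can be served as UTF-8 and the browser setting no longer matters. A full implementation would also need to handle Base64 ("B") words and several encoded words in one header.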
I'm using the Prawn gem in my Rails app to generate PDF reports.
I read the documentation on rendering Arabic text with the RTL text direction.
The issue is that numbers get reversed.
I wanted semester 1234 as الفصل الدراسي 1234,
but in my app the output is الفصل الدراسي 4321.
My two lines of code are:
pdftable = Prawn::Document.new
pdftable.text(t('org.semester') + " " + @semester)
@semester = '1234' (The reason is probably that it is treated as text and so gets reversed along with the RTL string.)
Please help me keep the numbers in the proper order without changing the RTL format.
Without hacking too much, you could use
@semester.to_s.reverse
So you reverse the string twice
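If digits can appear anywhere in the label, the same trick can be applied to every run of digits rather than to one variable. A minimal sketch; `pre_reverse_digits` is a hypothetical helper, and it assumes the renderer reverses the whole line, as described in the question:

```ruby
# Reverse each run of ASCII digits so that a renderer which flips the
# whole line ends up printing the digits in their original order.
def pre_reverse_digits(text)
  text.gsub(/[0-9]+/) { |digits| digits.reverse }
end

label = pre_reverse_digits("الفصل الدراسي 1234")
puts label
```

The result would then be passed to pdftable.text in place of the plain concatenation.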
So, after spending a long time writing down all the currencies I need for my currency converter, I went to paste them into Xcode. But when I do that, the text doesn't turn red. I'm afraid I'll need to retype all the strings, which took me almost an hour. Is there any way to fix this?
Datarray2 = [[NSMutableArray alloc]initWithObjects:@"United States Dollar",@”Euro”,@”Japanese yen”,@”Bulgarian lev”,@”Czech koruna”,@”Danish krone”,@”British pound”,@”Hungarian forint”@”Lithuanian litas”,@”Polish złoty”,@”Romanian leu”,@”Swedish krona”,@”Swiss franc”,@”Norwegian krone”,@”Croatian kuna”,@”Russian ruble”,@”Turkish lira”,@”Australian dollar”,@”Brazilian real”,@”Canadian dollar”,”Chinese yuan”,@”Hong Kong dollar”,@”Indonesian rupiah”,@”Israeli new shekel”,@”Indian rupee”,@South Korean won”,@”Mexican peso”,@”Malaysian ringgit”,@”New Zealand dollar”,@”Philippine peso”,@”Singapore dollar”,@”Thai baht”,@”South African rand”,nil];
EDIT: Interestingly, they don't show as strings here on Stack Overflow either, apart from the US Dollar one, which I typed inside Xcode.
If you look at the text, the quotes are wrong. You have ” (a typographic quote), but should have " (as the first USD one does).
Do a global find and replace of the wrong quotes with the correct ones.
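If the editor's find and replace is awkward with these characters, the same substitution can be scripted. A minimal Ruby sketch; `straighten_quotes` is a hypothetical helper, and it only touches the two typographic double quotes:

```ruby
# Replace typographic double quotes (U+201C, U+201D) with plain ASCII quotes.
def straighten_quotes(source)
  source.tr("\u201C\u201D", '""')
end

puts straighten_quotes("@\u201DEuro\u201D,@\u201DJapanese yen\u201D")
```

This only fixes the quote characters; a missing comma or a missing opening quote elsewhere in the list still has to be corrected by hand.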