I need to take some raw text and convert the linefeeds to HTML breaks.
This does not work.
myhtml=Replace(myhtml, chr(13), "<br>")
What Does?
Chr(13) may not be the end-of-lines in your text, also Replace() only works on the first occurence. Try this...
myhtml = ReplaceAll(myhtml, EndOfLine.Unix, "<br>")
and test with the EndOfLine variations Macintosh and Windows.
Related
What I am doing:
I am using the gmail gem in a Rails 4 app to get email attachments from a specific account at regular intervals. Here is an extract from the core part (here for simplicity only considering the first email and its first attachment):
require 'gmail'
Gmail.connect(#user_email,#user_password) do |gmail|
if gmail.logged_in?
emails = gmail.inbox.emails(:from => #sender_email)
email = emails[0]
attachment = email.message.attachments[0]
File.open("~/temp.csv", 'w') do |file|
file.write(
StringIO.new(attachment.decoded.to_s[2..-2].force_encoding("ISO-8859-15").encode!('UTF-8')).read
)
end
end
end
The encoding of the attached file can vary. The particular one that I am currently having issues with is in Finnish. It contains Finnish characters and a superscripted 3 character.
This is what I expect to get when I run the above code. (This is what I get when I download the attachment manually through gmail user interface):
What the problem is:
However, I am getting the following odd results.
From cat temp.csv (Looks good to me):
With nano temp.csv (Here I have no idea what I am looking at):
This is what temp.csv looks like opened in Sublime Text (directly via winscp). First line and small parts look ok but then Chinese/Japanese characters:
This is what temp.csv looks like in Notepad (after download via winscp). Looks ok except a blank space has been inserted between each character and the new lines seems to be missing:
What I have tried:
I have without success tried:
.force_encoding(...) with all the different "ISO-8859-x" character sets
putting the force_encoding("ISO-8859-15").encode!('UTF-8') outside the .read (works but doesn't solve the problem)
encode to UTF-8 without first forcing another encoding but this leads to Encoding::UndefinedConversionError: "\xC4" from ASCII-8BIT to UTF-8
writing as binary with 'wb' and 'w+b' in the File.open() (which oddly doesn't seem to make a difference to the outcome).
searching stackoverflow and the web for other ideas.
Any ideas would be much appreciated!
Not beautiful, but it will work for me now.
After re-encoding, I convert the string to a char array, then remove the chars I do not want and then join the remaining array elements to form a string.
decoded_att = attachment.decoded
data = decoded_att.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace).gsub("\r\n", "\n")
data_as_array = data.chars
data_as_array = data_as_array.delete_if {|i| i == "\u0000" || i == "ÿ" || i == "þ"}
data = data_as_array.join('').to_s
File.write("~/temp.csv", data.to_s)
This will work for me now. However, I have no idea how these characters have ended up in the attachment ("ÿ" and "þ" in the start of the document and "\u0000" between all remaining characters).
It seems like you need to do attachment.body.decoded instead of attachment.decoded
I would like to output name of generated Pdf file be this : Kuća_{id value}
Line of code:
$pdf->Output('Kuća_' . $id. '.pdf', 'D');
but outut is : Kuca_{id value}
// set font
$pdf->SetFont('freesans');
Font is Ok,I have unicode letters generated in pdf page..Also:
header('Content-Type: text/html; charset=UTF-8');
is set on php page
Tnx
From tcpdf docs:
Parameters
$name (string) The name of the file when saved. Note that special characters are removed and blanks characters are replaced with the underscore character.
I got a serious problem regarding Unicode and utf8,
I saved a paragraph of Arabic/Persian text file into notepad and saved it, now I saw my information like
Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå
my question is how to get back my data, it is important for me to get this data back, thanks in advance
The paragraph was scrambled by saving as code page 1256 (Arabic/Persian), then interpreted as code page 1252 (Western Europe), and finally saved as Unicode text. You can use C# to reverse this procedure:
string scrambled = "Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ " +
"Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå";
byte[] bytes = Encoding.GetEncoding("windows-1252").GetBytes(scrambled);
string plainText = Encoding.GetEncoding("windows-1256").GetString(bytes);
Console.WriteLine(text);
The plain text output is:
"تو اين سورس برنامه عدد دلخواهي رو از ورودي ميگيره و به طول همون عدد مثلثي رو رسم مي کنه"
On Linux you can use Gedit to open it as a 1256 encoded file:
gedit shahnameh.txt --encoding WINDOWS-1256
You can do the same work via gui. You just need select the correct encoding from "open" dialog box when opening a file. It should be at the bottom of the open dialog.
I have to parse NSData with XML string, does somebody know simple category to do it? I have such for JSON, but I forced to use XML. I tried to use XMLReader, it's interface looks clean, but I found some issues:
Mysterious new line characters and spaces everywhere:
"comment_count" = {text = "\n \n 21";};
My cyrillic symbols looks so:
"description_text" = {text = "\n \U041f\U0438\U043a\U0430\U0431\U0443\U0448};
Example:
<?xml version="1.0" encoding="UTF-8" ?>
<news>
<xml_count>43</xml_count>
<hot_count>449</hot_count>
<item type="text">
<id>1469845</id>
<rating>147</rating>
<pluses>171</pluses>
<minuses>24</minuses>
<title>
<![CDATA[Обновление огромного архива Пикабу!]]>
</title>
<comment_count>26</comment_count>
<comment_link>http://pikabu.ru/story/obnovlenie_ogromnogo_arkhiva_pikabu_1469845</comment_link>
<author>icq677555</author>
<description_text>
<![CDATA[Пикабушники, я обновил свой огромный архив текстовых постов из горячего!]]>
</description_text>
</item>
</news>
I just realized whats' going on. Your data samples are obviously NSDictionary instances printed in the debugger. So the issues you found are:
As XML was originally designed as an annotated text format, the whitespace (spaces, newlines) handling doesn't perfectly fit for data only usage. You can either trim all resulting strings ([stringVar stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]), adapt XMLReader to do it or use the XML parser at http://ios.biomsoft.com/2011/09/11/simple-xml-to-nsdictionary-converter/ (which does it by default).
The funny output you get for Cyrillic characters is the proper escaping for non-ASCII characters in the debugger output (which uses the old-style property list format). It's an artifact of the debugger output. Your variables contain the proper characters.
BTW: While JSON contains implicit type information (strings are always quoted, numbers are never quoted etc.), XML without a schema file does not. So all the parsed simple values will be strings even if they originally were numbers.
Update:
The XML parser you're using still contains the old whitespace handling code described in Pesky new lines and whitespace in XML reader class (though the comment tells otherwise). Apply the fix mentioned at the bottom of the answer, namely change the line:
[dictInProgress setObject:textInProgress forKey:kXMLReaderTextNodeKey];
to:
[dictInProgress setObject:[textInProgress stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] forKey:kXMLReaderTextNodeKey];
At the server, I have a string of text that is concatenated together and has \n added to create line breaks. This is done in VB.net as a string object.
This string is that added to a dictionary which is then available as a JSON webservice.
On my iOS app, I am reading the JSON and parsing it, saving the contents to core data.
when I display my text in a UITextView the string "hello\nthere" is shown exactly like that with no line breaks and the \n visible.
what have I done wrong?
I have also tried \r instead.
should I be making a string in the vb.net part like this "hello\nthere" - is that valid, or should I be using "hello" & vbcrlf & "there" and letting the JSON parser convert the line break to \n (if it does this)
or is the problem in the iOS side?
can't see how else to mark this as complete - but please see the comment from – patric.schenke May 31 at 9:51
he was indeed corerct - the text had escaped to \n so replacing this with \n and saving that to core data has solved the issue.