RTF file to TXT/CSV file in objective-c? - ios

I have RTF files containing that sort of content:
long_text_description_1 number1a number1b number1c
long_text_description_2 number2a number2b number2c
long_text_description_3 number3c
long_text_description_4 number4a number4b number4c
…
I need to extract the plain raw text without the colours, fonts and other formatting.
The only thing I need to keep is the basic row/column information; ideally I would like a CSV file.
The file I get contains all the formatting:
{\cs18\lang1033\langfe1033\f0\b\i0\ul0\strike0\scaps0\fs15\afs15\charscalex100\expndtw0\cf1\dn0 number1a}
What is the best way to remove all RTF information while keeping only the row information?
Writing a large set of regular expressions myself sounds dangerous without a complete understanding of the RTF format.
What I could find on the Internet mostly focused on Windows languages and libraries that are unavailable on iOS.

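All RTF control words have the form \xxx.
Try a regular expression that matches a backslash followed by non-whitespace characters, \\\S+ (written as @"\\\\\\S+" in an Objective-C string literal), and remove all matches by replacing them with an empty string.
For your example, you'll end up with { number1a}. Note that the braces remain and have to be stripped separately.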
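As a rough illustration of that idea, here is a minimal sketch in Python rather than Objective-C (the same pattern should carry over to NSRegularExpression). The function name strip_rtf_controls is made up for this example, and the approach is only an approximation: escaped characters such as \{ or \'e9 would be dropped, so it is not a full RTF parser.

```python
import re

# A backslash followed by a run of non-whitespace characters, i.e. one or
# more chained RTF control words such as \cs18\lang1033.
CONTROL_RUN = re.compile(r"\\\S+")


def strip_rtf_controls(fragment: str) -> str:
    """Remove control words and group braces from a small RTF fragment."""
    without_controls = CONTROL_RUN.sub("", fragment)
    # The regex leaves the group braces behind, so remove them separately.
    return without_controls.replace("{", "").replace("}", "").strip()


sample = (
    r"{\cs18\lang1033\langfe1033\f0\b\i0\ul0\strike0\scaps0"
    r"\fs15\afs15\charscalex100\expndtw0\cf1\dn0 number1a}"
)
print(strip_rtf_controls(sample))  # number1a
```

Joining the cleaned cells of each row with commas would then give a CSV line.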

Related

CSV file with Italic values

Is there any way to create a CSV file using C# in which a few values are shown in italics when the file is opened in Excel?
It's just not possible. There's no markup in CSV. Either export an xls(x) or rethink your problem/solution. Why CSV? It's not really meant for people to read, only to transfer data from one application to another.
A CSV file is a text file; Excel can only interpret the type of each field's content as best it can (text, numeric, date), and cannot apply formatting within a field. So the short answer is no.
There are libraries available for the ASP.NET MVC environment which allow you to create true Excel files so you then have complete control over field formats etc. A quick Google will find these.
UPDATE
A possible solution, if you are using MVC, is to create an HTML 'file' and then download that:
this.Response.AddHeader("Content-Disposition", "attachment; filename=Employees.xls");
this.Response.ContentType = "application/vnd.ms-excel";
return this.Content(sb.ToString());
I've never tried this but have seen that it might work.
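For what the HTML string itself might look like, here is a small sketch in Python (not C#) that builds a table with some columns wrapped in <i> tags. The helper name html_table and the sample rows are hypothetical, and whether Excel keeps the italics when opening such a file is an assumption worth checking.

```python
from html import escape


def html_table(rows, italic_columns=frozenset()):
    """Build an HTML table, wrapping cells of the given columns in <i>."""
    lines = ["<table>"]
    for row in rows:
        cells = []
        for index, value in enumerate(row):
            text = escape(str(value))  # encode &, < and > for HTML
            if index in italic_columns:
                text = f"<i>{text}</i>"
            cells.append(f"<td>{text}</td>")
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical sample data; column 1 is rendered in italics.
markup = html_table([["Alice", "R&D"], ["Bob", "Sales"]], italic_columns={1})
print(markup)
```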

lua reading chinese character

I have the following XML feeds that I would like to read:
Chinese XML - https://news.google.com/news/popular?ned=cn&topic=po&output=rss
Korean XML - http://www.voanews.com/templates/Articles.rss?sectionPath=/korean/news
Currently, I use LuaXML to parse the XML, which contains Chinese characters. However, when I print the result to the console, the Chinese characters are not displayed correctly and show up as garbage.
Is there any way to parse Chinese or Korean characters into a Lua table?
I don't think Lua is the issue here. The raw data the remote site sends is encoded using UTF-8, and Lua does no special interpretation of that—which means it should be preserved perfectly if you just (1) read from the remote site, and (2) save the read data to a file. The data in the file will contain CJK characters encoded in UTF-8, just like the remote site sent back.
If you're getting funny results like you mention, the fault probably lies either with the library you're using to read from the remote site, or perhaps simply with the way your console displays the results when you output to it.
I managed to convert numeric character references such as "&#20013;&#32654;" into the Chinese characters "中美".
It takes one additional step: before saving in XML format, every string has to be converted using the method from this link, http://forum.luahub.com/index.php?topic=3617.msg8595#msg8595
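-- utf8.char needs Lua 5.3+; string.char only accepts byte values 0-255 and fails for code points such as 20013
string.gsub(l,"&#([0-9]+);", function(c) return utf8.char(tonumber(c)) end)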
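The same substitution can be sketched in Python (not Lua) to show what the conversion is meant to produce. The function name decode_numeric_references is made up for this example, and only decimal references of the form &#NNNN; are handled.

```python
import re

# Decimal numeric character references such as &#20013;
NUMERIC_REFERENCE = re.compile(r"&#([0-9]+);")


def decode_numeric_references(text: str) -> str:
    """Replace each &#NNNN; with the character at that code point."""
    return NUMERIC_REFERENCE.sub(lambda m: chr(int(m.group(1))), text)


decoded = decode_numeric_references("&#20013;&#32654;")
utf8_bytes = decoded.encode("utf-8")  # what would be written to a UTF-8 file
print(decoded)  # 中美
```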
Regarding LuaXML, I have also come across the method xml.registerCode(decoded,encoded).
Its documentation says:
registers a custom code for the conversion between non-standard characters and XML character entities
What is meant by non-standard characters, and how do I use this method?

What Character encoding is this?

When I back up my BlackBerry using BlackBerry Desktop Manager, it saves the backup as an .ipd file.
Opened directly, it just looks like hex, and I'm not sure whether it is any particular type. I used software called ABC Amber Text Converter to convert this .ipd file into plain text format. Some of it comes out as plain text, like all the messages saved in the backup file, but some of the text in the file looks like this:
qÖ²u_+;¢õ¿B[[¤†D`Ø,>p
|Cñ:ÌQ†nÁä¼sÒ®sKDv©{(]
)++³É«.gsn>
z
'‚51o4Kq
8Ütâ¯cí¿þ2´Õ|5kl$S,H
dbiIjz
*!~k$|
&*OÝ>0ðî­wã
+zno%q
2k;
YnÁÅŸ5|Xñ7Ú<}y2
A
V܉lO5‰<œtÅRI-I
Does anybody have any idea what the hell this is, or whether there is any way I can decode it?
Thanks
It's just binary data. You may have been able to extract some text from the file where strings of text were stored, but the rest will be just bytes of data.
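To illustrate why only part of the file came out readable, here is a minimal Python sketch that pulls runs of printable ASCII out of arbitrary bytes, roughly what a generic text converter does. The function name printable_runs and the sample bytes are invented for this example; it knows nothing about the actual .ipd layout.

```python
import re
from typing import List


def printable_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes: two stored strings surrounded by non-text data.
blob = b"\x00\x01Hello world\xff\xfeab\x00Message body\x9c"
print(printable_runs(blob))  # ['Hello world', 'Message body']
```

Everything that is not a stored string (lengths, record headers, and so on) shows up as garbage when forced through a text converter.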
You'll need a specific program that understands these backup files. A quick google reveals a few choices, such as MagicBerry.
One of the Blackberry developers has helpfully blogged a bit of information about the binary format, so you could try using that to write your own program to parse it:
http://us.blackberry.com/devjournals/resources/journals/jan_2006/ipd_file_format.jsp

What other settings are needed to view an HTML table in spreadsheet format in OpenOffice.org?

I have generated an HTML table from my web application and saved the table in .xls format (in short, I am generating an .xls sheet from my web application).
What other settings do I need so that it is shown in table form?
You are not producing an XLS file, you are producing a malformed HTML file with a name that ends in .xls.
Indeed, you aren't even doing that since there aren't files on the web (there are streams that may or may not end up in files).
Different versions of Open Office, with different settings, will differ in terms of how they deal with stuff that is wrong. The version on one of the machines you are using is saying "eh, this isn't XLS, oh! it's HTML with a table, I know what to do", while the other is getting as far as "eh, this isn't XLS, it's a bunch of text with strange less-than and greater-than characters all over the place, what do I do".
What you want to do is to produce an actual stream that Open Office and other spreadsheets can deal with. XLS is possible, but pretty hard. Go for CSV instead.
If your table was going to be:
<table>
<tr>
<th>1 heading</th><th>2 &amp; last heading</th>
</tr>
<tr>
<td>1st cell</td><td>This is the "ultimate" cell</td>
</tr>
</table>
Then it should become:
"1 heading","2 & last heading"
"1st cell","This is the ""ultimate"" cell"
In other words, newlines indicate rows, commas indicate cells, there is no HTML encoding, everything is wrapped in quotes, and quotes in your actual content are doubled up. (You don't always need quotes around your content, but it's never wrong, so that's simpler than working out when you do need them.)
Now, make your content type "text/csv".
You are now outputting a CSV stream that can be saved as a CSV file. Your spreadsheet software will have a much better idea about what to do with this (it may still ask about character encodings on opening, but the preview will show you a spreadsheet of data, not a bunch of HTML source all over the place).
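The quoting rules above can be sketched with Python's standard csv module (the question does not say which server platform is in use, so Python here is just for illustration). The helper name rows_to_csv is made up; csv.QUOTE_ALL puts quotes around every field and embedded quotes are doubled by default.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buffer.getvalue()


table = [
    ["1 heading", "2 & last heading"],
    ["1st cell", 'This is the "ultimate" cell'],
]
csv_text = rows_to_csv(table)
print(csv_text)
```

The resulting text is what would be sent in the response body with the "text/csv" content type.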
It's not really saving as a .xls file -- it appears to be saving as the HTML, but with a .xls extension. How are you generating the .xls? On the server side, you can provide a button to generate .xls directly. The method depends on your server platform: using Perl there is the Spreadsheet::WriteExcel module that writes .xls directly, using Java there are JExcel (http://jexcelapi.sourceforge.net/) and POI (http://poi.apache.org/), and other platforms will have their own methods.
Okay Subodh, if you want to generate .xls or .csv files, you can't just change the extension of the file and have it open correctly in that program.
You have two options at this point; both involve creating the file with the data on the server and then sending it to the user to download.
.csv
CSV files are easier to generate from the server side. In a very basic way you can think of them as regular text files with commas (though not necessarily only commas) separating individual cells that can be read by spreadsheet programs. For PHP there is an article Here that explains how to generate CSV files.
.xls
xls files are not as simple to generate as CSV files. On the server side you will need a solution to generate these. For PHP there is a resource Here.
Using xls over CSV has the obvious advantage that you can specify formatting and control the visual representation of your data.
Edit :
Looking closely at the image you posted, I can see what you are trying to do. If you just want to get that file to open correctly in a spreadsheet program, then don't save it as either CSV or xls. Save it as HTML instead:
hello.html
<table>
<tr><td>Hi</td><td>Hi</td><td>Hi</td><td>Hi</td></tr>
<tr><td>2</td><td>2</td><td>131</td><td>11312</td></tr>
</table>
Saved as an HTML file, this will open correctly (as a proper table) in any spreadsheet program.
To narrow down the problem:
1) Are you opening the same .xls file on both machines?
- what version of OpenOffice is on Machine 1?
- what version of OpenOffice is on Machine 2?
2) How are you creating your .xls file?
- are you just using the response object to change the content-type, or some proprietary software?
- can you include a code sample?
3) Have you tried a pure HTML format?

Config file format

Does anyone know a file format for configuration files that is easy for humans to read? I want something like tag = value, where value may be:
String
Number(int or float)
Boolean(true/false)
Array(of String values, Number values, Boolean values)
Another structure (what I mean will become clearer in the following example)
Now I use something like this:
IntTag=1
FloatTag=1.1
StringTag="a string"
BoolTag=true
ArrayTag1=[1 2 3]
ArrayTag2=[1.1 2.1 3.1]
ArrayTag3=["str1" "str2" "str3"]
StructTag=
{
NestedTag1=1
NestedTag2="str1"
}
and so on.
Parsing is easy, but for large files I find it hard to read/edit in text editors. I don't like XML for the same reason: it's hard to read. INI does not support nesting, and I want to be able to nest tags. I also don't want a complicated format, because I will only use the limited kinds of values I mentioned above.
Thanks for any help.
What about YAML? It's easy to parse, nicely structured and has wide programming language support. If you don't need the full feature set, you could also use JSON.
Try YAML - it is (subjectively) easy to read, allows nesting, and is relatively simple to parse.
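As a rough sketch of the JSON option, here is the sample config from the question rewritten as JSON and loaded with Python's standard json module (the question does not name a programming language, so Python is an assumption for illustration). The tag names are taken from the question; the variable names are made up.

```python
import json

# The question's sample config expressed as JSON.
CONFIG_TEXT = """
{
  "IntTag": 1,
  "FloatTag": 1.1,
  "StringTag": "a string",
  "BoolTag": true,
  "ArrayTag1": [1, 2, 3],
  "ArrayTag2": [1.1, 2.1, 3.1],
  "ArrayTag3": ["str1", "str2", "str3"],
  "StructTag": {
    "NestedTag1": 1,
    "NestedTag2": "str1"
  }
}
"""

config = json.loads(CONFIG_TEXT)
print(config["StructTag"]["NestedTag2"])  # str1
```

YAML would express the same structure with less punctuation, at the cost of needing a third-party parser in most languages.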
