No Unicode-version HTML Tidy for Delphi?

I downloaded the latest version (TidyPas_Delphi2010.zip) from the official homepage (http://sourceforge.net/projects/curlpas/files/).
But to my surprise, the unit is full of AnsiString instead of string (UnicodeString).
Does anybody use this? Is there no Unicode version?
Thanks

TidyPas is just a wrapper around the HTML Tidy library API. It does not provide a UnicodeString facade over that API; it exposes the API as-is.
As far as I can tell from the docs, HTML Tidy itself only supports a limited range of character sets, but these do include the UTF-8 encoding of Unicode, which with a bit of care I think should be OK with the AnsiString and AnsiChar types used by the API.
Any further inquiries about Unicode support in HTML Tidy other than with UTF8 would probably be best directed at the HTML Tidy community itself. It doesn't seem to have been updated for a while though (since 2008).

Yes, it does work in Delphi 2010 - I updated the code ;-) And yes, you need to convert the input from Unicode to UTF-8 for Tidy to handle it. You can find the (working) code I use at http://www.csinnovations.com/framework_delphi.htm
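
The conversion step mentioned above is not specific to Delphi. Here is a minimal, language-neutral sketch in Python, not the TidyPas or HTML Tidy API: `byte_oriented_passthrough` is a hypothetical stand-in for any library call that only accepts 8-bit strings. In Delphi the encoding step would presumably be done with something like `UTF8Encode`, but that is an assumption about the poster's code, which is not shown here.

```python
def to_utf8_bytes(text: str) -> bytes:
    """Encode Unicode text as UTF-8 so an 8-bit API can carry it."""
    return text.encode("utf-8")


def from_utf8_bytes(data: bytes) -> str:
    """Decode UTF-8 bytes coming back from the 8-bit API."""
    return data.decode("utf-8")


def byte_oriented_passthrough(data: bytes) -> bytes:
    # Hypothetical stand-in for a byte-oriented library call.
    # It only ever sees bytes, never Unicode strings.
    return data


html = "<p>Grüße, 世界</p>"
encoded = to_utf8_bytes(html)
result = from_utf8_bytes(byte_oriented_passthrough(encoded))
```

The point is that the 8-bit string is only a container for the bytes; the text survives as long as both ends agree that the bytes are UTF-8.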

Related

How to use long strings in Delphi XE4?

I want to decode a large base64 string in my Delphi project.
When I paste it into my project, I get a long string error.
To work around it I use this syntax:
'samecode'+
'samecode'+
'samecode';
But doing this manually takes too much time.
Is there a quicker way to solve it?

You have a few options:
Compile the text to a string resource and link that to your executable. Load the resource at runtime.
Place the text in a file that you deploy alongside your executable and load it at runtime.
Write a script to read the text and format it in a manner suitable for inclusion in your source code.
Since your text is actually a base64 encoded file, I doubt that you want to do any of this. What you really ought to be doing is decoding the base64 text to a binary file and linking that as a resource.
Given that the base64 encoded file is in fact a virus (MSIL/Bladabindi.AJ), I cannot imagine anybody wanting to help you. I'm disappointed that I've done as much as I have. You should be ashamed of yourself.

DataTables, PDF and special characters

I am using DataTables and the TableTools PDF export function. The PDF export does not handle certain special characters and translates them into rubbish (or ISO equivalents, I guess). The characters are '●', '○', and '٭'.
Is there any way to define the character set for the PDF so I can preserve those special characters? (I'm guessing that character set is the problem) Or any other workaround?

No, there isn't a way to configure the character set for the PDF. DataTables, or specifically its TableTools add-on, uses a fairly limited Flash-based PDF exporter.
You can, however, edit the ActionScript used to make the TableTools Flash add-on.
Download TableTools and look in the archive's \media\as3 directory for .as files.
If you don't have Adobe's software for Flash authoring, you might try the open source Adobe Flex.

A late answer (to myself), but others could benefit. I ended up using mPDF instead. It supports UTF-8, languages with special characters, and embedded stylesheets.

Unicode to ASCII API

Hi, I am developing an application that displays Indian languages. The text that I use in my application is in Unicode format. I would like to know how I can convert this Unicode text to ASCII so that I can display it in my application.

Malayalam Unicode
The font file has the wrong Unicode code points for the Malayalam characters. Using a font editor, you need to assign the corresponding Malayalam code points, and then it might work properly.

But why do you need to convert to ASCII? You can display the Unicode text directly and it will be rendered in the corresponding language.
Please see this link, I guess it would help:
See this at blackberry forums
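
To illustrate why converting to ASCII is not the way to go, here is a small sketch in Python (chosen only for illustration, it is not the asker's platform). It shows that Malayalam text cannot be represented in ASCII at all, while UTF-8 round-trips it without loss. The helper name `is_ascii_encodable` is made up for this example.

```python
def is_ascii_encodable(s: str) -> bool:
    """Return True if every character in s fits in 7-bit ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


text = "മലയാളം"  # the word "Malayalam" written in Malayalam script

# ASCII has no code points for these characters, so encoding fails.
ascii_ok = is_ascii_encodable(text)

# UTF-8 can carry the same text as bytes and decode it back unchanged.
utf8 = text.encode("utf-8")
round_tripped = utf8.decode("utf-8")
```

So the text should stay in a Unicode encoding end to end, and the display problem is better solved on the font side.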

How do I convert HTML into document form? [duplicate]

This question already has answers here:
Convert HTML to word file ?
(2 answers)
Closed 8 years ago.
I'd like to be able to convert HTML to either docx or RTF. There are plenty of Ruby gems for creating docx and RTF docs, but they are just for creating an empty document, which you can then programmatically add stuff to.
The issue with those gems is there is no way to accurately convert the format of a webpage to be the same/similar on a printable page. There are a lot of complexities with HTML tags, and the position of those tags due to their CSS attributes.
With my current knowledge of the gems out there for RTF and Word creation, I'd have to write an HTML parser and convert all the HTML tags to similar OpenXML tags, such as bold and italic, and then position things based on the CSS. Because of position: relative/absolute, rendering a document page that way would be extremely difficult.
I'm wondering if there are any recent developments, or if there is some soon-to-be-released gem or service or tool to be able to handle this conversion.
There is a gem that is supposed to convert Word to and from HTML, but it has no documentation and can only be found at https://www.ruby-toolbox.com/gems/word_parsing and on rubygems. I've been unsuccessful installing it on my local machine due to dependency issues, and since there is no documentation, there is no mention of how to fix the dependencies.
There are services out there that will convert PDF to "word", and converting HTML to PDF has already been solved by multiple people or gems. This service: http://www.pdftoword.com/ converts PDF to RTF, and even separates out the images in the resulting document. Their issue is that it runs on a Windows server -- I need something cross platform, because the app I'm working on is Ruby on Rails running on Unix based servers.

I've published a little gem that generates docx files from html templates.
https://github.com/docxtor/docxtor
It can insert page numbers, fill footers/headers with the contents of given <div>s, and translate <h1> headings to document headings.
The catch is that all word processors parse the docx format differently. So the resulting files are read just fine by LibreOffice on Mac, but wouldn't open in Google Docs.
Any help and/or feedback on the gem is much appreciated!

I'm also looking for this kind of solution. I think it's worth looking at https://github.com/bagilevi/docx_builder. I haven't tried it yet, however. Also read this article: http://rubythings.blogspot.com/2011/05/creating-word-documents-in-rails.html
If someone could come up with a better solution, we all would be thankful :)

How to upgrade an MSXML Document from version 1 to version 6?

My application uses MSXML version 1 (MSXML.DOMDocument) to store user documents in XML format.
I want to upgrade to MSXML6 (Msxml2.DOMDocument.6.0). The problem is that old documents are not always readable with the new version.
The cause of this is that the old MSXML parser does not correctly encode non-Latin characters as UTF-8, and the new parser refuses to load these documents.
My question - how can I read / convert my customers' existing files to be readable in MSXML6?

It really is a good idea to fix those old XML files so they use a correct encoding. In fact, a W3C-conformant XML parser is expected to reject such files.
As far as I know, MSXML does not provide functionality to fix the encoding of old XML files.
To fix the encoding, you can do it manually with Notepad++ (choose the actual encoding, then convert to UTF-8), or convert programmatically if you are sure of the original encoding, e.g. ANSI in your case. There should be plenty of sample code on the internet.
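
As one rough sketch of the programmatic route, in Python rather than anything MSXML-specific: the code below assumes the old files are really Windows-1252 ("ANSI") bytes, which you would need to confirm for your own data. The function names and the default `source_encoding` are choices made for this example, and byte order marks are not handled.

```python
import re
from pathlib import Path

# Matches the encoding attribute inside a leading XML declaration, if any.
_DECL_ENCODING = re.compile(r"""^(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")


def reencode_xml(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes with the assumed legacy encoding and return UTF-8 bytes."""
    text = raw.decode(source_encoding)
    # If the declaration names an encoding, make it say UTF-8.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Read src as legacy-encoded XML and write a UTF-8 copy to dst."""
    Path(dst).write_bytes(reencode_xml(Path(src).read_bytes(), source_encoding))
```

Writing to a separate destination file keeps the originals intact in case the guessed source encoding turns out to be wrong for some documents.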
