How to use html parser in blackberry? - blackberry

How can I use an HTML parser on BlackBerry?
On Android, for example, HTML can be parsed like this:
Html.fromHtml(tmpHtml).toString()
Can I use this method on BlackBerry? If so, how do I use it to parse an HTML string?

See the answer to the question "Sax parser throwing fatal error on Blackberry".
It links to an article, "Parsing HTML on Blackberry", which might be helpful.

Related

Text to speech - How to parse the SSML string in Objective C

I receive SSML text inside a JSON payload and am looking for a standard way to parse this SSML format, i.e.
"text" : [ "<speak>Screen title <break strength='weak '/> Sign In <break strength='weak '/> </speak>" ]
for my TTS application. I have found no way to do this except manually. I need to extract the actual string to be spoken by AVSpeechSynthesizer. Has anyone tried this before? Help!
Currently it speaks the complete value of "text", tags included.
Well, I fixed it by using a regular expression to strip the SSML tags, as suggested by @Carpsen90. I found no other way. But thanks for all your help!

How can I use Apache Tika to extract css and html text

I want to use Apache Tika to extract HTML text and CSS class names so that I can build a POI spreadsheet. I can get the text, but how do I extract the CSS class names?
Thank You In Advance ...
Try creating a custom handler. If you override the startElement method you'll have access to the html attributes. Inheriting from BodyContentHandler should be pretty simple as a starting point. If the element you're targeting isn't getting mapped and you're not getting it passed into startElement you'll need to tell the ParseContext to let it through, by either using the IdentityHtmlMapper or writing your own mapper.
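The startElement idea can be sketched without Tika at all. The example below is only an illustration under stated assumptions: it uses the JDK's own SAX parser on well-formed XHTML, and ClassNameCollector and collect are hypothetical names, not Tika classes. With Tika, the same override would sit in a handler derived from BodyContentHandler, as described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not a Tika class): records every "class" attribute value.
final class ClassNameCollector extends DefaultHandler {
    final List<String> classNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String value = atts.getValue("class");
        if (value == null) {
            return; // element has no class attribute
        }
        // A class attribute may hold several space-separated names.
        for (String name : value.trim().split("\\s+")) {
            if (!name.isEmpty()) {
                classNames.add(name);
            }
        }
    }

    // Parses well-formed XHTML with the JDK SAX parser and returns the class names in document order.
    static List<String> collect(String xhtml) {
        ClassNameCollector handler = new ClassNameCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Input could not be parsed", e);
        }
        return handler.classNames;
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p class=\"intro lead\">Hi</p><div class=\"box\"/></body></html>";
        System.out.println(collect(xhtml)); // prints [intro, lead, box]
    }
}
```

The collected names could then be written to spreadsheet cells. Real-world HTML is often not well-formed, which is why a tolerant front end such as Tika's HTML parser is still needed before a handler like this sees the events.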
You could run Tika from the command line:
java -jar tika-app.jar -h [file|port...]
(-h or --html is the option that outputs the content as HTML.)
You could also do it programmatically by using the HTML parser:
Parser parser = new HtmlParser();
That's not enough, since the HTML parser first transforms the incoming HTML document to well-formed XHTML and then maps the included elements to a "safe" subset. The default mapping drops things such as <style> and <script> elements that don't affect the text content of the HTML page and applies other normalization rules. This default mapping produces good results in most use cases, but sometimes a client wants more direct access to the original HTML markup. The IdentityHtmlMapper class can be used to achieve this:
ParseContext context = new ParseContext();
context.set(HtmlMapper.class, new IdentityHtmlMapper());
Finally you can get your content by calling the parse method:
parser.parse(stream, handler, metadata, context);
Hope this helps a bit. :)

Can I leave some sections unparsed using NSXMLParser?

I have an XML document which I want to parse using NSXMLParser. One of the tags it can contain is <html>, and in my parsed representation I want the contents of that tag, verbatim. However, when I parse the document, my delegate methods are called for the start, end and contents of each tag inside the html tag.
I can't get the provider of the document to add CDATA tags; nor can I use something other than NSXMLParser to parse the document.
Is there a way for me to tell the parser to treat the contents of HTML tags as CDATA and to leave them unparsed, even if they contain other tags?
It's too bad that the owner of the XML feed won't fix it because, depending on the HTML, you may end up with a malformed XML feed. If it really is an XML document, they definitely should wrap the HTML in a CDATA section or replace all the < with &lt; and all the > with &gt;.
Frankly, if all you need is the HTML, and all you have is XML tag that contains the HTML without the CDATA or appropriate character replacement, I might not be inclined to try to run it through NSXMLParser at all (because the successful parsing is contingent on the nature of the HTML included). I'd use a NSScanner or NSRegularExpression to extract all of the text between the XML's opening and closing tag that wrap your HTML.
Or, if you really want to use NSXMLParser (because there's other stuff in addition to the HTML that you need), then manually alter the NSData, wrapping the HTML in a CDATA yourself.
If, on the other hand, the document you're trying to parse really isn't XML, but rather is just HTML, then of course you shouldn't be parsing it with an XML parser. You should be using an HTML parser, like Hpple, as described in Galloway's article, How to Parse HTML on iOS, on the Ray Wenderlich site.

Reading malformed XML with Nokogiri: Unescaped Ampersands in URL field

I am trying to read an XML file from a third party with Nokogiri in my Rails project.
One of the nodes I have to parse contains a URL with unescaped ampersands (like foo.com/index.html?page=1&query=bar).
I understand that this is considered malformed XML, and Nokogiri just tries to parse it anyway, resulting in foo.com/index.html?page=1=bar.
How can I obtain the full URL? Can I tweak Nokogiri? Would you run a search-and-replace pass beforehand, or what would be the best practice?
Had the same issue parsing SVGs with image links containing ampersands.
Parsing SVGs as HTML seems to correctly handle the links, escaping &.
fixed_svg = Nokogiri::HTML.fragment(raw_svg).to_html
# proceed with XML parsing
svg = Nokogiri::XML(fixed_svg)

SAX parser is not parsing after "&" symbol

My requirement is to parse XML data from the server side and display it on BlackBerry. I am using a SAX parser to perform this operation. The following example explains the scenario.
<Name>ABC</Name>
<Company>TCS</Company>
<Name>DEF</Name>
<Company>E&Y</Company>
In the above example, it is possible to read all the element values except "E&Y".
Your XML is malformed. Check the XML escaping.
Proper XML should look like:
<Company>E&amp;Y</Company>
Fix your XML and the parser will work OK.
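If the feed cannot be fixed at the source, one workaround is to escape bare ampersands before handing the text to the parser. The following is a minimal sketch in desktop Java, not BlackBerry-specific code: AmpersandRepair, escapeBareAmpersands, companyNames and the <Employees> root element are hypothetical names introduced for illustration. The regular expression leaves anything that already looks like an entity reference untouched, so text such as "E&Y;" would be left alone and still fail to parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: repairs bare '&' characters, then reads <Company> values with SAX.
final class AmpersandRepair {
    // Matches '&' that does not start a named, decimal or hexadecimal entity reference.
    private static final String BARE_AMP =
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)";

    static String escapeBareAmpersands(String xml) {
        return xml.replaceAll(BARE_AMP, "&amp;");
    }

    static List<String> companyNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <Company>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("Company".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so accumulate.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("Company".equals(qName) && text != null) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String raw = "<Employees><Company>TCS</Company><Company>E&Y</Company></Employees>";
        System.out.println(companyNames(escapeBareAmpersands(raw))); // prints [TCS, E&Y]
    }
}
```

Accumulating text in a buffer matters here: a handler that reads only the first characters() call for an element can end up with a truncated value even when the XML is correctly escaped.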
Check this thread:
Blackberry UTF-8 Problem
One answer says:
Most likely your xml is in UTF-8 while you have response.getBytes(). String.getBytes() returns bytes for default OS encoding which is ISO-8859-1 on BB. So try to get UTF-8 bytes by calling response.getBytes("UTF-8").
Hope that helps
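The difference between the two getBytes calls can be seen in a small sketch. It is written for desktop Java, where the default encoding is not necessarily ISO-8859-1, so that encoding is requested explicitly to mimic the BlackBerry behaviour described above. Utf8BytesDemo and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo: the same string yields different bytes under different encodings.
final class Utf8BytesDemo {
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(charsetName + " is not supported", e);
        }
    }

    // Stream to hand to the parser when the XML declares encoding="UTF-8".
    static InputStream asUtf8Stream(String response) {
        return new ByteArrayInputStream(encode(response, "UTF-8"));
    }

    public static void main(String[] args) {
        String text = "Caf\u00e9"; // "Café": the last character is outside ASCII
        System.out.println(encode(text, "UTF-8").length);      // prints 5
        System.out.println(encode(text, "ISO-8859-1").length); // prints 4
    }
}
```

If the bytes are produced as ISO-8859-1 but the XML declaration says UTF-8, the parser can hit an invalid byte sequence at the first non-ASCII character, which is consistent with the symptom reported in that thread.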
I guess the problem is the encoding
Search for "encoding='UTF-8' sax parser"
