I have an XML document which I want to parse using NSXMLParser. One of the tags it can contain is <html>, and in my parsed representation I want the contents of that tag, verbatim. However, when I parse the document, my delegate methods are called for the start, end and contents of each tag inside the html tag.
I can't get the provider of the document to add CDATA tags; nor can I use something other than NSXMLParser to parse the document.
Is there a way for me to tell the parser to treat the contents of HTML tags as CDATA and to leave them unparsed, even if they contain other tags?
It's too bad that the owner of the XML feed won't fix it, because, depending on the HTML, you may end up with a malformed XML feed. If it really is an XML document, they definitely should wrap the HTML in a CDATA section or replace all the < with &lt;, all the > with &gt; (and all the & with &amp;).
Frankly, if all you need is the HTML, and all you have is an XML tag that contains the HTML without a CDATA section or the appropriate character replacement, I might not run it through NSXMLParser at all (because successful parsing depends on the nature of the HTML included). I'd use an NSScanner or NSRegularExpression to extract all of the text between the opening and closing XML tags that wrap your HTML.
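The regular-expression route can be sketched in a few lines. This is Ruby standing in for NSRegularExpression (the pattern idea carries over); `extract_raw` is a hypothetical helper, and the sketch assumes the wrapper tag appears once, has no attributes, and never contains its own closing tag.

```ruby
# Minimal sketch: pull the raw text between an opening and closing tag
# without running it through an XML parser. Assumes the wrapper tag
# appears once, has no attributes, and is not nested inside its content.
def extract_raw(document, tag)
  escaped = Regexp.escape(tag)
  # /m lets "." match newlines; the lazy ".*?" stops at the first closing tag.
  match = document.match(%r{<#{escaped}>(.*?)</#{escaped}>}m)
  match && match[1] # nil when the tag is not found
end

doc = "<item><title>Hi</title><html><p>Hello <b>world</b></p></html></item>"
puts extract_raw(doc, "html") # prints <p>Hello <b>world</b></p>
```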
Or, if you really want to use NSXMLParser (because there's other stuff in addition to the HTML that you need), then manually alter the NSData, wrapping the HTML in a CDATA yourself.
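Wrapping the HTML in CDATA before parsing can be sketched the same way. Here Ruby's stdlib REXML stands in for NSXMLParser and a String stands in for the NSData; `wrap_in_cdata` is a hypothetical helper, and the sketch assumes a single, non-nested `<html>` wrapper whose content never contains `]]>` (which would end the CDATA section early).

```ruby
require 'rexml/document'

# Minimal sketch: wrap everything between <tag> and </tag> in a CDATA
# section so an XML parser hands it back as one opaque string.
# Assumes one non-nested wrapper and no "]]>" inside the content.
def wrap_in_cdata(xml, tag)
  open_tag = "<#{tag}>"
  close_tag = "</#{tag}>"
  pattern = %r{#{Regexp.escape(open_tag)}(.*?)#{Regexp.escape(close_tag)}}m
  xml.sub(pattern) do
    "#{open_tag}<![CDATA[#{Regexp.last_match(1)}]]>#{close_tag}"
  end
end

raw = "<item><html><p>Hello <br> world & friends</p></html></item>"
fixed = wrap_in_cdata(raw, "html")
doc = REXML::Document.new(fixed)
puts doc.elements["item/html"].text # prints <p>Hello <br> world & friends</p>
```

Without the wrapper, a strict XML parser would reject this input, because `<br>` is never closed and the `&` is bare.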
If, on the other hand, the document you're trying to parse really isn't XML but just HTML, then of course you shouldn't be parsing it with an XML parser. You should be using an HTML parser, like hpple, as described in Galloway's article "How to Parse HTML on iOS" on the Ray Wenderlich site.
Related
I have a JSON file like this. I need to make part of each string bold, as marked up in the JSON. How can I parse this JSON?
It looks to me like you would first want to use NSJSONSerialization (or just JSONSerialization in Swift 3) to convert your JSON to an object graph. Once you've done that, you should be able to navigate to the interestLabel keys in your data and fetch those strings.
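Once the JSON is an object graph, finding the strings is a plain tree walk. The sketch below shows that step in Ruby (the `JSON.parse` result plays the role of the NSJSONSerialization output); only the key name `interestLabel` comes from the question, while the sample payload and the `values_for_key` helper are hypothetical.

```ruby
require 'json'

# Minimal sketch: walk parsed JSON (nested Hashes/Arrays) and collect
# every value stored under the given key, at any depth.
def values_for_key(node, key, found = [])
  case node
  when Hash
    node.each do |k, v|
      found << v if k == key
      values_for_key(v, key, found) # keep searching inside nested values
    end
  when Array
    node.each { |item| values_for_key(item, key, found) }
  end
  found
end

json = '{"interests":[{"interestLabel":"I like <b>tea</b>"},{"interestLabel":"<b>Dart</b> too"}]}'
p values_for_key(JSON.parse(json), "interestLabel")
```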
You'll then need to parse those tagged strings somehow. If the only thing you need to do is find <b> and </b> bold tags, and no other tags will ever appear in your data, then you could probably write your own code. If the strings might have other tags and/or a more complex HTML structure, then you might want to use an XML/HTML parser. I suggest taking a look at this tutorial: https://www.raywenderlich.com/14172/how-to-parse-html-on-ios
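For the simple "only <b> tags" case, writing your own code can be as small as one split. This Ruby sketch assumes bold tags are never nested and never contain other markup; `bold_segments` is a hypothetical helper, and turning its output into styled text (for example NSAttributedString ranges) is not shown.

```ruby
# Minimal sketch: split a string on <b>...</b> into [text, bold?] pairs.
# Assumes no nesting and no other tags inside the string.
def bold_segments(str)
  segments = []
  # A capture group in split keeps the captured text in the result;
  # the odd indices are the pieces that were inside <b>...</b>.
  str.split(%r{<b>(.*?)</b>}m).each_with_index do |piece, i|
    next if piece.empty?
    segments << [piece, i.odd?]
  end
  segments
end

p bold_segments("I like <b>tea</b> and <b>Dart</b>!")
```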
I am knocking together a quick debugging view of a backend, as a small set of admin HTML pages (driven by angulardart, but not sure that is critical).
I get back a complex JSON object from my XHR call. I want to see it nicely formatted on the HTML page. It doesn't have to be a great implementation, as it's just a debug UI, but the goal is to format the object instead of having it be one long string with no newlines.
I looked at pretty-printing the JSON in Dart and putting that inside <pre></pre> tags, as well as just dumping the Dart Map object to a string (again, inside or not inside <pre></pre> tags), but I'm not getting where I want.
I even searched pub for something similar, such as a syntax highlighter that outputs HTML, but didn't find anything obvious.
Any recommendations?
I think what you're looking for is:
Format your JSON so it's readable
Have syntax highlight
For 1: this can be done with JsonEncoder.withIndent (from dart:convert).
For 2: you can use the JS library highlight.js pretty easily by appending your formatted JSON into a marked-up div. (See the highlight.js docs to see what I mean.)
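Both steps fit in one small function. This is a Ruby analogue, not Dart: `JSON.pretty_generate` plays the role of `JsonEncoder.withIndent`, and in Dart the escaping step could use `HtmlEscape` from dart:convert. The `language-json` class is the hook highlight.js looks for on a `<code>` element; `json_debug_html` is a hypothetical helper.

```ruby
require 'json'

# Characters that must not reach the page unescaped inside element text.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Minimal sketch: pretty-print, HTML-escape, and wrap in <pre><code> so the
# newlines survive and highlight.js can pick the block up by its class.
def json_debug_html(obj)
  pretty = JSON.pretty_generate(obj)            # two-space indent, one entry per line
  escaped = pretty.gsub(/[&<>]/, HTML_ESCAPES)  # keep "<" in values from becoming markup
  %(<pre><code class="language-json">#{escaped}</code></pre>)
end

puts json_debug_html({ "user" => { "id" => 7, "tags" => ["a<b", "c"] } })
```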
I'm trying to parse an XML using TBXML and everything is going fine except for tags which contain special characters in their value.
For example, consider the XML element
<tag> sources/data </tag>
I'm trying to get the text sources/data from this tag. I'm using [TBXML textForElement:element] to achieve this. But it always returns an empty string.
The same code fails for another tag, defined as:
<tag> array[i] </tag>
But it works fine for normal text values like
<tag>name</tag>.
Can anyone help me out here ?
Quote: "Because XML syntax uses some characters for tags and attributes it is not possible to directly use those characters inside XML tags or attribute values."
http://www.dvteclipse.com/documentation/svlinter/How_to_use_special_characters_in_XML.3F.html
As far as I know, this kind of data must be placed in CDATA.
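The characters the quoted page refers to are the five with predefined XML entities: `&`, `<`, `>`, `"` and `'`. A minimal Ruby sketch of escaping them before embedding text, then confirming a parser (stdlib REXML here, not TBXML) reads the original value back; `xml_escape` is a hypothetical helper. Characters such as `/` and `[` are not special in XML text, so they pass through unchanged.

```ruby
require 'rexml/document'

# Predefined XML entities for the five special characters.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                '"' => '&quot;', "'" => '&apos;' }.freeze

# Minimal sketch: escape special characters before placing text in an element.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

value = "a < b && path/to[i]"
xml = "<tag>#{xml_escape(value)}</tag>"
puts xml                                # prints <tag>a &lt; b &amp;&amp; path/to[i]</tag>
puts REXML::Document.new(xml).root.text # prints a < b && path/to[i]
```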
I have an XML tag like:
<title>Transport information Classic World's </title>
I can parse this element, but in my object I only get the 's' character. I parse it like this:
if ([elementname isEqualToString:@"title"])
{
    currentTweet.content = currentNodeContent;
}
How can I get the whole text of the title?
Try using CDATA sections when you create the XML, like:
<title><![CDATA[Transport information Classic World's]]></title>
Also, here is a list of HTML tags and other cases; XML containing those characters is invalid unless they are contained within a CDATA section.
Also, try this question; I hope it helps.
As you say you cannot change the XML, I don't think you will be able to resolve this; the parser is not able to parse this XML as is.
If you have the option, wrap special characters in CDATA sections when you create this XML.
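Once the title is wrapped in CDATA, an event-driven parser delivers it through a separate CDATA callback, and a title can arrive in several chunks. This Ruby sketch uses REXML's stream listener as an analogue of an NSXMLParser delegate; `TitleCollector` is a hypothetical class, and the "append, don't assign" pattern is an added suggestion, not something stated in the answers above.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Minimal sketch of a SAX-style handler: it appends every text and CDATA
# chunk that belongs to <title>, and stores the full string on </title>.
class TitleCollector
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
    @buffer = nil
  end

  def tag_start(name, _attrs)
    @buffer = +"" if name == "title" # fresh buffer for each <title>
  end

  def text(chunk)
    @buffer << chunk if @buffer # append, never overwrite
  end

  def cdata(content)
    @buffer << content if @buffer # CDATA arrives through its own callback
  end

  def tag_end(name)
    return unless name == "title" && @buffer
    @titles << @buffer
    @buffer = nil
  end
end

xml = "<feed><title><![CDATA[Transport information Classic World's]]></title></feed>"
listener = TitleCollector.new
REXML::Document.parse_stream(xml, listener)
p listener.titles
```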
I am trying to read an XML file from a third party with Nokogiri in my Rails project.
One of the nodes I have to parse contains a URL with unescaped ampersands (like foo.com/index.html?page=1&query=bar).
I understand that this is considered malformed XML, and Nokogiri just tries to parse it anyway, resulting in foo.com/index.html?page=1=bar.
How can I obtain the full URL? Can I tweak Nokogiri? Would you do a search-and-replace pre-pass, or what would be the best practice?
I had the same issue parsing SVGs with image links containing ampersands.
Parsing the SVGs as HTML seems to handle the links correctly, escaping & as &amp;.
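require 'nokogiri'

# Parse as an HTML fragment first; the HTML parser tolerates bare "&"
# and re-serializes it as "&amp;"
fixed_svg = Nokogiri::HTML.fragment(raw_svg).to_html
# proceed with XML parsing
svg = Nokogiri::XML(fixed_svg)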
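The search-and-replace pre-pass from the question is another option. A minimal Ruby sketch: escape only the `&` characters that do not already start an entity or character reference, then parse normally (stdlib REXML is used so the sketch runs without Nokogiri; the result could equally be passed to `Nokogiri::XML`). The regex and `escape_bare_ampersands` are assumptions for illustration; a heuristic like this can misjudge a bare `&` that happens to be followed by `name;`.

```ruby
require 'rexml/document'

# Matches "&" unless it begins a named (&amp;), decimal (&#38;) or hex (&#x26;) reference.
BARE_AMPERSAND = /&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/

# Minimal sketch: repair bare ampersands so the document becomes well-formed.
def escape_bare_ampersands(xml)
  xml.gsub(BARE_AMPERSAND, '&amp;')
end

raw = "<link>foo.com/index.html?page=1&query=bar&amp;x=1</link>"
fixed = escape_bare_ampersands(raw)
puts REXML::Document.new(fixed).root.text # prints foo.com/index.html?page=1&query=bar&x=1
```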