Deserialize XML with UTF-16 encoding in ServiceStack.Text - character-encoding

I am trying to use ServiceStack.Text to deserialize some XML.
Code:
var buildEvent = dto.EventXml.FromXml<TfsEventBuildComplete>();
The opening xml line is:
<?xml version="1.0" encoding="UTF-16"?>
ServiceStack fails with the following error:
The encoding in the declaration 'utf-16' does not match the encoding of the document 'utf-8'.
I can see from the source of the Xml Serializer that ServiceStack uses UTF-8.
Can ServiceStack.Text deserialize UTF-16 and, if so, how? If not, why not?

I have managed to hack my way around the issue. I'm not proud of it, but:
var buildEvent = dto.EventXml.Replace("utf-16", "utf-8").FromXml<TfsEventBuildComplete>();
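The mismatch behind that error can be reproduced outside .NET. The sketch below is Python, not ServiceStack code, and the payload is a made-up stand-in for dto.EventXml. It assumes, as the question suggests, that the text is turned into UTF-8 bytes before parsing while the declaration still says UTF-16.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for dto.EventXml; only the declaration matters.
XML_TEXT = '<?xml version="1.0" encoding="UTF-16"?><BuildEvent><Id>42</Id></BuildEvent>'


def parses(data: bytes) -> bool:
    """Return True if the byte sequence parses as XML, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Bytes are UTF-8 but the declaration claims UTF-16: declaration and data disagree.
mismatched = parses(XML_TEXT.encode("utf-8"))

# Bytes really are UTF-16 (the codec adds a byte-order mark): declaration and data agree.
matched = parses(XML_TEXT.encode("utf-16"))

# The workaround from the question: relabel the declaration to match the UTF-8 bytes.
patched = parses(XML_TEXT.replace("UTF-16", "utf-8").encode("utf-8"))

print(mismatched, matched, patched)
```

Seen this way, the Replace call is less of a hack than it looks: it makes the declaration agree with the bytes the serializer actually reads.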

Related

Encoding in POJO to/from XML conversion within Camel

We have been successfully converting POJOs to and from XML within Camel. The following route shows a typical case. Our application listens to an Oracle AQ queue whose entries are XML strings. Each entry is converted to a POJO class (MyClass), which we then transform using data from another source. After this transformation, the POJO is converted back to a string and sent to another system (here we save it to a file).
<route id="testing">
<from uri="oracleaq:queue:FUSEQUEUE"/>
<convertBodyTo type="generated.MyClass"/>
<bean ref="mainReqprocessor" method="Modify"/>
<convertBodyTo type="java.lang.String"/>
<setHeader headerName="Exchange.FILE_NAME">
<simple>output.xml</simple>
</setHeader>
<to uri="file:C:\\Temp\\OUT"/>
</route>
Everything worked fine until yesterday, when we introduced HTML tags into one of the text fields of the POJO class. We wrapped the text in a CDATA section: "<![CDATA[" + str + "]]>". But when the POJO is converted to a string, the opening and closing brackets of the CDATA section are still escaped, as in the following output. Because of this, the resulting XML string can no longer be converted back to MyClass by the other application. This is not the desired behavior. How can I avoid the escaping of the CDATA opening and closing brackets? [Note: the first < and the last > of the CDATA section are escaped.]
<TEXT>
&lt;![CDATA[&lt;html&gt;&lt;div&gt;&lt;pre&gt;COMPONENT PARTS.&lt;/br&gt;&lt;/div&gt;&lt;/pre&gt;&lt;/html&gt;]]&gt;
</TEXT>
Although you have a marshalling/unmarshalling problem, you don't mention how you convert the XML to a POJO and back. That information would be very helpful.
If you are using JAXB for the conversion, this Q/A could perhaps help you:
JAXB Marshalling Unmarshalling with CDATA
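The general mechanism can be shown with a small Python sketch. It is not Camel or JAXB code, and the HTML snippet is a shortened stand-in. The point it illustrates is that a serializer treats a hand-built "<![CDATA[" prefix as ordinary character data and escapes it, whereas a CDATA node created through the XML library is written out as markup.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

HTML = "<html><div>COMPONENT PARTS.</div></html>"

# 1. Wrapping the text by hand: the serializer sees plain character data
#    and escapes every '<' and '>', including those of the CDATA markers.
text_elem = ET.Element("TEXT")
text_elem.text = "<![CDATA[" + HTML + "]]>"
escaped = ET.tostring(text_elem, encoding="unicode")

# 2. Asking the XML library for a CDATA node: the markers are emitted as markup.
doc = minidom.Document()
root = doc.createElement("TEXT")
root.appendChild(doc.createCDATASection(HTML))
doc.appendChild(root)
cdata = root.toxml()

print(escaped)
print(cdata)
```

Whatever marshaller is in use, the fix is likely to follow the second pattern: tell the marshaller that the field is CDATA instead of concatenating the markers into the string.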

Decompress NSString to xml in objective c

I have a string like the one below:
eJzdWFuTokgWfu9fUVHzaPRwURQ6LCeS+y1BEFR4Q0BAEBBQLr9+seyuqpnt3ujZ2Y3YWSMIk4+T55LnnI8kl7915+zpFlZ1UuQvz9iv6PNTmPtFkOTRy7Nt8Z/J599Wn5ZWXIUhuwn9axWuljCsay8Kn5Lg5XkNzPDyuQlylhwuO5Z5Xi3vUP36MDOY1N5u0Hm0K3tokleFUUeBr/ZWo7lf8SXy7XbUW/mxlzerpedfaElbzagZiqJL5Ovt8hxWErvC8OmMmC9Iaok8gCXyPnN9vY/q0csuCVbHKozoQkYM/sy0/Vqz9SniD8ZMs+DLErlLLAOvCVc4ii1QDEefMPzLlPoyG22+4svyrg6ci+uom8Jnd2c+QstxQapxwfrVnBwjebtbhl1Z5OEoMaJv4yXy7l3p5Sv0w2+MaYznji6t/WrZJOc3r4gndP6FmH9BySXyii/rxmuu9cpZIl9HS9+73VaQBe0PrjHaV5Fl6CcrlBidGv9fZ4EsKqqkic8r7CHzDiyRuyvIaz5Xy00S5aOxKnwaayavX57jpim/IEjbtr+201+LKkLwMRAEpZBRIKiT6Jfnx6wwkPJj8aemMV5e5InvZcngNWNxwLCJi+DpzbfvqbHMuyYMMTnm86jqs4/N8s93BJ1ixDPyIYKf0fZHp6ra+1zHHnZXZIbH8J7o8Mk2pZfnX35Q52wShXXz7xj7ZuihYetl13AVH9irTXLNUSpF+yYgBiKqTlmbg+mP2f0ouUTeHBzH7xn4sAIPQf648zAg6QHUGBktkhzuS5UsyEXBb67zOaZyZurBi7lj2FteTITdIT1sQsqpTlNiUw/wUzDvU5mntjqSklOTamLam5dBYpzoQHCwzcBEJGWEbu4Rsp+oASwEP3L9llOo8zEu0vDTdVdZibnlZOWkKIc4HtSzWQ9ox6L4nCA2iHokI9hpBRGjL49QPri/VML+EdeeQCnWa7zHiAmrJjmO1TN2MJQk9mwxDGiVCLQSDSJJBmt+cjxf0rXMhhCgArO5CBvpMGUNjqYNG0BJ0KBRt4zhsFvDELhWFuwTp0KQCgCzOYaGDGSdjrWASkfalga+RXNE6eB2x7FAf2CFRaN84+zMLNjDTmSB98Bri8NHLMF6V0CbQGgydef2zgZrnX36QSe0aJ6OQ4vrpAHEdJRe4jQRqBalGYPb8eL4bNR9EMzMEN7sXB2caiA927MWN/Y910NrvFhn0LfFiEl3rNPfsTZy0I4ZgPyw6VggdTcfYpe4Vmq1E6dBUL/GTneQs87ZybU4A4LZYz1iKBo7LHaFLeYPnA/p4oF30BoxXN3LY91gqIfbjZvLsbp/jetNJxh12gJPwBN3ggz3MR8yMB3bH5+NeurxAs43Oz1xOuBoG7lkyxqOrBSuFN98DbzmELBjXPc8yqAQaKAr+6mwRs1QpPt9uOsPGpCBGPvYaaJ3bnq1zRAuvGAWhg6mQt8qPQ4Y2bq0z+LQ+ZuOpGgZansPzaDCISBal9b0GAv1aZfQ/Bm/oJ3M9AZ2od1bOul1pU/RhGGhdRngpfER69IXwWUswYOXbpnO1hOCOcx2gz/UjazfCgKFWbXPPYlV1yCCNADCKTrO2uj4NQ8matFAagELajrKojiN6PgWM8BAObFzAmtKZ4dMsyyGFt0dkflnHnV3RmTg2z4QsrO30+JAsFvxoUunaYfjJXjZrOPGlQeoBnw23TSnLhcHCg3347pF93oQN5CL3Ghc410cOWEaOcZYL9zP1MtfrY23+WNtiPfasM/ULWD/bH3gDGOw3TEnLiXbG1MLaB/7CBg2D+7tagBL0KfCNjH22+gwcU17z2aBhxeXKZpN1LTCgyTOpRIPTQfEly3p9vrUrg85rVx5qKTtabsvKi8R6Btd8yqd4QnU+nUVznomn7H6elJbU8ykTHyvB0yQwC4zLkpFKpN5kI5vuuniJBObNoFck9eocaZr2fA7vC/QIYsx+nAsB31WljN5b3TRy0iEf2S579Ee
k59G2ovC79Fe8APao3+G9v4qxb3PHykOG9O9lW+HqfEfoDmug/wfae4V+9+iOeu/RHPybR5tZjjmqx4d+k58W6PzbeSfW58xNabIF/Y6784Brjml39i9S9PEfL/YO5cp1NxTWA3obCYpHJhtkUBbH646lJuKkkpHrCa2PotB0GAKqCWFktTTRliLytxoRA09qEBJLY8g3JM9dY8Zit1u1PUsGlZIJgeWYEhrHznbEDHWMr9V94Zy+0Zz91qAoL2vT8C1Bg/BiB//KVb+ESsHDgilu+awzShKKW0uDFDh6PHMhaz2btJmWp8Ron6dRerADrDeRAd0gAOg7aIEVaq3t+swU9aDv14rU2xk+JNByOrMxRfY0ImT8pxOi3Wk8ZPiFCKbc4q6fWQIMZI5fFEtOCS/eejNwhD1MqOwhRdNzvkxXx+2i3jH6sdeyAlMJpNFXmYZOvvJNmUP7Nimze57ber/H7Zpp1mg13/fpt+wv9nb5Wd3H8AnsnMjZYcDkG6G2zclo3kXS3J35tDnDeMIrLrJL5noV6G+zdTjbcMlBwBu8zjVzUEsK8OdDjZKKnwyBVIHZJvEYahzscz1LjlX1xAsrLmW3+bamT2K2/OEOPeIKEekNyAapSEA7qrCmdOu1PmZ07pYe/VP1zlzis19qDhzLWgCw4z5t91HmLZR6Px+x6DBZLBvulZuSk/FyJ1edPJicpGPk9MYK/62Y/DGnYt3Bf9KvnAEsZQE7u9Wy1GUAnJPTLyLQaZa27Lg2+7MbkHL0cjwox0GM2HTWNXWYk9fa90VA7m6naMmPwznzhMTfuRm1gPZbB/39vRKFiKlMNPFJAbUPiAbC8FCbq5Yxk7d5mVUcZiGQbaIDyAQ0U0eqfN1RhF4ufB5SwlGwWPGXfOGR/T1Ys2jR2GOm/GcgTaFY+RIbdSRETftJB9yW2jNfTFm270evVjMlOui/e4OA3n/xkLevrvev8heD2Fej4XuxwYfj4s+/QNKrAL+
I need to convert this string to XML.
I tried to decompress it by Base64-decoding the string into a byte array, but I could not get any further, and I don't think I'm on the right path. I searched extensively for libraries for NSString decompression, but with no luck.
The input string is valid: an online string-decompression tool turned it into XML successfully.
That is XML data that was compressed with zlib to the zlib format, and then Base64 encoded. You need to decode the Base64, then decompress with zlib. Then you get:
<?xml version="1.0" encoding="UTF-8"?>
<ThreeDSecure><Message id="PAReq-tdnD8zqWDC"><PARes id="lQCkUVS06gWpyMR8uKCL"><version>1.0.2</version><Merchant><acqBIN>494000</acqBIN><merID>123456789</merID></Merchant><Purchase><xid>fregBoJ/QFmCwyPNUO3/czQ4NTM=</xid><date>20170120 12:39:40</date><purchAmount>92400</purchAmount><currency>682</currency><exponent>2</exponent></Purchase><pan>0000000000002349</pan><TX><time>20170125 06:56:08</time><status>Y</status><cavv>MDAwMDAwMDAwMDAwMDAwMDAwMDA=</cavv><eci>05</eci><cavvAlgorithm>1</cavvAlgorithm></TX></PARes><Signature xmlns="http://www.w3.org/2000/09/xmldsig#"><SignedInfo xmlns="http://www.w3.org/2000/09/xmldsig#"><CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/><SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><Reference URI="#lQCkUVS06gWpyMR8uKCL"><DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><DigestValue>hbDuU8EtfIpHUvG/Q/HLYpsRzRc=</DigestValue></Reference></SignedInfo><SignatureValue>FfWa1AIOdMNCJ0oinMXpL8o87oFSu661LERkaMqRWCDvno+GWbkbSe9Yrj35SszM
d6ykJF9VO/k83R9thBa6pdiQjBdGY1SzCg89QeZna5JciLdMoGcgZcwEK9mfhoke
uWrTiRVEJKjKKbhhzLmRsz0xD02655S/Lf8gMxNo5h0=</SignatureValue><KeyInfo><X509Data><X509Certificate>MIIDmTCCAwKgAwIBAgIJAPF+fmqkPJDeMA0GCSqGSIb3DQEBBQUAMIGNMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxDTALBgNVBAcTBE5pY2UxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxDTALBgNVBAMTBFBheTExIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tMB4XDTEwMDEyMTEyMDYzOVoXDTIwMDExOTEyMDYzOVowgY0xCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjENMAsGA1UEBxMETmljZTEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTENMAsGA1UEAxMEUGF5MjEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAOKX3GP0ReHByXeWybNAJAHhc1j+OxZkuUReM7ad4eeY1LMcTpaEAQlPpUmHzxcSx89BJMNXa0lMKE/AgPpT3fhGsjWiBFm2q0xJCyQ1qBZvk+yOKyk0iCDMTqzMqtc/TqyodqCAwbakVCxUOi5Cb4WzczstJOvo50MlrXnaIDLPAgMBAAGjgf4wgfswCQYDVR0TBAIwADAsBglghkgBhvhCAQ0EHxYdT3BlblNTTCBHZW5lcmF0ZWQgQ2VydGlmaWNhdGUwHQYDVR0OBBYEFIMqSPhtZJzMLdFl3StjxnHz90eXMIGgBgNVHSMEgZgwgZWhgYekgYQwgYExCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTEQMA4GA1UEAxMHUGF5Um9vdDEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb22CCQDxfn5qpDyQ3TANBgkqhkiG9w0BAQUFAAOBgQATGO3GViQXVgb+ZRUXDlda2oq30l+Lkr2dihnIp2eRYAhqV8ZyO3UsbnBKuFMKkwjVXoraiGBvBsFLBl2iMNyPre4yCn4DOP+sT31R9R2XOdCdiMxlQqKr8K+6dkano37jJ5SwiMEtns0QmBsJQcx2yo0zlh1BbfpzO4pp4JXQxg==</X509Certificate><X509Certificate>MIICnjCCAgegAwIBAgIJAPF+fmqkPJDdMA0GCSqGSIb3DQEBBQUAMIGBMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxEDAOBgNVBAMTB1BheVJvb3QxIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tMB4XDTEwMDEyMTEyMDExMFoXDTIwMDExOTEyMDExMFowgY0xCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjENMAsGA1UEBxMETmljZTEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTENMAsGA1UEAxMEUGF5MTEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAJv6gS421cLaBecYhvP06VgcmwcCRNCon7UPnxmd2NYpctUyZBB56X7XYq3MNZjerz044IKEA4V/dNPbuOMJtr9IpYHr+UO4hAdt1KAsIK9ILjSGPHK6QtHN0bLAKkTa55Zj
U3Zfl01vv9umHQTe8ibD5C8TXgYVe/QPJFVLXQKvAgMBAAGjEDAOMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEFBQADgYEAb/9OZRzVl99KpUEed0GfaFCq8rXZiwlNyl5HOu4gLzDzMsSgb0zMzABUopArkOwvuz4KPzcPPK31QlPjQ5JL4Z271zxH+pmk3oPgNF+oje/Smk0ZygQGh/lYFor7E/nva0vT1/Lq4917ag+mnfnPbV7hWDOfyGn51J8i7npll04=</X509Certificate><X509Certificate>MIIDbDCCAtWgAwIBAgIJAPF+fmqkPJDcMA0GCSqGSIb3DQEBBQUAMIGBMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxEDAOBgNVBAMTB1BheVJvb3QxIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tMB4XDTEwMDEyMTExNTAyOFoXDTIwMDExOTExNTAyOFowgYExCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTEQMA4GA1UEAxMHUGF5Um9vdDEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAOAc5lmtIlbbAIvQZytpCNaqTIZWRzyntCYGDLSnqlHcreOVlLfvSEibAAv6hkORzHprQZ3zU08KFi3AIxAJU82MeOEhJEyZ86LPMA7T6Nnv6NmDfHVm+5my/HJg8az/N9N/AMWroY6BZIxclYwZ1wucju6CjhRXeKY6NdtdQRhFAgMBAAGjgekwgeYwHQYDVR0OBBYEFNMizUvONpSpaL18WOoxJ7+qJf+jMIG2BgNVHSMEga4wgauAFNMizUvONpSpaL18WOoxJ7+qJf+joYGHpIGEMIGBMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxEDAOBgNVBAMTB1BheVJvb3QxIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tggkA8X5+aqQ8kNwwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQUFAAOBgQC+DkhLNPHyBusOZHdJrvmgtnbzmxaHiF7UPDaAl4XhyU3u8oH9KC37+hA9Xd8tT/1eE6KTQWLVnpgrE1N1MDohbAdH0SngL6Pl952p7cFTKd6KTflEuntF/OP7PF0fG62Rh6CMU9218e/S9fCHSw+nznUGwRXokwgZufahHlKu7w==</X509Certificate></X509Data></KeyInfo></Signature></Message></ThreeDSecure>
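The two steps from the answer (Base64-decode, then zlib-decompress) can be sketched in Python as follows. This is an illustration of the order of operations, not Objective-C code; the helper name inflate_base64 and the sample payload are assumptions made for the example.

```python
import base64
import zlib


def inflate_base64(encoded: str) -> str:
    """Base64-decode the input, then zlib-decompress it and return UTF-8 text."""
    compressed = base64.b64decode(encoded)
    return zlib.decompress(compressed).decode("utf-8")


# Build a small stand-in payload: XML, zlib-compressed, then Base64-encoded.
sample_xml = '<?xml version="1.0" encoding="UTF-8"?><ThreeDSecure><Message id="demo"/></ThreeDSecure>'
encoded = base64.b64encode(zlib.compress(sample_xml.encode("utf-8"))).decode("ascii")

# A zlib stream written at the default level starts with bytes 0x78 0x9C,
# which is why the Base64 text begins with "eJ", like the string in the question.
print(encoded[:2])
print(inflate_base64(encoded))
```

On iOS the same order applies: decode the Base64 text to NSData first, then run the bytes through zlib's inflate.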

Validating XML with an in-memory DTD in C using libxml2

I need to validate XML using a DTD stored in memory, i.e. something like the following:
static const char *dtd_str = "<!ELEMENT ...>";
xmlDtdPtr dtd;
dtd = xmlParseMemoryDtd(dtd_str);
The XML_PARSE_DTDVALID parser option validates a document against a DTD embedded in the XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE some_tag[
<!ELEMENT some_tag ...>
...
]>
<some_tag>...</some_tag>
So one workaround is to modify the in-memory XML. Things become more complicated with
a parser used in "push mode": there we have to detect the end of the XML
declaration (<?xml ...?>) or the start of the root element, then insert our inline
DTD between them.
Could you suggest a better solution?
EDIT
A workaround is to validate the parsed XML a posteriori, as Daniel (_DV) suggested below.
Example: main.c, response.xml.
But I was looking for a way to "embed" a DTD and validate the XML on the fly while libxml2 parses it chunk by chunk.
The following approach doesn't work for me:
xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT | XML_PARSE_NOWARNING | XML_PARSE_DTDVALID);
ctxt->sax->internalSubset = ngx_http_file_chunks_sax_internal_subset;
ctxt->sax->externalSubset = NULL;
$ ./parsexml
validity error : Validation failed: no DTD found !
<response>
^
Document is not valid
xmlValidateDtd lets you validate an already parsed XML document a posteriori
against a given DTD. This will not use the internal subset...
http://xmlsoft.org/html/libxml-valid.html#xmlValidateDtd
See xmllint.c code in libxml2 for a full example of how to use it,
Daniel

Can we change XML encoding from UTF-8 to UTF-16?

I have written code that generates XML with UTF-8 encoding, and I always validate the XML against an XSD file. In the same code I now need UTF-16 encoding, because one of my XSD files declares UTF-16.
My existing code does not accept it and gives the following error:
FAILED: Fatal error: Document labelled UTF-16 but has UTF-8 content at filepath/standard.xsd:1.
Line 1 of the XSD contains this declaration: <?xml version="1.0" encoding="utf-16"?>
How can I validate it with UTF-8 encoding?
Is there any way to change the encoding from UTF-16 to UTF-8?
Thanks in advance.
You can convert the encoding from UTF-16 to UTF-8 with Iconv:
Call iconv from Ruby 1.8.7 through system to convert a file from utf-16 to utf-8
When you write the new file, you can replace the first line with a new header such as
<?xml version="1.0" encoding="utf-8" ?>
Ruby - Open file, find and replace multiple lines
If you need the conversion in the other direction, change the encoding in the function.
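The same two steps (transcode the bytes, then relabel the declaration) can be sketched in Python. This is a minimal sketch, not the Ruby/Iconv code the linked questions describe; the function name transcode_xml and the sample document are assumptions made for the example.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def transcode_xml(data: bytes, source: str = "utf-16", target: str = "utf-8") -> bytes:
    """Decode XML bytes from `source`, relabel the declaration, re-encode as `target`."""
    text = data.decode(source)
    text = DECL_ENCODING.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{target}{m.group(2)}", text, count=1
    )
    return text.encode(target)


# Stand-in for the contents of a UTF-16 schema file.
utf16_bytes = '<?xml version="1.0" encoding="utf-16"?><schema/>'.encode("utf-16")
utf8_bytes = transcode_xml(utf16_bytes)
print(utf8_bytes)
```

Swapping the source and target arguments gives the opposite conversion. Note that the error message in the question says the file already has UTF-8 content, in which case only the declaration needs to change.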

How do I make an instance use an encoding other than UTF-8

I have an instance returned from an XML DB in ISO-8859-1. Orbeon apparently does not like that and throws:
Fatal error: Invalid byte 2 of 3-byte UTF-8 sequence.
at org.orbeon.oxf.xml.XMLUtils$ErrorHandler.fatalError(XMLUtils.java:332)
at orbeon.apache.xerces.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
at orbeon.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:351)
at orbeon.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:281)
at orbeon.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1771)
at orbeon.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
at orbeon.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:845)
at orbeon.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:768)
at orbeon.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
at orbeon.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1201)
at org.orbeon.oxf.xml.XMLUtils.inputSourceToSAX(XMLUtils.java:418)
at org.orbeon.oxf.xml.XMLUtils.inputStreamToSAX(XMLUtils.java:403)
at org.orbeon.oxf.xml.TransformerUtils.readDom4j(TransformerUtils.java:357)
...
The character in question is a valid ISO-8859-1 ä (0xE4).
The default encoding for XML is UTF-8. If your service isn't using UTF-8, it needs to specify in the XML declaration what encoding is being used. For instance, if your data is encoded in ISO-8859-1, then the XML returned by the service should start with the following declaration:
<?xml version="1.0" encoding="ISO-8859-1" ?>
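The effect of the declaration can be checked with a short Python sketch. It uses Python's standard parser, not Orbeon or Xerces, and the element name is made up, so treat it as an illustration of the rule only.

```python
import xml.etree.ElementTree as ET

# U+00E4 (a with diaeresis) is the single byte 0xE4 in ISO-8859-1.
# The element name is a made-up stand-in.
body = "<name>\u00e4</name>".encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8, and a lone 0xE4 is not valid UTF-8.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# With the declaration, the same bytes are decoded as ISO-8859-1.
declared = b'<?xml version="1.0" encoding="ISO-8859-1" ?>' + body
value = ET.fromstring(declared).text

print(undeclared_ok, value)
```

If the XML DB cannot be made to emit the declaration, the other option is to transcode its response to UTF-8 before the document reaches the parser.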

Resources