Can we change XML encoding from UTF-8 to UTF-16? - ruby-on-rails

I have written code that generates XML with UTF-8 encoding, and I always validate the XML against an XSD file. The same code now needs to work with UTF-16, because one of my XSD files declares UTF-16 encoding.
My existing code does not accept that file and gives the following error:
FAILED: Fatal error: Document labelled UTF-16 but has UTF-8 content at filepath/standard.xsd:1.
Line 1 of the XSD contains this declaration: <?xml version="1.0" encoding="utf-16"?>
How can I validate against it with UTF-8 encoding?
Is there any way to change the encoding from UTF-16 to UTF-8?
Thanks in advance.

You can change the encoding from UTF-16 to UTF-8 with Iconv. See:
Call iconv from Ruby 1.8.7 through system to convert a file from utf-16 to utf-8
When you write the new file, you can replace the first line with a new header such as:
<?xml version="1.0" encoding="utf-8" ?>
For the replacement itself, see:
Ruby - Open file, find and replace multiple lines
If you need the conversion in the other direction, swap the encodings in the function.
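The two steps above (transcode, then replace the header) can be sketched in plain Ruby without shelling out to iconv. This is a minimal sketch, assuming Ruby 1.9 or later and input that really is UTF-16 little-endian; the helper name `utf16le_xml_to_utf8` is hypothetical. Note that the error quoted in the question says the XSD already contains UTF-8 bytes, so in that particular case replacing the header alone may be enough.

```ruby
# Header taken from the answer above.
UTF8_HEADER = '<?xml version="1.0" encoding="utf-8" ?>'

# Hypothetical helper: takes the raw bytes of a UTF-16LE XML file and
# returns the same document as a UTF-8 string with a UTF-8 declaration.
def utf16le_xml_to_utf8(bytes)
  # Interpret the bytes as UTF-16LE (an assumption) and transcode to UTF-8.
  text = bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # Drop a leading byte order mark if one is present.
  text = text.sub(/\A\uFEFF/, '')
  # Replace the existing XML declaration with the UTF-8 header.
  text.sub(/\A<\?xml.*?\?>/m, UTF8_HEADER)
end

# Possible usage (file names are placeholders):
#   converted = utf16le_xml_to_utf8(File.binread('standard.xsd'))
#   File.binwrite('standard_utf8.xsd', converted)
```

For the opposite direction, the same idea applies with the source and target encodings swapped.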

Related

What encoding is this and how do I turn it into something I can see properly?

I'm writing a script that will operate on the subtitle files of a popular streaming service (Netfl*x).
The subtitle files contain strange characters, and I can't get my text editors or web browser to display them readably. The XML declaration says UTF-8, but some characters are not readable.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tt xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:timeBase="media" xmlns="http://www.w3.org/ns/ttml">
<p>de 15 % la nuit dernière.</span></p>
<p>if youâve got things to doâ¦</span></p>
The text looks similarly garbled in Vim and in the browser (screenshots not included here).
How can I convert this into something I can use?
I'll go out on a limb and say that the file is UTF-8 encoded just fine, and you're merely looking at it using the wrong encoding. The character À encoded in UTF-8 is C3 80. C3 in ISO-8859-1 is Ã, which in your screenshot is followed by an 80. So it looks like you're viewing a UTF-8 file using the (wrong) ISO-8859-1 encoding.
Use the correct encoding when opening the file.
My terminal is set to en_US.UTF-8, but it was also rendering this supposedly UTF-8 encoded file incorrectly (sonné -> sonnÃ©). I was able to solve this by using iconv to encode the file in ISO8859-1.
iconv original.xml -t ISO8859-1 -o converted.xml
In the new file, the characters were properly rendered, although I don't quite understand why.
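One possible explanation, offered as an assumption rather than a diagnosis of this particular file: the text may have been double-encoded, meaning its UTF-8 bytes were at some point misread as ISO-8859-1 and then saved as UTF-8 again. Converting to ISO-8859-1 undoes exactly one such round. The Ruby sketch below reproduces the effect; both helper names are hypothetical.

```ruby
# Simulates the damage: treat UTF-8 bytes as ISO-8859-1 characters,
# then store those characters as UTF-8 again.
def misread_as_latin1(utf8_text)
  utf8_text.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Reverses one round of that damage: map each character back to a single
# ISO-8859-1 byte, then read the resulting bytes as UTF-8.
def undo_double_encoding(garbled_text)
  garbled_text.encode('ISO-8859-1').force_encoding('UTF-8')
end

garbled  = misread_as_latin1("sonn\u00E9")   # "sonné" becomes "sonnÃ©"
repaired = undo_double_encoding(garbled)      # back to "sonné"
```

If this is what happened, the iconv call to ISO8859-1 plays the role of `undo_double_encoding`: its output bytes happen to form valid UTF-8, which a UTF-8 terminal then displays correctly.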

Decompress NSString to xml in objective c

I have a string like the one below:
eJzdWFuTokgWfu9fUVHzaPRwURQ6LCeS+y1BEFR4Q0BAEBBQLr9+seyuqpnt3ujZ2Y3YWSMIk4+T55LnnI8kl7915+zpFlZ1UuQvz9iv6PNTmPtFkOTRy7Nt8Z/J599Wn5ZWXIUhuwn9axWuljCsay8Kn5Lg5XkNzPDyuQlylhwuO5Z5Xi3vUP36MDOY1N5u0Hm0K3tokleFUUeBr/ZWo7lf8SXy7XbUW/mxlzerpedfaElbzagZiqJL5Ovt8hxWErvC8OmMmC9Iaok8gCXyPnN9vY/q0csuCVbHKozoQkYM/sy0/Vqz9SniD8ZMs+DLErlLLAOvCVc4ii1QDEefMPzLlPoyG22+4svyrg6ci+uom8Jnd2c+QstxQapxwfrVnBwjebtbhl1Z5OEoMaJv4yXy7l3p5Sv0w2+MaYznji6t/WrZJOc3r4gndP6FmH9BySXyii/rxmuu9cpZIl9HS9+73VaQBe0PrjHaV5Fl6CcrlBidGv9fZ4EsKqqkic8r7CHzDiyRuyvIaz5Xy00S5aOxKnwaayavX57jpim/IEjbtr+201+LKkLwMRAEpZBRIKiT6Jfnx6wwkPJj8aemMV5e5InvZcngNWNxwLCJi+DpzbfvqbHMuyYMMTnm86jqs4/N8s93BJ1ixDPyIYKf0fZHp6ra+1zHHnZXZIbH8J7o8Mk2pZfnX35Q52wShXXz7xj7ZuihYetl13AVH9irTXLNUSpF+yYgBiKqTlmbg+mP2f0ouUTeHBzH7xn4sAIPQf648zAg6QHUGBktkhzuS5UsyEXBb67zOaZyZurBi7lj2FteTITdIT1sQsqpTlNiUw/wUzDvU5mntjqSklOTamLam5dBYpzoQHCwzcBEJGWEbu4Rsp+oASwEP3L9llOo8zEu0vDTdVdZibnlZOWkKIc4HtSzWQ9ox6L4nCA2iHokI9hpBRGjL49QPri/VML+EdeeQCnWa7zHiAmrJjmO1TN2MJQk9mwxDGiVCLQSDSJJBmt+cjxf0rXMhhCgArO5CBvpMGUNjqYNG0BJ0KBRt4zhsFvDELhWFuwTp0KQCgCzOYaGDGSdjrWASkfalga+RXNE6eB2x7FAf2CFRaN84+zMLNjDTmSB98Bri8NHLMF6V0CbQGgydef2zgZrnX36QSe0aJ6OQ4vrpAHEdJRe4jQRqBalGYPb8eL4bNR9EMzMEN7sXB2caiA927MWN/Y910NrvFhn0LfFiEl3rNPfsTZy0I4ZgPyw6VggdTcfYpe4Vmq1E6dBUL/GTneQs87ZybU4A4LZYz1iKBo7LHaFLeYPnA/p4oF30BoxXN3LY91gqIfbjZvLsbp/jetNJxh12gJPwBN3ggz3MR8yMB3bH5+NeurxAs43Oz1xOuBoG7lkyxqOrBSuFN98DbzmELBjXPc8yqAQaKAr+6mwRs1QpPt9uOsPGpCBGPvYaaJ3bnq1zRAuvGAWhg6mQt8qPQ4Y2bq0z+LQ+ZuOpGgZansPzaDCISBal9b0GAv1aZfQ/Bm/oJ3M9AZ2od1bOul1pU/RhGGhdRngpfER69IXwWUswYOXbpnO1hOCOcx2gz/UjazfCgKFWbXPPYlV1yCCNADCKTrO2uj4NQ8matFAagELajrKojiN6PgWM8BAObFzAmtKZ4dMsyyGFt0dkflnHnV3RmTg2z4QsrO30+JAsFvxoUunaYfjJXjZrOPGlQeoBnw23TSnLhcHCg3347pF93oQN5CL3Ghc410cOWEaOcZYL9zP1MtfrY23+WNtiPfasM/ULWD/bH3gDGOw3TEnLiXbG1MLaB/7CBg2D+7tagBL0KfCNjH22+gwcU17z2aBhxeXKZpN1LTCgyTOpRIPTQfEly3p9vrUrg85rVx5qKTtabsvKi8R6Btd8yqd4QnU+nUVznomn7H6elJbU8ykTHyvB0yQwC4zLkpFKpN5kI5vuuniJBObNoFck9eocaZr2fA7vC/QIYsx+nAsB31WljN5b3TRy0iEf2S579Ee
k59G2ovC79Fe8APao3+G9v4qxb3PHykOG9O9lW+HqfEfoDmug/wfae4V+9+iOeu/RHPybR5tZjjmqx4d+k58W6PzbeSfW58xNabIF/Y6784Brjml39i9S9PEfL/YO5cp1NxTWA3obCYpHJhtkUBbH646lJuKkkpHrCa2PotB0GAKqCWFktTTRliLytxoRA09qEBJLY8g3JM9dY8Zit1u1PUsGlZIJgeWYEhrHznbEDHWMr9V94Zy+0Zz91qAoL2vT8C1Bg/BiB//KVb+ESsHDgilu+awzShKKW0uDFDh6PHMhaz2btJmWp8Ron6dRerADrDeRAd0gAOg7aIEVaq3t+swU9aDv14rU2xk+JNByOrMxRfY0ImT8pxOi3Wk8ZPiFCKbc4q6fWQIMZI5fFEtOCS/eejNwhD1MqOwhRdNzvkxXx+2i3jH6sdeyAlMJpNFXmYZOvvJNmUP7Nimze57ber/H7Zpp1mg13/fpt+wv9nb5Wd3H8AnsnMjZYcDkG6G2zclo3kXS3J35tDnDeMIrLrJL5noV6G+zdTjbcMlBwBu8zjVzUEsK8OdDjZKKnwyBVIHZJvEYahzscz1LjlX1xAsrLmW3+bamT2K2/OEOPeIKEekNyAapSEA7qrCmdOu1PmZ07pYe/VP1zlzis19qDhzLWgCw4z5t91HmLZR6Px+x6DBZLBvulZuSk/FyJ1edPJicpGPk9MYK/62Y/DGnYt3Bf9KvnAEsZQE7u9Wy1GUAnJPTLyLQaZa27Lg2+7MbkHL0cjwox0GM2HTWNXWYk9fa90VA7m6naMmPwznzhMTfuRm1gPZbB/39vRKFiKlMNPFJAbUPiAbC8FCbq5Yxk7d5mVUcZiGQbaIDyAQ0U0eqfN1RhF4ufB5SwlGwWPGXfOGR/T1Ys2jR2GOm/GcgTaFY+RIbdSRETftJB9yW2jNfTFm270evVjMlOui/e4OA3n/xkLevrvev8heD2Fej4XuxwYfj4s+/QNKrAL+
I need to convert this string to XML.
I tried to decompress the string by getting its byte array using Base64 decoding, but I could not get any further, and I don't think I'm on the right path. I searched extensively for a library that decompresses an NSString, but had no luck.
The input string is valid: I was able to decompress it to XML with an online tool.
That is XML data that was compressed with zlib to the zlib format, and then Base64 encoded. You need to decode the Base64, then decompress with zlib. Then you get:
<?xml version="1.0" encoding="UTF-8"?>
<ThreeDSecure><Message id="PAReq-tdnD8zqWDC"><PARes id="lQCkUVS06gWpyMR8uKCL"><version>1.0.2</version><Merchant><acqBIN>494000</acqBIN><merID>123456789</merID></Merchant><Purchase><xid>fregBoJ/QFmCwyPNUO3/czQ4NTM=</xid><date>20170120 12:39:40</date><purchAmount>92400</purchAmount><currency>682</currency><exponent>2</exponent></Purchase><pan>0000000000002349</pan><TX><time>20170125 06:56:08</time><status>Y</status><cavv>MDAwMDAwMDAwMDAwMDAwMDAwMDA=</cavv><eci>05</eci><cavvAlgorithm>1</cavvAlgorithm></TX></PARes><Signature xmlns="http://www.w3.org/2000/09/xmldsig#"><SignedInfo xmlns="http://www.w3.org/2000/09/xmldsig#"><CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/><SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><Reference URI="#lQCkUVS06gWpyMR8uKCL"><DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><DigestValue>hbDuU8EtfIpHUvG/Q/HLYpsRzRc=</DigestValue></Reference></SignedInfo><SignatureValue>FfWa1AIOdMNCJ0oinMXpL8o87oFSu661LERkaMqRWCDvno+GWbkbSe9Yrj35SszM
d6ykJF9VO/k83R9thBa6pdiQjBdGY1SzCg89QeZna5JciLdMoGcgZcwEK9mfhoke
uWrTiRVEJKjKKbhhzLmRsz0xD02655S/Lf8gMxNo5h0=</SignatureValue><KeyInfo><X509Data><X509Certificate>MIIDmTCCAwKgAwIBAgIJAPF+fmqkPJDeMA0GCSqGSIb3DQEBBQUAMIGNMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxDTALBgNVBAcTBE5pY2UxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxDTALBgNVBAMTBFBheTExIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tMB4XDTEwMDEyMTEyMDYzOVoXDTIwMDExOTEyMDYzOVowgY0xCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjENMAsGA1UEBxMETmljZTEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTENMAsGA1UEAxMEUGF5MjEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAOKX3GP0ReHByXeWybNAJAHhc1j+OxZkuUReM7ad4eeY1LMcTpaEAQlPpUmHzxcSx89BJMNXa0lMKE/AgPpT3fhGsjWiBFm2q0xJCyQ1qBZvk+yOKyk0iCDMTqzMqtc/TqyodqCAwbakVCxUOi5Cb4WzczstJOvo50MlrXnaIDLPAgMBAAGjgf4wgfswCQYDVR0TBAIwADAsBglghkgBhvhCAQ0EHxYdT3BlblNTTCBHZW5lcmF0ZWQgQ2VydGlmaWNhdGUwHQYDVR0OBBYEFIMqSPhtZJzMLdFl3StjxnHz90eXMIGgBgNVHSMEgZgwgZWhgYekgYQwgYExCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTEQMA4GA1UEAxMHUGF5Um9vdDEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb22CCQDxfn5qpDyQ3TANBgkqhkiG9w0BAQUFAAOBgQATGO3GViQXVgb+ZRUXDlda2oq30l+Lkr2dihnIp2eRYAhqV8ZyO3UsbnBKuFMKkwjVXoraiGBvBsFLBl2iMNyPre4yCn4DOP+sT31R9R2XOdCdiMxlQqKr8K+6dkano37jJ5SwiMEtns0QmBsJQcx2yo0zlh1BbfpzO4pp4JXQxg==</X509Certificate><X509Certificate>MIICnjCCAgegAwIBAgIJAPF+fmqkPJDdMA0GCSqGSIb3DQEBBQUAMIGBMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxEDAOBgNVBAMTB1BheVJvb3QxIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tMB4XDTEwMDEyMTEyMDExMFoXDTIwMDExOTEyMDExMFowgY0xCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjENMAsGA1UEBxMETmljZTEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTENMAsGA1UEAxMEUGF5MTEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAJv6gS421cLaBecYhvP06VgcmwcCRNCon7UPnxmd2NYpctUyZBB56X7XYq3MNZjerz044IKEA4V/dNPbuOMJtr9IpYHr+UO4hAdt1KAsIK9ILjSGPHK6QtHN0bLAKkTa55Zj
U3Zfl01vv9umHQTe8ibD5C8TXgYVe/QPJFVLXQKvAgMBAAGjEDAOMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEFBQADgYEAb/9OZRzVl99KpUEed0GfaFCq8rXZiwlNyl5HOu4gLzDzMsSgb0zMzABUopArkOwvuz4KPzcPPK31QlPjQ5JL4Z271zxH+pmk3oPgNF+oje/Smk0ZygQGh/lYFor7E/nva0vT1/Lq4917ag+mnfnPbV7hWDOfyGn51J8i7npll04=</X509Certificate><X509Certificate>MIIDbDCCAtWgAwIBAgIJAPF+fmqkPJDcMA0GCSqGSIb3DQEBBQUAMIGBMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxEDAOBgNVBAMTB1BheVJvb3QxIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tMB4XDTEwMDEyMTExNTAyOFoXDTIwMDExOTExNTAyOFowgYExCzAJBgNVBAYTAkZSMQswCQYDVQQIEwIwNjEQMA4GA1UEChMHQW1hZGV1czEcMBoGA1UECxMTZGV2LXJkbS10a2UtZnJhLXBheTEQMA4GA1UEAxMHUGF5Um9vdDEjMCEGCSqGSIb3DQEJARYUcGF5ZGVsZGVAYW1hZGV1cy5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAOAc5lmtIlbbAIvQZytpCNaqTIZWRzyntCYGDLSnqlHcreOVlLfvSEibAAv6hkORzHprQZ3zU08KFi3AIxAJU82MeOEhJEyZ86LPMA7T6Nnv6NmDfHVm+5my/HJg8az/N9N/AMWroY6BZIxclYwZ1wucju6CjhRXeKY6NdtdQRhFAgMBAAGjgekwgeYwHQYDVR0OBBYEFNMizUvONpSpaL18WOoxJ7+qJf+jMIG2BgNVHSMEga4wgauAFNMizUvONpSpaL18WOoxJ7+qJf+joYGHpIGEMIGBMQswCQYDVQQGEwJGUjELMAkGA1UECBMCMDYxEDAOBgNVBAoTB0FtYWRldXMxHDAaBgNVBAsTE2Rldi1yZG0tdGtlLWZyYS1wYXkxEDAOBgNVBAMTB1BheVJvb3QxIzAhBgkqhkiG9w0BCQEWFHBheWRlbGRlQGFtYWRldXMuY29tggkA8X5+aqQ8kNwwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQUFAAOBgQC+DkhLNPHyBusOZHdJrvmgtnbzmxaHiF7UPDaAl4XhyU3u8oH9KC37+hA9Xd8tT/1eE6KTQWLVnpgrE1N1MDohbAdH0SngL6Pl952p7cFTKd6KTflEuntF/OP7PF0fG62Rh6CMU9218e/S9fCHSw+nznUGwRXokwgZufahHlKu7w==</X509Certificate></X509Data></KeyInfo></Signature></Message></ThreeDSecure>
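The two steps in the answer (Base64-decode, then zlib-inflate) can be sketched as follows. The question is about Objective-C; this sketch uses Ruby purely to illustrate the order of operations, and the helper name `inflate_base64_xml` is hypothetical. A quick sanity check on the input: the leading "eJ" of the string decodes to the bytes 0x78 0x9C, a common zlib stream header.

```ruby
require 'zlib'

# Hypothetical helper: decodes a Base64 string and inflates the
# zlib-format payload, returning the XML as a UTF-8 string.
def inflate_base64_xml(encoded)
  # 'm' is the Base64 directive of String#unpack; it returns raw bytes.
  compressed = encoded.unpack('m').first
  # Zlib::Inflate.inflate expects zlib-format data (header plus checksum).
  xml = Zlib::Inflate.inflate(compressed)
  # The decompressed document declares UTF-8, so tag the bytes as such.
  xml.force_encoding(Encoding::UTF_8)
end
```

In Objective-C the same sequence would apply: obtain the raw bytes from the Base64 text first, then pass those bytes to zlib's inflate routines.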

Deserialize XML with UTF-16 encoding in ServiceStack.Text

I am trying to use ServiceStack.Text to deserialize some XML.
Code:
var buildEvent = dto.EventXml.FromXml<TfsEventBuildComplete>();
The opening xml line is:
<?xml version="1.0" encoding="UTF-16"?>
ServiceStack fails with the following error:
The encoding in the declaration 'utf-16' does not match the encoding of the document 'utf-8'.
I can see from the source of the Xml Serializer that ServiceStack uses UTF-8.
I am wondering whether ServiceStack.Text can deserialize UTF-16 and if so how? And if not, why not?
I have managed to hack my way around the issue. I'm not proud of it but....
var buildEvent = dto.EventXml.Replace("utf-16", "utf-8").FromXml<TfsEventBuildComplete>();
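The workaround above replaces every occurrence of "utf-16" in the document, including any that appear in the body. A narrower variant would touch only the encoding label in the XML declaration. The sketch below shows that idea in Ruby rather than C#, so it is not ServiceStack code; the helper name `relabel_xml_encoding` is hypothetical.

```ruby
# Hypothetical helper: changes only the encoding label inside the XML
# declaration at the very start of the string; the body is left as-is.
def relabel_xml_encoding(xml, new_encoding)
  pattern = /\A(<\?xml\b[^?]*?\bencoding\s*=\s*)(["'])[^"']*\2/
  # $1 is everything up to the opening quote, $2 is the quote character.
  xml.sub(pattern) { "#{$1}#{$2}#{new_encoding}#{$2}" }
end
```

A string without an XML declaration is returned unchanged, since the pattern is anchored to the start of the input.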

How to change html encoded character to ascii character

I have a french character that is encoded as follows:
"Jos\xE9e"
I need to convert it to regular character because it produces this error on my server:
invalid byte sequence in UTF-8
What can I do to fix this error?
Rails 3 Ruby 1.9.2
That looks like "Josée" encoded in ISO 8859-1 (AKA Latin-1). You can use Iconv to convert it to UTF-8 (Iconv ships with Ruby 1.9.2 but was removed from the standard library in Ruby 2.0):
require 'iconv'
utf_string = Iconv.conv('UTF-8', 'ISO-8859-1', "Jos\xE9e")
Use an editor that supports UTF-8, and add a coding line at the top of every source file:
# coding: utf-8
If an input string is not UTF-8, convert it to UTF-8 before processing it:
input_str = "Jos\xE9e"
utf_input = input_str.force_encoding('iso-8859-1').encode('utf-8')
All of the above requires Ruby 1.9 or later. For more information, see the book Ruby Best Practices.
You should use UTF-8 throughout your source code; try saving your files in UTF-8 encoding.
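When it is not known in advance whether an input is already UTF-8, the conversion above can be guarded with a validity check. This is a minimal sketch, assuming Ruby 1.9 or later and assuming that any input which is not valid UTF-8 is ISO-8859-1; the helper name `to_utf8` and the fallback choice are illustrative, not part of the answers above.

```ruby
# Hypothetical helper: returns a UTF-8 copy of the string. Bytes that are
# already valid UTF-8 are kept; anything else is assumed to be in the
# fallback encoding (ISO-8859-1 by default) and transcoded.
def to_utf8(str, fallback = Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

name = to_utf8("Jos\xE9e")   # the Latin-1 byte 0xE9 is transcoded to "é"
```

The check is a heuristic: some ISO-8859-1 byte sequences are also valid UTF-8, so input with a known encoding is better converted explicitly.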

How do I make an instance use an encoding other than UTF-8

I have an instance returned from an XML DB in ISO-8859-1. Orbeon apparently does not like that and throws:
Fatal error: Invalid byte 2 of 3-byte UTF-8 sequence.
at org.orbeon.oxf.xml.XMLUtils$ErrorHandler.fatalError(XMLUtils.java:332)
at orbeon.apache.xerces.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
at orbeon.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:351)
at orbeon.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:281)
at orbeon.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1771)
at orbeon.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
at orbeon.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:845)
at orbeon.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:768)
at orbeon.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
at orbeon.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1201)
at org.orbeon.oxf.xml.XMLUtils.inputSourceToSAX(XMLUtils.java:418)
at org.orbeon.oxf.xml.XMLUtils.inputStreamToSAX(XMLUtils.java:403)
at org.orbeon.oxf.xml.TransformerUtils.readDom4j(TransformerUtils.java:357)
...
The character in question is a valid ISO-8859-1 ä (0xE4).
The default encoding for XML is UTF-8. If your service isn't using UTF-8, it needs to specify in the XML declaration what encoding is being used. For instance, if your data is encoded in ISO-8859-1, then the XML returned by the service should start with the following declaration:
<?xml version="1.0" encoding="ISO-8859-1" ?>
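The wording of the error follows from how UTF-8 lead bytes work: 0xE4 falls in the range that opens a 3-byte sequence, so a parser defaulting to UTF-8 expects two continuation bytes after it and fails on the next ordinary character. The Ruby sketch below illustrates this; the helper name `utf8_sequence_length` and the sample text are made up for the example.

```ruby
# Hypothetical helper: how many bytes a UTF-8 sequence should have,
# judged by its first byte. Returns nil for bytes that cannot start one.
def utf8_sequence_length(lead_byte)
  case lead_byte
  when 0x00..0x7F then 1
  when 0xC2..0xDF then 2
  when 0xE0..0xEF then 3
  when 0xF0..0xF4 then 4
  end
end

latin1_bytes = "M\xE4rz".b   # "März" as ISO-8859-1 bytes

# Read as UTF-8, 0xE4 announces a 3-byte sequence that never completes.
invalid_as_utf8 = !latin1_bytes.dup.force_encoding('UTF-8').valid_encoding?

# Read with the encoding it was actually written in, the text is fine.
decoded = latin1_bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')
```

Declaring ISO-8859-1 in the XML declaration tells the parser to take the second path instead of the first.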
