Loading Unicode or UTF-8 characters into file datastore in Oracle Data Integrator (ODI) - character-encoding

How do I load Arabic characters from an Oracle table into a flat file in ODI 10g?
I used the "LKM SQL to File Append" to load data into a flat file, but I believe it is creating the file in ANSI encoding. This is causing all special characters to appear as question marks ("?") in the flat file. This is the only LKM module I found in the tool that loads a table to a file.
I also tried writing UTF-8 and Unicode in the "Format" field under the "Columns" tab of the file datastore model, but this didn't work.
Is there any way I can create a flat file with Unicode/UTF-8 encoding using Oracle Data Integrator?

Did you try creating a new File Data Server and setting its Encoding setting to 'UTF8'? Try these instructions: Working with Files, ODI
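For reference, the encoding is set on the data server itself, not on the datastore columns. A sketch of what the File Data Server definition might look like, assuming ODI's built-in file driver (verify the exact driver class and URL syntax against your ODI 10g documentation):
JDBC Driver: com.sunopsis.jdbc.driver.file.FileDriver
JDBC URL: jdbc:snps:dbfile?ENCODING=UTF8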

Related

Are Delphi DFM files always saved with ANSI encoding by design?

We're in the process of standardizing on UTF-8 encoding for all source files, to make it easier for developers using a plethora of tools (notably including IntelliJ IDEA on Windows, Mac and Linux) to handle Git merge conflicts without introducing unwanted encoding changes.
Delphi 11 seems able to handle both UTF-8 and ANSI encoded PAS and DFM files well, and it has a configuration setting (under Tools > Options > Editor) called "Default file encoding", which can be changed from its default of ANSI to UTF8 so that all newly created PAS files are saved with UTF-8 encoding. However, this does not seem to affect DFM files.
DFM files seem to always get saved as ANSI. This seems to apply also to DFM files that originally were in UTF-8 encoding: when I edit them in Delphi and re-save, they get changed to ANSI.
Is this a feature or a bug? If it is a feature, could you point to some authoritative documentation stating that?
DFM files use their own proprietary encoding (# followed by the decimal Unicode code point) to store non-ASCII characters in string values.
However, in newer versions of Delphi, DFM files in text form may be automatically stored using UTF-8 if identifiers (class, property or component names) contain non-ASCII characters.
From the documentation for Delphi 11 Alexandria:
Component streaming (Text DFM files):
Are fully backward-compatible.
Stream as UTF-8 only if component type, property, or name contains non-ASCII-7 characters.
String property values are still streamed in “#” escaped format.
May allow values as UTF-8 as well (open issue).
Only change in binary format is potential for UTF-8 data for component name, properties, and type name.
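To illustrate the "#" escaped format: each non-ASCII character in a string value is written as # followed by its decimal code point. For example, a caption containing "Café" (é is code point 233) would be stored in a text DFM roughly like this (the component name is made up):
object Label1: TLabel
  Caption = 'Caf'#233
end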

String UTF-8 encoding with cyrillic in H2O

I load a CSV file in UTF-8 encoding containing Cyrillic strings. After parsing in the Flow interface I see not Cyrillic but unreadable symbols like "пїўпѕЂпѕ™пїђпѕ". How can I use UTF-8 Cyrillic strings in H2O?
This appears to be a bug in the Flow interface, but only in the setupParse command. If you continue through and do the import, the data gets imported correctly.
I've reported the bug, with test data and screenshots (taken in Firefox) here:
https://0xdata.atlassian.net/browse/PUBDEV-4640
So if you have additional information, or the bug is behaving differently for you, it'd be good to add it to that bug report.
Check your CSV file in both a text and a binary (hex) view to find out how the Cyrillic text is actually encoded. If it is UTF-8, the word
Привет
should appear in the hex view as the bytes
D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82
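On Linux or macOS, one way to get that binary view is a hex dump tool such as xxd (the file name below is just a placeholder):
xxd yourfile.csv | head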

karaf configuration property is garbled

I implement the org.osgi.service.cm.ManagedService interface to get the Karaf configuration. But when I give a Chinese value to a property, it is garbled. Initially, the files in the etc folder are encoded in Latin-1. I have tried to set UTF-8 encoding, but it has no effect. Can anyone help me?
In Karaf, the configuration files (i.e. etc/*.cfg) are handled by the Felix subproject "fileinstall".
fileinstall does not yet support specifying a custom character encoding for the configuration; it uses the Properties class and Properties.load(InputStream), whose documentation says:
The load(Reader) / store(Writer, String) methods load and store
properties from and to a character based stream in a simple
line-oriented format specified below. The load(InputStream) /
store(OutputStream, String) methods work the same way as the
load(Reader)/store(Writer, String) pair, except the input/output
stream is encoded in ISO 8859-1 character encoding. Characters that
cannot be directly represented in this encoding can be written using
Unicode escapes as defined in section 3.3 of The Java™ Language
Specification; only a single 'u' character is allowed in an escape
sequence. The native2ascii tool can be used to convert property files
to and from other character encodings.
So you have to encode your file in ISO-8859-1 and write every non-Latin-1 character as a Unicode escape, or use an XML file to encode your configuration files.
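For example, a Chinese value such as 你好 would be written into an etc/*.cfg file with Unicode escapes (the property name greeting is made up; this escaped form is exactly what the native2ascii tool produces):
greeting = \u4F60\u597D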
There is a way to change the encoding of cfg files.
The configuration for the fileinstall subproject that polls the etc/*.cfg files is written in the config.properties file. You can add:
felix.fileinstall.configEncoding = UTF-8
This solution was verified on Karaf 4.

How to separate the "comma separated values string" whose values in turn contain commas [duplicate]

I have to develop an iOS application that can read data from a CSV file hosted on a domain. Are there any standard APIs that can help me do this? I don't need to download the file, just read it, because the file is updated every two minutes.
I recommend Dave DeLong's CHCSVParser library for parsing.
You will have to download the file; that is the only way to get it from the remote host to your device. A CSV file is a text file with data separated by commas (','). Download the file from the remote host, read it line by line, and split each line string that was read from the file.
For example:
1,2,3,4,1,2,3 ...Line 1
Split using ',' as a delimiter and add the split values into an array, the result will be:
array_line_one = {1,2,3,4,1,2,3};
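Since the question title is about values that themselves contain commas, a naive split on ',' is not enough; quoted fields must be kept intact. Here is a minimal sketch of a quote-aware line splitter in Swift (the function name is my own, and it ignores doubled "" escapes inside quoted fields):
func splitCSVLine(_ line: String) -> [String] {
    var fields: [String] = []
    var current = ""
    var inQuotes = false
    for ch in line {
        if ch == "\"" {
            // Entering or leaving a quoted field; the quote itself is dropped.
            inQuotes.toggle()
        } else if ch == "," && !inQuotes {
            // A comma outside quotes ends the current field.
            fields.append(current)
            current = ""
        } else {
            current.append(ch)
        }
    }
    fields.append(current)
    return fields
}

// The middle field contains a comma but stays intact:
print(splitCSVLine("1,\"2,5\",3"))  // ["1", "2,5", "3"]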

Ruby/Rails CSV parsing, invalid byte sequence in UTF-8

I am trying to parse a CSV file generated from an Excel spreadsheet.
Here is my code
require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)
But I get this error
ArgumentError: invalid byte sequence in UTF-8
I think the error is because Excel encodes the file in ISO 8859-1 (Latin-1) and not in UTF-8.
Can someone help me with a workaround for this issue, please?
Thanks in advance.
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file = File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open read only with the encoding ISO-8859-1.
Specify the encoding with encoding option:
CSV.foreach(file.path, headers: true, encoding:'iso-8859-1:utf-8') do |row|
...
end
You can supply source encoding straight in the file mode parameter:
CSV.foreach( "file.csv", "r:windows-1250" ) do |row|
<your code>
end
If you have only one file (or a few), so that there is no need to automatically detect the encoding of arbitrary input files, and you can see the file's contents in plain text (txt, csv, etc.) separated with e.g. semicolons, you can manually create a new file with a .csv extension, paste the contents of your file there, and then parse the contents as usual.
Keep in mind that this is a workaround; but when all you need is to parse one big Excel file converted to some flavour of CSV on Linux, it spares time experimenting with all those fancy encodings.
Save the file in UTF-8, unless for some reason you need to save it differently, in which case you may specify the encoding while reading the file.
Add the second argument "r:ISO-8859-1", as in File.open("input_file", "r:ISO-8859-1")
I had this same problem and was just using google spreadsheets and then downloading as a CSV. That was the easiest solution.
Then I came across this gem
https://github.com/singlebrook/utf8-cleaner
Now I don't need to worry about this issue at all. Hope this helps!
