which csv parser can work with encoding - ISO-8859-15? - ios

which iOS CSV parser can work with encoding - ISO-8859-15 , NSWindowsCP1252StringEncoding?
it contains German characters like a , o, u with 2 dots above character.
I need to parse CSV file . Turn it to [[String]]

Did you try CHCSVParser, (ISO-8859-15 is not inclued in readme file but may be)
https://github.com/davedelong/CHCSVParser

Related

How to remove 0xa0 in Neo4j csv data?

I have tried
replace(row.field, '0xa0', '')
which didn't work, it still insert whatever presented in the .csv file.
\xa0 is actually non-breaking space in Latin1 (ISO 8859-1), also chr(160). You should replace it with a space.
replace(u'\xa0', u' ')
Let me know if it works and give me a sample csv file to trst it out.

0x85 windows 1252 breaks line if file opened with utf-8 encoding

I have a file with an old format from the 70s used in Companies House (UK company registry).
I inherited a parser written 6 years ago which goes line by line and according to a set of conditions extracts the information from the line and inserts them into a dictionary.
There is a weird character that is breaking a line.
I copied this line to a new file awk '{if(NR==33411) print $0}' PROD216_1950_ew_1.dat > broken and opend broken in vim.
Turns out that weird character is read by vim a <85>.
The result is that everything after MAYFIELD is read as a new line.
Below the line in question:
000376702103032986930001 1993010119941024 193709 0105<BARRY ALEXANDER<GROSVENOR<<<<MAYFIELD 3<41 PLANTATION ROAD<THE PEAK<<HONG KONG<BANK EXECUTIVE<BRITISH<<
in vim becomes
000376702103032986930001 1993010119941024 193709 0105<BARRY ALEXANDER<GROSVENOR<<<<MAYFIELD <85>3<41 PLANTATION ROAD<THE PEAK<<HONG KONG<BANK EXECUTIVE<BRITISH<<
I am using codecs to read this file with a context manager, which I thought was the way of going about it -
Is there anything I am missing? What is that <85>?
with codecs.open(filepath, 'r', 'utf-8') as fh:
for line in fh:
linetype = determine_line_type(line)
if linetype == 'header':
continue
elif linetype == 'company':
do stuff...
elif linetype == 'officer':
do stuff...
vim shows <85> to indicate a hex 85 byte that is invalid in the current encoding (i.e., the encoding it's using to decode the file).
My guess is that the file's encoding is Windows-1252, in which hex 85 denotes the ellipsis character.
So the solution for your parser might be as simple as changing 'utf-8' to 'cp1252' in the codecs.open call.
After going around for some time here and here I came up with this solution, which works.
with open(filepath, encoding='utf-8') as fh:
for line in fh:
byteline = bytearray(line, encoding='utf-8').replace(b'\xc2\x85', b'')
line_clean = byteline.decode(encoding='utf-8')
# do stuff with clean line.
Knowing that the byte sequence that breaks the string is b'\xc2\x85' (it is interpreted as an ... ellipsis character.
First encode the string to an array of bytes with bytearray, then use replace method of the bytearray class, finally, decode the clean line using the decode method, which will return the string without the weird character from before the transformation.

GSub with a plus/minus character

I am trying to convert a text source into an HTML readable page.
The code I have have tried:
local newstr=string.gsub(str,"±", "±")
local newstr=string.gsub(str,"%±", "±")
However, the character shows up as  in the output.
I can't seem to find any other documentation on how to handle this specific special character. How do I handle this character when reading in so that it will output properly?
Edit: After trying suggestions I'm able to determine this:
local function sanitizeheader(str)
if not(str)then return "" end
str2 = "Depth ±"
local newstr=string.gsub(str2, string.char(177), "±")
return newstr
end
In the testing, if I use str2 ± does show up in the output. However, when I try to use str as it is passed in from reading the excel file, it doesn't pick up the character and still returns the  character.
Lua string assume strings as sequence of bytes. You are trying utf8 multi byte character. The code you are trying should work as it just replacing a sequence of bytes. However, Lua 5.3 has utf8 library to handle unicode character
local str="±®ª"
for code in str:gmatch(utf8.charpattern) do
print("&#" .. utf8.codepoint(code) .. ";")
end
Output:
±
®
ª
Check Lua Reference Manual for more info.

Ruby CSV Format Error when reading files with CR / CL / LF

I am having a problem reading a .csv file in ruby.
The file format I get for the File is the following:
UTF-8 Unicode text, with CR, LF line terminators
The Error I get is this:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row': CSV::IllegalFormatError (CSV::IllegalFormatError)
I tried converting the file to
UTF-8 Unicode text, with CRLF, CR line terminators
with no change.
Converting to ASCII yields the same results.
Does anyone have experience with this and can offer a solution?
EDIT:
As requested a sample line of the csv
670|FirstDataaset|Uppertreet 5|GB|1000|Blue|98764-6547|0374-453534|HU-0973409745|example#example.de
It is not the Data, I checked by copy pasting it into a new file and using that for a Test file (not a viable option as a solution, just to confirm the data is not the problem)
Since the is a file that you are trying to open, you can do this
CSV.foreach(file_path,:col_sep => '|') do |row|
# whatever you wanna do
end
Did you specify a custom column separator | ?
string = "670|FirstDataaset|Uppertreet 5|GB|1000|Blue|98764-6547|0374-453534|HU-0973409745|example#example.de"
require 'csv'
p CSV.parse(string, :col_sep => "|")
# => [["670", "FirstDataaset", "Uppertreet 5", "GB", "1000", "Blue", "98764-6547", "0374-453534", "HU-0973409745", "example#example.de"]]

RAILS 3 CSV "Illegal quoting" is a lie

I've hit a problem during parsing of a CSV file where I get the following error:
CSV::MalformedCSVError: Illegal quoting on line 3.
RAILS code in question:
csv = CSV.read(args.local_file_path, col_sep: "\t", headers: true)
Line 3 in the CSV file is:
A-067067 VO VIA CE 0 8 8 SWCH Ter 4, Loc Is Here, Mne, Per Fl Auia/Sey IMAC NEK_HW 2011-03-09 09:47:44 2011-03-09 11:50:26 2011-01-13 10:49:17 2011-02-14 14:02:43 2011-02-14 14:02:44 0 0 771 771 46273 "[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC" SOME_TEXT SOME_TEXT N/A Name Here RESOLVED_CLOSED RESOLVED_CLOSED
UPDATE: Tabs don't appear to have come across above. See pastebin RAW TEXT: http://pastebin.com/4gj7iUpP
I've read numerous threads all over StackOverflow and Google about why this is and I understand that. But the CSV row above has perfectly legal quoting does it not?
The CSV is tab delimited and there is only a tab followed by the quote on either side of the column in question. There is 1 quote in that field and it is double quoted to escape it. So what gives? I can't work it out. :(
Assuming I've got something wrong here, I'd like the solution to include a way to work around the issue as I don't have control over how the CSV is constructed.
This part of your CSV is at fault:
46273 "[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC" SOME_TEXT
At least one of these parts has a stray space:
46273 "
" SOME_TEXT
I'd guess that the "3" and the double are supposed to be separated by one or more tabs but there is a space before the quote. Or, there is a space after the quote on the other end when there are only supposed to be tabs between the closing quote and the "S".
CSV escapes double quotes by double them so this:
"[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC"
is supposed to be a single filed that contains an embedded quote:
[O/H 15/02] B270 W31 "TEXT TEXT 2 X TEXT SWITC
If you have a space before the first quote or after the last quote then, since your fields are tab delimited, you have an unescaped double quote inside a field and that's where your "illegal quoting" error comes from.
Try sending your CSV file through cat -t (which should represent tabs as ^I) to find where the stray space is.

Resources