Reading "Awkward" CSV Files with FSharp CsvParser - f#

I have a large file (200K - 300K lines of text).
It's almost but not quite a CSV file.
The column headers are on the second row; there's a row of dummy text
before that.
There are junk rows interspersed with the actual data rows. They have
commas, but most of the columns are blank, and they aren't relevant to me.
I need to read this file efficiently and parse the lines that are actually valid as CSV data.
My first idea was to write a cleaning procedure that strips out the first line and the blank lines, leaving only the headers and details that I want in a CSV file that the CsvParser can read.
This is easy enough: just ReadLine from a StreamReader, and I can keep or disregard each line simply by looking at it as a string.
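Something like this sketch is what I have in mind (the isJunk predicate is a placeholder for my string checks, and the paths are hypothetical):
open System.IO

// Drop the dummy first line, then keep only the lines that pass the filter.
let cleanFile (input: string) (output: string) (isJunk: string -> bool) =
    let kept =
        File.ReadLines(input)
        |> Seq.skip 1
        |> Seq.filter (isJunk >> not)
    File.WriteAllLines(output, kept)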
Now though I have a new issue.
There is a column in the valid data that I can use to disregard a whole lot more rows.
If I read the Cleaned file using the CsvParser it's easy to filter by that column.
But I don't really want to waste the effort of writing rows I don't need to the Clean file.
I'd like to be able to check that Column, while Cleaning the File. But, at that point I'm working with strings representing entire lines. It's not easy to get at the specific column I want.
I can't Split on ',' because there may be commas in the text of other columns.
I'm ending up writing the CSV parsing logic that I was using CsvParser for in the first place.
Ideally, I'd like to read in the existing file, clean out the lines that I can based on strings, then somehow parse the resulting seq using the CsvParser.
I see CsvFile can Load from Streams and Readers, but I'm not sure that's much help.
Any suggestions or am I just asking too much? Should I just deal with the extra filtering on loading the Cleaned File?

You can avoid doing most of the work of parsing by using the CsvFile class directly.
The F# Data documentation has some extended examples that show how to do this in some detail.
Skipping over lines at the start of a file is handled by the skipRows parameter. Passing the ignoreErrors parameter will also ignore rows that fail to parse.
open FSharp.Data

let csv = CsvFile.Load(file, skipRows=1, ignoreErrors=true)
for row in csv.Rows do
    printfn "%s" (row.GetColumn "Name")
If you have to do more complex filtering of rows, a simple approach that doesn't require temporary files is to filter the results of File.ReadLines and pass that to CsvFile.Parse.
The example below skips a six-line prelude, reads in lines until it hits a blank line, uses CsvFile to parse the data, and finally filters the resulting rows to those of interest.
open System.IO
open FSharp.Data
open FSharp.Data.CsvExtensions

let tableA =
    File.ReadLines(file)
    |> Seq.skip 6
    |> Seq.takeWhile (fun l -> String.length l > 0)
    |> String.concat "\n"

let csv = CsvFile.Parse(tableA)
for row in csv.Rows |> Seq.filter (fun row -> row?Close.AsFloat() > row?Open.AsFloat()) do
    printfn "%s" (row.GetColumn "Name")

Related

check for matching rows in csv file ruby

I am very new to Ruby and I want to check for rows with the same phone number in a CSV file.
What I am trying to do is go through the input CSV file, copy each row from the input file to the output file, and add another column called "duplicate" to the output file. While copying data from input to output, I want to check whether a similar phone number is already in the output file; if the phone number already exists, add "dupl" to that row in the duplicate column.
This is what I have.
require "csv"

file = CSV.read('input_file.csv')
output_file = File.open("output2.csv", "w")
for row in file
  output_file.write(row.join(","))
  output_file.write("\n")
end
output_file.close
Example input file:
Phone
(202) 221-1323
(201) 321-0243
(202) 221-1323
(310) 343-4923
Expected output file:
Phone,Duplicate
(202) 221-1323,
(201) 321-0243,
(202) 221-1323,dupl
(310) 343-4923,
So basically you want to write the input to output and append a "dupl" on the second occurrence of a duplicate?
Your input-to-output copying seems fine. To get the "dupl" flag, simply count the occurrences of each number in the list. If it appears more than once, it's a duplicate. But since you only want the flag shown from the second occurrence onwards, just count how often the number has appeared up until that point:
require "csv"

lines = CSV.read('input_file.csv')
output_file = File.open("output2.csv", "w")
lines.each_with_index do |l, i|
  output_file.write(l.join(",") + ",")
  # flag the row if the same row already appeared earlier in the file
  if lines.take(i).count(l) >= 1
    output_file.write("dupl")
  end
  output_file.write("\n")
end
output_file.close
Here l is the current line, take(i) is all lines before (but not including) the current one, and count(l) applied to that counts how often the same line appeared earlier. If it appeared at least once before, a "dupl" is printed.
There probably is a more efficient answer to this, this is just a quick and easy to understand version.
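For instance, here's a sketch of a single-pass variant using a Hash counter, so it runs in O(n) instead of O(n^2); the file names are taken from your question:
require "csv"

seen = Hash.new(0)
CSV.open("output2.csv", "w") do |out|
  CSV.foreach("input_file.csv") do |row|
    # flag the second and later occurrences of an identical row
    flag = seen[row] >= 1 ? "dupl" : ""
    out << (row + [flag])
    seen[row] += 1
  end
end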

Neo4j imports zero records from csv

I am new to Neo4j and graph databases. While trying to import a few relationships from a CSV file, I see that zero records are created, even though the file contains plenty of data.
LOAD CSV with headers FROM 'file:/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MERGE(transId:TransactionId)
MERGE(refId:RefNo)
MERGE(kewd:Keyword)
MERGE(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
Followed by:
LOAD CSV with headers FROM 'file/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MATCH(transId:TransactionId)
MATCH(refId:RefNo)
MATCH(kewd:Keyword)
MATCH(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
MERGE(transId)-[:REFERENCE]->(refId)-[:USED_FOR]->(kewd)-[:AGAINST]->(accNo)
RETURN *
Edit (table replica):
TransactionId Bank_Name RefNo Keyword Amount AccountNumber AccountName
12345 ABC 78 X 1000 5421 WE
23456 DEF X 2000 5471
34567 ABC 32 Y 3000 4759 HE
Is it likely the case that the Nodes and relationships are not created at all? How do I get all these desired relationships?
Neither file:/graphdata.csv nor file/graphdata.csv are legal URLs. You should use file:///graphdata.csv instead.
By default, LOAD CSV expects a "csv" file to consist of comma separated values. You are instead using a variable number of spaces as a separator (and sometimes as a trailer). You need to either:
use a single space as the separator (and specify an appropriate FIELDTERMINATOR option). But this is not a good idea for your data, since some bank names will likely also contain spaces.
use a comma separator (or some other character that will not occur in your data).
For example, this file format would work better:
TransactionId,Bank_Name,RefNo,Keyword,Amount,AccountNumber,AccountName
12345,ABC,78,X,1000,5421,WE
23456,DEF,,X,2000,5471
34567,ABC,32,Y,3000,4759,HE
Your Cypher query is attempting to use row properties that do not exist (since the file has no corresponding column headers). For example, your file has no pName or Name headers.
Your usage of the MERGE clause is probably not doing what you want, generally. You should carefully read the documentation, and this answer may also be helpful.
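As a rough sketch only (it assumes the corrected comma-separated file above, and the key properties chosen for each MERGE are guesses at your intent), a more conventional pattern is to MERGE each node on an identifying property and then MERGE the relationships between the bound nodes:
LOAD CSV WITH HEADERS FROM 'file:///graphdata.csv' AS row
// empty CSV fields come through as null; skip rows missing a name
WITH row WHERE row.AccountName IS NOT NULL
MERGE (transId:TransactionId {id: row.TransactionId})
MERGE (refId:RefNo {id: row.RefNo})
MERGE (kewd:Keyword {name: row.Keyword})
MERGE (accNo:AccountNumber {number: row.AccountNumber, bName: row.Bank_Name})
MERGE (transId)-[:REFERENCE]->(refId)-[:USED_FOR]->(kewd)-[:AGAINST]->(accNo)
Rows with a missing RefNo would still need separate handling, since MERGE rejects null property values.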

Convert data source UICollectionView to CSV file

I am looking for a way to convert a UICollectionView to a CSV file and send it with Mail.
I have a collection view like the photo, and I want to export the table and send it. I searched and found that the best way is to convert it to a CSV file.
If you have another suggestion, just tell me.
As @Larme has pointed out, converting this to a CSV file has nothing to do with the visual representation in the collection view. You simply need to serialize the data source to CSV. CSV stands for Comma-Separated Values: a type of file where tabular data is encoded using a delimiter between each data point (generally a comma, but it could be anything) and a new line for each row of the table. Think of the delimiter as the vertical line between each column of the table, and the new line as the row:
So your CSV text file might look like this:
TITLEFORCOLUMN1, TITLEFORCOLUMN2, TITLEFORCOLUMN3
ROWTITLEONE, 200, 300
ROWTITLETWO, 400, 500
and so on. It's not quite this simple, and there are rules that you should follow, especially if you intend the CSV file to be consumed by third parties (for example, fields containing commas, quotes, or line breaks must be quoted). There is an official specification (RFC 4180) which you can look at, but you can also get a lot of tips by searching 'CSV file specification'.
You then need to create a string by iterating through your data source. Start off by creating the line specifying the headers, then add a newline character and then add your data. So for the above example you could do something like (assuming the data is set out as a two dimensional array)
var myCSVString: String = "TITLEFORCOLUMN1, TITLEFORCOLUMN2, TITLEFORCOLUMN3\n"
for lineItem in myDataSource {
    myCSVString += lineItem[0] + ", " + lineItem[1] + ", " + lineItem[2] + "\n"
}
Then write the string to file.
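A minimal sketch of that last step (the file name export.csv is just an assumption):
// Write the CSV string to the app's documents directory.
let fileURL = FileManager.default
    .urls(for: .documentDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("export.csv")
do {
    try myCSVString.write(to: fileURL, atomically: true, encoding: .utf8)
} catch {
    print("Failed to write CSV: \(error)")
}
Once the file exists you can attach its data to a mail compose view, for example via MFMailComposeViewController's addAttachmentData(_:mimeType:fileName:), to send it with Mail.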
You'll need to do more research yourself but hopefully that will set you off in the right direction.

How to check the CSV column consistency?

I have a CSV file like:
Header: 1,2,3,4
content: a,b,c,d,e
a,b,c,d
a,b
a,b,c,d,d
Is there any CSV method that I can use to easily validate the column consistency instead of
parsing the CSV line by line?
One way or another the whole file has to be read.
Here is a relatively simple way. First the file is read and converted to an array, which is then mapped to another array of lengths (the number of fields per row). This array is then checked to see whether the length is always the same.
If you'd hate to read the file twice, you could remember the length of the header and, while you parse the file, check that each record has the same number of fields, throwing an exception otherwise (see the sketch after the code below).
require "csv"
def valid? file
a = CSV.read(file).map { |e|e.length }
a.min == a.max
end
p valid?("data.csv")
csv_validator gem would be helpful here.

selecting specific rows from an unstructured csv file and writing to another file using python

I am trying to iterate through an unstructured CSV file (it has no specific headings). The file is generated by an instrument. I need to select specific rows that have specific column values and write them to another file. Below is an example of the file layout:
,success, (row1)
1,2,protocol (row2)
78,f14,34(row3)
,67,34(row4)
,f14,34(row5)
3,f14,56,56(row6)
I need to select all rows with the 'f14' value. Below is the code:
import csv
import sys
reader = csv.reader(open('c:/test_file.csv', newline=''), delimiter=',', quotechar='|')
for row in reader:
    print(','.join(row))
I am unable to go beyond this point.
You're almost there:
for row in reader:
    if row[1] == 'f14':
        print(','.join(row))
You just need to check whether each row is one you're interested in by looking at the value of the column and seeing if it's what you're looking for. That can be done with a simple if row[1] == 'f14' conditional statement. However, that would fail on any blank lines -- which it looks like your input file may have -- so you'd need to preface that check with another to make sure the row has at least that many columns in it.
To create another csv file with just those rows in it, all you'd need to do is write each row that passes all the checks to another file opened for output -- instead of, or in addition to, printing the row out. Here's a very concise way of just writing the rows to another file.
(Note: I'm not sure why you had the quotechar='|' in your code on the csv.reader() call, because there aren't any quote characters in the input file shown, so I left it out in the code below -- you might need to add it back if indeed that's what it would be if there were any.)
import csv

with open('test_file.csv', newline='') as infile, \
        open('test_file_out.csv', 'w', newline='') as outfile:
    csv.writer(outfile).writerows(row for row in csv.reader(infile)
                                  if len(row) >= 2 and row[1] == 'f14')
Contents of the 'test_file_out.csv' file afterwards:
78,f14,34(row3)
,f14,34(row5)
3,f14,56,56(row6)
