I'm trying to create a set of labeled nodes using LOAD CSV, like so:
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/ErrLink.csv" as line
CREATE (e:ErrLink {kbid:line.Kbid, errnum:line.Errnum })
The CSV file looks like this:
"Kbid:string","Errnum:string"
"S000001080","64"
"S000001096","129"
The problem I'm running into is that the nodes are created, but they're all property-less. If I get rid of the :string suffixes on the header fields, then the load works.
This is contrary to what Chapter 29.1 of the docs says:
29.1. CSV file header format
The header row of each data source specifies how the fields should be interpreted. The same delimiter is used for the header row as for the rest of the data.
The header contains information for each field, with the format: name:field_type. The name is used as the property key for values, and ignored in other cases. The following field_type settings can be used for both nodes and relationships:
Property value: Use one of int, long, float, double, boolean, byte, short, char, string to designate the data type. If no data type is given, this defaults to string. To define an array type, append [] to the type. Array values are by default delimited by a ;, but a different delimiter can be specified.
Is this functionality not working, or is it restricted to just the import tool and not the language?
That section of the documentation is for the import tool, which is different from the Cypher language's LOAD CSV clause.
If you are using the latter, that special header format is not documented, and apparently not supported.
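With LOAD CSV you have to do any type conversion yourself in Cypher. A minimal sketch of the original import, assuming the header row is trimmed to plain Kbid,Errnum and that errnum should be stored as an integer:
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/ErrLink.csv" AS line
CREATE (e:ErrLink {kbid: line.Kbid, errnum: toInt(line.Errnum)})
(On newer Neo4j releases, use toInteger() instead of toInt().)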
I have a geometry column of a GeoDataFrame populated with polygons and I need to upload these to Snowflake.
I have been exporting the geometry column of the GeoDataFrame to file and have tried both CSV and GeoJSON formats, but so far I either get an error or the staging table winds up empty.
Here's my code:
design_gdf['geometry'].to_csv('polygons.csv', index=False, header=False, sep='|', compression=None)
import sqlalchemy
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
engine = create_engine(
URL(<Snowflake Credentials Here>)
)
with engine.connect() as con:
con.execute("PUT file://<path to polygons.csv> @~ AUTO_COMPRESS=FALSE")
Then on Snowflake I run
create or replace table DB.SCHEMA.DESIGN_POLYGONS_STAGING (geometry GEOGRAPHY);
copy into DB.SCHEMA."DESIGN_POLYGONS_STAGING"
from @~/polygons.csv
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1 compression = None encoding = 'iso-8859-1');
This generates the following error:
"Number of columns in file (6) does not match that of the corresponding table (1), use file format option error_on_column_count_mismatch=false to ignore this error File '#~/polygons.csv.gz', line 3, character 1 Row 1 starts at line 2, column "DESIGN_POLYGONS_STAGING"[6] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client."
Can anyone identify what I'm doing wrong?
Inspired by @Simeon_Pilgrim's comment, I went back to Snowflake's documentation, where I found an example of converting a string literal to a GEOGRAPHY:
https://docs.snowflake.com/en/sql-reference/functions/to_geography.html#examples
select to_geography('POINT(-122.35 37.55)');
My polygons looked more like strings describing polygons than actual GEOGRAPHY values, so I decided I needed to treat them as strings and then call TO_GEOGRAPHY() on them.
I quickly discovered that they needed to be explicitly enclosed in single quotes and copied into a VARCHAR column in the staging table. This was accomplished by modifying the CSV export code:
import csv
design_gdf['geometry'].to_csv(<path to polygons.csv>,
                              index=False, header=False, sep='|', compression=None,
                              quoting=csv.QUOTE_ALL, quotechar="'")
The staging table now looks like:
create or replace table DB.SCHEMA."DESIGN_POLYGONS_STAGING" (geometry VARCHAR);
I ran into further problems copying into the staging table, caused by a polygons.csv.gz file I must have uploaded in a previous experiment. I deleted this file using:
remove @~/polygons.csv.gz
Finally, I converted the staging table to GEOGRAPHY:
create or replace table DB.SCHEMA."DESIGN_GEOGRAPHY" (geometry GEOGRAPHY);
insert into DB.SCHEMA."DESIGN_GEOGRAPHY"
select to_geography(geometry)
from DB.SCHEMA."DESIGN_POLYGONS_STAGING"
and I wound up with a DESIGN_GEOGRAPHY table with a single column of GEOGRAPHYs in it. Success!!!
I am new to Neo4j and graph databases. While trying to import a few relationships from a CSV file, I can see that no records are created, even though the file is filled with enough data.
LOAD CSV with headers FROM 'file:/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MERGE(transId:TransactionId)
MERGE(refId:RefNo)
MERGE(kewd:Keyword)
MERGE(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
Followed by:
LOAD CSV with headers FROM 'file/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MATCH(transId:TransactionId)
MATCH(refId:RefNo)
MATCH(kewd:Keyword)
MATCH(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
MERGE(transId)-[:REFERENCE]->(refId)-[:USED_FOR]->(kewd)-[:AGAINST]->(accNo)
RETURN *
Edit (table replica):
TransactionId  Bank_Name  RefNo  Keyword  Amount  AccountNumber  AccountName
12345          ABC        78     X        1000    5421           WE
23456          DEF               X        2000    5471
34567          ABC        32     Y        3000    4759           HE
Is it the case that the nodes and relationships are not created at all? How do I get the desired relationships?
Neither file:/graphdata.csv nor file/graphdata.csv is a legal URL. You should use file:///graphdata.csv instead.
By default, LOAD CSV expects a "csv" file to consist of comma-separated values. You are instead using a variable number of spaces as a separator (and sometimes as a trailer). You need to either:
- use a single space as the separator (and specify an appropriate FIELDTERMINATOR option); this is not a good idea for your data, though, since some bank names will likely also contain spaces; or
- use a comma separator (or some other character that will not occur in your data).
For example, this file format would work better:
TransactionId,Bank_Name,RefNo,Keyword,Amount,AccountNumber,AccountName
12345,ABC,78,X,1000,5421,WE
23456,DEF,,X,2000,5471
34567,ABC,32,Y,3000,4759,HE
Your Cypher query is attempting to use row properties that do not exist (since the file has no corresponding column headers). For example, your file has no pName or Name headers.
Your usage of the MERGE clause is probably not doing what you want in general. You should carefully read the documentation, and this answer may also be helpful.
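For example, here is a sketch of a corrected import against the comma-separated file above; the property keys (id, name, number) are illustrative assumptions, not required names:
LOAD CSV WITH HEADERS FROM 'file:///graphdata.csv' AS row
WITH row WHERE row.TransactionId IS NOT NULL
MERGE (transId:TransactionId {id: row.TransactionId})
MERGE (refId:RefNo {id: coalesce(row.RefNo, 'NONE')})
MERGE (kewd:Keyword {name: row.Keyword})
MERGE (accNo:AccountNumber {number: row.AccountNumber, bName: row.Bank_Name, pAmt: row.Amount, pName: row.AccountName})
MERGE (transId)-[:REFERENCE]->(refId)
MERGE (refId)-[:USED_FOR]->(kewd)
MERGE (kewd)-[:AGAINST]->(accNo)
Because each MERGE now includes identifying properties, you get one node per distinct value instead of a single property-less node per label.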
My pipeline looks similar to the following:
ParDo returning a list per processed line | beam.io.WriteToText
beam.io.WriteToText adds a newline after each list element. How can I remove this newline and have the values separated by commas, so that I can build a CSV file?
Any help is much appreciated!
Thanks,
eilalan
To remove the newline char, you can use this:
beam.io.WriteToText(append_trailing_newlines=False)
As for adding commas between your values, there's no out-of-the-box feature on TextIO to convert to CSV, but you can check this answer for a user-defined PTransform that can be applied to your PCollection to convert dictionary data into CSV data.
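If each element is just a list of fields, one simple alternative is to join the fields into a single comma-separated string before writing. A minimal runnable sketch (the sample data and output path are made up):
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.Create([['a', 1, 2.5], ['b', 3, 4.0]])  # stand-in for your ParDo output
     | beam.Map(lambda fields: ','.join(str(f) for f in fields))  # one CSV line per element
     | beam.io.WriteToText('output', file_name_suffix='.csv'))
Note that this does no quoting or escaping; if your fields can contain commas or quotes, build each line with Python's csv module instead.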
Assume a node labeled "Properties". I am using "LOAD CSV with headers..."
Following is the sample file format:
fields
a=100,b=110,c=120,d=500
How do I turn the fields column into a "Properties" node with a, b, c, d as keys and 100, 110, 120, 500 as the corresponding values?
LOAD CSV WITH HEADERS FROM 'file:/sample.tsv' AS row FIELDTERMINATOR '\t'
CREATE (:Properties {props: row.fields})
The above does not create individual properties; it just sets props to the string value "a=100,b=110,c=120,d=500".
Also, different rows could have different sets of keys; that is, the keys need to be dynamic. (There are other columns as well; I trimmed them for SO.)
fields
a=100,b=110,c=120,d=500
X=300,y=210,Z=420,P=600
...
I am looking for a way to avoid splitting these key-value pairs into columns before loading, because they are dynamic: today the keys are a, b, c, d; tomorrow they may be aa, bb, cc, dd, etc. I don't want to keep changing my loader script to recognize new column headers.
Any pointers to solve this? I am using the latest Neo4j version, 3.0.1.
First things first: Your file format currently defines a single header/property: fields:
fields
a=100,b=110,c=120,d=500
Since you defined a tab as field terminator, that entire string (a=100,b=110,c=120,d=500) would end up in your node's props property.
To have properties loaded dynamically, first set up a proper header:
"a","b","x","y"
1,2,,
,,3,4
Then you can query with something like this:
LOAD CSV WITH HEADERS FROM 'file:///Users/David/overflow.csv' AS row
CREATE (:StackOverflow { a:row.a, b:row.b,x:row.x,y:row.y})
Then when you run something like:
match(so:StackOverflow) return so
You'll get the variable properties you wanted.
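If you really need the keys to stay dynamic (coming from the data itself), plain Cypher cannot set property names from values; one workaround, assuming the APOC procedure library is installed, is to split the fields string and set each pair procedurally:
LOAD CSV WITH HEADERS FROM 'file:///sample.tsv' AS row FIELDTERMINATOR '\t'
CREATE (p:Properties)
WITH p, split(row.fields, ',') AS pairs
UNWIND pairs AS pair
WITH p, split(pair, '=') AS kv
CALL apoc.create.setProperty(p, kv[0], kv[1]) YIELD node
RETURN DISTINCT node
The values arrive as strings; wrap kv[1] in toInt() or toFloat() if you need numbers.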
Now I need to find a particular entry in a journal using a CL program. My approach is to use DSPJRNE to put the journal entries in an output file, then use OPNQRYF to filter to the desired one. The file is uniquely keyed, so my plan is to compare the journal entry data with the key. The problem is that one of the key fields is a packed decimal, so in the journal entry it shows up as raw bytes that display as strange symbols. In order to compare the strings, I need to convert the packed decimal key into the corresponding characters. How can I achieve this in CL? If CL is not possible, what about RPG?
To answer your immediate question, the CVTCH MI instruction will convert hex to char, but I would not go that route in either CL or RPG. Rather, I would take James' advice, with a few additional steps:
DSPJRNE OUTFILE(QTEMP/DSPJRNE)
QRY input file DSPJRNE, output file QRYJRNE, select only JOESD
CRTDUPOBJ PRODUCTION_FILE QTEMP/JRNF DATA(*NO)
CPYF QRYJRNE JRNF FMTOPT(*NOCHK)
This will give you an externally described file with the exact same layout as your production file. You can query that, etc.
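Once the data is in JRNF, the packed key is an ordinary numeric field again, so the original OPNQRYF plan works with no hex conversion; a sketch (ORDKEY is a hypothetical key field name from your record layout):
OPNQRYF FILE((QTEMP/JRNF)) QRYSLT('ORDKEY *EQ 1234567')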
If you are pulling journal entries for a specific file, you can dump them into an externally described file with a clever use of SQL:
CREATE TABLE QTEMP/QADSPJRN LIKE QSYS/QADSPJRN
ALTER TABLE QTEMP/QADSPJRN DROP COLUMN JOESD
CREATE TABLE QTEMP/DSPJRNE AS (SELECT * FROM QTEMP/QADSPJRN, FILE-LIB/FILE)
WITH NO DATA
DSPJRNE ... OUTPUT(*OUTFILE) OUTFILFMT(*TYPE1) OUTFILE(QTEMP/DSPJRNE)
ENDDTALEN(*CALC)