How to upload Polygons from GeoPandas to Snowflake? - upload

I have a geometry column of a geodataframe populated with polygons and I need to upload these to Snowflake.
I have been exporting the geometry column of the geodataframe to file and have tried both CSV and GeoJSON formats, but so far I either always get an error the staging table always winds up empty.
Here's my code:
design_gdf['geometry'].to_csv('polygons.csv', index=False, header=False, sep='|', compression=None)
import sqlalchemy
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
engine = create_engine(
URL(<Snowflake Credentials Here>)
)
with engine.connect() as con:
con.execute("PUT file://<path to polygons.csv> #~ AUTO_COMPRESS=FALSE")
Then on Snowflake I run
create or replace table DB.SCHEMA.DESIGN_POLYGONS_STAGING (geometry GEOGRAPHY);
copy into DB.SCHEMA."DESIGN_POLYGONS_STAGING"
from #~/polygons.csv
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1 compression = None encoding = 'iso-8859-1');
Generates the following error:
"Number of columns in file (6) does not match that of the corresponding table (1), use file format option error_on_column_count_mismatch=false to ignore this error File '#~/polygons.csv.gz', line 3, character 1 Row 1 starts at line 2, column "DESIGN_POLYGONS_STAGING"[6] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client."
Can anyone identify what I'm doing wrong?

Inspired by #Simeon_Pilgrim's comment I went back to Snowflake's documentation. There I found an example of converting a string literal to a GEOGRAPHY.
https://docs.snowflake.com/en/sql-reference/functions/to_geography.html#examples
select to_geography('POINT(-122.35 37.55)');
My polygons looked like strings describing Polygons more than actual GEOGRAPHYs so I decided I needed to be treating them as strings and then calling TO_GEOGRAPHY() on them.
I quickly discovered that they needed to be explicitly enclosed in single quotes and copied into a VARCHAR column in the staging table. This was accomplished by modifying the CSV export code:
import csv
design_gdf['geometry'].to_csv(<path to polygons.csv>,
index=False, header=False, sep='|', compression=None, quoting=csv.QUOTE_ALL, quotechar="'")
The staging table now looks like:
create or replace table DB.SCHEMA."DESIGN_POLYGONS_STAGING" (geometry VARCHAR);
I ran into further problems copying into the staging table related to the presence of a polygons.csv.gz file I must have uploaded in a previous experiment. I deleted this file using:
remove #~/polygons.csv.gz
Finally, converting the staging table to GEOGRAPHY
create or replace table DB.SCHEMA."DESIGN_GEOGRAPHY_STAGING" (geometry GEOGRAPHY);
insert into DB.SCHEMA."DESIGN_GEOGRAPHY"
select to_geography(geometry)
from DB.SCHEMA."DESIGN_POLYGONS_STAGING"
and I wound up with a DESIGN_GEOGRAPHY table with a single column of GEOGRAPHYs in it. Success!!!

Related

Why I can't use more than one properties for UNIQUE constraint while loading the data using "bin/neo4j-admin database import full"?

I am trying to import the data using bin/neo4j-admin database import full command with 2 node files with header files and 1 relationship with header file.
But facing error that we can't use :ID for more than 1 column. Can't we use more than one columns as primary key while importing using bulk load?
Tried:
CSV files I have used in command.
node1.csv
node1_header.csv
node2.csv
node2_header.csv
relation.csv
relation_header.csv
node1.csv data having a primary key with more than one propery(columns).
So I added :ID for those columns in node1_header.csv file.
Issue faced:
`java.lang.IllegalStateException: There are multiple :ID columns, but they are referring different groups`
Expectating the following:
My data having 6 columns as Unique key. I have to specify those columns as Unique key in node1_header file.
Note: I can create `UNIQUE` constarint from the UI using Cypher query. Problem while using only `neo4j-admin database import full` from command line.

Neo4j imports zero records from csv

I am new to Neo4j and graph database. While trying to import a few relationships from a CSV file, I can see that there are no records, even when the file is filled with enough data.
LOAD CSV with headers FROM 'file:/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MERGE(transId:TransactionId)
MERGE(refId:RefNo)
MERGE(kewd:Keyword)
MERGE(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
Followed by:
LOAD CSV with headers FROM 'file/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MATCH(transId:TransactionId)
MATCH(refId:RefNo)
MATCH(kewd:Keyword)
MATCH(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
MERGE(transId)-[:REFERENCE]->(refId)-[:USED_FOR]->(kewd)-[:AGAINST]->(accNo)
RETURN *
Edit (table replica):
TransactionId Bank_Name RefNo Keyword Amount AccountNumber AccountName
12345 ABC 78 X 1000 5421 WE
23456 DEF X 2000 5471
34567 ABC 32 Y 3000 4759 HE
Is it likely the case that the Nodes and relationships are not created at all? How do I get all these desired relationships?
Neither file:/graphdata.csv nor file/graphdata.csv are legal URLs. You should use file:///graphdata.csv instead.
By default, LOAD CSV expects a "csv" file to consist of comma separated values. You are instead using a variable number of spaces as a separator (and sometimes as a trailer). You need to either:
use a single space as the separator (and specify an appropriate FIELDTERMINATOR option). But this is not a good idea for your data, since some bank names will likely also contain spaces.
use a comma separator (or some other character that will not occur in your data).
For example, this file format would work better:
TransactionId,Bank_Name,RefNo,Keyword,Amount,AccountNumber,AccountName
12345,ABC,78,X,1000,5421,WE
23456,DEF,,X,2000,5471
34567,ABC,32,Y,3000,4759,HE
Your Cypher query is attempting to use row properties that do not exist (since the file has no corresponding column headers). For example, your file has no pName or Name headers.
Your usage of the MERGE clause is probably not doing what you want, generally. You should carefully read the documentation, and this answer may also be helpful.

Setting dynamic properties for Node in neo4j

Assume a Node "Properties". I am using "LOAD CSV with headers..."
Following is the sample file format:
fields
a=100,b=110,c=120,d=500
How do I convert fields column to having a node with a,b,c,d and 100,110,120,500 respectively as the properties of the node "Properties"?
LOAD CSV WITH HEADERS FROM 'file:/sample.tsv' AS row FIELDTERMINATOR '\t'
CREATE (:Properties {props: row.fields})
The above does not create individual properties, but sets a string value to props as "a=100,b=110,c=120,d=500"
Also, different rows could have different set of Key values. That is the key needs to be dynamic. (There are other columns as well, I trimmed it for SO)
fields
a=100,b=110,c=120,d=500
X=300,y=210,Z=420,P=600
...
I am looking for a way to not split this key-value as columns and then load. The reason is they are dynamic - today it is a,b,c,d it may change to aa,bb,cc,dd etc.
I don't want to keep on changing my loader script to recognize new column headers.
Any pointers to solve this? I am using the latest 3.0.1 neo4j version.
First things first: Your file format currently defines a single header/property: fields:
fields
a=100,b=110,c=120,d=500
Since you defined a tab as field terminator, that entire string (a=100,b=110,c=120,d=500) would end up in your node's props property:
To have properties loaded dynamically: First set up proper header:
"a","b","x","y"
1,2,,
,,3,4
Then you can query with something like this:
LOAD CSV WITH HEADERS FROM 'file:///Users/David/overflow.csv' AS row
CREATE (:StackOverflow { a:row.a, b:row.b,x:row.x,y:row.y})
Then when you run something like:
match(so:StackOverflow) return so
You'll get the variable properties you wanted:

selecting specific rows from an unstructured csv file and writing to another file using python

I am trying to iterate through an unstrucutred csv file (it has no specific headings). The file is generated by an instrument. I would need to select specific rows that have specific column values and create another file. Below is the example of the file layout
,success, (row1)
1,2,protocol (row2)
78,f14,34(row3)
,67,34(row4)
,f14,34(row5)
3,f14,56,56(row6)
I need to select all rows with 'fi4' value. Below is the code
import csv
import sys
reader = csv.reader(open('c:/test_file.csv', newline=''), delimiter=',', quotechar='|')
for row in reader:
print(','.join(row))
I am unable to go beyond this point.
You're almost there:
for row in reader:
if row[1] == 'f14':
print(','.join(row))
You just need to check and see whether the row is one you're interested in or not by checking the value of the column and see if it's what you're looking for. That could be done with a simpleif row[1] == 'f14'conditional statement. However that would fail on any blank lines -- which it looks like your input file may have -- so you'd need to preface that check with another to make sure the row had at least that many columns in it.
To create another csv file with just those rows in it, all you'd need to write each row that passed all the checks to another file opened for output -- instead of, or in addition to, printing the row out. Here's a very concise way of just writing the rows to another file.
(Note: I'm not sure why you had thequotechar='|'in your code on thecsv.reader()call because there aren't any quote characters in the input file shown, so I left it out in the code below -- you might need to add it back if indeed that's what it would be if there were any.)
import csv
with open('test_file.csv', newline='') as infile, \
open('test_file_out.csv', 'w', newline='') as outfile:
csv.writer(outfile).writerows(row for row in csv.reader(infile)
if len(row) >= 2 and row[1] == 'f14')
Contents of'test_file_out.csv'file afterwards:
78,f14,34(row3)
,f14,34(row5)
3,f14,56,56(row6)

How can I prevent SQL injections during CSV uploads?

I've just started learning about Rails security, and I'm wondering how I can avoid security issues while allowing users to upload CSV files into our database. We're using Postgres' "copy from stdin" functionality to upload the data from the CSV into a temp table, which is then used for upserts into another table. This is the basic code (thanks to this post):
conn = ActiveRecord::Base.connection_pool.checkout
raw = conn.raw_connection
raw.exec("COPY temp_table (col1, col2) FROM STDIN DELIMITER '|'")
# read column values from the CSV line by line in the following format:
# attributes = {column_1: 'column 1 data', column_2: 'column 2 data'}
# line = "#{attributes.values.join('|')}\n"
rc.put_copy_data line
# wrap up copy process & insert into & update primary table
I am wondering what I can or should do to sanitize the column values. We're using Rails 3.2 and Postgres 9.2.
No action is required; COPY never interprets the values as SQL syntax. Malformed CSV will produce an error due to bad quoting / incorrect column count. If you're sending your own data line-by-line you should probably exclude a line containing a single \. followed by a newline, but otherwise it's rather safe.
PostgreSQL doesn't sanitize the data in any way, it just handles it safely. So if you accept a string ');DROP TABLE customer;-- in your CSV it's quite safe in COPY. However, if your application reads that out of the database, assumes that "because it came from the database not the user it's safe," and interpolates it into an SQL string you're still just as stuffed.
Similarly, incorrect use of PL/PgSQL functions where EXECUTE is used with unsafe string concatenation will create problems. You must use of format and the %I or %L specifiers, use quote_literal / quote_ident, or (for literals) use EXECUTE ... USING.
This is not just true of COPY, it's the same if you do an INSERT of the manipulated data then use it unsafely after reading it back from the DB.

Resources