How can I prevent SQL injections during CSV uploads? - ruby-on-rails

I've just started learning about Rails security, and I'm wondering how I can avoid security issues while allowing users to upload CSV files into our database. We're using Postgres' "copy from stdin" functionality to upload the data from the CSV into a temp table, which is then used for upserts into another table. This is the basic code (thanks to this post):
conn = ActiveRecord::Base.connection_pool.checkout
raw = conn.raw_connection
raw.exec("COPY temp_table (col1, col2) FROM STDIN DELIMITER '|'")
# read column values from the CSV line by line in the following format:
# attributes = {column_1: 'column 1 data', column_2: 'column 2 data'}
# line = "#{attributes.values.join('|')}\n"
raw.put_copy_data line
# finish with raw.put_copy_end, then upsert from temp_table into the primary table
I am wondering what I can or should do to sanitize the column values. We're using Rails 3.2 and Postgres 9.2.

No action is required; COPY never interprets the values as SQL syntax. Malformed CSV will produce an error due to bad quoting / incorrect column count. If you're sending your own data line-by-line you should probably exclude a line containing a single \. followed by a newline, but otherwise it's rather safe.
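As a rough illustration of that guard (assuming the raw connection and attributes hash from the question's code, so purely a sketch):
line = "#{attributes.values.join('|')}\n"
raw.put_copy_data(line) unless line == "\\.\n" # a lone \. on its own line is COPY's end-of-data marker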
PostgreSQL doesn't sanitize the data in any way, it just handles it safely. So if you accept a string ');DROP TABLE customer;-- in your CSV it's quite safe in COPY. However, if your application reads that out of the database, assumes that "because it came from the database not the user it's safe," and interpolates it into an SQL string you're still just as stuffed.
Similarly, incorrect use of PL/pgSQL functions where EXECUTE is used with unsafe string concatenation will create problems. You must use format with the %I or %L specifiers, use quote_literal / quote_ident, or (for literals) use EXECUTE ... USING.
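For illustration only (a hedged sketch, not from the original answer; count_matching is a made-up function): dynamic SQL built inside PL/pgSQL with format and %I / %L, created here through the Rails connection:
ActiveRecord::Base.connection.execute(<<-SQL)
  CREATE OR REPLACE FUNCTION count_matching(tbl text, col text, val text)
  RETURNS bigint LANGUAGE plpgsql AS $$
  DECLARE n bigint;
  BEGIN
    -- %I quotes identifiers, %L quotes literals, so tbl/col/val cannot break out of the query
    EXECUTE format('SELECT count(*) FROM %I WHERE %I = %L', tbl, col, val) INTO n;
    RETURN n;
  END;
  $$;
SQL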
This is not just true of COPY, it's the same if you do an INSERT of the manipulated data then use it unsafely after reading it back from the DB.
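As a hedged sketch of the safe pattern on the Rails side (the Customer model and name column are assumptions for this example, not from the question): when the value comes back out of the database, pass it as a bind parameter rather than interpolating it into SQL.
risky = Customer.find(42).name # may contain "');DROP TABLE customer;--"
# Unsafe: interpolation turns the stored value back into SQL text
Customer.where("name = '#{risky}'")
# Safe: let the adapter quote it as a bind parameter
Customer.where("name = ?", risky)
Customer.where(name: risky)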

Related

How to upload Polygons from GeoPandas to Snowflake?

I have a geometry column of a geodataframe populated with polygons and I need to upload these to Snowflake.
I have been exporting the geometry column of the geodataframe to file and have tried both CSV and GeoJSON formats, but so far I either get an error or the staging table winds up empty.
Here's my code:
design_gdf['geometry'].to_csv('polygons.csv', index=False, header=False, sep='|', compression=None)
import sqlalchemy
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(
    URL(<Snowflake Credentials Here>)
)
with engine.connect() as con:
    con.execute("PUT file://<path to polygons.csv> @~ AUTO_COMPRESS=FALSE")
Then on Snowflake I run
create or replace table DB.SCHEMA.DESIGN_POLYGONS_STAGING (geometry GEOGRAPHY);
copy into DB.SCHEMA."DESIGN_POLYGONS_STAGING"
from @~/polygons.csv
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1 compression = None encoding = 'iso-8859-1');
Generates the following error:
"Number of columns in file (6) does not match that of the corresponding table (1), use file format option error_on_column_count_mismatch=false to ignore this error File '#~/polygons.csv.gz', line 3, character 1 Row 1 starts at line 2, column "DESIGN_POLYGONS_STAGING"[6] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client."
Can anyone identify what I'm doing wrong?
Inspired by @Simeon_Pilgrim's comment, I went back to Snowflake's documentation. There I found an example of converting a string literal to a GEOGRAPHY:
https://docs.snowflake.com/en/sql-reference/functions/to_geography.html#examples
select to_geography('POINT(-122.35 37.55)');
My polygons looked more like strings describing polygons than actual GEOGRAPHY values, so I decided I needed to treat them as strings and then call TO_GEOGRAPHY() on them.
I quickly discovered that they needed to be explicitly enclosed in single quotes and copied into a VARCHAR column in the staging table. This was accomplished by modifying the CSV export code:
import csv
design_gdf['geometry'].to_csv(<path to polygons.csv>,
    index=False, header=False, sep='|', compression=None, quoting=csv.QUOTE_ALL, quotechar="'")
The staging table now looks like:
create or replace table DB.SCHEMA."DESIGN_POLYGONS_STAGING" (geometry VARCHAR);
I ran into further problems copying into the staging table related to the presence of a polygons.csv.gz file I must have uploaded in a previous experiment. I deleted this file using:
remove @~/polygons.csv.gz
Finally, converting the staging table's strings to GEOGRAPHY:
create or replace table DB.SCHEMA."DESIGN_GEOGRAPHY" (geometry GEOGRAPHY);
insert into DB.SCHEMA."DESIGN_GEOGRAPHY"
select to_geography(geometry)
from DB.SCHEMA."DESIGN_POLYGONS_STAGING"
and I wound up with a DESIGN_GEOGRAPHY table with a single column of GEOGRAPHYs in it. Success!!!

Neo4j imports zero records from csv

I am new to Neo4j and graph database. While trying to import a few relationships from a CSV file, I can see that there are no records, even when the file is filled with enough data.
LOAD CSV with headers FROM 'file:/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MERGE(transId:TransactionId)
MERGE(refId:RefNo)
MERGE(kewd:Keyword)
MERGE(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
Followed by:
LOAD CSV with headers FROM 'file/graphdata.csv' as row WITH row
WHERE row.pName is NOT NULL
MATCH(transId:TransactionId)
MATCH(refId:RefNo)
MATCH(kewd:Keyword)
MATCH(accNo:AccountNumber {bName:row.Bank_Name, pAmt:row.Amount, pName:row.Name})
MERGE(transId)-[:REFERENCE]->(refId)-[:USED_FOR]->(kewd)-[:AGAINST]->(accNo)
RETURN *
Edit (table replica):
TransactionId Bank_Name RefNo Keyword Amount AccountNumber AccountName
12345 ABC 78 X 1000 5421 WE
23456 DEF X 2000 5471
34567 ABC 32 Y 3000 4759 HE
Is it the case that the nodes and relationships are not created at all? How do I get all the desired relationships?
Neither file:/graphdata.csv nor file/graphdata.csv are legal URLs. You should use file:///graphdata.csv instead.
By default, LOAD CSV expects a "csv" file to consist of comma separated values. You are instead using a variable number of spaces as a separator (and sometimes as a trailer). You need to either:
use a single space as the separator (and specify an appropriate FIELDTERMINATOR option). But this is not a good idea for your data, since some bank names will likely also contain spaces.
use a comma separator (or some other character that will not occur in your data).
For example, this file format would work better:
TransactionId,Bank_Name,RefNo,Keyword,Amount,AccountNumber,AccountName
12345,ABC,78,X,1000,5421,WE
23456,DEF,,X,2000,5471
34567,ABC,32,Y,3000,4759,HE
Your Cypher query is attempting to use row properties that do not exist (since the file has no corresponding column headers). For example, your file has no pName or Name headers.
In general, your usage of the MERGE clause is probably not doing what you want. You should carefully read the documentation, and this answer may also be helpful.

How to create serial number for ruby on rails?

I want to create ticket numbers from a serial number, e.g. T-0001, T-0002, T-0003,
for a Ruby on Rails project. How can I do this?
Admission.transaction do
cus = @admission.customer
cus.inpatient_id = cus.inpatient_id || "I-%.6d" % cus.id
cus.save
end
Most Rails servers are multi-threaded, meaning many requests will be processed in parallel. You can imagine two processes trying to create a new serial number at the same point in time: duplicate ticket numbers! Not what we want, for sure.
It is better to delegate the task of creating ids to the database itself. So instead of the default auto-increment ids (1, 2, 3, 4, ...), we will tell the database to create ids in this format (T-0001, T-0002, ...). This can be achieved using custom sequences. I am assuming a Postgres database here, but it should be similar for MySQL.
First, create a sequence:
CREATE SEQUENCE ticket_seq;
But sequences only produce numbers, so we convert the value to a string and format it:
SELECT 'T-'||to_char(nextval('ticket_seq'), 'FM0000');
This will return values like T-0001, T-0002 ...
Note: we have just created the sequence; you still need to tell the database to use it when assigning ticket numbers (see the sketch after the link below).
Check: https://stackoverflow.com/a/10736871/3507206
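One hedged way to wire that up from Rails (the Ticket model and ticket_no column are assumptions for this sketch; the sequence is the ticket_seq created above):
class Ticket < ActiveRecord::Base
  before_create :assign_ticket_no

  private

  # Ask Postgres for the next formatted value just before the row is created
  def assign_ticket_no
    self.ticket_no ||= self.class.connection
      .select_value("SELECT 'T-' || to_char(nextval('ticket_seq'), 'FM0000')")
  end
end
Because nextval is atomic in Postgres, two concurrent requests can never be handed the same number.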
Here is a sample that generates the required formatted series over a range:
> (0..5).map{|e| "T-#{e.to_s.rjust(4, "0")}"}
#=> ["T-0000", "T-0001", "T-0002", "T-0003", "T-0004", "T-0005"]
If you are using PG / MySQL you can use the object's id for the unique number (the primary key is always sequential and unique).
UPDATE: as per OP's comment:
Admission.transaction do
cus = @admission.customer
cus.inpatient_id = cus.inpatient_id || "T-#{cus.id.to_s.rjust(4, "0")}"
cus.save
end

How to insert large CSV into a CLOB column with DB2/Rails

Problem: I have a large CSV that I want to insert into a DB2 table with Rails
Description: The CSV is about 2k lines/8K characters. The CLOB column is set up to handle over 10K characters. I can insert the CSV just fine through the RubyMine database console. However, my app crashes.
ActiveRecord produces one huge insert query. Code:
Logger.create(csv: csv_data.to_s)
DB2 returns an error:
ActiveRecord::JDBCError: [SQL0102] String constant beginning with 'foobar' too long.
I can insert huge PDF files into BLOB columns just fine using similar code. I tried creating the record first and then updating it with data, no difference.
This problem is the same as this one, except I need a Rails solution rather than a general one.
Found a hack around this by splitting the csv_data into chunks and appending them to the column:
update_attribute(:csv, '') if self.csv.nil? # Can't CONCAT to nil
# Split csv_data into chunks, concatenating each one to the column
csv_data.scan(/.{1,6144}/m).each do |part|
  parm = ActiveRecord::Base.connection.quote(part)
  ActiveRecord::Base.connection.execute("update #{Logger.table_name} set csv = CONCAT(csv, #{parm}) where id = #{self.id}")
end

Cannot query for a long string in ActiveRecord

I have an ActiveRecord Song model with a songhash field (string(255)) that contains a SHA2 hash. When I try to find a song via the following code, nothing gets returned:
song = Song.all.first
song2 = Song.where(songhash: song.songhash).first
# song is a valid object with a songhash set, but song2 is nil!
If I do the same thing however with a "like" query it works:
song = Song.all.first
song2 = Song.where("songhash like ?", song.songhash).first
# song2 is a valid object now
song2.songhash == song.songhash
# the comparison is true
I fear it has something to do with string encodings, but I have no idea why this string could possibly have encoding issues: 61a9761b9ebd543b72c5ccf2ab6db198b067f7cf7f8412ee6e9c14b19611bc80
I'm using Rails 3.1 with an SQLite DB.
Any ideas what's going on?
Summary
SQL statements generated
# With = / It doesn't work
SELECT "songs".* FROM "songs" WHERE "songs"."type"
IN ('PlaylistSong') AND "songs"."songhash" =
'61a9761b9ebd543b72c5ccf2ab6db198b067f7cf7f8412ee6e9c14b19611bc80'
# With like / It works
SELECT "songs".* FROM "songs" WHERE "songs"."type"
IN ('PlaylistSong') AND (songhash like
'61a9761b9ebd543b72c5ccf2ab6db198b067f7cf7f8412ee6e9c14b19611bc80')
# With upper / It works
SELECT "songs".* FROM "songs" WHERE "songs"."type"
IN ('PlaylistSong') AND (UPPER(songhash) =
'61A9761B9EBD543B72C5CCF2AB6DB198B067F7CF7F8412EE6E9C14B19611BC80')
The following statements work:
Song.where(['UPPER(songhash) = ?', song.songhash.upcase]).first
Song.where(['songhash like ?', song.songhash]).first
UPPER and LIKE are both case insensitive
SQLite documentation
The LIKE operator does a pattern matching comparison. (A bug: SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.) See more
To investigate
Charset equals? (Rails - SQLite)
String stored potentially dirty (carriage returns, ...)
Thanks to the help of @gazler and @basgys I was able to track down the problem:
It is in fact an encoding problem caused by the Digest::SHA2#hexdigest function. It returns a string that is encoded as ASCII-8BIT. When storing this in the database it seems to be converted automatically to a UTF-8 string (I checked that by running a select hex(songhash) from songs query). However, when using the string in a query, that conversion does not seem to happen.
Internally, Ruby seems to handle the different encoding conversions automatically. That is why "abc" == "abc" is true even though the two strings may have different encodings.
I'm sure that this is not expected behavior, however I don't know if it is a bug - and if it is a bug whether it is somewhere within ActiveRecord, the SQLite Driver, or SQLite itself.
My solution is now to append a .encode("UTF-8") to the result of the digest function.
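A minimal sketch of that fix (file_contents stands in for however the hash is computed in the app):
require 'digest'
# hexdigest came back as ASCII-8BIT in this setup; force UTF-8 before storing or querying
songhash = Digest::SHA2.hexdigest(file_contents).encode("UTF-8")
Song.create(songhash: songhash)
Song.where(songhash: songhash).first # the equality query now matches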
