How to fix incorrect data when loading from azure storage using external table in snowflake - parsing

I was trying to load data from Azure Blob Storage using a Snowflake external table, created like this after setting up the stage (blob_tb2_434):
CREATE OR REPLACE EXTERNAL TABLE mydb.public.tb2_434
WITH LOCATION = @mydb.public.blob_tb2_434/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY='"' )
PATTERN='.*.tsv';
The above external table will load every file with tsv extension from the blob defined in the stage area. I used FIELD_OPTIONALLY_ENCLOSED_BY property because there are tabs, and other characters inside some of the fields enclosed with double quotes.
One of the files in the blob has "A"" where it was supposed to be A""" and the entire record is being read as null in the external table.
How can I use replace function from snowflake or other preprocessing to parse the data correctly?
I was trying to load the entire data using '\n' as the FIELD_DELIMITER and parsing it locally but it isn't working.
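One preprocessing check that could help locate the malformed records before loading is to count double quotes per line: with optionally enclosed fields and doubled quotes as escapes, a well-formed record has an even number of quote characters. The sketch below is in Ruby and assumes that no record spans more than one line; the method name is hypothetical, and it only flags suspect lines rather than repairing them.

```ruby
# Return the 1-based numbers of lines whose double-quote count is odd.
# Assumption: every record sits on a single line (no embedded newlines).
def unbalanced_quote_lines(text)
  text.each_line.with_index(1)
      .select { |line, _number| line.count('"').odd? }
      .map { |_line, number| number }
end

sample = "id\tname\n1\t\"A\"\"\"\n2\t\"A\"\"\n"
p unbalanced_quote_lines(sample)
```

In the sample, the second data row contains "A"" (three quotes), so it is the only line reported.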

Related

Azure Data Factory Get Metadata to get blob filenames and transfer them to Azure SQL database table

I am trying to use Get Metadata activity in Azure Data Factory in order to get blob filenames and copy them to Azure SQL database table.
I follow this tutorial: https://www.mssqltips.com/sqlservertip/6246/azure-data-factory-get-metadata-example/
Here is my pipeline. Copy Data > Source points to the blob files in my Blob storage. I need to specify the source dataset as binary because the files are *.jpeg files.
For Copy Data > Sink, which is the Azure SQL database, I enabled the "Auto Create table" option.
In my Sink dataset config, I had to choose a table because validation won't pass unless I select one in my SQL database, even though that table is not related at all to the blob filenames I want to get.
Question 1: Am I supposed to create a new table in the SQL DB beforehand, with columns matching the blob filenames that I want to extract?
Then, I tried to validate the pipeline and I get this error.
Copy_Data_1
Sink must be binary when source is binary dataset.
Question 2: How can I resolve this error? I had to select binary as the file type of the source because that is one of the steps when creating the source dataset. However, when I chose the sink dataset (an Azure SQL table), I didn't have to select a dataset type, so the two don't seem to match.
Thank you very much in advance.
New screenshot of the new pipeline: I can now get the itemName of the filenames in the JSON output files.
Now I have added a Copy Data activity right after the Get_File_Name2 activity and connected them, to try to use the JSON output files as the source dataset.
However, I need to choose the source dataset location first, before specifying the type as JSON. As far as I understand, these JSON output files are the output of the Get_File_Name2 activity and are not yet stored in Blob storage. How do I make the Copy Data activity read these JSON output files as the source dataset?
Update 10/14/2020
Here is my new Stored Procedure activity. I added the parameter as suggested; however, I changed its name to JsonData because my stored procedure requires this parameter.
This is my stored procedure.
I get this error at the stored procedure:
{
"errorCode": "2402",
"message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0.",
"failureType": "UserError",
"target": "Stored procedure1",
"details": []
}
But when I check the input, it seems to be reading the JSON string itemName successfully.
But when I check the output, it's not there.
Actually, you could pass the Get Metadata output JSON as a parameter and then call the stored procedure: Get Metadata --> Stored Procedure.
You just need to focus on the code of the stored procedure.
Get Metadata output childItems:
{
"childItems": [
{
"name": "DeploymentFiles.zip",
"type": "File"
},
{
"name": "geodatalake.pdf",
"type": "File"
},
{
"name": "test2.xlsx",
"type": "File"
},
{
"name": "word.csv",
"type": "File"
}
]
}
Stored Procedure:
@activity('Get Metadata1').output.childitems
For how to create the stored procedure (reading data from a JSON object), you could refer to this blog: Retrieve JSON Data from SQL Server using a Stored Procedure.
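To make the expected transformation concrete, here is a minimal sketch of what the stored procedure has to do with the childItems payload: parse the JSON text and pull out one name per file. It is written in Ruby purely for illustration (the real procedure would be T-SQL), and the method name file_names is hypothetical. It also shows why a parameter that is not JSON text fails immediately, which is what SQL error 13609 ("Unexpected character 'S' is found at position 0") reports.

```ruby
require 'json'

# The childItems payload from the Get Metadata activity, as JSON text.
payload = <<~JSON
  {"childItems": [
    {"name": "DeploymentFiles.zip", "type": "File"},
    {"name": "geodatalake.pdf", "type": "File"},
    {"name": "test2.xlsx", "type": "File"},
    {"name": "word.csv", "type": "File"}
  ]}
JSON

# Accept either the whole output object or just the childItems array,
# and return the names of the entries whose type is "File".
def file_names(json_text)
  data = JSON.parse(json_text)
  items = data.is_a?(Hash) ? data.fetch('childItems') : data
  items.select { |item| item['type'] == 'File' }.map { |item| item['name'] }
end

p file_names(payload)

# A value that does not start with "[" or "{" is rejected by the parser.
begin
  file_names('Some text that is not JSON')
rescue JSON::ParserError
  puts 'not JSON: the parameter must be the serialized array'
end
```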

Rails/Slim auto-encoding Postgres geometric types

I have Google Maps polygons stored as polygons in Postgres, and I read them straight from the DB and output them to a React component for editing using the Google Maps API.
In my local dev environment this works fine and by inspecting the data being fed to the React component everything looks normal:
this.state = {
map: "POLYGON ((10.69332405332034 59.88086121809927, 10.77572151425784 59.84569766552776, 10.81554695371096 59.84121336506844, 10.8450727105469 59.84518027707294, 10.86910530332034 59.85397478713949, 10.91442390683596 59.88499566305687, 11.020510637793 59.9383527020427, 10.99115654233401 59.96809210273585, 10.91811462644046 59.99462872670429, 10.80250068906253 60.0067306049673, 10.58723732236331 59.97273110496651, 10.43772026303714 59.86724837030302, 10.44239803555911 59.85643166134471, 10.44501587155..."
}
But in production it seems some kind of compression/encoding is taking effect, rendering the data unusable to Google Maps:
this.state = {
map
}
Background/environment
We recently had to take a server out of service and in its place we added two new ones to the load balancer. They were set up through Cloud 66 using the same config so they should be exactly the same, but I guess you never know.
We use slim syntax for templates.
I should clarify: Nothing is being done explicitly by our application code to the map field on its way from Postgres to the React component. We get the database record like so: @coverage_map = CoverageMap.find(params[:id]) and then output it in the template like so: coverageMap: @coverage_map. The outputted data on display here is copied from the HTML template being rendered by Slim.
What could be happening here? Any tips on what to look for?
In your dev environment you're retrieving the geometry from the database as WKT (Well Known Text), which is not PostgreSQL's standard output. In production you're getting a WKB (Well Known Binary) representation of the geometry, which is what you normally see when firing a simple select. What you need is to use ST_AsText to get your WKT, e.g.
WITH mytable (geom) AS (
VALUES ('POINT(1 1)'::geometry)
)
SELECT geom,ST_AsText(geom) FROM mytable;
geom | st_astext
--------------------------------------------+------------
0101000000000000000000F03F000000000000F03F | POINT(1 1)
(1 row)
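To see that the hex string and the WKT really are the same geometry, the WKB from the output above can be decoded by hand. This is only a sketch in Ruby with hypothetical helper names: it handles a plain 2D point without an SRID and nothing else, so in an application ST_AsText (or a geometry library) remains the right tool.

```ruby
# Render a coordinate without a trailing ".0", so 1.0 becomes "1".
def trim(number)
  number == number.to_i ? number.to_i.to_s : number.to_s
end

# Decode hex-encoded WKB for a 2D point (geometry type 1) into WKT.
def wkb_point_to_wkt(hex)
  bytes = [hex].pack('H*')                       # hex string -> raw bytes
  little_endian = bytes.unpack1('C') == 1        # first byte is the byte order
  layout = little_endian ? 'L<E2' : 'L>G2'       # uint32 type, then two doubles
  type, x, y = bytes[1..-1].unpack(layout)
  raise ArgumentError, "unsupported geometry type #{type}" unless type == 1

  "POINT(#{trim(x)} #{trim(y)})"
end

puts wkb_point_to_wkt('0101000000000000000000F03F000000000000F03F')
# prints POINT(1 1)
```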

File object from URL

I'd like to create a file object from an image located at a specific url. I'm downloading the file with Net Http:
img = Net::HTTP.get_response(URI.parse('https://prium-solutions.com/wp-content/uploads/2016/11/rails-1.png'))
file = File.read(img.body)
However, I get ArgumentError: string contains null byte when trying to read the file and store it in the file variable.
How can I do this without having to store it locally?
Since File deals with reading from storage, it's really not applicable here. The read method is expecting you to hand it a location to read from, and you're passing in binary data.
If you have a situation where you need to interface with a library that expects an object that is streaming, you can wrap the string body in a StringIO object:
require 'stringio'
file = StringIO.new(img.body)
# you can now call file.read, file.seek, file.rewind, etc.
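The difference can be reproduced without any network call. In the sketch below, the body variable is a stand-in for img.body (a few PNG signature bytes followed by null bytes, chosen for illustration): File.read treats it as a path and fails with the error from the question, while StringIO exposes the same bytes through a file-like interface.

```ruby
require 'stringio'

# Stand-in for img.body: PNG signature bytes followed by two null bytes.
body = "\x89PNG\r\n\x1A\n\x00\x00".b

# File.read treats its argument as a path, and a path cannot contain "\0".
begin
  File.read(body)
rescue ArgumentError => e
  puts "File.read failed: #{e.message}"
end

# StringIO wraps the same bytes in an in-memory, file-like object.
file = StringIO.new(body)
signature = file.read(4)   # first four bytes of the data
file.rewind                # back to the start, as with a real file
```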

Core Data Binary Data Allow External Storage

I am storing an image in Core Data using external storage, as shown in the screenshot below:
The problem is that when I retrieve the couponImage property I never get nil, even if I don't save any couponImage. couponImage is of type NSData?. When I print it to the console, it prints the following:
External Resource path = nil // This is where there is no image
External Resource path = EDCS-EDAS-23eD-EDRF-EWQ3-234F
My question is: how do you differentiate between an image that does not exist and one that does?

Load or Stress Testing Tool with URL Import Functionality

Can someone recommend a load testing tool which allows you to either:
a. replay an IIS (7) log(s) to simulate a real live site daily run;
b. import a CSV or equivalent list of URLS so we can achieve a similar thing as above but at a URL level;
c. .net API so I can create simple tests easily from my list of URLS is also a good way to go.
I do not really want to record my tests.
I think I can do b) with WAPT, but I would need to create an XML file manually. That is not too much grief, but I am wondering whether any tools cover these scenarios out of the box.
Visual Studio Test Edition would require some code to parse the file into a suitable test run.
It is a great load testing solution.
Our load testing service lets you write a very simple script using JavaScript to pull data out of a CSV file and then fetch those URLs. For example, the following code would pluck 10 random URLs from the CSV file and fetch them as part of a single session:
var c = browserMob.openHttpClient();
var csv = browserMob.getCSV("urls.csv");
browserMob.beginTransaction();
for (var i = 0; i < 10; i++) {
browserMob.beginStep("Step 1");
var url = csv.random().get("url");
c.get(url);
browserMob.endStep();
}
browserMob.endTransaction();
The CSV file itself needs to be a normal CSV file with the first row containing a header named "url". This script would be run repeatedly for each virtual user participating in a load test.
We support the so-called 'uri-format' in our open-source tool, Yandex.Tank. You simply put all your URIs in a file, one URI per line, then specify headers in your load.ini like this:
[phantom]
address=example.org
rps_schedule=line(1, 1600, 2m)
headers = [Host: mts-maps.yandex.ru]
  [Connection: close] [Bloody: yes]
ammo_file = ammo.uri
ammo.uri:
/
/index.html
/1/example.html
/2/example.html
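If the URLs already live in a CSV file, they can be converted to this one-URI-per-line layout with a few lines of script. The following is a sketch in Ruby, assuming a CSV with a header column named "url" (the layout described in the other answer) and a hypothetical method name csv_to_ammo. It keeps only the path and query string of each URL, since the host is configured separately via address in load.ini.

```ruby
require 'csv'
require 'uri'

# Turn CSV text with a "url" header column into ammo.uri content:
# one request path (plus query string, if any) per line.
def csv_to_ammo(csv_text)
  paths = CSV.parse(csv_text, headers: true).map do |row|
    URI.parse(row['url'].strip).request_uri
  end
  paths.join("\n") + "\n"
end

csv_text = <<~CSV
  url
  http://example.org/
  http://example.org/index.html
  http://example.org/1/example.html?page=2
CSV

puts csv_to_ammo(csv_text)
```

The result could then be written to the file named by ammo_file, for example with File.write('ammo.uri', csv_to_ammo(File.read('urls.csv'))).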
