I have a requirement for the final output text-delimited document to contain only dates; however, the JSON I am reading as the source in my ADF Copy activity has datetime fields such as "hireDate": "1988-03-28T00:00:00+00:00", and I need these to be "1988-03-28". Any suggestions on doing this in the ADF mapping? (We don't have Mapping Data Flows in the government Azure cloud.)
Thank you
Mike Kiser
There is no way to do this directly in the Copy activity mapping; we need an intermediate step. For example: copy the JSON file into an Azure SQL table, then copy from Azure SQL into a txt file.
Create a table and set the column type to date.
create table dbo.data1(
hireDate date
)
Copy into the table; the Copy activity automatically casts the value from datetime to date.
The debug result is as follows:
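Outside of ADF, the conversion that the date-typed SQL column performs is just truncating the time portion of an ISO-8601 datetime. A minimal Python sketch of the same transformation (the function name is my own, not part of ADF):

```python
from datetime import datetime

def to_date_only(value: str) -> str:
    """Parse an ISO-8601 datetime string (with offset) and return only the date part."""
    return datetime.fromisoformat(value).date().isoformat()

print(to_date_only("1988-03-28T00:00:00+00:00"))  # 1988-03-28
```

If a pre- or post-processing script is an option in your environment, this avoids the intermediate SQL table entirely.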
I have a ~1 GB text file with 153 separate fields. I uploaded the file to GCS and then created a new table in BQ with file format as "CSV". For table type, I selected "native table". For schema, I elected to auto-detect. For the field delimiter, I selected "tab". Upon running the job, I received the following error:
Could not parse '15229-1910' as INT64 for field int64_field_19 (position 19) starting at location 318092352 with message 'Unable to parse'
The error is originating from a "zip code plus 4" field. My question is whether there is a way to prevent the field from being parsed as an integer, or a way to omit these parse errors altogether so that the job can complete. GCP's documentation advises: "If BigQuery doesn't recognize the format, it loads the column as a string data type. In that case, you might need to preprocess the source data before loading it". The "zip code plus four" field in my file is already assigned a string field type, so I'm not quite sure where to go from here. Given that I selected "tab" as the delimiter, does the error indicate that the "zip code plus four" value contains a tab character?
BigQuery uses schema auto-detection to infer the schema of a table while loading data into BigQuery. Per the sample data you provided, the pincode will be treated as a string value by BigQuery because of the dash "-" between the integer values. If you want to control the types, you can avoid auto-detect and provide the schema manually.
As stated in the comment, you can try this to upload your 1 GB text file into Bigquery by following the steps :
As you mentioned in the question, I am assuming your data is in CSV format. From the given sample data, I mocked up the data in an Excel sheet.
Excel Sheet
Save the file in .tsv format.
You can upload the file into BigQuery using schema auto-detection with tab set as the delimiter. It will automatically detect all the field types without any errors, as can be seen in the BigQuery table in the screenshot.
BigQuery Table
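If the load still fails at a specific byte offset, it is worth checking whether some row has a stray tab inside a value, which shifts every subsequent field (and can push a ZIP+4 string into an INT64 column). A rough stdlib sketch that scans a TSV for rows whose field count differs from the expected schema width (function and parameter names are my own):

```python
import csv

def find_bad_rows(path, expected_fields=153):
    """Yield (line_number, field_count) for tab-delimited rows whose
    field count differs from the expected schema width -- usually a
    sign of an embedded tab or an unescaped quote in some value."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if len(row) != expected_fields:
                yield line_no, len(row)
```

Running this over the source file before loading tells you whether the problem is column misalignment rather than the ZIP+4 field itself.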
New to Azure data factory.
We have .CSV file, we need to use ODATA and load to Snowflake table.
I created 2 datasets, 2 linked services, and 1 pipeline.
I loaded the data into the Snowflake table (screenshot attached).
The issue I am facing now concerns one column with the date format "DD/MM/YYYY".
The date column is coming through as NULL in the output.
When I view the data in the source I can see the date value, but in the Derived Column expression builder the date value is NULL.
The flow is
OData -> Blob Storage (JSON)
JSON -> Derived -> Target
Any help would be appreciated.
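A likely cause of the NULL is a format mismatch: a parser expecting ISO-style dates cannot interpret "DD/MM/YYYY" strings and yields NULL, so the format usually has to be supplied explicitly (in an ADF Derived Column, something like toDate(yourColumn, 'dd/MM/yyyy'), where yourColumn is a placeholder for the actual column name). The mismatch can be sketched in Python:

```python
from datetime import datetime

raw = "25/12/2023"  # a DD/MM/YYYY value, as in the question

# Parsing with an explicit day-first format succeeds:
parsed = datetime.strptime(raw, "%d/%m/%Y").date()
print(parsed)  # 2023-12-25

# A parser expecting ISO dates rejects the same string,
# which is the kind of failure that surfaces as NULL downstream:
try:
    datetime.fromisoformat(raw)
except ValueError:
    print("not ISO format")
```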
Thanks,
I have an Azure Data Factory pipeline with a Copy Data activity whose sink is a stored procedure. The SP takes a table-type parameter as input, and everything works fine so far. But the SP has now changed, and I need to add another parameter that should be the max of one of the columns of the Copy Data activity's source. I cannot compute this inside the SP since it is reused by other components and takes the value as input. Of course I could wrap it in another SP that calculates the max and then calls the original SP, but I thought a better way would be to do this directly from the ADF pipeline. So I thought I could add a new parameter on my sink SP and somehow get that max using dynamic content, but I can't figure out a way to reference the Source of the Copy Data activity.
Let's say the source of the Copy Data activity has a column Id and I need to pass the max value of that column to the SP sink. Is there a way to do something like max(#Source.Id) in the SP's parameter value field?
As far as I'm aware, it's impossible to reference the Source directly, so using a Lookup activity beforehand is an alternative way to do this.
Hope this can help you.
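The idea is a two-step pattern: a Lookup activity runs a query such as SELECT MAX(Id) AS MaxId against the same source before the Copy activity, and its result is then referenced in the sink SP parameter via dynamic content (e.g. @activity('LookupMaxId').output.firstRow.MaxId, where the activity name is hypothetical). A rough Python simulation of the flow, with mocked-up data:

```python
# Hypothetical source rows; in ADF the Lookup activity would instead run
# "SELECT MAX(Id) AS MaxId FROM <source table>" against the source.
source_rows = [{"Id": 3}, {"Id": 17}, {"Id": 9}]

# Step 1 (Lookup): compute the max before the copy runs.
max_id = max(row["Id"] for row in source_rows)

# Step 2 (Copy sink): pass it alongside the table-type parameter
# via dynamic content on the stored-procedure sink.
sp_parameters = {"MaxId": max_id}
print(sp_parameters)  # {'MaxId': 17}
```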
I have a csv file that has a timestamp column but it shows up as a String type since I uploaded it locally to the project in Watson Studio. Can Data Refinery convert that string column into actual Timestamp type format?
You can use the Convert Type operation and select the timestamp format that matches your data. It's not limited to a single timestamp format; there are different formats you can choose from. For example:
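Conceptually, the Convert Type operation parses each string with the format you selected, producing real timestamp values rather than text. A small Python sketch of that idea, assuming a hypothetical "YYYY-MM-DD HH:MM:SS" column (the sample values are mine, not from the question):

```python
from datetime import datetime

# Hypothetical string column as uploaded to the project.
raw_column = ["2023-01-15 08:30:00", "2023-02-01 17:45:10"]

# What a type conversion does conceptually: parse each string with the
# chosen format, yielding timestamp objects instead of strings.
converted = [datetime.strptime(v, "%Y-%m-%d %H:%M:%S") for v in raw_column]
print(converted[0].year)  # 2023
```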