Dataflow SQL - unsupported column type NUMERIC - google-cloud-dataflow

I'm trying to set up a Dataflow SQL job to run a query in BigQuery and publish the results to a Pub/Sub topic. I'm not using a Dataflow template; I'm using GCP's Dataflow SQL UI to write the query and configure the output, i.e. a Pub/Sub topic.
The table I'm querying contains String, Date, Timestamp, and Numeric types.
Even if I don't select the column with the NUMERIC data type, I still get a validation error in the editor: unsupported column type NUMERIC.
Is there a way to get around this in Dataflow SQL? Or can the source table simply not have columns of NUMERIC type?

The numeric types supported by Dataflow SQL are INT64 and FLOAT64 (8 bytes); NUMERIC (16 bytes) is not supported.
I reproduced your issue on my end and it certainly looks like the table cannot be loaded in the first place, even if you are not selecting the NUMERIC column.
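A possible workaround, sketched below, is to materialize a copy of the source table in which the NUMERIC column is cast to a supported type, and point the Dataflow SQL query at that copy. The dataset, table, and column names are placeholders, so adapt them to your schema:
-- Standard SQL in BigQuery: create a Dataflow-SQL-friendly copy of the source table.
-- NUMERIC is cast to FLOAT64, which can lose precision for very large or very precise values.
CREATE OR REPLACE TABLE mydataset.source_table_for_dataflow AS
SELECT
  string_col,
  date_col,
  timestamp_col,
  CAST(numeric_col AS FLOAT64) AS numeric_col_float
FROM mydataset.source_table;
The Dataflow SQL query can then read from the copy without tripping the NUMERIC validation error.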

Related

Synapse Serverless - Other than STRING_AGG, FOR XML to convert rows to columns

I'm using Synapse Serverless and I want to convert rows to columns. I used STRING_AGG but, due to the nvarchar(8000) limitation, I was getting the error "STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation". Because of that I tried to recreate the query with FOR XML PATH and STUFF, but Serverless won't support it. Is there any workaround?
The error STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation has a workaround. STRING_AGG is limited to 8000 bytes by default, but you can lift that limit to nvarchar(max) or varchar(max) by applying CONVERT to the input expression inside STRING_AGG.
Refer to the following link for how to do this conversion and for more information about STRING_AGG with CONVERT.
https://www.mssqltips.com/sqlservertutorial/9371/sql-string-agg-function/
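As a minimal sketch of the CONVERT approach (the table and column names are made up):
-- Converting the input to nvarchar(max) lets the aggregated result exceed 8000 bytes.
SELECT STRING_AGG(CONVERT(nvarchar(max), t.ColumnValue), ', ') AS AggregatedValues
FROM dbo.SomeTable AS t;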
There is a relational operator called PIVOT which conventionally helps to transform row data into columns (the UNPIVOT operator is also available and does the exact opposite of what PIVOT does). The following is the general syntax of PIVOT:
SELECT (ColumnNames)
FROM (TableName)
PIVOT
(
    AggregateFunction(ColumnToBeAggregated)
    FOR PivotColumn IN (PivotColumnValues)
) AS (Alias)
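A concrete sketch of that syntax, using a hypothetical table of per-subject scores:
-- Pivot per-subject scores from rows into one column per subject.
SELECT StudentName, [Math], [Science], [History]
FROM (
    SELECT StudentName, Subject, Score
    FROM dbo.StudentScores
) AS src
PIVOT (
    MAX(Score)
    FOR Subject IN ([Math], [Science], [History])
) AS pvt;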
Refer to the following link to understand PIVOT completely, and check the second link to see whether any of the methods provided there helps you achieve the requirement:
https://www.appsloveworld.com/sql-server-simple-way-to-transpose-rows-into-columns/
Efficiently convert rows to columns in sql server

How to upload a tab-delimited text file to BigQuery when a string field for a column receives a parse error?

I have a ~1 GB text file with 153 separate fields. I uploaded the file to GCS and then created a new table in BQ with the file format set to "CSV". For the table type, I selected "native table". For the schema, I chose auto-detect. For the field delimiter, I selected "tab". Upon running the job, I received the following error:
Could not parse '15229-1910' as INT64 for field int64_field_19 (position 19) starting at location 318092352 with message 'Unable to parse'
The error originates from a "zip code plus 4" field. My question is whether there is a way to prevent this value from being parsed as an integer, or a way to omit these parse errors altogether so that the job can complete. GCP's documentation advises: "If BigQuery doesn't recognize the format, it loads the column as a string data type. In that case, you might need to preprocess the source data before loading it". The "zip code plus four" field in my file is already assigned a string field type, so I'm not quite sure where to go from here. Since I selected "tab" as the delimiter, does that indicate that the "zip code plus four" value contains a tab character?
BigQuery uses schema auto-detection to detect the schema of a table while loading data into BigQuery. With the sample data you provided, the ZIP+4 value will be treated as a string by BigQuery because of the dash "-" between the integer values. If you want to provide the schema yourself, you can avoid auto-detect and give the schema manually.
As stated in the comment, you can upload your 1 GB text file into BigQuery by following these steps:
Assuming, as you mention in the question, that your data is in a CSV-like format, I mocked up the sample data in an Excel sheet.
[Screenshot: Excel sheet]
Save the file in .tsv format.
Upload the file into BigQuery using schema auto-detect and with tab set as the delimiter. It will automatically detect all the field types without any error, as can be seen in the BigQuery table in the screenshot.
[Screenshot: BigQuery table]
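If you would rather declare the schema explicitly than rely on auto-detect, something along these lines should work; the dataset, table, bucket, and column names below are placeholders, and only two of the 153 fields are shown:
-- BigQuery LOAD DATA statement: explicit schema, tab delimiter, header row skipped.
LOAD DATA INTO mydataset.mytable (
  some_int_field INT64,
  zip_plus_4 STRING  -- keeps values such as '15229-1910' from being parsed as INT64
)
FROM FILES (
  format = 'CSV',
  field_delimiter = '\t',
  skip_leading_rows = 1,
  uris = ['gs://my-bucket/myfile.tsv']
);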

influxdb query group by value

I am new to InfluxDB and the TICK environment, so maybe this is a basic question, but I have not found how to do this. I have installed InfluxDB 1.7.2, with Telegraf listening to an MQTT server that receives JSON data generated by different devices. I have Chronograf to visualize the data that is being received.
The JSON data is a very simple message indicating the generating device as a string and some numeric values detected. I have created some graphs indicating the number of messages received in 5-minute intervals by one of the devices.
SELECT count("devid") AS "Device" FROM "telegraf"."autogen"."mqtt_consumer" WHERE time > :dashboardTime: AND "devid"='D9BB' GROUP BY time(5m) FILL(null)
As you can see, in this query I am setting the device id by hand. I can set this query alone in a graph or combine multiple similar queries for different devices, but I am limited to previously identifying the devices to be controlled.
Is it possible to obtain the results grouped by the values contained in devid? In SQL this would mean including something like GROUP BY "devid", but I have not been able to make it work.
Any ideas?
You can use GROUP BY "devid" if devid is a tag in the measurement schema. If devid is the only tag, the number of unique values of the devid tag is the number of time series in the "telegraf"."autogen"."mqtt_consumer" measurement. Typically it is not necessary to use the same value both as a tag and as a field. You can think of the set of tags in a measurement as a compound unique index (key) in a conventional SQL database.
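As a sketch, assuming devid is a tag and that the readings are stored in a numeric field (here called "value", a placeholder for your actual field name), the query could look like:
SELECT count("value") AS "messages" FROM "telegraf"."autogen"."mqtt_consumer" WHERE time > :dashboardTime: GROUP BY time(5m), "devid" FILL(null)
Grouping by the tag returns one series per device, so you no longer need to hard-code each device id in a separate query.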

Split a KV<K,V> PCollection into multiple PCollections

Hi, after performing a group-by-key on a KV PCollection, I need to:
1) Make every element in that PCollection a separate individual PCollection.
2) Insert the records in those individual PCollections into a BigQuery Table.
Basically my intention is to create a dynamic date partition in the BigQuery table.
How can I do this?
An example would really help.
For Google Dataflow to be able to perform the massive parallelisation which makes it one of its kind (as a service on the public cloud), the job flow needs to be predefined before it is submitted to Google Cloud. Every time you execute the jar file that contains your pipeline code (which includes the pipeline options and the transforms), a JSON file with the description of the job is created and submitted to Google Cloud Platform. The managed service then uses this to execute your job.
The use case mentioned in the question demands that the input PCollection be split into as many PCollections as there are unique dates. For the split, the tuple tags needed to split the collection would have to be created dynamically, which is not possible at this time. Creating tuple tags dynamically is not allowed because it doesn't allow the job description JSON file to be created up front, and it defeats the whole design/purpose with which Dataflow was built.
I can think of a couple of solutions to this problem (each with its own pros and cons):
Solution 1 (a workaround for the exact use case in the question):
Write a Dataflow transform that takes the input PCollection and, for each element in the input:
1. Checks the date of the element.
2. Appends the date to a pre-defined BigQuery table name as a decorator (in the format yyyyMMdd).
3. Makes an HTTP request to the BQ API to insert the row into the table identified by the decorated table name.
You will have to take the cost perspective into consideration with this approach, because there is a single HTTP request for every element, rather than the single BQ load job that the BigQueryIO Dataflow SDK module would have performed.
Solution 2 (best practice that should be followed in this type of use case):
1. Run the dataflow pipeline in the streaming mode instead of batch mode.
2. Define a time window with whatever duration is suitable to the scenario in which it is being used.
3. For the `PCollection` in each window, write it to a BQ table with the decorator being the date of the time window itself.
You will have to consider rearchitecting your data source to send data to Dataflow in real time, but you will have a dynamically date-partitioned BigQuery table with the results of your data processing available in near real time.
References -
Google Big Query Table Decorators
Google Big Query Table insert using HTTP POST request
How job description files work
Note: Please mention in the comments and I will elaborate the answer with code snippets if needed.

Crystal Reports parameterized queries

The company I work for is using MacolaES for an ERP system. The SQL Server database is structured such that when things are no longer considered active, they are moved from a set of "active" tables to a set of "history" tables. This helps to keep the "active" tables small enough that queries return quickly. On the flip side, the history tables are enormous. The appropriate columns are indexed in these tables, and if you query for something specific it returns quickly.
The problem is when you make a Crystal Report that prompts the user for a parameter. For reasons not known to me, Crystal parameters are not translated into SQL parameters, so you end up with queries selecting everything from the order header history table inner joined to everything in the order lines history table, which results in over 8 million rows.
Is there a way to get Crystal Reports to use the parameters in the SQL query instead of loading all the records and filtering after the fact? I read somewhere that a stored procedure should work, but I'm curious if an ordinary parameterized query is possible in the interest of saving my time.
Here is the selection formula:
(
trim({Orderheader.ord_no}) = {?Order No}
)
and
(
{Orderheader.ord_type} = 'O'
)
and
(
{orderlines.ord_type} = 'O'
)
In the Crystal Reports top menu, go to Report / Selection Formulas / Record... There you can add a formula similar to:
{table.field1} = {?Parameter1} and {table.field2} = {?Parameter2}
That will add the condition to the WHERE clause of the SQL query that the report uses to pull the rows.
To verify the condition in the WHERE clause that the report is using to pull the data, go to the menu Database / Show SQL Statement. That way you can confirm that the report is using the parameters in the filter.
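For instance, when the selection is pushed down, the statement shown there might look roughly like the following; the join condition, column list, and quoting are illustrative and vary by database driver:
-- Hypothetical output of Database / Show SQL Statement with the filter pushed down.
SELECT "Orderheader"."ord_no", "Orderheader"."ord_type", "orderlines"."ord_type"
FROM "Orderheader" INNER JOIN "orderlines"
  ON "Orderheader"."ord_no" = "orderlines"."ord_no"
WHERE "Orderheader"."ord_type" = 'O'
  AND "orderlines"."ord_type" = 'O'
  AND "Orderheader"."ord_no" = '12345'
If the WHERE clause contains only the join and constant conditions but not the order number, the parameter is still being applied client-side.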
The Crystal Reports 8.5 User Guide mentions the following tips:
To push down record selection, you must select "Use Indexes or Server for Speed" in the Report Options dialog box (available on the File menu).
In record selection formulas, avoid data type conversions on fields that are not parameter fields. For example, avoid using ToText() to convert a numeric database field to a string database field.
You are able to push down some record selection formulas that use constant expressions.
Your formula applies a TRIM function to a field. A function applied to the field prevents Crystal from pushing the formula down to the database because it is not a constant expression.
If you really need to trim the order number field, you should do it using a SQL Expression field.
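For example (the SQL Expression name is made up, and the exact SQL syntax depends on your database), you could create a SQL Expression field {%Trimmed Ord No} containing:
RTRIM("Orderheader"."ord_no")
and reference it in the record selection formula in place of the TRIM call:
{%Trimmed Ord No} = {?Order No} and
{Orderheader.ord_type} = 'O' and
{orderlines.ord_type} = 'O'
Because the trimming now happens on the server, Crystal can push the whole selection down into the SQL WHERE clause.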
References:
Check out this article.
