Dataflow GroupByKey transform splits input rows

I run a Dataflow job that reads data from files stored in GCS. Each record has an "event type", and my goal is to split the data by event type and write each output to a BigQuery table. Right now I'm using a filter to do this, but I'd like to try the GroupByKey transform, which will hopefully make the process dynamic, since new event types will flow in over time and can't be predicted at development time. So my challenge is: is it possible to construct a WRITE transform for each KEY (the key from the GroupByKey output)? That would be ideal if it's doable; otherwise, any other way to achieve this, or any advice, would be appreciated.

You don't need to write a transform for each value of event type; you just need to write a transform that can handle all values for event type.
A GroupByKey will produce a PCollection<KV<EventType, Iterable<ValueType>>>. So each element of this PCollection is a key/value pair: the key is an EventType and the value is an iterable of all the values with that key. You can then apply a transform that converts each of these pairs into a TableRow representing the row you want to create in BigQuery. You can do this by defining a
DoFn<KV<EventType, Iterable<ValueType>>, TableRow>
and applying it with ParDo. For example, if your EventType is a string and your ValueType is a string, then you might emit a row with two columns for each key/value pair. The first column might just be a string corresponding to the EventType, and the second column could be a comma-separated list of the values.
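The grouping and per-key conversion described above can be sketched in plain Java, without the Dataflow/Beam SDK, so the logic is easy to check locally. This is a minimal sketch, assuming both EventType and ValueType are String. The groupByKey helper only stands in for what the GroupByKey transform does across workers, toRow stands in for the body of the DoFn, and a Map<String, String> with hypothetical column names event_type and event_values replaces BigQuery's TableRow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EventRows {
    // Local stand-in for GroupByKey: collects every value under its key.
    // In a real pipeline the runner performs this grouping, distributed.
    static Map<String, List<String>> groupByKey(List<String[]> records) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] kv : records) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Stand-in for the DoFn body: one grouped pair in, one two-column row out.
    // Column names are hypothetical; a Map replaces TableRow in this sketch.
    static Map<String, String> toRow(String eventType, Iterable<String> values) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("event_type", eventType);
        row.put("event_values", String.join(",", values));
        return row;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[] {"click", "a"},
            new String[] {"view", "b"},
            new String[] {"click", "c"});
        for (Map.Entry<String, List<String>> e : groupByKey(records).entrySet()) {
            System.out.println(toRow(e.getKey(), e.getValue()));
        }
    }
}
```

Because the event type is just data here, a previously unseen event type simply becomes a new group; no code change is needed for it.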

Related

Dataflow SQL - unsupported column type NUMERIC

I'm trying to set up a Dataflow SQL job to run a query in BigQuery and publish the results to a Pub/Sub topic. I'm not using a Dataflow template; I'm using GCP's Dataflow SQL UI to write a query and configure the output (i.e., a Pub/Sub topic).
The table I'm querying contains String, Date, Timestamp, and Numeric types.
Even if I don't select the column with the NUMERIC data type, I still get a validation error in the editor: unsupported column type NUMERIC.
Is there a way to get around this in Dataflow SQL? Or can the source table simply not have columns of NUMERIC type?
The numeric types supported in Dataflow SQL are INT64 and FLOAT64 (8 bytes each), but not NUMERIC (16 bytes).
I reproduced your issue on my end, and it certainly looks like the table cannot be loaded at all, even if you don't select the NUMERIC column.

How to display geometry datafield as Text

I'm using Delphi with ADO and SQL Server 2014.
Our database table has a spatial column for geometry data. We can read and write data in this field (more info here: https://learn.microsoft.com/de-de/sql/relational-databases/spatial/spatial-data-sql-server).
If I display this table using a TDBGrid component, only (BLOB) is shown for the content of this column.
Now I want to see the content of this column. Is there a good way to show the content of this column as text, e.g. in a DBMemo?
The only solution I know is to read the field as text into a string and put it into a normal memo. I'm looking for a more efficient way to access this data.
You can query for the data in Well-Known Text (WKT) format using a SQL function such as STAsText:
SELECT MyColumn.STAsText() FROM MyTable
An alternative would be to fetch your data as a Well-Known Binary (WKB) stream and parse it on the client side to render it as text yourself (the format is documented). To fetch such a stream, you would use the STAsBinary function:
SELECT MyColumn.STAsBinary() FROM MyTable
Yet another option would be to fetch the raw geometry data as it is stored in the database (as you do right now) and parse it yourself. That format is described in the [MS-SSCLRT] document. But if I were you, I would rather write a parser for the WKB format and fetch the data as WKB, because it is a well-established, universal format, whereas SQL Server's internal formats may change frequently.
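To illustrate the WKB route, here is a minimal Java sketch (not Delphi, so it would need porting) that decodes only the simplest case, a 2D Point, from the OGC WKB layout: one byte-order flag, a 4-byte geometry type, then X and Y as 8-byte doubles. The class and method names are hypothetical, and a real parser would also need LineString, Polygon, the multi-geometries, and the Z/M variants. Numbers are printed with Java's default double formatting, so the text will not match SQL Server's STAsText output character for character.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class WkbPoint {
    // Converts an OGC Well-Known Binary 2D Point into Well-Known Text.
    // Only geometry type 1 (Point) is handled; anything else is rejected.
    static String pointToWkt(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // Byte 0: 0 = big-endian (XDR), 1 = little-endian (NDR).
        byte order = buf.get();
        if (order != 0 && order != 1) {
            throw new IllegalArgumentException("bad byte-order flag: " + order);
        }
        // The flag governs how every following multi-byte value is read.
        buf.order(order == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getInt();
        if (type != 1) {
            throw new IllegalArgumentException("not a 2D Point, type " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return "POINT (" + x + " " + y + ")";
    }

    public static void main(String[] args) {
        // Build a little-endian WKB Point by hand: 1 + 4 + 8 + 8 = 21 bytes.
        ByteBuffer b = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 1).putInt(1).putDouble(1.5).putDouble(-2.0);
        System.out.println(pointToWkt(b.array())); // POINT (1.5 -2.0)
    }
}
```

Reading the byte-order flag first and switching the buffer's order is what makes the same code handle both big-endian and little-endian streams.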
If your geometry includes Z and/or M values, it is better to call the .ToString() method:
SELECT MyColumn.ToString() FROM MyTable
The output includes Z and M values in addition to the X,Y coordinates. The .STAsText() method returns only the X,Y coordinates of your shape.

Ant Design: How to transform field value?

I noticed that in an Ant Design Form, it's possible to set 'transform' for a specific validation rule, which transforms the field value before validation. But I'm wondering whether there's a way to apply a transform to a field value when getting it back from getFieldValue or validateFields.
Here is an actual use case. I have a field that accepts multiple email addresses separated by commas, e.g. "foo#example.com, bar#example.com". I've written a transform function that turns the string value of the field into ["foo#example.com", "bar#example.com"]. I'd like the field value to be the transformed email array instead of the raw string when calling getFieldValue. And in the values passed to the validateFields callback, I'd like the field value to be the transformed array as well.
Is there an easy way to do that?

Simple way to analyze data based on common key

What would be the simplest way to process all the records that were mapped to a specific key and output multiple records for that data?
For example (a synthetic example), suppose my key is a date and the values are intra-day timestamps with measured temperatures. I'd like to classify the temperatures as high/average/low within the day (e.g., above/below 1 standard deviation from the average).
The output would be the original temperatures with their new classifications.
Using Combine.perKey(CombineFn) allows only one output per key, via the CombineFn#extractOutput() method.
Thanks
CombineFns are restricted to a single output value because that allows the system to do additional parallelization: combining different subsets of the values separately, and then combining their intermediate results in an arbitrary tree reduction pattern, until a single result value is produced for each key.
If your values per key don't fit in memory (so you can't use the GroupByKey-ParDo pattern that Jeremy suggests) but the computed statistics do fit in memory, you could also do something like this:
(1) Use Combine.perKey() to calculate the stats per day
(2) Use View.asIterable() to convert those into PCollectionViews.
(3) Reprocess the original input with a ParDo that takes the statistics as side inputs
(4) In that ParDo's DoFn, have startBundle() take the side inputs and build up an in-memory data structure mapping days to statistics that can be used to do lookups in processElement.
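Steps (1) to (4) can be sketched in plain Java, outside the Dataflow SDK, to show why the statistics suit a Combine while the per-reading classification does not. This is a minimal sketch under stated assumptions: days are Integer keys and readings are doubles, DayStats is a hypothetical accumulator standing in for a CombineFn's accumulator, and the HashMap stands in for the day-to-statistics structure built from the side input in step (4). It uses the population standard deviation and the simple sum-of-squares formula, which can lose precision for large values.

```java
import java.util.HashMap;
import java.util.Map;

class DayStats {
    // Accumulator: count, sum, and sum of squares. Merging two of these is
    // plain addition, so partial results can be combined in any tree shape.
    long count;
    double sum;
    double sumSq;

    // Corresponds to adding one input to an accumulator (step 1).
    void add(double t) {
        count++;
        sum += t;
        sumSq += t * t;
    }

    // Corresponds to merging accumulators computed on different workers.
    static DayStats merge(DayStats a, DayStats b) {
        DayStats m = new DayStats();
        m.count = a.count + b.count;
        m.sum = a.sum + b.sum;
        m.sumSq = a.sumSq + b.sumSq;
        return m;
    }

    double mean() {
        return sum / count;
    }

    // Population standard deviation derived from the running sums.
    double stddev() {
        double m = mean();
        return Math.sqrt(Math.max(0.0, sumSq / count - m * m));
    }

    // Per-element work of the second ParDo (steps 3 and 4): look up the
    // day's statistics and classify one reading; one output per reading.
    static String classify(Map<Integer, DayStats> statsByDay, int day, double t) {
        DayStats s = statsByDay.get(day);
        if (t > s.mean() + s.stddev()) return "high";
        if (t < s.mean() - s.stddev()) return "low";
        return "average";
    }

    public static void main(String[] args) {
        // Two partial accumulators for the same day, as if from two workers.
        DayStats part1 = new DayStats();
        part1.add(10);
        part1.add(20);
        DayStats part2 = new DayStats();
        part2.add(30);
        Map<Integer, DayStats> byDay = new HashMap<>();
        byDay.put(1, merge(part1, part2));
        for (double t : new double[] {10, 20, 30}) {
            System.out.println(t + " -> " + classify(byDay, 1, t));
        }
    }
}
```

Only the small DayStats objects need to fit in memory here; the readings themselves are streamed through classify one at a time.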
Why not use a GroupByKey operation followed by a ParDo? The GroupByKey would group all the values with a given key. Applying a ParDo then allows you to process all the values with a given key, and a ParDo can output multiple values for a given key.
In your temperature example, the output of the GroupByKey would be a PCollection of KV<Integer, Iterable<Float>> (I'm assuming you use an Integer to represent the day and a Float for the temperature). You could then apply a ParDo to process each of these KVs. For each KV, you could iterate over the Floats representing the temperatures and compute the high/average/low thresholds. You could then classify each temperature reading using those stats and output a record representing the classification. This assumes the number of measurements for each day is small enough to fit easily in memory.
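The GroupByKey-then-ParDo pattern for the temperature example can be sketched as a plain Java function playing the role of the DoFn body. This is a minimal sketch, assuming Integer days and Float readings as in the answer above; the method name and the comma-separated output format are hypothetical, and it uses the population standard deviation.

```java
import java.util.ArrayList;
import java.util.List;

class ClassifyDay {
    // Body of a hypothetical DoFn applied after GroupByKey: receives one
    // day's readings and emits one classified record per reading, i.e.
    // many outputs per key, which a single extractOutput() result cannot give.
    static List<String> classifyReadings(int day, Iterable<Float> readings) {
        // Copy the readings into a list so they can be read twice; this is
        // where the "fits in memory" assumption comes in.
        List<Float> temps = new ArrayList<>();
        double sum = 0;
        for (Float t : readings) {
            temps.add(t);
            sum += t;
        }
        double mean = sum / temps.size();

        // Second pass: population standard deviation around the mean.
        double sq = 0;
        for (float t : temps) {
            sq += (t - mean) * (t - mean);
        }
        double sd = Math.sqrt(sq / temps.size());

        // Third pass: classify each reading against mean +/- 1 stddev.
        List<String> out = new ArrayList<>();
        for (float t : temps) {
            String label = t > mean + sd ? "high" : t < mean - sd ? "low" : "average";
            out.add(day + "," + t + "," + label);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : classifyReadings(1, List.of(10f, 20f, 30f))) {
            System.out.println(line);
        }
    }
}
```

Compared with the Combine plus side-input approach, this version needs all of a day's readings in memory at once, but it needs only one pass over the input PCollection.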

Execute a stored procedure inside textbox to get a localized text

Is it possible to execute a stored procedure inside a textbox? We need this to localize our report.
For example, we have a stored procedure which returns the localized text for a given Key and a given LanguageId. I want to execute this stored procedure for every Label (Textbox) with a different key inside my report.
We are using SSRS 2008.
I think you've got things a little mixed up; you can't "execute a sproc inside a textbox".
What you can do instead is create a dataset that gets all the required Key/Value pairs for your current language, something like this:
EXEC usp_GetReportLabels 'en-US'
/* Returns:
Key Val
--------- ------------
lbl1 Firstname
lbl2 Surname
etc etc
*/
In your textboxes, you can use an expression that calls the Lookup function to retrieve the correct row from that dataset and display the label value.
Note: You mention ssrs-2008 but not the ssrs-2008-r2 edition; I don't think the Lookup function is available in plain 2008. In that case, you'll need to restructure your dataset(s) a bit to get the same effect. One solution would be to PIVOT the dataset and turn the Keys into columns (the dataset will then contain only one row, so you can use First(Fields!lbl1.Value)). It's a bit of a workaround, though.
