KSQL STRUCT Fields For TIMESTAMP

Is there a way to make this work?
JSON data:
"Header": {
"StoreID": 10225,
"BusinessDate": "2019-05-03",
"PeriodBusinessDate": "2019-05-03",
"ProcessMode": "Partial"
}
I tried this, but it gives me:
No column with the provided timestamp column name in the WITH clause, HEADER->BUSINESSDATE, exists in the defined schema.
CREATE STREAM test2 (HEADER STRUCT<StoreID int,BusinessDate VARCHAR>) WITH (KAFKA_TOPIC='hermes__output__tfrema__v1',VALUE_FORMAT='JSON',
timestamp='HEADER->BusinessDate',timestamp_format='yyyy-MMM-dd');

You can't use nested fields in the TIMESTAMP parameter. You need to extract the field into a top-level column first and then use that column. For example:
CREATE STREAM X (COL1 INT, COL2 VARCHAR, HEADER STRUCT<StoreID int,BusinessDate VARCHAR>)
WITH (KAFKA_TOPIC='hermes__output__tfrema__v1',VALUE_FORMAT='JSON');
CREATE STREAM Y AS
SELECT COL1, COL2, HEADER->BusinessDate AS BusinessDate, HEADER
FROM X;
CREATE STREAM Z (COL1 INT, COL2 VARCHAR, BusinessDate VARCHAR, HEADER STRUCT<StoreID int,BusinessDate VARCHAR>)
WITH (KAFKA_TOPIC='Y',VALUE_FORMAT='JSON',timestamp='BusinessDate',timestamp_format='yyyy-MM-dd');
Note that a value such as 2019-05-03 needs the format yyyy-MM-dd; MMM expects an abbreviated month name such as May.
If you write the intermediate stream as Avro, you can simplify things because the schema doesn't need restating (it is read from the Schema Registry):
CREATE STREAM X (COL1 INT, COL2 VARCHAR, HEADER STRUCT<StoreID int,BusinessDate VARCHAR>)
WITH (KAFKA_TOPIC='hermes__output__tfrema__v1',VALUE_FORMAT='JSON');
CREATE STREAM Y WITH (VALUE_FORMAT='AVRO')
AS SELECT COL1, COL2, HEADER->BusinessDate AS BusinessDate, HEADER FROM X;
CREATE STREAM Z
WITH (KAFKA_TOPIC='Y',VALUE_FORMAT='AVRO',timestamp='BusinessDate',timestamp_format='yyyy-MM-dd');
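To see what the extract-then-use steps amount to outside KSQL, here is a minimal Python sketch. It is only an illustration, not how ksqlDB works internally: the helper names are hypothetical, UTC is an assumed time zone, and Python's %Y-%m-%d stands in for the Java-style pattern yyyy-MM-dd.

```python
from datetime import datetime, timezone

# Sample record shaped like the JSON in the question.
record = {
    "Header": {
        "StoreID": 10225,
        "BusinessDate": "2019-05-03",
        "PeriodBusinessDate": "2019-05-03",
        "ProcessMode": "Partial",
    }
}


def flatten_business_date(rec):
    """Copy the nested Header.BusinessDate into a top-level field (hypothetical helper)."""
    flat = dict(rec)
    flat["BusinessDate"] = rec["Header"]["BusinessDate"]
    return flat


def to_epoch_millis(date_text, fmt="%Y-%m-%d"):
    """Parse a date string and return epoch milliseconds, assuming UTC."""
    parsed = datetime.strptime(date_text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


flat = flatten_business_date(record)
event_time_ms = to_epoch_millis(flat["BusinessDate"])
print(flat["BusinessDate"], event_time_ms)
```

The point of the sketch is the order of operations: the nested value first becomes a top-level field, and only then is it parsed as the event time.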

Related

Using Arithmetic Operators for null values in GSheets Query Function

I'm looking to compute a total in the pivoted result of a QUERY function. However, where Col2 or Col3 is null after the pivot, the total is also null. How can I substitute zero for null values in the QUERY function so that the arithmetic operators produce the correct result in the pivoted output?
=Query(QUERY(sampledata,"select D, COUNT(C) where A = 'Supplied' AND M = 'Recommended' group by D pivot B order by D"),"Select Col1,Col2,Col3,Col3+Col2,(Col2/(Col3+Col2)) label Col3+Col2 'Total', (Col2/(Col3+Col2)) 'Rate' format Col1 'dd-mmm-yyyy', Col2 '#,##0', Col3 '#,##0', Col3+Col2 '#,##0', (Col2/(Col3+Col2)) '#,##0.0%'")
I tried normal SQL functions such as ISNULL and COALESCE:
COALESCE(Col3, 0)
ISNULL(Col2, 0)
However, these don't work in Google Sheets.

Modify your formula like this:
=ArrayFormula(Query(LAMBDA(q, IF(q="",q*1,q))(QUERY(Sheet1!A1:M,"select D, COUNT(C) where A = 'Arrived' AND M = 'Yes' group by D pivot B order by D")),"Select Col1,Col2,Col3,Col3+Col2,(Col2/(Col3+Col2)) label Col3+Col2 'Total', (Col2/(Col3+Col2)) 'Rate' format Col1 'dd-mmm-yyyy', Col2 '#,##0', Col3 '#,##0', Col3+Col2 '#,##0', (Col2/(Col3+Col2)) '#,##0.0%'"))
This replaces the empty (null) cells with 0. Simplified, the idea is:
ArrayFormula(...IF(QueryOutput="",QueryOutput*1,QueryOutput)...
With LAMBDA, it looks like this:
=ArrayFormula(Query(LAMBDA(q, IF(q="",q*1,q))(QueryOutput)...))
Here q is just the name of the LAMBDA parameter.
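The same idea, filling blanks with 0 before doing arithmetic, can be sketched in Python. This is only an illustration of the logic: the function names and the sample rows are hypothetical, and the zero-division guard is an addition that the formula above does not have.

```python
def fill_blanks(rows):
    """Replace empty cells ("" or None) with 0, like IF(q="", q*1, q)."""
    return [[0 if cell is None or cell == "" else cell for cell in row] for row in rows]


def add_total_and_rate(rows):
    """Append Col3+Col2 and Col2/(Col3+Col2) to each [label, col2, col3] row."""
    result = []
    for label, col2, col3 in rows:
        total = col3 + col2
        # Guard against dividing by zero when both counts are 0 (an added assumption).
        rate = col2 / total if total else None
        result.append([label, col2, col3, total, rate])
    return result


# Hypothetical pivot output: a label plus two counts, blank where nothing was counted.
pivoted = [["2023-01-01", 3, ""], ["2023-01-02", "", 5], ["2023-01-03", 2, 2]]
summary = add_total_and_rate(fill_blanks(pivoted))
print(summary)
```

Without the fill step, the first two rows would have no usable total, which is the behaviour described in the question.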

Transform one to many data to columns

Given this data:
How do I transform it to look like this:
The data source has two columns: key (title) and value (responsibility).
I need to transform it so that we have the key column (title) followed by n columns, where n is the highest number of values any key has, e.g. 3 in the picture above. Hence the columns should be:
Title, 1, 2, 3.
The values in columns 1, 2, 3 should correspond to the values in the original data.
Any combination of formulas is welcome. I believe a combination of TRANSPOSE and/or QUERY (pivot) is appropriate, but I cannot put it together.
If this is too complex, we can put an enumeration directly in the data source, but it would be nice to have the formula work without it. E.g.:
Example sheet:
https://docs.google.com/spreadsheets/d/1InYZ12VuuaSg0s3fiFTCx8BnwEan5JsqpsNBF973lWc/edit?usp=sharing

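Try this if the enumeration is already in the data source (title in column A, enumeration in column B, value in column C):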
=QUERY({A:C},
"select Col1,max(Col3)
where Col1 is not null
group by Col1
pivot Col2", 1)
Or, without the enumeration column (the COUNTIFS term generates a running count per title):
=ARRAYFORMULA(QUERY({A:A, COUNTIFS(A:A, A:A, ROW(A:A), "<="&ROW(A:A)), B:B},
"select Col1,max(Col3)
where Col1 is not null
group by Col1
pivot Col2", 1))
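As a rough illustration of what the pivot produces, the same transformation can be sketched in Python. The function name and the sample pairs are hypothetical; the position of each value within its group plays the role of the running count generated by COUNTIFS.

```python
from collections import defaultdict


def pivot_one_to_many(pairs):
    """Turn (title, value) pairs into rows of [title, value1, value2, ...]."""
    grouped = defaultdict(list)
    for title, responsibility in pairs:
        grouped[title].append(responsibility)
    # The widest group decides how many numbered columns are needed.
    width = max((len(values) for values in grouped.values()), default=0)
    header = ["Title"] + [str(i) for i in range(1, width + 1)]
    body = [
        [title] + values + [""] * (width - len(values))
        for title, values in sorted(grouped.items())
    ]
    return [header] + body


# Hypothetical sample data, not taken from the linked sheet.
pairs = [
    ("Manager", "Hire"),
    ("Manager", "Plan"),
    ("Manager", "Report"),
    ("Engineer", "Build"),
    ("Engineer", "Test"),
]
table = pivot_one_to_many(pairs)
for row in table:
    print(row)
```

Titles are sorted here because a grouped QUERY also returns its groups in sorted order; shorter groups are padded with empty cells.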

Google query multiple columns references

I have this query:
=QUERY(all!A:Z, "select B where (B=1 and H=true)")
How can I turn B and H into column references so I don't have to rewrite them when I copy the query to new cells?
Note: column B is a number, meanwhile H contains boolean values.

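Try it like this. Wrapping the range in {} turns it into an array, so columns are addressed by position as Col1, Col2, and so on: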
=QUERY({all!A:Z}, "select Col2 where (Col2=1 and Col8=true)")

Group the data in one column per the values in another column

I have data like the following:
email id            subject of interest
ramesh#axito.com    Java,C++
mnp#axito.com       VB
ramesh#axito.com    Python
mohan#axito.com     Java,C++
mnp#axito.com       JS
rohan#axito.com     C#
But I need it in the following format:
email id            subject of interest
ramesh#axito.com    Java,C++,Python
mnp#axito.com       VB,JS
mohan#axito.com     Java,C++
rohan#axito.com     C#
Can someone please tell me how I can do this?

First, create the list of unique email addresses with =unique(A2:A). Suppose this is done in column C.
Then in cell D2, enter =join(",", filter(B$2:B, A$2:A=C2)) and drag this formula down column D.
Explanation: filter keeps only the entries from column B with matching email; join joins them into a comma-separated list.
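The unique/filter/join combination can be sketched in Python as follows. This is only an illustration of the logic; the function name is hypothetical and the rows are copied from the question.

```python
def join_by_key(rows, sep=","):
    """Group values by key (first-seen order) and join each group with sep."""
    grouped = {}
    for email, subjects in rows:
        # setdefault keeps keys in first-seen order, like UNIQUE does.
        grouped.setdefault(email, []).append(subjects)
    return [(email, sep.join(parts)) for email, parts in grouped.items()]


rows = [
    ("ramesh#axito.com", "Java,C++"),
    ("mnp#axito.com", "VB"),
    ("ramesh#axito.com", "Python"),
    ("mohan#axito.com", "Java,C++"),
    ("mnp#axito.com", "JS"),
    ("rohan#axito.com", "C#"),
]
merged = join_by_key(rows)
for email, subjects in merged:
    print(email, subjects)
```

Each key appears once, in the order it was first seen, with its values joined in their original order.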

Try using the QUERY function:
=QUERY({A:B,A:B},"select Col1, Count(Col2) where Col1 <> '' group by Col1 pivot Col4")
Alternatively, try this single-formula solution:
={UNIQUE(FILTER(A2:A,A2:A>0)),TRANSPOSE(
SPLIT(
", "&join(", ",
ARRAYFORMULA(
if(query(A:B,"select A where not A is null order by A",0)=
query(A:B,"select A where not A is null order by A limit "&COUNT(query(A:B,"select A where not A is null",0))-1,1),"","|")
& query(A:B,"select B where not A is null order by A",0)
& " "
)
)
,", |",0)
)}

How do I find the second MODE in a Google spreadsheet

I have a list of numbers and I am using =MODE to find the number that appears most often. How do I find the second most frequently occurring number in the same list?

In Google Sheets, you can use the QUERY function to retrieve this sort of information (and much more) quite easily. Assuming your data is numerical values only in column A with no header:
=QUERY({A:A,A:A},"select Col1, count(Col2) where Col1 is not null group by Col1 order by count(Col2) desc",0)
will return a list of the items in column A, and their associated frequencies, sorted from highest to lowest. Note: if column A contains text strings, you need to use where Col1 != '' rather than where Col1 is not null.
Now you can use INDEX to retrieve the exact value you require; so to retrieve the second most frequent value, you need the third value in the first column (as QUERY will populate a header row in the output):
=INDEX(QUERY({A:A,A:A},"select Col1, count(Col2) where Col1 is not null group by Col1 order by count(Col2) desc",0),3,1)
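The count-then-rank approach can be sketched in Python with collections.Counter. The function name and the sample values are hypothetical, and ties are broken by first appearance here, which may differ from the order QUERY returns.

```python
from collections import Counter


def nth_most_common(values, n):
    """Return the n-th most frequent value (1 = the mode), or None if there are too few."""
    ranked = Counter(values).most_common()  # sorted by count, highest first
    return ranked[n - 1][0] if len(ranked) >= n else None


# Hypothetical sample data: 4 appears three times, 1 appears twice.
values = [4, 1, 4, 2, 1, 4, 3]
mode = nth_most_common(values, 1)
second_mode = nth_most_common(values, 2)
print(mode, second_mode)
```

Unlike the QUERY output, the ranked list has no header row, so the second most frequent value sits at position 2 rather than 3.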
