List subjects for all the messages stored in a NATS stream (JetStream)

I want to list the subjects of all the messages stored in a NATS JetStream stream.
Let's say we have 100 messages in the stream under two subjects, S1.S2.S3 and S1.S4. Something like
SELECT DISTINCT Subjects FROM Stream WHERE NumMessages > 0
should yield: S1.S2.S3, S1.S4.
This is for debugging/troubleshooting only.

Related

Does anybody have a KSQL query that counts events in a topic on a per-hour basis?

I am new to KSQL and I am trying to get counts of the events in a topic, grouped per hour. Failing that, I would settle for counting the events in the topic, and then change the query to work on a windowed basis. The timestamp is the ets field in the event payload.
To give more context, let's assume my topic is called messenger and the events are in JSON format. Here is a sample message:
{"name":"Won","message":"This message is from Won","ets":1642703358124}
Partition:0 Offset:69 Timestamp:1642703359427
First create a stream over the topic:
CREATE STREAM my_stream (NAME VARCHAR, MESSAGE VARCHAR)
WITH (KAFKA_TOPIC='my_topic', FORMAT='JSON');
Then use a TUMBLING window aggregation and a dummy GROUP BY field:
SELECT TIMESTAMPTOSTRING(WINDOWSTART,'yyyy-MM-dd HH:mm:ss','Europe/London')
AS WINDOW_START_TS,
COUNT(*) AS RECORD_CT
FROM my_stream
WINDOW TUMBLING (SIZE 1 HOURS)
GROUP BY 1
EMIT CHANGES;
If you want to override the timestamp being picked up from the message timestamp and instead use a custom timestamp field (I can see ets in your sample), you would do that in the stream definition:
CREATE STREAM my_stream (NAME VARCHAR, MESSAGE VARCHAR, ETS BIGINT)
WITH (KAFKA_TOPIC='my_topic', FORMAT='JSON', TIMESTAMP='ets');
Ref: https://rmoff.net/2020/09/08/counting-the-number-of-messages-in-a-kafka-topic/

Stream Analytics: getting average for 1 year from history

I have a Stream Analytics job with
INPUTS:
1) "InputStreamCSV" - linked to Event Hub, receives the incoming data
2) "InputStreamHistory" - input stream linked to Blob Storage
OUTPUTS:
1) "AlertOUT" - linked to Table Storage, inserts alarm events as rows in a table
I want to calculate the AVERAGE amount over all transactions for the year 2018 (one number, e.g. 5.2) and compare it with each transaction coming in during 2019:
If a new transaction's amount is bigger than the average, put that transaction in the "AlertOUT" output.
I am calculating the average as:
SELECT AVG(Amount) AS TresholdAmount
FROM InputStreamHistory
GROUP BY TumblingWindow(minute, 1)
Receiving new transactions as:
SELECT * INTO AlertOUT FROM InputStreamCSV TIMESTAMP BY EventTime
How can I combine these 2 queries to be able to check whether a new transaction's amount is bigger than the average transaction amount for last year?
Use the JOIN operator in ASA SQL; you could refer to the SQL below to combine the 2 queries.
WITH t2 AS
(
    SELECT AVG(Amount) AS TresholdAmount
    FROM jsoninput2
    GROUP BY TumblingWindow(minute, 1)
)
SELECT t2.TresholdAmount
FROM jsoninput t1 TIMESTAMP BY EntryTime
JOIN t2
  ON DATEDIFF(minute, t1, t2) BETWEEN 0 AND 5
WHERE t1.Amount > t2.TresholdAmount
If the history data is stable, you could also join the history data as reference data. Please refer to the official sample.
If you are comparing last year's average with the current stream, it would be better to use reference data. Compute the averages for 2018, using either ASA itself or a different query engine, and write them to a storage blob. After that you can use the blob as reference data in the ASA query - it will replace the average computation in your example.
You can then do a reference data join with InputStreamCSV to produce alerts, as sketched below.
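A minimal sketch of that reference-data join, assuming the blob is registered as a reference input named YearlyAverages holding a single row with TresholdAmount, and that events carry a constant JoinKey column matching one in the blob (ASA joins need an equality predicate; these names are illustrative, not from the original thread):
SELECT s.EventTime, s.Amount
INTO AlertOUT
FROM InputStreamCSV s TIMESTAMP BY EventTime
JOIN YearlyAverages ref       -- reference input: no TIMESTAMP BY or DATEDIFF required
  ON s.JoinKey = ref.JoinKey  -- hypothetical constant key present on both sides
WHERE s.Amount > ref.TresholdAmount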
Even if you would like to update the averages once in a while, the above pattern would still work. Based on the refresh frequency, you can use either another ASA job or a batch analytics solution to regenerate the blob.

How to join Kafka KStream to KStream across 3 topics

I have 3 topics: "BEGIN", "CONTINUE" and "END".
These three topics need to be joined into one topic message where I can get a result model that is a combination of the 3 topic messages.
There are many examples that show how to join 2 topics.
Can anyone give me an example or a hint of how I can join these 3 topics?
Until the cogroup feature gets implemented, you will need to first merge your first 2 topics into an intermediary topic, and then join that one with your 3rd topic.
For an example of how to do that, see the cogroup KIP.
It depends on what kind of join you want to do. As you say you have KStreams, you would do two consecutive windowed joins:
KStream stream1 = builder.stream(...);
KStream stream2 = builder.stream(...);
KStream stream3 = builder.stream(...);
// join the first two streams, then join that result with the third
KStream joined = stream1.join(stream2, ...)
                        .join(stream3, ...);

Create InfluxDB Continuous Query where the measurement name is based on tag values

I have a measurement called reading where all the rows are of the form:
time channel host value
2018-03-05T05:38:41.952057914Z "1" "4176433" 3.46
2018-03-05T05:39:26.113880408Z "0" "5222355" 120.23
2018-03-05T05:39:30.013558256Z "1" "5222355" 5.66
2018-03-05T05:40:13.827140492Z "0" "4176433" 3.45
2018-03-05T05:40:17.868363704Z "1" "4176433" 3.42
where channel and host are tags.
Is there a way I can automatically generate a continuous query such that:
The CQ measurement's name is of the form host_channel
Until now I have been doing them one by one, for example:
CREATE CONTINUOUS QUERY "4176433_1" ON database_name
BEGIN
  SELECT mean(value) INTO "4176433_1"
  FROM reading
  WHERE host = '4176433' AND channel = '1'
  GROUP BY time(1m)
END
but is there a way I can automatically get 1m sampling per host & channel any time a new host is added to the database? Thanks!
There is no way of doing this in InfluxDB, for a number of reasons. Encoding tag values in measurement names contradicts InfluxDB's official best practices and is discouraged.
I suggest just going with:
CREATE CONTINUOUS QUERY reading_aggregator ON database_name
BEGIN
  SELECT mean(value)
  INTO mean_reading
  FROM reading
  GROUP BY time(1m), host, channel
END
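Because GROUP BY time(1m), host, channel preserves host and channel as tags on mean_reading, each host/channel pair keeps its own series, so no host_channel measurement name is needed (note that InfluxQL has no string concatenation operator, so a host + '_' + channel expression would not parse). A hypothetical query against the downsampled data (mean is the field name that mean(value) produces by default):
SELECT mean FROM mean_reading WHERE host = '4176433' AND channel = '1'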

Optimize BigQuery query: Shuffle reached broadcast limit

I'm trying to process this query:
SELECT
r.src,r.dst, ROUND(r.price/50)*50 pb,COUNT(*) results
FROM [search.interesting_routes] ovr
LEFT JOIN [search.search_results2] r ON ovr.src=r.src AND ovr.dst=r.dst
WHERE DATE(r.saved_at) >= '2015-10-01' AND DATE(r.saved_at) <= '2015-10-01' AND r.price < 20000
GROUP BY pb, r.src, r.dst
ORDER BY pb
The table search_results2 contains a huge amount of search results with prices for routes (a route is defined by src and dst).
For each record in interesting_routes I need to count the matching records in search_results2, split into price buckets.
The query works fine on a small sample of data, but once the data is huge it fails with:
Error: Shuffle reached broadcast limit for table __I0 (broadcasted at
least 176120970 bytes). Consider using partitioned joins instead of
broadcast joins.
I am having difficulty rewriting the SELECT to use the suggested partitioned join, or at least getting the result some other way.
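The question is unanswered here, but one direction to try (an assumption, not from the original thread): the bracketed [dataset.table] syntax means this is legacy BigQuery SQL, where the "partitioned join" hint corresponds to the EACH modifier, which shuffles both sides on the join key instead of broadcasting one of them:
SELECT
  r.src, r.dst, ROUND(r.price/50)*50 pb, COUNT(*) results
FROM [search.interesting_routes] ovr
LEFT JOIN EACH [search.search_results2] r ON ovr.src=r.src AND ovr.dst=r.dst
WHERE DATE(r.saved_at) >= '2015-10-01' AND DATE(r.saved_at) <= '2015-10-01' AND r.price < 20000
GROUP EACH BY pb, r.src, r.dst
ORDER BY pb
Migrating the query to standard SQL, where the optimizer chooses the join strategy itself, is the other common fix.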
