Does the KSQL Editor in the UI query the state stores of all of an application's KSQL threads at once, or just one of them? How about the KSQL CLI?
I'm investigating the feasibility of using ksqlDB to filter out empty struct fields in Kafka records in Avro format, then dumping the records to S3 in Parquet format (removing all empty fields from the Avro records is required for the transformation to Parquet).
ksqlDB can handle the filtering part in a very straightforward way. My question is how I can connect ksqlDB to MinIO.
Would using a Kafka connector require ksqlDB to write back to Kafka under a new topic?
I also heard ksqlDB has built-in connectors, but I'm not sure how that works.
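To the write-back question: yes, a persistent ksqlDB query materializes its output as a new Kafka topic, and a Kafka Connect S3 sink then reads that topic. The Confluent S3 sink speaks to any S3-compatible store, so MinIO can be targeted via its endpoint URL. A sketch, where the topic, bucket, field names, and MinIO endpoint are all placeholders, and the connector is defined from ksqlDB itself (which requires running Connect in embedded mode, otherwise submit the same config to a separate Connect cluster):

```sql
-- Source stream over the existing Avro topic (names are placeholders).
CREATE STREAM source_records
  WITH (KAFKA_TOPIC='input-topic', VALUE_FORMAT='AVRO');

-- The filtered stream: ksqlDB writes the result to a new Kafka topic.
-- The WHERE clause is illustrative; the real "empty struct" test
-- depends on your schema.
CREATE STREAM filtered_records
  WITH (KAFKA_TOPIC='filtered-topic', VALUE_FORMAT='AVRO') AS
  SELECT *
  FROM source_records
  WHERE my_struct_field IS NOT NULL
  EMIT CHANGES;

-- S3 sink connector reading the new topic; MinIO is reached via store.url.
CREATE SINK CONNECTOR s3_sink WITH (
  'connector.class' = 'io.confluent.connect.s3.S3SinkConnector',
  'topics'          = 'filtered-topic',
  'store.url'       = 'http://minio:9000',
  's3.bucket.name'  = 'my-bucket',
  'storage.class'   = 'io.confluent.connect.s3.storage.S3Storage',
  'format.class'    = 'io.confluent.connect.s3.format.parquet.ParquetFormat',
  'flush.size'      = '1000'
);
```

Parquet output in the S3 sink requires Avro-with-Schema-Registry input, which is already the case here.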
In my project I have a PostgreSQL database as the main DB, and I need to keep my Neo4j DB synchronized with it. To do so I want to use Debezium for CDC, Kafka, and the Neo4j streams plugin. One of the reasons I prefer Debezium over the JDBC connector is that it's real time. So at this point I want to get
PostgreSQL -> Debezium -> Kafka -> Confluent -> Neo4j
From the documentation I found a Neo4j CDC sink, but it only accepts CDC events from another Neo4j DB.
Sink ingestion strategies
Change Data Capture Event
This method allows to ingest CDC events coming from another Neo4j Instance.
Why only from another Neo4j instance? I am confused, because I don't understand how exactly I should implement Change Data Capture from PostgreSQL to Neo4j.
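The "Change Data Capture Event" strategy is restricted to another Neo4j instance because it expects Neo4j's own CDC event format. For Debezium events coming from PostgreSQL, the plugin's Cypher template strategy is the usual route: it maps each message on a topic to a Cypher statement, with the message payload exposed as `event`. A sketch for neo4j.conf, where the topic name, label, and fields are placeholders (the exact path into `event` depends on whether you apply Debezium's unwrap transform, which flattens the change envelope):

```
kafka.bootstrap.servers=localhost:9092
streams.sink.enabled=true
# "pg-customers" is a placeholder topic; Debezium's default topic names
# follow the <server>.<schema>.<table> pattern.
streams.sink.topic.cypher.pg-customers=MERGE (c:Customer {id: event.id}) SET c.name = event.name
```

With this in place the pipeline becomes PostgreSQL -> Debezium -> Kafka -> Neo4j streams sink, with no Neo4j-to-Neo4j CDC involved.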
I want to insert/update documents in Couchbase, and from there they should be automatically inserted/updated in the Neo4j database. Is there any plugin or software to do this? How can I achieve this functionality?
Couchbase enterprise version: 6.6
Neo4j enterprise version: 4.1.3
I read this blog https://dzone.com/articles/couchbase-amp-jdbc-integrations-for-neo4j-3x but I am not getting clarity on the Neo4j JSON Loader; please guide me on this.
You could also use the Couchbase Eventing Service which will respond to any mutation and trigger a fragment of JavaScript code. Refer to https://docs.couchbase.com/server/current/eventing/eventing-overview.html
Now you would probably want to utilize something similar to the code in this scriptlet example: https://docs.couchbase.com/server/current/eventing/eventing-handler-curl-post.html. Provided that the Neo4j REST API has sub-1 ms response times and honors KeepAlive, a 12-physical-core system could stream about 40K inserts (or updates) per second from Couchbase to your Neo4j instance.
You can use the Couchbase Kafka connector to send CDC events to Kafka.
https://docs.couchbase.com/kafka-connector/current/quickstart.html
From there, you can read the Kafka topics in order to import the data into Neo4j:
https://github.com/neo4j-contrib/neo4j-streams
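The first half of that pipeline can be sketched as a Kafka Connect worker config for the Couchbase source connector. The values follow the quickstart linked above, but property names have changed across connector versions, so treat the exact keys as assumptions to verify against your version's docs:

```
name=couchbase-source
connector.class=com.couchbase.connect.kafka.CouchbaseSourceConnector
tasks.max=2
# Placeholder connection details for the Couchbase cluster.
couchbase.seed.nodes=127.0.0.1
couchbase.bucket=my-bucket
couchbase.username=Administrator
couchbase.password=password
# Publish each mutation's document body as raw JSON.
couchbase.source.handler=com.couchbase.connect.kafka.handler.source.RawJsonSourceHandler
```

The connector streams every mutation in the bucket to Kafka; the neo4j-streams sink then consumes those topics on the Neo4j side.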
Is there a way to specify the number of threads that a KSQL query running on a KSQL server should consume? In other words, the parallelism of the query.
Is there any limit to the number of applications that can run on a KSQL server? When, or how, do I decide to scale out?
Yes, you can set the ksql.streams.num.stream.threads property. You can read more about it here.
Now, this is the number of Kafka Streams threads where stream processing occurs for that particular KSQL instance. It's important for vertical scaling, because you might have enough computational resources on your machine to handle more threads, and therefore you can do more work processing your streams on that specific machine.
If you have the capacity (i.e. CPU cores), then you should have more threads so more stream tasks can be scheduled on that instance, and therefore you gain additional parallelization capacity on your KSQL instance or cluster (if you have more than one instance).
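Concretely, the property goes in the server's properties file. A minimal sketch, assuming a machine with 4 cores you want fully used:

```
# ksql-server.properties
# One Kafka Streams processing thread per available core (4 assumed here).
ksql.streams.num.stream.threads=4
```

Restart the KSQL server after changing it; the setting applies to all persistent queries run by that instance.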
What you must understand with Kafka, Kafka Streams and KSQL is that horizontal scaling occurs with two main concepts:
Kafka Streams applications (such as KSQL) can parallelize work based on the number of Kafka topic partitions. If you have 3 partitions and you launch 4 KSQL instances (i.e. on different servers), then one of them will not be doing any work on a stream you create on top of that topic. If you have the same topic with 3 partitions and only 1 KSQL server, it will be doing all of the work for the 3 partitions.
When you add a new instance of your Kafka Streams application (in your case KSQL) and it joins your cluster processing your KSQL streams and tables, this specific instance will join the consumer groups consuming from those topics and immediately start sharing the load with the other instances, as long as there are available partitions that other instances can offload (triggering a consumer group rebalance). The same happens if you take an instance down: the other instances will pick up the slack and start processing the partition(s) the retired instance was processing.
Compared to vertical scaling (i.e. adding more capacity and threads to a KSQL instance), horizontal scaling achieves the same by adding the same computational resources on a different instance of the application, on a different machine. You can read about the Kafka Streams threading model (with one or more application instances, on one or more machines) here:
I tried to simplify it, but you can read more about it on the KSQL Capacity Planning page and the Confluent Kafka Streams elastic scaling blog post.
The important aspects of the scale-out / scale-in lifecycle of Kafka Streams (and KSQL) applications can be better understood like this:
1. A single instance working on 4 different partitions.
2. Three instances working on 4 different partitions (one of them is working on 2 different partitions).
3. An instance just left the group; now two instances are working on the 4 different partitions, perfectly balanced (2 partitions each).
(Images from confluent blog)
As per my information, Kapacitor can work on streams or batches. In the batch case, it fetches data from InfluxDB and operates on it.
But how does it work with streams? Does it subscribe to InfluxDB or to Telegraf? I hope it subscribes to InfluxDB, so that if any client writes data to InfluxDB, Kapacitor also receives that data. Is this understanding correct? Or does it subscribe directly to Telegraf?
The reason this question matters to us is that we want to use Azure IoT Hub in place of Telegraf. So we will read the data from Azure IoT Hub and write it to InfluxDB, and we hope we can use Kapacitor streams there.
Thanks In Advance
Kapacitor subscribes to InfluxDB. By default, if you do not specify which databases to subscribe to, it subscribes to all of them. In Kapacitor's config file you can list the InfluxDB databases you want to subscribe to.
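A sketch of the relevant section of kapacitor.conf, where the database and retention-policy names are placeholders:

```toml
[[influxdb]]
  enabled = true
  name = "localhost"
  default = true
  urls = ["http://localhost:8086"]

  # Restrict the subscription to specific databases and retention policies;
  # leave this table out to subscribe to all of them.
  [influxdb.subscriptions]
    mydb = ["autogen"]
```

So the Azure IoT Hub plan works: anything written to the subscribed database in InfluxDB, regardless of which client wrote it, is pushed on to Kapacitor's stream tasks.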