Ksqldb High Availability - how to reach it - docker

I'm working with a ksqlDB server deployed in Kubernetes. It crashed some time ago for a reason I have not identified, so I want to implement High Availability as described in https://docs.ksqldb.io/en/latest/operate-and-deploy/high-availability/
We are deploying the server with docker, so the properties that we put inside the config file are:
KSQL_KSQL_STREAMS_NUM_STANDBY_REPLICAS: "2"
KSQL_KSQL_QUERY_PULL_ENABLE_STANDBY_READS: "true"
KSQL_KSQL_HEARTBEAT_ENABLE: "true"
KSQL_KSQL_LAG_REPORTING_ENABLE: "true"
After doing so and restarting the server, I can see that only the first two properties are properly set; I can't see the last two (for example with SHOW PROPERTIES from the ksqlDB CLI).
Do you have any idea why I can't see them?
Do I have to manually deploy a second ksqldb server with the same ksql.service.id?
If this is the case, what is the correct way to do it? Are there particular properties to be set?
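For reference, here is a minimal Python sketch of how the four environment variable names above relate to ksqlDB property names. It assumes the Docker image derives each variable by adding a KSQL_ prefix, upper-casing the property name, and replacing dots with underscores, which is consistent with the names in the question. The helper name to_env_var is hypothetical, and the sketch does not explain why the last two properties are missing from SHOW PROPERTIES.

```python
def to_env_var(prop: str, prefix: str = "KSQL_") -> str:
    """Map a ksqlDB property name to the env-var form used in the question.

    Assumption: prefix + upper-case + dots replaced by underscores.
    """
    return prefix + prop.upper().replace(".", "_")


# The four high-availability properties from the question.
ha_properties = {
    "ksql.streams.num.standby.replicas": "2",
    "ksql.query.pull.enable.standby.reads": "true",
    "ksql.heartbeat.enable": "true",
    "ksql.lag.reporting.enable": "true",
}

# Build the environment block and print it in the same layout as the config.
env = {to_env_var(name): value for name, value in ha_properties.items()}
for key, value in env.items():
    print(f'{key}: "{value}"')
```

Comparing the printed names against the deployed config is a quick way to rule out a spelling mistake in a variable name.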

Related

How to Always have just a single Instance of a Cloud Run container running

I have a NodeJS app hosted on Cloud Run.
I have set that just 1 and only 1 instance of the service should be running at any given point in time.
However, whenever I make a code change and deploy the new revision, the previous revision keeps running for a while before it stops.
How can I make sure that multiple instances never run, even while I am deploying new code changes? The existing running instance should stop as soon as I deploy new changes.
Multiple instances are causing duplicate items to be produced by my code and business logic.
Thank you.
Make sure that
Minimum number of instances: 0
Maximum number of instances: 1
'Serve this revision immediately' checkbox is selected.
Based on that, 100% of the traffic will be migrated to the revision, overriding all existing traffic splits, if any.

How to use a scheduler(cron) container to execute commands in other containers

I've spent a fair amount of time researching and I've not found a solution to my problem that I'm comfortable with. My app is working in a dockerized environment:
one container for the database;
one or more containers for the APP itself. Each container holds a specific version of the APP.
It's a multi-tenant application, so each client (or tenant) may be related to only one version at a time (migration should be handled per client, but that's not relevant).
The problem is I would like to have another container to handle scheduling jobs, like sending e-mails, processing some data, etc. The scheduler would then execute commands in the app's containers. Projects like Ofelia offer great promise, but I would have to know ahead of time which container to execute the command in. That's not possible because I need to go to the database container to discover which version the client is on, in order to figure out which container the command should be executed in.
Is there a tool to help me here? Should I change the structure somehow? Any tips would be welcome.
Thanks.
So your question is that you want to get the APP's version info from the database container before scheduling jobs, right?
I think this is related to the business logic, not the dockerized environment. There are a few ways you could solve the problem:
Check the network and make sure the containers can connect to each other.
I think the database should support some RPC function; you can use it to get the version data.
You can use tools that support remote execution, like SSH.
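The lookup described in the question (tenant to version, then version to container) could be sketched in Python as below. This is only an illustration under stated assumptions: the function build_exec_command, the tenant and container names, and the job command are all hypothetical, and the two dictionaries stand in for whatever the scheduler would actually read from the database. The only real interface used is the `docker exec <container> <command>` form of the Docker CLI.

```python
from typing import Dict, List


def build_exec_command(
    tenant: str,
    tenant_versions: Dict[str, str],
    version_containers: Dict[str, str],
    job: List[str],
) -> List[str]:
    """Return the `docker exec` argv for running `job` for one tenant.

    tenant_versions:    tenant -> app version (would come from the database)
    version_containers: app version -> container name (hypothetical mapping)
    """
    version = tenant_versions[tenant]        # KeyError if tenant is unknown
    container = version_containers[version]  # KeyError if version has no container
    return ["docker", "exec", container, *job]


# Hypothetical data standing in for a database query result.
tenant_versions = {"acme": "2.1", "globex": "3.0"}
version_containers = {"2.1": "app_v2_1", "3.0": "app_v3_0"}

command = build_exec_command(
    "acme", tenant_versions, version_containers, ["send-emails", "--tenant", "acme"]
)
print(command)
```

The scheduler container could pass the resulting list to subprocess.run, provided it has access to the Docker CLI and socket.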

Solr7 and zookeeper behavior leading to deleted data directories, how to research/prevent

During testing, I came across the following situation:
I had set up 3 VMs, all Ubuntu 18.04.
The first 2 machines had a solr7 instance. All 3 machines had a zookeeper. All of these are in Docker containers, the entire config deployed via Ansible.
Solr 7.5, Zookeeper 3.14.3
There's a frontend that acts as interface to insert stuff.
The zookeeper machines were set up to create an ensemble, which they properly did. They all had their id, a leader was elected, solr7 instances could connect and received their settings properly.
Inserting a bunch of data all worked fine.
Then I took down 2 of the VMs, leaving 1 with both a solr7 and zookeeper and redeployed the new config, without a zookeeper ensemble.
This did not work, the interface refused to come up, it all took too long so I decided to go back to 3 VMs.
While I could once again connect, I noticed all data was gone.
Even worse, when looking at the location of the solr data directories, those were all gone. Every single collection/core was gone.
I've been trying to google this issue, but there seems to be no documentation of anything like this.
My current working theory is that solr started and asked the zookeeper ensemble for its configuration. Zookeeper either was not in sync or lost its settings and sent an empty reply or did not reply at all. To which solr decided to remove the existing data folders, as the received config specified nothing/not receiving a config at all.
That's just guesswork though. I'm at a complete loss even finding information about this.
I'm not even sure what to search for. All results I get are "how to delete solr cores" or "how to remove collections".
Any help or pointing in the right direction would be appreciated.
EDIT: After talking about it on the solr mailing list, a ticket was made for this: https://issues.apache.org/jira/browse/SOLR-13396
After asking about it on the solr mailing list, a bug ticket was made: https://issues.apache.org/jira/browse/SOLR-13396
So answering my own question so this can be closed.

Second and Third Distributed Kafka Connector workers failing to work correctly

With a Kafka cluster of 3 and a Zookeeper cluster of the same size, I brought up one distributed connector node. This node ran successfully with a single task. I then brought up a second connector. This seemed to run, as some of the code in the task definitely ran, but it then didn't seem to stay alive. No errors were thrown; I inferred that it was not alive from a lack of expected activity, while the first connector continued to function correctly. When I call the URL http://localhost:8083/connectors/mqtt/tasks on each connector node, it tells me the connector has one task. I would expect this to be two tasks, one for each node/worker. (Currently the worker configuration says tasks.max = 1, but I've also tried setting it to 3.)
When I try and bring up a third connector, I get the error:
"POST /connectors HTTP/1.1" 500 90 5
(org.apache.kafka.connect.runtime.rest.RestServer:60)
ERROR IO error forwarding REST request:
(org.apache.kafka.connect.runtime.rest.RestServer:241)
java.net.ConnectException: Connection refused
Trying to call the connector POST method again from the shell returns the error:
{"error_code":500,"message":"IO Error trying to forward REST request: Connection refused"}
I also tried upgrading to Apache Kafka 0.10.1.1 that was released today. I'm still seeing the problems. The connectors are each running on isolated Docker containers defined by a single image. They should be identical.
The problem could be that I'm trying to run the POST request to http://localhost:8083/connectors on each worker, when I only need to run it once on a single worker and then the tasks for that connector will automatically distribute to the other workers. If this is the case, how do I get the tasks to distribute? I currently have the max set to three, but only one appears to be running on a single worker.
Update
I ultimately got things running using essentially the same approach that Yuri suggested. I gave each worker a unique group ID, then gave each connector task the same name. This allowed the three connectors and their single tasks to share a single offset, so that in the case of sink connectors the messages they consumed from Kafka were not duplicated. They are basically running as standalone connectors since the workers have different group ids and thus won't communicate with each other.
If the connector workers have the same group ID, you can't add more than one connector with the same name. If you give the connectors different names, they will have different offsets and consume duplicate messages. If you have three workers in the same group, one connector and three tasks, you would theoretically have an ideal situation where the tasks share an offset and the workers make sure the tasks are always running and well distributed (with each task consuming a unique set of partitions). In practice the connector framework doesn't create more than one task, even with tasks.max set to 3 and even when the topic the tasks consume has 25 partitions.
If anyone knows why I'm seeing this behaviour, please let me know.
I've encountered a similar issue in the same situation as yours.
tasks.max is configured on the connector that handles a topic, and the distributed workers automatically decide which nodes handle the topic. So, if you have 3 workers in a cluster and your configuration says tasks.max=2, then only 2 of the 3 workers will process the topic. In theory, if one of the workers fails, the third should pick up the workload. But..
The distributed connector turned out to be very unreliable: once you add or remove some nodes, the cluster breaks down and all workers do nothing but try to elect a leader and fail. The only way to fix it was to restart the whole cluster, preferably all workers simultaneously.
I chose another way: I used standalone workers, and that works like a charm for me because load distribution is implemented at the Kafka client level. Once a worker drops out, the cluster rebalances automatically and the remaining clients connect to the unoccupied topics.
PS. Maybe this will be useful for you too. The Confluent connector does not tolerate invalid payloads that do not match the topic's schema. Once the connector gets an invalid message, it silently dies. The only way to find out is to analyze metrics.
I'm posting an answer to an old question, since Kafka Connect has moved on a lot in three years.
In the latest version (2.3.1) there is incremental rebalancing which massively improves the behaviour of Kafka Connect.
It's also worth noting that when configuring Kafka Connect, rest.advertised.host.name must be set correctly. If it's not, you will see errors including the one quoted:
{"error_code":500,"message":"IO Error trying to forward REST request: Connection refused"}
See this post for more details.
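The task check mentioned in the question (calling /connectors/mqtt/tasks on a worker) can be scripted. The Python sketch below is a rough illustration: it assumes the endpoint returns a JSON array with one entry per task, and the sample body is made up for the example. The function names are hypothetical, and fetch_task_count needs a reachable worker, so it is defined but not called here.

```python
import json
from urllib.request import urlopen


def tasks_url(base_url: str, connector: str) -> str:
    """Build the tasks URL for a connector on a given Connect worker."""
    return f"{base_url.rstrip('/')}/connectors/{connector}/tasks"


def count_tasks(body: str) -> int:
    """Count tasks in a response body (assumed: JSON array, one item per task)."""
    return len(json.loads(body))


def fetch_task_count(base_url: str, connector: str) -> int:
    """Query a live worker; requires network access to the worker's REST port."""
    with urlopen(tasks_url(base_url, connector)) as response:
        return count_tasks(response.read().decode("utf-8"))


# Made-up sample body with a single task, matching what the question reports.
sample_body = '[{"id": {"connector": "mqtt", "task": 0}, "config": {}}]'
print(tasks_url("http://localhost:8083", "mqtt"))
print(count_tasks(sample_body))
```

Running fetch_task_count against each worker in turn shows whether they all report the same task list, which they should if they belong to the same group.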

Error in docs about neo4j cluster joining?

I'm trying to understand how neo4j cluster creation/joining works as it is not behaving properly in our application.
So I'm starting from scratch and creating a 3 box cluster as per the tutorial: http://neo4j.com/docs/2.3.4/ha-setup-tutorial.html
The following note is copy/pasted from the tutorial:
Startup Time When running in HA mode, the startup script returns
immediately instead of waiting for the server to become available.
This is because the instance does not accept any requests until a
cluster has been formed. In the example above this happens when you
start the second instance. To keep track of the startup state you can
follow the messages in console.log — the path is printed before the
startup script returns.
However, when I start up the second instance, my cluster is still not formed... I need to start up the third one for the cluster to form.
Is this an error in the neo4j docs?
Furthermore, is there a way to "force" an instance to become a master on cluster startup? For example, if I have 3 nodes and 2 of them fail and need to be re-installed, when I restart the cluster, how can I force the one with the valid database to become master? Isn't there a chance the 2nd or 3rd one with a blank database would become master?
When you start a cluster for the first time, or stop all instances and then start them again, the initial cluster MUST consist of all members listed in ha.initial_hosts. In addition, all instances in the cluster should have the exact same entries in ha.initial_hosts for the cluster to come up quickly and cleanly. The cluster will not form until all of the instances are up and running.
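The requirement that every instance lists the same ha.initial_hosts entries can be checked before starting the cluster. The Python sketch below is one way to do it, under assumptions: it parses simple key=value properties text, the host names and instance labels are hypothetical, and it compares the entries as sets, so it treats a different ordering of the same hosts as equal.

```python
from typing import Dict, FrozenSet


def initial_hosts(conf_text: str) -> FrozenSet[str]:
    """Extract the ha.initial_hosts entries from key=value properties text."""
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, value = line.split("=", 1)
        if key.strip() == "ha.initial_hosts":
            return frozenset(h.strip() for h in value.split(",") if h.strip())
    return frozenset()  # setting not present


def hosts_consistent(configs: Dict[str, str]) -> bool:
    """True if every instance defines the same, non-empty set of initial hosts."""
    seen = {initial_hosts(text) for text in configs.values()}
    return len(seen) == 1 and frozenset() not in seen


# Hypothetical config contents for three instances.
hosts = "ha.initial_hosts=neo4j-01:5001,neo4j-02:5001,neo4j-03:5001"
configs = {"instance1": hosts, "instance2": hosts, "instance3": hosts}
print(hosts_consistent(configs))
```

If the check fails, the differing instance can be found by printing initial_hosts for each config in turn.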
