I run two Kafka Streams applications separately, each in its own JVM instance, and they work fine. But when I run both applications in the same JVM instance, the second application does not work (it neither consumes nor produces data). Are there any limitations on running two separate apps in the same JVM instance? Does this also happen with plain Kafka consumers?
It should work to put multiple KafkaStreams instances into the same JVM. But it might be easier to just increase the number of threads within one instance.
Note: the number of threads across all instances that you can utilize is limited by the number of partitions of your input topics (this is not quite exact, but a good rule of thumb). Do you have enough partitions in your input topic? Do you see that some partitions are not assigned to an instance and not processed?
Also note, Kafka Streams parallelizes per thread -- this means, if you have 2 instances with 2 threads each and only 2 input topic partitions, it can happen that both partitions are assigned to the two threads of one instance while the two threads of the other instance are idle. During runtime, we only "see" threads and don't know if they run within the same or different KafkaStreams instances.
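For illustration, here is a minimal sketch of two independent KafkaStreams instances started in one JVM. The topic names, application IDs, and state directories are placeholder assumptions; the key point is that each instance needs its own application.id and its own state.dir, and that num.stream.threads is the knob to turn if you prefer one instance with more threads instead.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class TwoAppsOneJvm {

    // Each application needs its own application.id (it doubles as the
    // consumer group id) and its own state directory, or the two will
    // collide when they share a JVM / host.
    static Properties config(String appId, String stateDir) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, appId);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.STATE_DIR_CONFIG, stateDir);
        // The alternative mentioned above: more threads per instance.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    public static void main(String[] args) {
        StreamsBuilder builder1 = new StreamsBuilder();
        builder1.stream("input-a").to("output-a"); // placeholder topology

        StreamsBuilder builder2 = new StreamsBuilder();
        builder2.stream("input-b").to("output-b"); // placeholder topology

        KafkaStreams app1 = new KafkaStreams(builder1.build(), config("app-a", "/tmp/kstreams-a"));
        KafkaStreams app2 = new KafkaStreams(builder2.build(), config("app-b", "/tmp/kstreams-b"));
        app1.start();
        app2.start(); // both instances run in this one JVM, each with its own threads

        Runtime.getRuntime().addShutdownHook(new Thread(() -> { app1.close(); app2.close(); }));
    }
}
```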
Related
Is there a way to specify the number of threads that a KSQL query running on a KSQL server should consume? In other words, the parallelism of the query.
Is there any limit to the number of applications that can run on a KSQL server? When or how should you decide to scale out?
Yes, you can specify the ksql.streams.num.stream.threads property. You can read more about it here.
Now, this is the number of KSQL Streams threads where stream processing occurs for that particular KSQL instance. It matters for vertical scaling: you might have enough computational resources on your machine to handle more threads, and therefore do more stream-processing work on that specific machine.
If you have the capacity (i.e., CPU cores), then you should run more threads so that more stream tasks can be scheduled on that instance, and therefore have additional parallelization capacity on your KSQL instance or cluster (if you have more than one instance).
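For reference, setting the thread count in the KSQL server configuration might look like this (the value 4 is only an example):

```properties
# ksql-server.properties
# Number of Kafka Streams threads this KSQL server uses for query processing
ksql.streams.num.stream.threads=4
```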
What you must understand with Kafka, Kafka Streams and KSQL is that horizontal scaling occurs with two main concepts:
Kafka Streams applications (such as KSQL) can parallelize work based on the number of Kafka topic partitions. If you have 3 partitions and you launch 4 KSQL instances (i.e., on different servers), then one of them will not be doing work on a stream you create on top of that topic. If you have the same topic with 3 partitions and only 1 KSQL server, it will be doing all of the work for the 3 partitions.
When you add a new instance of your Kafka Streams application (in your case KSQL) and it joins your cluster to process your KSQL streams and tables, this new instance joins the consumer groups consuming from those topics and immediately starts sharing the load with the other instances, as long as there are partitions that other instances can offload (triggering a consumer-group rebalance). The same happens if you take an instance down... the other instances pick up the slack and start processing the partition(s) the retired instance was processing.
When compared to vertical scaling (i.e., adding more capacity and threads to a KSQL instance), horizontal scaling does the same thing by adding the same computational resources to a different instance of the application on a different machine. You can see the Kafka Streams application threading model (with one or more application instances, on one or more machines) in the resources below.
I have tried to simplify it here, but you can read more on the KSQL Capacity Planning page and the Confluent Kafka Streams elastic scaling blog post.
The important aspects of the scale-out / scale-in lifecycle of Kafka Streams (and KSQL) applications can be better understood like this:
1. A single instance working on 4 different partitions
2. Three instances working on 4 different partitions (one of them is working on 2 partitions)
3. An instance just left the group; now two instances are working on 4 different partitions, perfectly balanced (2 partitions each)
(Images from the Confluent blog)
Is it possible to have a centralized storage/volume that can be shared between two pods/instances of an application running on different worker nodes in Kubernetes?
So to explain my case:
I have a Kubernetes cluster with 2 worker nodes. On each of these I run 1 instance of app X. This means I have 2 instances of app X running in total at the same time.
Both instances subscribe to the topic topicX, which has 2 partitions, and are part of a consumer group in Apache Kafka called groupX.
As I understand it, the message load will be split among the partitions, and thereby also among the consumers in the consumer group. So far so good, right?
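To make that setup concrete, here is a minimal sketch of what each instance of app X effectively does, written in Java even though the actual app may be in another language; topicX and groupX come from the setup above, and process() is a hypothetical stand-in for loading the pickled model and scoring:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AppXConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "groupX"); // both instances share this group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("topicX"));
            while (true) {
                // With 2 partitions and 2 instances in groupX, Kafka assigns one
                // partition to each instance. Which country's messages an instance
                // sees depends entirely on how the producer keys the messages.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical: look up the pickled model and score
                }
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) { /* ... */ }
}
```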
So to my problem:
In my whole solution I have a hierarchy division with a unique constraint on country and ID. Each combination of country and ID has a pickled model (a Python machine-learning model), which is stored in a directory accessed by the application. For each combination of country and ID I receive one message per minute.
At the moment I have 2 countries, so to be able to scale properly I wanted to split the load between two instances of app X, each one handling its own country.
The problem is that with Kafka the messages can be balanced between the different instances, and to access the pickle files in each instance without knowing which country a message belongs to, I have to store the pickle files in both instances.
Is there a way to solve this? I would rather keep the setup as simple as possible so it is easy to scale and add a third, fourth and fifth country later.
Keep in mind that this is an overly simplified way of explaining the problem. The number of instances is much higher in reality etc.
Yes, it's possible. If you look at this table, any PV (PersistentVolume) whose plugin supports the ReadWriteMany access mode will help you accomplish having the same data store for your Kafka workers. So in summary, these:
AzureFile
CephFS
Glusterfs
Quobyte
NFS
VsphereVolume - (works when pods are collocated)
PortworxVolume
In my opinion, NFS is the easiest to implement. Note that AzureFile, Quobyte, and Portworx are paid solutions.
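As an illustration, an NFS-backed ReadWriteMany volume might look like the following; the server address, export path, and sizes are placeholder assumptions:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany       # both pods, on different nodes, may mount it read-write
  nfs:
    server: 10.0.0.10     # placeholder NFS server address
    path: /exports/models # placeholder export holding the pickle files
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi
```

Both instances of app X can then mount models-pvc from different worker nodes and read the same pickle files.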
I am implementing a distributed algorithm for PageRank estimation using Storm. I have been having memory problems, so I decided to create a dummy implementation that does not explicitly save anything in memory, to determine whether the problem lies in my algorithm or in my Storm structure.
Indeed, while the only thing the dummy implementation does is message-passing (a lot of it), the memory of each worker process keeps rising until the pipeline is clogged. I do not understand why this might be happening.
My cluster has 18 machines (some with 8 GB, some with 16 GB, and some with 32 GB of memory). I have set the worker heap size to 6 GB (-Xmx6g).
My topology is very very simple:
One spout
One bolt (with parallelism).
The bolt receives data from the spout (fieldsGrouping) and also from other tasks of itself.
My message-passing pattern is based on random walks with a certain stopping probability; a sketch of the bolt follows the list below. More specifically:
The spout generates a tuple.
One specific task from the bolt receives this tuple.
Based on a certain probability, this task generates another tuple and emits it again to another task of the same bolt.
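For reference, here is a stripped-down sketch of such a bolt; the field name nodeId, the probability value, and pickNeighbor() are placeholder assumptions. BaseBasicBolt anchors the emitted tuple to its input and acks the input automatically, which is what makes the "max spout pending" flow control suggested in the answer below effective:

```java
import java.util.Random;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Sketch of the walk bolt: with probability CONTINUE_PROB it forwards the
// walk to another task of the same bolt; otherwise the walk stops there.
public class RandomWalkBolt extends BaseBasicBolt {
    private static final double CONTINUE_PROB = 0.85; // placeholder value
    private final Random random = new Random();

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        long nodeId = input.getLongByField("nodeId");
        if (random.nextDouble() < CONTINUE_PROB) {
            long next = pickNeighbor(nodeId);  // hypothetical neighbor lookup
            collector.emit(new Values(next));  // routed back into this bolt
        }
        // else: the walk terminates and the tuple tree completes
    }

    private long pickNeighbor(long nodeId) {
        return nodeId; // placeholder: a real implementation consults the graph
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("nodeId"));
    }
}
```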
I have been stuck on this problem for quite a while, so it would be very helpful if someone could help.
Best Regards,
Nick
It seems you have a bottleneck in your topology, i.e., a bolt receives more data than it can process. Thus, the bolt's input queue grows over time, consuming more and more memory.
You can either increase the parallelism of the "bottleneck bolt" or enable the fault-tolerance mechanism, which also provides flow control via a limited number of in-flight tuples (https://storm.apache.org/documentation/Guaranteeing-message-processing.html). For this, you also need to set the "max spout pending" parameter.
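A hedged sketch of what that could look like when wiring and submitting the topology; WalkSpout stands in for the question's spout and is not defined here, RandomWalkBolt is the bolt sketched above, and all the numbers are only examples:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class PageRankWalksTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        builder.setSpout("walk-spout", new WalkSpout()); // hypothetical spout from the question

        // Raise the parallelism hint of the bottleneck bolt, and remember
        // that it consumes both from the spout and from itself.
        builder.setBolt("walk-bolt", new RandomWalkBolt(), 32)
               .fieldsGrouping("walk-spout", new Fields("nodeId"))
               .fieldsGrouping("walk-bolt", new Fields("nodeId"));

        Config conf = new Config();
        // Flow control: at most this many spout tuples may be "in flight"
        // (emitted but not yet fully acked) at once. This only takes effect
        // if the spout emits with message IDs and downstream bolts ack.
        conf.setMaxSpoutPending(1000);
        conf.setNumWorkers(18);

        StormSubmitter.submitTopology("pagerank-walks", conf, builder.createTopology());
    }
}
```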
Preface: When I say "machine" below, I mean either a physical dedicated server or a virtual private server. When I say "node", I mean an instance of the Erlang virtual machine, of which there could be multiple running as separate processes under a single Unix kernel.
I've got a project that involves multiple Erlang/OTP applications. The applications will be running together and talking to each other on the same machine. They will all be hitting the disk, using memory, and spawning Erlang processes. They will also be using network resources, because they will be talking to similar machines with the same set of applications running on them in a cluster.
Almost all of this communication is via HTTP. Thus I could separate each Erlang/OTP application into a separate instance of the Erlang VM on the same machine and they could still talk to each other.
My question is: is it better to have them all running under one Erlang VM, so that this VM process can allocate access to resources among them and schedule the execution of the various Erlang processes?
Or is it better to have separate Erlang nodes on a given server?
If one is better than the other, why?
I'm assuming that running all of these apps in a single Erlang VM, which is given essentially full run of the server, will result in better performance. The OS is just managing the disk and RAM at the low level, and has only one significant process (the Erlang VM) to switch with... and the Erlang VM is probably smarter about allocating resources when it has a holistic view of all the Erlang processes.
This may be something that I need to test, but I'm not in a position to do so effectively in the near term.
The answer is: it depends.
Advantages of using a single node:
Memory is controlled by a single Erlang VM. It is way easier.
Inter-application communication (if using Erlang message passing) is faster.
Fewer operating-system context switches happen.
Advantages of using multiple nodes:
If the system is linking in C code to the VM, death of one node due to a bug in C will not kill the others.
Agree with #I GIVE CRAP ANSWERS
I would go with one VM. Here is why:
dynamic handling of runtime queues belonging to schedulers (important when the CPU load varies in origin)
fewer VMs to monitor
better understanding of memory allocation, and it is easier to spot a misbehaving process (you can compare all of them at once)
much easier inter-app supervision
I wouldn't worry about a VM crash -- you need to be prepared for one anyway. Heart works especially well in a cluster of equal units.
We've always used one VM per application because it's easier to manage.
The scheduler and SMP support in Erlang have come a long way in the past few years, so there isn't as much reason as there used to be to run multiple VMs on the same machine.
I agree with the previous answers, but there is one scenario where having multiple nodes per machine is the answer: when a heavy task hits the node. A task may take several minutes to complete, and in such a case a gen_server will hold the node until the task completes.
Between nodes, messages are (must be) passed over TCP/IP. However, by what mechanism are they passed between processes running on the same node? Is TCP/IP used in this case as well? Unix domain sockets? What is the difference in performance between "within node" and "between node" message passing?
by what mechanism are they passed between processes running on the same node?
Because Erlang processes on the same node are all running within a single native process — the BEAM emulator — message structures are simply copied into the receiver's message queue. The message structure is copied, rather than simply referenced, for all the standard no-side-effects functional programming reasons.
See erts_send_message() in erts/emulator/beam/erl_message.c in the Erlang sources for more detail. In R15B01, the bits most relevant to your question start at line 980 or so, with the call to erts_queue_message().
If you did choose to run multiple BEAM emulators on a single physical machine, I would guess messages get sent between them the same way as between different physical machines. There's probably no good reason to do that now that BEAM has good SMP support, though.
What is the difference in performance between "within node" and "between node" message passing?
A simple benchmark on your actual hardware would be more useful to you than anecdotal evidence from others.
If you want generalities, however, observe that memory bandwidths are around 20 GByte/sec these days (roughly 160 Gbit/sec), and that you're unlikely to have a network link faster than 10 Gbit/sec between nodes -- a gap of about 16x. So while there may be many differences between your actual application and any simple benchmark you perform or find, these differences probably cannot swamp an order-of-magnitude difference in transfer rate.
If you "only" have a 1 Gbit/sec end-to-end network link between nodes, intranode transfers will probably be over two orders of magnitude faster than internode transfers.
"All data in messages between Erlang processes is copied, with the exception of refc binaries on the same Erlang node.":
http://erlang.org/doc/efficiency_guide/processes.html#id2265332