I need some suggestions for an Erlang in-memory cache system.
The cache is key-value based storage.
The key is usually an ASCII string; the value can be any Erlang term (number, list, tuple, etc.).
A cache item can be set by any node.
A cache item can be read by any node.
Cache items are shared across all nodes, even on different servers.
Dirty reads are permitted; I don't want any locks or transactions reducing performance.
Totally distributed, with no centralized machine or service.
Good performance.
Easy installation, deployment, configuration and maintenance.
My first choice seems to be Mnesia, but I have no experience with it.
Does it meet my requirements?
What performance can I expect?
Another option is memcached, but I am afraid its performance will be lower than Mnesia's because of the extra serialization/deserialization required when talking to the memcached daemon in a separate OS process.
Yes, Mnesia meets your requirements. However, as you say, a tool is only good when the one using it understands it in depth. We have used Mnesia in a distributed authentication system and have not experienced any problems so far. Used as a cache, Mnesia is better off than memcached, for one reason: "Memcached cannot guarantee that what you write, you can read at any time, due to memory swap out issues and stuff" (follow here). However, this means that your distributed system has to be built on Erlang.
Indeed, Mnesia in your case beats most NoSQL cache solutions because their systems are eventually consistent. Mnesia is consistent, as long as network availability can be ensured across the cluster. For a distributed cache system, you don't want a situation where you read different values for the same key from different nodes, so Mnesia's consistency comes in handy here.
Something you should think about is that it is possible to have a centralised memory cache for a distributed system. It works like this: you have a RabbitMQ server running and accessible by AMQP clients on each cluster node, and the systems interact over the AMQP interface. Because the cache is centralised, consistency is ensured by the process/system responsible for writing to and reading from the cache. The other systems just place a request for a key onto the AMQP message bus, and the system responsible for the cache receives this message and replies with the value.
We used this message-bus architecture with RabbitMQ for a recent system that involved integration with banking systems, an ERP system and a public online service. What we built was responsible for fusing all of these together, and we are glad we used RabbitMQ. The details are many, but in short we came up with a message format and a system-identification mechanism. Every system must have a RabbitMQ client for writing to and reading from the message bus. You then create a read queue for each system, so that other systems write their requests into that queue, whose name inside RabbitMQ is the same as the name of the system that owns it. Later, you should also encrypt the messages passing over the bus. In the end you have systems bound together over large distances/across states, yet with an efficient network you won't believe how fast RabbitMQ binds them. In any case, RabbitMQ can also be clustered, and I should tell you that it is Mnesia which powers RabbitMQ (which tells you how good Mnesia can be).
One more thing: do some reading and write many programs with Mnesia until you are comfortable with it.
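To make the Mnesia suggestion concrete, here is a minimal sketch of a replicated, RAM-only, dirty-access cache. The module, table and function names are illustrative, not part of any standard API, and it assumes the participating nodes (including the local one) are already connected:

```erlang
%% cache.erl -- minimal sketch of a distributed RAM cache on top of Mnesia.
-module(cache).
-export([init/1, put/2, get/1]).

-record(cache, {key, value}).

%% Create the schema and a ram_copies table replicated on every node.
%% Nodes must include node(), and all nodes must already be connected.
init(Nodes) ->
    mnesia:create_schema(Nodes),
    rpc:multicall(Nodes, application, start, [mnesia]),
    mnesia:create_table(cache,
                        [{ram_copies, Nodes},
                         {attributes, record_info(fields, cache)}]).

%% Dirty write: no transaction, no locks.
put(Key, Value) ->
    mnesia:dirty_write(#cache{key = Key, value = Value}).

%% Dirty read: returns {ok, Value} or not_found.
get(Key) ->
    case mnesia:dirty_read(cache, Key) of
        [#cache{value = V}] -> {ok, V};
        []                  -> not_found
    end.
```

The dirty_read/dirty_write calls skip Mnesia's transaction manager and locking, which matches the "dirty reads are permitted" requirement in the question, at the cost of the usual transactional guarantees.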
We are a little confused about the disk types that Kafka machines need.
In our production Kafka cluster we have producers, 3 Kafka brokers, and consumers.
When producers push data to topics and consumers read data from topics, how do we avoid the situation where a consumer tries to read data from topic partitions but the data is not actually in the topic yet?
Second, since we are not using SSD disks on the Kafka brokers, how do we know whether a consumer is reading data from the memory cache or from the disks?
How to avoid the situation where a consumer tries to read data from topic partitions but the data is not actually in the topic?
Kafka reads data sequentially, so there is no random access. That's why you cannot read a specific record directly; you can only specify the offset to start reading from.
Also, because there is no random access, using SSDs has no significant effect on performance.
From the Cloudera blog (link):
Using SSDs instead of spinning disks has not been shown to provide a significant performance improvement for Kafka, for two main reasons:
Kafka writes to disk are asynchronous. That is, other than at startup/shutdown, no Kafka operation waits for a disk sync to complete; disk syncs are always in the background. That’s why replicating to at least three replicas is critical—because a single replica will lose the data that has not been sync’d to disk, if it crashes.
Each Kafka partition is stored as a sequential write-ahead log. Thus, disk reads and writes in Kafka are sequential, with very few random seeks. Sequential reads and writes are heavily optimized by modern operating systems.
SSDs will help when consumers are slower than producers, which is quite possible. When consumers fall behind, file system cache misses occur, random access to older log segments happens, and a spinning disk then gives you the worst-case scenario.
I've been looking for information on how efficient Kubernetes & Docker are in terms of using machine resources, but I haven't found much so far. Here are my three questions, all about Kubernetes + Docker:
If multiple containers on the same node are running the same binary, are the code pages shared between all these instances? That is, is there a single set of physical pages allocated on the node for all these processes? For example, if I'm running a service mesh like Istio, which runs Envoy in every pod, is the system smart enough to only load the Envoy code in memory once, or does all the indirection taking place prevent the Linux kernel from recognizing that sharing is possible?
In a large Kubernetes deployment, there will end up being a considerable number of redundantly downloaded docker images on each node. Instead, it would seem more effective to have a single in-cluster repository for these images that all nodes can fetch from. I saw this about having docker use NFS for a common image store. Is this the only answer?
I heard there's a practical limit to the number of pods Kubernetes will schedule on a single node (30). Such a small limit forces you to use smaller VMs in order to be able to fully saturate them. Anybody know why this limit exists and whether it will eventually be raised? I ask this in the context of trying to run Kubernetes on bare metal where VMs aren't used at all. In such a world, I'd want to be able to pack way more than 30 pods on a (large) physical machine.
Thank you for any insights or pointers.
You state your question in a way that suggests you plan to use Docker as the container runtime for Kubernetes. That is fine, but there are more choices, and the answers change depending on the runtime.
In general, Kubernetes provides an abstraction over the actual scheduling and running of pods/containers. Perhaps you are investing too much human time into details that can be solved with more metal, which is cheap.
Multiple containers on a single node are usually (with docker/containerd/cri-o) just system processes, much like launching your Apache httpd multiple times yourself. If the kernel uses memory deduplication, it can indeed share pages.
If you use a container runtime that launches micro-VMs (Firecracker, Kata Containers, ...), I doubt memory deduplication will be possible.
I would not recommend sharing storage for the container images, e.g. with NFS. In some customer setups I had to diagnose issues caused by this, like deadlocks. Basically you would reduce the robustness of your cluster in order to save disk space. Just use more metal.
The usual limit is 110 pods per node, which is usually plenty. You can change this limit using the --max-pods parameter to the kubelet process or in the kubelet configuration file (see the sketch below). The reason for the limit is that managing each pod incurs effort on the kubelet and etcd/apiserver side.
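If you drive the kubelet from a configuration file rather than flags, the relevant field is maxPods; a minimal sketch follows (the value 250 is purely illustrative, pick what your nodes and control plane can actually handle):

```yaml
# Fragment of a KubeletConfiguration file, passed to the kubelet via --config.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250   # default is 110
```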
This question already has answers here:
How Erlang atoms can be garbage collected
(3 answers)
Can atoms be removed from a running Erlang/Elixir system?
Specifically, I am interested in how I would create an application server where modules, representing applications, can be loaded and run on demand and later removed.
I suppose it's more complicated than just removing the atom representing the module in question, as it may be defining more atoms which may be difficult or impossible to track.
Alternatively, I wonder if a module can be run in isolation so that all references it produces can be effectively removed from a running system when it is no longer needed.
EDIT: Just to clarify, because SO thinks this question is answered elsewhere: the question does not relate to garbage collection of atoms, but to manual management thereof. To further clarify, here is my comment on Alex's answer below:
I have also thought about spinning up separate instances (nodes?), but that would be very expensive for on-demand, per-user applications. What I am trying to do is imitate how an SAP ABAP system works. One option may be to pre-emptively have a certain number of instances running, then restart each one once a request is complete (again, pretty expensive though). Another may be to monitor the atom table of an instance and restart that instance when it is close to the limit.
The drawback I see with running several nodes/instances (although that is what an ABAP system has: several OS processes serving requests from users) is that you lose the ability to share cached bytecode between those instances. In an ABAP system, the cached bytecode (which they call a "load") is accessible to the different processes, so when a program is started it checks the cache first before fetching it from storage.
Unfortunately not; atoms are never destroyed within the VM until the VM shuts down. The atom table (and its limit) is also shared by all processes in the VM, meaning that spawning a new process to handle atom allocation/deallocation won't work in your case.
You might have some luck spawning a completely separate VM instance by running a separate Erlang application and communicating with it through sockets, although I'm not sure how effective that will be.
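As a side note on the "monitor the atom table" idea from the question's edit: the VM does expose the current and maximum atom counts, so a rough watchdog can be sketched as below (the module name and the 90% threshold are purely illustrative):

```erlang
%% atom_watch.erl -- rough sketch of an atom-table watchdog.
%% erlang:system_info(atom_count) and erlang:system_info(atom_limit)
%% are available from OTP 20 onwards.
-module(atom_watch).
-export([check/0]).

%% Returns ok, or {warning, Used, Limit} when the atom table is over
%% 90% full, at which point the instance could be drained and restarted.
check() ->
    Used  = erlang:system_info(atom_count),
    Limit = erlang:system_info(atom_limit),
    case Used / Limit of
        Ratio when Ratio > 0.9 -> {warning, Used, Limit};
        _                      -> ok
    end.
```

Such a check could be run periodically by a supervised process; it does not free any atoms, it only tells you when a restart is due.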
I have already asked a question regarding a simple fault-tolerant soft real-time web application for a pizza delivery shop.
I got really nice comments and answers there, but I disagree that it is a true web service. Rather than a web service, it is more of a real-time system that accepts orders from customers, controls the dispatching of these orders, and controls the vehicles that deliver those orders in real time.
Moreover, unlike a 'true' web service, this system is not intended to have many users: just a few dispatchers (telephone operators) and a few delivery drivers will use it (for now I have no requirement to give the actual customers direct access to the service; only the dispatchers and delivery drivers will have it).
Hence this question is a bit more general.
I have found that in order to make the right choice of a NoSQL data storage option for this application, the first thing I have to do is choose between CA, PA and CP according to the CAP theorem.
Now, the book Building Web Applications with Erlang says that "while it [Mnesia] is not a SQL database, it is a CA database like a SQL database. It will not handle network partition". The same book says that CouchDB is a PA database.
With that in mind, I think the very first thing I need to do with my application is decide what the term 'fault tolerance' means with regard to CAP.
The simple requirement I have is that the application be available 24/7 (R1). The other one is that there is no need to scale: the application will have a very modest number of users (it is probably not possible to have thousands of dispatchers) (R2).
Now, does R1 require the application to provide Consistency, Availability and Partition Tolerance and with what priorities?
What type of data storage option will better handle the following issues:
Providing 24/7 availability for a dispatcher (a person who accepts phone calls from customers and who uses a CRM) to look up customer records and put orders into the system;
Looking up current ongoing served orders and their status (placed, baking, dispatched, delivering, delivered) in real time;
Keep track of all working vehicles' locations and their payloads in real time;
Recover any part of the system after system crash or network crash to continue providing 1,2 and 3;
To sum it up: what kind of data storage (CA, PA or CP) will suit the system described above better? What kind of data storage will better satisfy the R1 requirement?
For your 24/7 requirement you are looking for a database with (High) Availability, because you want your requests to succeed every time (even if they only return error results).
A netsplit would bring your whole system down if you have no partition tolerance.
Consistency is nice to have, but you can only have two of the three.
Your best bet will be a PA solution. I highly recommend a solution inspired by Amazon's Dynamo. The best-known Dynamo implementations are Riak and CouchDB. Riak even allows you to shift from PA towards other trade-offs by tuning the read and write replica counts.
First, don't confuse CAP "Availability" with "High Availability". They have nothing to do with each other. The A in CAP simply means "All DB nodes can answer queries". To get High Availability, you must be in multiple data centers, you must have robust documented procedures for maintenance, expansion, etc. None of that depends on your CAP choice.
Second, be realistic about your requirements. A stock-trading application might have a requirement for 100% uptime, because every second of downtime could lose millions of dollars. On the other hand, I'm guessing your pizza joint might lose tens of dollars for every minute it's down. So it doesn't make sense to spend millions trying to keep it up. Try to compute your actual costs.
Third, always evaluate your choice against the mainstream. You could just go CA (MySQL) and quickly fail over to the slaves when problems happen. Be realistic about the costs (and risks) of building on new technology. If you really expect your system to run for 5 years without downtime, ask for proof that someone else has run that database for 5 years without downtime.
If you go "AP" and have remote people (drivers, etc.), then you'll need to write an app that stores their data on their phone and sends it in the background (with retries). Of course, you could do this regardless of whether your database was CA or AP.
If you want high uptimes, you can either:
Increase MTBF (Mean Time Between Failures) - Buy redundant power supplies, buy dual ethernet cards, etc..
Decrease MTTR (Mean Time To Recovery) - just make sure that when a failure happens you can recover quickly (e.g. fail over to a slave).
I've seen people spend tens of thousands of dollars on MTBF, only to be down for 8 hours while they restore their backup. It makes more sense to ensure MTTR is low before attacking MTBF.
I am looking for a monitoring and alerting tool for my application hosted in the cloud. My application is hosted across multiple servers and I want to monitor all of them. I am interested in monitoring the following:
1. Service monitoring:
Check if the service is up. This requires:
trying to sign up a new user
logging in to the application with a given username/password and performing certain steps like search, etc.
Monitoring QoS: how much time searches and some other operations take.
2. Resource monitoring
Monitoring the following parameters in each server:
CPU utilization
load average
Memory usage
Disk usage
IOPS
3. Process monitoring
Monitor whether a set of processes is running. If not, try restarting them.
E.g.: php-fpm, my application binaries, mysql, nginx, smtp, etc.
4. Monitoring log files
Error logs of my application
MySQL error log
MySQL slow query log
etc.
Also, I should be able to extend its usage by executing shell commands or writing my own shell scripts.
I should be able to set an alert if any monitored item is found to be problematic. I should be able to receive alerts through:
email
Mobile SMS
The monitoring system should maintain history for the period I want, so that after receiving an alert I can log in to the system, view past data (say, the past 2 weeks) and investigate problems.
Most important:
The tool should have a very good way of managing its own configuration.
The configuration should not be scattered across multiple places. All configuration should be stored in a centralized place. If in the future, say, the path of a monitored log file changes, I would like to search and replace all occurrences of that file in my configuration.
I should be able to version control my configurations.
Instead of going to the web interface and setting the configuration manually, I would like to set up a script which automatically loads all the configurations and starts monitoring.
I am exploring Zabbix but don't see a satisfactory way of managing its configuration. Should I try Nagios? Any other tool?
Two newer cloud-type monitoring solutions that may be of interest to you are http://logicmonitor.com/ and http://copperegg.com/.
LogicMonitor covers many of your requirements out of the box and offers a bit of customization for your own alerting.
CopperEgg / RevealCloud is more base-system-level monitoring (CPU, memory, disk, and network throughput). It has a nice polished interface that is much more straightforward than LogicMonitor's, but that is about it.
Well, considering you've tagged this with Zabbix, I assume you're considering this as an option.
We use Zabbix to monitor the Amazon EC2 instances as well as instances in our private openstack cloud. It's as simple as "apt-get install zabbix-agent" really.
Zabbix is especially useful in the case of monitoring our OpenStack private cloud. We have the server scan an IP range and automatically set up checks, alerts, etc., based solely on the hostname of the machine found.
Nagios is one of the standard ways of monitoring and can support all the use cases you brought up (plus, plugins have probably already been written for all of them).