What determines the mosquitto.db file size limit - mosquitto

I am new to mosquitto and have a few questions I hope you all can help me with:
1. What determines the size limit of the persistence file in mosquitto? Is it the system memory or disk space?
2. What happens when the persistence file grows larger than that limit? Can I transfer it to another server for temporary storage?
3. How would mosquitto use the transferred file to publish messages when it restarts?
I appreciate any feedback.
Thanks,

1. Probably a combination of the filesystem's maximum file size and the system/process memory limit, whichever is smaller. But I would expect performance problems to become apparent, and to be the bigger issue, well before you reached either limit.
2. Mosquitto probably crashes. If mosquitto exceeds the system/process memory limits, it is going to get killed by the OS or crash instantly. I doubt there would be any benefit to moving the file to a different machine: if mosquitto crashes after hitting either of those limits, the file is likely to be corrupted and so unreadable even if mosquitto is restarted on the same machine.
3. See answer 2.
In reality you should never come close to these limits. Having that many in-flight messages means there are some very serious issues with the design of your whole system.
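As a rough way to see how far a deployment is from the limits discussed above, here is a minimal sketch that compares the persistence file's size with the free space on its filesystem. The path /var/lib/mosquitto/mosquitto.db is an assumption (the real location depends on persistence_location in your mosquitto.conf), and the 50% warning ratio is an arbitrary, hypothetical rule of thumb. It says nothing about memory limits.

```python
import os
import shutil


def persistence_headroom(db_path):
    """Return (file_size_bytes, free_disk_bytes) for a persistence file."""
    size = os.path.getsize(db_path)
    # disk_usage reports on the filesystem that contains the given directory
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(db_path))).free
    return size, free


def is_worrying(size, free, ratio=0.5):
    """Hypothetical rule of thumb: flag a file larger than `ratio` of free space."""
    return size > ratio * free


if __name__ == "__main__":
    # Assumed location; adjust to the persistence_location in your mosquitto.conf.
    path = "/var/lib/mosquitto/mosquitto.db"
    if os.path.exists(path):
        size, free = persistence_headroom(path)
        print(f"{path}: {size} bytes used, {free} bytes free on disk")
        print("worrying" if is_worrying(size, free) else "plenty of headroom")
    else:
        print(f"{path} not found")
```

A file that keeps growing between runs of a check like this is the earlier warning sign; the hard limits themselves should stay far away.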

Related

Trying to understand and debug ETCDv2 Memory usage

I’m trying to understand ETCD’s memory and disk usage within a deployed system using the ETCDv2 API. The system has a file being saved on a regular basis, each time under a new key, and we’re concerned that long-term there’s no clean-up of state leading to both memory and disk usage growing unbounded on each VM in the etcd cluster. We’ve also emulated this, using a large file (several MB) being saved every few minutes.
From the etcd docs, I expected the following:
Each insertion would save the file to disk, causing disk usage to grow unbounded.
This matches what I am seeing.
In memory, etcd would save a key-value pair where the value is a lookup address for the file on disk (taking up a very small amount of memory) and a cached version of the file (taking a large amount of memory).
I would then expect that rebooting an etcd pod after several file writes would cause the cache to be (mostly) cleared, meaning a consistently up pod would have memory growing unbounded but if the pod rebooted, the cache would be cleared of all but the active entry (and any specifically requested by e.g. attempted rollbacks) and the memory usage would (mostly) reset with each reboot.
However, in practice we see a very small memory drop with a reboot which is almost immediately returned after the pod recovers (as though all the cache is restored from the peers).
Is my understanding correct? And if so:
Why does the memory usage reset fully after an etcd pod reboot? Does the etcd cache get synced with its cluster, as well as the main key-value table and file storage?
Is there a recommended way to keep etcd’s memory and disk usage within bounded limits?
Additional notes:
I’ve tried reducing the snapshot_count configuration setting - this doesn’t seem to have had any impact (unless I’ve reduced it too far - I cut it right down to 5 from the default of 100,000).
I’ve attempted changing our file saving to overwrite a single file with a new version each time, instead of storing a new file. This doesn’t appear to have had any impact (although this may be due to issues in my prototype; I’m still investigating).
We can’t migrate existing deployments to etcd v3 file-systems, so are specifically looking at etcd v2 solutions. I think this rules out compact and defrag steps, which seem to be a core part of the answer to this problem in v3.
Any help or insight very gratefully appreciated.
Thanks!

AWS server became slow after traffic increase

I have a single page Angular app that makes requests to a Rails API service. Both are running on a t2.xlarge Ubuntu instance. I am using a Postgres database.
We had an increase in traffic, and my Rails API became slow. Sometimes I get an error saying the Passenger queue is full for the Rails application.
Auto scaling on the server is working; three more instances are created. But I cannot trace this issue. I need root access to upgrade, which I do not have. Please help me with this.
You mentioned that you are using a t2.xlarge instance. Firstly, you should not use the T2 instance type for a production environment, because T2 instances use CPU credits. Let's take a look at this:
What happens if I use all of my credits?
If your instance uses all of its CPU credit balance, performance
remains at the baseline performance level. If your instance is running
low on credits, your instance’s CPU credit consumption (and therefore
CPU performance) is gradually lowered to the base performance level
over a 15-minute interval, so you will not experience a sharp
performance drop-off when your CPU credits are depleted. If your
instance consistently uses all of its CPU credit balance, we recommend
a larger T2 size or a fixed performance instance type such as M3 or
C3.
I'm not sure whether you will actually run out of CPU credits, since you are using an xlarge type, but I think you should use one of the fixed performance instance types instead. Instance performance may be one part of your problem. Use CloudWatch to monitor two metrics, CPUCreditUsage and CPUCreditBalance, to confirm whether this is the problem.
Secondly, how about your ASG? After scale-out, did your service become stable? If so, I think you need not worry about this problem any more, because the ASG did its job.
Please check the following
If you are opening a connection to the database, make sure you close it.
If you are using jQuery, Bootstrap, DataTables, or other CSS/JS libraries, use CDN links like
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap-select/1.12.4/css/bootstrap-select.min.css">
This will greatly reduce the load on your server. Do not copy jQuery or other external libraries to your own server when you can fetch them directly from other servers.
There are a number of factors that can cause an EC2 instance (or any system) to appear to run slowly.
CPU Usage. The higher the CPU usage, the longer it takes to process new threads and processes.
Free Memory. Your system needs free memory to process threads, create new processes, etc. How much free memory do you have?
Free Disk Space. Operating systems tend to thrash when the file systems on system drives run low on free disk space. How much free disk space do you have?
Network Bandwidth. What is the average bytes in / out for your instance?
Database. Monitor connections, free memory, disk bandwidth, etc.
Amazon has CloudWatch which can provide you with monitoring for everything except for free disk space (you can add an agent to your instance for this metric). This will also help you quickly see what is happening with your instances.
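For a quick look from inside the instance itself, here is a minimal sketch that samples two of the factors listed above, free disk space and CPU load. It uses only the Python standard library; the function names and the thresholds (10% free disk, load of 1.0 per CPU) are hypothetical choices, not AWS recommendations, and the load average is only available on Unix-like systems.

```python
import os
import shutil


def local_snapshot(path="/"):
    """Collect two quick health indicators for the machine running this script."""
    usage = shutil.disk_usage(path)
    disk_free_pct = 100.0 * usage.free / usage.total
    try:
        # 1-minute load average divided by CPU count (Unix only)
        load_per_cpu = os.getloadavg()[0] / (os.cpu_count() or 1)
    except (AttributeError, OSError):
        load_per_cpu = None  # not available on this platform
    return {"disk_free_pct": disk_free_pct, "load_per_cpu": load_per_cpu}


def warnings(snapshot, min_disk_free_pct=10.0, max_load_per_cpu=1.0):
    """Return a list of human-readable warnings for a snapshot."""
    found = []
    if snapshot["disk_free_pct"] < min_disk_free_pct:
        found.append("low disk space")
    load = snapshot["load_per_cpu"]
    if load is not None and load > max_load_per_cpu:
        found.append("high CPU load")
    return found


if __name__ == "__main__":
    snap = local_snapshot()
    print(snap, warnings(snap))
```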
Monitor your EC2 instances and your database.
You mention T2 instances. These have burstable CPU performance, which means that if your CPU usage is consistently high, you will want to switch to fixed performance EC2 instances. CloudWatch should help you figure out what you need (CPU, memory, disk, or network performance).
This is independent of the AWS server itself. It looks like your software needs more resources (RAM, storage I/O, network) than one machine provides. Evaluate the metrics using CloudWatch and adjust the resources to what the software requires.
It could also be caused by memory leaks or processing leaks. You may need to create a cluster or server farm to handle the load.
Hope it helps.

Azure app service availability loss. The memory counter Page Reads/sec was at a dangerous level

Environment:
ASP.NET MVC app (.NET Framework 4.5.1) hosted on Azure App Service with two instances.
App uses Azure SQL server database.
Also, app uses MemoryCache (System.Runtime.Caching) for caching purposes.
Recently, I noticed availability loss of the app. It happens almost every day.
Observations:
The memory counter Page Reads/sec was at a dangerous level (242) on instance RD0003FF1F6B1B. Any value over 200 can cause delays or failures for any app on that instance.
What does 'The memory counter Page Reads/sec' mean?
How to fix this issue?
What does 'The memory counter Page Reads/sec' mean?
We could get the answer from this blog. The recommended Page reads/sec value should be under 90. Higher values indicate insufficient memory and indexing issues.
“Page reads/sec indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. Because physical I/O is expensive, you may be able to minimize the cost, either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design.”
How to fix this issue?
Based on my experience, you could try enabling Local Cache in App Service.
You enable Local Cache on a per-web-app basis by using this app setting: WEBSITE_LOCAL_CACHE_OPTION = Always
By default, the local cache size is 300 MB. This includes the /site and /siteextensions folders that are copied from the content store, as well as any locally created logs and data folders. To increase this limit, use the app setting WEBSITE_LOCAL_CACHE_SIZEINMB. You can increase the size up to 2 GB (2000 MB) per web app.
Several memory performance problems can be listed:
excessive paging,
memory shortages,
memory leaks.
Memory counter values can be used to detect the presence of various performance problems. Tracking counter values on both a system-wide and a per-process basis helps you pinpoint the cause, in Azure just as in other systems.
Even if there is no change in the process, a change in the system can cause memory problems.
For system-wide research in Azure:
Shared resources plans (Free and Basic) have memory limits as seen here: https://learn.microsoft.com/en-us/azure/azure-subscription-service-limits#app-service-limits.
Quotas: https://learn.microsoft.com/en-us/azure/app-service-web/web-sites-monitor
Also, you can check in the portal under your web app settings, search for “quotas”, and also check out “Diagnose and solve problems” and hit “metrics per instance (app service plan)” which will show you memory used for the plan.
A MemoryCache bug in .NET 4 can also cause this type of behavior:
https://stackoverflow.com/a/15715990/914284

Since modern computers use virtual memory, why do we still encounter "out of memory" issues?

I am learning the concept of virtual memory, but this question has been confusing me for a while. Since most modern computers use virtual memory, when a program is executing, the OS is supposed to page data in and out between RAM and disk. So why do we still encounter "out of memory" errors? Could you please correct me if I have misunderstood the concept? I really appreciate your explanation.
PS: For example, I was analyzing a large amount of data (>100G) output from a simulation on a computing cluster, and read the data into a C array. Very often the system crashed and reported a memory error.
First: Modern computers do indeed use virtual memory; however, there is no magic here. Memory is not created out of nothing. Virtual memory schemes typically allow a portion of the mass storage sub-system (aka hard disk) to be used to hold portions of the process that are (hopefully) less frequently used.
This technique allows processes to use more memory than is available as RAM. However nothing is infinite. Eventually all RAM and Hard Drive resources will be used up and the process will get an out of memory error.
Second: It is not unheard of for operating systems to place a cap on the memory that a process may use. Hit that cap and again, the process gets an out of memory error.
Even with virtual memory the memory available is not unlimited.
Limit 1) Architectural limits. The processor and operating system will place some maximum virtual memory limit.
Limit 2) System Parameters. Many operating systems configure the maximum virtual memory size.
Limit 3) Process quotas. Many operating systems have process quotas that limit the maximum virtual memory size.
Limit 4) System resources. Notably page file space.
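As an illustration of Limit 3, here is a minimal sketch that reads the per-process cap on virtual address space on a Unix-like system, using Python's standard resource module. It is Unix-only, the helper names are hypothetical, and a result of "unlimited" only means no per-process quota is set; the other limits in the list still apply.

```python
import resource


def describe_limit(value):
    """Render an rlimit value as text; RLIM_INFINITY means no cap is configured."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def address_space_limits():
    """Return the (soft, hard) limits on this process's virtual address space."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = address_space_limits()
    print(f"soft limit: {soft}")
    print(f"hard limit: {hard}")
```

A process that asks for more memory than the soft limit allows is refused by the operating system, no matter how much RAM or page file space is still free.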

Monitoring performance of opensips presence server

I have to do some performance testing of the opensips server but I am not able to get started.
For generating traffic I'll be using SIPp. I cannot find out how to monitor the performance of opensips in real time.
I know there is a tool, opensipsctl, but I am not able to run it. It gives the error below:
ERROR: Error opening OpenSIPS's FIFO /tmp/opensips_fifo
ERROR: Make sure you have the line 'modparam("mi_fifo", "fifo_name", "/tmp/opensips_fifo")' in your config
ERROR: and also have loaded the mi_fifo module.
And this is from the config file:
#### FIFO Management Interface
loadmodule "mi_fifo.so"
modparam("mi_fifo", "fifo_name", "/tmp/opensips_fifo")
modparam("mi_fifo", "fifo_mode", 0666)
I am trying to find the cause from forums.
I also tried to install nagios but was not able to add a service for opensips; basically, I could not understand how to do it.
I have another doubt regarding memory management. As I understand it, opensips uses a pre-configured amount of memory no matter how much memory is available, which I guess means I won't be able to find the actual memory consumption. I even ran some load tests where I saw spikes in CPU usage but no spike in memory usage. Please correct me if I have misunderstood.
I really need some help to understand how to go about doing this.
Thanks
To resolve your mi_fifo-related error, check whether the /tmp/mod_fifo file exists. If it does not, run the following:
touch /tmp/mod_fifo
chmod 777 /tmp/mod_fifo
/etc/init.d/opensips restart
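Before and after restarting, it can help to check what actually sits at the path opensipsctl complains about. The sketch below is a minimal check, assuming the FIFO path /tmp/opensips_fifo taken from the config in the question; the function name is hypothetical. It distinguishes a missing path, a named pipe, and some other kind of file.

```python
import os
import stat


def fifo_status(path):
    """Classify a path as 'missing', 'fifo', or 'not a fifo'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    # S_ISFIFO is true only for named pipes, not for regular files
    return "fifo" if stat.S_ISFIFO(mode) else "not a fifo"


if __name__ == "__main__":
    # Path copied from the fifo_name modparam shown in the question.
    path = "/tmp/opensips_fifo"
    print(f"{path}: {fifo_status(path)}")
```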
And regarding your memory doubt,
Private memory is the memory used by one process, while shared memory is memory accessible by all processes (it is an IPC method, see http://en.wikipedia.org/wiki/Shared_memory). The private memory is used for temporary storage required for certain processing by a process, while the shared memory is used to store data that must be accessible by all processes.
The OpenSIPS init script has the memory-related parameters.
Hope this helps.
